CYBERNETICS AND CONTROL THEORY
Submitted 1965-01-01 | RussiaRxiv: ru-196501.15012 | Translated from Russian

K. I. KURBAKOV

AN ADDRESSING METHOD USING COMPRESSED WORD CODES AS MEMORY ADDRESSES

(Presented by Academician V. M. Glushkov, January 19, 1965)

In papers \((^{1-3})\) the possibility is investigated of directly transforming the initial information into an address of a storage device (SD) with arbitrary access to any of its cells. The present paper also belongs to this line of work.

In \((^4)\) a method is described for compressing the codes of the words of an initial dictionary (text). The code of each subsequent letter of a word* is shifted by one position toward the higher-order positions relative to the code of the preceding letter, and the shifted codes are added modulo 2 position by position; the letter codes themselves are chosen with allowance for the probabilistic-statistical characteristics of the language. This transformation yields a compressed word code of \(n\) positions.

To reduce \(n\) for long words, whose compressed word code would otherwise be longer than some prescribed value, the codes of the terminal letters of the word may be added modulo 2 position by position without any further shift relative to the preceding letter, i.e., the shift is performed only a fixed number of times:

\[ k_{\phi}=n-m, \tag{1} \]

where \(k_{\phi}\) is the fixed number of shifts, \(n\) is the number of binary positions in the compressed word code, and \(m\) is the number of binary positions in a letter code.

For example, let \(n=11\) binary positions, and let the letters have the \(m\)-position codes shown in the example below; then the word ГАЗЕТА is compressed as follows:

\[ \begin{array}{rcccccccccccl} & 1&1&1&1&0&0&0&0& & & & \mathrm{A}\\ & 1&1&0&1&1&0&0&0& & & & \mathrm{T}\\ + & 0&0&0&1&0&1&1&1& & & & \mathrm{E}\\ (\text{mod }2) & &0&1&1&0&0&0&1&1& & & \mathrm{Z}\\ & & &1&1&1&1&0&0&0&0& & \mathrm{A}\\ & & & &0&1&0&0&0&1&1&1 & \Gamma\\ \hline & 0&0&1&1&1&0&1&0&0&1&1 & \end{array} \qquad \begin{array}{c} L\\ \uparrow\\ n\leftarrow \end{array} \]

Thus, with \(n=11\) binary positions and \(m=8\) binary positions for the word ГАЗЕТА of length \(L=6\) letters, it is necessary to make only three shifts relative to the first letter of this word; the remaining letters (T and A) are added modulo 2 in each position without shifting relative to the letter shifted last (E).
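
The worked example can be reproduced with a short Python sketch. The function name `compress_word`, the table `LETTER_CODES`, and the Latin transliteration of the letters (G, A, Z, E, T for Г, А, З, Е, Т) are assumptions made here for illustration; only the five letter codes that appear in the example are included, since the full code table is not given in the text.

```python
def compress_word(word, letter_codes, n=11, m=8):
    """XOR-fold the m-bit letter codes of `word` into an n-bit compressed code.

    Letter j (counting from 0) is shifted toward the higher-order positions
    by min(j, n - m) places, so shifting stops after n - m shifts, as in
    formula (1); later letters are added without further shift.
    """
    max_shift = n - m                  # fixed number of shifts, formula (1)
    code = 0
    for j, letter in enumerate(word):
        shift = min(j, max_shift)
        code ^= letter_codes[letter] << shift   # position-wise addition modulo 2
    return code & ((1 << n) - 1)       # safeguard: keep exactly n positions


# Letter codes read off the worked example (m = 8 binary positions).
# Keys are Latin transliterations, an assumption of this sketch.
LETTER_CODES = {
    "G": 0b01000111,
    "A": 0b11110000,
    "Z": 0b01100011,
    "E": 0b00010111,
    "T": 0b11011000,
}

address = compress_word("GAZETA", LETTER_CODES)
print(format(address, "011b"))   # 00111010011
```

The printed value agrees with the bottom row of the example, which suggests that the shift rule as read here (three shifts, then none) is the one intended.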

To resolve the ambiguity of compression that arises in the process of letter-by-letter compression of information by the method described, various distinguishing features may be applied, determined directly from the initial word. The code of the word’s features is generated in the compression unit simultaneously with compression of the word code.

* By a word of the initial dictionary is meant a certain set of letters bounded by a space, and by a letter, any alphabet symbol in the Markov sense.

Word-by-word coding by the method described thus performs word-by-word compression of the information, and the result of the transformation is a random number that can be used as the address of the source word. The main purpose of coding the symbols of the alphabet with allowance for the probabilistic-statistical characteristics of the dictionary (text), and of applying this type of word-code compression, is to obtain the most uniform possible distribution of compressed word codes over some specified interval

\[ M = 0 \ldots 2^n, \tag{2} \]

where \(M\) is the number of addresses in a particular storage device and \(n\) is the number of binary digits in the compressed word code, and then to use the compressed word codes as addresses in a random-access memory.

The essence of the addressing method that uses compressed word codes as memory addresses is as follows \((^5)\). The code of the transformed word, obtained at the output of the compression unit or produced programmatically, consists of two parts: a) \(n_i\), the compressed word code, i.e., the address of the source word; b) \(t_i\), the code of the features of the source word.

The complete code of the transformed word \((n_i + t_i)\) uniquely characterizes any word in the entire set \(N\) of source words used (the dictionary), while

\[ 2^n \geq N,\qquad 2^{n+t} \gg N. \tag{3} \]

The set of transformed words \(N'\) is represented by two types of words: a) words with an address \(n_i\) that is not repeated for the given dictionary, and b) words with a \(k\)-fold repetition of the address \(n_i\) for the given dictionary, i.e., there are groups of ambiguous compressed word codes.

The words of any compression-ambiguity group \(\eta_i\) (\(i = 1, 2, \ldots, k\), where \(k\) is the index of the last word in the group) are distinguished from one another by their word features \(t\). All words within a group are linked by ordinary readdressing: for each address \(n_i\) a readdressing code \(\gamma_i\;(\equiv n_{i+j})\) is stored, which points to the next word of the same compression-ambiguity group.

Let us briefly consider the search algorithm in an automatic dictionary, which is easily implemented as a program on existing computers. After the transformed code of the source word \((n_i + t_i)\) has been formed, a storage device with random access to each memory cell is accessed at the address \(n_i\), i.e., using the address part of the transformed code. A “number” is read at this address, which consists of three parts \(\alpha\), \(\beta\), and \(\gamma\): \(\alpha_i\;(\equiv n_i)\) is the information (I) associated with the source word (for example, the translation equivalent of the word in another language, together with the accompanying grammatical and similar information); \(\beta_i\;(\equiv t_i)\) is the code of the features of the source word; and \(\gamma_i\;(\equiv n_{i+j})\) is the readdressing code, i.e., the code used to change the address \(n_i\).

Before the information at the requested address is issued, the features of the source word are compared with the features of the word stored in the storage device at the address \(\alpha_i\;(\equiv n_i)\). If the features coincide \((t_{\text{src}, i} = t_{\text{SD}, i})\), the value \(I_{\alpha_i}\) is issued. If, however, \(t_{\text{src}, i} \ne t_{\text{SD}, i}\), output of the value \(I_{\alpha_i}\) is blocked and readdressing is carried out to the address indicated in \(\gamma_i\;(\equiv n_{i+j})\). The comparison of word features within the \(\eta_i\)-th compression-ambiguity group continues until an equivalent of the source word* is found or until the signal “There is no such word in the dictionary” is issued. The indication that the source word is not in the dictionary is the absence of a code (a zero group or another special indication) in the readdressing positions \(\gamma_{i=k}\,(=0)\) of the last (\(k\)-th) word of the compression-ambiguity group (the group of words with readdressing).

* By \(\alpha_i\;(\equiv n_i)\), when \(t_{\text{src}} = t_{\text{SD}}\), is also meant an address reference to some part of the internal storage device or to external storage devices (for example, magnetic-tape, disk, and drum storage units).
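
The search procedure described above can be sketched in Python as follows. This is a minimal illustration and not the author's program: the names `Cell` and `lookup`, the use of a Python `dict` as the storage device, the use of `None` instead of a zero group as the end-of-group mark, and all addresses and feature codes in the sample memory (other than the address 467 of the worked example) are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    info: str                  # alpha: information associated with the word
    features: int              # beta: feature code t of the stored word
    next_addr: Optional[int]   # gamma: readdressing code; None ends the group


def lookup(memory, address, features):
    """Return (info, cycles); info is None if the word is not in the dictionary."""
    cycles = 0
    while address is not None and address in memory:
        cell = memory[address]
        cycles += 1                    # one feature comparison = one search cycle
        if cell.features == features:
            return cell.info, cycles   # features coincide: issue the information
        address = cell.next_addr       # features differ: readdress within the group
    return None, cycles                # "There is no such word in the dictionary"


# Hypothetical memory: two words share the compressed code 467 and are
# chained through the readdressing field; feature codes are invented.
memory = {
    467: Cell("newspaper", features=0b0110, next_addr=900),
    900: Cell("second word of the group", features=0b1010, next_addr=None),
    12: Cell("unambiguous word", features=0b0001, next_addr=None),
}

print(lookup(memory, 467, 0b0110))   # ('newspaper', 1)
print(lookup(memory, 467, 0b1010))   # ('second word of the group', 2)
print(lookup(memory, 467, 0b1111))   # (None, 2)
```

The returned cycle count corresponds to the number of comparisons made, so a word stored in position \(j\) of its group costs \(j\) cycles.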

The average time to search for one word (\(T_{\mathrm{avg.s}}\)) in the machine dictionary by the compressed code of the source word is governed, for the addressing method under consideration, by the average length of the compression-ambiguity group \(\eta_i\). In general form it is the ratio of the total number of cycles* expended on searching for all the words of the dictionary (without readdressing, with one readdressing, with two, etc.) to the number of all words in the dictionary, i.e.

\[ T_{\mathrm{avg.s}}= \frac{\displaystyle\sum_{\eta_i=1}^{\eta_k}\frac{\eta_i(\eta_i+1)}{2\,\eta_i!}}{\displaystyle\sum_{\eta_i=1}^{\eta_k}\frac{\eta_i}{\eta_i!}}. \tag{4} \]

Since \(\frac{1}{\eta_i!}\) is already negligibly small at \(\eta_k=10\), we may restrict ourselves to the values \(\eta_i=1,2,\ldots,10\), which gives

\[ T_{\mathrm{avg.s}}=1.500 \text{ cycles.} \]
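
The value 1.500 can be checked numerically. The sketch below rests on one reading of formula (4), which is an assumption: a group of \(\eta\) words costs \(1+2+\ldots+\eta=\eta(\eta+1)/2\) cycles to search for all of its words, and the relative number of groups of length \(\eta\) is taken proportional to \(1/\eta!\). The function name `average_search_cycles` is hypothetical.

```python
from fractions import Fraction
from math import factorial


def average_search_cycles(group_weights):
    """Average cycles per word for {group length: relative number of groups}."""
    # Finding every word of a group of length s takes 1 + 2 + ... + s cycles.
    total_cycles = sum(w * Fraction(s * (s + 1), 2) for s, w in group_weights.items())
    total_words = sum(w * s for s, w in group_weights.items())
    return total_cycles / total_words


# Weights 1/s! for group lengths 1..10, as in the text.
weights = {s: Fraction(1, factorial(s)) for s in range(1, 11)}
t_avg = average_search_cycles(weights)
print(f"{float(t_avg):.3f}")   # 1.500
```

The same function accepts a measured histogram of group lengths, so it can also be applied to an actual dictionary once the distribution of compression ambiguity is known.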

Using compressed word codes as memory addresses significantly reduces the average time to search for one word in the dictionary in comparison with traditional search methods. The reason is that dictionary search reduces mainly to accessing the dictionary at unambiguously compressed words (addresses), while the search within compression-ambiguity groups (i.e., with readdressing) is, on average over the entire dictionary and for the word-to-address conversion considered here, insignificant.

Fig. 1. Distribution of ambiguity in the compression of word codes.
\(1\) — \(n=n_{\max}=30\) binary positions;
\(2\) — \(n_{\mathrm{opt}}=12\) binary positions;
\(3\) — \(n=n_{\min}=8\) binary positions

Figure 1 shows the distribution of compression ambiguity for an almost optimal variant of letter coding, with a letter-code length \(m=8\) binary positions and a dictionary size \(N=3000\) word forms, for three fixed compression points of the codes of the source-dictionary words:

\[ 1)\ 2^{n_{\max}}\gg N;\quad 2)\ 2^{n_{\mathrm{opt}}}\geq N;\quad 3)\ 2^{n_{\min}}\ll N. \tag{5} \]

As is seen from the figure, for case 2), \(80.4\%\) of all words of dictionary \(N\) are unambiguously converted into an address to the storage unit, and only \(19.6\%\) fall into compression ambiguity. For case 2), \(T_{\mathrm{avg.s}}=1.233\) cycles.

* By a dictionary-access or search cycle, in the general case, is understood the aggregate of arithmetic and logical operations necessary for one identification (one comparison of a given word code with the code of some word in the machine dictionary).

The total time required to search for one word in a machine dictionary by this method is also small in comparison with \(T_{\text{total}}\) in traditional methods of dictionary search (for example, the method of search by dividing the dictionary in half, the method using separators, etc.) and is equal to

\[ T_{\text{total}} = T_{\text{comp}} + T_{\mathrm{avg.s}}, \tag{6} \]

where \(T_{\text{comp}}\) is the time for compressing the word code (the time for converting the code of the source word into an address), which is on average equal to 8–9 shift operations and 8–9 modulo-2 addition operations, if this transformation is carried out by a program method and if the average length of a word form is taken to be \(L_{\text{av}} = 8\text{–}9\) letters.

Received
24 XII 1964

REFERENCES

  1. W. W. Peterson, IBM J. Res. and Developm., 1, No. 2, 130 (1957).
  2. L. R. Johnson, Comm. ACM, 4, No. 5, 218 (1961).
  3. G. Schay, N. Raver, IBM J. Res. and Developm., 7, No. 2, 121 (1963).
  4. R. V. Smirnov, K. I. Kurbakov, Author’s Certificate No. 149264, 1961; Bulletin of Inventions, No. 15 (1962).
  5. K. I. Kurbakov, R. V. Smirnov, Author’s Certificate No. 153800, 1962; Bulletin of Inventions, No. 7 (1963).
