WO1988002144A1 - Arrangement for data compression - Google Patents

Info

Publication number
WO1988002144A1
WO1988002144A1 PCT/SE1987/000406 SE8700406W WO8802144A1 WO 1988002144 A1 WO1988002144 A1 WO 1988002144A1 SE 8700406 W SE8700406 W SE 8700406W WO 8802144 A1 WO8802144 A1 WO 8802144A1
Authority
WO
WIPO (PCT)
Application number
PCT/SE1987/000406
Other languages
French (fr)
Inventor
Claes Håkan JEPPSSON
Tina Helen Jeppsson
Martin Vilhelm Ivan Jeppsson
Original Assignee
Inventronic Data Systems Ab
Priority claimed from US06/905,237 (external priority; see patent US4782325A)
Application filed by Inventronic Data Systems Ab
Priority to NO881232A (see patent NO881232D0)
Publication of WO1988002144A1

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/22 Arrangements for sorting or merging computer data on continuous record carriers, e.g. tape, drum, disc

Abstract

System (i) for putting in store an Entity Key (EK) set; (ii) for providing a response to a subsequent encoded input of a particular EK (INPUT SEQUENCE OF OCTET I/O CODE-WORDS REPRESENTING THE EK), such response being an encoded output of a matching/non-matching condition, a pointer value and/or an EK Sequence number (EKS); and/or (iii) for providing a response to a subsequent encoded input of a particular EKS, such response being an output sequence of I/O code-words representing the corresponding particular EK. The system comprises (i) code conversion means (46) transforming the current EK input into a number of linked digital words (L1 through LZ), which thereby attain individual current link values and jointly a current overall link value representing the EK input in a compact form, compaction being performed using various means (e.g. BIT MAP MEMORY MEANS); (ii) a mandatory first link memory unit (12) holding previously stored EK related first link values (L1); (iii) a non-mandatory intermediate link memory unit (58) holding previously stored EK related intermediate link values (L2 and L3); (iv) a mandatory main link memory unit (14) holding an ordered set of previously stored main link values (L4), each member being related to at least one member of the EK set; and (v) a non-mandatory additional link memory unit (50) holding previously stored EK related additional link values (L5 through LZ), if any. Use of various data compression techniques, particularly within the input code conversion means, facilitates compact storage of large EK sets and contributes to very short response times in search operations, as well as to fast sorting at input of additional, previously not stored, Entity Keys. Hence search and sort times are reduced, e.g. in data base applications. EKS code-words may also be applied to represent data base file segments per se, e.g. words and/or phrases of a text file, for compact file storage and fast file transfer.

Description

ARRANGEMENT FOR DATA COMPRESSION
A. DESCRIPTION
1. Definitions of Terms as used herein
(a) General. ATTRIBUTE: An inherent characteristic; ELEMENT: A distinct irreducible something; ENTITY: A distinct something possessing unique qualities because of its attributes; SET: A number of distinct 'somethings' that belong together; MEMBER: A distinct something that belongs to a set; ORDERED SET: A set having all its members arranged in a distinct sequential order; ITEM: A member of an ordered set; ORDINAL: An item sequence number; BIT OF INFORMATION: A unit of information equivalent to the result of a choice between two equally probable alternatives.

(b) User Environment. SYMBOL: A visual symbolic representation of something; CODE SYMBOL: A member of a symbol code; SYMBOL CODE: A set of graphic and/or control code symbols designed for the purpose of graphic recording of data; LETTER: Any one of several graphic code symbols that combine to form graphic words, such as textual and lexical words; NUMERAL: Any one of several graphic code symbols that combine to form numbers; BINARY NUMERAL: Either of the two numerals 0 and 1 that combine to form binary numbers; DECIMAL NUMERAL: Any one of the ten numerals 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9 that combine to form decimal numbers; HEXADECIMAL NUMERAL: Any one of the sixteen numerals 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E and F that combine to form hexadecimal numbers; for clarity a radix indicator 'x' may be appended to numerals and numbers, e.g. Fx, FFx; ALPHADECIMAL NUMERAL: Any one of the thirty-two numerals 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F, G, H, I, J, K, L, M, N, P, Q, R, S, T, U, V and W (the letter O is not used) that combine to form alphadecimal numbers; for clarity a radix indicator 'α' may be appended to numerals and numbers, e.g. Wα, WWα; SIGN: Any other graphic code symbol used with letters and/or numerals for the purpose of graphic data recording, e.g. a punctuation mark; CONTROL CODE SYMBOL: A code symbol representing a control function, e.g. an instruction to a printer to execute line feed, new page etc.; SP or SPACE: A control code symbol representing the control function of generating a separation between recorded code symbols; NUL or NULL: A control code symbol representing a nullity; RECORD: A graphic recording of data based upon a symbol code; SEGMENT: A sequence of code symbols constituting a record of a text segment, a data program segment or a data aggregate segment, e.g. a textual word or phrase of a running text; FILE: A series of segments related to each other and ordered in a distinct sequence; TEXT FILE: A file containing segments forming a running text; PROGRAM FILE: A file containing segments forming a data program; DATA FILE: A file containing segments forming a data aggregate; FREQUENT SEGMENT: A segment occurring frequently within a particular class of files; INFREQUENT SEGMENT: A segment occurring infrequently within a particular class of files; REFERENCE SEGMENT: A frequent segment selected as a reference; RSS: A reference segment sequence-number within a set of reference segments; ENTITY RECORD: A record about an entity; ENTITY RECORD SET: A set of entity records; EK (ENTITY KEY): An identifier of a concrete entity record within an entity record set, the entity key being either identical to such entity record or a part of a larger such entity record that also contains further data; EKS: An entity key sequence-number, i.e. an EK ordinal within an EK set; EK ATTRIBUTE: An attribute distinguishing one or several members of a set of entity keys; EK ATTRIBUTE SET: A selected set of EK attributes, e.g. a set of textual word suffixes; I/O: Input to and/or output from a data system; I/O CODE: A set of relations between I/O code symbols and I/O code-words (see below), facilitating a user/system interface.

(c) System Environment. SYSTEM: Interacting arrangements including one or several means described herein; I/O CODE-WORD: A distinct system representation of an I/O code symbol in accordance with a specific I/O code; I/O CODE-WORD VALUE: A distinct numerical quantity denoted by symbol and name using e.g. binary, decimal or hexadecimal notation (Ex.: Using as I/O code the Extended American Standard Code for Information Interchange, the I/O code symbol '4', representing the decimal numeral named 'Four', would correspond to an I/O code-word with a binary value of '00110100', equivalent to a decimal value of '52' named 'Fiftytwo' and to a hexadecimal value '34x' named 'Thirtyfour HEX'); STORE (the verb): To put in store; not used in the sense of 'hold in store', which implies that the term 'previously stored value' means a value that is still being held in store; the terms 'store' and 'put in store', as used herein, do not exclude storing data within read only memory devices, e.g. in a mass production process; STORING: An act of putting in store; STORAGE: A state of being held in store, also a place for holding in store; ENTITY DATA SET: A system representation of an entity record set; EK SYSTEM: An entity key system, i.e. a system for holding an EK set in store, facilitating retrieval of related entity records by a user and/or retrieval of related entity key representations for internal use within the system; COMPRESSED FILE: A compact file representation being held in store or transferred between separate locations; FILE HANDLING SYSTEM: A system for storage and transfer of file representations, e.g. storage and transfer of compressed files; DIGIT: A system representation of a numeral, the numerical value of which is held within the system, either in a permanent read only memory or in an erasable read/write memory; BIT: A binary digit, i.e. a digit from a radix two number system based on the two elements 0 and 1; hence a binary digit requires a two-state memory cell for storage; BIT OF STORAGE: A unit of storage capacity representing a capacity to hold one binary digit; BYTE (OF STORAGE): A unit of storage capacity representing a capacity to hold eight binary digits; BYTE LOCATION: A physical storage device having a storage capacity of one byte; TERNARY DIGIT: A radix three digit, requiring a three-state memory cell for storage; NIBBLE: A radix 16 digit, requiring four bits of storage; ALPHIT: An alphadecimal digit, i.e. a radix 32 digit, requiring five bits of storage (the name alphit was chosen because a set of 32 alphits is enough to represent a complete set of lower or upper case letters of most alphabets currently in use); OCTET: A radix 256 digit, requiring one byte of storage; DIGITAL WORD: A sequence of digits held within a functionally contiguous string of memory cells or being transferred from one location to another; whenever such a sequence or string is referred to or shown in symbolic form, its leftmost digit position is assumed to hold the most significant digit; LINKED DIGITAL WORD (CONCATENATED WORD): Two or more separate digital words having been functionally combined into one contiguous sequence; VALUE: A distinct numerical quantity uniquely representing a digital word, the term value being used to denote anything that is held within a string of memory cells or being transferred from one location to another, such value being denoted by symbol and name using e.g. binary, decimal, hexadecimal or alphadecimal notation; OVERALL VALUE: A value of a linked digital word; CODE-WORD: A digital word representing a member of a code; CODE-WORD SET: A set of fixed length or variable length code-words representing all members of a code; ORDINAL CODE-WORD: A sequence-number representation of a member of an ordered set; CODE-WORD VALUE: A value denoting a code-word; COMPACTION CODE: A system code facilitating conversion between one code-word set and another, more compact (i.e. less redundant) code-word set representing the same code members; COMPACT CODE: A low redundancy code created by compaction code conversion of a non-compact code; COMPACT CODE-WORD: A digital word representing a member of a compact code; LINK: A storage device in a chain of functionally linked storage devices; LINK VALUE: A value denoting the content of a link; ADDRESS WORD: A digital word representing an absolute or a relative address; ADDRESS VALUE: A value quantifying an absolute or a relative address; ADDRESS COMPONENT: A digital word used in combination with other address components for assembling an address word; ADDRESS COMPONENT VALUE: A value denoting an address component, i.e. a value used in combination with other address component values for computing an address value; TABLE: A read/write or a read only memory area functionally arranged in rows of one or more columns for storing values; BOUNDARY ADDRESS TABLE (FIRST LINK MEMORY UNIT): A table generating as an output one or several boundary addresses in response to an input of a first link value, the table also generating, if required, in response to input of a boundary address, an output of a relevant first link value as well as other boundary addresses related to such first link value, if any; REDUNDANCY: Unused capacity of a storage location or of a set of code-words, i.e. the number of possible values not being used; REDUNDANCY RATIO: The ratio of the number of values not being used to the number of values being used (Ex.: A byte of storage used to hold nothing but one nibble. If each one of the sixteen possible nibble values is used, the redundancy ratio is (256-16)/16=15).

Some terms defined above for the user environment are sometimes also used in the system environment in the sense of a corresponding system representation, e.g. RECORD, SEGMENT, FILE, ENTITY KEY, EK ATTRIBUTE, EKS and RSS.
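The alphadecimal notation and the redundancy ratio defined above lend themselves to a quick numerical check. The following Python sketch is illustrative only; the names `ALPHADECIMAL`, `to_alphadecimal` and `redundancy_ratio` are hypothetical and do not appear in the original disclosure.

```python
# Alphadecimal numerals as defined above: 0-9 followed by A-W, with the letter O omitted.
ALPHADECIMAL = "0123456789ABCDEFGHIJKLMNPQRSTUVW"


def to_alphadecimal(n):
    """Render a non-negative integer as a radix 32 (alphadecimal) number."""
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = []
    while True:
        n, rem = divmod(n, 32)
        digits.append(ALPHADECIMAL[rem])
        if n == 0:
            break
    return "".join(reversed(digits))


def redundancy_ratio(possible_values, used_values):
    """Ratio of the number of values not being used to the number of values being used."""
    return (possible_values - used_values) / used_values


print(len(ALPHADECIMAL))          # 32 numerals
print(to_alphadecimal(31))        # W
print(to_alphadecimal(1023))      # WW
print(redundancy_ratio(256, 16))  # 15.0: one nibble held in a byte
print(redundancy_ratio(256, 32))  # 7.0: one alphit held in a byte
```

The last line shows why a byte holding a single alphit is less redundant than one holding a single nibble, which is the motivation for the compact code-words used later.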
2. Introduction and background
The present invention relates to processes and apparatus for storage and transfer of data in computer systems, and more particularly to arrangements for storage and transfer of compressed data. Two related arrangements, for use in combination or separately, facilitate compact storage of entity key sets for fast retrieval of data related to individual entity key inputs. Further, an entity key system may be used as a tool for compact storage as well as fast transfer of complete entity data sets. A file handling system may use either one or a combination of the two arrangements to provide non-redundant sets of entity key code-words to represent file segments, thereby enabling compact storage and fast transfer of files of various kinds. Known methods for achieving compact data system storage of an entity key set include front-end and rear-end compression techniques, explained by C. J. Date in the third edition of 'An Introduction to Database Systems', Addison-Wesley, pages 51-52, and by James Martin in the second edition of 'Computer Data-Base Organization', Prentice-Hall, pages 517-526. Sophisticated word processing programs use similar methods to access and compress a set of textual words for use as a correct-spelling reference. Such programs typically use 150 000 bytes of storage to hold 50 000 different textual words, and the time required to scan a text file representing an ordinary page of text amounts to several seconds.
3. Disclosure of Invention
A first type of entity key system according to the present invention produces an input response in the form of an indication of whether an individual entity key being input is identical to any member of a set of entity keys held in store. An example is the use of an entity key system as a correct-spelling reference, where the set of entity keys held in store is a set of frequently occurring textual words and the entity key input is an individual textual word from a running text being tested. Storage of 50 000 different textual words would require less than 150 000 bytes of storage, and the time required to scan a text file would be substantially shorter than the time mentioned above for a typical word processing application.
A second type of entity key system according to the invention produces an input response in the form of either (i) an EKS code-word, i.e. an Entity Key Sequence-number representation that uniquely identifies an individual entity key within an entity key set, or (ii) a reject signal in case an invalid entity key has been input. This second type of entity key system can, for example, serve as a vocabulary reference, providing EKS representations of common textual words and phrases. The purpose of this use of the entity key system is to provide compact system representations of all or most segments of any file, each such compact system representation being an encoded EKS, referred to in this use as a Reference Segment Sequence-number code-word or RSS code-word. A substantial compression of text files can be achieved by substituting relevant RSS code-words for frequently occurring text segments, such as common words and phrases. The degree of compression may be enhanced by applying variable length code-words to represent more or less frequently occurring segments, as well as more or less frequently occurring individual symbols of infrequently occurring segments. When the original text is regenerated, the code-words are decoded to form the relevant text segments, i.e. textual words and phrases.
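The substitution of RSS code-words for reference segments can be sketched as follows. This is a minimal illustration under stated assumptions: segments are split on single spaces, the reference set is a sorted Python list whose indices serve as RSS values, and each code-word is modelled as a tuple rather than as the variable length bit strings the text suggests. The function names are hypothetical.

```python
from bisect import bisect_left


def build_reference_set(segments):
    """Ordered set of reference segments; a segment's RSS is its ordinal in this list."""
    return sorted(set(segments))


def lookup_rss(reference, segment):
    """Return the RSS of a segment, or None (a reject) if it is not a reference segment."""
    i = bisect_left(reference, segment)
    if i < len(reference) and reference[i] == segment:
        return i
    return None


def compress(text, reference):
    """Substitute RSS code-words for reference segments; keep other segments as literals."""
    tokens = []
    for segment in text.split(" "):
        rss = lookup_rss(reference, segment)
        tokens.append(("R", rss) if rss is not None else ("L", segment))
    return tokens


def decompress(tokens, reference):
    """Decode RSS code-words back into segments and rebuild the running text."""
    return " ".join(reference[value] if kind == "R" else value for kind, value in tokens)


reference = build_reference_set(["the", "of", "and", "data", "compression"])
tokens = compress("the compression of sparse data", reference)
print(tokens)  # [('R', 4), ('R', 1), ('R', 3), ('L', 'sparse'), ('R', 2)]
print(decompress(tokens, reference))  # the compression of sparse data
```

Here 'sparse' is not a reference segment and is kept literally, corresponding to an infrequent segment whose individual symbols would be coded separately.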
A third type of entity key system according to the invention produces an input response in form of either (i) a pointer address uniquely related to the entity key being input or (ii) a reject signal in case an invalid entity key has been input. Any data associated with the entity key may be transferred to and/or output from an external device, the location within such a device being found by use of the pointer address. An example of use of this third type of entity key system would be as a tool to provide fast access to the entity records of a complete computer based encyclopedia. Each entry is assumed to be a lexical word, hence lexical words are being used as entity keys in this case.
Each one of the three system types mentioned above may also be operating in an entity key input state, facilitating storage of entity keys being input, thereby of necessity performing a sorting operation.
The efficiency of data compression arrangements constructed and operating according to the present invention may be enhanced by employing conventional Huffman coding techniques. In 1952 David A. Huffman's paper 'A Method for the Construction of Minimum-Redundancy Codes' was published in Proc. IRE, 40(9). More recently the Huffman coding technique was explained by Gilbert Held in the Wiley Heyden publication 'Data Compression', copyright 1983 and reprinted 1984.
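For reference, conventional Huffman coding repeatedly merges the two least probable subtrees until one tree remains. The Python sketch below is a generic textbook construction, not the patent's own coding scheme; the function name and the example weights are illustrative assumptions.

```python
import heapq


def huffman_code(weights):
    """Build a binary prefix code {symbol: bit string} from {symbol: weight}."""
    if len(weights) == 1:
        (symbol,) = weights
        return {symbol: "0"}
    # Heap entries: (subtree weight, tie-breaker, {symbol: code suffix within the subtree}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(weights.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, low = heapq.heappop(heap)   # the two least probable subtrees...
        w2, _, high = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))  # ...are merged into one
        tie += 1
    return heap[0][2]


weights = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
code = huffman_code(weights)
print({s: len(c) for s, c in sorted(code.items())})
# {'a': 1, 'b': 3, 'c': 3, 'd': 3, 'e': 4, 'f': 4}
```

Frequent symbols receive short code-words, which is the property the text relies on when assigning variable length code-words to frequent segments and symbols.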
4. Brief Descriptions of the Drawings (a) FIG. 1A - 1F illustrate different basic features of entity key system embodiments using unsophisticated input means and only two linked digital words for interfacing input means with memory means.
- FIG. 1A represents a case where the EK input sequence is 'abc'.
- FIG. 1B shows how a previously not stored EK is stored.
- FIG. 1C illustrates the reverse process of outputting an EK I/O code-word representation in response to an input of an EKS, i.e. a particular EK sequence number.
- FIG. 1D - 1F relate to EK systems having means for outputting pointer addresses in response to EK I/O code-word inputs. The embodiment shown in FIG. 1D operates in an EK matching mode and the one shown in FIG. 1E & 1F in an EK non-matching mode.
(b) FIG. 2A - 2C illustrate three different coding schemes for code conversion from standard I/O codes to compact codes. FIG. 2A shows conversion to five-bit alphit code-words, FIG. 2B to four-bit nibble code-words and FIG. 2C to variable length code-words. These coding schemes are also applicable for retransforming compact code-word sequences into standard I/O code-word outputs.
(c) FIG. 3A - 3C relate to EK system embodiments with input means employing code conversion to 5-bit alphit code-words.
- FIG. 3A shows an embodiment holding 11 different EKs in store.
The input sequence is 'etch', a 32 bit sequence compacted to a 20 bit compact code representation.
- FIG. 3B illustrates that the embodiment of FIG. 3A may be used to store the compound word 'et cetera'. The figure also illustrates that an EK containing more symbols than can be represented may still produce a unique overall link value.
- FIG. 3C shows an embodiment based on bit map memory means. The input sequence is 'etcetera'. The bit map holds a set of marked bits, representing the same eleven EKs as shown in FIG. 3A.
(d) FIG. 4A & 4B illustrate means for maintaining a proper sorting order when capital letters and compound words are included. Code conversion to alphit code-words is employed.
(e) FIG. 5A through 5C introduce the concept of additional links. Code conversion to alphit code-words is employed. FIG. 5B shows how a proper sorting order is maintained. FIG. 5C illustrates the use of bit map memory means to remove redundancy within additional links.
(f) FIG. 6A - 6E introduce the concept of intermediate links. Code conversion to alphit code-words is employed.
- FIG. 6A explains the conventions applied for designating variables and values.
- FIG. 6B & 6C show embodiment variants, both using as input the sequence 'etaonrih'. The two variants use two different principles for storing address component values within the intermediate link unit.
- FIG. 6D illustrates how an EK system employing bit map memory means is used as part of the code conversion means in an EK system embodiment otherwise employing ordinary memory means.
- FIG. 6E shows an embodiment coping with input sequences comprising up to fourteen symbols. Code conversion is employed in two steps.
(g) FIG. 7A - 7B relate to EK system embodiments with input means employing code conversion to 4-bit nibble code-words and variable length code-words, respectively.
(h) FIG. 8A relates to a file handling system embodiment employing variable length code-words.
5. Detailed Description
With reference to FIG. 1A through 7B, the functioning of preferred embodiments of an EK system according to the invention will be explained by following in detail how responses to various inputs are produced. Specific EKs are assumed to have been input and stored beforehand as part of a large set of EKs. Such EKs may have been stored within read only memory devices in a mass production process, or they may have been stored by inputting them into a working system. In the latter case the embodiment uses read/write memory devices and can be set in a mode for storing previously not stored EKs. Erasing stored EKs is an obvious reciprocal process, not further discussed herein.
As in most of the figures, FIG. 1A shows several strings of boxes, each one representing a string of storage locations. Such strings are drawn either horizontally, as within input means 10, or vertically, as within memory units 12 & 14, always in order of increasing address values from left to right or from top to bottom. Vertical strings 16 & 18 are placed underneath a horizontally drawn system bus 20. Address values locating particular boxes increase with increasing lengths of vertical lines, e.g. line 22 drawn between bus 20 and the top 24 of the box being located. Such a vertical line points downwards if the box location is directly addressed, and upwards to indicate that the address of the box location is registered as the result of a search operation. A value being read from a box is indicated by a line 26 coming out of the box and pointing at the bus. A search for a particular value within a string is illustrated by a vertical line 28, designated with the search argument value L2. Line 28 is related to a series of boxes within a particular searching range, each such box being marked along the left-hand side of the string. Each test being executed is marked by an ordinal, '0' denoting the box first tested for a match between the search argument L2 and the value $L2_{10}$ held in store within that box. A star '*' marks a box that need not be examined because a match has already been found. A match at test number 2 is indicated by a short horizontal line crossing the left-hand side of the box holding the matching value $L2_{12}$. In reality the $L2_{pq}$ values are read out of the numbered boxes one by one and transferred along line 28 for comparison with value L2 within control processor block 30.
A system of indexing is used in this specification to designate, in an orderly manner, address values, data values and the tables holding such values. Mainly four index variables are used: p, q, r and s. The q, r and s indices are only used in their respective positions within the combinations pqrs, pqr and pq. Index variables may be exchanged for actual values using alphadecimal numerals. The symbol Z represents the ordinal number of the last item within a set. All four indices are ordinal indices, each one taking values within the series 0, 1, 2, ..., Z. Index p is the item sequence number within the set of $L1_p$ values related to the EK set representation held in store. No $L1_p$ values are held in store as such; instead, each $L1_p$ value is represented as a relative address of a storage location within string 16, each such location holding a related boundary address value $B2_p$. The series of values from $B2_0$ to $B2_{Z+}$ determines the size of each particular table $T2_p$. The $B2_p$ values have been assigned in ascending value order with increasing p values, in steps just large enough for each table $T2_p$ to hold all those individual main link values $L2_{pq}$ that have previously been presented for storage concurrently with a valid $L1_p$ value. Index q is an item sequence number for each particular value of p, thus designating the $L2_{pq}$ values individually within each table $T2_p$. All $L2_{pq}$ values are held in one functionally consecutive string 18, within each table $T2_p$ in ascending value order with increasing address values. The composite pq numbers thus form a number series in ascending pq order with ascending address values. Value $L2_{00}$ is held at the lowest address $B2_0$ and value $L2_{Z7}$ at the highest address $B2_{Z+} - 1$. The $B2_{Z+}$ value is used to determine the end of the last table $T2_Z$. A plus or minus sign completing an index number designates the next higher or lower number in the index number series.
For example, if $L2_{p7}$ is held in store at the highest address within table $T2_p$ and the value of p is 1, then index p7+ is equivalent to pq = 20.
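The indexing scheme amounts to a mapping between pq indices and addresses within the consecutive string 18, delimited by the boundary addresses $B2_p$. The Python sketch below illustrates it under the assumption that the boundary addresses are held in a plain list `B2` (with `B2[Z + 1]` standing for $B2_{Z+}$); the function names are hypothetical.

```python
from bisect import bisect_right


def pq_to_address(B2, p, q):
    """Absolute address of L2_pq: offset q from the start B2[p] of table T2_p."""
    address = B2[p] + q
    if not B2[p] <= address < B2[p + 1]:
        raise IndexError(f"table T2_{p} has no item {q}")
    return address


def address_to_pq(B2, address):
    """Inverse mapping: the table T2_p with B2[p] <= address < B2[p + 1], and q within it."""
    if not B2[0] <= address < B2[-1]:
        raise IndexError("address outside the consecutive string")
    p = bisect_right(B2, address) - 1  # skips empty tables, whose B2[p] == B2[p + 1]
    return p, address - B2[p]


def successor(B2, p, q):
    """The index 'pq+': the next item in ascending pq order."""
    return address_to_pq(B2, pq_to_address(B2, p, q) + 1)


# FIG. 1A layout: 64 tables of 8 values each, so B2_p = 8p and B2_{Z+} = 512.
B2 = [8 * p for p in range(65)]
print(successor(B2, 1, 7))  # (2, 0), i.e. p7+ corresponds to pq = 20
```

Because the tables are contiguous, stepping to the next address is all that "pq+" requires; the boundary list only tells which table that address belongs to.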
The storage capacity of a box is shown by the width, i.e. 1/10 of an inch for each bit of storage capacity. Byte boxes are thus 8/10 of an inch wide. Broken frames are used to indicate that the storage capacity may be higher than the width shown.
FIG. 1A - 1F illustrate different basic features of system embodiments using unsophisticated input means and only two linked digital words for interfacing input means with memory means. Except for the reverse process shown in FIG. 1C, input means are shown at the top of the figures in the form of a block 10 including two short strings of storage locations, 32 & 34: string 32 for temporarily holding a representation of a current I/O code-word input and string 34 for temporarily holding two linked digital words. String 32 provides storage for octet values $O_0$, $O_1$ & $O_2$, values directly obtained from three eight-bit I/O code-words representing a three-symbol EK input sequence. The values of the octets are directly transferred to the two linked digital words residing in the 16 plus 8 bit string 34. The 16 bit word attains the value $L1 = O_0 \cdot 256 + O_1$ and the 8 bit word the value $L2 = O_2$. Let us assume that the I/O code used for input is the Extended American Standard Code for Information Interchange (EASCII), and further that the previously stored EKs comprise all 512 possible three-letter combinations of the first eight lower case letters of the English alphabet. Link value L1 is used by the control processor 30 for producing address values to access the first link memory unit 12, from which boundary address values $B2_p$ are read out. Link value L2 is used for comparison with values held in store within the main link memory unit 14, within an address range determined by boundary address value pairs $B2_p$ & $B2_{p+}$. The control processor 30 communicates with all other parts of the system via the system bus 20. Processor 30 executes a control program for each input sequence and governs output of a response via bus 20, whereupon input means 10 are reset, making them ready to receive a new input sequence. With the assumptions made there are $8^3 = 512$ different EKs held in store, the storage of which has generated $8^2 = 64$ different boundary address values $B2_0$ - $B2_Z$, where Z = 63.
These 64 values are the addresses of the first storage location 36 of 64 distinct tables $T2_0$ - $T2_Z$, where Z = 63. Each such table holds 8 values $L2_{p0}$ - $L2_{p7}$, where p ranges from 0 to 63, a total of 512 $L2_{pq}$ values representing 512 stored EKs.
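The FIG. 1A arrangement can be modelled in a few lines of Python, as a sketch under stated assumptions: string 18 becomes a list `t2`, the full-length table T1 becomes a list `t1` of 65537 boundary addresses indexed directly by L1, and the names `links` and `build` are hypothetical. In this sketch each T1 row holds the start address of the table for that L1 value, so rows for not stored L1 values repeat the start address of the next valid table and yield an empty range; the figure may arrange these rows differently.

```python
from bisect import bisect_left
from itertools import product


def links(ek):
    """Split a three-octet EK into L1 = O0*256 + O1 (16 bits) and L2 = O2 (8 bits)."""
    o0, o1, o2 = ek
    return o0 * 256 + o1, o2


def build(eks):
    """Return (t1, t2) for a set of three-octet EKs.

    t2 is the consecutive string of L2 values, ascending within each table.
    t1 has one row per possible L1 value plus a final row for B2_{Z+}; row n holds
    the address of the first stored L2 value whose L1 is >= n, so the table for
    L1 = n occupies t2[t1[n]:t1[n + 1]] (an empty range if L1 = n is not stored).
    """
    pairs = sorted({links(ek) for ek in eks})
    l1_values = [l1 for l1, _ in pairs]
    t2 = [l2 for _, l2 in pairs]
    t1 = [bisect_left(l1_values, n) for n in range(65537)]
    return t1, t2


# All 8**3 = 512 three-letter combinations of 'a'..'h' (EASCII 97..104).
stored = [bytes(t) for t in product(range(97, 105), repeat=3)]
t1, t2 = build(stored)
tables = sum(1 for n in range(65536) if t1[n + 1] > t1[n])
print(len(t2), tables)       # 512 64
print(t1[links(b"abc")[0]])  # 8, i.e. B2_1
```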
FIG. 1A represents a case where the current EK input sequence is 'abc'. The decimal EASCII values representing the symbols 'abc' are 97, 98 and 99, respectively. The symbols 'ab' are represented as $L1 = 97 \cdot 256 + 98 = 24930$. Only one symbol pair, 'aa', has generated a lower L1 value; hence index p assumes the value 0 for the pair 'aa' and the value 1 for the pair 'ab'. There are eight $L2_{pq}$ values held in store in table $T2_1$, i.e. 97 through 104 in decimal notation, the third one, decimal 99, matching the current main link value L2. As shown in the figure, a match was obtained as the result of a search starting at the first storage location, marked 0, within table $T2_1$, a location pointed at by the boundary address value $B2_1$, which is obtained from a two-byte storage location within table T1 at an offset address governed by the current L1 value 24930. The match was obtained in the eleventh byte storage location, counted from the lowest address $B2_0$, and an EKS value of 10 is therefore output. The EKS value is an item sequence number within the series 0, 1, 2, ..., 511, numbers that uniquely represent the complement of 512 EKs previously input.
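The lookup of 'abc' can be sketched in the same hypothetical model (the helper `build` is repeated in compact form so that the block runs on its own). The search is sequential and stops as soon as a larger L2 value is read, mirroring the numbered tests of FIG. 1A; `lookup` is an illustrative name, and returning `None` stands in for the reject signal.

```python
from bisect import bisect_left
from itertools import product


def build(eks):
    """Full-length T1 (65537 boundary addresses indexed by L1) and string T2 of L2 values."""
    pairs = sorted({(o0 * 256 + o1, o2) for o0, o1, o2 in eks})
    l1_values = [l1 for l1, _ in pairs]
    return [bisect_left(l1_values, n) for n in range(65537)], [l2 for _, l2 in pairs]


def lookup(t1, t2, ek):
    """Return the EKS of a three-octet EK, or None (reject signal) if it is not stored."""
    l1, l2 = ek[0] * 256 + ek[1], ek[2]
    start, end = t1[l1], t1[l1 + 1]    # boundary addresses B2_p and B2_p+
    for address in range(start, end):  # sequential tests 0, 1, 2, ... within T2_p
        if t2[address] == l2:
            return address             # the offset from B2_0 is the EKS
        if t2[address] > l2:
            break                      # ascending order: no match further on
    return None


t1, t2 = build(bytes(t) for t in product(range(97, 105), repeat=3))
print(lookup(t1, t2, b"abc"))  # 10
print(lookup(t1, t2, b"abz"))  # None
```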
Within table T1 are also shown the $B2_p$ values related to a group of $L1_p$ values from 25185 through 25188, i.e. $L1_8$ through $L1_B$ (p = 8 through 11), values that correspond to the symbol pairs 'ba', 'bb', 'bc' and 'bd'. Also shown are the highest $L1_p$ value, 26728, corresponding to the symbol pair 'hh', and the last two locations, holding $B2_{Z+}$, related to a not stored EK that would have generated an $L1_n$ value of 65535. All T1 locations relating to not stored EKs hold the same B2 value as the one stored within a preceding T1 location. Index n is used in lieu of index p to emphasize that the value is not related to a valid EK. The designation $L1_n$ will be used later on to represent any possible L1 value, including invalid ones. Evidently table T1 contains a lot of redundant information when used as in this example. When only 512 EKs are held in store, 257 bytes of storage are required for each EK, provided a full-length T1 table with 65537 two-byte rows is used. On the other hand, such an embodiment requires very few computer instructions to obtain a response from the system.
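The figure of 257 bytes per EK follows from simple arithmetic: 65537 two-byte T1 rows are shared among 512 EKs, and each EK adds one L2 octet to string 18. A small sketch (the function name and its default parameters are hypothetical) makes the calculation explicit.

```python
def bytes_per_ek(n_eks, t1_rows=65537, t1_row_bytes=2, l2_bytes=1):
    """Average storage per EK: the full-length T1 table plus one L2 octet per stored EK."""
    return (t1_rows * t1_row_bytes + n_eks * l2_bytes) / n_eks


print(bytes_per_ek(512))  # 257.00390625
```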
The particular embodiment being described has the capacity to store all possible three-symbol combinations employing a 256-symbol code, i.e. the maximum possible in the case of an octet code-word representation. The storing capacity would be $256^3 \approx 16.8$ million different EKs, requiring a three-byte-wide T1 table in lieu of the two-byte-wide table shown in FIG. 1A. Nevertheless, the amount of storage required by table T1 would be insignificant: in total about 1.01 bytes of storage would be required for each EK. On the other hand, the response time would increase, as the search range within each table $T2_p$ would amount to 256 bytes of storage. However, the response time could still be kept extremely short by searching such a range using conventional binary search methods. It is also relevant that minimal response time requires all tables to be stored in fast-access electronic memories, a requirement more easily met if a high degree of compression has been achieved.
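Both claims in the preceding paragraph can be checked with a short sketch: the storage per EK with a three-byte-wide T1 table at full capacity, and a binary search confined to one 256-entry table. The function names are hypothetical, and Python's standard `bisect.bisect_left` (with its `lo`/`hi` bounds) stands in for the conventional binary search mentioned in the text.

```python
from bisect import bisect_left


def bytes_per_ek(n_eks, t1_rows=65537, t1_row_bytes=2, l2_bytes=1):
    """Average storage per EK: the full-length T1 table plus one L2 octet per stored EK."""
    return (t1_rows * t1_row_bytes + n_eks * l2_bytes) / n_eks


def binary_search_table(t2, l2, start, end):
    """Binary search for L2 within one table t2[start:end]; return its address or None."""
    i = bisect_left(t2, l2, start, end)
    return i if i < end and t2[i] == l2 else None


full = 256 ** 3                                       # every three-octet EK stored
print(full)                                           # 16777216
print(round(bytes_per_ek(full, t1_row_bytes=3), 2))   # 1.01

t2 = list(range(256)) * 2                             # two full tables of 256 L2 values each
print(binary_search_table(t2, 200, 256, 512))         # 456
```

Within a 256-entry table a binary search needs at most nine comparisons, against up to 256 for the sequential search used in FIG. 1A.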
FIG. 1B shows how a previously not stored EK is stored. This figure represents the same embodiment as FIG. 1A, but with 511 rather than 512 stored EKs; once the EK sequence 'abb' has been stored, tables 16 & 18 contain the same values as in FIG. 1A.
Input of the octet values representing 'abb', i.e. 97, 98 and 98, generates a current L1 value of 24930 (97*256+98). This value causes boundary addresses B21 and B22 to be read out of table T1. Processor 30 verifies that B22 is greater than B21 and initiates a search for the current L2 value 98 within the T21 table specified by these boundary addresses. Value L210=97 is read out of the first storage location 36 and found to be less than the current value L2=98. The next value, L211=99, is found to be higher than 98, and the search is halted, as all remaining L2pq values are higher still, having been stored in ascending value order. An EK reject signal 40 is output via bus 20, and the system user may command control processor 30, if this has not already been arranged, to respond to reject signal 40 by executing an insert instruction sequence. This sequence includes (i) moving to the next higher address each previously stored L2pq value having a pq index higher than 10, (ii) inserting the current L2 value in the storage location previously holding L211 and (iii) increasing all B2p values by one unit for all p values greater than 1.
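The search just described can be sketched as follows. This is a minimal sketch, assuming three-octet EKs and Python lists standing in for table 16 and string 18; it uses `bisect_left` from the standard library in place of the processor's element-by-element comparison within the T2p range. `build_tables` and `lookup` are hypothetical names.

```python
from bisect import bisect_left

def build_tables(eks):
    """Build T1 (table 16) and string 18 for a set of three-symbol EKs.

    t1[L1] is the boundary address B2p of the range holding all main link values
    whose first link value is L1; t1[L1 + 1] closes that range (65537 rows).
    """
    keys = sorted({ek.encode("latin-1") for ek in eks})
    string18 = [k[2] for k in keys]                 # main link values L2, ascending per range
    counts = [0] * 65536
    for k in keys:
        counts[k[0] * 256 + k[1]] += 1
    t1 = [0]
    for c in counts:
        t1.append(t1[-1] + c)                       # running total = next boundary address
    return t1, string18

def lookup(t1, string18, ek):
    """Return the EKS of a stored EK, or None (EK reject signal)."""
    o = ek.encode("latin-1")
    l1, l2 = o[0] * 256 + o[1], o[2]                # first link value and main link value
    lo, hi = t1[l1], t1[l1 + 1]
    if lo == hi:
        return None                                 # reject on first link criteria
    i = bisect_left(string18, l2, lo, hi)           # search only within this T2p range
    return i if i < hi and string18[i] == l2 else None

t1, s18 = build_tables(["aab", "aba", "abc", "abd", "bab"])
print(lookup(t1, s18, "abc"), lookup(t1, s18, "abb"))   # 2 None
```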
In case no EKs having 'ab' as their first two symbols had been previously stored, processor 30 would have found B22 to be equal to B21. No T21 table would exist and an EK reject signal 40 would be output based upon first link criteria. In such case the insert instruction sequence would include (i) moving to the next higher address each previously stored L2pq value having an address equal to B21 and higher, (ii) inserting the current L2 value within the storage location at address B21 and (iii) increasing all B2P values by one unit for all p values greater than 1.
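A minimal sketch of the insert instruction sequence, covering both the case of an existing T2p table and the empty-range case, under the same assumptions as before (three-octet EKs, Python lists for table 16 and string 18). `insert_ek` is a hypothetical name; Python's `list.insert` performs the upward move of step (i).

```python
from bisect import bisect_left

def insert_ek(t1, string18, ek):
    """Store a three-octet EK in place; return its EKS, or None if already stored."""
    o = ek.encode("latin-1")
    l1, l2 = o[0] * 256 + o[1], o[2]
    lo, hi = t1[l1], t1[l1 + 1]
    i = bisect_left(string18, l2, lo, hi)   # equals lo (address B2p) when the range is empty
    if i < hi and string18[i] == l2:
        return None                         # match: nothing to insert
    string18.insert(i, l2)                  # steps (i) and (ii): shift later values up, write L2
    for n in range(l1 + 1, len(t1)):        # step (iii): every later boundary address + 1
        t1[n] += 1
    return i

t1, s18 = [0] * 65537, []                  # empty system
for ek in ["abc", "aba", "aaz", "abb"]:
    insert_ek(t1, s18, ek)
print(s18, t1[24930], t1[24931])            # [122, 97, 98, 99] 1 4
```

Step (iii) touches every later T1 row, which is why the spare storage locations mentioned below can pay off for large tables.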
To avoid time-consuming moves of large numbers of values, spare storage locations may be arranged at convenient intervals.
FIG. 1C illustrates the reverse process of outputting an EK I/O code-word representation in response to an input of an EKS, i.e. a particular EK sequence number. Processor 30 adds the current input EKS value 10, received via system bus 20, to the boundary address value B2o and reads from string 18, at the resulting address B2o+10, a current main link value L2=99. To determine the first link value L1 as well, the T1 table 16 is searched for the highest B2p value that is equal to or lower than this resulting address. A binary search is employed to reduce the number of tests required to find such a B2p value. After ten binary search tests, numbered 0 to 9 in the figure, a sequential search is started at B1o+24928, because at this point the target location is close and must be found at an address lower than the one tested as number 8. In three sequential tests, A, B and C, values lower than B2o+10 are read out of table T1, and at test D the value B2o+16 is found to be higher; this value is held at address B1o+L1+1. Processor 30 can now determine the current first link value L1=24930 by subtracting B1o+1 from the address found. Finally, output means 42 transforms the current link values into a sequence of three I/O code-words. The first octet value O0 is computed as the integer part of 24930/256, i.e. 97. The second octet value O1 is computed as 24930-O0*256=98, and the third value O2 is equal to the current main link value 99. These three values represent the symbol sequence 'abc'.
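The EKS-to-EK conversion can be sketched as follows. Assumptions: three-octet EKs and Python lists for table 16 and string 18; `bisect_right` replaces the combination of binary and sequential tests described above with a plain binary search for the highest T1 row whose boundary address does not exceed the EKS. Function names are illustrative.

```python
from bisect import bisect_right

def build_tables(eks):
    """T1 boundary table (65537 rows) and string 18 for three-symbol EKs."""
    keys = sorted({ek.encode("latin-1") for ek in eks})
    counts = [0] * 65536
    for k in keys:
        counts[k[0] * 256 + k[1]] += 1
    t1 = [0]
    for c in counts:
        t1.append(t1[-1] + c)
    return t1, [k[2] for k in keys]

def eks_to_ek(t1, string18, eks):
    """Rebuild the EK whose main link value sits at relative address EKS."""
    l2 = string18[eks]                      # main link value read at B2o + EKS
    l1 = bisect_right(t1, eks) - 1          # highest T1 row whose boundary address <= EKS
    o0, o1 = divmod(l1, 256)                # O0 = int(L1 / 256), O1 = L1 - O0 * 256
    return bytes([o0, o1, l2]).decode("latin-1")

t1, s18 = build_tables(["aab", "aba", "abc", "abd", "bab"])
print(eks_to_ek(t1, s18, 2))                # abc
```

Because rows of T1 belonging to EKs not stored repeat the preceding boundary value, taking the *last* row not exceeding the EKS always lands on the valid L1 value.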
FIG. 1D - 1F relate to EK systems having means for outputting pointer addresses in response to EK I/O code-word inputs. The embodiment shown in FIG. 1D operates in an EK matching mode, and the one shown in FIG. 1E & 1F in an EK non-matching mode. The term EK matching mode emphasizes that any output from the system, such as a pointer address output or an EKS output, relates to an EK input that matches a previously stored EK. The term EK non-matching mode emphasizes that system output relates to any EK input, including inputs generating overall link values that fall in between the overall link values representing previously stored EKs. This latter mode is therefore referred to below as the gateway mode of operation.
The embodiments shown in FIG. 1D - 1F are similar to the one in FIG. 1A, except that a second column 44 holding pointer addresses P2pq has been added to each T2p table, each such P2pq address value being related to a particular first-column L2pq value. In the gateway mode, however, each P2pq value is also related to any EK that, if input, would cause L2 values to be inserted between that L2pq value and its closest neighbor L2pq+.
Any data associated with the EK may be transferred to and/or output from an external device, the location within such a device being found by use of the pointer value P2pq. In the matching mode of operation such data are strictly related to specific EKs, whereas in the gateway mode the system relates groups of EKs to common external device addresses. It is also worth mentioning that when a previously not stored EK is stored, a new row is inserted within the relevant T2p table, and the L2pq/P2pq value pairs held in rows at higher addresses have to be moved. A firm relationship between each main link value L2pq and its related pointer address value P2pq is thus maintained. However, memory space limitations may dictate that the second column 44 of table T2p be placed within a low-cost external storage device, in which case the EKS code-word may be used as a pointer into such an external table.
For convenience, a different way of presenting code-word values is used in FIG. 1D - 1F: whenever input symbols appear inside a box or within brackets, they denote the corresponding code-word value.
FIG. 1D illustrates how a pointer value P12 is found within column 44 of table T21 and read out for output via system bus 20. Using the pointer value for addressing facilitates manipulation of data associated with the EK 'abc'.
FIG. 1E illustrates an embodiment identical with the one shown in FIG. 1D, except that a pointer output response is provided also for an EK input sequence that has not been previously stored; this embodiment therefore operates in the gateway mode. The sequence 'aai' is used as input and generates a first link value that has previously been input as part of stored sequences, so in this case the gateway mode of operation does not affect the functioning of the first link memory unit. When the search of table T2o finds no match, the current L2 value [i] is associated with the particular L20q value that is the highest one not exceeding L2 itself. The figure uses the '>' sign to indicate that this condition is met at L207, and a double-headed arrow indicates that a related pointer address P07 is thereby found. Likewise, L2 values searched for in table T21 that are lower than L210 are related to P07.
FIG. 1F shows an embodiment identical with the one shown in FIG. 1E, except that the symbol sequence 'aia' is used as input. This input generates a first link value that has not previously been input as part of any stored sequence. A comparison between the boundary address values read out for L1 and for L1+1 shows no difference, as value B28 is read out of string 16 both for L1=24937 and for L1=24938. With the system operating in the gateway mode, the control processor then reads out the pointer value P77 directly, via system bus 20, from the second column 44 of the T2p tables at address B28-1. The same would happen for any L1 value from 24937 through 25184. The next value, L1=25185, is a value generated by previously stored symbol sequences, i.e. those beginning with the pair 'ba'. In the figure, a double-headed arrow indicates the location of the pointer address P77. A box with a pointer address Poo- has been added to cover L1 values below 24929.
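Both gateway cases, FIG. 1E (no match inside an existing T2p table) and FIG. 1F (empty T2p range), reduce to one search for the highest stored overall link value not exceeding the input. A minimal sketch, assuming three-octet EKs, a Python list standing in for column 44 filled with hypothetical pointer labels, and returning `None` where the figure shows the 'Poo-' box:

```python
from bisect import bisect_right

def build_tables(eks):
    """T1 (65537 boundary addresses) and string 18 for three-symbol EKs,
    plus the sorted key list so that a pointer column can be attached."""
    keys = sorted(set(eks))
    counts = [0] * 65536
    for k in keys:
        o = k.encode("latin-1")
        counts[o[0] * 256 + o[1]] += 1
    t1 = [0]
    for c in counts:
        t1.append(t1[-1] + c)
    return t1, [k.encode("latin-1")[2] for k in keys], keys

def gateway_pointer(t1, string18, pointers, ek):
    """Gateway mode: pointer of the highest stored overall link value <= the input."""
    o = ek.encode("latin-1")
    l1, l2 = o[0] * 256 + o[1], o[2]
    lo, hi = t1[l1], t1[l1 + 1]
    # Highest row in T2p holding a value not exceeding L2. If no such row exists,
    # or T2p is empty (equal boundaries, as for 'aia'), this is row B2p - 1, the
    # last row of the preceding table.
    i = bisect_right(string18, l2, lo, hi) - 1
    return pointers[i] if i >= 0 else None      # None: below every stored EK

t1, s18, keys = build_tables(["aab", "aad", "aba", "abc", "bab"])
column44 = [f"ptr-{k}" for k in keys]           # hypothetical pointer values
print(gateway_pointer(t1, s18, column44, "aac"), gateway_pointer(t1, s18, column44, "aia"))
# ptr-aab ptr-abc
```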
The embodiments described so far are capable of storing only three-symbol EKs, which is a severe restriction in most applications. Conversion to compact codes allows an increased number of code-words to be accommodated within string 32, whereby an increased number of input symbols per EK is allowed. A compact code representation has the additional advantage of reducing memory space requirements, which provides for shorter response times, as more EK representations may be held simultaneously within a given working memory area. The compact representation also leads to shorter response times because fewer bits are involved in each retrieval attempt. Furthermore, the use of compact codes may allow string 32 to be shortened, which reduces first link memory requirements: a shortening by just one bit cuts the length of boundary address table 16 in half.
FIG. 2A - 2C illustrate three different coding schemes for code conversion from standard I/O codes to compact codes, to be applied within input means 10. FIG. 2A shows conversion to five-bit alphit code-words, FIG. 2B to four-bit nibble code-words and FIG. 2C to variable length code-words. These coding schemes are also applicable for retransforming compact code-word sequences into standard I/O code-word outputs.
The alphit code of FIG. 2A is suitable for adaptation to sophisticated sorting order requirements. In FIG. 3A through 6E this code is used to illustrate various features, including sorting order features. Nibble codes as well as codes using variable length code-words are unusable in applications having specific sorting requirements. However, such codes have other qualities that make them suitable where sorting order is of no concern. In FIG. 7A and 7B such codes are used, emphasizing the importance of a high degree of compaction. Once the link values are determined, the functioning of a system embodiment including compact code conversion is basically the same as explained with reference to FIG. 1A through 1F. The particular features illustrated in those figures are therefore applicable also to embodiments using compact codes.
FIG. 3A shows an embodiment with an input sequence of four octet I/O code-words representing the EK 'etch'. Code conversion is employed, compacting the 32-bit input to a 20-bit compact code representation based on 5-bit alphit code-words. Code conversion means 46 are shown in the form of a look-up table 48, comprising a string of 256 alphit code-words with its first alphit location at address B0o. Alphit values An are read from table 48 in sequence from n=0 to n=3 at addresses B0o+On. The table below explains how the contents of the small string 32, wherein the succession of alphit values is temporarily held, are transferred to string 34, which temporarily holds the two linked digital words, here represented by their link values L1 and L2. Alphit values representing the code symbols are taken from the encoding scheme of FIG. 2A and are shown in the table in alphadecimal notation as well as in the corresponding binary notation. The bit representation is shown as one contiguous succession placed within the middle row of the table. Link values are shown in hexadecimal notation.
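[Table: alphit values of 'etch' in alphadecimal and binary notation, their contiguous 20-bit representation, and the resulting link values L1 and L2 in hexadecimal (image not reproduced)]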
The first link memory unit is block 12 in FIG. 3A. The T1 table inside comprises only 4097 rows in this case, as compared to 65537 rows in the embodiments of FIG. 1A through 1F. Control processor 30 reads B2p boundary address values from the T1 storage locations addressed by the link value L1. For L1=4E0x, B25 and B26 are read out, determining the search range within string 18, within which eleven EKs are shown as previously stored. The respective L2pq values are shown in hexadecimal notation. EKs comprising fewer than four symbols have been allocated L2pq values under the assumption that each input sequence of octet I/O code-words has been filled up by adding code-words representing a nullity, i.e. NUL, with an alphit value of 2. The main link value ECx, representing the EK 'etch', is read out as the last item at the end of the search range and found to match the current L2 value. As can be seen from the figure, those eleven previously stored EKs would be presented in proper sorting order if output sequentially in EKS number order, using the technique of FIG. 1C. The long word 'etcetera' would be output as 'etce'.
FIG. 3B shows how an input of the previously not stored EK 'et cetera' would result in insertion of the current main link value L2=67x at an improper place, considering the desired sorting order. A solution to this problem is presented later with reference to FIG. 4B. For the functioning at an insertion, reference is made to the previous comments on FIG. 1B.
FIG. 3C shows an embodiment based on bit map memory means 52. It is presumed that the same eleven EKs as shown in FIG. 3A have been previously stored, one of them, i.e. 'etcetera', being used for current input. Input means 10, including code conversion means 46, are identical to the corresponding means shown in FIG. 3A except for the small string 34, which has been lengthened. Section values, S1 through S8, are produced exactly as link values are. The new term section is introduced in lieu of the corresponding term link to facilitate the description of EK systems wherein the same small string may hold section values as well as link values, section values being related to bit map memory means and link values to ordinary memory means. An example of such combined use will be discussed with reference to FIG. 6D. The embodiment shown in FIG. 3C has one section for each alphit, which makes section values identical to alphit values and allows just one set of storage locations to hold these values. In other words, the two short strings 32 & 34 may be replaced by one common string. A maximum of eight section values are accommodated for each EK, requiring a set of eight bit maps, i.e. tables T1 through T8. The sequences representing the eleven EKs vary in length from one to eight alphit values; however, in order to facilitate output of an EKS in response to an EK I/O code-word input, the code conversion means 46 adds to each such sequence an appropriate number of control function alphits to make them all eight alphits long. To achieve a proper sorting order, such alphits have to be added after the alphits that represent the EK symbols. However, the embodiment shown in FIG. 3C illustrates a memory-saving alternative, which is preferable in applications not demanding a proper sorting order: control function alphits are added up front, whereby these alphits are represented only once within each bit map.
Hence, the five symbols a, b, c, d and e are all represented within each one of tables T1 through T7 by just one bit marked as being in the 1 state. These marked bits are located at relative bit address 02α within each table, i.e. the third bit position counted from right to left within the first row of each table. This bit address value corresponds to the alphit value 2α used to represent the system control function NUL.
Within table T8, a first group of 32 bit locations, at relative bit addresses 00α-0Wα, is used to represent previously stored S8 values related to those previously stored section values S1 through S7 that are represented by the lowest-address bit mark within the respective tables T1 through T7. In FIG. 3C, therefore, the first 32-bit group within table T8 represents single-letter symbol inputs. The five marked bits within this group represent the stored symbols 'a', 'b', 'c', 'd' and 'e'; e.g. the leftmost bit position within the first row, at relative bit address 07α, represents the symbol having the code-word value 7α, which is the symbol 'c' according to the code table in FIG. 2A. Notably, any previously not stored single-symbol EK may be added just by marking the bit position within table T8 whose relative bit address equals the symbol code-word value. The current EK input 'etcetera' generates eight alphit values that are placed within the small string 32, whereby the current section values S1 through S8 are determined. These values are used to produce current relative bit location addresses within the respective tables T1-T8. According to FIG. 3C, each bit map table comprises a string of byte storage locations, the relative bit address of the least significant bit within each byte being indicated in alphadecimal notation along the right-hand side of each table. The address of any other bit location is found by adding 0-7, counting from right to left. The bit positions used for this purpose in the embodiment being described are designated $S_1^{b}$ through $S_8^{b}$ and are obtained by stripping off the three least significant bits from each of the values S1-S8, i.e. $S_n^{b} = S_n \bmod 8$. As can be seen from FIG. 2A, the S1 value 9α equals 01001 in binary notation, so $S_1^{b}$ attains the value 001.
In table T1 the current relative byte address used for addressing is computed as S1/8=01. The figure shows the corresponding relative bit address $S_{1B} = S_1 - S_1^{b} = 08_\alpha$. The control processor reads the byte value $M1_{08}$ out of this byte location and produces a bit mark match, as $S_1^{b}$ is found to point at bit position 001, which has a bit value of 1. The match means that an S1 value equal to the current S1 value has been previously stored. For each of the succeeding tables, T2-T8, the current relative byte address is computed as S2/8+M1*4, S3/8+M2*4, ..., S8/8+M7*4, shown in FIG. 3C as the corresponding relative bit addresses $S_{2B}$ through $S_{8B}$, i.e. 1Pα, 10α, 18α, 2Pα, 38α, 4Gα and 50α respectively. The values M1 through M7 are produced as a count, within each table T1-T7, of the number of marked bits at bit location addresses lower than the current one. The corresponding M8 value equals the current EKS value used as output.
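The cascade of bit maps T1-T8 can be sketched as follows. This is a minimal sketch, assuming equal-length alphit sequences padded up front with NUL (as in FIG. 3C), bytearrays for the bit maps with bit 0 of each byte at the lowest address, and the hypothetical a=5 ... z=30 letter mapping used for illustration. It builds the maps from a complete EK set rather than by incremental insertion, and the names are illustrative.

```python
TM = bytes(bin(v).count("1") for v in range(256))   # mark-count look-up table (table 54)

def rank(table, bit):
    """Count marked bits below `bit`; within a byte, bit 0 is the lowest address."""
    byte, offset = divmod(bit, 8)
    return sum(TM[b] for b in table[:byte]) + TM[table[byte] & ((1 << offset) - 1)]

def build_maps(seqs):
    """Tables T1..Tk for equal-length alphit sequences: table n holds one 32-bit group
    (4 bytes) per distinct stored prefix of length n-1, groups in prefix order."""
    tables = []
    for n in range(len(seqs[0])):
        parents = sorted({s[:n] for s in seqs})
        index = {p: i for i, p in enumerate(parents)}
        table = bytearray(4 * len(parents))
        for s in seqs:
            bit = 32 * index[s[:n]] + s[n]
            table[bit // 8] |= 1 << (bit % 8)
        tables.append(table)
    return tables

def eks_of(tables, seq):
    """Walk T1..Tk; return the EKS (mark count below the last bit) or None."""
    m = 0                                       # T1 has a single 32-bit group
    for table, s in zip(tables, seq):
        bit = 32 * m + s                        # byte address s // 8 + m * 4, bit s % 8
        if not (table[bit // 8] >> (bit % 8)) & 1:
            return None                         # no bit mark: EK not stored
        m = rank(table, bit)                    # M value used to address the next table
    return m

def to_alphits(word, depth=8):
    """Hypothetical mapping a=5..z=30, padded up front with NUL (2)."""
    return (2,) * (depth - len(word)) + tuple(ord(c) - ord("a") + 5 for c in word)

words = ["a", "b", "c", "d", "e", "et", "eta", "etc", "etch", "etcetera"]
tables = build_maps([to_alphits(w) for w in words])
print(eks_of(tables, to_alphits("etcetera")), eks_of(tables, to_alphits("etc")))   # 9 7
```

With this ten-word example set (not the patent's eleven EKs), 'e' sets M1=1, so its T2 bit lands at 32+24=56=1Pα, the same address quoted for FIG. 3C.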
To allow a fast count of marked bits, a mark-count table 54, also named the TM table, is employed as shown in FIG. 3C. This table is a 4-bit-wide look-up table using as its relative entry address the value of a group of eight bits, i.e. a byte value. The number of marked bits within the byte is read out of table 54 in one fast operation. Entry relative address values V are shown in hexadecimal notation along the right-hand side of table 54.
In order to further speed up the counting process when many successive 8-bit groups have to be examined, an aggregate mark-count table 56 may be employed. Shown in FIG. 3C is such a table named table TM8, holding a series of aggregated mark-counts for groups of 32 bits. For example, the value M85 found at the relative byte storage address 5 represents the total marked bit count within the first five 32-bit groups of table T8. This particular value, M85=10, is read out as a first step to determine the EKS value related to the current EK 'etcetera'.
The second and last step in determining the EKS value is to produce a mark-count within the sixth 32-bit group of table T8, wherein a marked bit representing the last symbol 'a' has been found at relative bit address 55α. This marked bit is located within the first byte of the 32-bit group, so it only remains to test whether there are any marked bits within this byte at bit addresses below 55α. The byte value M850 is read out, its three leftmost bit positions are masked, and the remaining value, which happens to be 00x, is fed to the TM table 54 as a V80 entry value; M8o=0 is read out. The EKS value is determined as EKS=M85+M8o=10.
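The two-step count using the aggregate table TM8 can be sketched as below, assuming a bytearray bit map and a Python list for TM8; the function names are illustrative. The marked bit positions in the example are made up so that the first five 32-bit groups contain ten marks and bit 55α (165) is marked, mirroring the counts quoted for FIG. 3C.

```python
TM = bytes(bin(v).count("1") for v in range(256))   # mark-count table 54

def build_tm8(table):
    """Aggregate table: entry g = number of marks in the first g 32-bit groups."""
    agg = [0]
    for start in range(0, len(table), 4):          # 4 bytes = one 32-bit group
        agg.append(agg[-1] + sum(TM[b] for b in table[start:start + 4]))
    return agg

def fast_rank(table, tm8, bit):
    """Marks below `bit`: whole groups from TM8, then bytes inside the group via TM."""
    group = bit // 32
    byte, offset = divmod(bit, 8)
    count = tm8[group]                                     # step 1: e.g. M85 for group 5
    count += sum(TM[b] for b in table[4 * group:byte])     # full bytes before `byte`
    return count + TM[table[byte] & ((1 << offset) - 1)]   # step 2: mask the higher bits

t8 = bytearray(24)                                  # six 32-bit groups
for bit in (2, 5, 7, 9, 40, 70, 100, 130, 150, 159, 165):
    t8[bit // 8] |= 1 << (bit % 8)
tm8 = build_tm8(t8)
print(tm8[5], fast_rank(t8, tm8, 165))              # 10 10
```

Masking with `(1 << 5) - 1` keeps bit positions 0-4 of the byte, which is the same as masking its three leftmost bit positions.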
As the current EKS value is produced as a count of the number of marked bits within table T8 at bit position addresses lower than the current one, EKS values are generated in such an order that, when presented in EKS order, all single-symbol EKs appear first, sorted in proper alphabetic order, followed by all two-symbol EKs in sorted order, then all three-symbol EKs, and so on. This is because, within each table succeeding the first one, groups of 32 bits have been allocated to represent previously stored EKs in an order governed by the count of marked bits within the immediately preceding table at bit location addresses below the bit location related to each such 32-bit group. The current input EK 'etcetera', being the longest one within the set of eleven EKs, will appear as no. 10 within the series of EKS numbers 0-10.
FIG. 4A & 4B refer to a modified embodiment having means for achieving the following proper sorting order: E, e, ET, Et, et, ETA, Eta, eta, ETC, Etc, etc, ETC., Etc., etc., et cetera, ETCETERA, Etcetera, etcetera, ETCH, Etch, etch. The modified embodiment differs from the one referred to in FIG. 3A & 3B by having an 8-bit-wide TL0 look-up table 48 and a control program function that determines a Q value based on the values of the leftmost 3 bits shown in table 48. FIG. 4A & 4B provide the algorithm 49 used to make the overall link value such that a proper sorting order is obtained. This is achieved by including the Q value at the end of string 32 and ignoring any single space or hyphen code-word when creating the preceding alphit value sequence.
FIG. 5A through 5C refer to embodiments equivalent to the one shown in FIG. 3A, but also including an additional link memory unit 50 for the purpose of fully and uniquely representing I/O code-word inputs of any length. Links L1 and L2 are mandatory links used for any input, L2 being the main link as in all the embodiments described above. FIG. 5A shows the same set of eleven previously stored EKs as FIG. 3A, with 'etcetera' as the current EK input. In FIG. 5B the EK 'et cetera' has been added, and in FIG. 5C the previous and current input conditions are as in FIG. 5A.
An output in the form of an EK match signal and/or an EKS value is generated if an overall link value matching condition has been produced. The EKS value is derived from the relative address of the main link storage location where the current main link value is found. In analogy with the embodiments of FIG. 1E & 1F, pointer address values may be output from a second column holding such values within each main link table T2p. In FIG. 5A, code conversion means 46 append a terminator at the end of string 32, in the form of a control code-word with the alphit value 1α.
The addition of 'et cetera' in FIG. 5B serves to explain how a proper sorting order is achieved also when additional links are employed. To represent compound EKs and EKs with capitals, a two-alphit sequence starting with the system control code-word 0α is used as a terminator.
The purpose of FIG. 5C is to demonstrate the advantages of incorporating bit map memory means at the front end of the additional link memory unit 50.
The additional link memory unit 50 shown in FIG. 5A & 5B itself comprises a series of small EK systems, here employed as sub-systems. The first such sub-system in the series uses as input the EKS code-word concatenated with the third link alphit, transformed into two digital words. The second of those two digital words comprises the last three bits of the EKS code-word combined with the third link alphit. The first digital word is added to a base address B3o, and the resulting address is used to locate and read out a pair of boundary addresses from a boundary address table having its first row at address B3o. This first sub-system produces a sequence number T3S as output. The next sub-system in the series uses the T3S code-word concatenated with the next link alphit as input, which is transformed into two new digital words. The procedure is iterated until there are no more link alphits left.
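The chaining of sub-systems can be sketched as follows. The split of the sequence-number/alphit concatenation follows the text (word 2 = the last three sequence-number bits followed by the 5-bit alphit), but the rest is a simplification under stated assumptions: each sub-system is modelled as a sorted list of (word 1, word 2) pairs searched with `bisect_left` instead of a boundary-address table, the structures are built from a complete set of entries, the alphit values are hypothetical, and every tail ends with the terminator alphit 1α so that no stored tail is a prefix of another.

```python
from bisect import bisect_left

def split_words(seq_no, alphit):
    """Concatenate a sequence number with a 5-bit alphit and split the result:
    word 1 = seq_no without its last three bits, word 2 = those three bits
    followed by the alphit (8 bits)."""
    return seq_no >> 3, ((seq_no & 0b111) << 5) | alphit

def build_chain(entries):
    """entries: (eks, tail_alphits) pairs. Returns one sorted list of
    (word1, word2) pairs per sub-system; a pair's position is its sequence number."""
    levels = []
    current = [(eks, tuple(tail)) for eks, tail in entries if tail]
    n = 0
    while current:
        pairs = sorted({split_words(seq, tail[n]) for seq, tail in current})
        levels.append(pairs)
        # The sequence number output here (T3S, T4S, ...) feeds the next sub-system.
        current = [(pairs.index(split_words(seq, tail[n])), tail)
                   for seq, tail in current if len(tail) > n + 1]
        n += 1
    return levels

def lookup_chain(levels, eks, tail):
    """Return the final sequence number, or None if the tail was not stored."""
    if not tail or len(tail) > len(levels):
        return None
    seq = eks
    for pairs, a in zip(levels, tail):
        pair = split_words(seq, a)
        i = bisect_left(pairs, pair)
        if i == len(pairs) or pairs[i] != pair:
            return None
        seq = i
    return seq

TERM = 1   # terminator alphit (1α in FIG. 5A)
entries = [(10, (5, 24, TERM)), (10, (5, 25, TERM)), (3, (7, TERM)), (12, (5, 24, TERM))]
levels = build_chain(entries)
print([lookup_chain(levels, e, t) for e, t in entries])   # [0, 1, 0, 2]
```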
Unless a match has been found, an overall link value matching condition is sought by testing alternative overall additional link values, should previously stored link values have generated such alternatives. Such alternatives may or may not relate individually to duplicated main link storage locations holding identical current main link values. Such duplication is shown in FIG. 5B, where storage of the additional EK 'et cetera' has caused an L254 value to be stored within the main link memory unit, a value identical with the L255 value representing the EK 'etcetera'. Comparing the contents of string 32 in FIG. 5A and FIG. 5B shows that the open compound 'et cetera' is stored exactly as the corresponding solid compound 'etcetera', except for the terminators, which are designed to adjust the least significant part of the overall link value to achieve a proper sorting order. If one or more capital letters are part of an EK, each capital positioning alternative is represented by a specially selected L8 value. As shown in FIG. 5B, the overall link value matching condition is produced already upon examining the first of the two boxes holding the value E9x, i.e. number 4 in the sequential search of L25q values. EKS=9 is derived from the relative address of the main link storage location related to this first matching alternative, i.e. the L254 location.
In FIG. 5C the bit map memory means 52 use as input the EKS output produced by the main link memory unit 14; hence S1=EKS. The functioning of the map is equivalent to what has been explained with reference to FIG. 3C. However, FIG. 5C shows just one large map accommodating WWWx bit positions, i.e. WWWx EKS values. The map output M1 is an ordinal within a set comprising all EKs that require more than 4 alphits. Only two of the eleven code-words presumed to have been previously stored have caused a bit mark within the map, as the others require 4 or fewer alphits to be fully identified. In a search operation the bit map is always tested for a bit mark condition. If the relevant bit position is not marked, no tail to the first four alphits has been stored, and no time is wasted consulting an additional link memory unit like the one discussed with reference to FIG. 5A & 5B. As will be shown below with reference to other figures, the additional link memory unit may take effect at a later point in the I/O code-word sequence.
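The front-end test of FIG. 5C reduces to a bit test plus a mark count. A minimal sketch, assuming a bytearray map indexed directly by EKS; the function name and the marked EKS values are hypothetical:

```python
def tail_ordinal(tail_map, eks):
    """S1 = EKS. Return None if the EKS bit is unmarked (no tail beyond the first
    four alphits was stored, so unit 50 need not be consulted); otherwise return
    M1, the ordinal of this EK among all EKs that have tails."""
    byte, offset = divmod(eks, 8)
    if not (tail_map[byte] >> offset) & 1:
        return None
    below = sum(bin(b).count("1") for b in tail_map[:byte])
    return below + bin(tail_map[byte] & ((1 << offset) - 1)).count("1")

tail_map = bytearray(2)           # room for 16 EKS values
for eks in (7, 10):               # hypothetical: EKS 7 and 10 have tails
    tail_map[eks // 8] |= 1 << (eks % 8)
print(tail_ordinal(tail_map, 10), tail_ordinal(tail_map, 3))   # 1 None
```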
FIG. 6A through 6E relate to embodiments including intermediate links. Code conversion to alphit code-words is employed. FIG. 6A illustrates a basic concept applied in all the embodiments shown in FIG. 6B through 6E. The following conventions are used for designating values, addresses, tables et cetera. p is an ordinal index (0,1,2,..) designating all previously stored boundary address value pairs, B2 & B4, in ascending address value order. Each B2 and B4 value represents the address of the first row of a table, address B2p pointing at table T2p located within the intermediate link memory unit and address B4poo pointing at table T4poo within the main link memory unit. q is an ordinal index (0,1,2,..) for each particular value of p, designating previously stored L2p values in ascending value order as L2pq values. r is an ordinal index (0,1,2,..) for each particular value of pq, designating previously stored L3pq values in ascending value order as L3pqr values. s is an ordinal index (0,1,2,..) for each particular value of pqr, designating previously stored L4pqr values in ascending value order as L4pqrs values.
Previously stored L4 values are held in one functionally contiguous string within the main link memory unit 14, each such designation carrying a pqrs index when shown in a figure. In FIG. 6A values L4pqr0 through L4pqrN are stored in ascending value order with ascending address values. As this convention is followed for all pqr groups, the composite pqrs numbers form a number series in ascending pqrs value order with ascending address values. The embodiment of FIG. 6A includes input means 10, having eight current alphit values A0-A7 within string 32 and four current link values L1-L4 within string 34. A first link memory unit 12 provides current boundary address values B2p and B4poo, related to the current first link value L1. A main link memory unit 14 holds the previously stored main link values L4pqrs. The intermediate link memory unit 58 has a plurality of storage locations functionally arranged in two-column relation tables 60 & 62. These tables hold value pairs, each comprising a link value and a related address component value. The value pairs are accommodated in two sets of tables, one set of T2p tables 60 and one set of T3pq tables 62, each table holding one or several value pairs in separate rows. A current second link boundary address value B2p is used to locate a current second link table, and the link values L2pq held within such current second link table are read out for comparison with the current second link value L2. An L2 value matching condition is produced if a previously stored identical second link value L2pq is found, and an address value is produced pointing at a current third link table T3pq containing all third link values L3pqr previously put in store as related to a previously stored second link value L2.
The address value is produced using related address component values C2pq, read out from the second link table 60, to determine an off-set value relative to the location of the second link table 60. The link values L3pqr held within such current third link table are read out for comparison with the current third link value L3, and an L3 value matching condition is produced if a previously stored identical third link value L3pqr is found. A current main link table 64 can now be located, i.e. a table that holds all main link values L4pqrs previously put in store as related to the previously stored third link value L3pqr. Current boundary address values B4pqr and B4pqr+ are produced by adding off-set values to the current boundary address B4poo, each such off-set value being determined from one or more address component values C3pqr read out from the current third link table. The link values L4pqrs held within such current fourth link table are read out for comparison with the current fourth link value L4, and an L4 value matching condition is produced if a previously stored identical main link value L4pqrs is found. Table 64 is searched sequentially according to the alphadecimal numbering shown in FIG. 6A. At number Iα the match is obtained and the related pointer address PpqrI is output.
FIG. 6B & 6C illustrate two arrangements using different methods for holding in store the address component values within the intermediate link unit 58, so different procedures are followed for determining the off-set values. The I/O code-word input represents a sequence formed by the eight most frequently occurring letter symbols of the English alphabet, 'etaonrih'. It is presumed that previously stored link values and address component values represent all possible eight-symbol combinations of these eight letter symbols, corresponding to 8^8 = 16,777,216 different EKs. The main link memory unit 14 comprises a string of almost 17 million byte storage locations holding a corresponding number of L4pqrs values, along with a string holding related pointer address values. To cover the wide address range of these strings conveniently, addressing by means of an off-set from the current boundary address B4poo is employed, as described above with reference to FIG. 6A. Both arrangements produce, by different procedures, identical off-set address values relative to the B4poo address value.
FIG. 6B shows how a match between L2 and L2p9 is obtained after a binary search within table 60 comprising the four steps numbered 0, 1, 2 and 3. All address component values held within rows 0 through 9 are read out from the second column, and these C2pq values are added, as the integral sign is meant to illustrate. The sum, used as a relative address value, points at table T3p9, wherein a value L3p9Q matching the current value L3 is found after a four-step binary search. The B4p9Q+ value, equivalent to the B4p9R value, can now be produced by a summation of C3pqr values from C3poo through C3p9Q, including B4poo in the summation. B4p9Q is produced by excluding C3p9Q from the summation. Boundary address value B2p was used to determine where to start the first binary search and where to start the summation of C2pq values. The value B2p+ is used only if table T3pN is involved, determining the end of the T3pN search range. As shown in FIG. 6B as well as in FIG. 6C, C2po and C2pA values are used to determine the ends of the respective current search ranges.
FIG. 6C shows 16-bit wide second columns within tables 60 and 62, introduced in order to avoid the two summations described above with reference to FIG. 6B. The location of table T3p9 is thereby found much faster by using values C2p9 and C2pA directly as pointers relative to table T2p. Further, values C3p9P and C3p9Q are used directly as off-set values to be added to value B4p00, whereby the output pointer Pp9Qr is found with minimal delay. As can be seen from FIG. 6B as well as from FIG. 6C, the binary search technique is also used within table 64 in order to avoid unnecessary delays.
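The FIG. 6C idea can be sketched by computing running totals once, at store time. A search then reads two neighbouring stored values instead of summing. This is a minimal sketch under assumed conventions. T2 is modeled as (L2, C2) rows and the group's T3 tables are concatenated as (L3, C3) rows, with C2 and C3 as row counts. The stored values are exclusive running totals plus a final total. The figure's own convention, for example whether C3p9Q already includes its own row, is not reproduced. Python integers stand in for the 16-bit columns, and all function names are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate

def cumulative(components):
    """Exclusive running totals plus the grand total, e.g. [2, 1, 3] -> [0, 2, 3, 6]."""
    return [0] + list(accumulate(components))

def build_6c(t2, t3):
    """Precompute, at store time, the link keys and the cumulative address components."""
    return ([link for link, _ in t2], cumulative([c for _, c in t2]),
            [link for link, _ in t3], cumulative([c for _, c in t3]))

def main_link_range_6c(tables, b4_base, l2, l3):
    keys2, cum2, keys3, cum3 = tables
    q = bisect_left(keys2, l2)
    if q == len(keys2) or keys2[q] != l2:
        return None
    lo, hi = cum2[q], cum2[q + 1]       # bounds of the current T3 table, read directly
    r = bisect_left(keys3, l3, lo, hi)  # binary search confined to that table
    if r == hi or keys3[r] != l3:
        return None
    return b4_base + cum3[r], b4_base + cum3[r + 1]  # off-sets added to the boundary address

t2 = [(0, 2), (4, 1), (9, 3)]
t3 = [(1, 5), (7, 2), (3, 4), (2, 1), (5, 6), (8, 3)]
tables = build_6c(t2, t3)
print(main_link_range_6c(tables, 1000, 9, 5))  # (1012, 1018)
```

With the same sample rows as a summation-based lookup, this returns the same range. That agrees with the statement that both arrangements yield identical off-set values. The cost is that every insertion must now update the stored running totals that follow it.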
FIG. 6D shows bit map memory means used as a sub-system forming part of the code conversion means. The bit map sub-system makes it possible to eliminate all redundant boundary address storage locations within the first link memory unit 12. The set of L1p values that are represented within memory unit 12 is compressed into a small set of ordinal code-words. The highest possible L1p value for the embodiments shown in FIG. 6A through 6D is thereby reduced from 65535 to a substantially lower value, depending on the characteristics of the previously stored EK set. With the assumptions made above with reference to FIG. 6B and 6C, the highest L1p value is reduced to 1023, whereby first link storage requirements are reduced by a factor of 64 (65536/1024). The bit map memory would require a 256 byte T1 table, a 64 byte TM1 table, a 512 byte T2 table, a 128*2 byte TM2 table and the small ordinary TM table 54. In FIG. 6D it is presumed that the same eleven EKs as shown in FIG. 3A have been previously stored, one of them, i.e. 'etcetera', being used as the current input. In this case the bit map memory requirements are the same, except for tables T2 and TM2, which are now reduced to 24 and 6 bytes respectively. The detailed functioning of the bit map memory system has been illustrated and described in relation to FIG. 3C. The value of the ordinal code-word referred to above, assigned as the current link value L1, is equivalent to the EKS output of the bit map sub-system. Hence L1 is produced as a count of the number of marked bits within table T2 at bit addresses below the relative bit address 05Eα, shown in FIG. 6D to represent the current input. As can be concluded from FIG. 6D, L1 = M25 + M20 + M21 = 7. Without the addition of a bit map sub-system, L1 would equal (9*32*32 + 24*32 + 7)*2 = 19982.
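The counting step described for FIG. 6D counts the marked bits below the current bit address. The sketch below illustrates it under several assumptions, all hypothetical:

- keys are split into fixed-width sections;
- each section has its own bit map, addressed by the ordinal produced in the previous step concatenated with the current section value (the iterative procedure of claim 16);
- one running count is kept per 32 bit locations.

The 32-bit interval appears consistent with the quoted T1/TM1 and T2/TM2 table sizes but is not stated in the text. A 256-entry byte count table plays the role described in claim 17.

```python
BLOCK_BITS = 32  # assumed sub-area size: one stored count per 32 bit locations

# Marked-bit count for every possible byte value (a look-up table in the spirit of claim 17).
BYTE_COUNTS = [bin(v).count("1") for v in range(256)]

class RankBitMap:
    """Bit map plus a running count per block (a role similar to the TM tables)."""

    def __init__(self, size_bits, marked):
        self.bits = bytearray((size_bits + 7) // 8)
        for pos in marked:
            self.bits[pos >> 3] |= 1 << (pos & 7)
        self.block_counts, total = [], 0
        for start in range(0, size_bits, BLOCK_BITS):
            self.block_counts.append(total)  # marked bits below this block
            for byte in self.bits[start // 8:(start + BLOCK_BITS) // 8]:
                total += BYTE_COUNTS[byte]

    def is_marked(self, pos):
        return (self.bits[pos >> 3] >> (pos & 7)) & 1 == 1

    def rank(self, pos):
        """Number of marked bits at bit addresses below pos."""
        block = pos // BLOCK_BITS
        count = self.block_counts[block]
        for byte in self.bits[block * BLOCK_BITS // 8:pos // 8]:  # whole bytes in the block
            count += BYTE_COUNTS[byte]
        return count + BYTE_COUNTS[self.bits[pos >> 3] & ((1 << (pos & 7)) - 1)]

def build_levels(keys, section_bits):
    """One bit map per section; each level is addressed by (previous ordinal, section value)."""
    levels, ordinals = [], {(): 0}
    for k, width in enumerate(section_bits):
        addresses = {key[:k + 1]: (ordinals[key[:k]] << width) | key[k] for key in keys}
        bitmap = RankBitMap(len(ordinals) << width, addresses.values())
        levels.append(bitmap)
        ordinals = {prefix: bitmap.rank(a) for prefix, a in addresses.items()}
    return levels

def lookup(levels, section_bits, sections):
    """Ordinal code-word of a stored key, or None if the key was never stored."""
    ordinal = 0
    for bitmap, width, value in zip(levels, section_bits, sections):
        address = (ordinal << width) | value
        if not bitmap.is_marked(address):
            return None
        ordinal = bitmap.rank(address)
    return ordinal

keys = [(9, 24), (9, 7), (3, 1), (20, 5)]  # hypothetical keys of two 5-bit sections
levels = build_levels(keys, (5, 5))
print(lookup(levels, (5, 5), (9, 24)))  # 2
print(lookup(levels, (5, 5), (9, 8)))   # None
```

Counting marked bits preserves order. The ordinal returned for a stored key therefore equals the number of stored keys that sort below it, and each level's map only needs as many rows as there are distinct shorter prefixes.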
FIG. 6E shows an embodiment handling input sequences that generate fourteen alphit values. A possible additional link unit is not shown in FIG. 6E because of space limitations. As in FIG. 6D, code conversion is performed in two steps. In FIG. 6E these steps are illustrated as being performed within first input means 66 and second input means 68, respectively. Within second input means 68, two EK systems in parallel are employed as sub-systems for producing ordinal code-word values, i.e. EKS values. When linked together within string 32, these values uniquely represent the current EK input.
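The sketch below roughly illustrates how the outputs of two parallel sub-systems can be linked. A sorted Python list with binary search serves as a stand-in for each EK sub-system. It also assumes, hypothetically, that the alphit sequence is simply split at a fixed position. The passage does not describe how the work is actually divided between the two sub-systems in FIG. 6E.

```python
from bisect import bisect_left

def ordinal(sorted_keys, key):
    """Rank of key within a sorted list of stored keys, or None if it is not stored."""
    i = bisect_left(sorted_keys, key)
    return i if i < len(sorted_keys) and sorted_keys[i] == key else None

def linked_value(first_keys, second_keys, alphits, split):
    """Link the ordinal code-words of the two parts of an input into one word."""
    a = ordinal(first_keys, tuple(alphits[:split]))
    b = ordinal(second_keys, tuple(alphits[split:]))
    if a is None or b is None:
        return None
    width = max(1, (len(second_keys) - 1).bit_length())  # bits needed for any second ordinal
    return (a << width) | b

first_keys = sorted([(1, 2), (3, 4)])
second_keys = sorted([(0, 0), (5, 5), (7, 1)])
print(linked_value(first_keys, second_keys, [3, 4, 5, 5], split=2))  # 5
```

Because the second ordinal always fits in its reserved bits, the linked word orders inputs first by their first part and then by their second part.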
FIG. 7A and 7B relate to EK system embodiments with input means employing code conversion to 4-bit nibble code-words and to variable length code-words, respectively. Both embodiments have code conversion means that allow compact representation of EK prefix attributes as well as EK suffix attributes.
FIG. 7A illustrates how an input code-word sequence representing the EK 'etching' is converted to the nibble sequence 3D1569F1x, in accordance with the code schema shown in FIG. 2B. The code conversion control program also searches for a compact code-word representation of EK prefix attributes within look-up table 70 and of EK suffix attributes within look-up table 72. The algorithm is shown in the lower left corner of the figure. The first four nibble values 3D15x are fed as a search argument to the prefix table. A value E is returned, indicating the number of nibbles that have been allocated a compact nibble value N8. The response E=0 means that no prefix was found, and N8 is returned with the value 0. A value U=4 is returned from the suffix table along with N9=1. This indicates that the value N9=1 shall substitute the last four nibbles of the search argument, i.e. the whole of 69F1x, which represents the suffix 'ing'. The four nibble values 3D15x are allocated to N4 through N7. Any remaining nibble pairs not required to represent the EK input sequence are allocated the system control code-word value FDx, representing a nullity. Any single unused nibble would be located in position 7 and given the value Fx, which also represents a nullity when placed in the last position prior to the nibbles used to represent prefixes and suffixes. The FDx value is also used as an additional link terminator.
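The affix substitution described for FIG. 7A can be sketched as follows. The nibble values for 'etching' and the 'ing' suffix entry are taken from the text. Everything else is an assumption:

- the longest-match policy;
- the use of 0 as the "no suffix" code (the text states this only for the prefix);
- a left-aligned 8-nibble body field, which does not reproduce the N4 through N7 placement shown in the figure.

The function names are hypothetical.

```python
NULL_PAIR = [0xF, 0xD]  # FDx: nullity for an unused nibble pair
NULL_SINGLE = 0xF       # Fx: nullity for a single unused nibble

def longest_affix(table, nibbles, at_end, max_len=4):
    """Return (nibbles consumed, compact code) for the longest affix in table, else (0, 0)."""
    for length in range(min(max_len, len(nibbles)), 0, -1):
        part = tuple(nibbles[len(nibbles) - length:] if at_end else nibbles[:length])
        if part in table:
            return length, table[part]
    return 0, 0

def substitute_affixes(nibbles, prefix_table, suffix_table, body_width=8):
    """Body nibbles padded with nullities, followed by the prefix code N8 and suffix code N9."""
    e, n8 = longest_affix(prefix_table, nibbles, at_end=False)
    rest = nibbles[e:]
    u, n9 = longest_affix(suffix_table, rest, at_end=True)
    body = list(rest[:len(rest) - u])
    unused = body_width - len(body)
    if unused < 0:
        raise ValueError("body needs an additional link; not covered by this sketch")
    body += NULL_PAIR * (unused // 2)
    if unused % 2:
        body.append(NULL_SINGLE)  # a lone unused nibble sits just before N8
    return body + [n8, n9]

etching = [0x3, 0xD, 0x1, 0x5, 0x6, 0x9, 0xF, 0x1]  # 3D1569F1x, from the text
suffixes = {(0x6, 0x9, 0xF, 0x1): 1}                 # 'ing' -> N9 = 1
print([format(n, "X") for n in substitute_affixes(etching, {}, suffixes)])
# ['3', 'D', '1', '5', 'F', 'D', 'F', 'D', '0', '1']
```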
FIG. 7B illustrates how an input code-word sequence representing the EK 'etcetera' is converted to a variable length code-word representation in accordance with the code schema shown in FIG. 2C. No prefix or suffix is found. Nevertheless, the eight-symbol EK can easily be represented by the first three link values. The last two bit positions prior to the nibble positions used to represent prefixes and suffixes are filled with 11B, since any sequence of binary ones represents a nullity in this position.
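The nullity fill described for FIG. 7B can be illustrated with a small variable length code. The code table below is hypothetical and is not the FIG. 2C schema. It only shows prefix-free code-words being concatenated into a fixed-width field whose unused low-order bits are set to binary ones.

```python
# Hypothetical prefix-free code for a few letters; NOT the FIG. 2C schema.
CODE = {"e": "00", "t": "010", "c": "0110", "r": "0111", "a": "100"}

def pack(word, field_bits, code=CODE):
    """Concatenate code-words most significant bit first and fill unused bits with ones."""
    bits = "".join(code[ch] for ch in word)
    if len(bits) > field_bits:
        raise ValueError("word does not fit in the field")
    return int(bits + "1" * (field_bits - len(bits)), 2)

print(hex(pack("etcetera", 24)))  # 0x130879
```

With this table, 'etcetera' needs 23 bits, so a 24-bit field receives a single fill bit.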
With reference to FIG. 8A, FIG. 7A, FIG. 2C and FIG. 1C, the functioning of a preferred embodiment of a file handling system for storing and transferring compressed text files will now be explained. FIG. 8A relates to a text file handling system embodiment employing variable length code-words. In the following, such a file handling system is referred to as an FH system for short. FIG. 8A shows the schema used for encoding and decoding textual words, i.e. ordinary graphic words as they appear within running text. The text is input to the FH system in the form of long sequences of standard I/O code-words, e.g. octet I/O code-words according to the EASCII standard.
Each sub-sequence between space separators, i.e. a textual word, is identified by the FH system as an entity, in the following referred to as a file segment. Capitalization of letters, as well as special signs at the beginning and end of a textual word, e.g. quotation marks, punctuation marks and parentheses, are handled separately using techniques falling outside the scope of the embodiment being described.
For the particular type of text files to be handled by the FH system, e.g. ordinary English texts for general use, the expected frequencies of occurrence have been determined for all frequent segments. The 80 most common segments have been allocated the 8-bit RSS code-words 10x-5Fx, as shown in FIG. 8A. Also shown are 1280 12-bit RSS code-words for allocation to less frequently occurring frequent segments and 20480 16-bit RSS code-words for allocation to the least frequently occurring frequent segments. Each of the three RSS code-word groups is represented within the embodiment by an EK system embodiment according to FIG. 7A, in the following referred to as segment reference means. Each such segment reference means provides an EKS code-word, in the following named an RSS code-word, in response to any valid segment input, i.e. an input equivalent to a previously stored EK. The FH system inputs each file segment, in a prescribed order, to the three segment reference means until a valid RSS code-word has been produced. If none of the three segment reference means provides such a response, the FH system selects the 4-bit RSS code-word 0x. This code-word means that a particular number of succeeding code-words shall be encoded and decoded according to the code schema shown in FIG. 2C, i.e. a variable length code capable of encoding and decoding any unfrequent file segment on a symbol-by-symbol basis. As indicated in FIG. 8A, the RSS code-word 0x is immediately followed by one more 4-bit code-word. This code-word determines the number of code-words to follow, which represent the individual symbols of the unfrequent segment.
All FH code-words have the prefix-free property and may be concatenated in strings of any length when held in store or when being transmitted. Single spaces between segments do not require any special representation.
Decoding is performed using the embodiment according to FIG. 7A, but with the control program set to operate in a reversed mode, basically as described with reference to FIG. 1C.
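The FH coding and decoding steps can be sketched as below. The 8-bit range 10x-5Fx and the 0x escape followed by a 4-bit count are taken from the text. The placement of the 12-bit and 16-bit groups, with first nibbles 6-A and B-F, is only an inference from the code counts (1280 = 5*256, 20480 = 5*4096). The symbol code table is hypothetical rather than FIG. 2C, and all function and parameter names are illustrative.

```python
# Assumed RSS layout (inferred from the code counts, not copied from FIG. 8A):
# first nibble 0 = escape, 1-5 = 8-bit codes 10x-5Fx, 6-A = 12-bit codes, B-F = 16-bit codes.
GROUP_LAYOUT = [(8, 0x10), (12, 0x600), (16, 0xB000)]  # (code width in bits, first code)

def encode_text(text, groups, symbol_code):
    """groups: three dicts mapping a frequent segment to its index within that RSS group."""
    out = []
    for segment in text.split(" "):
        for (width, base), table in zip(GROUP_LAYOUT, groups):
            if segment in table:
                out.append(format(base + table[segment], f"0{width}b"))
                break
        else:  # unfrequent segment: escape 0x, a 4-bit count, then symbol code-words
            if not 1 <= len(segment) <= 15:
                raise ValueError("this sketch handles 1-15 symbols per unfrequent segment")
            out.append("0000" + format(len(segment), "04b"))
            out.extend(symbol_code[ch] for ch in segment)
    return "".join(out)

def decode_text(bits, groups, symbol_code):
    reverse = [{index: seg for seg, index in table.items()} for table in groups]
    symbols = {code: ch for ch, code in symbol_code.items()}
    words, pos = [], 0
    while pos < len(bits):
        nibble = int(bits[pos:pos + 4], 2)
        if nibble == 0:
            count, pos = int(bits[pos + 4:pos + 8], 2), pos + 8
            chars = []
            for _ in range(count):
                code = ""
                while code not in symbols:  # prefix-free: grow until a code-word matches
                    code += bits[pos]
                    pos += 1
                chars.append(symbols[code])
            words.append("".join(chars))
        else:  # the first nibble alone selects the RSS group and hence the code width
            group = 0 if nibble <= 0x5 else 1 if nibble <= 0xA else 2
            width, base = GROUP_LAYOUT[group]
            words.append(reverse[group][int(bits[pos:pos + width], 2) - base])
            pos += width
    return " ".join(words)  # single spaces between segments are implied

groups = [{"the": 0, "of": 1}, {"data": 5}, {"compression": 20479}]
symbol_code = {"e": "00", "t": "010", "c": "0110", "r": "0111", "a": "100"}
text = "the data etcetera of compression"
bits = encode_text(text, groups, symbol_code)
print(len(bits), decode_text(bits, groups, symbol_code) == text)  # 75 True
```

No separators are stored. The decoder always knows from the leading nibble, or from the symbol code, where the next code-word ends.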

Claims

B. CLAIMS
1. A system for storing a set of entity keys and for providing a response to a subsequent input of an individual entity key code representation, comprising input means for transforming part of a current entity key I/O code-word input into two or more linked digital words, which thereby each attains a current link value and jointly a current overall link value uniquely representing such part of such I/O code-word input, memory means for holding in store link values and address component values related to previously stored entity keys, and control means connected to said input means and to said memory means for determining a current address range for each link succeeding the first one, using at least one current link value to find previously stored address component values to determine the boundaries of such current range, for comparing the current value of each link succeeding the first one with at least one link value previously stored within such current address range, and for producing (i) an overall link value matching condition if, for each link succeeding the first one, the current link value matches a previously stored link value or (ii) an overall link value non-matching condition if the current overall link value is found to fall in between overall link values related to previously stored entity keys.
2. A system as defined in claim 1, having storage locations within said memory means provided by read/write type memory devices, thereby facilitating storage of previously not stored entity key I/O code-word inputs, wherein said control means include means for presenting a first link reject signal, should the current first link value not generate a succeeding link address range, means for presenting a succeeding link reject signal, should a current succeeding link value not match any previously stored link value within the current address range, means for executing write instructions in response to a first link reject signal, storing within said memory means, at locations related to the current first link value, address component values essential for determining new address ranges, at least for the next link, inserting within such new address range, for each one of all succeeding links, the current succeeding link value and, except for the last link, storing address component values related to each inserted link value, essential for determining new address ranges, at least for the next link, means for executing write instructions in response to a succeeding link reject signal, inserting within said memory means, at a location within such current address range where a current link value turned out not to match any previously stored link value, such non-matching current link value, also inserting within a new address range, for each one of any remaining succeeding links, the current succeeding link value and, except for the last link, storing address component values related to each inserted link value, essential for determining new address ranges, at least for the next one of any remaining succeeding links, and means for executing move instructions prior to the insertion of current link values, thereby providing accommodation for such current link values as well as for any related address component values and for adjusting previously stored address component values to conform to any address changes resulting from the execution of such move instructions.
3. A system as defined in claim 1, wherein said control means include means for outputting, if the overall link value matching condition has been produced, an address component value previously put in store within said memory means as uniquely related to such overall link value.
4. A system as defined in claim 1, wherein said control means include means for outputting, if an overall link value non-matching condition has been produced, an address component value previously put in store as related to an interval including the current overall link value, in between adjacent overall link values representing previously stored entity keys.
5. A system as defined in claim 1, wherein said input means include code conversion means for substituting one or more compact code-words for the entity key I/O code-word input, each compact code-word being drawn from a compact code, such compact code not necessarily being the same for each individual position within a succession of compact code-words, for temporarily allocating the overall link value of the compact code-word representation to functionally successive memory cells, and for assigning the contents of such successive memory cells to the linked digital words, not excluding, prior to such assigning, a substitution of one or more ordinal code-words for the contents of one or more sections of such functionally successive memory cells.
6. A system as defined in claim 1, wherein said input means include code conversion means for producing one ordinal code-word to replace and represent at least one section of a string representation of part of the entity key I/O code-word input, for iteratively producing at least one further ordinal code-word to replace and represent a concatenated digital word formed by linking, in any prescribed order, the ordinal code-word just having been produced with at least one more section of such string, for temporarily allocating the overall value of a last concatenated digital word to functionally successive memory cells, such last digital word being formed by linking, in any prescribed order, the ordinal code-word just having been produced with any remaining sections of the string, and for assigning the contents of such successive memory cells to at least one of the linked digital words.
7. A system as defined in claim 1, wherein mandatory links, i.e. a fixed minimum number of links, are used for any input, the last one of which is a main link, wherein at least one additional link is required to fully and uniquely represent such I/O code-word inputs that generate more digits than can be accommodated within the mandatory links, and wherein said input means include means for inserting, within at least one of the mandatory links, specific digits representing a nullity as far as the entity key I/O coding is concerned, should the entity key I/O code-word input not generate sufficient number of digits for the mandatory links to attain defined link values, and wherein said control means include means for outputting, if the overall link value matching condition has been produced, an entity key reference number derived from the relative address of the main link storage location where the current main link value was found.
8. A system as defined in claim 1, wherein mandatory links, i.e. a fixed minimum number of links, are used for any input, the last one of which is a main link, wherein at least one additional link is required to fully and uniquely represent such I/O code-word inputs that generate more digits than can be accommodated within the mandatory links, and wherein said memory means include bit map memory means for holding in store a bit mark representation of the set of main link reference numbers that relate to previously stored I/O code-word inputs requiring such additional links.
9. A system as defined in claim 7, wherein said memory means include means for producing the overall link value matching condition by examining alternative overall additional link values, should previously stored link values have generated such alternatives in order to uniquely represent different inputs generating identical overall mandatory link values, and wherein said control means include means for outputting, if the overall link value matching condition has been produced upon such examining, a two part entity key reference number, a main part derived from the relative address of the storage location where the current main link value was found and an additional part uniquely identifying the matching overall additional link value.
10. A system as defined in claim 7, wherein said memory means include means for producing the overall link value matching condition by examining alternative overall additional link values, should previously stored link values have generated such alternatives individually related to duplicated main link storage locations holding identical current main link values, and wherein said control means include means for outputting, if the overall link value matching condition has been produced upon such examining, an entity key sequence number derived from the relative address of the particular main link storage location being related to the matching alternative.
11. A system as defined in claim 10, having means for generating an entity key I/O code-word output in response to an input of a code representation of a current entity key sequence number, wherein said control means include means for determining the current absolute address of a particular main link storage location, the current relative address of which equals the current entity key sequence number, and for reading from such particular storage location a current main link value, means for searching said memory means for a current main link address range that is encompassing the current absolute address, and for producing current preceding link values as related to the current main link address range, as well as any current additional link values as related to the current main link absolute address, and output means for retransforming the overall link value of the complete series of current link values to a corresponding entity key I/O code-word output.
12. A system as defined in claim 1, transforming an entity key I/O code-word input into at least three digital words, wherein said memory means include a first link memory unit for providing current boundary addresses related to current first link values, a main link memory unit for holding in store previously stored main link values, the intermediate link memory unit, having a plurality of storage locations functionally arranged in two column relation tables for holding value pairs, each such pair comprising a link value and a related address component value, for accommodating the value pairs in one or several sets of tables, one set for each intermediate link, each table holding one or several value pairs in separate rows thereof, and for using at least one current second link boundary address to locate a current second link table, and wherein said control means include means for comparing the link values held within such current second link table with the current second link value, for producing a second link value matching condition if a previously stored identical second link value is found, and for computing an address value pointing at a current third link table containing all third link values having previously been put in store as related to a previously stored second link value, the address value being produced using related address component values extracted from the current second link table for computing an off-set value relative the location of the current second link table or relative a current third link boundary address being obtained from said first link memory unit, for comparing the link values held within such current third link table with the current third link value, for producing a third link value matching condition if a previously stored identical third link value is found, and, provided the link in question is not the main link, an address value pointing at a current fourth link table containing all fourth link values having previously been put in store as related to a previously stored third link value, the address value being produced using related address component values extracted from the current third link table, not excluding the alternative of extracting address component values from preceding third link tables within a sub-set of third link tables related to the current second link table, for computing an off-set value relative the location of the current third link table, relative the location of the current second link table or relative a current fourth link boundary address being obtained from said first link memory unit, and for repeating such functions in an analogous manner link by link until the main link is supposedly being reached.
13. A system as defined in claim 5, using compact code-words to represent lowercase letters, preferably having been allocated ascending values in progressing alphabetic order, wherein said code conversion means include means for including at the end of the succession of compact code-words a control function code-word to make such succession represent also capital letters, such control function code-word comprising at least one bit and having one value, preferably the highest, when governing an all-lowercase letter sequence and various other values, preferably lower, for alternative capital letter constellations, in particular an extreme value to make the sequence represent an all-capital entity key, whereby such entity key will be positioned correctly within a sequence of entity keys being sorted in overall link value order, and/or means for including at the end of the succession of compact code-words a control function code-word to make such succession represent also an open or hyphened compound, such control function code-word having one value, preferably the highest, when governing a solid compound and other values when governing an exact space or hyphen position, in particular an extreme value to make the sequence represent an open compound with a single letter as the first element, such as 'a priori', and a value next to such an extreme value to make the same succession of compact code-words represent the corresponding hyphened compound, i.e. 'a-priori', or means for including at the end of the succession of compact code-words a control function code-word to make such succession represent also capitalized compounds, still preserving a desirable sorting order by utilizing a wide spectrum of control function code-word values, using an extreme value, to govern an all-capital compound open after the first letter, such as 'A PRIORI'.
14. A system as defined in claim 5, wherein said code conversion means include means for generating a compact code representation, whereby compact code-words representing frequently occurring entity key I/O code-words, pairs of entity key I/O code-words or larger groups of entity key I/O code-words are made up of fewer bits and, in case of less frequent occurrences, of a larger number of bits.
15. A system as defined in claim 5, wherein said code conversion means include means for holding in store a compact code-word representation of at least one entity key attribute set, each such set representing a unique selection of entity key attributes, and means for comparing a succession of compact code-words with the compact code-word representation of at least one entity key attribute set and for substituting one specific compact code-word for the specific part of such succession of compact code-words that is equalling the compact code-word representation of a specific entity key attribute, such specific compact code-word being drawn from a compact code being unique for the individual code-word position within the refurbished succession of compact code-words or being unique as governed by a code shift control code-word.
16. A system for storing a set of entity keys and for providing a response to a subsequent input of an individual entity key code representation, comprising input means for transforming part of a current entity key I/O code-word input into one digital word comprising at least one section, whereby each such section attains a current section value and when linked together a current overall section value uniquely representing such part of such I/O code-word input, bit map memory means for holding in store bit representations of section values related to previously stored entity keys, and control means connected to said input means and to said memory means for finding a current bit location using a current first section value as a relative address value, for reading out the bit value of such location to test if a previously stored entity key has produced a bit mark or not, for producing, upon a bit mark match, a section ordinal code-word from a count of the number of marked bit locations within a bit location address range below or above the address of the current bit location, for outputting an input response if no current section value remains unused, else producing a further relative address value by concatenating, in any prescribed order, the current contents of at least one remaining section with the preceding section ordinal code-word just having been produced, for finding, within an individually section related part of the bit map, a further current bit location using such further relative address value and for repeating iteratively the functioning, section by section, as long as any current section value remains unused and the bit mark match is produced.
17. An arrangement as defined in claim 16, wherein said control means include means for producing marked bit counts by reading out overall binary word values from consecutive bit map sub-areas, each comprising a succession of bit locations, and for using such overall values as entries into a look-up table providing sub-area counts as outputs.
18. A system as defined in claim 16, wherein said memory means include means for holding in store, at least within one section related part of the bit map, a set of marked bit counts, each member of such set representing a count taken within a limited address range, for providing, related to a bit location close to the current bit location, an aggregate count value produced from at least one such limited address range count, and for producing the current section ordinal code-word from such aggregate count value, adding or subtracting a marked bit count produced for the interval between the close and the current bit location.
19. A file handling system for storing and transferring compressed files, each file containing at least one segment, any such segment being either a frequent segment or an unfrequent segment, each segment being input to and output from the system encoded into I/O code-words drawn from a distinct I/O code, comprising segment reference means for holding at least one reference segment data set, each such reference segment data set comprising a system representation of a special selection of frequent segments, and for allocating a distinct RSS code-word to each reference segment, such distinct RSS code-word being retrievable for any frequent segment, using the frequent segment as a key, and conversely, any frequent segment being retrievable using a distinct RSS code-word as a key, coding means for substituting one distinct RSS code-word for each distinct sequence of I/O code-words that is representing a distinct frequent segment, each such distinct RSS code-word being retrieved from a specific reference segment data set and applied inside the file handling system as a file code-word, and for substituting a distinct sequence of file code-words for each sequence of I/O code-words that is representing a distinct unfrequent segment, such file code-words being drawn from a distinct variable length type of code using fewer digits to represent frequently occurring I/O code-words, pairs of I/O code-words or larger groups of I/O code-words and a larger number of digits in cases of less frequent occurrences, and decoding means for outputting an I/O code-word representation of a file being retrieved from the system storage or being transferred from another system location, thereby substituting I/O code-words for the file code-words.
20. A file handling system as defined in claim 19, wherein said coding means include means for employing variable length RSS code-words to represent frequent segments, using fewer digits to represent more frequently occurring frequent segments and a larger number of digits in case of less frequently occurring frequent segments.
21. A method for storing a set of entity keys and for providing a response to a subsequent input of an individual entity key code representation, including the steps of transforming part of a current entity key I/O code-word input into at least two linked digital words, which thereby each attains a current link value and jointly a current overall link value uniquely representing such part of such I/O code-word input, determining a current address range for each link succeeding the first one, using at least one current link value to find previously stored address component values to determine the boundaries of such current range, comparing the current value of each link succeeding the first one with at least one link value previously stored within such current address range, and producing (i) an overall link value matching condition, if for each link succeeding the first one, the current link value matches a previously stored link value or (ii) an overall link value non-matching condition if the current overall link value is found to fall in between overall link values related to previously stored entity keys.
PCT/SE1987/000406 1986-09-09 1987-09-09 Arrangement for data compression WO1988002144A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
NO881232A NO881232D0 (en) 1986-09-09 1988-03-21 DATA COMPRESSION ARRANGEMENT.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US06/905,237 US4782325A (en) 1983-12-30 1986-09-09 Arrangement for data compression
US905,237 1986-09-09

Publications (1)

Publication Number Publication Date
WO1988002144A1 true WO1988002144A1 (en) 1988-03-24

Family

ID=25420474

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SE1987/000406 WO1988002144A1 (en) 1986-09-09 1987-09-09 Arrangement for data compression

Country Status (4)

Country Link
EP (1) EP0326560A1 (en)
JP (1) JPH02500693A (en)
AU (1) AU7915387A (en)
WO (1) WO1988002144A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU624205B2 (en) * 1989-01-23 1992-06-04 General Electric Capital Corporation Variable length string matcher

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3821711A (en) * 1972-12-26 1974-06-28 Ibm Self adaptive compression and expansion apparatus for changing the length of digital information
US4428063A (en) * 1980-10-03 1984-01-24 Thomson-Csf Data time compression device and decompression device
US4560976A (en) * 1981-10-15 1985-12-24 Codex Corporation Data compression
US4491934A (en) * 1982-05-12 1985-01-01 Heinz Karl E Data compression process
US4558302A (en) * 1983-06-20 1985-12-10 Sperry Corporation High speed data compression and decompression apparatus and method
US4558302B1 (en) * 1983-06-20 1994-01-04 Unisys Corp
US4586027A (en) * 1983-08-08 1986-04-29 Hitachi, Ltd. Method and system for data compression and restoration
WO1986000479A1 (en) * 1984-06-19 1986-01-16 Telebyte Corporation Data compression apparatus and method
GB2172127A (en) * 1985-03-06 1986-09-10 Ferranti Plc Data compression

Non-Patent Citations (1)

Title
IBM Technical Disclosure Bulletin, Vol. 24, No. 5, October 1981, pp. 2497-2510, "High-Speed Computer Processor for Data Compression of Electrocardiogram Signals", C. W. Coker, Jr. *

Also Published As

Publication number Publication date
AU7915387A (en) 1988-04-07
EP0326560A1 (en) 1989-08-09
JPH02500693A (en) 1990-03-08

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AU DK FI JP NO

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE FR GB IT LU NL SE

WWE WIPO information: entry into national phase

Ref document number: 1987906026

Country of ref document: EP

WWP WIPO information: published in national office

Ref document number: 1987906026

Country of ref document: EP

WWW WIPO information: withdrawn in national office

Ref document number: 1987906026

Country of ref document: EP