CA2195817C - System for maintaining a data base log - Google Patents
- Publication number
- CA2195817C
- Authority
- CA
- Canada
- Prior art keywords
- data
- dictionary
- initial
- establishing
- starting
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/3084—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
- H03M7/3088—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method employing the use of a dictionary, e.g. LZ78
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99941—Database schema or data structure
- Y10S707/99942—Manipulating data structure, e.g. compression, compaction, compilation
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99951—File or database maintenance
- Y10S707/99952—Coherency, e.g. same view to multiple users
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99951—File or database maintenance
- Y10S707/99952—Coherency, e.g. same view to multiple users
- Y10S707/99953—Recoverability
Abstract
The invention relates to a system for maintaining a log of incoming records for a database system. Seek points are inserted into the compressed data log in a manner that allows recovery to start from a specified point without a need for decompressing earlier portions of the log. The initial block of data is used as the compression dictionary. A new compression sequence using the same initial compression dictionary is started at each seek point.
Description
SYSTEM FOR MAINTAINING A DATA BASE LOG
Field of the Invention
This invention relates to database systems, and the like, where incoming data is kept in a log or journal so that entries can be recovered in the event of a power failure or other unanticipated shutdown.
Background of the Invention
Databases are expected to provide durable storage of data, resilient against system and power failures. The standard mechanism used to enable recovery from such failures is to create a journal or log of updates made to the database from some known initial or checkpoint state. Before any change to the database is committed or processed, a log record indicating this change must be written out to disk or other non-volatile memory. At recovery time, the log records can be traversed to recapture the effects of all committed updates.
Disk I/O is typically the biggest bottleneck in database performance, and the writing of log records accounts for a significant fraction of the disk I/O in most database systems. In main-memory database systems in particular, log writes may account for nearly all of the disk I/O activity. A reduction in the amount of log information that needs to be recorded will boost overall performance.
Summary of the Invention
In the system according to the invention, the log records are compressed before being written out to disk. A small CPU price is paid for compression and decompression, but this is usually far outweighed by the savings in disk I/O.
Compression on a record-by-record basis is usually ineffective, because a single record is too short a data stream. Universal compression algorithms are effective on long data streams but generally must decompress starting at the beginning of the log. This approach is impractical in the typical situation where a long data stream is involved and only a relatively small recent segment needs to be decompressed. To avoid these problems, the system according to the invention inserts seek points in a manner that allows recovery to start from a specified point, without a need to decompress earlier portions of the log.
At the time that a particular update is committed, the log record corresponding to it must, in compressed form, be completely on disk. In addition, partial compressed log records of additional updates may also have reached the disk due to the granularity of the compression process. With the system for incremental compression according to the invention, log records are written out one page at a time, while achieving compression ratios similar to what could have been obtained if the entire log sequence had been compressed at once.
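The requirement that every committed record be fully decodable from what is already on disk, while compression history carries over between records, can be sketched in Python. This is a minimal illustration under stated assumptions, not the patent's implementation: it uses the standard `zlib` module (DEFLATE, an LZ77-family compressor) in place of the windowed Lempel-Ziv coder, an in-memory `bytearray` stands in for the disk, page-at-a-time writing is not modeled, and the class name `IncrementalLogWriter` is hypothetical.

```python
import zlib


class IncrementalLogWriter:
    """Hypothetical log writer: records are compressed incrementally, and each
    commit forces out enough compressed bytes to decode every committed record."""

    def __init__(self) -> None:
        self._comp = zlib.compressobj()
        self.disk = bytearray()  # stands in for the on-disk log

    def append(self, record: bytes) -> None:
        # zlib may buffer input, so compressed output can lag behind the records
        # appended; partial output for later records may also reach "disk".
        self.disk += self._comp.compress(record)

    def commit(self) -> None:
        # Z_SYNC_FLUSH emits all pending output on a byte boundary without
        # ending the stream, so later records can still match earlier ones.
        self.disk += self._comp.flush(zlib.Z_SYNC_FLUSH)
```

Because the stream is never terminated between commits, a reader can decode everything written up to the last flush with a plain `zlib.decompressobj()`, and the compression ratio stays close to that of compressing the whole log at once.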
In a typical database system according to the invention, compression of log records is performed incrementally from one seek point to the next. The initial block of data is used as the compression dictionary. A new compression sequence using the same initial compression dictionary is started after each seek point. This approach enables independent access to the portion of the log between seek points.
Experiments indicate that compressed log records occupy only 10-15% of the space occupied by the uncompressed records. For workloads with a large number of updates, performance increases of almost an order of magnitude can be achieved by compression according to the invention.
In some cases, log records are communicated to remote sites for backup. Compression according to the invention can then reduce communication costs in addition to I/O costs, and thus becomes even more valuable.
Brief Description of the Drawings
The following drawings, together with the detailed specification, disclose a preferred embodiment of the invention, wherein:
Figure 1 is a schematic diagram illustrating the operation of the Lempel-Ziv data compression algorithm;
Figure 2 is a schematic diagram illustrating the data compression technique according to the invention; and
Figure 3 is a flow diagram illustrating data flow according to the invention in a database system.
Detailed Description of the Invention
Although the invention is more generally applicable, it will be described in the context of the popular windowed Lempel-Ziv algorithm. The Lempel-Ziv algorithm is described in the article "A Universal Algorithm for Sequential Data Compression" by Jacob Ziv and Abraham Lempel, IEEE Transactions on Information Theory, Vol. IT-23, No. 3, May 1977. The idea behind the Lempel-Ziv algorithm is to select a window size of K bytes and use the latest K bytes of input data to code the next portion of the incoming data. This next portion of input data is coded in accordance with the longest string for which an exact match is found in the current window, which serves as a "dictionary", provided the match is longer than some threshold. If a match is found, the data is replaced by a pointer to the start of the match and an indicator for the length of the match. If a match is not found, the single character remains. The window is then advanced by the number of characters encoded so that it is once more the last K bytes of input already seen.
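A minimal Python sketch of the greedy windowed Lempel-Ziv coding step described above. The function name `lz77_compress` and the parameters `window` (K) and `threshold` are illustrative assumptions. The pointer is encoded as a distance back from the current position, a common LZ77 convention, whereas the text describes a pointer to the start of the match; the two carry the same information. The brute-force match search is slow and is meant only to make the logic visible.

```python
from typing import List, Tuple, Union

# A token is either a literal byte (int) or a (distance, length) back-reference.
Token = Union[int, Tuple[int, int]]


def lz77_compress(data: bytes, window: int = 4096, threshold: int = 3) -> List[Token]:
    """Greedy windowed LZ sketch: code the next input using the longest exact
    match found in the last `window` bytes already seen."""
    tokens: List[Token] = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Try every start position inside the current window (the "dictionary").
        for j in range(max(0, i - window), i):
            length = 0
            # A match may run past position i (overlapping copy).
            while i + length < len(data) and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= threshold:
            tokens.append((best_dist, best_len))  # pointer + length replace the bytes
            i += best_len  # advance the window by the number of bytes encoded
        else:
            tokens.append(data[i])  # no long-enough match: the single byte remains
            i += 1
    return tokens
```

For example, `lz77_compress(b"abcabcabc")` yields three literals followed by the back-reference `(3, 6)`: the last six bytes repeat the text starting three bytes back.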
Thus, data compression is achieved by detecting data sequences that are repetitive within a certain distance. Where a repetitive sequence is detected, the sequence is replaced by a pointer that points to the prior occurrence and the length of the repetitive sequence.
Operation of the Lempel-Ziv algorithm is illustrated in Fig. 1 with respect to a data stream 10 that moves from left to right. The initial block of data in data stream 10 is k bytes long and makes up the initial dictionary 12. In comparing the initial dictionary with the following data in the data stream, assume that seven bytes 22 are an exact match to seven bytes 24 in the dictionary starting from position P2. Under these circumstances the moving window dictionary moves forward seven places. In other words, the dictionary window advances to include the seven bytes of the prior match, and seven bytes fall off the trailing end. After the window has passed the area of the match, the seven bytes are replaced by the pointer to position P2 and by the length seven.
The algorithm continues in this fashion, with the window dictionary moving along the data stream according to the length of exact matches, and with length/pointer insertions made in place of data matching a moving dictionary sequence. If, for example, after the dictionary has moved seven bytes as a result of the first match, the next eleven bytes 26 are found to be an exact match to eleven bytes 28 in the moving dictionary, these eleven bytes are replaced by a pointer to position P27 with a length indication of eleven.
Compression continues in this fashion until the end of the data stream is reached.
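Decoding reverses this process: each pointer/length pair is expanded by copying bytes from the output already reconstructed. A minimal sketch, assuming a hypothetical token format of literal byte values (ints) and `(distance, length)` tuples, with the distance measured back from the current end of the output; the function name `lz77_decompress` is also hypothetical.

```python
from typing import Iterable, Tuple, Union

# A token is either a literal byte (int) or a (distance, length) back-reference.
Token = Union[int, Tuple[int, int]]


def lz77_decompress(tokens: Iterable[Token]) -> bytes:
    """Rebuild the data stream from literals and (distance, length) pointers."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            dist, length = tok
            start = len(out) - dist  # where the prior occurrence begins
            # Copy one byte at a time so a match may overlap the bytes it produces.
            for k in range(length):
                out.append(out[start + k])
        else:
            out.append(tok)  # literal byte that had no usable match
    return bytes(out)
```

For example, `lz77_decompress([97, 98, 99, (3, 6)])` rebuilds `b"abcabcabc"`, and `[97, (1, 3)]` rebuilds `b"aaaa"` through an overlapping copy.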
Database systems are customarily constructed to include a log or journal that stores all of the incoming data on a non-volatile medium before any of the incoming data is processed. In the event of a power failure or other system failure, any received data that has not been processed due to the failure can be recovered from the last part of the journal. Data compression of the journal has not been practical because most data compression techniques require decompression of the data stream from the beginning, where the decompression dictionary is found. In large systems involving millions of records, for example, such compression techniques would not be practicable because of the long time required to decompress the entire data stream to recover the most recently received data. On the other hand, compression on a record-by-record basis is also impracticable, because the records are generally too short to provide meaningful compression: much of each record would be taken up by the dictionary, which cannot be compressed.
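The record-by-record problem can be illustrated with Python's standard `zlib` module (DEFLATE, used here as a stand-in for the patent's coder). The sketch compresses a batch of short, similar log records first one at a time and then as a single stream; the record format and the function name `compare_record_vs_stream` are hypothetical. Each standalone record starts with an empty history, so text repeated across records cannot be exploited, and each also carries its own stream header and checksum.

```python
import zlib
from typing import List, Tuple


def compare_record_vs_stream(records: List[bytes]) -> Tuple[int, int]:
    """Return (total compressed size when each record is compressed alone,
    compressed size when all records are compressed as one stream)."""
    per_record = sum(len(zlib.compress(r)) for r in records)
    whole = len(zlib.compress(b"".join(records)))
    return per_record, whole


# Hypothetical log records with heavy repetition across records.
records = [b"UPDATE accounts SET balance=%d WHERE id=%d;" % (i * 7, i) for i in range(50)]
per_record, whole = compare_record_vs_stream(records)
```

The single stream compresses far better, but decoding its last record requires decompressing everything before it, which is the tension that seek points are meant to resolve.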
In the system according to the invention, data blocks of almost any desired size can be compressed and recovered without decompressing the entire data stream. As shown in Fig. 2, data stream 30 is divided into data blocks 31, 32, 33, etc. A first seek point is located at the end of the first data block, prior to the start of the second data block. A seek point is similarly located at the start of each of the other data blocks.
The initial k bytes of the first data block 31 are used as the initial dictionary 40. This initial dictionary is stored in uncompressed form and is used to initiate the compression of the remainder of the first data block 31 according to the Lempel-Ziv algorithm. At the start of the second data block 32, a dictionary 42 that is the same as initial dictionary 40 is used to start compression of the data block. Since dictionary 42 is the same as initial dictionary 40, there is no need to store dictionary 42. Also, since the initial portion of the second data block is not used as a dictionary, the entire data block can be compressed. Likewise, the same initial dictionary is used as dictionary 44 to start compression of the third data block and all subsequent data blocks. Using the same initial dictionary for all of the data blocks provides compression efficiency nearly as great as that provided by separate initial dictionaries taken from each data block.
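The shared-dictionary scheme can be sketched with Python's `zlib` module, whose `zdict` argument primes a DEFLATE compressor with a preset dictionary. This substitutes DEFLATE for the windowed Lempel-Ziv coder described in the text and is not the patent's implementation; the function name `compress_with_seek_points` and the in-memory list of blocks are assumptions. Each block gets a fresh compressor primed with the same initial dictionary, so no compressed block depends on the bytes of any earlier block.

```python
import zlib
from typing import List, Tuple


def compress_with_seek_points(blocks: List[bytes], k: int) -> Tuple[bytes, List[bytes]]:
    """Use the first k bytes of the first block as the one shared initial
    dictionary (kept uncompressed), then start a new compression sequence,
    primed with that same dictionary, at every block boundary (seek point)."""
    initial_dict = blocks[0][:k]
    compressed: List[bytes] = []
    for n, block in enumerate(blocks):
        # Block 0: its first k bytes are the stored dictionary, so only the
        # remainder is compressed. Later blocks are compressed in their entirety.
        payload = block[k:] if n == 0 else block
        comp = zlib.compressobj(zdict=initial_dict)
        compressed.append(comp.compress(payload) + comp.flush())
    return initial_dict, compressed
```

Only one dictionary is stored no matter how many blocks follow. Note that DEFLATE's 32 KB window limits how much of a preset dictionary can actually be referenced.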
The technique according to the invention has the advantage that only one initial dictionary needs to be stored, regardless of the length of the compressed data stream. With this arrangement, storage space for dictionaries is reduced and the portion of the data stream that can be compressed is increased. The principal advantage, however, is that decompression recovery can commence at any seek point, so decompression from the beginning of the data stream is unnecessary.
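Recovery from a seek point then needs only the stored initial dictionary and the compressed bytes of the selected block. A minimal sketch, assuming Python's standard `zlib` (DEFLATE with a `zdict` preset dictionary) as a stand-in for the patent's coder, a hypothetical `recover_block` helper, and a layout in which block 0 is kept as the uncompressed dictionary followed by its compressed remainder.

```python
import zlib
from typing import Sequence


def recover_block(initial_dict: bytes, compressed_blocks: Sequence[bytes], index: int) -> bytes:
    """Decompress only the selected block, starting at its seek point; no
    earlier block is touched, because each one was compressed independently
    from the same initial dictionary."""
    decomp = zlib.decompressobj(zdict=initial_dict)
    data = decomp.decompress(compressed_blocks[index]) + decomp.flush()
    # Block 0 was stored as the uncompressed dictionary plus its compressed remainder.
    return initial_dict + data if index == 0 else data
```

The cost of recovering the most recent block is therefore independent of how long the log has grown.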
Fig. 3 is a data flow diagram of a database system according to the invention. The incoming data is first processed at 40 to create an update log, which stores the incoming data in compressed form in the order received.
The incoming data is held in a buffer memory 41 until the incoming data, in log or journal form, is stored in non-volatile disk memory 44.
The first step in creating the update log is step 45, which extracts the initial dictionary from the beginning of the first data block. This initial dictionary is used to compress the remainder of the first data block in step 46. The same initial dictionary is used to compress subsequent data blocks in their entirety in steps 48 and 50. A first seek point 47 is noted at the end of the first data block, and a new compression sequence using the initial dictionary stored in step 45 is used to compress the second data block in step 48. The same sequence is followed for subsequent data blocks, in each case using the same initial dictionary stored in step 45 to start the data compression for the data block.
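The Fig. 3 flow can be tied together in one class that appends every block to a single byte stream and records the byte offset of each seek point, so that a reader can slice out and decompress one block directly. A minimal sketch, assuming Python's standard `zlib` (DEFLATE with a `zdict` preset dictionary) as a stand-in for the windowed Lempel-Ziv coder, an in-memory `bytearray` in place of disk memory 44, and hypothetical names (`SeekableLog`, `write_block`, `read_block`). Comments map lines to the step numbers of Fig. 3; buffer memory 41 is not modeled.

```python
import zlib
from typing import List, Optional


class SeekableLog:
    """Hypothetical sketch of the Fig. 3 flow: one append-only compressed
    stream plus the byte offset at which each block's sequence starts."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.initial_dict: Optional[bytes] = None  # stored uncompressed, once
        self.stream = bytearray()  # stands in for non-volatile disk memory 44
        self.seek_points: List[int] = []

    def write_block(self, block: bytes) -> None:
        if self.initial_dict is None:
            # Step 45: extract the initial dictionary from the first block.
            self.initial_dict = block[: self.k]
            payload = block[self.k :]  # step 46: compress only the remainder
        else:
            payload = block  # steps 48, 50: compress the whole block
        # Record where this block's sequence begins. Offset 0 is kept for the
        # first block only for uniform indexing; later entries are the seek
        # points (the first of them is seek point 47).
        self.seek_points.append(len(self.stream))
        comp = zlib.compressobj(zdict=self.initial_dict)
        self.stream += comp.compress(payload) + comp.flush()

    def read_block(self, i: int) -> bytes:
        assert self.initial_dict is not None, "log is empty"
        start = self.seek_points[i]
        end = self.seek_points[i + 1] if i + 1 < len(self.seek_points) else len(self.stream)
        # A new decompression sequence starts at the seek point, primed with
        # the same initial dictionary; earlier blocks are never decompressed.
        decomp = zlib.decompressobj(zdict=self.initial_dict)
        data = decomp.decompress(bytes(self.stream[start:end])) + decomp.flush()
        return self.initial_dict + data if i == 0 else data
```

Keeping the offsets alongside the stream is one possible way to locate seek points; the text only says that they are "noted".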
The invention has been described according to a preferred embodiment in a database system, but it is useful in numerous other applications where data compression can be incorporated. The invention is more particularly defined in the appended claims.
Claims (5)
1. A method of compressing a data stream so as to permit decompression recovery of only selected portions of the data stream, comprising the steps of:
establishing an initial compression dictionary consisting of initial data in said data stream;
compressing the data stream according to a moving window dictionary starting with said initial compression dictionary by inserting into the data stream the location and length of any exact matches with data sequences in said moving window dictionary;
establishing seek points to identify the start of data blocks in the data stream set for selective recovery decompression;
using said initial dictionary for compression starting at each seek point so that a selected one of said data blocks can be recovered by decompression starting at a seek point and using said initial compression dictionary.
2. A method for incremental compression of data blocks in a data stream, comprising the steps of:
establishing data blocks in the data stream for selected recovery with a seek point identifying the start of each of said data blocks;
establishing an initial dictionary consisting of initial data in the data stream;
compressing each data block according to a moving window dictionary starting with said initial dictionary; and decompression recovery of selected data blocks starting at one of said seek points and using said initial dictionary.
3. A database system comprising:
means for receiving incoming data for storage in records of the database;
means for compressing said incoming data as received to provide compressed data; and means for storing said compressed data on a non-volatile medium to provide a journal for said incoming data prior to processing said incoming data in the database system;
whereby, in the event of a power failure or service interruption, selected portions of said incoming data can be recovered by reading from said medium and decompressing selected portions of said stored compressed data.
4. A system according to claim 3 further comprising means for establishing an initial dictionary from the incoming data and for using a moving window dictionary starting with said initial dictionary to compress said incoming data.
5. A system according to claim 4 further comprising means for establishing seek points identifying the location of different data blocks and wherein said means for compressing incoming data compresses data using the same initial dictionary at each such point.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US607,987 | 1996-03-04 | ||
US08/607,987 US5956504A (en) | 1996-03-04 | 1996-03-04 | Method and system for compressing a data stream in a database log so as to permit recovery of only selected portions of the data stream |
Publications (2)
Publication Number | Publication Date |
---|---|
CA2195817A1 (en) | 1997-09-04 |
CA2195817C (en) | 2000-01-11 |
Family
ID=24434558
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA002195817A Expired - Fee Related CA2195817C (en) | 1996-03-04 | 1997-01-23 | System for maintaining a data base log |
Country Status (4)
Country | Link |
---|---|
US (1) | US5956504A (en) |
EP (1) | EP0795965A3 (en) |
JP (1) | JPH1027120A (en) |
CA (1) | CA2195817C (en) |
Families Citing this family (35)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0839352B1 (en) * | 1995-07-20 | 2002-10-16 | Novell, Inc. | Transaction log management in a disconnectable computer and network |
FI962381A (en) * | 1996-06-07 | 1997-12-08 | Nokia Telecommunications Oy | Compressing data on a communication connection |
JP3337633B2 (en) * | 1997-12-03 | 2002-10-21 | 富士通株式会社 | Data compression method and data decompression method, and computer-readable recording medium recording data compression program or data decompression program |
US6052696A (en) * | 1998-04-27 | 2000-04-18 | International Business Machines Corporation | Adaptive time-based journal bundling |
US6473776B2 (en) * | 1999-04-16 | 2002-10-29 | International Business Machines Corporation | Automatic prunning for log-based replication |
US6834290B1 (en) | 1999-11-15 | 2004-12-21 | Quest Software, Inc. | System and method for developing a cost-effective reorganization plan for data reorganization |
US7805423B1 (en) | 1999-11-15 | 2010-09-28 | Quest Software, Inc. | System and method for quiescing select data modification operations against an object of a database during one or more structural operations |
US7065538B2 (en) * | 2000-02-11 | 2006-06-20 | Quest Software, Inc. | System and method for reconciling transactions between a replication system and a recovered database |
US7231391B2 (en) * | 2001-02-06 | 2007-06-12 | Quest Software, Inc. | Loosely coupled database clusters with client connection fail-over |
US20020129146A1 (en) | 2001-02-06 | 2002-09-12 | Eyal Aronoff | Highly available database clusters that move client connections between hosts |
US20040162836A1 (en) * | 2001-02-06 | 2004-08-19 | Eyal Aronoff | System and method for altering database requests and database responses |
AU2003205899A1 (en) * | 2002-02-21 | 2003-09-09 | British Telecommunications Public Limited Company | Video processing |
EP1526541A1 (en) * | 2003-10-23 | 2005-04-27 | Deutsche Thomson-Brandt Gmbh | Method and apparatus for recording data and recovering recorded data |
US7447710B2 (en) * | 2003-12-11 | 2008-11-04 | Sybase, Inc. | Database system providing self-tuned parallel database recovery |
US8769311B2 (en) | 2006-05-31 | 2014-07-01 | International Business Machines Corporation | Systems and methods for transformation of logical data objects for storage |
WO2007138602A2 (en) * | 2006-05-31 | 2007-12-06 | Storwize Ltd. | Method and system for transformation of logical data objects for storage |
JP4888566B2 (en) * | 2007-10-30 | 2012-02-29 | 日本電気株式会社 | Data compression method |
US8060476B1 (en) | 2008-07-14 | 2011-11-15 | Quest Software, Inc. | Backup systems and methods for a virtual computing environment |
US8135930B1 (en) | 2008-07-14 | 2012-03-13 | Vizioncore, Inc. | Replication systems and methods for a virtual computing environment |
US8046550B2 (en) * | 2008-07-14 | 2011-10-25 | Quest Software, Inc. | Systems and methods for performing backup operations of virtual machine files |
US8429649B1 (en) | 2008-09-25 | 2013-04-23 | Quest Software, Inc. | Systems and methods for data management in a virtual computing environment |
US8996468B1 (en) | 2009-04-17 | 2015-03-31 | Dell Software Inc. | Block status mapping system for reducing virtual machine backup storage |
US8179291B2 (en) * | 2009-05-04 | 2012-05-15 | International Business Machines Corporation | Method and system for compression of logical data objects for storage |
US9778946B2 (en) | 2009-08-07 | 2017-10-03 | Dell Software Inc. | Optimized copy of virtual machine storage files |
US8453145B1 (en) | 2010-05-06 | 2013-05-28 | Quest Software, Inc. | Systems and methods for instant provisioning of virtual machine files |
US9569446B1 (en) | 2010-06-08 | 2017-02-14 | Dell Software Inc. | Cataloging system for image-based backup |
US9547562B1 (en) | 2010-08-11 | 2017-01-17 | Dell Software Inc. | Boot restore system for rapidly restoring virtual machine backups |
US8898114B1 (en) | 2010-08-27 | 2014-11-25 | Dell Software Inc. | Multitier deduplication systems and methods |
US8788712B2 (en) | 2012-01-06 | 2014-07-22 | International Business Machines Corporation | Compression block input/output reduction |
US9311375B1 (en) | 2012-02-07 | 2016-04-12 | Dell Software Inc. | Systems and methods for compacting a virtual machine file |
US9237014B2 (en) | 2013-05-28 | 2016-01-12 | Hong Kong Applied Science & Technology Research Institute Company, Limited | Partial CipherText updates using variable-length segments delineated by pattern matching and encrypted by fixed-length blocks |
US9727244B2 (en) | 2015-10-05 | 2017-08-08 | International Business Machines Corporation | Expanding effective storage capacity of a data storage system while providing support for address mapping recovery |
US10382056B2 (en) | 2015-11-10 | 2019-08-13 | International Business Machines Corporation | Fast evaluation of predicates against compressed data |
US9588694B1 (en) | 2016-01-21 | 2017-03-07 | International Business Machines Corporation | Storage device optimization |
CN117271469B (en) * | 2023-11-20 | 2024-02-02 | 新风光电子科技股份有限公司 | Energy storage data distributed storage method of energy storage power station |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4507751A (en) * | 1982-06-21 | 1985-03-26 | International Business Machines Corporation | Method and apparatus for logging journal data using a log write ahead data set |
US5065311A (en) * | 1987-04-20 | 1991-11-12 | Hitachi, Ltd. | Distributed data base system of composite subsystem type, and method fault recovery for the system |
JP2667039B2 (en) * | 1990-05-18 | 1997-10-22 | 株式会社東芝 | Data management system and data management method |
US5333313A (en) * | 1990-10-22 | 1994-07-26 | Franklin Electronic Publishers, Incorporated | Method and apparatus for compressing a dictionary database by partitioning a master dictionary database into a plurality of functional parts and applying an optimum compression technique to each part |
US5280611A (en) * | 1991-11-08 | 1994-01-18 | International Business Machines Corporation | Method for managing database recovery from failure of a shared store in a system including a plurality of transaction-based systems of the write-ahead logging type |
US5499367A (en) * | 1991-11-15 | 1996-03-12 | Oracle Corporation | System for database integrity with multiple logs assigned to client subsets |
US5555371A (en) * | 1992-12-17 | 1996-09-10 | International Business Machines Corporation | Data backup copying with delayed directory updating and reduced numbers of DASD accesses at a back up site using a log structured array data storage |
US5581750A (en) * | 1993-03-15 | 1996-12-03 | International Business Machines Corporation | System and method for improving data recovery performance |
US5530645A (en) * | 1993-06-30 | 1996-06-25 | Apple Computer, Inc. | Composite dictionary compression system |
US5574906A (en) * | 1994-10-24 | 1996-11-12 | International Business Machines Corporation | System and method for reducing storage requirement in backup subsystems utilizing segmented compression and differencing |
1996
- 1996-03-04 US: US08/607,987 (US5956504A), Expired - Lifetime
1997
- 1997-01-23 CA: CA002195817A (CA2195817C), Expired - Fee Related
- 1997-02-25 EP: EP97301212A (EP0795965A3), Withdrawn
- 1997-03-04 JP: JP9048337A (JPH1027120A), Pending
Also Published As
Publication number | Publication date |
---|---|
US5956504A (en) | 1999-09-21 |
EP0795965A3 (en) | 2000-09-27 |
CA2195817A1 (en) | 1997-09-04 |
EP0795965A2 (en) | 1997-09-17 |
JPH1027120A (en) | 1998-01-27 |
Similar Documents
Publication | Title |
---|---|
CA2195817C (en) | System for maintaining a data base log | |
EP0794530B1 (en) | System for compression and buffering of a data stream | |
US6657565B2 (en) | Method and system for improving lossless compression efficiency | |
US6542906B2 (en) | Method of and an apparatus for merging a sequence of delta files | |
US5530645A (en) | Composite dictionary compression system | |
EP0439569B1 (en) | Apparatus for run and string data compression | |
US6309424B1 (en) | Content independent data compression method and system | |
KR100602394B1 (en) | Dual mode data compression for operating code | |
US5473326A (en) | High speed lossless data compression method and apparatus using side-by-side sliding window dictionary and byte-matching adaptive dictionary | |
US5933104A (en) | Method and system for compression and decompression using variable-sized offset and length fields | |
EP0878914A3 (en) | Data compression method and apparatus | |
EP0797158A3 (en) | Document managing apparatus, data compressing method, and data decompressing method | |
CN105846825A (en) | Compression method, decompression method, compression device and decompresssion device | |
EP0903865A1 (en) | Method and apparatus for compressing data | |
US6225922B1 (en) | System and method for compressing data using adaptive field encoding | |
US7379940B1 (en) | Focal point compression method and apparatus | |
US5394143A (en) | Run-length compression of index keys | |
JPH04359315A (en) | Data compression controller and data restoration controller | |
JP3236152B2 (en) | File compression processor | |
US8244677B2 (en) | Focal point compression method and apparatus | |
US20080001790A1 (en) | Method and system for enhancing data compression | |
JPH07135471A (en) | Data compressor and data expander | |
JPH06274279A (en) | Mirrored disk and its control system | |
JPH01314430A (en) | Single item encoding of start, step and stop for compacting data | |
US7146054B2 (en) | Method of digital image data compression and decompression |
Legal Events
Code | Title |
---|---|
EEER | Examination request |
MKLA | Lapsed |