
US007281006B2

(12) United States Patent (10) Patent No.: US 7,281,006 B2

Hsu et al. (45) Date of Patent: Oct. 9, 2007

(54) SYSTEM AND METHOD FOR DIVIDING
DATA INTO PREDOMINANTLY
FIXED-SIZED CHUNKS SO THAT
DUPLICATE DATA CHUNKS MAY BE
IDENTIFIED

(75) Inventors: Windsor Wee Sun Hsu, Sunnyvale, CA (US); Shauchi Ong, San Jose, CA (US)

(73) Assignee: International Business Machines Corporation, Armonk, NY (US)

( * ) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 686 days.

(21) Appl. No.: 10/693,284

(22) Filed: Oct. 23, 2003

(65) Prior Publication Data

US 2005/0091234 A1 Apr. 28, 2005

(51) Int. Cl.

G06F17/30 (2006.01)
G06F11/00 (2006.01)

(52) U.S. Cl. 707/6; 707/7; 707/101; 707/102; 707/103 Z; 707/104.1; 714/6

(58) Field of Classification Search 707/200, 707/6, 7, 101, 102, 103 Z, 104.1; 711/159, 711/162; 714/6; 370/230.1, 470; 726/26
See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

5,020,058 A    5/1991 Hoden et al. 370/109
5,600,316 A    2/1997 Moll 341/67
5,875,477 A *  2/1999 Hasbun et al. 711/162
6,038,231 A    3/2000 Dolby et al. 370/394
6,535,994 B1 * 3/2003 Kedem 714/6

(Continued)

OTHER PUBLICATIONS

T. D. Moreton, I. A. Pratt, and T. L. Harris, "Storage, Mutability and Naming in Pasta," Proceedings of the International Workshop on Peer-to-Peer Computing at Networking 2002, Pisa, Italy, May 2002.

A. Muthitacharoen, B. Chen, and D. Mazieres, "A Low-Bandwidth network file system," Proceedings of the 18th Operating Systems Review, (New York), pp. 174-187, ACM Press, Oct. 21-24, 2001.

Andrew Tridgell, "The rsync algorithm," Technical Report TR-CS96-05, The Australian National University, Jun. 1996.

Primary Examiner—Thuy Pardo

(74) Attorney, Agent, or Firm—Cantor Colburn LLP; Marc D. McSwain

(57) ABSTRACT

A data chunking system divides data into predominantly fixed-sized chunks such that duplicate data may be identified. The data chunking system may be used to reduce the data storage and save network bandwidth by allowing storage or transmission of primarily unique data chunks. The system may also be used to increase reliability in data storage and network transmission, by allowing an error affecting a data chunk to be repaired with an identified duplicate chunk. The data chunking system chunks data by selecting a chunk of fixed size, then moving a window along the data until a match to existing data is found. As the window moves across the data, unique chunks predominantly of fixed size are formed in the data passed over. Several embodiments provide alternate methods of determining whether a selected chunk matches existing data and methods by which the window is moved through the data. To locate duplicate data, the data chunking system remembers data by computing a mathematical function of a data chunk and inserting the computed value into a hash table.
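The chunking procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the claimed method: the function name `chunk_data`, the use of SHA-256 as the "mathematical function", a Python `set` standing in for the hash table, and the one-byte window step are all assumptions made for illustration, and the patent describes several alternative embodiments for matching and for moving the window.

```python
import hashlib


def chunk_data(data: bytes, chunk_size: int, seen: set) -> list:
    """Split data into ("new" | "dup", piece) pairs.

    `seen` holds digests of chunks remembered so far (a stand-in for the
    hash table). SHA-256 and the one-byte slide are illustrative assumptions.
    """
    chunks = []

    def record(piece: bytes) -> None:
        # Remember the piece and label it by whether it was already known.
        key = hashlib.sha256(piece).digest()
        kind = "dup" if key in seen else "new"
        seen.add(key)
        chunks.append((kind, piece))

    start = 0  # first byte not yet assigned to a chunk
    pos = 0    # current window position
    while pos + chunk_size <= len(data):
        if pos - start == chunk_size:
            # The window slid a full chunk without a match: the data
            # passed over becomes a fixed-size unique chunk.
            record(data[start:pos])
            start = pos
        window = data[pos:pos + chunk_size]
        if hashlib.sha256(window).digest() in seen:
            if pos > start:
                # Data passed over before the match forms a shorter chunk.
                record(data[start:pos])
            chunks.append(("dup", window))
            pos += chunk_size
            start = pos
        else:
            pos += 1  # slide the window by one byte

    # Whatever remains is cut into pieces of at most chunk_size bytes.
    while start < len(data):
        record(data[start:start + chunk_size])
        start += chunk_size
    return chunks
```

With a shared `seen` set, chunking `b"ABCDEFGH"` with a chunk size of 4 and then `b"xxABCDEFGH"` marks both `ABCD` and `EFGH` as duplicates in the second input even though two bytes were inserted in front of them; only the inserted `xx` is stored as a new, shorter chunk.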

39 Claims, 7 Drawing Sheets


U.S. PATENT DOCUMENTS

6,615,209 B1 *     9/2003 Gomes et al. 707/5
6,661,775 B1 *    12/2003 Nakayama et al. 370/230.1
6,778,558 B2 *     8/2004 Balachandran et al. 370/470
6,816,872 B1 *    11/2004 Squibb 707/200
6,978,384 B1 *    12/2005 Milliken 726/26
2002/0169934 A1 * 11/2002 Krapp et al. 711/159

* cited by examiner
