US20070276789A1 - Methods and apparatus for conversion of content - Google Patents

Info

Publication number
US20070276789A1
Authority
US
United States
Prior art keywords
storage system
content
computer
stored
format
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US11/438,770
Inventor
Kaleb Keithley
Jiri Schindler
Jonathan B. Hall
Michael Kilian
Stephen J. Todd
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
EMC Corp
Original Assignee
EMC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by EMC Corp
Priority to US11/438,770
Assigned to EMC CORPORATION reassignment EMC CORPORATION ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: HALL, JONATHAN B., KEITHLEY, KALEB S., KILIAN, MICHAEL, SCHINDLER, JIRI, TODD, STEPHEN J.
Priority to PCT/US2007/012115 (WO2007139757A2)
Publication of US20070276789A1
Priority to US12/804,349 (US8489559B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database

Definitions

  • the present invention relates to the conversion of content.
  • Virtually all computer application programs rely on storage that may be used to store computer code and data manipulated by the computer code.
  • a typical computer system includes one or more host computers that execute such application programs and one or more storage systems that provide storage.
  • the host computers may access data by sending access requests to the one or more storage systems.
  • Some storage systems require that the access requests identify units of data to be accessed using logical volume and block addresses.
  • Such storage systems are known as “block I/O” storage systems.
  • the logical volumes presented by the storage system to the host may not map in a one-to-one manner to physical storage devices, they are perceived by the host as corresponding to physical storage devices, and the specification of a logical volume and block address indicates where the referenced data is physically stored within the storage system.
  • some storage systems receive and process access requests that identify a data or other content unit using an object identifier, rather than an address that specifies where the data unit is physically or logically stored in the storage system.
  • In object addressable storage, a content unit may be identified (e.g., by host computers requesting access to the content unit) using its object identifier, and the object identifier may be independent of the physical or logical location at which the content unit is stored (although it is not required to be).
  • the object identifier does not control where the content unit is stored.
  • the identifier by which host computers access the unit of content may remain the same.
  • host computers accessing the unit of content may need to be made aware of the location change and then use the new location of the unit of content for future accesses.
  • an OAS system is a content addressable storage (CAS) system.
  • the object identifiers that identify content units are content addresses.
  • a content address is an identifier that is computed, at least in part, from at least a portion of the content of its corresponding unit of content, which can be data and/or metadata.
  • a content address for a unit of content may be computed by hashing the unit of content and using the resulting hash value as the content address.
  • Storage systems that identify content by a content address are termed content addressable storage (CAS) systems.
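The hashing approach to content addresses described above can be sketched as follows. This is a minimal illustration, not the patented system: the choice of SHA-256, the in-memory dictionary, and the names `content_address` and `ContentAddressableStore` are all assumptions made for the example.

```python
import hashlib


def content_address(content: bytes) -> str:
    """Compute an identifier from the content itself (SHA-256 is an assumed choice)."""
    return hashlib.sha256(content).hexdigest()


class ContentAddressableStore:
    """Hypothetical in-memory stand-in for a CAS system."""

    def __init__(self) -> None:
        self._units = {}  # content address -> content bytes

    def write(self, content: bytes) -> str:
        # The identifier is derived from the content, not from a storage location.
        address = content_address(content)
        self._units[address] = content
        return address

    def read(self, address: str) -> bytes:
        return self._units[address]
```

Because the address depends only on the content, writing the same bytes twice yields the same identifier, and the store is free to relocate the unit internally without the host noticing.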
  • One embodiment of the invention is directed to a method for use in a computer system comprising at least one storage system and at least one host computer that is coupled to the at least one storage system and executes an application program that writes a plurality of content units to the at least one storage system, wherein the at least one storage system stores the plurality of content units in a first stored format.
  • the method comprises: (A) executing on at least one computer other than the at least one host computer at least one utility that reads at least some of the plurality of content units and stores the at least some of the plurality of content units on the at least one storage system in a second stored format that is different from the first stored format.
  • Another embodiment of the invention is directed to a method for use in a computer system comprising at least one storage system and at least one host computer that is coupled to the at least one storage system and executes an application program that writes a plurality of content units to the at least one storage system, wherein the at least one storage system stores the plurality of content units in a first stored format.
  • the method comprises: (A) installing on at least one computer other than the at least one host computer at least one utility that can read at least some of the plurality of content units and store the at least some of the plurality of content units on the at least one storage system in a second stored format that is different from the first stored format.
  • a further embodiment is directed to a method for use in a computer system comprising at least one storage system that stores a plurality of content units and has the ability to respond to a request to access at least one of the plurality of content units by returning the at least one of the plurality of content units in any of at least two formats comprising at least a first format and a second format that is different than the first format, wherein the at least one storage system comprises an interface that enables each of the plurality of content units to be stored on the at least one storage system, associated with an identifier and later retrieved by providing the at least one storage system with the identifier.
  • the method comprises acts of: (A) in response to at least one request to read the at least one of the plurality of content units being received at the at least one storage system, selecting one of the formats of the at least one of the plurality of content units based, at least in part, on information associated with the at least one request; and (B) providing, from the at least one storage system, the at least one of the plurality of content units in the format selected in the act (A).
  • Another embodiment is directed to a storage system comprising: at least one storage device to store a plurality of content units written to the at least one storage system; and at least one processor programmed to: provide an interface to the at least one storage system that enables each of the plurality of content units to be stored on the at least one storage system, associated with an identifier and later retrieved by providing the at least one storage system with the identifier, wherein each of the plurality of content units is stored in a stored format; and in response to at least one request to read the at least one of the plurality of content units being received at the at least one storage system, select one of the formats of the at least one of the plurality of content units based, at least in part, on information associated with the at least one request and provide the at least one of the plurality of content units in the format selected.
  • a further embodiment is directed to a method for use in a computer system comprising at least one storage system that stores a plurality of content units and comprises an interface that enables each of the plurality of content units to be stored on the at least one storage system, associated with an identifier and later retrieved by providing the at least one storage system with the identifier.
  • the method comprises acts of: installing on at least one storage system at least one utility that provides the at least one storage system with the ability to perform acts of: (A) in response to at least one request to read at least one of the plurality of content units being received at the at least one storage system, selecting any of at least two formats, comprising at least a first format and a second format that is different than the first format, in which the at least one of the plurality of content units can be provided, the act of selecting being based, at least in part, on information associated with the at least one request; and (B) providing, from the at least one storage system, the at least one of the plurality of content units in the format selected in the act (A).
  • Another embodiment is directed to a method for use in a computer system comprising at least one storage system that stores a plurality of content units and has the ability to respond to a request to access at least one of the plurality of content units by returning the at least one of the plurality of content units in any of a plurality of formats comprising at least a first format and a second format that is different than the first format, wherein the at least one storage system comprises an interface that enables each of the plurality of content units to be stored on the at least one storage system, associated with an identifier and later retrieved by providing the at least one storage system with the identifier.
  • the method comprises acts of: (A) providing from the at least one host computer, in association with at least one request to read at least one of the plurality of content units, information that enables the at least one storage system to select one of the plurality of formats in which to provide the at least one of the plurality of content units to the at least one host computer in response to the at least one request.
  • a further embodiment is directed to a method for use in a computer system comprising at least one storage system that stores a plurality of content units and has the ability to respond to a request to access at least one of the plurality of content units by returning the at least one of the plurality of content units in any of a plurality of formats comprising at least a first format and a second format that is different than the first format, wherein the at least one storage system comprises an interface that enables each of the plurality of content units to be stored on the at least one storage system, associated with an identifier and later retrieved by providing the at least one storage system with the identifier.
  • the method comprises an act of: (A) creating mapping information that specifies, based on at least some information associated with at least one request to access the at least one content unit, which of the at least two of the formats of the at least one content unit should be provided in response to the request; and (B) storing the mapping information on the computer system.
  • FIG. 1 is a block diagram of a computer system in which content units may be converted to different data formats by a storage system, in accordance with one embodiment of the invention.
  • FIG. 2 is a block diagram of a computer system in which content units may be converted to different data formats by a utility computer, in accordance with one embodiment of the invention.
  • FIG. 3 is a diagram of a content unit being converted to a different data format, in accordance with one embodiment of the invention.
  • FIG. 4 is a block diagram of multiple versions of a content unit being stored on a storage system that stores content in a CDF/blob arrangement, in accordance with one embodiment of the invention.
  • FIG. 5 is a block diagram of a computer system on which an application program executing on a host computer communicates with a storage system via an application programming interface (API).
  • FIG. 6 is a flow chart of a process by which a content unit may be converted to a different data format in response to a read request for the content unit, in accordance with one embodiment of the invention.
  • FIG. 7 is a flow chart of a process by which a content unit may be converted to a different data format in place, in accordance with one embodiment of the invention.
  • FIG. 8 is a flow chart of a process to determine which of multiple versions of a content unit may be returned in response to a read request for the content unit, in accordance with one embodiment of the invention.
  • Application programs frequently generate content in a format that is understandable to the application program.
  • word processing application programs generate content units (e.g., documents) that have a specific data format, such that the word processing application program may read the content unit and properly display the document to a user.
  • Microsoft Word™ may store content units in one data format.
  • Corel WordPerfect may store content units in another data format.
  • a user may wish to read, using an application program, content units stored on a storage system that were created by a different application program or an earlier version of the application program that stored the content unit in a different format. The user's application program may not be capable of reading these content units when they are stored in a format that is foreign to the application program.
  • it may be desirable that a content unit stored on a storage system be returned to a host computer requesting the content unit in a data format different from that in which it was stored (e.g., a format desired by the requesting host computer).
  • reasons for performing such a conversion before the data is returned to the host may include reducing the amount of content that is transferred from the storage system to the host computer and/or conserving processing resources on the host computer (i.e., so that the host computer need not perform the data conversion).
  • a content unit may include an image stored in TIFF format, which is a lossless format (i.e., there is no image quality loss due to compression).
  • a host computer accessing the content unit may desire that the content unit be returned in JPEG format, a lossy data format, so that the amount of content transferred from the storage system is reduced and/or so that the host computer need not convert the TIFF image to a JPEG image using its own processing resources.
  • the “host computer” requesting the content unit may be a cellular phone with limited resources to process and display the content of the content unit.
  • the storage system may return the content unit in a “stripped down” format so that the content unit is more easily processed by the cellular phone.
  • a storage system may be populated with content units that are stored in a particular data format (e.g., by a legacy version of an application program).
  • a user may desire to read these content units using an application program (e.g., a more recent version of the legacy application program) that is incapable of processing the content units in the data format in which they are stored.
  • the content units may be converted to a data format in which the user's application program is capable of reading them.
  • Such conversion may take place on any suitable computer in the computer system, such as, for example, the storage system on which the content units are stored, a utility computer in the computer system, or the host computer on which the application program executes.
  • a utility that executes on the storage system or a utility computer may convert content units stored in a particular data format to a different data format.
  • the utility may be installed by a system administrator and may be configured to locate content units in a first data format, convert the content units to a second data format, and store the content units in the second data format.
  • a utility may convert content units stored on the storage system in a first data format to a second data format “on the fly,” i.e., in response to a read request for a content unit stored in the first data format.
  • an application program executing on a host computer may send an access request for a content unit stored on the storage system in a data format that is foreign to the application program.
  • the utility may convert the content unit to a data format in which the application program is capable of reading the content unit and the converted content unit may be returned in response to the request.
  • the decision of which version of a content unit to return in response to a read request for the content unit may be made in any suitable way.
  • information may be provided in an access request for a content unit that enables a utility to select a format in which to provide the content unit in response to the request.
  • the content unit may be returned from the storage system in the selected format.
  • mapping information may be created that maps the information provided in the access request to a particular data format. This mapping information may be used to select a data format in which to return a stored content unit when an access request for the content unit is received.
  • the metadata may be stored on any suitable computer in the computer system, such as, for example, the storage system on which the requested content unit is stored, a utility computer, and/or the host computer on which the application program that originated the request executes.
  • the conversion of a content unit from one format to another format may be performed in any suitable way, and at any suitable time, as the invention is not limited in these respects.
  • a content unit may be converted from the format in which it is stored to a different format in response to a request to access the stored content unit.
  • computer system 100 includes a storage system 101 and a host computer 103.
  • Storage system 101 may include storage devices 105 and a content unit converter 107.
  • Storage system 101 may be any suitable type of storage system, such as for example, a block I/O storage system, an object addressable storage (OAS) system, and/or any other type of storage system.
  • Host computer 103 may be any suitable type of computer, such as, for example, a server, a personal computer, and/or any other type of computer.
  • Host computer 103 may send a request to access a content unit stored on storage system 101.
  • storage system 101 may access the content unit from one or more of storage devices 105, use content unit converter 107 to convert the content unit to a data format different from the format in which it is stored, and return the converted content unit to host computer 103.
  • the storage system itself converts a requested content unit to a data format different from its stored format prior to returning the content unit in response to the request.
  • the invention is not limited in this respect, as the conversion of the content unit may be performed by a computer outside the storage system. This may be done in any suitable way, as the invention is not limited in this respect.
  • computer system 200 includes a storage system 203, a utility computer 201, and a host computer 103.
  • Utility computer 201 has a content unit converter 105 that converts content units from one data format to a different data format.
  • host computer 103 may send an access request for a content unit.
  • the request may be sent to utility computer 201, and utility computer 201 may forward the access request to storage system 203.
  • the request may be directed to the storage system, but may be intercepted by the utility computer and then forwarded from the utility computer to the storage system so that it appears to the storage system that the request originated from the utility computer.
  • the storage system may return the content unit, in the format in which it is stored (or any other format), to utility computer 201.
  • Utility computer 201 may receive the requested content unit from the storage system, convert the content unit to a different data format (e.g., using content unit converter 105), and may return the converted content unit to host computer 103.
  • host computer 103 provides an access request directly to utility computer 201, which forwards the access request to the storage system.
  • the invention is not limited in this respect, as host computer 103 may bypass utility computer 201 and provide the access request to the storage system.
  • the storage system may provide the requested content unit to utility computer 201 for conversion and utility computer 201 may either return the converted content unit to host computer 103 or may return the converted content unit to the storage system, which may return the converted content unit to host computer 103.
  • Utility computer 201 may be any suitable type of computer, as the invention is not limited in this respect.
  • utility computer 201 may be a server, a personal computer, an appliance, a network switch, or any other suitable type of computer.
  • utility computer 201 may be a computer that implements a CAS layer. Computers that implement a CAS layer are described in greater detail in application Ser. Nos. 10/836,415, 10/837,311, and 10/836,502, listed below in Table 1, each of which is incorporated herein by reference in its entirety.
  • Metadata 109 that specifies rules for converting a content unit from one data format to a different data format may be used. These metadata rules may be stored on any suitable computer in the computer system, such as for example a storage system, a utility computer, and/or a host computer, as indicated by the dashed lines in FIGS. 1 and 2.
  • Content unit 301 may store patient contact information for a medical practice in a first data format.
  • the first data format includes four fields: name; date of birth; phone number; and address.
  • Content unit 301 may be input to content unit converter 105, which applies the rules specified in metadata 109 to convert content unit 301 to a different format.
  • Content unit converter 105 may output content unit 303, which is an example of content unit 301 converted to a second data format different from the first data format.
  • the second data format includes five fields: name; date of birth; phone number; address; and e-mail address.
  • Metadata may be stored that provides rules for converting a content unit in the first data format (e.g., content unit 301) to a content unit in the second data format (e.g., content unit 303).
  • the metadata may indicate that to convert a content unit from the first data format to the second data format, a fifth field for e-mail address may be added to the end of the content of the content unit and the field may be given a value of unknown.
  • the metadata that provides rules for converting a content unit from a first data format to a second data format may be stored in any suitable format, as the invention is not limited in this respect.
  • the metadata may be stored as an extensible stylesheet language (XSL) stylesheet, although other formats are possible.
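The conversion of content unit 301 to content unit 303 described above can be sketched as a rule-driven transformation. The text mentions XSL as one possible rule format; the plain dictionary used here is only an assumed stand-in, and the field names, sample values, and the `convert` function are hypothetical.

```python
# Hypothetical rule metadata: fields to append and the default value for each.
FIRST_TO_SECOND_RULES = {"add_fields": [("email", "unknown")]}


def convert(content_unit: dict, rules: dict) -> dict:
    """Apply the conversion rules and return a new content unit."""
    converted = dict(content_unit)  # copy so the original unit is left untouched
    for field, default in rules["add_fields"]:
        # Append the new field at the end unless the unit already has it.
        converted.setdefault(field, default)
    return converted


# Content unit in the first data format (four fields); values are made up.
unit_301 = {
    "name": "Jane Doe",
    "date_of_birth": "1970-01-01",
    "phone": "555-0100",
    "address": "1 Main St",
}

# Content unit in the second data format (five fields).
unit_303 = convert(unit_301, FIRST_TO_SECOND_RULES)
```

Keeping the rules as data rather than code means a new target format can be supported by storing new metadata, without changing the converter.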
  • Content unit converter 105 may be implemented in any suitable way, as the invention is not limited in this respect.
  • content unit converter 105 may be a hardware or software utility.
  • the utility may be implemented as a layered software driver on the storage system, utility computer, and/or host computer, although other implementations are possible.
  • the content unit in the second data format may be returned to the requesting entity (e.g., an application program executing on host computer 103).
  • the content unit having the second data format may, in addition, be saved on storage system 101 , so that storage system 101 stores the content unit in both the first data format and the second data format.
  • the content unit having the second data format may not be saved to the storage system, so that only the content unit having the first data format is stored on the storage system.
  • the content unit having the second data format may be stored on the storage system and the content unit having the first data format may be deleted from the storage system, so that only the content unit having the second data format is stored on the storage system.
  • the determination whether to convert a content unit to a different format and to what format to convert a content unit may be made in any suitable way, as the invention is not limited in this respect.
  • the application program when an application program (e.g., executing on a host computer or other computer) logs in to the storage system, the application program may specify a profile. Metadata that indicates the format in which a content unit is to be returned may be associated with the profile. For example, the metadata may indicate that, for a particular profile, content units having a certain data format are to be converted to another data format.
  • an application program that is logged into the storage system under a certain profile may send an access request to the storage system that includes information identifying the application program that originated the request.
  • the storage system may determine (e.g., based on the information in the access request provided by the host computer) the identity of the application program that originated the request, determine the profile under which the application is logged in, locate the metadata associated with that profile, and determine, based on the metadata associated with the profile, the format in which to return the requested content unit.
  • information provided in the access request for the content unit may be used to determine whether to convert a content unit and to what format to convert the content unit in response to a read request for the content unit.
  • the information may directly specify the format in which the content unit is to be returned.
  • Metadata may be stored that maps the information in the access request to a particular data format.
  • the metadata may be stored on any suitable computer in the computer system, as the invention is not limited in this respect. For example, this metadata may be stored on storage system 101, utility computer 201, and/or host computer 103.
  • the metadata may map any suitable information in the access request to a data format in which the content unit is to be returned in response to the access request, as the invention is not limited in this respect.
  • the access request may include format-related metadata keywords.
  • the metadata may map these format-related keywords to particular data formats of the content unit.
  • the access request may specify “word processing version 5.”
  • the metadata may map these keywords to the data format of version 5 of a particular word processing application program.
  • the content unit may be returned to the host computer in the version 5 data format in response to the request.
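The profile-based and keyword-based format selection described above can be sketched as two lookup tables consulted in turn. The table contents, the format label `wp-v5`, the profile name, the request layout, and the precedence of keywords over profiles are all assumptions made for illustration; the text leaves these choices open.

```python
# Hypothetical mapping metadata created by an administrator.
KEYWORD_TO_FORMAT = {"word processing version 5": "wp-v5"}
PROFILE_TO_FORMAT = {"legacy-readers": "wp-v5"}


def select_format(request: dict, stored_format: str, profile=None) -> str:
    """Pick the format in which to return a content unit for this request."""
    # 1. Format-related keywords carried in the access request (assumed to win).
    for keyword in request.get("format_keywords", ()):
        if keyword in KEYWORD_TO_FORMAT:
            return KEYWORD_TO_FORMAT[keyword]
    # 2. Metadata associated with the profile the application logged in under.
    if profile in PROFILE_TO_FORMAT:
        return PROFILE_TO_FORMAT[profile]
    # 3. No mapping applies: return the unit in the format in which it is stored.
    return stored_format
```

For example, `select_format({"format_keywords": ["word processing version 5"]}, "wp-v7")` selects `wp-v5`, while a request with no keywords and no mapped profile falls back to the stored format.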
  • FIG. 6 shows an illustrative process by which a content unit may be converted to a different format in response to a read request, in accordance with one embodiment.
  • the process begins at act 601, where a system administrator installs a utility that is capable of converting content units from a first data format to a second data format.
  • the administrator may also create metadata for mapping a profile and/or information in an access request to a particular data format.
  • the utility and the metadata map may be installed on any suitable computer in the computer system, such as the storage system, a utility computer, and/or a host computer.
  • the process then continues to act 603, where the host computer sends an access request for a content unit to the storage system.
  • the access request may include any suitable information, such as, for example, the identity of an application program and/or the host that originated the request, and/or format-related metadata keywords.
  • the process continues to act 607 where the utility determines whether to convert the content unit to a different format. As discussed above, this determination may be made in any suitable way, as the invention is not limited in this respect. For example, the utility may use information in the metadata map and/or information in the access request to make the determination. If the utility determines that the content unit should be converted, the process continues to act 609 where the utility converts the content unit to the different data format and the converted content unit is returned in response to the request.
  • If the utility determines that the content unit should not be converted, the process continues to act 611 where the original content unit is returned in the format in which it is stored in response to the request.
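The read path of the FIG. 6 process above (acts 607, 609, and 611) can be sketched as a single handler. The storage layout (a dictionary from identifier to a format label and bytes), the converter registry keyed by format pairs, and the fallback when no converter is registered are assumptions made for this example.

```python
from typing import Optional


def handle_read(storage: dict, converters: dict, unit_id: str,
                requested_format: Optional[str]) -> tuple:
    """Return (format, content) for a read request, converting on the fly if needed."""
    stored_format, content = storage[unit_id]

    # Act 607: decide whether the content unit should be converted.
    if requested_format is None or requested_format == stored_format:
        return stored_format, content  # Act 611: return the unit as stored.

    converter = converters.get((stored_format, requested_format))
    if converter is None:
        # Assumed fallback: with no known conversion, return the unit as stored.
        return stored_format, content

    # Act 609: convert and return the converted content unit.
    return requested_format, converter(content)


# Toy data: the "formats" are upper-case and lower-case text.
storage = {"u1": ("text-upper", b"HELLO")}
converters = {("text-upper", "text-lower"): bytes.lower}
```

Here the stored unit is never modified, which corresponds to the option in which only the first-format version remains on the storage system.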
  • a content unit may be converted from a first data format to a second data format in place. That is, for example, a content unit having a first data format may be stored on the storage system. The content unit may be converted to the second data format and the content unit having the second data format may be stored on the storage system, either in addition to or as a replacement for the content unit stored in the original format.
  • the storage system may provide the version of the content unit in the second data format and need not convert the content unit on the fly. This may be done in any suitable way, as the invention is not limited in this respect.
  • a utility that converts content units from a first data format to a second data format may iterate over the content units stored on a storage system, locate content units in the first data format, and convert the content units to a second data format.
  • the content unit in the second data format may be stored in addition to or instead of the content unit in the first data format.
  • the utility may be located on any suitable computer and may be implemented in any suitable way, as the invention is not limited in this respect.
  • the utility is located on a computer (e.g., the storage system or utility computer) other than the host computer that executes the application program that stored the content units on the storage system.
  • FIG. 7 is an example of a process by which a content unit may be converted in place.
  • the process begins at act 701 where an administrator installs a utility that converts content units from a first data format to a second data format.
  • the process then continues to act 703, where the utility accesses a content unit stored on the storage system.
  • the process next continues to act 705, where the utility determines if the accessed content unit is in the first data format.
  • If it is, the process continues to act 707, where the utility converts the content unit to the second data format and stores the second content unit on the storage system.
  • Acts 703-709 may be repeated for all or some of the content units stored on the storage system.
  • the utility may be a software utility that is installed by a system administrator on storage system 101.
  • the utility may be a software utility that is installed by an administrator on utility computer 201.
  • the software utility may request content units from storage system 101, determine if the content units are stored in the first data format, convert content units in the first data format to the second data format, and store the content units converted to the second data format on storage system 101.
  • the original version of the content unit (i.e., the version having the first data format) may be deleted.
  • the original version of the content unit may be kept, so that the storage system stores both versions of the content unit.
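  • As a non-limiting illustration, the in-place conversion of FIG. 7 may be sketched as follows. This is a minimal sketch under stated assumptions: the storage system is modeled as an in-memory dictionary, and the names `ContentUnit`, `convert_in_place`, `converter`, and `keep_original`, as well as the identifier scheme for converted units, are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass


@dataclass
class ContentUnit:
    identifier: str
    data_format: str
    payload: bytes


def convert_in_place(store, first_format, second_format, converter, keep_original=True):
    """Convert every unit stored in first_format and return the new identifiers.

    store is a dict mapping identifier -> ContentUnit (a stand-in for the
    storage system); converter is any callable that maps bytes to bytes.
    """
    created = []
    # Snapshot the values so that adding or deleting units does not disturb iteration.
    for unit in list(store.values()):
        if unit.data_format != first_format:  # format check (act 705)
            continue
        new_unit = ContentUnit(
            # Hypothetical identifier scheme, used only for this illustration.
            identifier=f"{unit.identifier}.{second_format}",
            data_format=second_format,
            payload=converter(unit.payload),  # conversion (act 707)
        )
        store[new_unit.identifier] = new_unit
        created.append(new_unit.identifier)
        if not keep_original:
            # Replace rather than supplement the original version.
            del store[unit.identifier]
    return created
```

  • With `keep_original=True` the store holds both versions of each converted content unit; with `keep_original=False` only the converted version remains.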
  • storage system 101 may store at least two types of content units: blobs and content descriptor files (CDFs).
  • Content units that store metadata are referred to herein as CDFs, and may include a reference to one or more separate content units that store the data to which the metadata pertains.
  • Content units that store original (or independent) data are referred to herein as blobs.
  • CDFs may reference and store metadata for any suitable number of blobs, as the embodiments of the invention that employ blobs and CDFs are not limited in this respect.
  • a host computer may access a CDF that references the blob, determine an address (e.g., an object identifier or content address) for the blob from the reference to the blob included in the CDF, and use the address to access the blob.
  • a blob 403 stored in a first data format may be referenced by a CDF 401.
  • the blob 403 may be converted to a blob 405 in a second data format in accordance with any of the embodiments described herein, and the blob 405 may be stored on the storage system.
  • a CDF 407 may be created that includes a reference to both blob 403 and blob 405.
  • a reference to CDF 407 may be added to CDF 401 to signify that a newer version of CDF 401 exists.
  • when a host computer attempts to access blob 403 via CDF 401, it may determine that a newer version of CDF 401 exists (i.e., CDF 407). The host computer may then request access to CDF 407 and determine, based on the information in CDF 407, that two versions of the desired blob exist (i.e., blob 403 and blob 405). The host computer may then request access to the version of the blob that is desired.
  • the host computer may perform these operations in any suitable way. In one embodiment, these operations may be performed by an application programming interface (API) on the host computer, such that selection of a version of the content unit is transparent to the application program.
  • the host computer may execute an application program 501 and an API 503.
  • Application program 501 may communicate with storage system 101 through API 503 (e.g., by calling various routines provided by API 503).
  • application program 501 need not be aware of the communication protocol used by storage system 101, but rather may use the routines provided by API 503 to communicate with storage system 101.
  • application program 501 may call a routine of API 503 that causes API 503 to send an access request to the storage system for CDF 401.
  • API 503 may send the request for CDF 401 to the storage system and receive the CDF in response.
  • API 503 may recognize, based on the reference to CDF 407 included in CDF 401, that a newer version of CDF 401 exists (i.e., CDF 407).
  • API 503 may then request access to CDF 407 and determine, based on the information in CDF 407, that two versions of the desired blob exist (i.e., blob 403 and blob 405).
  • API 503 may determine whether to return blob 403 or blob 405 to application program 501, send an access request to storage system 101 for the desired blob, and return this blob to application program 501.
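  • By way of example only, the version selection performed by the API in the blob/CDF arrangement may be sketched as follows. The sketch assumes an in-memory dictionary in place of the storage system; the classes `Blob` and `CDF`, the field `newer_cdf`, the routine `read_blob`, and the fallback to the first referenced blob are hypothetical choices made for illustration and are not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Blob:
    address: str
    data_format: str
    payload: bytes


@dataclass
class CDF:
    address: str
    blob_refs: list = field(default_factory=list)  # addresses of referenced blobs
    newer_cdf: Optional[str] = None                # address of a newer CDF, if any


def read_blob(store, cdf_address, preferred_format):
    """Resolve a CDF to the blob version in preferred_format.

    store maps addresses to Blob or CDF objects. The application only
    supplies the CDF address; following the newer-version reference and
    choosing among blob versions happens inside this routine.
    """
    cdf = store[cdf_address]
    # Follow references to newer CDFs until the most recent one is reached.
    # (Assumes the chain of references contains no cycle.)
    while cdf.newer_cdf is not None:
        cdf = store[cdf.newer_cdf]
    blobs = [store[address] for address in cdf.blob_refs]
    for blob in blobs:
        if blob.data_format == preferred_format:
            return blob
    # Assumed fallback: return the first referenced blob when no version matches.
    return blobs[0]
```

  • Because the application program calls only `read_blob`, the selection of a version remains transparent to it, in the manner described above for API 503.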
  • blob/CDF arrangement is but one example of a way that content units may be stored and the invention is not limited to this or any other arrangement.
  • the determination of which version of the content unit to provide in response to a read request for the content unit may be made in any suitable way, as the invention is not limited in this respect.
  • FIG. 8 shows an example of a process by which one of multiple versions of a content unit may be selected and returned in response to a read request for the content unit.
  • the process begins at act 801, where an administrator creates metadata that maps information provided in an access request to a particular data format.
  • the process then continues to act 803, where a host computer issues an access request for a content unit of which multiple versions are stored in different data formats.
  • the access request may include information that may be used to select (directly or indirectly as discussed above) one of the data formats to be returned.
  • the process then continues to act 805, where the storage system receives the access request.
  • the process next continues to act 807, where it is determined which version to return. As discussed below in greater detail, this determination may be made by any suitable computer in the computer system and in any suitable way.
  • the process then continues to act 809, where the selected content unit is returned in response to the request.
  • information provided in the access request for the content unit may be used to determine which version to provide.
  • Metadata may be stored that maps the information in the access request to a particular version of the content unit.
  • the metadata may be stored on any suitable computer in the computer system, as the invention is not limited in this respect.
  • this metadata may be stored on the storage system, utility computer 201, and/or host computer 103, as discussed above.
  • the metadata may map any suitable information in the access request to a data format of the content unit, as the invention is not limited in this respect.
  • the access request may identify the host computer and/or application program that sent the request.
  • the metadata may map the identity of the host computer and/or application program to a particular data format and the version of the content unit in that data format may be returned.
  • the access request may include timestamp information.
  • the metadata may map the timestamp information to a particular version of the content unit.
  • the metadata may map a time range to each version of the content unit, where the beginning of the time range for a particular content unit corresponds to the date of creation of the content unit and the end of the time range corresponds to the date of creation of the subsequent version of the content unit.
  • the version of the content unit to be returned may be selected based on the time range in the metadata map within which the timestamp falls.
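  • As one illustrative possibility, the time-range selection described above may be sketched as follows. The function name `select_version_by_timestamp`, the representation of the metadata as a sorted list of (creation time, version identifier) pairs, and the treatment of each range as including its beginning and excluding its end are assumptions made for this sketch.

```python
from bisect import bisect_right


def select_version_by_timestamp(versions, timestamp):
    """Return the version whose time range contains timestamp, or None.

    versions is a list of (created_at, version_id) pairs sorted by
    created_at. Each version's range begins at its own creation time and
    ends at the creation time of the subsequent version.
    """
    created_times = [created_at for created_at, _ in versions]
    # Index of the last version created at or before the timestamp.
    index = bisect_right(created_times, timestamp) - 1
    if index < 0:
        return None  # the timestamp precedes the earliest version
    return versions[index][1]
```

  • The most recent version has no subsequent version, so in this sketch its range is treated as open-ended.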
  • the access request may include format-related metadata keywords which indicate a particular data format of the requested content unit, as discussed above.
  • the metadata may map these keywords to particular data formats of the content unit.
  • the access request may specify “word processing version 5.”
  • the metadata may map these keywords to the version of the content unit that has the data format of version 5 of a particular word processing application program.
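  • For purposes of illustration only, mapping metadata that relates information in an access request to a data format may be sketched as follows. The request fields `format_keyword`, `application`, and `host`, the order in which they are consulted, and the function name `select_format` are hypothetical; the specification does not prescribe any particular representation or precedence.

```python
def select_format(request, mapping, default_format):
    """Pick a data format for a request using administrator-created metadata.

    request is a dict of information carried by the access request.
    mapping is a dict keyed by (field name, field value) pairs whose
    values are data formats. Fields are consulted in an assumed order of
    precedence: explicit keyword, then application, then host identity.
    """
    for field_name in ("format_keyword", "application", "host"):
        value = request.get(field_name)
        if value is not None and (field_name, value) in mapping:
            return mapping[(field_name, value)]
    # No mapping entry applies, so fall back to the supplied default format.
    return default_format
```

  • For example, a mapping entry keyed by `("format_keyword", "word processing version 5")` would direct a request carrying those keywords to the version of the content unit stored in that data format.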
  • the determination as to which version of a content unit should be provided to the application program may be made on any suitable computer in the computer system, such as, for example, storage system 101, utility computer 201, and/or host computer 103.
  • Storage system 101 may be any suitable type of storage system, as the invention is not limited in this respect.
  • the storage system may be a block I/O storage system.
  • storage system 101 may be an OAS system.
  • storage system 101 may be a CAS system in which the object identifier for a content unit is a content address that is computed, at least in part, from at least a portion of the content of the content unit.
  • the host computer may verify that the content unit has not been modified or corrupted by recomputing the content address from the content of the received content unit and determining whether the recomputed content address matches the content address used to request the content unit from the storage system.
  • the host computer may not be able to verify that the content unit has not been corrupted or modified using the content address because the content address was computed from the content of the original version of the content unit.
  • the storage system may generate a new content address for the new version of the content unit.
  • the storage system may verify that the content unit has not been modified or corrupted using the content address computed for that version of the content unit.
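  • The verification discussed above may be illustrated by the following minimal sketch. SHA-256 is used here only as an assumed hash function, since the specification does not name one, and the function names `content_address` and `verify` are hypothetical.

```python
import hashlib


def content_address(content: bytes) -> str:
    """Compute a content address by hashing the content (SHA-256 is an assumption)."""
    return hashlib.sha256(content).hexdigest()


def verify(content: bytes, address: str) -> bool:
    """Recompute the address from the content and compare it with the given address."""
    return content_address(content) == address
```

  • Under this sketch, a converted version of a content unit fails verification against the content address of the original version because its content differs, and it passes verification only against a new content address computed from the converted content.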
  • the above-described embodiments of the present invention can be implemented in any of numerous ways.
  • the embodiments may be implemented using hardware, software or a combination thereof.
  • the software code can be executed on any suitable processor or collection of processors, whether provided in a single computer or distributed among multiple computers.
  • any component or collection of components that perform the functions described above can be generically considered as one or more controllers that control the above-discussed functions.
  • the one or more controllers can be implemented in numerous ways, such as with dedicated hardware, or with general purpose hardware (e.g., one or more processors) that is programmed using microcode or software to perform the functions recited above.
  • one implementation of the embodiments of the present invention comprises at least one computer-readable medium (e.g., a computer memory, a floppy disk, a compact disk, a tape, etc.) encoded with a computer program (i.e., a plurality of instructions), which, when executed on a processor, performs the above-discussed functions of the embodiments of the present invention.
  • the computer-readable medium can be transportable such that the program stored thereon can be loaded onto any computer environment resource to implement the aspects of the present invention discussed herein.
  • the reference to a computer program which, when executed, performs the above-discussed functions is not limited to an application program running on a host computer. Rather, the term computer program is used herein in a generic sense to reference any type of computer code (e.g., software or microcode) that can be employed to program a processor to implement the above-discussed aspects of the present invention.
  • the computer implemented processes may, during the course of their execution, receive input manually (e.g., from a user).

Abstract

In one embodiment of the invention, a utility may be installed (e.g., by a system administrator) on a storage system. The utility may read content units on the storage system that are stored in one data format, and convert the content units to a second data format. In one embodiment, in response to a read request for a content unit, a data format in which to return the content unit may be selected and the content unit may be returned in that data format. In another embodiment, mapping information may be created that specifies in which data format a content unit should be returned in response to a request for the content unit. The mapping information may be stored either on the storage system that stores the content unit, the computer requesting access to the content unit, or any other computer or device in the computer system.

Description

    FIELD OF THE INVENTION
  • The present invention relates to the conversion of content.
  • DESCRIPTION OF THE RELATED ART
  • Virtually all computer application programs rely on storage that may be used to store computer code and data manipulated by the computer code. A typical computer system includes one or more host computers that execute such application programs and one or more storage systems that provide storage.
  • The host computers may access data by sending access requests to the one or more storage systems. Some storage systems require that the access requests identify units of data to be accessed using logical volume and block addresses. Such storage systems are known as “block I/O” storage systems. Although the logical volumes presented by the storage system to the host may not map in a one-to-one manner to physical storage devices, they are perceived by the host as corresponding to physical storage devices, and the specification of a logical volume and block address indicates where the referenced data is physically stored within the storage system.
  • In contrast to block I/O storage systems, some storage systems receive and process access requests that identify a data or other content unit using an object identifier, rather than an address that specifies where the data unit is physically or logically stored in the storage system. Such storage systems are referred to as object addressable storage (OAS) systems. In object addressable storage, a content unit may be identified (e.g., by host computers requesting access to the content unit) using its object identifier and the object identifier may be independent of the physical or logical location at which the content unit is stored (although it is not required to be). However, from the perspective of the host computer (or user) accessing a content unit on an OAS system, the object identifier does not control where the content unit is stored. Thus, in an OAS system, if the physical or logical location at which the unit of content is stored changes, the identifier by which host computers access the unit of content may remain the same. In contrast, in a block I/O storage system, if the physical or logical location at which the unit of content is stored changes, host computers accessing the unit of content may need to be made aware of the location change and then use the new location of the unit of content for future accesses.
  • One example of an OAS system is a content addressable storage (CAS) system. In a CAS system, the object identifiers that identify content units are content addresses. A content address is an identifier that is computed, at least in part, from at least a portion of the content of its corresponding unit of content, which can be data and/or metadata. For example, a content address for a unit of content may be computed by hashing the unit of content and using the resulting hash value as the content address. Storage systems that identify content by a content address are termed content addressable storage (CAS) systems.
  • SUMMARY OF THE INVENTION
  • One embodiment of the invention is directed to a method for use in a computer system comprising at least one storage system and at least one host computer that is coupled to the at least one storage system and executes an application program that writes a plurality of content units to the at least one storage system, wherein the at least one storage system stores the plurality of content units in a first stored format. The method comprises: (A) executing on at least one computer other than the at least one host computer at least one utility that reads at least some of the plurality of content units and stores the at least some of the plurality of content units on the at least one storage system in a second stored format that is different from the first stored format.
  • Another embodiment of the invention is directed to a method for use in a computer system comprising at least one storage system and at least one host computer that is coupled to the at least one storage system and executes an application program that writes a plurality of content units to the at least one storage system, wherein the at least one storage system stores the plurality of content units in a first stored format. The method comprises: (A) installing on at least one computer other than the at least one host computer at least one utility that can read at least some of the plurality of content units and store the at least some of the plurality of content units on the at least one storage system in a second stored format that is different from the first stored format.
  • A further embodiment is directed to a method for use in a computer system comprising at least one storage system that stores a plurality of content units and has the ability to respond to a request to access at least one of the plurality of content units by returning the at least one of the plurality of content units in any of at least two formats comprising at least a first format and a second format that is different than the first format, wherein the at least one storage system comprises an interface that enables each of the plurality of content units to be stored on the at least one storage system, associated with an identifier and later retrieved by providing the at least one storage system with the identifier. The method comprises acts of: (A) in response to at least one request to read the at least one of the plurality of content units being received at the at least one storage system, selecting one of the formats of the at least one of the plurality of content units based, at least in part, on information associated with the at least one request; and (B) providing, from the at least one storage system, the at least one of the plurality of content units in the format selected in the act (A).
  • Another embodiment is directed to a storage system comprising: at least one storage device to store a plurality of content units written to the at least one storage system; and at least one processor programmed to: provide an interface to the at least one storage system that enables each of the plurality of content units to be stored on the at least one storage system, associated with an identifier and later retrieved by providing the at least one storage system with the identifier, wherein each of the plurality of content units is stored in a stored format; and in response to at least one request to read the at least one of the plurality of content units being received at the at least one storage system, select one of the formats of the at least one of the plurality of content units based, at least in part, on information associated with the at least one request and provide the at least one of the plurality of content units in the format selected.
  • A further embodiment is directed to a method for use in a computer system comprising at least one storage system that stores a plurality of content units and comprises an interface that enables each of the plurality of content units to be stored on the at least one storage system, associated with an identifier and later retrieved by providing the at least one storage system with the identifier. The method comprises acts of: installing on at least one storage system at least one utility that provides the at least one storage system with the ability to perform acts of: (A) in response to at least one request to read at least one of the plurality of content units being received at the at least one storage system, selecting any of at least two formats, comprising at least a first format and a second format that is different than the first format, in which the at least one of the plurality of content units can be provided, the act of selecting being based, at least in part, on information associated with the at least one request; and (B) providing, from the at least one storage system, the at least one of the plurality of content units in the format selected in the act (A).
  • Another embodiment is directed to a method for use in a computer system comprising at least one storage system that stores a plurality of content units and has the ability to respond to a request to access at least one of the plurality of content units by returning the at least one of the plurality of content units in any of a plurality of formats comprising at least a first format and a second format that is different than the first format, wherein the at least one storage system comprises an interface that enables each of the plurality of content units to be stored on the at least one storage system, associated with an identifier and later retrieved by providing the at least one storage system with the identifier. The method comprises acts of: (A) providing from the at least one host computer, in association with at least one request to read at least one of the plurality of content units, information that enables the at least one storage system to select one of the plurality of formats in which to provide the at least one of the plurality of content units to the at least one host computer in response to the at least one request.
  • A further embodiment is directed to a method for use in a computer system comprising at least one storage system that stores a plurality of content units and has the ability to respond to a request to access at least one of the plurality of content units by returning the at least one of the plurality of content units in any of a plurality of formats comprising at least a first format and a second format that is different than the first format, wherein the at least one storage system comprises an interface that enables each of the plurality of content units to be stored on the at least one storage system, associated with an identifier and later retrieved by providing the at least one storage system with the identifier. The method comprises an act of: (A) creating mapping information that specifies, based on at least some information associated with at least one request to access the at least one content unit, which of the at least two of the formats of the at least one content unit should be provided in response to the request; and (B) storing the mapping information on the computer system.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a block diagram of a computer system in which content units may be converted to different data formats by a storage system, in accordance with one embodiment of the invention;
  • FIG. 2 is a block diagram of a computer system in which content units may be converted to different data formats by a utility computer, in accordance with one embodiment of the invention;
  • FIG. 3 is a diagram of a content unit being converted to a different data format, in accordance with one embodiment of the invention;
  • FIG. 4 is a block diagram of multiple versions of a content unit being stored on a storage system that stores content in a CDF/blob arrangement, in accordance with one embodiment of the invention;
  • FIG. 5 is a block diagram of a computer system on which an application program executing on a host computer communicates with a storage system via an application programming interface (API);
  • FIG. 6 is a flow chart of a process by which a content unit may be converted to a different data format in response to a read request for the content unit, in accordance with one embodiment of the invention;
  • FIG. 7 is a flow chart of a process by which a content unit may be converted to a different data format in place, in accordance with one embodiment of the invention; and
  • FIG. 8 is a flow chart of a process to determine which of multiple versions of a content unit may be returned in response to a read request for the content unit, in accordance with one embodiment of the invention.
  • DETAILED DESCRIPTION
  • Application programs frequently generate content in a format that is understandable to the application program. For example, word processing application programs generate content units (e.g., documents) that have a specific data format, such that the word processing application program may read the content unit and properly display the document to a user. Thus, for example, Microsoft Word™ may store content units in one data format, while Corel WordPerfect may store content units in another data format. Sometimes, a user may wish to read, using an application program, content units stored on a storage system that were created by a different application program or an earlier version of the application program that stored the content unit in a different format. The user's application program may not be capable of reading these content units when they are stored in a format that is foreign to the application program.
  • In addition, it may be desirable for any of numerous reasons that a content unit stored on a storage system be returned to a host computer requesting the content unit in a data format different from that in which it was stored (e.g., a format desired by the requesting host computer). Examples of reasons for performing such a conversion before the data is returned to the host may include reducing the amount of content that is transferred from the storage system to the host computer and/or conserving processing resources on the host computer (i.e., so that the host computer need not perform the data conversion). For example, a content unit may include an image stored in TIFF format, which is a lossless format (i.e., there is no image quality loss due to compression). A host computer accessing the content unit may desire that the content unit be returned in JPEG format, a lossy data format, so that the amount of content transferred from the storage system is reduced and/or so that the host computer need not convert the TIFF image to a JPEG image using its own processing resources. As another example, the “host computer” requesting the content unit may be a cellular phone with limited resources to process and display the content of the content unit. Thus, the storage system may return the content unit in a “stripped down” format so that the content unit is more easily processed by the cellular phone.
  • Some embodiments of the invention address the issues discussed above. However, it should be appreciated that not every embodiment of the invention addresses all of the above-discussed issues and some embodiments are not specifically directed to addressing any of them.
  • In one embodiment, a storage system may be populated with content units that are stored in a particular data format (e.g., by a legacy version of an application program). A user may desire to read these content units using an application program (e.g., a more recent version of the legacy application program) that is incapable of processing the content units in the data format in which they are stored. Thus, the content units may be converted to a data format in which the user's application program is capable of reading them. Such conversion may take place on any suitable computer in the computer system, such as, for example, the storage system on which the content units are stored, on a utility computer in the computer system, or on the host computer on which the application program executes.
  • In one embodiment, a utility that executes on the storage system or a utility computer may convert content units stored in a particular data format to a different data format. The utility may be installed by a system administrator and may be configured to locate content units in a first data format, convert the content units to a second data format, and store the content units in the second data format.
  • In another embodiment, a utility may convert content units stored on the storage system in a first data format to a second data format “on the fly,” i.e., in response to a read request for a content unit stored in the first data format. Thus, for example, an application program executing on a host computer may send an access request for a content unit stored on the storage system in a data format that is foreign to the application program. In response to the host computer issuing the access request, the utility may convert the content unit to a data format in which the application program is capable of reading the content unit and the converted content unit may be returned in response to the request.
  • The decision of which version of a content unit to return in response to a read request for the content unit (i.e., in embodiments in which there are multiple versions of a content unit stored on the storage system in different data formats) and/or the decision of whether to convert a content unit to a different data format in response to a read request for the content unit (i.e., in embodiments in which content units are converted on the fly) may be made in any suitable way. In one embodiment, information may be provided in an access request for a content unit that enables a utility to select a format in which to provide the content unit in response to the request. The content unit may be returned from the storage system in the selected format.
  • In one embodiment, mapping information may be created that maps the information provided in the access request to a particular data format. This mapping information may be used to select a data format in which to return a stored content unit when an access request for the content unit is received. The mapping information may be stored on any suitable computer in the computer system, such as, for example, the storage system on which the requested content unit is stored, a utility computer, and/or the host computer on which the application program that originated the request executes.
  • The conversion of a content unit from one format to another format may be performed in any suitable way, and at any suitable time, as the invention is not limited in these respects.
  • In one embodiment, a content unit may be converted from the format in which it is stored to a different format in response to a request to access the stored content unit. For example, in FIG. 1, computer system 100 includes a storage system 101 and a host computer 103. Storage system 101 may include storage devices 105 and a content unit converter 107. Storage system 101 may be any suitable type of storage system, such as for example, a block I/O storage system, an object addressable storage (OAS) system, and/or any other type of storage system. Host computer 103 may be any suitable type of computer, such as, for example, a server, a personal computer, and/or any other type of computer.
  • Host computer 103 may send a request to access a content unit stored on storage system 101. In response, storage system 101 may access the content unit from one or more of storage devices 105, use content unit converter 107 to convert the content unit to a data format different from the format in which it is stored, and return the converted content unit to host computer 103.
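  • As a non-limiting illustration, conversion by the storage system in response to an access request may be sketched as follows. The class `StorageSystem`, its `write` and `read` methods, the `requested_format` parameter, and the table of converter callables are hypothetical stand-ins for storage system 101 and content unit converter 107; returning the content unit as stored when no converter applies is likewise an assumption of this sketch.

```python
class StorageSystem:
    """In-memory stand-in for a storage system with a content unit converter."""

    def __init__(self, converters):
        self._units = {}               # identifier -> (data_format, payload)
        self._converters = converters  # (stored_format, requested_format) -> callable

    def write(self, identifier, data_format, payload):
        self._units[identifier] = (data_format, payload)

    def read(self, identifier, requested_format=None):
        """Return (data_format, payload), converting on the fly when possible."""
        stored_format, payload = self._units[identifier]
        if requested_format is None or requested_format == stored_format:
            return stored_format, payload
        convert = self._converters.get((stored_format, requested_format))
        if convert is None:
            # Assumed behavior: with no suitable converter, return the unit as stored.
            return stored_format, payload
        # The stored copy is left unchanged; only the returned copy is converted.
        return requested_format, convert(payload)
```

  • In this sketch the stored content unit is never modified by a read, so repeated requests for different formats each start from the format in which the content unit is stored.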
  • In the example above, the storage system itself converts a requested content unit to a data format different from its stored format prior to returning the content unit in response to the request. However, the invention is not limited in this respect, as the conversion of the content unit may be performed by a computer outside the storage system. This may be done in any suitable way, as the invention is not limited in this respect. For example, as shown in FIG. 2, computer system 200 includes a storage system 203, a utility computer 201, and a host computer 103. Utility computer 201 has a content unit converter 105 that converts content units from one data format to a different data format.
  • In the example of FIG. 2, host computer 103 may send an access request for a content unit. In one embodiment, the request may be sent to utility computer 201, and utility computer 201 may forward the access request to storage system 203. In another embodiment, the request may be directed to the storage system, but may be intercepted by the utility computer and then forwarded from the utility computer to the storage system so that it appears to the storage system that the request originated from the utility computer.
  • The storage system may return the content unit, in the format in which it is stored (or any other format), to utility computer 201. Utility computer 201 may receive the requested content unit from the storage system, convert the content unit to a different data format (e.g., using content unit converter 105), and may return the converted content unit to host computer 103.
  • In the example of FIG. 2, host computer 103 provides an access request directly to utility computer 201, which forwards the access request to the storage system. However, the invention is not limited in this respect, as host computer 103 may bypass utility computer 201 and provide the access request to the storage system. The storage system may provide the requested content unit to utility computer 201 for conversion and utility computer 201 may either return the converted content unit to host computer 103 or may return the converted content unit to the storage system, which may return the converted content unit to host computer 103.
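  • The FIG. 2 flow in which the utility computer forwards a request to the storage system and converts the returned content unit may be sketched as follows. This is a minimal illustration only: the class names, the `read` method, and the upper-casing converter are hypothetical stand-ins and are not part of the described system.

```python
from typing import Callable, Dict


class StorageSystem:
    """Hypothetical stand-in for storage system 203."""

    def __init__(self, units: Dict[str, str]) -> None:
        self._units = dict(units)

    def read(self, unit_id: str) -> str:
        # Return the content unit in the format in which it is stored.
        return self._units[unit_id]


class UtilityComputer:
    """Hypothetical stand-in for utility computer 201."""

    def __init__(self, storage: StorageSystem,
                 converter: Callable[[str], str]) -> None:
        self._storage = storage
        self._converter = converter

    def read(self, unit_id: str) -> str:
        stored = self._storage.read(unit_id)  # forward the access request
        return self._converter(stored)        # convert before returning


storage = StorageSystem({"unit-1": "stored form"})
utility = UtilityComputer(storage, converter=str.upper)
converted = utility.read("unit-1")
```

  • In this sketch the host computer would call `utility.read` rather than `storage.read`, so the stored content unit is left unchanged and only the returned copy is converted.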
  • Utility computer 201 may be any suitable type of computer, as the invention is not limited in this respect. For example, utility computer 201 may be a server, a personal computer, an appliance, a network switch, or any other suitable type of computer. In one embodiment, utility computer 201 may be a computer that implements a CAS layer. Computers that implement a CAS layer are described in greater detail in application Ser. Nos. 10/836,415, 10/837,311, and 10/836,502, listed below in Table 1, each of which is incorporated herein by reference in its entirety.
  • When it is determined that a requested content unit is to be converted to a different data format, the conversion may be performed in any suitable way, as the invention is not limited in this respect. For example, in one embodiment, metadata 109 that specifies rules for converting a content unit from one data format to a different data format may be used. These metadata rules may be stored on any suitable computer in the computer system, such as for example a storage system, a utility computer, and/or a host computer, as indicated by the dashed lines in FIGS. 1 and 2.
  • An example of a format conversion is shown in FIG. 3. Content unit 301 may store patient contact information for a medical practice in a first data format. The first data format includes four fields: name; date of birth; phone number; and address. Content unit 301 may be input to content unit converter 105, which applies the rules specified in metadata 109 to convert content unit 301 to a different format. Content unit converter 105 may output content unit 303, which is an example of content unit 301 converted to a second data format different from the first data format. The second data format includes five fields: name; date of birth; phone number; address; and e-mail address. Metadata may be stored that provides rules for converting a content unit in the first data format (e.g., content unit 301) to a content unit in the second data format (e.g., content unit 303). For example, the metadata may indicate that to convert a content unit from the first data format to the second data format, a fifth field for e-mail address may be added to the end of the content of the content unit and the field may be given a value of unknown.
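  • The FIG. 3 conversion may be sketched as follows. This is an illustration under stated assumptions: content units are modeled as Python dictionaries, and the rule layout (`fields` plus `defaults`), the field names, and the sample patient values are hypothetical and are not the format of metadata 109.

```python
# Hypothetical rule format: the ordered fields of the second data format,
# plus default values for fields that the first data format lacks.
RULES_FIRST_TO_SECOND = {
    "fields": ["name", "date_of_birth", "phone_number", "address",
               "email_address"],
    "defaults": {"email_address": "unknown"},
}


def convert(content_unit: dict, rules: dict) -> dict:
    """Return a new content unit laid out as described by the rules."""
    converted = {}
    for field in rules["fields"]:
        if field in content_unit:
            converted[field] = content_unit[field]
        else:
            # Field is new in the second format: use the rule's default.
            converted[field] = rules["defaults"][field]
    return converted


# Sample data invented for illustration.
unit_301 = {
    "name": "Jane Doe",
    "date_of_birth": "1970-01-01",
    "phone_number": "555-0100",
    "address": "1 Main St",
}
unit_303 = convert(unit_301, RULES_FIRST_TO_SECOND)
```

  • Here `unit_303` carries the four original fields followed by an `email_address` field with the value `unknown`, and `unit_301` is not modified.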
  • The metadata that provides rules for converting a content unit from a first data format to a second data format may be stored in any suitable format, as the invention is not limited in this respect. In one embodiment, the metadata may be stored as an extensible stylesheet language (XSL) stylesheet, although other formats are possible.
  • Content unit converter 105 may be implemented in any suitable way, as the invention is not limited in this respect. For example, content unit converter 105 may be a hardware or software utility. In one embodiment, the utility may be implemented as a layered software driver on the storage system, utility computer, and/or host computer, although other implementations are possible.
  • When a content unit is converted from a first data format to a second data format, the content unit in the second data format may be returned to the requesting entity (e.g., an application program executing on host computer 103). In one embodiment, the content unit having the second data format may, in addition, be saved on storage system 101, so that storage system 101 stores the content unit in both the first data format and the second data format. In another embodiment, the content unit having the second data format may not be saved to the storage system, so that only the content unit having the first data format is stored on the storage system. In yet another embodiment, the content unit having the second data format may be stored on the storage system and the content unit having the first data format may be deleted from the storage system, so that only the content unit having the second data format is stored on the storage system.
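  • The three alternatives described above for handling the converted content unit may be sketched as follows. The `SavePolicy` names and the dictionary keyed by (content unit identifier, format) are assumptions made for illustration only.

```python
from enum import Enum
from typing import Dict, Tuple


class SavePolicy(Enum):
    KEEP_BOTH = "keep_both"      # store the second format alongside the first
    RETURN_ONLY = "return_only"  # do not store the converted content unit
    REPLACE = "replace"          # store the second format, delete the first


def apply_policy(store: Dict[Tuple[str, str], bytes], unit_id: str,
                 first_format: str, second_format: str,
                 converted: bytes, policy: SavePolicy) -> bytes:
    """Update the store according to the policy and return the converted unit."""
    if policy in (SavePolicy.KEEP_BOTH, SavePolicy.REPLACE):
        store[(unit_id, second_format)] = converted
    if policy is SavePolicy.REPLACE:
        del store[(unit_id, first_format)]
    return converted
```

  • In every case the converted content unit is returned to the requesting entity; the policy only determines which versions remain on the storage system afterward.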
  • When a request to access a content unit is received, the determination whether to convert a content unit to a different format and to what format to convert a content unit may be made in any suitable way, as the invention is not limited in this respect.
  • For example, in one embodiment, when an application program (e.g., executing on a host computer or other computer) logs in to the storage system, the application program may specify a profile. Metadata that indicates the format in which a content unit is to be returned may be associated with the profile. For example, the metadata may indicate that, for a particular profile, content units having a certain data format are to be converted to another data format.
  • Thus, an application program that is logged into the storage system under a certain profile may send an access request to the storage system that includes information identifying the application program that originated the request. The storage system may determine (e.g., based on the information in the access request provided by the host computer), the identity of the application program that originated the request, determine the profile under which the application is logged in, locate the metadata associated with that profile, and determine, based on the metadata associated with the profile, the format in which to return the requested content unit.
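  • The profile-based lookup described above may be sketched as follows. The table names, the application and profile identifiers, and the format labels are all hypothetical; the sketch assumes that an unknown profile or an unlisted stored format means that no conversion is performed.

```python
# Hypothetical tables: the profile under which each application program is
# logged in, and per-profile metadata mapping a stored format to the format
# in which content units are to be returned.
SESSIONS = {"billing-app": "legacy-profile"}
PROFILE_METADATA = {"legacy-profile": {"format-2": "format-1"}}


def format_to_return(app_id: str, stored_format: str) -> str:
    """Pick the return format for a request from the given application."""
    profile = SESSIONS.get(app_id)
    conversions = PROFILE_METADATA.get(profile, {})
    # Fall back to the stored format when the profile names no conversion.
    return conversions.get(stored_format, stored_format)
```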
  • In another embodiment, information provided in the access request for the content unit may be used to determine whether to convert a content unit and to what format to convert the content unit in response to a read request for the content unit. The information may directly specify the format in which the content unit is to be returned. Alternatively, metadata may be stored that maps the information in the access request to a particular data format. The metadata may be stored on any suitable computer in the computer system, as the invention is not limited in this respect. For example, this metadata may be stored on storage system 101, utility computer 201, and/or host computer 103.
  • The metadata may map any suitable information in the access request to a data format in which the content unit is to be returned in response to the access request, as the invention is not limited in this respect. For example, in one embodiment, the access request may include format-related metadata keywords. The metadata may map these format-related keywords to particular data formats of the content unit. For example, the access request may specify “word processing version 5.” The metadata may map these keywords to the data format of version 5 of a particular word processing application program. Thus, in response to the request, the content unit may be returned to the host computer in the version 5 data format.
  • FIG. 6 shows an illustrative process by which a content unit may be converted to a different format in response to a read request, in accordance with one embodiment. The process begins at act 601, where a system administrator installs a utility that is capable of converting content units from a first data format to a second data format. The administrator may also create metadata for mapping a profile and/or information in an access request to a particular data format. The utility and the metadata map may be installed on any suitable computer in the computer system, such as the storage system, a utility computer, and/or a host computer.
  • The process then continues to act 603, where the host computer sends an access request for a content unit to the storage system. The access request may include any suitable information, such as, for example, the identity of an application program and/or the host that originated the request, and/or format-related metadata keywords.
  • The process next continues to act 605, where the storage system receives the request. After the storage system receives the request, the process continues to act 607 where the utility determines whether to convert the content unit to a different format. As discussed above, this determination may be made in any suitable way, as the invention is not limited in this respect. For example, the utility may use information in the metadata map and/or information in the access request to make the determination. If the utility determines that the content unit should be converted, the process continues to act 609 where the utility converts the content unit to the different data format and the converted content unit is returned in response to the request.
  • When, at act 607, the utility determines that the content unit is not to be converted to a different format, the process continues to act 611 where the original content unit is returned in the format in which it is stored in response to the request.
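  • Acts 605 through 611 of the FIG. 6 process may be sketched as follows. The keyword map, the format labels `wp-v5` and `wp-v6`, and the byte-replacing converter are placeholders invented for illustration; act 601 (installation of the utility) is not modeled.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical metadata map from format-related keywords to a data format.
KEYWORD_MAP: Dict[str, str] = {"word processing version 5": "wp-v5"}

# Hypothetical converters keyed by (stored format, target format).
CONVERTERS: Dict[Tuple[str, str], Callable[[bytes], bytes]] = {
    ("wp-v6", "wp-v5"): lambda data: data.replace(b"v6", b"v5"),
}


def handle_read(storage: Dict[str, Tuple[str, bytes]], unit_id: str,
                keywords: Optional[str] = None) -> Tuple[str, bytes]:
    """Return (format, data) for the requested content unit."""
    stored_format, data = storage[unit_id]                     # act 605
    target = KEYWORD_MAP.get(keywords) if keywords else None   # act 607
    if target is None or target == stored_format:
        return stored_format, data                             # act 611
    converter = CONVERTERS[(stored_format, target)]
    return target, converter(data)                             # act 609


storage = {"doc-1": ("wp-v6", b"v6:hello")}
```

  • A request that carries the keywords `word processing version 5` takes the act 609 branch, while a request with no keywords, or with keywords absent from the map, takes the act 611 branch and receives the content unit as stored.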
  • In another embodiment of the invention, rather than convert a content unit stored on the storage system to a different format when a read request for the content unit is received, a content unit may be converted from a first data format to a second data format in place. That is, for example, a content unit having a first data format may be stored on the storage system. The content unit may be converted to the second data format and the content unit having the second data format may be stored on the storage system, either in addition to or as a replacement for the content unit stored in the original format. Thus, when a request for the content unit in the second data format is received, the storage system may provide the version of the content unit in the second data format and need not convert the content unit on the fly. This may be done in any suitable way, as the invention is not limited in this respect.
  • In one embodiment, a utility that converts content units from a first data format to a second data format may iterate over the content units stored on a storage system, locate content units in the first data format, and convert the content units to a second data format. The content unit in the second data format may be stored in addition to or instead of the content unit in the first data format. The utility may be located on any suitable computer and may be implemented in any suitable way, as the invention is not limited in this respect. In one embodiment, the utility is located on a computer (e.g., the storage system or utility computer) other than the host computer that executes the application program that stored the content units on the storage system.
  • FIG. 7 is an example of a process by which a content unit may be converted in place. The process begins at act 701 where an administrator installs a utility that converts content units from a first data format to a second data format. The process then continues to act 703, where the utility accesses a content unit stored on the storage system. The process next continues to act 705, where the utility determines if the accessed content unit is in the first data format. When it is determined that the accessed content unit is in the first data format, the process continues to act 707, where the utility converts the content unit to the second data format and stores the second content unit on the storage system. When it is determined at act 705 that the accessed content unit is not in the first data format, the process continues to act 709 where the utility does not convert the content unit. Acts 703-709 may be repeated for all or some of the content units stored on the storage system.
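  • Acts 703 through 709 of the FIG. 7 process may be sketched as follows. The storage layout (a dictionary from content unit identifier to a dictionary of format to data), the function name, and the `keep_original` flag are assumptions made for illustration.

```python
from typing import Callable, Dict, List


def convert_in_place(storage: Dict[str, Dict[str, str]],
                     first_format: str, second_format: str,
                     convert: Callable[[str], str],
                     keep_original: bool = True) -> List[str]:
    """Convert every content unit held in first_format; return their ids."""
    converted_ids = []
    for unit_id, versions in storage.items():      # act 703
        if first_format not in versions:           # act 705
            continue                               # act 709: leave as is
        # Act 707: convert and store the second-format version.
        versions[second_format] = convert(versions[first_format])
        if not keep_original:
            del versions[first_format]
        converted_ids.append(unit_id)
    return converted_ids
```

  • Passing `keep_original=False` corresponds to replacing the first-format version, while the default keeps both versions on the storage system.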
  • In one embodiment, the utility may be a software utility that is installed by a system administrator on storage system 101. In another embodiment, the utility may be a software utility that is installed by an administrator on utility computer 201. The software utility may request content units from storage system 101, determine if the content units are stored in the first data format, convert content units in the first data format to the second data format, and store the content units converted to the second data format on storage system 101.
  • In one embodiment, when the version of a content unit having the second data format is stored on the storage system, the original version of the content unit (i.e., the version having the first data format) may be deleted.
  • In another embodiment, the original version of the content unit may be kept, so that the storage system stores both versions of the content unit.
  • In one embodiment, storage system 101 may store at least two types of content units: blobs and content descriptor files (CDFs). Content units that store metadata are referred to herein as CDFs, and may include a reference to one or more separate content units that store the data to which the metadata pertains. Content units that store original (or independent) data are referred to herein as blobs. CDFs may reference and store metadata for any suitable number of blobs, as the embodiments of the invention that employ blobs and CDFs are not limited in this respect. To access a blob, a host computer may access a CDF that references the blob, determine an address (e.g., an object identifier or content address) for the blob from the reference to the blob included in the CDF, and use the address to access the blob. As shown in FIG. 4, a blob 403 stored in a first data format may be referenced by a CDF 401. The blob 403 may be converted to a blob 405 in a second data format in accordance with any of the embodiments described herein, and the blob 405 may be stored on the storage system. In addition, a CDF 407 may be created that includes a reference to both blob 403 and blob 405. Further, a reference to CDF 407 may be added to CDF 401 to signify that there is a newer version of CDF 401 that exists.
  • Thus, when a host computer attempts to access blob 403 via CDF 401, it may determine that a newer version of CDF 401 exists (i.e., CDF 407). The host computer may then request access to CDF 407 and determine, based on the information in CDF 407, that two versions of the desired blob exist (i.e., blob 403 and blob 405). The host computer may then request access to the version of the blob that is desired.
  • The host computer may perform these operations in any suitable way. In one embodiment, these operations may be performed by an application programming interface (API) on the host computer, such that selection of a version of the content unit is transparent to the application program. For example, as shown in FIG. 5, the host computer may execute an application program 501 and an API 503. Application program 501 may communicate with storage system 101 through API 503 (e.g., by calling various routines provided by API 503). Thus, application program 501 need not be aware of the communication protocol used by storage system 101, but rather may use the routines provided by API 503 to communicate with storage system 101.
  • Thus, for example, application program 501 may call a routine of API 503 that causes API 503 to send an access request to the storage system for CDF 401. In one embodiment, API 503 may send the request for CDF 401 to the storage system and receive the CDF in response. API 503 may recognize, based on the reference to CDF 407 included in CDF 401, that a newer version of CDF 401 exists (i.e., CDF 407). API 503 may then request access to CDF 407 and determine, based on the information in CDF 407, that two versions of the desired blob exist (i.e., blob 403 and blob 405). API 503 may determine whether to return blob 403 or blob 405 to application program 501, send an access request to storage system 101 for the desired blob, and return this blob to application program 501.
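  • The CDF traversal performed by API 503 may be sketched as follows. The in-memory store, the `newer_cdf` and `blobs` keys, the format labels, and the fallback to the first listed blob are hypothetical choices for illustration and do not reflect the actual layout of a CDF.

```python
# Hypothetical in-memory stand-in for the storage system of FIG. 4.
STORE = {
    "cdf-401": {"blobs": {"format-1": "blob-403"}, "newer_cdf": "cdf-407"},
    "cdf-407": {"blobs": {"format-1": "blob-403", "format-2": "blob-405"},
                "newer_cdf": None},
    "blob-403": b"data in format 1",
    "blob-405": b"data in format 2",
}


def read_blob(store: dict, cdf_address: str, preferred_format: str) -> bytes:
    """Follow newer-CDF references, then fetch the preferred blob version."""
    cdf = store[cdf_address]
    while cdf.get("newer_cdf"):
        # A reference to a newer CDF signals that more versions may exist.
        cdf = store[cdf["newer_cdf"]]
    blobs = cdf["blobs"]
    blob_address = blobs.get(preferred_format)
    if blob_address is None:
        # Assumption: fall back to the first blob the CDF lists.
        blob_address = next(iter(blobs.values()))
    return store[blob_address]
```

  • An application program would call `read_blob` with the address of CDF 401 and need not know that CDF 407 or blob 405 exists.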
  • It should be appreciated that the blob/CDF arrangement is but one example of a way that content units may be stored and the invention is not limited to this or any other arrangement.
  • In embodiments in which there are multiple versions of a content unit stored on the storage system, the determination of which version of the content unit to provide in response to a read request for the content unit may be made in any suitable way, as the invention is not limited in this respect.
  • FIG. 8 shows an example of a process by which one of multiple versions of a content unit may be selected and returned in response to a read request for the content unit. The process begins at act 801, where an administrator creates metadata that maps information provided in an access request to a particular data format. The process then continues to act 803, where a host computer issues an access request for a content unit of which there are stored multiple versions in different data formats. The access request may include information that may be used to select (directly or indirectly as discussed above) one of the data formats to be returned. The process then continues to act 805, where the storage system receives the access request. The process next continues to act 807, where it is determined which version to return. As discussed below in greater detail, this determination may be made by any suitable computer in the computer system and in any suitable way. The process then continues to act 809, where the selected content unit is returned in response to the request.
  • In one embodiment, information provided in the access request for the content unit may be used to determine which version to provide. Metadata may be stored that maps the information in the access request to a particular version of the content unit. The metadata may be stored on any suitable computer in the computer system, as the invention is not limited in this respect. For example, this metadata may be stored on the storage system, utility computer 201, and/or host computer 103, as discussed above.
  • The metadata may map any suitable information in the access request to a data format of the content unit, as the invention is not limited in this respect. For example, in one embodiment, the access request may identify the host computer and/or application program that sent the request. The metadata may map the identity of the host computer and/or application program to a particular data format and the version of the content unit in that data format may be returned.
  • In another embodiment, the access request may include timestamp information. The metadata may map the timestamp information to a particular version of the content unit. For example, the metadata may map a time range to each version of the content unit, where the beginning of the time range for a particular content unit corresponds to the date of creation of the content unit and the end of the time range corresponds to the date of creation of the subsequent version of the content unit. Thus, the version of the content unit to be returned may be selected based on the time range in the metadata map into which the timestamp falls.
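  • The timestamp-based selection may be sketched as follows. The version list and its dates are invented for illustration, and the sketch assumes that each time range includes its beginning date and excludes its end date, and that the newest version has an open-ended range.

```python
from bisect import bisect_right
from datetime import date
from typing import List, Optional, Tuple

# Hypothetical metadata: (date of creation, version id), sorted by date.
VERSIONS: List[Tuple[date, str]] = [
    (date(2004, 1, 1), "version-1"),
    (date(2005, 6, 1), "version-2"),
    (date(2006, 3, 1), "version-3"),
]


def version_for(timestamp: date,
                versions: List[Tuple[date, str]] = VERSIONS) -> Optional[str]:
    """Return the version whose time range contains the timestamp."""
    starts = [created for created, _ in versions]
    # Index of the last version created on or before the timestamp.
    index = bisect_right(starts, timestamp) - 1
    if index < 0:
        return None  # the timestamp predates every version
    return versions[index][1]
```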
  • In another embodiment, the access request may include format-related metadata keywords which indicate a particular data format of the requested content unit, as discussed above. The metadata may map these keywords to particular data formats of the content unit. For example, the access request may specify “word processing version 5.” The metadata may map these keywords to the version of the content unit that has the data format of version 5 of a particular word processing application program.
  • The determination as to which version of a content unit should be provided to the application program may be made on any suitable computer in the computer system, such as for example, storage system 101, utility computer 201, and/or host computer 103.
  • Storage system 101 may be any suitable type of storage system, as the invention is not limited in this respect. For example, in one embodiment, the storage system may be a block I/O storage system. In another embodiment, storage system 101 may be an OAS system.
  • In some embodiments in which storage system 101 is an OAS system, storage system 101 may be a CAS system in which the object identifier for a content unit is a content address that is computed, at least in part, from at least a portion of the content of the content unit. In embodiments in which storage system 101 is a CAS system, when a host computer receives, from the CAS system, a content unit requested using its content address, the host computer may verify that the content unit has not been modified or corrupted by recomputing the content address from the content of the received content unit and determining whether the recomputed content address matches the content address used to request the content unit from the storage system.
  • Applicants have appreciated that when the host computer receives a content unit that has been converted to a different format, the host computer may not be able to verify that the content unit has not been corrupted or modified using the content address because the content address was computed from the content of the original version of the content unit. However, when the storage system creates the new version of the content unit, the storage system may generate a new content address for the new version of the content unit. Thus, when the storage system provides the new version of the content unit to a host computer in response to an access request, the storage system may verify that the content unit has not been modified or corrupted using the content address computed for that version of the content unit.
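  • The content address check, and the reason a converted content unit needs its own content address, may be sketched as follows. The use of SHA-256 over the entire content is an assumption made for illustration; the description above states only that the content address is computed, at least in part, from at least a portion of the content.

```python
import hashlib


def content_address(content: bytes) -> str:
    # Assumption: the address is the SHA-256 digest of the whole content.
    return hashlib.sha256(content).hexdigest()


def verify(address: str, received: bytes) -> bool:
    """Recompute the address from the received content and compare."""
    return content_address(received) == address


# Sample content invented for illustration.
original = b"name=Jane Doe"
converted = original + b";email=unknown"

original_address = content_address(original)
# A new address is generated when the converted version is created.
converted_address = content_address(converted)
```

  • In this sketch, checking the converted content against `original_address` fails because the content differs, while checking it against `converted_address` succeeds.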
  • The above-described embodiments of the present invention can be implemented on any suitable computer or storage system. Examples of suitable computers and/or storage systems are described in the patent applications listed below in Table 1 (collectively “the OAS applications”), each of which is incorporated herein by reference. It should be appreciated that the computers and storage systems described in these applications are only examples of computers and storage systems on which the embodiments of the present invention may be implemented, as the invention is not limited to implementation on any of these object addressable storage systems, or to object addressable storage systems at all.
  • TABLE 1
    Title | Ser. No. | Filing Date
    Content Addressable Information, Encapsulation, Representation, And Transfer | 09/236,366 | Jan. 21, 1999
    Access To Content Addressable Data Over A Network | 09/235,146 | Jan. 21, 1999
    System And Method For Secure Storage Transfer And Retrieval Of Content Addressable Information | 09/391,360 | Sep. 7, 1999
    Method And Apparatus For Data Retention In A Storage System | 10/731,790 | Dec. 9, 2003
    Methods And Apparatus For Facilitating Access To Content In A Data Storage System | 10/731,613 | Dec. 9, 2003
    Methods And Apparatus For Caching A Location Index In A Data Storage System | 10/731,796 | Dec. 9, 2003
    Methods And Apparatus For Parsing A Content Address To Facilitate Selection Of A Physical Storage Location In A Data Storage System | 10/731,603 | Dec. 9, 2003
    Methods And Apparatus For Generating A Content Address To Indicate Data Units Written To A Storage System Proximate In Time | 10/731,845 | Dec. 9, 2003
    Methods And Apparatus For Modifying A Retention Period For Data In A Storage System | 10/762,044 | Jan. 21, 2004
    Methods And Apparatus For Extending A Retention Period For Data In A Storage System | 10/761,826 | Jan. 21, 2004
    Methods And Apparatus For Indirectly Identifying A Retention Period For Data In A Storage System | 10/762,036 | Jan. 21, 2004
    Methods And Apparatus For Indirectly Identifying A Retention Period For Data In A Storage System | 10/762,043 | Jan. 21, 2004
    Methods And Apparatus For Increasing Data Storage Capacity | 10/787,337 | Feb. 26, 2004
    Methods And Apparatus For Storing Data In A Storage Environment | 10/787,670 | Feb. 26, 2004
    Methods And Apparatus For Segregating A Content Addressable Computer System | 10/910,985 | Aug. 4, 2004
    Methods And Apparatus For Accessing Content In A Virtual Pool On A Content Addressable Storage System | 10/911,330 | Aug. 4, 2004
    Methods and Apparatus For Including Storage System Capability Information In An Access Request To A Content Addressable Storage System | 10/911,248 | Aug. 4, 2004
    Methods And Apparatus For Tracking Content Storage In A Content Addressable Storage System | 10/911,247 | Aug. 4, 2004
    Methods and Apparatus For Storing Information Identifying A Source Of A Content Unit Stored On A Content Addressable System | 10/911,360 | Aug. 4, 2004
    Software System For Providing Storage System Functionality | 11/021,892 | Dec. 23, 2004
    Software System For Providing Content Addressable Storage System Functionality | 11/022,022 | Dec. 23, 2004
    Methods And Apparatus For Providing Data Retention Capability Via A Network Attached Storage Device | 11/022,077 | Dec. 23, 2004
    Methods And Apparatus For Managing Storage In A Computer System | 11/021,756 | Dec. 23, 2004
    Methods And Apparatus For Processing Access Requests In A Computer System | 11/021,012 | Dec. 23, 2004
    Methods And Apparatus For Accessing Information In A Hierarchical File System | 11/021,378 | Dec. 23, 2004
    Methods And Apparatus For Storing A Reflection On A Storage System | 11/034,613 | Jan. 12, 2005
    Method And Apparatus For Modifying A Retention Period | 11/034,737 | Jan. 12, 2005
    Methods And Apparatus For Managing Deletion of Data | 11/034,732 | Jan. 12, 2005
    Methods And Apparatus For Managing The Storage Of Content | 11/107,520 | Apr. 15, 2005
    Methods And Apparatus For Retrieval Of Content Units In A Time-Based Directory Structure | 11/107,063 | Apr. 15, 2005
    Methods And Apparatus For Managing The Replication Of Content | 11/107,194 | Apr. 15, 2005
    Methods And Apparatus For Managing the Storage Of Content In A File System | 11/165,104 | Jun. 23, 2005
    Methods And Apparatus For Accessing Content Stored In A File System | 11/165,103 | Jun. 23, 2005
    Methods And Apparatus For Storing Content In A File System | 11/165,102 | Jun. 23, 2005
    Methods And Apparatus For Managing the Storage of Content | 11/212,898 | Aug. 26, 2005
    Methods And Apparatus For Scheduling An Action on a Computer | 11/213,565 | Aug. 26, 2005
    Methods And Apparatus For Deleting Content From A Storage System | 11/213,233 | Aug. 26, 2005
    Method and Apparatus For Managing The Storage Of Content | 11/324,615 | Jan. 3, 2006
    Method and Apparatus For Providing An Interface To A Storage System | 11/324,639 | Jan. 3, 2006
    Methods And Apparatus For Managing A File System On A Content Addressable Storage System | 11/324,533 | Jan. 3, 2006
    Methods And Apparatus For Creating A File System | 11/324,637 | Jan. 3, 2006
    Methods And Apparatus For Mounting A File System | 11/324,726 | Jan. 3, 2006
    Methods And Apparatus For Allowing Access To Content | 11/324,642 | Jan. 3, 2006
    Methods And Apparatus For Implementing A File System That Stores Files On A Content Addressable Storage System | 11/324,727 | Jan. 3, 2006
    Methods And Apparatus For Reconfiguring A Storage System | 11/324,728 | Jan. 3, 2006
    Methods And Apparatus For Increasing The Storage Capacity Of A Storage System | 11/324,646 | Jan. 3, 2006
    Methods And Apparatus For Accessing Content On A Storage System | 11/324,644 | Jan. 3, 2006
  • The above-described embodiments of the present invention can be implemented in any of numerous ways. For example, the embodiments may be implemented using hardware, software or a combination thereof. When implemented in software, the software code can be executed on any suitable processor or collection of processors, whether provided in a single computer or distributed among multiple computers. It should be appreciated that any component or collection of components that perform the functions described above can be generically considered as one or more controllers that control the above-discussed functions. The one or more controllers can be implemented in numerous ways, such as with dedicated hardware, or with general purpose hardware (e.g., one or more processors) that is programmed using microcode or software to perform the functions recited above.
  • In this respect, it should be appreciated that one implementation of the embodiments of the present invention comprises at least one computer-readable medium (e.g., a computer memory, a floppy disk, a compact disk, a tape, etc.) encoded with a computer program (i.e., a plurality of instructions), which, when executed on a processor, performs the above-discussed functions of the embodiments of the present invention.
  • The computer-readable medium can be transportable such that the program stored thereon can be loaded onto any computer environment resource to implement the aspects of the present invention discussed herein. In addition, it should be appreciated that the reference to a computer program which, when executed, performs the above-discussed functions, is not limited to an application program running on a host computer. Rather, the term computer program is used herein in a generic sense to reference any type of computer code (e.g., software or microcode) that can be employed to program a processor to implement the above-discussed aspects of the present invention.
  • It should be appreciated that in accordance with several embodiments of the present invention wherein processes are implemented in a computer readable medium, the computer implemented processes may, during the course of their execution, receive input manually (e.g., from a user).
  • The phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting. The use of “including,” “comprising,” “having,” “containing”, “involving”, and variations thereof, is meant to encompass the items listed thereafter and additional items.
  • Having described several embodiments of the invention in detail, various modifications and improvements will readily occur to those skilled in the art. Such modifications and improvements are intended to be within the spirit and scope of the invention. Accordingly, the foregoing description is by way of example only, and is not intended as limiting. The invention is limited only as defined by the following claims and the equivalents thereto.

Claims (24)

1. A method for use in a computer system comprising at least one storage system and at least one host computer that is coupled to the at least one storage system and executes an application program that writes a plurality of content units to the at least one storage system, wherein the at least one storage system stores the plurality of content units in a first stored format, the method comprising:
(A) executing on at least one computer other than the at least one host computer at least one utility that reads at least some of the plurality of content units and stores the at least some of the plurality of content units on the at least one storage system in a second stored format that is different from the first stored format.
2. The method of claim 1, wherein the act (A) comprises an act of executing the at least one utility on the at least one storage system.
3. The method of claim 1, wherein the computer system further comprises at least one component that couples the at least one host to the at least one storage system, and wherein the act (A) comprises an act of executing the at least one utility on the at least one component.
4. The method of claim 1, wherein the act (A) comprises an act of executing the at least one utility on at least one appliance that is separate from the at least one storage system and the at least one host computer.
5. The method of claim 1, wherein the at least one utility replaces the at least some of the plurality of content units stored in the first stored format with the at least some of the plurality of content units stored in the second stored format.
6. The method of claim 1, wherein the application program is a first application program, wherein the second stored format is compatible with a second application program, and wherein the first format is incompatible with the second application program.
7. At least one computer readable medium encoded with instructions that, when executed on a computer system comprising at least one storage system and at least one host computer that is coupled to the at least one storage system and executes an application program that writes a plurality of content units to the at least one storage system, wherein the at least one storage system stores the plurality of content units in a first stored format, performs a method comprising:
(A) executing on at least one computer other than the at least one host computer at least one utility that reads at least some of the plurality of content units and stores the at least some of the plurality of content units on the at least one storage system in a second stored format that is different from the first stored format.
8. The at least one computer readable medium of claim 7, wherein the act (A) comprises an act of executing the at least one utility on the at least one storage system.
9. The at least one computer readable medium of claim 7, wherein the computer system further comprises at least one component that couples the at least one host to the at least one storage system, and wherein the act (A) comprises an act of executing the at least one utility on the at least one component.
10. The at least one computer readable medium of claim 7, wherein the act (A) comprises an act of executing the at least one utility on at least one appliance that is separate from the at least one storage system and the at least one host computer.
11. The at least one computer readable medium of claim 7, wherein the at least one utility stores the at least some of the plurality of content units in the second stored format without replacing the at least some of the plurality of content units stored in the first stored format.
12. The at least one computer readable medium of claim 7, wherein the application program is a first application program, wherein the second stored format is compatible with a second application program, and wherein the first format is incompatible with the second application program.
13. In a computer system, at least one storage system comprising:
at least one network interface that couples the at least one storage system to at least one host computer in the computer system, wherein the at least one host computer is capable of executing an application program that writes a plurality of content units to the at least one storage system;
at least one storage device that stores the plurality of content units in a first stored format; and
at least one controller that executes at least one utility that reads at least some of the plurality of content units and stores the at least some of the plurality of content units on the at least one storage system in a second stored format that is different from the first stored format.
14. The at least one storage system of claim 13, wherein the at least one utility replaces the at least some of the plurality of content units stored in the first stored format with the at least some of the plurality of content units stored in the second stored format.
15. The at least one storage system of claim 13, wherein the at least one utility stores the at least some of the plurality of content units in the second stored format without replacing the at least some of the plurality of content units stored in the first stored format.
16. The at least one storage system of claim 13, wherein the application program is a first application program, and wherein the second stored format is compatible with a second application program.
17. The at least one storage system of claim 16, wherein the first and second application programs are different versions of an application program that performs substantially the same functions.
18. The at least one storage system of claim 16, wherein the second format is incompatible with the first application program.
19. A method for use in a computer system comprising at least one storage system and at least one host computer that is coupled to the at least one storage system and executes an application program that writes a plurality of content units to the at least one storage system, wherein the at least one storage system stores the plurality of content units in a first stored format, the method comprising:
(A) installing on at least one computer other than the at least one host computer at least one utility that can read at least some of the plurality of content units and store the at least some of the plurality of content units on the at least one storage system in a second stored format that is different from the first stored format.
20. The method of claim 19, wherein the act (A) comprises an act of installing the at least one utility on the at least one storage system.
21. The method of claim 19, wherein the computer system further comprises at least one component that couples the at least one host to the at least one storage system, and wherein the act (A) comprises an act of installing the at least one utility on the at least one component.
22. The method of claim 19, wherein the act (A) comprises an act of installing the at least one utility on at least one appliance that is separate from the at least one storage system and the at least one host computer.
23. The method of claim 19, wherein the at least one utility replaces the at least some of the plurality of content units stored in the first stored format with the at least some of the plurality of content units stored in the second stored format.
24. The method of claim 19, wherein the application program is a first application program, wherein the second stored format is compatible with a second application program, and wherein the first format is incompatible with the second application program.
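
As an illustration only, the conversion utility recited in the claims above might be sketched as follows. The claims do not prescribe any implementation, so every name here (ContentUnit, StorageSystem, convert_units, the converter callback, and the identifier scheme for kept copies) is a hypothetical stand-in, and an in-memory dictionary takes the place of the storage system. The replace flag distinguishes the variant in which converted units replace the originals (claims 5, 14, and 23) from the variant in which both copies are kept (claims 11 and 15).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ContentUnit:
    """A stored content unit tagged with its stored format (hypothetical model)."""
    content: bytes
    stored_format: str


@dataclass
class StorageSystem:
    """In-memory stand-in for the storage system; maps unit identifiers to units."""
    units: Dict[str, ContentUnit] = field(default_factory=dict)


def convert_units(
    storage: StorageSystem,
    first_format: str,
    second_format: str,
    converter: Callable[[bytes], bytes],
    replace: bool,
) -> List[str]:
    """Read units stored in first_format and store them in second_format.

    Intended to run somewhere other than the host that wrote the units
    (for example on the storage system itself). Returns the identifiers
    under which the converted units were stored.
    """
    stored_ids: List[str] = []
    # Iterate over a snapshot so that adding new units is safe.
    for unit_id, unit in list(storage.units.items()):
        if unit.stored_format != first_format:
            continue
        new_unit = ContentUnit(converter(unit.content), second_format)
        if replace:
            # Replace the original with the converted unit.
            storage.units[unit_id] = new_unit
            stored_ids.append(unit_id)
        else:
            # Keep the original; store the converted unit under an
            # assumed derived identifier.
            new_id = f"{unit_id}.{second_format}"
            storage.units[new_id] = new_unit
            stored_ids.append(new_id)
    return stored_ids
```

Calling convert_units with replace=False leaves each original unit readable by the first application program while adding a copy in the second stored format; calling it with replace=True overwrites the original in place.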
US11/438,770 2004-08-04 2006-05-23 Methods and apparatus for conversion of content Abandoned US20070276789A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US11/438,770 US20070276789A1 (en) 2006-05-23 2006-05-23 Methods and apparatus for conversion of content
PCT/US2007/012115 WO2007139757A2 (en) 2006-05-23 2007-05-22 Methods and apparatus for conversion of content
US12/804,349 US8489559B2 (en) 2004-08-04 2010-07-20 Methods and apparatus for conversion of content

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US11/438,770 US20070276789A1 (en) 2006-05-23 2006-05-23 Methods and apparatus for conversion of content

Related Parent Applications (1)

Application Number Relationship Publication Number Priority Date Filing Date Title
US10/911,330 Continuation-In-Part US8069269B2 (en) 2004-08-04 2004-08-04 Methods and apparatus for accessing content in a virtual pool on a content addressable storage system

Related Child Applications (1)

Application Number Relationship Publication Number Priority Date Filing Date Title
US12/804,349 Continuation US8489559B2 (en) 2004-08-04 2010-07-20 Methods and apparatus for conversion of content

Publications (1)

Publication Number Publication Date
US20070276789A1 (en) 2007-11-29

Family

ID=38670533

Family Applications (2)

Application Number Status Publication Number Priority Date Filing Date Title
US11/438,770 Abandoned US20070276789A1 (en) 2004-08-04 2006-05-23 Methods and apparatus for conversion of content
US12/804,349 Active 2026-04-03 US8489559B2 (en) 2004-08-04 2010-07-20 Methods and apparatus for conversion of content

Country Status (2)

Country Link
US (2) US20070276789A1 (en)
WO (1) WO2007139757A2 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9934244B2 (en) * 2010-08-13 2018-04-03 At&T Intellectual Property I, L.P. System and method for file format management

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5173853A (en) * 1990-03-14 1992-12-22 Digital Equipment International Ltd. Data format conversion
US5675789A (en) * 1992-10-22 1997-10-07 Nec Corporation File compression processor monitoring current available capacity and threshold value
US6078924A (en) * 1998-01-30 2000-06-20 Aeneid Corporation Method and apparatus for performing data collection, interpretation and analysis, in an information platform
US20020194227A1 (en) * 2000-12-18 2002-12-19 Siemens Corporate Research, Inc. System for multimedia document and file processing and format conversion
US20030028447A1 (en) * 2001-04-18 2003-02-06 International Business Machines Corporation Process for data driven application integration for B2B
US6549918B1 (en) * 1998-09-21 2003-04-15 Microsoft Corporation Dynamic information format conversion
US6691116B1 (en) * 2001-10-31 2004-02-10 Storability, Inc. Method and system for data collection from remote sources
US20040049514A1 (en) * 2002-09-11 2004-03-11 Sergei Burkov System and method of searching data utilizing automatic categorization
US20040205621A1 (en) * 2002-05-28 2004-10-14 Johnson Steven C. Method and apparatus for formatting documents
US20060031653A1 (en) * 2004-08-04 2006-02-09 Emc Corporation Methods and apparatus for accessing content in a virtual pool on a content addressable storage system
US7047379B2 (en) * 2003-07-11 2006-05-16 International Business Machines Corporation Autonomic link optimization through elimination of unnecessary transfers
US20060155788A1 (en) * 2000-03-09 2006-07-13 Pkware, Inc. System and method for manipulating and managing computer archive files
US20070174486A1 (en) * 2001-05-03 2007-07-26 Holstege Mary A System and method for monitoring multiple online resources in different formats
US20070294320A1 (en) * 2006-05-10 2007-12-20 Emc Corporation Automated priority restores
US20070299891A1 (en) * 2006-06-26 2007-12-27 Bellsouth Intellectual Property Corporation Data back-up utility
US7577689B1 (en) * 2005-06-15 2009-08-18 Adobe Systems Incorporated Method and system to archive data

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5488702A (en) * 1994-04-26 1996-01-30 Unisys Corporation Data block check sequence generation and validation in a file cache system
US6173291B1 (en) * 1997-09-26 2001-01-09 Powerquest Corporation Method and apparatus for recovering data from damaged or corrupted file storage media
JP4035872B2 (en) * 1997-10-27 2008-01-23 株式会社日立製作所 File format conversion method, file system, information system and electronic commerce system using the same
US6567826B1 (en) * 2000-06-23 2003-05-20 Microsoft Corporation Method and system for repairing corrupt files and recovering data
TW571201B (en) * 2001-02-02 2004-01-11 Wistron Corp Conversion method and system for contents format of document file
US20020143794A1 (en) * 2001-03-30 2002-10-03 Helt David J. Method and system for converting data files from a first format to second format
US7263521B2 (en) * 2002-12-10 2007-08-28 Caringo, Inc. Navigation of the content space of a document set

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080140855A1 (en) * 2006-11-23 2008-06-12 Mawell Svenska Ab Method and system for sharing data between radiology information systems
US8156202B2 (en) * 2006-11-23 2012-04-10 Mawell Scandinavia Ab Method and system for sharing data between radiology information systems
WO2009079263A1 (en) * 2007-12-14 2009-06-25 Casdex, Inc. System for logging and reporting access to content using unique content identifiers
US8095558B2 (en) 2007-12-14 2012-01-10 Casdex, Inc. System for logging and reporting access to content using unique content identifiers
US20110004630A1 (en) * 2009-07-02 2011-01-06 Quantum Corporation Method for reliable and efficient filesystem metadata conversion
US8190655B2 (en) * 2009-07-02 2012-05-29 Quantum Corporation Method for reliable and efficient filesystem metadata conversion
US8577939B2 (en) 2009-07-02 2013-11-05 Quantum Corporation Method for reliable and efficient filesystem metadata conversion
US10496612B2 (en) 2009-07-02 2019-12-03 Quantum Corporation Method for reliable and efficient filesystem metadata conversion

Also Published As

Publication number Publication date
WO2007139757A2 (en) 2007-12-06
WO2007139757A3 (en) 2008-01-24
US8489559B2 (en) 2013-07-16
US20100293561A1 (en) 2010-11-18

Similar Documents

Publication Publication Date Title
US9348842B2 (en) Virtualized data storage system optimizations
US8335890B1 (en) Associating an identifier with a content unit
US7765189B2 (en) Data migration apparatus, method, and program for data stored in a distributed manner
US9846700B2 (en) Compression and deduplication layered driver
US8117166B2 (en) Method and system for creating snapshots by condition
US9152600B2 (en) System and method for caching network file systems
US7293131B2 (en) Access to disk storage using file attribute information
US8904137B1 (en) Deduplication system space recycling through inode manipulation
US8583893B2 (en) Metadata management for virtual volumes
US20060174074A1 (en) Point-in-time copy operation
US8639658B1 (en) Cache management for file systems supporting shared blocks
US8990228B2 (en) Systems and methods for arbitrary data transformations
US9384201B2 (en) Method of managing data of file system using database management system
US10353636B2 (en) Write filter with dynamically expandable overlay
US8290911B1 (en) System and method for implementing data deduplication-aware copying of data
US8489559B2 (en) Methods and apparatus for conversion of content
US8909875B1 (en) Methods and apparatus for storing a new version of an object on a content addressable storage system
US9134916B1 (en) Managing content in a distributed system
JP4026698B2 (en) Disk storage device having correctable data management function

Legal Events

Date Code Title Description
AS Assignment

Owner name: EMC CORPORATION, MASSACHUSETTS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KEITHLEY, KALEB S.;SCHINDLER, JIRI;HALL, JONATHAN B.;AND OTHERS;REEL/FRAME:017928/0710

Effective date: 20060426

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION