DATA MANAGEMENT SYSTEM WITH SHORTCUT MIGRATION VIA EFFICIENT AUTOMATIC RECONNECTION TO PREVIOUSLY MIGRATED COPY
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to data management systems that use removable data storage media such as magnetic tape. More particularly, the invention includes a data management system that, responsive to a migration request for a particular data object, automatically invokes a shortcut migration process that finds the previously migrated copy of the data object and reconnects to the copy.
2. Description of the Related Art
With the increasing importance of electronic information today, there is a similar increase in the importance of reliable data storage. The market abounds with different means of data storage today, ranging from high-speed, more expensive media such as random access memory (RAM), to slower speed, less expensive products such as magnetic tape. Some advanced, "hierarchical" systems utilize multiple levels of data storage, often high-speed, direct-access storage (such as magnetic disk drive storage) for frequently used data, and relatively lower-speed removable storage media (such as magnetic tape) for infrequently used data. One example of such a system is the IBM System Managed Storage product, which includes the DFSMShsm component.
The movement of data from disk to tape in a hierarchical storage system is called "migration." A single tape might contain hundreds or thousands of migrated datasets. When a migrated dataset is referenced by a user, the dataset is copied back onto the disk in a movement known as "recall." One example of recall appears in IBM Technical Disclosure Bulletin, Vol. 26, No. 9 (February 1984), which is incorporated herein by reference. With recall, the copy left on tape is invalidated in favor of the copy recalled to disk. This usually works well, because any changes to the recalled data will render the copy left on tape worthless; namely, the nature of serially accessible storage media prevents updating the tape copy to match the disk copy.
The DFSMShsm program maintains an inventory of migrated datasets, and uses this inventory to aid in the recall of datasets. The DFSMShsm program also keeps a limited inventory of recalled datasets (which exist on tape but are considered invalid), but only for a brief, fixed period of time. If a recalled dataset becomes inactive, DFSMShsm software re-migrates the recalled dataset back to tape. This re-migration can be time consuming because it requires copying the dataset's entire contents from disk to tape. In cases where the recalled dataset was never changed, this copying is wasted work because the originally migrated data copy (on tape) is the same as the recalled version (on disk).
To address this performance issue, and expedite data re-migration, various approaches have been developed to "reconnect" previously recalled datasets. Broadly, reconnection updates and recreates inventory records rather than again copying data from disk to tape, allowing fast migration of unchanged recalled data whose migration copy still exists on tape, although flagged as invalid. With one reconnection approach, known as "recall browse," the storage system reconnects datasets back to their tape versions in response to operator-issued commands. The end user must issue a command for each and every dataset to be reconnected. Although this function is beneficial in certain respects, significant user activity is required to evaluate datasets for reconnection: the user must determine whether the data object was ever migrated, whether that migration copy still exists, and, if it exists, whether it is identical to the disk copy. Furthermore, the user may be unaware of certain datasets for which reconnection is nonetheless possible. In addition, there is some danger of improperly reconnecting datasets that have changed since recall, and are therefore not suitable for reconnection.
Improving upon the recall browse feature, others developed a reconnection procedure with more automated features. With the more-automated reconnection feature, software supplements the migration process by automatically considering the possibility of reconnecting data. This approach provides the advantage of greater automation, since the end user does not have to manually instigate the recall process, and because more datasets can be considered for reconnection than are possible by manual user command. Although beneficial in some respects, the more-automated approach still suffers from certain limitations. Chiefly, reconnection using the more-automated approach can be time consuming because various input/output operations are required to determine whether a dataset is suitable for reconnection. For instance, time-consuming work is required to determine whether the migration copy exists, and whether it is identical to the recall (disk) copy. In many cases, these operations are wasted, such as when a dataset being considered for reconnection has never been migrated and therefore cannot possibly be a reconnection candidate. When a large number of data objects are being migrated to tape, evaluating each dataset for reconnection can delay the migration by a considerable time.
Consequently, known reconnect procedures are not completely adequate for some applications due to certain unsolved problems.
SUMMARY OF THE INVENTION
Broadly, the present invention concerns a data management system that responds to each migration request for a particular data object by automatically invoking a shortcut migration process that finds a previously migrated copy of the exact data object, if it exists, and automatically reconnects that copy. More specifically, this data management system includes a primary level of storage (such as direct-access storage) and an auxiliary level of storage (such as multiple removable data storage media). An inventory stores metadata identifying data objects contained in the auxiliary level. A catalog includes metadata identifying data objects contained in the primary level, and whether such data objects are reconnectable.
When the data management system receives "recall" requests to copy target data objects from the auxiliary to the primary level, the system performs certain recall operations for each target data object as follows. The system determines whether the target data object meets prescribed future-reconnection criteria, and if so, it updates the catalog to include an expedited access indicator associated with the target data object. The system copies the target data object from the auxiliary level to the primary level. The system also updates the inventory to invalidate the metadata identifying the target data object in the auxiliary level, thereby deactivating the target data object in auxiliary storage. The system also prepares expiration information to be used in determining when to delete the invalidated inventory metadata for the target data object.
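The recall operations just described can be sketched in Python as follows. This is only an illustrative sketch, not the patented implementation: the record layouts, the `eligible` predicate standing in for the "prescribed future-reconnection criteria," and the fixed 30-day `retention` default are all assumptions introduced here (the specification bases expiration on access history and does not fix a period).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class InventoryRecord:
    """Metadata for one copy on the auxiliary level (hypothetical layout)."""
    name: str
    volume: str
    valid: bool = True
    expires_at: Optional[datetime] = None


@dataclass
class CatalogEntry:
    """Metadata for one object on the primary level (hypothetical layout)."""
    name: str
    reconnectable: bool = False  # stands in for the expedited access indicator


def recall(name, inventory, catalog, primary, auxiliary, now,
           eligible=lambda record: True, retention=timedelta(days=30)):
    """Recall one data object; storage levels are modeled as plain dicts."""
    record = inventory[name]
    entry = catalog.setdefault(name, CatalogEntry(name))

    # Set the indicator only if the (assumed) reconnection criteria are met.
    if eligible(record):
        entry.reconnectable = True

    # Copy the object from the auxiliary level to the primary level.
    primary[name] = auxiliary[(record.volume, name)]

    # Invalidate, but do not delete, the inventory metadata.
    record.valid = False

    # Prepare expiration information for later inventory cleanup.
    record.expires_at = now + retention
    return entry
```

Note that the auxiliary copy itself is left in place; only its inventory metadata is marked invalid, which is what makes a later reconnection possible.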
When the data management system receives "migration" requests to copy specified data objects from the primary level to the auxiliary level, the system performs certain migration operations for each specified data object as follows. If the catalog does not contain an expedited access indicator associated with the specified data object, the system copies content of the specified data object from the primary level to the auxiliary level in a "full" migration operation. On the other hand, if the catalog contains an expedited access indicator associated with the specified data object, the system determines whether restoration of the copy of the specified data object on the auxiliary level is possible. If restoration is not possible, the system performs a full migration. On the other hand, if restoration is possible, the system updates the inventory to restore previously invalidated metadata identifying the copy on the auxiliary level as being the specified data object, instead of re-copying contents of the specified data object from the primary level.
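The migration decision described above might be sketched as follows. This is a minimal illustration under stated assumptions: the catalog, inventory, and storage levels are modeled as plain dicts, and the `restorable` predicate and `target_volume` name are hypothetical placeholders for whatever checks and volume selection a real system would perform.

```python
def migrate(name, catalog, inventory, primary, auxiliary,
            restorable=lambda record: True, target_volume="TAPE99"):
    """Return "shortcut" if the old copy was reconnected, else "full"."""
    # Cheap first test: the catalog flag quickly rules out objects for
    # which reconnection cannot apply, before the inventory is examined.
    flagged = catalog.get(name, {}).get("reconnectable", False)

    if flagged:
        record = inventory.get(name)
        # Restoration requires that the invalidated metadata still exists
        # and that the (assumed) restorability check passes.
        if record is not None and not record["valid"] and restorable(record):
            record["valid"] = True       # restore the invalidated metadata
            record["expires_at"] = None  # no longer scheduled for cleanup
            return "shortcut"

    # Full migration: copy the contents to the auxiliary level and
    # record the new copy in the inventory.
    auxiliary[(target_volume, name)] = primary[name]
    inventory[name] = {"volume": target_volume, "valid": True,
                       "expires_at": None}
    return "full"
```

In this sketch the shortcut path touches only metadata, so its cost does not depend on the size of the data object, whereas the full path copies the object's contents.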
As mentioned above, the system also prepares certain expiration information. Namely, the system establishes a prescribed expiration schedule for metadata identifying auxiliary level copies of recalled data objects based upon access history of the data object. According to this schedule, the system cleans the inventory by removing invalidated metadata. Removing invalidated metadata prevents the inventory size from continually growing. Whenever the inventory is cleaned of metadata associated with certain data objects, the catalog may be updated to clear the expedited access indicators with these data objects. As an alternative, expedited access indicators may be cleared under other circumstances indicating an unusable auxiliary level copy of recalled data. One example occurs when the recalled data object is backed up, since the backup is presumably done to preserve changes in the recalled data object on the primary level.
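As a rough illustration of the cleanup described above, the following Python sketch removes expired, invalidated inventory metadata and clears the matching catalog indicators. The dict layouts and function names are assumptions made for this example, and timestamps are treated as any mutually comparable values.

```python
def clean_inventory(inventory, catalog, now):
    """Remove expired, invalidated inventory records; return their names."""
    removed = []
    for name in list(inventory):  # copy the keys, since entries are deleted
        record = inventory[name]
        expired = (record["expires_at"] is not None
                   and record["expires_at"] <= now)
        if not record["valid"] and expired:
            del inventory[name]
            # With the metadata gone, reconnection is no longer possible,
            # so the expedited access indicator is cleared as well.
            if name in catalog:
                catalog[name]["reconnectable"] = False
            removed.append(name)
    return removed


def note_backup(name, catalog):
    """Clear the indicator when a recalled object is backed up, since the
    backup presumably preserves changes made on the primary level."""
    if name in catalog:
        catalog[name]["reconnectable"] = False
```

Valid records are never removed by `clean_inventory`, regardless of their expiration field; only metadata that was invalidated by a recall is subject to the schedule.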
The foregoing features may be implemented in a number of different forms. For example, the invention may be implemented to provide a method including a shortcut migration operation achieved by efficient, automatic reconnection to previously migrated data. In another embodiment, the invention may be implemented to provide an apparatus such as a data management system, configured to perform shortcut migration according to this invention. In still another embodiment, the invention may be implemented to provide a signal-bearing medium tangibly embodying a program of machine-readable instructions executable by a digital data processing apparatus to perform a shortcut migration operation according to this invention. Another embodiment concerns logic circuitry having multiple interconnected electrically conductive elements configured to perform shortcut migration as described herein.
The invention affords its users a number of distinct advantages. Basically, the invention saves time by avoiding a full migration to auxiliary storage where possible, since a full migration of a large data object can take hours to complete. Instead of full migration, the invention performs a shortcut migration that restores a deactivated copy of data on auxiliary storage. Advantageously, the invention efficiently determines reconnect candidacy by consulting a catalog that is necessarily consulted for other reasons during reconnection anyway. From the standpoint of overhead, the shortcut migration is beneficial because it has a high likelihood of successful completion. One reason is the expedited access indicator, which helps to quickly exclude data objects for which reconnection is not possible. Also, success of reconnection is aided by preserving invalidated metadata identifying recalled data objects in auxiliary level storage according to a use-based predictive schedule, which likely preserves metadata for future reconnection if needed. As a further advantage, reconnection quickly enables the dataset to be scratched from primary level storage, freeing the typically more expensive primary level storage for storage of other data. As still another benefit, reconnecting datasets instead of copying the datasets to another auxiliary level storage medium conserves media and reduces the need to clean and recycle media that become cluttered with deactivated data objects that could have been re-used through reconnection. This invention also provides a number of other advantages and benefits, which should be apparent from the following description of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of the hardware components and interconnections of a data management system, according to the invention.
FIG. 2 is a block diagram of a digital data processing machine according to the invention.
FIG. 3 shows an exemplary signal-bearing medium according to the invention.
FIG. 4 is a flowchart of a recall procedure according to the invention.
FIG. 5 is a flowchart of a data analysis and migration procedure according to the invention.
FIG. 6 is a flowchart of a cleanup procedure according to the invention.
The nature, objectives, and advantages of the invention will become more apparent to those skilled in the art after considering the following detailed description in connection with the accompanying drawings.
Hardware Components & Interconnections
One aspect of the invention concerns a data management system, which may be embodied by various hardware components and interconnections, with one example being described in FIG. 1. The data management system 100 includes applications 103-105, a subsystem facility 102, operator interface 109, auxiliary level storage 130, and primary level storage 132.
Applications
The applications 103-105 comprise software programs, computer workstations, servers, personal computers, mainframe computers, manually activated operator terminals, or other host processes. In one example, the applications 103-105 represent customers' application programs that utilize storage managed by the subsystem facility 102.
The applications 103-105 communicate with the subsystem facility 102 via one or more interfaces, depicted as the interface 106. The interface 106 provides one or more communications links between the applications 103-105 and a central processing unit (CPU) 108. The interface 106 may utilize wires, busses, backplanes, wireless links, intelligent communications channels, shared memory, computer networks, or other communications links.
Subsystem—CPU
The CPU 108 comprises computer-driven equipment capable of managing operations of the storage levels 130, 132. The CPU 108 may be implemented by a variety of different hardware devices, such as a personal computer, server, computer workstation, mainframe computer, etc. Furthermore, the CPU 108 may even share common hardware with one or more of the applications 103-105.
As illustrated, the CPU 108 includes a storage manager 108a and an operating system 108b. The storage manager