WO2010010393A1 - Monitoring of backup activity on a computer system - Google Patents

Info

Publication number
WO2010010393A1
Authority
WO
WIPO (PCT)
Prior art keywords
backup
client
information
computer system
backed
Prior art date
Application number
PCT/GB2009/050904
Other languages
French (fr)
Inventor
Peter M. Watkin
Original Assignee
Watkin Peter M
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Watkin Peter M
Priority to GB1101317.4A (patent GB2474790B)
Publication of WO2010010393A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1464 Management of the backup or restore process for networked environments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1448 Management of the data involved in backup or backup restore
    • G06F11/1451 Management of the data involved in backup or backup restore by selection of backup contents
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324 Display of status information
    • G06F11/327 Alarm or error message display

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention is concerned with monitoring and reporting of backup activity on a computer system. Manager software is known that receives from multiple nodes (12) on a system data which is to be backed up and which stores the data in mass storage devices (16) associated with a storage manager (server 10). Typically, the success, failure or other current status of a backup job is communicated from the client to the storage manager server merely through return codes and this approach can lead to a user being provided with erroneous or incomplete information. The present invention provides a home server (22) in communication with the clients which receives from the clients information generated by them relating to the backup jobs, parses this client information, and so provides the user with backup information concerning e.g. the success or failure of the client backup operations. More complete and reliable information can be consequently provided to the user.

Description

Monitoring of Backup Activity on a Computer System
The present invention is concerned with monitoring of backup activity on a computer system.
Storage manager software (abbreviated below to "SM") will be used to refer to backup products that can be installed on a computer or server (referred to hereafter as a node or nodes) which may be running any one of several operating systems (for demonstration purposes, nodes running Wintel and UNIX operating systems will be further referenced). Once the SM software has been installed on a node, it is referred to as a client. The client is configured to perform a backup of data located on the node using an incremental backup method. The client is registered with a specific SM server (referred to hereafter as the storage manager server). Communication between the client and the storage manager server is performed over a Local Area Network (LAN) or a Wide Area Network (WAN) using a system protocol such as Internet Protocol (IP). The client can be used to manually perform a backup, however in most cases a schedule is used to schedule a backup to start on the node at a particular time. An existing SM of this type is sold by International Business Machines under the name Tivoli Storage Manager.
There are various methods and tools that enable a job to be scheduled; however the most common method is for the storage manager server to send the schedule to the client. Once a scheduled backup has been started, any data to be backed up is backed up to the storage manager server over the network. The storage manager server subsequently saves the data to storage media that can be accessed in the event that the data has to be restored.
Detailed results, detailing the activity of a backup job performed by the client, are entered into log files located on the node (and will be referred to hereafter as client information). Only a subset of this information is reported to the storage manager server, typically in the form of return codes.
Under certain conditions found on a node, or due to the way in which the storage manager software functions, it can be that a backup job reported as "successful" in the client information, and subsequently communicated as such to the storage manager server, is actually not successful. This means it is possible that not all data that should potentially have been backed up, was backed up. In some cases it is only possible to detect, and thus prevent the recurrence of, such a critical event by reviewing the client information.
Due to the nature of computer systems and their security requirements, only a restricted number of administrators are able to access and review the client information for potentially critical errors. Due to the heavy workload and time constraints of these administrators, it can be that this task is not performed. In most cases the information reported to the storage manager server is relied upon to categorize the backup status of a backup. The reason for this is that storage manager server information is far easier to access than client information, as the status of all clients connected to the server can be viewed from one central location.
It is not sufficient to rely on storage manager server information in order to be certain that a backup was successful. It is also not always sufficient to review just the latest information detailed in the client log files that relate to the latest backup, as this is only point in time information. The data and values from the previous backup also need to be considered in order to be certain that the backup was successful.
Furthermore, if a backup job is reported as failed, only the client contains enough detail to ascertain the cause. In order to access this information and discover the root cause of the problem, one must in any event connect to the respective node and review the client information. This is a time consuming activity especially in an environment where hundreds of backups are performed every day.
In accordance with a first aspect of the present invention, there is a computer system comprising a storage manager server and multiple clients, the storage manager server being adapted to receive data objects to be backed up from the clients and to store them in at least one mass storage device, and the clients being adapted to create client information comprising a log of status of backup jobs carried out by the client, the system being characterized in that it further comprises a home server adapted to receive client information from the client, to parse the client information, and so to provide a user with backup information relating to the success, failure or other status of backup operations on the computer system.
The invention can thus provide a means for monitoring the backup activity which is distinct from the monitoring activity of the SM itself and which utilizes the client information, thus providing potentially more detailed and reliable information than, for example, return code-based facilities of the SM. True failure information, normally found only on the client, is made available, thus reducing the time and effort normally required to connect to the client to ascertain failure information.
In accordance with a second aspect of the present invention, there is a method of monitoring backup activity on a computer system comprising a storage manager server which receives data objects to be backed up from multiple clients in the system and which stores the data objects in at least one main storage device, the clients creating client information comprising respective logs of backup jobs which they carry out, the method of monitoring comprising providing a home server, transferring client information from the clients to the home server, parsing the client information received by the home server, and providing a user with backup information relating to the success, failure or other current backup status of a node.
In accordance with a third aspect of the present invention, there is a computer program product for running on a home server of a computer system in which the home server is in communication with multiple clients, the computer program product comprising instructions which cause the home server to receive from the clients respective client information files comprising logs of backup jobs carried out by the clients, to parse the client information files, and to output for a user backup information relating to the success, failure or other current backup status of a node.
Through the utilization of client information obtained directly and automatically from the node, the present invention makes it possible to present the backup data via a website in such a way that one is able at a glance to gain a complete overview of the current as well as historical backup situation for an individual node or complete environment, be it large or small.
As well as an overview, detailed information can be made available for every backup job, thus reducing the need to manually connect to a node in order to review the information.
Potentially critical scenarios can be automatically flagged with coloured warnings on the website as well as e-mail and SMS (text message) alerts. The time and effort saved in identifying failed backups, as well as the advantage of having all the client information to hand means one can concentrate on solving backup problems. In addition, being able to proactively react to possibly critical situations that may normally go undetected helps protect against possible data loss.
The backups can be monitored by anyone who has a valid account for the website, thus removing the pressure from node or server administrators. Reports complete with graphs and tables can be generated for individual nodes or the complete environment for several time frames. This is particularly helpful in an audit situation, and for ensuring Service Level Agreements (SLAs) are reached.
Specific embodiments of the present invention will now be described, by way of example only, with reference to the accompanying drawings, in which:-
Figure 1 is a diagrammatic representation of a backup environment;
Figure 2 is a diagrammatic representation of an environment for implementing the present invention;
Figure 3 is a screen display from a website used in an embodiment of the present invention;
Figure 4 is a backup environment summary prepared by a system embodying the present invention;
Figure 5 presents summary data in relation to a chosen item in the backup environment summary;
Figure 6 presents backup information specific to a chosen node;
Figure 7 presents historical backup data for a given node;
Figure 8 is a graph of the frequency of backup operation outcomes - failure, success, etc. - over time; and
Figure 9 shows backup information for a particular node.
The embodiment of the invention to be described below will be referred to as "EBC" and consists of two distinct parts. One part is software that is installed, according to the present exemplary embodiment, on a Wintel server (to be referred to hereafter as a "home server") which is within the environment where the nodes that are to be backed up exist, and can be accessed over a system. In accordance with the invention, log files containing the client information are either collected by the home server or can be delivered to it. Once the log files are available the software, which comprises a parsing mechanism, applies specific initial parsing logic to the client information. As a result of the parsing, two distinct log files are created. Key amongst them is a file containing summary information pertaining to the latest backup from every node where log files were available. This log file will be referred to hereafter as the "summary log."
The second part of EBC is a second Wintel server (to be referred to hereafter as the "host") running a data warehouse and hosting a website. On the home server, once all the respective log files have been parsed and the summary log has been completed, the summary log is sent, e.g. via FTP or e-mail (with attachment) to the host where it is further parsed, having a certain criterion inspected as well as specific values compared and contrasted. The results of the data warehouse parsing are subsequently accessible via the website with a valid and registered e-mail account.
Via the website one is able to ascertain at a glance the overall picture of a complete backup environment, for example how many backups were successful, failed, are still running etc. As well as providing an overview, the system makes it possible to drill down to the latest client information from an individual node. This means one is able to immediately identify (in most cases) the cause of a backup failure without having to spend time connecting to a node and manually checking the respective client information.
The website offers an abundance of information to the end user, highlighting possible critical situations with the use of colours. However this is a passive approach to backup monitoring and relies on someone actively logging on and paying attention to the backups. In order to ensure that no critical warnings are overlooked, messages containing warnings and backup failure information can be sent to those who choose to have this service. These may for example be SMS (text messages) and/or e-mails. Each subscriber is able to individually tailor the service to meet his or her requirements. It is possible to receive just e-mails or SMS messages or both for the complete environment, and to specify individual nodes. This service ensures that the backups are proactively monitored, enabling administrators to swiftly and efficiently react to failed or possibly critical situations that may otherwise go undetected, even if a manual backup control was performed.
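The per-subscriber tailoring described above can be pictured with a minimal sketch. The `Subscription` record, its field names and the `channels_for` helper are illustrative assumptions and not part of the described system; an empty node set is taken here to mean the complete environment.

```python
from dataclasses import dataclass


@dataclass
class Subscription:
    """Hypothetical per-user alert preferences (names are assumptions)."""
    user: str
    email: bool = True
    sms: bool = False
    nodes: frozenset = frozenset()  # empty set: alerts for the complete environment


def channels_for(sub, node):
    """Return the alert channels this subscriber wants for a given node."""
    if sub.nodes and node not in sub.nodes:
        return []  # subscriber restricted alerts to other nodes
    channels = []
    if sub.email:
        channels.append("email")
    if sub.sms:
        channels.append("sms")
    return channels


subs = [
    Subscription("alice", email=True, sms=True),
    Subscription("bob", email=False, sms=True, nodes=frozenset({"node-07"})),
]

# Who is alerted, and how, when a warning is raised for node-03?
routing = {s.user: channels_for(s, "node-03") for s in subs}
```

Here the first subscriber follows the whole environment on both channels, while the second receives SMS messages for one named node only.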
As well as monitoring the backups, EBC is also able to report on all backup activity at node level, service pack level or complete environment level over several (user defined) time scales.
In summary, EBC is a complete backup monitoring and reporting system that enables one to proactively manage, react and report on the backup environment from anywhere in the world. Additionally, the information available can be used for audits or to ensure Service Level Agreements (SLAs) have been reached or adhered to.
Before looking at EBC itself in greater detail, the operation of a known suite of storage manager software (such as IBM's Tivoli Storage Manager, for example) will be described with reference to Figure 1. The storage manager software (referred to hereafter as SM) serves to backup data located on nodes running several platforms such as Wintel and UNIX. In order for an SM backup to be performed, firstly a suitable environment must exist. A simple environment consists of a single storage manager server 10 and several clients 12a, 12b connected over a network such as a Local Area Network (LAN) or Wide Area Network (WAN) 14 as seen in Figure 1. The storage manager server has storage resources 16 in the form of multiple mass media devices (tapes, disks) connected to it where backup data from the clients 12a, 12b is stored and managed.
The role of the storage manager server 10 is to schedule backup jobs on each client and store and manage the data backed up from the clients, making it available as and when a restore is required.
Each node to be backed up must have SM client software, called an SM client, installed on it. In order for the client to know what it should back up, a configuration file or files are used. Detailed information pertaining to a backup is to be found in log files produced by the client 12a, 12b. There are slight differences between the number and type of files used for Wintel and UNIX clients. Both will be detailed below.
A Wintel client has three main files:
• a configuration file;
• a main client backup information file; and
• a specific error information file.
The configuration file can be edited to configure the various specific SM settings for the respective backup. This includes specifying files/directories, otherwise known as "objects", that should be included in or excluded from the backup. When using an include statement, it is possible to assign a "management class" which defines how long the objects that are backed up by the client should be retained once they have been transferred to the server. Additionally it defines how many versions are to be retained and for how long. If a management class is specifically defined on the client, this will be used. If however no management class is defined on the client, a default management class defined on the server that the client is connected to will be used.
The main client backup information file contains the backup statistics resulting from the backup activity of the client. This file can contain more or less information pertaining to the backups depending on how the configuration file is configured. For example, it can contain a listing of every single file or directory backed up, or just a summary of the backup statistics depending on the configuration.
Should a backup encounter any problems, detailed error messages are placed in the specific error information file.
A UNIX client may have up to five files:
• a configuration file which can be edited to configure the various specific SM settings for the respective backup. Unlike the Wintel configuration file it may or may not be used to configure objects that should be included or excluded;
• if it is not used, a separate include/exclude file is used to define these items;
• a server identification file which is primarily used to define which server the client should be connected to;
• a main client backup information file, which has the same function as described above with reference to the Wintel main client backup information file; and
• a specific error information file, which has the same function as described above with reference to the Wintel specific error information file.
The method used by the exemplary SM to perform a backup is known as "incremental". An incremental backup works by initially inspecting "objects". Objects are directories and files that reside on the file system of a node. What should be inspected is based on the include and exclude statements defined in the respective configuration files. Following the inspection phase, the inspected objects are compared with those that were previously backed up to see if they already exist or have been modified. If an object is new or has been modified since the last backup, it will be backed up and the data sent to the server. All other objects will not be backed up.
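The inspection and comparison phases just described can be sketched as follows. This is a simplified illustration only: the shell-style patterns, the use of a modification stamp as the change indicator, and the function names `inspect` and `select_for_backup` are assumptions, and a real SM's include/exclude semantics are richer than this.

```python
from fnmatch import fnmatchcase


def inspect(objects, include, exclude):
    """Inspection phase: keep objects matching an include pattern and no exclude pattern."""
    return {
        path: stamp
        for path, stamp in objects.items()
        if any(fnmatchcase(path, pat) for pat in include)
        and not any(fnmatchcase(path, pat) for pat in exclude)
    }


def select_for_backup(inspected, previous):
    """Comparison phase: an object is sent if it is new or has changed since the last backup."""
    return sorted(
        path
        for path, stamp in inspected.items()
        if path not in previous or previous[path] != stamp
    )


# Hypothetical node contents: path -> modification stamp
current = {
    "/home/a.txt": 100,
    "/home/b.txt": 200,
    "/home/c.txt": 400,
    "/home/scratch.bin": 500,
    "/tmp/cache.bin": 300,
}
# What the previous backup recorded
previous = {"/home/a.txt": 100, "/home/b.txt": 150}

inspected = inspect(current, include=["/home/*"], exclude=["*.bin"])
to_send = select_for_backup(inspected, previous)
```

In this example three objects are inspected, of which one is unchanged, one is modified and one is new, so two objects are sent to the server.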
Under certain conditions, some caused by the node and some due to the way in which SMs function, it can be that a backup job reported as "successful" by the client and subsequently reported to the server, is not actually successful. This means it is possible that not all data that should potentially have been backed up, was backed up. In some cases it is only possible to detect and thus prevent the recurrence of such a critical event by reviewing the data (client information) located on the node.
To overcome this problem, the system to be described below, through the utilisation of information obtained directly and automatically from the node only, makes it possible via a secure Internet website to gain, at a glance, an overview of a backup environment, large or small, or simply view backup information pertaining to a single backup. The option of receiving an e-mail or SMS or both makes certain that valuable information that may only otherwise be gleaned from browsing the website is not overlooked. The system enables a user to be actively or passively informed, and thus to proactively react against potential critical situations based on key client information (the only realistic data source that should be considered when monitoring SM backups) that, left undetected, could lead to data loss. EBC utilises only client information found on the respective nodes. Client log files are obtained from every monitored node.
With reference to Figure 2 an explanation of the operation of EBC will now follow.
A UNIX node 12c has an SM client installed on it. The arrow 20 pointing from a home server 22 to the node 12c indicates that the log files are being collected from the node 12c by the home server 22.
A Wintel node 12d has an SM client installed on it. The arrow pointing from the node 12d to the home server indicates that the log files are being sent from the node 12d to the home server 22.
For both Wintel and UNIX nodes, either method of log file collection or delivery can be used. The same methods can be applied to any node running an alternative operating system other than Wintel or UNIX.
Once all the relevant log files have been centrally consolidated on the home server 22, the parser applies initial parsing logic to them. The resulting information pertaining to the latest backup of each respective node (where the required client log files were available) is subsequently entered into a file called the summary log. The summary log file forms the basis of information used to monitor and report on the backups. Within this file it is already possible to glean the basic status of most jobs, for example (a) failed, (b) still running or (c) successful.
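As a rough illustration of how a summary of the latest backup per node might be assembled, the sketch below assumes that the client logs have already been parsed into simple records. The record layout (`node`, `started`, `status`) and the function name `build_summary` are hypothetical; the actual log format and parsing logic are not specified here.

```python
from datetime import datetime


def build_summary(records):
    """Keep only the most recent backup record for each node."""
    latest = {}
    for rec in records:
        node = rec["node"]
        if node not in latest or rec["started"] > latest[node]["started"]:
            latest[node] = rec
    return latest


# Hypothetical records obtained from the consolidated client logs
records = [
    {"node": "unix-01", "started": datetime(2009, 7, 6, 22, 0), "status": "successful"},
    {"node": "unix-01", "started": datetime(2009, 7, 7, 22, 0), "status": "failed"},
    {"node": "win-02", "started": datetime(2009, 7, 7, 23, 0), "status": "running"},
]

summary = build_summary(records)
```

Only the newest record per node survives, which mirrors the "latest backup of each respective node" content of the summary log.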
It is important however that the true backup status of a job is never based solely on this file, as only the information that is ultimately presented on the website portrays the real status.
As well as the summary log, a second file is generated called the missing nodes log. If the log files from a node were once available but for whatever reason became unavailable, the name(s) of the missing node(s) appear(s) in this file. This file is produced as part of the initial parsing on the home server.
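The missing nodes log amounts to a set difference between the nodes whose logs were seen in the past and those whose logs are available now. A minimal sketch, with a hypothetical function name and node names:

```python
def missing_nodes(previously_seen, available_now):
    """Nodes whose logs were once available but are absent from the current run."""
    return sorted(set(previously_seen) - set(available_now))


# Hypothetical node names
seen_before = {"unix-01", "win-02", "win-03"}
seen_today = {"unix-01", "win-03", "win-04"}

missing = missing_nodes(seen_before, seen_today)
```

A node that appears for the first time (here `win-04`) is not reported, since it was never previously available.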
Once the summary log file and the missing node log are complete, they are subsequently transferred via e-mail, File Transfer Protocol (FTP) or Secure File Transfer Protocol (SFTP) over the world wide web 24 to a data warehouse and web server 26.
The files once automatically loaded into the data warehouse 26 are further parsed with the resulting data uploaded to the website which is then available to an EBC service provider 28. Additionally, depending on how the user has configured his or her account, they will receive e-mails and SMS (text) messages detailing the specific backup failures and warnings.
Some details of the website will be provided by way of example rather than limitation and with reference to Figures 3 to 8. Website access requires a valid account (e-mail address) and password. The initial website view seen in Figure 3 offers several viewing options from a backup environment overview to detailed node backup information. It is also possible to generate reports as well as export data to a commercial spreadsheet package.
The backup environment summary (Figure 4) enables one to view the backup results of the complete environment at a glance, thus gleaning the most important statistics such as the number of backups that are failed, unknown or have possible warnings associated with them.
The user may create a "Personal Backup View", selecting the particular backup information required, including alerts, the personal backup view being subsequently automatically provided to that particular user.
If one clicks on the value associated with a specific item, for example value 3 in Figure 4 associated with "failed backups", a summary of only the corresponding data - in this example backup jobs that failed - is produced as can be seen in Figure 5. This option makes it easy for one to view only what is of interest, preventing having to scroll up and down the page in order to view pertinent information.
In Figure 9, information is given relating to backup jobs on a single node. The status of the backup job can be seen to be "failed". By clicking on the node name, the user obtains the more detailed textual information shown toward the bottom of the figure including the reason why the backup failed - in this case, "Backup using Microsoft volume shadow copy failed" (Microsoft is a registered trade mark). Only the client contains this information, and being able to view this information via the website saves wasted time and effort that would normally have to be invested by connecting to the node.
Either through the selection of a summary view via the backup environment summary, or by scrolling down the page, one is immediately able to view basic backup information and the corresponding status for the backup job. A typical example of the type of information available can be seen under Figure 6 that has the status "failed."
If one clicks where indicated on Figure 6, i.e. on "backup status", be it successful, failed, unknown or running, up to a thirty-day (or backup) historical view is available (Figure 7). This view enables the trend of the backups for a specific node to be monitored. This can help identify and rectify certain backup problems that may appear once a week for example, that would normally go undetected. Taking the example that someone would like a restore of a specific object from a date more than thirty days ago, prior to performing a restore it is important to check that the backup for this particular date was successful. EBC offers a "point in time" option that makes it possible to specify a date and view the respective backup information, and thus to establish whether or not the restore is possible (as long as there was data available at this time). This feature is also very useful in an audit situation where one must demonstrate that a particular backup (chosen at random by an auditor) was successful and most importantly that a record exists.
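Both uses of the historical data, spotting a weekly recurring failure and checking a point in time before a restore, can be sketched over a simple per-node history. The dictionary layout and the function names below are assumptions made for illustration.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def failures_by_weekday(history):
    """Count failed backups per weekday to expose weekly recurring problems."""
    return Counter(
        WEEKDAYS[day.weekday()]
        for day, status in history.items()
        if status == "failed"
    )


def restore_possible(history, day):
    """Point-in-time check: a record must exist for the date and be successful."""
    return history.get(day) == "successful"


# Hypothetical history for one node: date -> backup status
history = {
    date(2009, 7, 6): "failed",
    date(2009, 7, 7): "successful",
    date(2009, 7, 8): "successful",
    date(2009, 7, 13): "failed",
    date(2009, 7, 14): "successful",
    date(2009, 7, 20): "failed",
}

weekly = failures_by_weekday(history)
```

In this made-up history every failure falls on a Monday, the kind of pattern that a trend view makes visible and a single-day check does not.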
Graphs and charts can be generated to help to identify trends and identify problem areas. As well as this they can be used for audits and for demonstrating Service Level Agreement (SLA) compliance. EBC offers several graph options specific for an individual node, service pack Wintel, UNIX or covering the complete environment. For example Figure 8 shows a graph of the outcomes of backup operations - successful, failed, etc. - over a consecutive sequence of backups.
In a conventional SM system, if a backup job fails for any reason, and due to the fact that the communication between the client and the server is in the form of return codes, detailed information pertaining to the reason for the backup failure can only be found on the node where the client is installed. In order to find out exactly what happened with the backup, one must first connect to the node in order to view the information. One can only access this information if one has the relevant access rights. This activity is time consuming. However it is the only realistic way of identifying exactly what the cause of the backup failure was.
EBC enables one to view detailed failure information pertaining to a backup via the website, thus saving time and effort connecting to the node. It also has the advantage that it allows people to view the information who would normally not have the correct rights to view the information on the node. Armed with this detailed information it is then possible to take steps to prevent a further failed backup.
Occasionally the file system of a node may become corrupt. In this event, when the SM backup encounters the point of the corrupt file system during the inspection phase, the inspection can simply stop, and only a backup of the objects inspected thus far is performed. Barring any other backup issues that may arise, the SM can subsequently - and erroneously - proclaim the backup as "successful". Left undetected, this can have devastating consequences as depending at what level the corrupt file system was located it could mean that only a small percentage of the objects to be backed up were actually backed up, thus making the restore of any affected objects impossible.
In order to immediately identify this problem, if there is a decrease of ten percent or more in the value for "objects inspected" between the last backup and the current one, a warning appears on the website highlighting the value. Additionally, an e-mail and/or an SMS can be sent to proactively inform the relevant parties.
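The ten percent rule can be expressed as a small check. The function name and the treatment of a zero previous value are assumptions; integer arithmetic is used so that a drop of exactly ten percent is reliably flagged.

```python
def inspected_drop_warning(previous, current, threshold_percent=10):
    """True if 'objects inspected' fell by threshold_percent or more since the last backup."""
    if previous <= 0:
        return False  # no meaningful baseline to compare against (assumption)
    return (previous - current) * 100 >= threshold_percent * previous


# Hypothetical values for two consecutive backups of one node
warn = inspected_drop_warning(previous=50_000, current=1_200)
```

A backup that stops early on a corrupt file system would typically show this kind of sharp fall in the inspected count even though the job itself reports success.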
In the SM, the server 10 and the client 12 communicate the status of a backup job via "return codes". Particular return code values may for example indicate
(a) all backup operations on the client completed successfully;
(b) the backup operation completed successfully but some files were not processed (a common outcome in practice, e.g. because files were in use by another application and so were inaccessible, or were deleted between the inspect operation and the subsequent backup);
(c) the backup operation completed with at least one warning message.
The value of the return code for any respective backup job is based primarily on the successful completion or failure of the scheduled job that starts the backup on the client. Subsequently, additional information pertaining to the respective backup such as the number of objects failed is taken into account.
As an example, the schedule defined on the client starts and successfully completes and no objects fail, the server classifies the respective backup job as return code "0" or completely successful and attributes the corresponding return code. If the schedule for a backup job on a client starts and ends without an error, but one or more objects are reported as failed, the job is given a return code indicating that the scheduled job was successful, but one or more objects failed. In a worst-case scenario, every single object could fail for whatever reason during the backup, SM can declare the schedule that started the backup as successfully run although one or more objects were not backed up. This is because it is normal for some objects to fail, e.g. for one of the reasons just mentioned. Because outcome (b) above is a common occurrence, such a job (as long as the schedule started and completes successfully) is normally classed as successful by administrators and no further action is taken. With regard to the "worst case scenario", if no further action were taken and it were left undetected, it would not be possible to restore any of the objects that were affected by the respective backup.
To ensure that EBC guards against this situation if it does arise, an "objects failed" threshold warning highlighting the value appears on the website if more than a threshold number of objects (e.g. one hundred; the number is configurable depending on the environment) are listed as failed. Once again, an e-mail or SMS can be sent to proactively inform the relevant parties.
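A threshold check of this kind might look like the following sketch. The function name, the message wording and the default of one hundred (taken from the example figure in the text) are assumptions for illustration only.

```python
from typing import Optional


def objects_failed_warning(objects_failed: int, threshold: int = 100) -> Optional[str]:
    """Return a warning message if more than `threshold` objects failed, else None."""
    if objects_failed > threshold:
        return (
            f"Objects failed threshold exceeded: {objects_failed} objects failed "
            f"(threshold {threshold})"
        )
    return None
```

The returned message could then be shown on the website or passed to an e-mail or SMS sender; exactly one hundred failed objects does not trigger the warning, because the text specifies "more than" the threshold.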
A management class is an entry that may be included in the server identification file or the include/exclude file located locally on the respective node. This entry defines how "objects" are stored and managed within the SM storage infrastructure. The main purpose of the management class is to define how many versions of an object should be stored and for how long once it has been backed up.
If a management class is incorrectly defined, objects backed up will have the "default" management class rules applied to them. The administrator defines the default management class on the server. If the default management class is not defined correctly or accurately to reflect the specific requirements of the objects backed up, it could have serious consequences. This could lead to a situation where objects that should normally be available for restore are not, as they have not been backed up or stored for the correct amount of time.
Information pertaining to the fact that the management class defined in the server identification or include/exclude files located on the node is invalid can be found in the specific error information file located on the node. This information may or may not be communicated to the respective server. In the event that it is communicated to the server, it may well be overlooked.
If an invalid management class is defined, a warning will appear on the website highlighting the issue. As before, an e-mail and/or an SMS can be sent to proactively inform the relevant parties.
Occasionally the SM may report a backup as "successful" although nothing at all has been backed up. There are two possible reasons for this. The first is that no new objects were created or modified since the last backup. The second is a configuration error in the configuration file. To ensure that this potentially critical situation does not go unnoticed, EBC generates a warning on the website indicating that although the backup was successful, 0 bytes of information were backed up. E-mail and/or SMS alerts are available for this warning.
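As a minimal sketch, the zero-byte check reduces to a single condition on two values taken from the parsed client information. The function and parameter names are hypothetical.

```python
from typing import Optional


def zero_byte_warning(reported_success: bool, bytes_backed_up: int) -> Optional[str]:
    """Warn when a backup is reported as successful but transferred no data."""
    if reported_success and bytes_backed_up == 0:
        return "Backup reported as successful, but 0 bytes were backed up"
    return None
```

The check cannot tell the two causes apart: an unchanged client and a misconfigured one both produce the same warning, so the user still has to inspect the node.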
It is normal that not all objects (files) can be backed up and there are several reasons why this might be the case, one of which is that the object was in use during the backup. It is however important to review the failed objects and make a special backup if it is deemed necessary. Via the website one is able to view the objects (files) that were not backed up.
If an object fails to be backed up three or more times in succession, a warning appears on the website highlighting the value for "failed objects". If the user clicks on the warning they are able to view the affected objects and thus take the necessary action. This warning comes with an e-mail and/or SMS alerting option.
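One way to detect repeated failures is to track, for each object, its current streak of consecutive failed backups. The sketch below assumes the failed objects of each backup run are available as a set of names, ordered oldest run first; that input format and the function name are assumptions, not something the text specifies.

```python
from typing import Dict, List, Set


def repeatedly_failed_objects(
    failed_per_run: List[Set[str]], limit: int = 3
) -> Dict[str, int]:
    """Map each object to its current failure streak if the streak is >= limit."""
    streaks: Dict[str, int] = {}
    for failed in failed_per_run:  # oldest run first
        # An object absent from this run's failures loses its streak,
        # because only the names in `failed` are carried forward.
        streaks = {name: streaks.get(name, 0) + 1 for name in failed}
    return {name: count for name, count in streaks.items() if count >= limit}
```

For example, with runs `[{"a", "b"}, {"a"}, {"a", "c"}]` only `"a"` has failed three times in succession, so only `"a"` would be listed under the warning.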
As the SM performs an incremental backup, it is very rare that a backup job runs for longer than twenty-four hours. Apart from the initial backup, not all data is backed up every time a backup is performed. Only new objects or objects that have been modified since the last backup are indeed backed up, which is why a backup is normally complete in under twenty-four hours. In the event that a backup does run longer than twenty-four hours, it could indicate a problem, e.g. the backup session between the client and the server may be hanging. With this in mind, a warning for this event is available via the website and is available with an e-mail and/or SMS alert.
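The long-running check can be sketched as a comparison of elapsed time against a limit. The function and parameter names are hypothetical, and the sketch assumes a start timestamp (and, for finished jobs, an end timestamp) can be extracted from the client information.

```python
from datetime import datetime, timedelta
from typing import Optional


def long_running_warning(
    started: datetime,
    now: datetime,
    finished: Optional[datetime] = None,
    limit: timedelta = timedelta(hours=24),
) -> bool:
    """Return True if the backup has run, or ran, for longer than `limit`."""
    # A job that is still running is measured up to the current time.
    end = finished if finished is not None else now
    return end - started > limit
```

Measuring a still-running job up to the current time means a hanging session is flagged as soon as it passes the limit, rather than only once it finally ends.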

Claims

1. A computer system comprising a storage manager server and multiple clients, the storage manager server being adapted to receive data objects to be backed up from the clients and to store them in at least one mass storage device, and the clients being adapted to create client information comprising a log of status of backup jobs carried out by the client, the system being characterized in that it further comprises a home server adapted to receive client information from the client, to parse the client information, and so to provide a user with backup information relating to the success, failure or other status of backup operations on the computer system, the system comprising a processor configured to compare a backup value in a current backup operation with a corresponding backup value from a previous backup operation and to trigger a warning if the backup value in the current backup operation is smaller than the corresponding backup value from the previous backup operation by more than a defined margin.
2. A computer system as claimed in claim 1 in which the backup values are numbers of objects backed up in the respective backup operation.
3. A computer system as claimed in claim 1 or claim 2 in which the margin is defined to be a fraction of the backup value in either the current backup operation or the previous backup operation.
4. A computer system as claimed in any of claims 1 to 3 in which the backup process involves inspecting data objects held by the client to determine whether they need to be backed up.
5. A computer system as claimed in any preceding claim in which the said comparison is of the number of objects backed up for a client in the current backup operation with the number of objects backed up for the same client in the previous backup operation.
6. A computer system as claimed in claim 4 in which the comparison is made for every client.
7. A computer system as claimed in any preceding claim, further comprising means for automatically dispatching a message to a user in response to a predefined criterion relating to the backup information.
8. A computer system as claimed in claim 7 in which the message is dispatched by e-mail or by the short messaging service.
9. A computer system as claimed in any preceding claim in which the backup information, or part of it, is made available to users via a website.
10. A computer system as claimed in claim 9 further comprising a website host configured to receive the backup information, or part of it, from the home server and to make it available through the website.
11. A computer system as claimed in any preceding claim in which a warning signal is provided in the event that the number of data objects whose backup fails exceeds a predetermined threshold.
12. A computer system as claimed in any preceding claim in which a warning signal is provided in the event that a backup operation is not completed within a predetermined time period.
13. A computer system as claimed in any preceding claim in which a warning signal is provided in the event that no data is backed up in a backup operation.
14. A computer system as claimed in any preceding claim in which a warning signal is provided in the event that backup of an object fails more than a predetermined number of times.
15. A computer system as claimed in any preceding claim in which the home server receives, parses and collates the client information and thereby creates a summary log file relating to a system backup operation.
16. A computer system as claimed in claim 15 in which backup jobs are categorized in categories including (a) successful, (b) failed, (c) still running, (d) unknown backups, (e) expired nodes, (f) warnings and (g) backups not required.
17. A computer system as claimed in any preceding claim in which historical data relating to multiple past backup operations is stored and made available to the user.
18. A computer system as claimed in any preceding claim in which a missing nodes log is created, which contains a list of system nodes which were once available and which subsequently become unavailable.
19. A computer system as claimed in claim 18 in which a previously unavailable system node which comes online once again automatically appears normally on the system, and a node which remains offline remains in the missing nodes log until it is removed by a user, thereafter appearing on the system as "expired" for a defined period prior to vanishing.
20. A method of monitoring backup activity on a computer system comprising a storage manager server which receives data objects to be backed up from multiple clients in the system and which stores the data objects in at least one main storage device, the clients creating client information comprising respective logs of backup jobs which they carry out, the method of monitoring comprising providing a home server, transferring client information from the clients to the home server, parsing the client information received by the home server, and providing a user with backup information relating to the success, failure or other current backup status of a node, the method further comprising comparing a backup value in a current backup operation with a backup value from a previous backup operation and providing a warning if the backup value in the current backup operation is smaller than the backup value from the previous backup operation by more than a defined margin.
21. A method as claimed in claim 20 in which the backup values are numbers of objects backed up in the respective backup operation.
22. A method as claimed in claim 20 or in claim 21 in which the margin is defined to be a fraction of the backup value in either the current backup operation or the previous backup operation.
23. A method as claimed in any of claims 20 to 22 in which the backup process involves inspecting data objects held by the client to determine whether they need to be backed up.
24. A method as claimed in any of claims 20 to 23 in which the said comparison is of the number of objects backed up for a client in the current backup operation with the number of objects backed up for the same client in the previous backup operation.
25. A method as claimed in claim 24 in which the comparison is made for every client.
26. A method as claimed in any of claims 20 to 24 in which a message is dispatched to a user in response to a predefined criterion relating to the backup information.
27. A method as claimed in claim 26 in which the message is dispatched by e-mail or by the short messaging service.
28. A method as claimed in any of claims 20 to 27 in which the client backup information, or part of it, is made available to users via a website.
29. A method as claimed in any of claims 20 to 28, further comprising providing a warning signal based upon a predefined criterion relating to the backup information.
30. A method as claimed in claim 29 in which the warning signal is provided in the event that the number of data objects whose backup fails exceeds a predetermined threshold.
31. A method as claimed in claim 29 or claim 30 in which a warning signal is provided in the event that a backup operation is not completed within a predetermined time period.
32. A method as claimed in any of claims 29 to 31 in which a warning signal is provided in the event that no data is backed up in a given backup operation.
33. A method as claimed in any of claims 29 to 32, in which a warning signal is provided in the event that backup of an object fails more than a predetermined number of times.
34. A method as claimed in any of claims 20 to 33 comprising parsing and collating the client information to create a summary log file relating to a system backup operation.
35. A method as claimed in claim 34 in which backup jobs are categorized in categories including (a) successful, (b) failed, (c) still running, (d) unknown backups, (e) expired nodes, (f) warnings and (g) backups not required.
36. A method as claimed in any of claims 20 to 35 in which historical data relating to multiple past backup operations is stored and made available to the user.
37. A method as claimed in any of claims 20 to 36 in which a missing nodes log is created, which contains a list of system nodes which were once available and which subsequently became unavailable.
38. A computer program product for running on a home server of a computer system in which the home server is in communication with multiple clients, the computer program product comprising instructions which cause the home server to receive from the clients respective client information files comprising logs of backup jobs carried out by the clients, to parse the client information files, and to output for a user backup information relating to the success, failure or other current backup status of a node.
39. A computer program product as claimed in claim 38 further comprising instructions for causing the home server to output some of the client backup information to a host for provision to a user through a website.
40. A computer program product as claimed in claim 38 further comprising instructions which cause the home server to apply criteria to the backup information to establish whether a user warning is required, and for dispatching a warning to the client in the event that the criteria are satisfied.
41. A computer program product as claimed in claim 40 in which the warning is dispatched by e-mail or by the short messaging service.
42. A computer system as claimed in any of claims 1 to 19 which is adapted to provide a warning in the event that a management class on the client is invalid.
43. A computer system as claimed in any of claims 1 to 19 adapted in the event that the current backup is reported by the backup system as failed to make a synopsis indicative of the cause of failure available via the short messaging service and/or via e-mail and/or via the website.
PCT/GB2009/050904 2008-07-22 2009-07-22 Monitoring of backup activity on a computer system WO2010010393A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1101317.4A GB2474790B (en) 2008-07-22 2009-07-22 Monitoring of backup activity on a computer system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0813397.7 2008-07-22
GB0813397A GB0813397D0 (en) 2008-07-22 2008-07-22 Monitoring of backup activity on a computer system

Publications (1)

Publication Number Publication Date
WO2010010393A1 (en) 2010-01-28

Family

ID=39737447

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2009/050904 WO2010010393A1 (en) 2008-07-22 2009-07-22 Monitoring of backup activity on a computer system

Country Status (2)

Country Link
GB (2) GB0813397D0 (en)
WO (1) WO2010010393A1 (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999023562A1 (en) * 1997-11-03 1999-05-14 Gateway, Inc. Automatic backup based on disk drive condition
EP0945800A1 (en) * 1998-03-02 1999-09-29 Hewlett-Packard Company Data backup system
US20040030721A1 (en) * 2000-10-17 2004-02-12 Hans-Joachim Kruger Device and method for data mirroring
US20050228836A1 (en) * 2004-04-08 2005-10-13 Bacastow Steven V Apparatus and method for backing up computer files
EP1635244A2 (en) * 2004-09-09 2006-03-15 Microsoft Corporation Method, system, and apparatus for creating an archive routine for protecting data in a data protection system
US20060136360A1 (en) * 2004-12-22 2006-06-22 Alexander Gebhart Preserving log files in a distributed computing environment
US20070061385A1 (en) * 2003-05-06 2007-03-15 Aptare, Inc. System to manage and store backup and recovery meta data
US20070198593A1 (en) * 2005-11-28 2007-08-23 Anand Prahlad Systems and methods for classifying and transferring information in a storage network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"IBM Tivoli Storage Manager for System Backup and Recovery (SysBack) Version 6.1 Installation and User's Guide, pp. vii, viii, 19-68, 313-328, 347-350", November 2007, IBM, XP002550871 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9471576B1 (en) 2011-12-08 2016-10-18 Veritas Technologies Systems and methods for providing backup storage interfaces
US10684921B1 (en) * 2011-12-08 2020-06-16 Veritas Technologies Llc Systems and methods for navigating backup configurations
EP3528123A1 (en) * 2018-02-16 2019-08-21 Wipro Limited Method and system for automating data backup in hybrid cloud and data centre (dc) environment
US10824514B2 (en) 2018-02-16 2020-11-03 Wipro Limited Method and system of automating data backup in hybrid cloud and data centre (DC) environment
CN110245045A (en) * 2019-05-23 2019-09-17 平安科技(深圳)有限公司 A kind of keyword alarm method and device based on log
CN110245045B (en) * 2019-05-23 2022-06-07 平安科技(深圳)有限公司 Keyword warning method and device based on log
US11645164B2 (en) 2021-08-11 2023-05-09 International Business Machines Corporation Adjusting data backups based on system details

Also Published As

Publication number Publication date
GB2474790B (en) 2012-12-19
GB0813397D0 (en) 2008-08-27
GB2474790A (en) 2011-04-27
GB201101317D0 (en) 2011-03-09

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09785378

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 1101317

Country of ref document: GB

Kind code of ref document: A

Free format text: PCT FILING DATE = 20090722

WWE Wipo information: entry into national phase

Ref document number: 1101317.4

Country of ref document: GB

122 Ep: pct application non-entry in european phase

Ref document number: 09785378

Country of ref document: EP

Kind code of ref document: A1