US20140258315A9 - Method And Process For Enabling Distributing Cache Data Sources For Query Processing And Distributed Disk Caching Of Large Data And Analysis Requests - Google Patents


Info

Publication number
US20140258315A9
US20140258315A9 (Application US13/943,187)
Authority
US
United States
Prior art keywords
data
pneuron
distributed
disk
cache
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US13/943,187
Other versions
US20140012867A1 (en)
US9542408B2 (en)
Inventor
Simon Byford Moss
Elizabeth Winters Elkins
Douglas Wiley Bachelor
Raul Hugo Curbelo
Thomas C. Fountain
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
UST Global Singapore Pte Ltd
Original Assignee
Pneuron Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US12/870,348 external-priority patent/US9659247B2/en
Priority claimed from US13/442,353 external-priority patent/US9558441B2/en
Priority to US13/943,187 priority Critical patent/US9542408B2/en
Application filed by Pneuron Corp filed Critical Pneuron Corp
Publication of US20140012867A1 publication Critical patent/US20140012867A1/en
Publication of US20140258315A9 publication Critical patent/US20140258315A9/en
Priority to US15/401,658 priority patent/US10684990B2/en
Publication of US9542408B2 publication Critical patent/US9542408B2/en
Application granted granted Critical
Assigned to UST Global (Singapore) Pte. Ltd. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: PNEURON CORP.
Assigned to CITIBANK, N.A., AS AGENT SECURITY INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: UST GLOBAL (SINGAPORE) PTE. LIMITED
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06F17/30132
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/172Caching, prefetching or hoarding of files
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/182Distributed file systems
    • G06F16/184Distributed file systems implemented as replicated file system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/067Enterprise or organisation modelling


Abstract

A method and system for large data and distributed disk cache processing in a Pneuron platform 100. The system and method include three specific interoperable but distributed functions: the adapter/cache Pneuron 14 and distributed disk files 34, a dynamic memory mapping tree 50, and distributed disk file cleanup 28. The system allows for large data processing and the ability to access and acquire information from large data files 102 and rapidly distribute and provide the information to subsequent Pneurons 104 for processing. The system also provides the ability to store large result sets, to deal with sequential as well as asynchronous parallel processing, to address large unstructured data such as web logs, email, and web pages, and to handle failures in large block processing.

Description

  • This application is a continuation-in-part of U.S. patent application Ser. No. 12/870,348 filed on Aug. 27, 2010 and entitled "System and Method For Employing The Use Of Neural Networks For The Purpose Of Real-Time Business Intelligence And Automation Control"; and a continuation-in-part of U.S. patent application Ser. No. 13/442,353 filed on Apr. 9, 2012 and entitled "Legacy Application Migration To Real Time, Parallel Performance Cloud"; and claims the benefit of U.S. Provisional Patent Application No. 61/672,028 entitled "A Method And Process For Enabling Distributing Cache Data Sources For Query Processing And Distributed Disk Caching Of large Data And Analysis Requests", which was filed on Jul. 16, 2012, all of which are incorporated herein by reference.
  • TECHNICAL FIELD
  • The present invention relates to enabling distributing cache data sources for processing large data and analysis requests and more particularly, relates to providing a distributed caching model to enable the management of distributed cache files on multiple servers or virtual machines and facilitating multiple distributed processing operations simultaneously.
  • BACKGROUND INFORMATION
  • Accessing multiple geographically dispersed systems and large datasets, and operating on this information to perform multiple simultaneous operations, is very difficult. Combining and federating distributed operation results together compounds the problem. Most companies utilize an aggregated data warehouse with multiple feeder data sources and extraction, transformation, and loading (ETL) routines to organize distributed data together. The data preparation cost and time are significant.
  • Therefore, what is needed is a distributed cache evaluation and processing model that operates across multiple servers simultaneously. The system should function such that multiple analytic and business operations occur, while the system should also enable sampling and evaluation with collection and recording of results. Furthermore, the invention should provide for distributed cache creation and orchestration of coordinated distributed data access and generation of iteration results from other distributed applications. All distributed cache file operations should be coordinated together into unified processing models.
  • SUMMARY OF THE INVENTION
  • The system and method of the present invention implements an Adapter Pneuron that interacts within its distributed processing infrastructure for large data processing. The Adapter Pneuron enables the real-time acquisition of data from different types of application data sources, including service application programming interfaces (APIs), databases, and files. Data is acquired and transformed into self-describing ASCII disk cache files with an associated definition of the structure. The disk cache files are distributed across one to many servers or virtual machines (VMs). The distributed disk cache files are accessed by participating Pneuron applications to perform operations selectively on the distributed disk data. Multiple operations are performed simultaneously by the different Pneurons, with results evaluated and subsequent iteration operations applied. Evaluated results are concatenated and federated together across the different disk cache files simultaneously.
  • Disk cache files are removed automatically using a high-low disk evaluation model to remove disk cache files based on server disk utilization and automatic evaluation aging for disk cache files. The present invention enables the ability to quickly access target systems and data sources and generate distributed disk cache files, to simultaneously perform real-time operations by other Pneuron programs and to federate the results together. These activities occur without requiring preparation of the data.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • These and other features and advantages of the present invention will be better understood by reading the following detailed description, taken together with the drawings wherein:
  • FIGS. 1A-1B are a comparison of the prior art process execution with the distributed cache model according to one embodiment of the present invention;
  • FIG. 2 is an overview of the dynamic memory mapping tree according to one embodiment of the present invention;
  • FIG. 3 is an overview of distributed disk cache removal model scenarios according to one embodiment of the present invention; and
  • FIG. 4 is a block diagram of a system on which may be implemented the present invention.
  • DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
  • The present invention features a system and method for large data processing and request reconstruction. The system 100 (FIG. 4) and method include the capacity for large data processing (targeting records, queries, and responses of one million and more results). The invention provides the ability to access and acquire information from large data files 102 (greater than one million records) and rapidly provide the information to subsequent Pneurons for processing, combined with the ability to extract and render large queries from databases 102 without impacting system-of-record processing. The system also provides multi-threaded processing by multiple distributed pneurons 104 of large input/record-set files, storage of and access to large historical results, and the ability to handle large inputs. The invention provides the ability to store or persist large result sets. For example, evaluating a million-plus raw data records may generate a very large array of intelligence results that need to be persisted for future use, as can occur with time-series data spanning multiple month-years with multiple intelligence results for each intelligence record. Further, the invention is able to deal with sequential as well as asynchronous parallel processing, to address large unstructured data such as web logs, email, and web pages, and to handle failures in large block processing.
  • The design considerations of the present invention are focused on maximizing distributed processing workload (volumes, results, and requests) without running out of resources, e.g., hardware resources including memory and CPU. The solution consists essentially of three specific interoperable but distributed functions: first, the Adaptor/Cache Pneuron 14 and distributed disk cache files 30, 32; second, the dynamic mapping tree 50, FIG. 2; and third, the distributed disk cache file cleanup 28, FIG. 3. Each function is described in greater detail below.
  • The Adaptor/Cache Pneuron 14 (and/or distributed adaptor/cache pneurons 104) and distributed disk cache files 34 address the problem of extremely large record set processing, which presents several technology challenges. Current problems in the prior art include: loading all information into memory will exceed hardware server resources; breaking up large requests presents complexities in consolidating and synchronizing the information results together; and multiple operations may be required at different times by different programs across one or more large record sets.
  • The present invention solves these problems by extracting large record sets from target systems 102 and data sources and converting them into distributed disk cache files 34. The disk-based intermediate cache files and processing are coordinated by and across multiple Pneurons 104 to perform multiple simultaneous operations on the information (distributed disk cache files 34). FIGS. 1A and 1B compare the prior art process execution (FIG. 1A) with the distributed cache model of the present invention (FIG. 1B).
  • The cache file based system 10 of the present invention stores the large requests within self-describing ASCII files and makes these files (the data within them) available to any Pneuron that needs to access them. Large data requests 12 are received and processed by the Adapter Pneuron 14. The Adapter Pneuron 14 transforms the large data requests into ASCII file content (extended CSV format, including the attribute type definition) and saves the ASCII file content on the local host hard drive. Once a request is received, the Adapter Pneuron 14 sends to all its associated Pneuron connections 104 a special message announcing that new work is available and that the data can be accessed from the referred files at the target disk cache location 30, 32 on the file system. This process performs in the same manner even if the request is composed from multiple batches, thereby allowing the request to be reconstructed. All of the Pneurons interact with this model approach. The Adapter Pneuron 14 maintains context of each distributed cache file and provides system context to each participating Pneuron. Context includes the definition of the cached file format and information elements and the location of the file. Participating Pneurons are able to parse the cached/adaptor Pneuron information and perform different operations.
  • Once the data has been cached, the Adapter Pneuron 14 sends to subsequently connected Pneurons 104 a special message 15 announcing to all configured and associated Pneurons that new work is available and that they can execute their operations on the disk cache files. The system includes a utility that enables the calling Pneurons 104 to transform between XmlMessage and the target extended CSV file format.
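  • The patent does not disclose the exact extended-CSV layout or announcement format. As a rough sketch only, a self-describing cache file might carry a header of name:type pairs, with the work announcement referencing the file rather than carrying the data; every name below is hypothetical:

```python
import csv
import json
from pathlib import Path

# Hypothetical extended-CSV writer: the first row carries "name:type" pairs
# so the file describes its own structure; the exact format used by the
# Adapter Pneuron is not disclosed in the patent.
def write_cache_file(path, schema, records):
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([f"{name}:{typ}" for name, typ in schema])  # self-describing header
        for record in records:
            writer.writerow([record[name] for name, _ in schema])

def announce_message(path, schema):
    # "New work available" notice sent to associated Pneuron connections;
    # it carries only a reference to the disk cache file, never the data itself.
    return json.dumps({"event": "new_work",
                       "cache_file": str(Path(path).resolve()),
                       "schema": dict(schema)})

schema = [("account_id", "int"), ("balance", "float"), ("region", "str")]
records = [{"account_id": 1, "balance": 250.0, "region": "NE"}]
write_cache_file("request_0001.cache.csv", schema, records)
print(announce_message("request_0001.cache.csv", schema))
```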
  • As a result, the invention greatly simplifies access to and operations on the distributed disk cache data and provides a common abstraction layer and interface for the Pneurons to access and perform operations on the data. The Pneurons only need to read the referred file content and transform the information into usable XmlMessage Type data. In addition, the Pneurons can filter and extract only the necessary attributes as vectors or other objects and optimize the memory management resources.
  • This invention therefore provides many critical transformational benefits. The data is accessed and managed at targeted server 106 locations on the respective filing system 30, 32 such that requests do not need to be reconstructed, which saves processing time and reduces complexity. The system's ability to process very large amounts of data is significant and unconstrained. Within the actual in-memory processing, the information is streamlined: only references, common messages, and pointers are included. The distributed cache file model enables a planning mechanism to be implemented to optimize the resources and synchronize distributed cache file access and processing. The messages do not require any complex logical operations that would require the file structure to change. The system is fully capable of handling the CRUD operations (create: add a new entry/record; read: access a record; update: modify a record; delete: remove a record). This solution works for all cases where the entity (the large request as a whole) retains its integrity/structure.
  • The dynamic mapping tree model shown for example in FIG. 2 is implemented to support the Adaptor Pneuron. The memory mapping enables a large data processing request transaction to retain its processing integrity from initiation through completion of an execution. By retaining processing integrity, the representation and all the data characteristics will be retained and accessible during the request life cycle. Data representation defines the meta-data characteristics of the information, including the way that the data is stored on the file system, the number of files, file types, data definition (attribute definition), request references etc.
  • In order to manage the distributed disk caching model, the invention enables the following operations to be performed on the disk cache files: Create (add new records within the large request); Read (access one or more records from the large request); Update (modify the data for one or more records); and Delete (delete one or more records), as sketched below. Given the synchronization and management complexities, the invention restricts the following functions: batching, duplicate batches, and conditional batches.
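  • The patent gives no concrete interface for these operations. A minimal sketch, assuming record-level CRUD over a self-describing CSV cache file (class and method names are illustrative only):

```python
import csv

# Hypothetical record-level CRUD over a disk cache file. Rows are loaded,
# modified, and written back; batch operations are deliberately absent,
# mirroring the restriction described above.
class CacheFile:
    def __init__(self, path):
        self.path = path

    def _load(self):
        with open(self.path, newline="") as f:
            rows = list(csv.reader(f))
        return rows[0], rows[1:]               # self-describing header, data rows

    def _store(self, header, rows):
        with open(self.path, "w", newline="") as f:
            csv.writer(f).writerows([header] + rows)

    def create(self, row):                     # add a new record to the large request
        header, rows = self._load()
        self._store(header, rows + [row])

    def read(self, predicate):                 # access one or more records
        _, rows = self._load()
        return [r for r in rows if predicate(r)]

    def update(self, predicate, change):       # modify matching records
        header, rows = self._load()
        self._store(header, [change(r) if predicate(r) else r for r in rows])

    def delete(self, predicate):               # delete matching records
        header, rows = self._load()
        self._store(header, [r for r in rows if not predicate(r)])

with open("demo.cache.csv", "w", newline="") as f:
    csv.writer(f).writerows([["id:int", "region:str"], ["1", "NE"]])
cache = CacheFile("demo.cache.csv")
cache.create(["2", "SW"])
print(cache.read(lambda r: r[1] == "SW"))      # [['2', 'SW']]
```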
  • To manage the distribution complexity of multiple disk cache files, the invention maintains and adjusts the system context dynamically. This model enables automatic changes to the data representation and structure. A programmatic change history is maintained, which keeps track of the changes applied to the disk cache file(s). This feature enables automatic reconstruction of the disk cache file at any given time to support a Pneuron initiated operation and request. The present invention has implemented a programmatic process to decompose large data set requests into smaller batches. The batches are organized into parallel execution requests and configured as part of the Pneuron Networks definition.
  • A dynamic memory tree map, FIG. 2, is implemented to manage the distributed cache process across multiple Pneurons. The dynamic tree maintains and provides system context for the entire distributed processing model and plan; the entire processing life cycle is maintained. Each node/leaf within the dynamic tree contains a file reference or a position/index reference and points the Pneuron request message to the corresponding memory area. The dynamic memory tree map establishes a breadcrumb trail. Using this approach, the system is able to reconstruct the request with the new values by traversing the memory tree. The system merges and reconstructs the disk cache results based on the specific request. The same logic and approach are also applied for the Large Request Reconstruction, which enables a generic distributed disk cache operation model to be applied at the Pneuron Base Level.
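  • One way to picture the breadcrumb trail, assuming a hypothetical node layout (the in-memory structure is not specified in the patent): each leaf holds a file reference plus a row range, and an ordered traversal reconstructs the original large request.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical dynamic memory tree node: a leaf holds a file reference and a
# position/index into that cache file; interior nodes only group their pieces.
@dataclass
class TreeNode:
    file_ref: Optional[str] = None                 # disk cache file for this piece
    index: Optional[tuple] = None                  # (start_row, end_row) within the file
    children: list = field(default_factory=list)

def reconstruct(node, pieces=None):
    """Depth-first walk collecting (file, range) breadcrumbs in request order."""
    if pieces is None:
        pieces = []
    if not node.children:
        pieces.append((node.file_ref, node.index))
    for child in node.children:
        reconstruct(child, pieces)
    return pieces

root = TreeNode(children=[
    TreeNode(file_ref="cache_a.csv", index=(0, 499)),
    TreeNode(children=[TreeNode(file_ref="cache_b.csv", index=(0, 249)),
                       TreeNode(file_ref="cache_b.csv", index=(250, 499))]),
])
print(reconstruct(root))   # ordered breadcrumbs used to merge the disk cache results
```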
  • The system will apply different solutions based on the context and type of operation. Dead (empty) messages are still sent out through the network. When a batch is split into two or more sub-batches, the sub-batches are flagged; by doing this the system is able to track the messages. The final Pneuron should have a max dead time interval, which represents the time that it will wait for more batches; this time is checked/validated against the last batch arrival time. Each time a batch is split, the characteristic flag is appended with additional information about the split. Example: 1/1-3/15-1/3-6/7-4/4. SPLIT is defined as [Position/Number Of Message/Batch]/[Total Number Of Messages]. Each time a batch request is split, the split information is appended to the current flag; this is done for each split/sub-batch. By the time the message reaches the Final Pneuron, the Pneuron is able to establish the context based on the amount of information that it receives, and the Pneuron is ready to create an execution tree, such as the one detailed in FIG. 2. This approach relies on the fact that when the Final Pneuron receives a batch request, it is able to trace it and complete it (or start it, if it is the first batch from a large request) based on the defined execution tree. Any sub-batch that is received communicates to the Pneuron all of its tree node parents and also the number of "leafs" per split. With this approach the Final Pneuron is able to map out what it should receive, and the information that it receives can be ordered.
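  • The flag grammar above lends itself to a simple parser. A sketch follows, assuming (hypothetically) that the final Pneuron groups sub-batches by their parent split path and checks that every position of the last split level has arrived:

```python
from collections import defaultdict

def parse_flag(flag):
    """'1/1-1/3' -> [(1, 1), (1, 3)]: one (position, total) pair per split level."""
    return [tuple(int(p) for p in part.split("/")) for part in flag.split("-")]

def split_complete(flags):
    """True when, for every parent split path seen, all expected positions
    of the final split level have arrived."""
    seen = defaultdict(set)
    expected = {}
    for flag in flags:
        *parents, (pos, total) = parse_flag(flag)
        key = tuple(parents)                       # shared prefix of earlier splits
        seen[key].add(pos)
        expected[key] = total
    return all(seen[k] == set(range(1, expected[k] + 1)) for k in expected)

batch_flags = ["1/1-1/3", "1/1-2/3", "1/1-3/3"]    # three sub-batches of one split
print(split_complete(batch_flags))                  # True: the split is complete
```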
  • There are scenarios where the requesting Pneuron is unable to interact with the distributed cache disk. Examples include: (1) the target data source or system is not available for access by the Adapter Pneuron and the disk file cache cannot be created; and (2) the file system where the disk cache file is stored is not available. An Idle or Dead Time interval model can be implemented to manage these scenarios, such that the Idle or Dead Time interval establishes a periodic mechanism to compose the message and send it onward (or execute the request). The Idle or Dead Time interval evaluates each pending request, the elapsed time since the last batch was received, and the execution trigger.
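  • A minimal sketch of such a dead-time check, with the interval and clock source as assumptions:

```python
import time

# Hypothetical dead-time watchdog: if no new batch has arrived within
# max_dead_seconds of the last arrival, the pending request is executed
# (or composed and sent onward) with whatever batches are on hand.
def should_trigger(last_batch_arrival, max_dead_seconds, now=None):
    now = time.time() if now is None else now
    return (now - last_batch_arrival) >= max_dead_seconds

last_arrival = time.time() - 120                          # last batch seen two minutes ago
print(should_trigger(last_arrival, max_dead_seconds=60))  # True: fire the request
```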
  • Finally, the distributed disk cache file cleanup portion of the process 28, FIG. 3, provides users with the capability of caching data, within the entire system, on all the hosts 106 that are running the platform (distributed pneuron 104). The cache is a file system 34 based mechanism that transforms and stores the data indefinitely, making it available to one or more worker process Pneurons. Since the invention deals with a highly distributed system that provides value through parallel computing capabilities, all the resources used within this computing process must be available at each host level that takes part in the parallel execution. In doing so, each host owns a copy of each cache data file that it will process. This creates a significant problem: hardware resources, hard drive space in this case, are not unlimited, and since each host must have a local copy of the cached job, the system does not deal with replication (duplicate resources at different host levels).
  • Therefore, the present invention has implemented a High-Low distributed disk cache removal model. The invention configures properties for each host 106 (either a physical server or a virtual server machine). The host Max Available Space property establishes the number of bytes (megabytes or even gigabytes) that can be used by the caching system 34 on that specific server 106. Once this max threshold is reached, the system deletes existing cache files based on the size and age of the distributed cache file. This model eliminates past files and enables new disk files to be established and used. The cache file system is bounded by these rules; in this case the only rule/limitation needed is a maximum level of space that can be used to store the working cache files. This maximum level of space is stored within the Softwarerx.Properties file 36 in the CFG directory, because this is a centralized storage point for all the properties and attributes that must be kept outside of, or cannot be stored within, the database.
  • The following examples are intended to provide details on how the distributed disk file cleanup functions in the present system. In a first example, a save cache data request 38 is received and max space has not been reached on the host server 30/32. In this scenario, a Pneuron issues a request 38 to save data into the cache data file system 34. The request reaches the SAN (Storage Area Network or Cache System/process) 40. The system checks the Max Space configured value 36. The system 28 compares the Max Space with the actual available space on the local hard drive, which is the hard drive where the host system 106 is running, or more exactly where the "cache" directory file system 34 is found. In this first example there is sufficient space to save the information; therefore the system 28 saves the information 42 with the provided data (reference name/file name) in the file system 34.
  • In a second example, a save cache data request is received and max space has been reached. In this scenario, a Pneuron issues a request to save data into the cache data system. The request reaches the SAN (Storage Area Network or Cache System). The system checks the Max Space configured value. The system compares the Max Space with the actual available space on the local hard drive, which is the hard drive where the system is running, or more exactly where the "cache" directory is found. The system determines 44 that there is insufficient space to save the information. The system orders the existing cache data by creation date. Then a loop occurs, which deletes the oldest file 46 and re-checks whether there is sufficient space. The loop ends once sufficient space is cleared or there is nothing else to delete. If the system has sufficient space to save, the information is saved 42 with the provided data (reference name/file name).
  • In a third example, a save cache data request is received and max space has been reached; however, the system is unable to make sufficient space. In this scenario, a Pneuron issues a request to save data into the cache data system. The request reaches the SAN (Storage Area Network or Cache System). The system checks the Max Space configured value. The system compares the Max Space with the actual available space on the local hard drive, which is the hard drive where the system is running, or more exactly where the "cache" directory is found. The system finds that there is insufficient space to save the information. The system orders the existing cache data by creation date. A loop is created, such that the oldest file is deleted and the system re-checks whether there is sufficient space. In this example, the system deletes all old files 46, checks again for sufficient space, and determines that there is not sufficient space and nothing else to delete, thereby ending the loop. The system does not have sufficient space to save, and the system registers a failure.
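  • These three save scenarios reduce to one eviction loop. A sketch follows, assuming a hypothetical byte budget passed in as a parameter (function and parameter names are illustrative; the real system would read the limit from Softwarerx.Properties):

```python
import os

def save_cache_data(cache_dir, name, payload, max_space_bytes):
    """Save payload under cache_dir, evicting oldest files until it fits.
    Returns True on success, False when eviction cannot free enough space."""
    def used():
        return sum(e.stat().st_size for e in os.scandir(cache_dir) if e.is_file())

    needed = len(payload)
    # High-Low removal: delete oldest-first by creation date until there is room.
    while used() + needed > max_space_bytes:
        files = sorted((e for e in os.scandir(cache_dir) if e.is_file()),
                       key=lambda e: e.stat().st_ctime)
        if not files:
            return False                    # nothing left to delete: register a failure
        os.remove(files[0].path)            # remove the oldest cache file
    with open(os.path.join(cache_dir, name), "wb") as f:
        f.write(payload)
    return True

os.makedirs("cache", exist_ok=True)
print(save_cache_data("cache", "request_0001.cache.csv",
                      b"id:int,region:str\n1,NE\n", max_space_bytes=10_000_000))
```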
  • In a fourth example, the system is able to get cache data when a local copy is available. In this scenario, the cache system receives a request 48 to get specific data. This request can be issued by any Pneuron instance that is supposed to use the cached data and needs a reference to the local file copy in order to read and parse/analyze or otherwise utilize the necessary information. The system receives the request to get cache data 48. The system cache process 50 checks whether the cached data is found within the local file system 34. The cache data is found to exist 52 within the local file system, and a reference to the cache data is returned 54. The caller is then able to use the data.
  • In a fifth example, the system is unable to get cache data locally because a local copy is not available. In this scenario, the cache system 30, 32 receives a request to get specific data 48. This request can be issued by any Pneuron instance that is supposed to use the cached data and needs a reference to the local file copy in order to read and parse/analyze or otherwise utilize the necessary information. The system receives the request to get cache data 48. The system cache process 50 checks whether the cached data is found within the local file system 34 a. The system determines that the cache data DOES NOT EXIST within the local file system. The current cache system asks the other registered hosts 32 by calling their associated cache system processes 50 a, which check for the existence of the data. A loop is created, such that the foreign cache file system 34 b of server 32 is checked for the data 56; when the data is found, it is copied locally 58. The loop ends when there are no more hosts/cache systems to search or once the cache data is found. A reference to the cache data is returned 58, and the caller host 30 is then able to use the cached data.
  • In a sixth example, the system is unable to get cache data because a copy is not available anywhere. The cache system receives a request to get specific data. This request can be issued by any Pneuron instance that is supposed to use the cached data and needs a reference to the local file copy in order to read and parse the necessary information. The system receives the get cache data request. The system checks whether the cached data is found within the local file system. The system determines that the cache data DOES NOT EXIST within the local file system. The current cache system asks the other registered hosts by calling their associated cache systems 32 and checking for the data's existence. A loop is created, wherein the system checks each foreign cache system for the data and determines that the data is not found. The loop ends once there are no more hosts/cache systems to check and no cache data has been found. The system determines that the data was not found, and a failure has occurred.
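  • The three retrieval scenarios amount to a local lookup with a foreign-host fallback loop. A sketch under stated assumptions: the registered hosts are represented here by a list of directories, and the copy happens through the file system rather than over the network (the patent does not specify the transport).

```python
import shutil
from pathlib import Path

def get_cache_data(name, local_dir, foreign_dirs):
    """Return a path to a local copy of the cached data, or None on failure.
    foreign_dirs stands in for the other registered hosts' cache systems."""
    local = Path(local_dir) / name
    if local.exists():                      # fourth example: local copy available
        return local
    for foreign in foreign_dirs:            # fifth example: ask each registered host
        candidate = Path(foreign) / name
        if candidate.exists():
            shutil.copy(candidate, local)   # copy locally, then return the reference
            return local
    return None                             # sixth example: not found anywhere
```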
  • In summary, the present invention enables the real-time generation, management, and synchronization of distributed disk caches within a highly distributed processing environment. The process deconstructs and organizes large data sets acquired from disparate systems and data sources across an unlimited number of physical servers and virtual machines. An abstraction layer is applied across all distributed disk cache files. Multiple distributed Pneurons perform simultaneous operations across one or more disk cache files. Processing is synchronized automatically. The system maintains an in-memory mapping tree to manage distributed interactions and provides the ability to dynamically construct and deconstruct the distributed cache files into any form. The distributed cache model enables synchronized federation of selected information from multiple distributed cache files, automatically and as part of the Pneuron processing. The invention allows Pneuron to use existing client disk capacity and to obtain and utilize targeted large data cache files on demand, without preparing aggregated data stores. As a result, businesses benefit by foregoing large data preparation activities.
  • Modifications and substitutions by one of ordinary skill in the art are considered to be within the scope of the present invention, which is not to be limited except by the allowed claims and their legal equivalents.

Claims (3)

The invention claimed is:
1. A method for processing large data and analysis requests, said method comprising the following acts:
acquiring data in real-time from one or more application data sources by a data processing pneuron, said data processing pneuron configured for operating under control of a computer program product, said computer program product comprising a computer program;
transforming said data into self-describing ASCII disk cache files by said data processing pneuron;
distributing said disk cache files across one or more servers or virtual machines by an adaptor cache pneuron;
allowing access to said distributed disk cache files by participating applications, wherein said applications are configured to perform operations on said distributed disk data; and
removing said disk cache files automatically using a high-low disk evaluation model to remove said distributed disk cache files based on server disk utilization and automatic evaluation aging for said distributed disk cache files.
2. A system for processing large data and analysis requests, said system comprising:
a pneuron server configured for operating under control of a computer program product, said computer program product comprising a computer program, for deploying a data processing pneuron for acquiring data in real-time from one or more application data sources, said data processing pneuron for transforming said data into one or more self-describing ASCII disk cache files;
said pneuron server configured for deploying one or more adaptor cache pneuron, said adaptor cache pneuron operating under control of a computer program product, said computer program product comprising a computer program, for distributing said one or more disk cache files across one or more servers or virtual machines;
said one or more servers configured for allowing access to said distributed disk cache files by one or more distributed worker pneurons, wherein said distributed worker pneurons are configured to perform operations on said distributed disk data; and
said one or more worker pneurons configured for removing said disk cache files automatically using a high-low disk evaluation model to remove said distributed disk cache files based on server disk utilization and automatic evaluation aging for said distributed disk cache files.
3. The system of claim 2 wherein said one or more worker pneurons configured for removing said disk cache files are configured for removing disk cache files prior to storing disk cache files received from said adaptor pneuron.
US13/943,187 2009-08-28 2013-07-16 Method and process for enabling distributing cache data sources for query processing and distributed disk caching of large data and analysis requests Active US9542408B2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US13/943,187 US9542408B2 (en) 2010-08-27 2013-07-16 Method and process for enabling distributing cache data sources for query processing and distributed disk caching of large data and analysis requests
US15/401,658 US10684990B2 (en) 2009-08-28 2017-01-09 Reconstructing distributed cached data for retrieval

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US12/870,348 US9659247B2 (en) 2009-08-28 2010-08-27 System and method for employing the use of neural networks for the purpose of real-time business intelligence and automation control
US13/442,353 US9558441B2 (en) 2009-08-28 2012-04-09 Legacy application migration to real time, parallel performance cloud
US201261672028P 2012-07-16 2012-07-16
US13/943,187 US9542408B2 (en) 2010-08-27 2013-07-16 Method and process for enabling distributing cache data sources for query processing and distributed disk caching of large data and analysis requests

Related Parent Applications (2)

Application Number Title Priority Date Filing Date
US12/870,348 Continuation-In-Part US9659247B2 (en) 2009-08-28 2010-08-27 System and method for employing the use of neural networks for the purpose of real-time business intelligence and automation control
US13/442,353 Continuation-In-Part US9558441B2 (en) 2009-08-28 2012-04-09 Legacy application migration to real time, parallel performance cloud

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/401,658 Continuation US10684990B2 (en) 2009-08-28 2017-01-09 Reconstructing distributed cached data for retrieval

Publications (3)

Publication Number Publication Date
US20140012867A1 US20140012867A1 (en) 2014-01-09
US20140258315A9 true US20140258315A9 (en) 2014-09-11
US9542408B2 US9542408B2 (en) 2017-01-10

Family

ID=49879316

Family Applications (2)

Application Number Title Priority Date Filing Date
US13/943,187 Active US9542408B2 (en) 2009-08-28 2013-07-16 Method and process for enabling distributing cache data sources for query processing and distributed disk caching of large data and analysis requests
US15/401,658 Active US10684990B2 (en) 2009-08-28 2017-01-09 Reconstructing distributed cached data for retrieval

Family Applications After (1)

Application Number Title Priority Date Filing Date
US15/401,658 Active US10684990B2 (en) 2009-08-28 2017-01-09 Reconstructing distributed cached data for retrieval

Country Status (1)

Country Link
US (2) US9542408B2 (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9020868B2 (en) * 2010-08-27 2015-04-28 Pneuron Corp. Distributed analytics method for creating, modifying, and deploying software pneurons to acquire, review, analyze targeted data
US10754699B2 (en) * 2012-08-05 2020-08-25 International Business Machines Corporation Remote provisioning of virtual appliances for access to virtualized storage
US10884991B1 (en) * 2014-03-14 2021-01-05 Jpmorgan Chase Bank, N.A. Data request analysis and fulfillment system and method
US11829349B2 (en) 2015-05-11 2023-11-28 Oracle International Corporation Direct-connect functionality in a distributed database grid
CN105069151A (en) * 2015-08-24 2015-11-18 用友网络科技股份有限公司 HBase secondary index construction apparatus and method
US10140461B2 (en) * 2015-10-30 2018-11-27 Microsoft Technology Licensing, Llc Reducing resource consumption associated with storage and operation of containers
US10567460B2 (en) * 2016-06-09 2020-02-18 Apple Inc. Managing data using a time-based directory structure
US10719446B2 (en) * 2017-08-31 2020-07-21 Oracle International Corporation Directly mapped buffer cache on non-volatile memory
CN108038226A (en) * 2017-12-25 2018-05-15 郑州云海信息技术有限公司 Fast data acquisition system and method
US11119915B2 (en) 2018-02-08 2021-09-14 Samsung Electronics Co., Ltd. Dynamic memory mapping for neural networks
CN109885771B (en) * 2019-02-26 2020-06-30 紫光云引擎科技(苏州)有限公司 Application software screening method and service equipment
US11943098B2 (en) * 2019-04-08 2024-03-26 Telefonaktiebolaget Lm Ericsson (Publ) Management model for node fault management

Family Cites Families (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6289450B1 (en) 1999-05-28 2001-09-11 Authentica, Inc. Information security architecture for encrypting documents for remote access while maintaining access control
US6636242B2 (en) 1999-08-31 2003-10-21 Accenture Llp View configurer in a presentation services patterns environment
US6757689B2 (en) 2001-02-02 2004-06-29 Hewlett-Packard Development Company, L.P. Enabling a zero latency enterprise
AU2002242033A1 (en) 2001-02-16 2002-09-04 Idax, Inc. Decision support for automated power trading
IL159685A0 (en) 2001-07-05 2004-06-20 Computer Ass Think Inc System and method for identifying and generating business events
US7016886B2 (en) 2001-08-10 2006-03-21 Saffron Technology Inc. Artificial neurons including weights that define maximal projections
US7369984B2 (en) 2002-02-01 2008-05-06 John Fairweather Platform-independent real-time interface translation by token mapping without modification of application code
US6946715B2 (en) 2003-02-19 2005-09-20 Micron Technology, Inc. CMOS image sensor and method of fabrication
US20040122937A1 (en) 2002-12-18 2004-06-24 International Business Machines Corporation System and method of tracking messaging flows in a distributed network
GB0308262D0 (en) * 2003-04-10 2003-05-14 Ibm Recovery from failures within data processing systems
US7010513B2 (en) 2003-04-14 2006-03-07 Tamura Raymond M Software engine for multiple, parallel processing with neural networks
US7529722B2 (en) 2003-12-22 2009-05-05 Dintecom, Inc. Automatic creation of neuro-fuzzy expert system from online analytical processing (OLAP) tools
US20060184410A1 (en) 2003-12-30 2006-08-17 Shankar Ramamurthy System and method for capture of user actions and use of capture data in business processes
US10796364B2 (en) 2004-04-15 2020-10-06 Nyse Group, Inc. Process for providing timely quality indication of market trades
US7706401B2 (en) 2004-08-13 2010-04-27 Verizon Business Global Llc Method and system for providing interdomain traversal in support of packetized voice transmissions
US7557707B2 (en) 2004-09-01 2009-07-07 Microsoft Corporation RFID enabled information systems utilizing a business application
US8266237B2 (en) 2005-04-20 2012-09-11 Microsoft Corporation Systems and methods for providing distributed, decentralized data storage and retrieval
US20070078692A1 (en) 2005-09-30 2007-04-05 Vyas Bhavin J System for determining the outcome of a business decision
US20070255713A1 (en) * 2006-04-26 2007-11-01 Bayhub, Inc. Checkpoint flow processing system for on-demand integration of distributed applications
US7881990B2 (en) 2006-11-30 2011-02-01 Intuit Inc. Automatic time tracking based on user interface events
US8468244B2 (en) 2007-01-05 2013-06-18 Digital Doors, Inc. Digital information infrastructure and method for security designated data and with granular data stores
US8655939B2 (en) 2007-01-05 2014-02-18 Digital Doors, Inc. Electromagnetic pulse (EMP) hardened information infrastructure with extractor, cloud dispersal, secure storage, content analysis and classification and method therefor
US8626844B2 (en) 2007-03-26 2014-01-07 The Trustees Of Columbia University In The City Of New York Methods and media for exchanging data between nodes of disconnected networks
AU2008356120A1 (en) 2007-11-07 2009-11-12 Edsa Micro Corporation Systems and methods for real-time forecasting and predicting of electrical peaks and managing the energy, health, reliability, and performance of electrical power systems based on an artificial adaptive neural network
US20100064033A1 (en) 2008-09-08 2010-03-11 Franco Travostino Integration of an internal cloud infrastructure with existing enterprise services and systems
US20100077205A1 (en) 2008-09-19 2010-03-25 Ekstrom Joseph J System and Method for Cipher E-Mail Protection
US8069242B2 (en) 2008-11-14 2011-11-29 Cisco Technology, Inc. System, method, and software for integrating cloud computing systems
SG175215A1 (en) 2009-04-15 2011-11-28 Virginia Polytechnic Inst Complex situation analysis system
US8490087B2 (en) 2009-12-02 2013-07-16 International Business Machines Corporation System and method for transforming legacy desktop environments to a virtualized desktop model
WO2011112917A2 (en) 2010-03-11 2011-09-15 Entegrity LLC Methods and systems for data aggregation and reporting
US20120102103A1 (en) 2010-10-20 2012-04-26 Microsoft Corporation Running legacy applications on cloud computing systems without rewriting
US9330141B2 (en) 2011-09-29 2016-05-03 Cirro, Inc. Federated query engine for federation of data queries across structure and unstructured data
WO2013049713A1 (en) 2011-09-30 2013-04-04 Cirro, Inc. Spreadsheet based data store interface

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6442549B1 (en) * 1997-07-25 2002-08-27 Eric Schneider Method, product, and apparatus for processing reusable information
US7831643B1 (en) * 2002-03-27 2010-11-09 Parallels Holdings, Ltd. System, method and computer program product for multi-level file-sharing by concurrent users
US7689599B1 (en) * 2005-01-31 2010-03-30 Symantec Operating Corporation Repair of inconsistencies between data and metadata stored on a temporal volume using transaction log replay
US20090182778A1 (en) * 2006-03-20 2009-07-16 Swsoft Holdings, Ltd. Managing computer file system using file system trees
US20110145363A1 (en) * 2009-12-16 2011-06-16 International Business Machines Corporation Disconnected file operations in a scalable multi-node file system cache for a remote cluster file system
US20140317031A1 (en) * 2013-04-23 2014-10-23 Dropbox, Inc. Application recommendation

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140214888A1 (en) * 2013-01-31 2014-07-31 Splunk Inc. Supplementing a high performance analytics store with evaluation of individual events to respond to an event query
US9128985B2 (en) * 2013-01-31 2015-09-08 Splunk, Inc. Supplementing a high performance analytics store with evaluation of individual events to respond to an event query

Also Published As

Publication number Publication date
US20140012867A1 (en) 2014-01-09
US10684990B2 (en) 2020-06-16
US20170185621A1 (en) 2017-06-29
US9542408B2 (en) 2017-01-10

Similar Documents

Publication Publication Date Title
US10684990B2 (en) Reconstructing distributed cached data for retrieval
US11816126B2 (en) Large scale unstructured database systems
US11403321B2 (en) System and method for improved performance in a multidimensional database environment
US9507807B1 (en) Meta file system for big data
EP3545431A1 Event driven extract, transform, load (ETL) processing
US20220114064A1 (en) Online restore for database engines
DE202009019139U1 (en) Asynchronous distributed deduplication for replicated content-addressed storage clusters
CA2910211A1 (en) Object storage using multiple dimensions of object information
US20210303537A1 (en) Log record identification using aggregated log indexes
Merceedi et al. A comprehensive survey for hadoop distributed file system
CN103365987A (en) Clustered database system and data processing method based on shared-disk framework
US10095738B1 (en) Dynamic assignment of logical partitions according to query predicate evaluations
Zhou et al. Sfmapreduce: An optimized mapreduce framework for small files
US9275059B1 (en) Genome big data indexing
CA2918472C (en) A method and process for enabling distributing cache data sources for query processing and distributed disk caching of large data and analysis requests
US11568067B2 (en) Smart direct access
US11341163B1 (en) Multi-level replication filtering for a distributed database
US20240104074A1 (en) Location-constrained storage and analysis of large data sets
US11853319B1 (en) Caching updates appended to an immutable log for handling reads to the immutable log
US20240004867A1 (en) Optimization of application of transactional information for a hybrid transactional and analytical processing architecture
Ismail et al. HopsFS-S3: Extending Object Stores with POSIX-like Semantics and more (industry track)
Johnson et al. Big data processing using Hadoop MapReduce programming model
Chawla Optimizing the Resource utilization of Enterprise Content management workloads through measured performance baselines and dynamic topology adaptation
CN117495288A (en) Data asset full life cycle management system and method
CN116561063A (en) Lustre file life cycle management system

Legal Events

Date Code Title Description
AS Assignment

Owner name: PNEURON CORP., NEW HAMPSHIRE

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MOSS, SIMON BYFORD;ELKINS, ELIZABETH WINTERS;BACHELOR, DOUGLAS WILEY;AND OTHERS;SIGNING DATES FROM 20131121 TO 20140331;REEL/FRAME:032666/0554

STCF Information on status: patent grant

Free format text: PATENTED CASE

AS Assignment

Owner name: UST GLOBAL (SINGAPORE) PTE. LTD., SINGAPORE

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:PNEURON CORP.;REEL/FRAME:048126/0590

Effective date: 20180112

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2551); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

Year of fee payment: 4

AS Assignment

Owner name: CITIBANK, N.A., AS AGENT, TEXAS

Free format text: SECURITY INTEREST;ASSIGNOR:UST GLOBAL (SINGAPORE) PTE. LIMITED;REEL/FRAME:058309/0929

Effective date: 20211203