US20120095957A1 - Component Based Approach to Building Data Integration Tools - Google Patents


Info

Publication number
US20120095957A1
US20120095957A1 (application US13/252,630)
Authority
US
United States
Prior art keywords
query
target
source
queries
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US13/252,630
Inventor
Sreedhar Sannareddy Reddy
Prasenjit Das
Raghavendra Reddy Yeddula
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tata Consultancy Services Ltd
Original Assignee
Tata Consultancy Services Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tata Consultancy Services Ltd
Assigned to TATA CONSULTANCY SERVICES LIMITED (assignors: YEDDULA, RAGHAVENDRA REDDY; DAS, PRASENJIT; REDDY, SREEDHAR SANNAREDDY)
Publication of US20120095957A1
Priority claimed by application US14/551,766 (published as US10606861B2)
Legal status: Abandoned

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/256Integrating or interfacing systems involving database management systems in federated or virtual databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/254Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24534Query rewriting; Transformation
    • G06F16/24535Query rewriting; Transformation of sub-queries or views
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24534Query rewriting; Transformation
    • G06F16/24542Plan optimisation

Definitions

  • p.department.company is a path expression. It navigates associations from person to department and from department to company.
  • Rule 1: Path expressions can occur in the 'where' clause. The semantics are the same as in SQL: from the cartesian product of the relations listed in the 'from' clause, select the tuples that satisfy the 'where' clause, and project out the columns specified in the 'select' clause.
  • Path expressions have implicit existential quantification.
  • For example, the condition is satisfied if at least one employee of at least one department of a company is a non-Indian; it is not required that all employees of all departments of a company be non-Indians.
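  • The implicit existential semantics of such a path-expression condition can be sketched as follows (a minimal Python illustration only, not part of the disclosed system; the data and names are hypothetical):

```python
# Illustrative sketch (assumed data model, not the patent's implementation):
# a path-expression condition with implicit existential quantification.
# A company satisfies the condition if at least one employee of at least
# one of its departments is non-Indian.

companies = [
    {"name": "Acme",
     "departments": [
         {"employees": [{"name": "A", "nationality": "Indian"},
                        {"name": "B", "nationality": "French"}]}]},
    {"name": "Zen",
     "departments": [
         {"employees": [{"name": "C", "nationality": "Indian"}]}]},
]

def has_non_indian_employee(company):
    # Implicit existential quantification: any(...) over the navigated path.
    return any(emp["nationality"] != "Indian"
               for dept in company["departments"]
               for emp in dept["employees"])

matches = [c["name"] for c in companies if has_non_indian_employee(c)]
print(matches)  # ['Acme']
```
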
  • The representational formalisms also include a view definition language, which is quite similar to the PSQL query language discussed above. The only difference is that a view must specify the attribute names of the target relation as aliases in the select clause.
  • the conceptual modelling framework may have a set of rules that specify relationships among its elements.
  • the rule language is thus defined to support and capture relationships which may be in the form of a constraint, a derivation or a mapping.
  • Rules can exist in a range of forms—from rules that can be efficiently executed by a machine to rules that can only be interpreted by humans. However, the scope of the present invention is confined to machine executable rules.
  • φ(x) is a conjunction of atomic formulas over a model of interest, with variables in x and with each variable in x occurring in at least one conjunct of φ(x).
  • ψ(x, y) is a conjunction of atomic formulas over another model of interest (which could be the same as the first), with variables among those in x and y.
  • The atomic formulas may comprise object terms, property terms, association terms, arithmetic expressions, relational expressions and aggregation functions.
  • One may devise any suitable surface syntax to represent the rules (such as the popular IF-THEN notation) as long as they are reducible to the above general form.
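  • Written out, this general form is the standard GLAV (tuple-generating dependency) shape, where φ(x) is the premise conjunction over one model and ψ(x, y) the conclusion conjunction over the other:

```latex
\forall \mathbf{x}\; \bigl( \varphi(\mathbf{x}) \;\rightarrow\; \exists \mathbf{y}\; \psi(\mathbf{x}, \mathbf{y}) \bigr)
```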
  • The representational formalisms also include data flow graphs, specified using a meta model as shown in FIG. 200.
  • the model is abstract enough and rich enough to be mappable to ETL specifications of most commercial tools.
  • DFG Operator is defined as the super class of all operators while the sub classes include Source Qualifier, Target, Join, Union, LookUp, Expression, etc.
  • An operator has a set of input ports, a set of output ports and a set of associated expressions. An operator's input flows in through the input ports and its output flows out through the output ports. The semantics of the associated expressions depends on the operator type (e.g. for join it is a relational expression).
  • a port has an associated variable that binds to the values that flow in or flow out. The output port of one operator may be connected to the input port of another operator.
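  • This operator/port structure can be sketched as follows (an illustrative Python fragment with hypothetical class names and data; the actual meta model is the one shown in FIG. 200):

```python
# Minimal sketch of the DFG meta model described above (names and data are
# illustrative assumptions): operators with input/output ports, where one
# operator's output port feeds another operator's input port.

class Port:
    def __init__(self, name):
        self.name = name    # variable bound to the values that flow through
        self.values = None

class Operator:
    """Super class of all DFG operators."""
    def __init__(self, inputs=(), outputs=()):
        self.inputs = [Port(n) for n in inputs]
        self.outputs = [Port(n) for n in outputs]

class SourceQualifier(Operator):
    def __init__(self, rows):
        super().__init__(outputs=["out"])
        self.rows = rows
    def run(self):
        self.outputs[0].values = self.rows

class Join(Operator):
    """The associated expression is a relational (equality) predicate."""
    def __init__(self, key):
        super().__init__(inputs=["left", "right"], outputs=["out"])
        self.key = key
    def run(self):
        left, right = self.inputs[0].values, self.inputs[1].values
        self.outputs[0].values = [
            {**l, **r} for l in left for r in right
            if l[self.key] == r[self.key]]

def connect(out_port, in_port):
    in_port.values = out_port.values

emp = SourceQualifier([{"eno": 1, "name": "A", "dno": 10}])
dept = SourceQualifier([{"dno": 10, "dname": "Sales"}])
emp.run(); dept.run()
join = Join(key="dno")
connect(emp.outputs[0], join.inputs[0])
connect(dept.outputs[0], join.inputs[1])
join.run()
print(join.outputs[0].values)  # [{'eno': 1, 'name': 'A', 'dno': 10, 'dname': 'Sales'}]
```
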
  • The modeling architecture and representational formalisms discussed above enable enterprise data assets to be modeled and mapped at a conceptual level of abstraction, in an implementation independent manner.
  • Mapping component helps develop mappings between models. Given two models it can discover correspondences between the models and interactively prompt a user to refine them into full-fledged mappings. It provides a user-friendly interface that hides much of the complexity of the mapping language discussed earlier. For instance, with respect to the example given earlier, when a user identifies Employee as the target and Emp and Dept as the sources, it can automatically discover the join between Emp and Dept by traversing the association between them.
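  • The join discovery described above can be sketched as follows (an illustrative Python fragment; the association table and function names are assumptions for illustration, not the component's actual interface):

```python
# Illustrative sketch of join discovery: given a target mapped to source
# relations Emp and Dept, the join condition is derived by traversing the
# association declared between them. The association table is hypothetical.

associations = {("Emp", "Dept"): ("dno", "dno")}  # Emp.dno joins Dept.dno

def discover_joins(sources):
    conds = []
    for i, a in enumerate(sources):
        for b in sources[i + 1:]:
            # look up the association in either declared direction
            if (a, b) in associations:
                ka, kb = associations[(a, b)]
                conds.append(f"{a}.{ka} = {b}.{kb}")
            elif (b, a) in associations:
                kb, ka = associations[(b, a)]
                conds.append(f"{b}.{kb} = {a}.{ka}")
    return conds

print(discover_joins(["Emp", "Dept"]))  # ['Emp.dno = Dept.dno']
```
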
  • Query translation component provides the following functionality:
  • Query equivalence testing component provides the following functionality:
  • This component provides the following functionality:
  • An input query may contain subgoals from multiple sources. These subgoals need to be segmented into source specific groups, and each such segment should then be factored out as a subquery that can be executed on a single source. The segmentation aims to maximize the size of each segment so that as many subgoals as possible are pushed into a single source query.
  • a graph data structure to identify query segments is constructed as described below.
  • Each maximally connected subgraph that contains at least one relation node is a separate query segment.
  • Such query segments can be separated out as subqueries.
  • the head variables of a subquery are determined by selecting the variables of the query segment that are required outside the segment (i.e. either in the query head or in other query segments). Calls to these subqueries together with the remaining conditions make up the original query.
  • Before forming a subquery for a query segment, it is checked whether there already exists a subquery with an equivalent query segment. If so, that subquery is reused, which assists performance optimization. This can be illustrated using the example described below:
  • R1, R2 & R3 are from source S1 and R4, R5 are from source S2.
  • v4 does not appear in the head since it is not required outside the query segment.
  • Outer query corresponds to the target
  • inner queries correspond to the intermediate nodes
  • source-specific queries correspond to the sources.
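  • The segmentation step above can be sketched as follows (illustrative Python; the construction via shared variables and the variable lists assigned to R1..R5 are assumptions made for illustration, since the text elides those details):

```python
# Illustrative sketch of query segmentation: subgoals are grouped by source
# and split into connected components over shared variables; each segment's
# head keeps only the variables required outside the segment (so v4, used
# only inside the S1 segment, is dropped from its head, as in the text).

from collections import defaultdict

def segment(subgoals, source_of, query_head):
    by_source = defaultdict(list)
    for goal in subgoals:
        by_source[source_of[goal[0]]].append(goal)
    result = []
    for src, goals in by_source.items():
        remaining = list(goals)
        while remaining:
            # grow a maximally connected component of subgoals sharing vars
            segment_goals, frontier = [], [remaining.pop(0)]
            while frontier:
                g = frontier.pop()
                segment_goals.append(g)
                linked = [h for h in remaining if set(h[1]) & set(g[1])]
                for h in linked:
                    remaining.remove(h)
                frontier.extend(linked)
            seg_vars = set().union(*(set(v) for _, v in segment_goals))
            # head variables: needed in the query head or in other subgoals
            outside = set(query_head)
            for g2 in subgoals:
                if g2 not in segment_goals:
                    outside |= set(g2[1])
            result.append((src, [r for r, _ in segment_goals],
                           sorted(seg_vars & outside)))
    return result

subgoals = [("R1", ("v1", "v2")), ("R2", ("v2", "v3")), ("R3", ("v3", "v4")),
            ("R4", ("v2", "v5")), ("R5", ("v5", "v6"))]
source_of = {"R1": "S1", "R2": "S1", "R3": "S1", "R4": "S2", "R5": "S2"}
print(segment(subgoals, source_of, query_head=("v1", "v6")))
# → [('S1', ['R1', 'R2', 'R3'], ['v1', 'v2']), ('S2', ['R4', 'R5'], ['v2', 'v6'])]
```
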
  • <Qk-var-list-i> is one of the inputs -- corresponding to the subgoal Qk(..) -- to the JOIN operator corresponding to 'Qi(..) :- <bodyi>, Qk(<Qk-var-list>)'.
  • A DFG can be generated by mapping the different query forms to their corresponding DFG operators.
  • The generated DFG can be optimized further using techniques similar to those used in query graph optimization, which are well documented in the literature.
  • The DFG to query translation component provides the following functionality:
  • the query corresponding to each operator has the following format:
  • The Source Qualifier operator has an associated query, which is emitted as is.
  • This query is then converted to an internal representation and optimized to remove duplicate subqueries. This avoids re-evaluation of the same subquery, thereby contributing to performance optimization.
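  • The duplicate-subquery elimination can be sketched as follows (illustrative Python; the canonicalization by sorting subgoals and renaming variables is an assumption for illustration, and is not a complete equivalence test):

```python
# Illustrative sketch of duplicate-subquery removal: subqueries are reduced
# to a canonical internal representation, and a subquery whose canonical
# form was already seen is reused instead of being re-evaluated.

def canonical(subgoals):
    rename, out = {}, []
    for rel, vars_ in sorted(subgoals):
        # rename variables in first-occurrence order: x0, x1, ...
        out.append((rel, tuple(rename.setdefault(v, f"x{len(rename)}")
                               for v in vars_)))
    return tuple(out)

def dedupe(subqueries):
    seen, plan = {}, []
    for name, subgoals in subqueries:
        key = canonical(subgoals)
        if key in seen:
            plan.append((name, "reuse:" + seen[key]))  # reuse earlier result
        else:
            seen[key] = name
            plan.append((name, "evaluate"))
    return plan

qs = [("Q1", [("R1", ("a", "b")), ("R2", ("b", "c"))]),
      ("Q2", [("R1", ("u", "v")), ("R2", ("v", "w"))]),  # same up to renaming
      ("Q3", [("R3", ("a",))])]
print(dedupe(qs))  # [('Q1', 'evaluate'), ('Q2', 'reuse:Q1'), ('Q3', 'evaluate')]
```
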
  • The matching component can be used to identify matching entity instances in two databases. Its functionality is exemplified below:
  • Matching component: given two database schemas S1 and S2 that are mapped to a common global schema G, a specification to detect duplicate instances in G (i.e. a deduplication specification over G), and two database instances I1 and I2 of schemas S1 and S2 respectively, output all matching object pairs from I1 and I2 as tuples of the form <S1:<entity>, S2:<entity>, <pk1-value tuple>, <pk2-value tuple>>.
  • Each tuple identifies an entity instance in I1 and its corresponding matching entity instance in I2.
  • This list of tuples is called an object map.
  • the deduplication specification essentially identifies when two instances of an entity in the global schema refer to the same underlying entity.
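  • The matching component's output can be sketched as follows (illustrative Python; the Customer schemas and the name-equality deduplication specification are hypothetical):

```python
# Illustrative sketch of the matching component: instances of two databases
# mapped to a common global entity are compared under a deduplication
# specification (here, a hypothetical equality on name), and matching pairs
# are emitted as object-map tuples of the form described in the text.

def object_map(entity, i1, i2, dedup_key, pk1, pk2):
    matches = []
    for a in i1:
        for b in i2:
            if a[dedup_key] == b[dedup_key]:  # dedup spec over global schema
                matches.append((f"S1:{entity}", f"S2:{entity}",
                                tuple(a[k] for k in pk1),
                                tuple(b[k] for k in pk2)))
    return matches

i1 = [{"cust_id": 1, "name": "Ravi"}, {"cust_id": 2, "name": "Asha"}]
i2 = [{"cno": 7, "name": "Asha"}]
print(object_map("Customer", i1, i2, "name", ("cust_id",), ("cno",)))
# → [('S1:Customer', 'S2:Customer', (2,), (7,))]
```
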
  • Platform specific component: a platform specific component translates between the internal representations of queries and DFGs and their platform specific representations. One such component exists for each supported platform. These components provide the import/export bridge between the tool and third-party platforms, and thus play a key role in the overall architecture.
  • EII tools enable a unified view of data by mapping a set of database schemas to a unified global schema. They provide a query interface in terms of the global model. The tool translates these queries into data source specific queries and assembles the results returned by the data sources. Composition of an EII tool from the primitive components is shown in FIG. 300 .
  • The method for constructing an EII tool using the defined foundational components includes the steps of:
  • An ETL tool can be composed from the primitive components as shown in FIG. 400; the method is as follows:
  • a tool to synchronize data sources can be composed as follows:
  • Data migration involves not only database migration but migration of various data access programs as well, such as queries, stored procedures, ETL specifications, etc.
  • The present invention, besides providing a solution for creating purpose specific integration tools, also discloses a method by which migration from one solution architecture to another can be implemented.
  • the interoperability among different integrating tools available is discussed below.
  • Migrating from one solution architecture to another: the migration process from one solution to another is illustrated by cases involving currently available integration tools.
  • a method of migrating from an EII solution platform to an ETL platform solution is provided using the primitive foundational components.
  • a method of migrating from an ETL solution to an EII platform solution uses purpose specific tools for such migration composed from a set of primitive foundational components.
  • The proposed conceptual modelling architecture allows enterprise integration to be modelled at a semantic level, capturing the rules and constraints that govern the behaviour of models and their mappings. The set of primitive components presented is sufficiently general purpose in nature that it can be used to compose purpose specific integration tools.
  • The integration architecture presented in this disclosure thus enables easy migration from one integration solution to another.

Abstract

A method facilitating interoperability across integrating platforms is provided by composing purpose specific integrating tools using a conceptual modeling framework and a set of primitive foundational components. The model driven architecture provides easy migration from one solution architecture to another. The present invention also discloses a method to migrate from one implementation to another using a set of composed purpose specific integration tools.

Description

    FIELD OF THE INVENTION
  • The present invention relates to data integration and more particularly, to a method of composing purpose specific integration tools for facilitating migration from one integrating platform solution to another.
  • BACKGROUND OF THE INVENTION
  • The term data integration refers to the problem of combining data residing in heterogeneous sources, and currently it relates to a wide range of technologies, from extract, transform and load (ETL) to enterprise application integration (EAI) to enterprise information integration (EII) and various change propagation technologies. There has been extensive theoretical research on data integration systems, exploring various mapping systems and languages, and their complexity results and limitations. However, no single technology suffices for all the needs of an enterprise, and these needs keep changing with growing data volumes and changing business needs. Consequently, enterprises end up tweaking their integration systems continually, and sometimes summarily moving them from one technology to another, to keep up with these demands. This consumes a lot of effort—by some estimates as much as 40% of all IT effort in an enterprise.
  • One reason why this consumes so much effort is the rigidity of the available integration technologies. Once a solution is implemented in one of these technologies, moving to another is like implementing the entire solution afresh, which requires a large amount of time, effort and computational resources. As a result, people end up building ad-hoc, quick-fix solutions, which over time leads to data fragmentation and inconsistencies. Keeping these fragments synchronized to avoid inconsistencies puts a lot of strain on these systems.
  • The lack of a common reference architecture, and the lack of a common set of foundational primitives from which purpose specific solutions can be composed, are the principal reasons for this state of affairs.
  • Though there exist a large number of vendors with tool offerings in ETL, EAI, data migration, EII and so on, each uses its own proprietary technology with no interoperability, sometimes even among tools of the same category (e.g. between vendor X's ETL tool and vendor Y's ETL tool). Some vendors offer tools in many categories (for example both ETL and EII), but again with no interoperability between a tool of one category and a tool of another. The principal reason for the lack of interoperability among tools of the same category (say ETL) is the lack of a common reference architecture across tool implementations. The principal reason for the lack of interoperability across categories (say ETL and EII) is that their specifications are too close to the implementation platform, i.e. they are not at a level of abstraction that allows their semantics to be mapped easily.
  • Moreover, the data inter-operability problem arises from the fact that data, even within a single domain of application, is available at many different sites, in many different schemas, and even in different data models. The integration and transformation of such data has become increasingly important for many modern applications that need to support their users with informed decision making.
  • While a number of useful approaches have been devised for designing and deploying specific integration processes, there remains a need for tools that enable easy migration of the integration processes themselves, once designed, among different technology platforms.
  • OBJECTS OF THE INVENTION
  • In accordance with the present invention, a method of composing platform specific integration tools to facilitate easy migration from a source implementation solution to a target implementation solution is provided.
  • It is an object of the present invention to employ model driven common reference data integration architecture and set of primitive foundational components for composing platform specific integrating tools.
  • Another object of the invention is to identify primitive foundational components and to combine them to build effective purpose specific integrating tools.
  • It is an object of the invention to identify primitive foundational components and to combine them to build effective purpose specific migration tools, which can be directly employed to migrate from one solution architecture to another.
  • It is yet another object of the present invention to facilitate interoperability across widely available data integration technologies.
  • Yet another object of the present invention is to combine GAV and LAV mappings to model complex data entities existing in a hierarchy.
  • It is another object of the present invention to support complex relationships ranging from hierarchical relationships to association relationships existing between data models via model driven architecture.
  • It is yet another object of the present invention to use platform independent representational formalism to represent complex query language, mappings and data flow graphs employed to build model driven data integration architecture.
  • It is a further object of the present invention to present data flow graphs using rich meta models capable of being mapped to the required platform specification, for representing data dependencies among a number of operations.
  • One of the objects of the present invention is to provide a query translation component for translation of query on source schema into an equivalent query on target schema.
  • Another object of the present invention is to provide query to data flow graph translation component to translate a given query into an equivalent data flow graph that can process data from multiple databases.
  • One of the objects of the present invention is to achieve performance optimization by performing a set of semantically invariant graph transformation operations on the generated data flow graph to obtain an optimal execution order on data flow operators.
  • SUMMARY OF THE INVENTION
  • It is the purpose of the present invention to provide a system and method for composing platform specific integration tools that facilitate easy migration across integrating platforms, by composing purpose specific integrating tools using a conceptual modeling framework and a set of primitive foundational components.
  • In one of the preferred embodiments of the present invention, a reference model driven architecture, a set of representational formalisms and a set of primitive foundational components that conform to this architecture are provided, along with a method of composing these components to derive purpose specific integration tools, the said method comprising the processor implemented steps of:
  • modeling the data at multiple levels of abstraction and capturing relationships, constraints and rules governing the behavior of the said models;
  • using the set of representational formalism to represent queries, mappings, rules and data flow graphs;
  • specifying a set of primitive foundational components from which purpose specific data integration tools are composed;
  • mapping between source schema and global schema using a mapping component selected from the primitive foundational components;
  • translating queries on a given model into equivalent target queries on other mapped data source models using primitive foundational components;
  • translating the target query into equivalent data flow graphs using primitive foundational components and;
  • executing the data flow graphs directly, or translating the data flow graph into a third party ETL specification and executing it there.
  • BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS
  • The foregoing summary, as well as the following detailed description of preferred embodiments, are better understood when read in conjunction with the appended drawings. For the purpose of illustrating the invention, there is shown in the drawings example constructions of the invention; however, the invention is not limited to the specific methods and system disclosed. In the drawings:
  • FIG. 100 represents well delineated hierarchical view of an enterprise in the present invention.
  • FIG. 200 highlights meta model to specify ETL data flow graphs.
  • FIG. 300 is a schematic illustration of an EII (Enterprise information integration) tool composed from primitive components.
  • FIG. 400 depicts an integration tool that combines ETL and EII wherein ETL is used to build a warehouse and EII is used to access data uniformly from both warehouse and original sources.
  • FIG. 500 is a schematic representation of Data synchronization tool composed from primitive components.
  • FIG. 600 is a representation of Data migration tool for translating queries from source model to target model, and for generating ETL to migrate data from source database to target database.
  • DETAILED DESCRIPTION OF THE INVENTION
  • Some embodiments of this invention, illustrating all its features, will now be discussed in detail.
  • The words “comprising,” “having,” “containing,” and “including,” and other forms thereof, are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items.
  • It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. Although any systems and methods similar or equivalent to those described herein can be used in the practice or testing of embodiments of the present invention, the preferred systems and methods are now described.
  • A variety of data integration tools exist, each with particular strengths and weaknesses. As a given user's needs evolve, the user may desire to move from using one tool to using another. A problem for such a user is that the user may have devoted significant time and resources to the development of data integration jobs using one tool and would not like to repeat the effort while moving to another tool. However, converting data integration jobs has to date required very extensive coding efforts. Thus, a need exists for improved methods and systems for converting data integration jobs that use one integration tool into data integration jobs that use a different integration tool.
  • As mentioned, there remains a need for the right reference architecture and a set of foundational primitives using which data integration problems can be addressed efficiently. The embodiments of the present invention address precisely this need by providing a method and a set of primitive components using which purpose specific integration tools can be composed efficiently.
  • Referring now to the drawings, there are shown preferred embodiments of the invention. FIG. 100 illustrates a hierarchical structure of an enterprise where the data exists at enterprise level, department level, function level and so on. The data contained within an enterprise at each hierarchical level needs to be modeled using richer conceptual models that allow modeling of data entities at multiple levels of abstraction and capture the relationships existing between them. The object model is chosen as the conceptual modeling language to define the unified global schema. The source model can however be defined at multiple levels, using the conceptual modeling framework for at least one level and a conceptual model or a physical model for subsequent levels.
  • The model consists of Class, Attribute and Association. Classes and associations can have subtype relationships. A subtype relationship can be further qualified as complete or incomplete and as overlapping or disjoint. A class can specify a subset of its attributes as a key. Association cardinalities specify constraints on participating objects. Objects are identified by unique IDs. If a class has a defined key then its object ID is modelled as a function over the key attributes. Mappings between object models are specified in one of the following ways:
  • GAV (Global as view) mappings, where a class or an association at some level is specified as a view over the models at the next lower level.
  • LAV (Local as view) mappings, where a class or an association at some level is specified as a view over the model at the next higher level.
  • Rules that relate classes and associations of adjacent levels. Rules are essentially like GLAV mappings.
  • Subtype relationships, where a class or an association at a lower level is specified as a subtype of a class or association at the higher level.
  • Mapping between an object model and a relational model is specified as follows:
  • GAV mappings, where a class is specified as a view over a set of relations. For a class that has a primary key defined, the view must specify all its primary key attributes.
  • LAV mappings, where a relation is specified as a view over a set of classes.
  • The mapping between the defined global schema and the source schemas employs a set of representational formalisms to represent queries, mappings and data flow graphs. The representational formalisms used in conceptual modelling are described below.
  • Query Language: In general, structured query language (SQL) is used as a language for managing data in relational databases. Since conceptual models are richer in structure, we need a more specialized query language that exploits this structure, hiding implementation details such as how the model is implemented in a relational database.
  • At a minimum we need a language that supports association navigation. SQL does not offer any special constructs for association navigation as it is designed for the relational model. We extend SQL with path expressions to support association navigation, and call this extended language PSQL. PSQL can be pre-processed into equivalent SQL. Suitable examples specifying the representational format and characteristic features of PSQL are demonstrated below.
  • We use the following conceptual model in the examples:
  • Classes:
      • Company, Department, Person
  • Associations:
      • Company.(department/company; 1:*).Department
      • Department.(employee/department; *:1).Person
  • Next, we present a set of rules that define the semantics of PSQL, and explain these rules with the help of examples.
  • EXAMPLE 1 List the Names of Persons and Their Companies
      • Select p.name, c.name
      • from Person p, Company c
      • where
        • p.department.company=c;
  • In this example p.department.company is a path expression. It navigates associations from person to department and from department to company.
  • Rule 1: Path expressions can occur in the ‘where’ clause. Semantics are the same as in SQL—from the cartesian product of the relations listed in the ‘from’ clause, select those that satisfy the ‘where’ clause, and project out columns specified in the ‘select’ clause.
  • EXAMPLE 2 List the Names of Companies that Have Foreign Nationals
      • Select c.name
      • from Company c
      • where
        • c.department.employee.nationality < > ‘Indian’;
  • Rule 2: Path expressions have implicit existential quantification. In the above expression, at least one employee of at least one department of a company must be non-Indian; it is not required that all employees of all departments of a company be non-Indian.
  • EXAMPLE 3 List the Names of All Persons Who Work for Company ‘X’, and Their Departments
      • Select p.name, d.name
      • from Person p, Department d
      • where
        • p.department=d and d.company.name=‘X’;
    EXAMPLE 4 List the Names of Persons and Their Companies
      • Select p.name, p.department.company.name
      • from Person p;
  • Rule 3: Path expressions are allowed in ‘select’ as well, provided they are *:1 paths. The above example is correct because the cardinalities of both the associations involved are ‘*:1’.
  • With regard to the above query language, it should be noted that it can be translated into an equivalent SQL format. Assuming a canonical representation where an association is stored as a relation with two columns corresponding to the two roles, path expressions can be translated to SQL format using the following rule.
  • Translation rule: Each association segment of a path expression, whether it occurs in where or select clause of the query, introduces two relations in the ‘from’ clause, corresponding to the association and the associated class, and corresponding join conditions in the where clause.
  • The above rule is elaborated below using examples.
  • EXAMPLE 1
  • PSQL:
  • Select p.name, c.name
      • from Person p, Company c
  • where
      • p.department.company=c;
  • Equivalent SQL:
  • Select p.name, c.name
  • from
      • Person p, Company c, Department_employee_department_Person dp,
      • Department d, Company_department_company_Department cd
  • where
      • dp.employee=p.id and dp.department=d.id and
      • cd.department=d.id and cd.company=c.id;
    EXAMPLE 2
  • PSQL:
  • Select c.name
  • from Company c
  • where
      • c.department.employee.nationality < > ‘Indian’;
  • Equivalent SQL:
  • Select c.name
  • from
      • Company c, Company_department_company_Department cd, Department d, Department_employee_department_Person dp, Person p
  • where
      • cd.company=c.id and cd.department=d.id and
      • dp.department=d.id and dp.employee=p.id and
      • p.nationality < > ‘Indian’;
    EXAMPLE 3
  • PSQL:
  • Select p.name, p.department.company.name
  • from Person p;
  • Equivalent SQL:
  • Select p.name, c.name
  • from
      • Person p, Department_employee_department_Person dp, Department d,
      • Company_department_company_Department cd, Company c
  • where
      • dp.employee=p.id and dp.department=d.id and
      • cd.department=d.id and cd.company=c.id;
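  • The translation rule can also be sketched programmatically. The following Python sketch (an illustrative assumption, not part of the specification) expands a path expression into the association relations and join conditions the rule prescribes, given a table mapping each (class, role) pair to its canonical association relation; the generated aliases a0, c0, … play the role of dp, d, cd, c in the examples above:

```python
def translate_path(alias, start_class, roles, assocs):
    """Apply the translation rule: each association segment of a path
    expression adds two relations to the FROM clause (the association
    relation and the associated class) and two join conditions to WHERE.

    assocs maps (class, role) -> (assoc_relation, near_col, far_col, far_class).
    Returns (from_items, join_conditions, alias_of_final_class).
    """
    from_items, joins = [], []
    cur_alias, cur_class = alias, start_class
    for k, role in enumerate(roles):
        rel, near, far, far_class = assocs[(cur_class, role)]
        a_alias, c_alias = 'a%d' % k, 'c%d' % k
        # the association relation and the associated class go into FROM
        from_items += ['%s %s' % (rel, a_alias), '%s %s' % (far_class, c_alias)]
        # the corresponding join conditions go into WHERE
        joins += ['%s.%s = %s.id' % (a_alias, near, cur_alias),
                  '%s.%s = %s.id' % (a_alias, far, c_alias)]
        cur_alias, cur_class = c_alias, far_class
    return from_items, joins, cur_alias
```

  • For the path p.department.company of Example 1, this yields the two association relations and four join conditions seen in the equivalent SQL.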
  • Further, the representational formalisms also include a view definition language, which is quite similar to the PSQL query language discussed above. The only exception is that a view must specify the attribute names of the target relation as aliases in the select clause.
  • An example differentiating the view definition language from PSQL query language is given below.
  • EXAMPLE
  • Suppose we have the following models:
      • Target: Employee(empName, address, salary, deptName)
      • Source: Emp(name, address, salary), Dept(name, manager) with an association <emp:dept; *:1> from Emp to Dept.
      • Target view definition:
      • select e.name empName, e.address address, e.salary salary, d.name deptName
      • from Emp e, Dept d
      • where
        • e.dept=d;
  • The conceptual modelling framework may have a set of rules that specify relationships among its elements. The rule language is thus defined to support and capture relationships which may be in the form of a constraint, a derivation or a mapping. Rules can exist in a range of forms—from rules that can be efficiently executed by a machine to rules that can only be interpreted by humans. However, the scope of the present invention is confined to machine executable rules.
  • These rules have the following general form:

  • ∀x(φ(x)→∃yψ(x,y))
  • Where x and y are tuples of variables, φ(x) is a conjunction of atomic formulas over a model of interest with variables in x, with each variable in x occurring in at least one conjunct of φ(x), and ψ(x, y) is a conjunction of atomic formulas over another model of interest (possibly the same as the first) with variables among those in x and y. The atomic formulas may comprise object terms, property terms, association terms, arithmetic expressions, relational expressions and aggregation functions. One may devise any suitable surface syntax to represent the rules (such as the popular IF-THEN notation) as long as they are reducible to the above general form. One can even allow disjunctions in the surface syntax as long as they can be decomposed into rules of the above form. One can readily verify that rules, as specified above, are similar in form to the GLAV mappings discussed in the data integration literature [1], and hence can be decomposed into suitable GAV and LAV rules that can be used by data integration components.
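  • For illustration, the Emp/Dept source model of the earlier view example can be mapped onto the Employee target by a rule of this general form (the predicate names here are illustrative assumptions, not part of the specification):
  • ∀e ∀d ∀n ∀m (Emp(e) ∧ dept(e, d) ∧ name(e, n) ∧ name(d, m) → ∃t (Employee(t) ∧ empName(t, n) ∧ deptName(t, m)))
  • Here φ is the source-side conjunction over x = (e, d, n, m) and ψ is the target-side conjunction with y = (t); decomposing such a rule yields the GAV and LAV views consumed by the data integration components.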
  • The above discussed rule language also forms a part of the representational formalism. Next, the representational formalism includes data flow graphs, described using a meta model as shown in FIG. 200. The model is abstract enough and rich enough to be mappable to the ETL specifications of most commercial tools.
  • DFG Operator is defined as the super class of all operators while the sub classes include Source Qualifier, Target, Join, Union, LookUp, Expression, etc. One can add a new operator by adding a new subclass and implementing the operator interface. An operator has a set of input ports, a set of output ports and a set of associated expressions. An operator's input flows in through the input ports and its output flows out through the output ports. The semantics of the associated expressions depends on the operator type (e.g. for join it is a relational expression). A port has an associated variable that binds to the values that flow in or flow out. The output port of one operator may be connected to the input port of another operator.
  • The modeling architecture and representational formalisms discussed above enable enterprise data assets to be modeled and mapped at a conceptual level of abstraction, in an implementation-independent manner.
  • Next discussed is the set of primitive components from which purpose-specific integration tools can be composed. The foundational primitive components so identified are discussed below, along with supporting embodiments of the invention.
  • Mapping component: Mapping component helps develop mappings between models. Given two models it can discover correspondences between the models and interactively prompt a user to refine them into full-fledged mappings. It provides a user-friendly interface that hides much of the complexity of the mapping language discussed earlier. For instance, with respect to the example given earlier, when a user identifies Employee as the target and Emp and Dept as the sources, it can automatically discover the join between Emp and Dept by traversing the association between them.
  • Query Translation Component: Query translation component provides the following functionality:
  • Given a set of source models S and a set of target models T and a mapping between them (possibly via a number of intermediate model layers), translate a query Qs on S into an equivalent query Qt on T. Query translation can be performed by using a suitable query rewriting approach provided in related prior art.
  • Query equivalence testing component: Query equivalence testing component provides the following functionality:
  • Given a set of source models S and a set of target models T and a mapping between them (possibly via a number of intermediate model layers), and given two queries Qs and Qt on models S and T respectively, test if there is any containment relationship between the two (i.e. Qs ⊆ Qt, Qt ⊆ Qs or Qs = Qt).
  • This involves first translating Qs into an equivalent query Qst on target schema using the query translation component and then testing for containment relationship between Qst and Qt which is well discussed in literature.
  • Query to DFG translation component: This component provides the following functionality:
  • Given a query consisting of relations that come from multiple databases, translate it into an equivalent data-flow graph. Given below is an approach that takes a query spanning multiple databases and translates it into an equivalent data-flow graph. The algorithm has three sequential steps: query segmentation, DFG generation and DFG optimization.
  • Query segmentation: The input query may contain subgoals from multiple sources. These subgoals need to be segmented into source-specific groups. Each such segment should then be factored out as a subquery that can be executed on a single source. Emphasis is placed on maximizing the size of each segment so that as many subgoals as possible are pushed into a single source query.
  • Query Segmentation:
  • A graph data structure to identify query segments is constructed as described below.
      • A node for each relation subgoal and each expression subgoal is constructed.
      • An edge between two relation nodes is created if they belong to the same source and share at least one variable.
      • An expression node is labeled as belonging to a source S if each of its variables is either shared by a relation belonging to S or shared by another expression node that is labeled as belonging to source S.
      • Edges are created from an expression node labeled S to each of the relation nodes of S that share a variable with it.
      • Edges are created from an expression node labeled S to each expression node labeled S that shares a variable with it.
  • Each maximally connected subgraph that contains at least one relation node is a separate query segment. Such query segments can be separated out as subqueries. The head variables of a subquery are determined by selecting the variables of the query segment that are required outside the segment (i.e. either in the query head or in other query segments). Calls to these subqueries together with the remaining conditions make up the original query.
  • Before forming a subquery for a query segment, it is checked whether there already exists a subquery with an equivalent query segment. If so, that subquery is reused, assisting in performance optimization. This is illustrated by the example below:
  • E.g.
      • Q(a1(v1), a2(v2), a3(v3)):—R1(a4(v1), a5(v4)), R2(a6(v4), a7(v2)), R3(a8(v1), a9(v5)), v2>v5, R4(a10(v5), a11(v6)), R5(a12(v1), a13(v7)), v6>v7, v3=v5+v6.
  • Let's say R1, R2 & R3 are from source S1 and R4, R5 are from source S2.
  • Next, two subquery segments shall be identified:
      • Qs1(a4(v1), a7(v2), a9(v5)):—R1(a4(v1), a5(v4)), R2(a6(v4), a7(v2)), R3(a8(v1), a9(v5)), v2>v5.
  • It is to be noted that v4 is not there in the head since it is not required outside the query segment.
      • Qs2(a12(v1), a10(v5), a11(v6)):—R4(a10(v5), a11(v6)), R5(a12(v1), a13(v7)), v6>v7.
  • Similarly v7 is not there in the head since it is not required outside the query segment.
  • Now the original query is rewritten as:
      • Q(a1(v1), a2(v2), a3(v3)):—Qs1(a4(v1), a7(v2), a9(v5)), Qs2(a12(v1), a10(v5), a11(v6)), v3=v5+v6.
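  • The segmentation procedure above admits a compact sketch. The following Python code (an illustrative sketch under simplifying assumptions — in particular, expression labelling does not chain through other expression nodes, and the data representation is a stand-in) builds the graph, finds the maximally connected subgraphs, attaches conditions to segments, and computes each segment's head variables:

```python
def segment_query(relations, conditions, head_vars):
    """Segment a multi-source conjunctive query into source-specific
    subqueries.

    relations:  list of (name, source, vars) relation subgoals
    conditions: list of (expr, vars) expression subgoals
    Returns (segments, residual_conditions).
    """
    # variables covered by each source's relations
    src_vars = {}
    for _, src, vs in relations:
        src_vars.setdefault(src, set()).update(vs)

    # label an expression subgoal with a source when all of its
    # variables are covered by that source's relations
    labels = [next((s for s, cover in src_vars.items() if set(vs) <= cover), None)
              for _, vs in conditions]

    # union-find over relation subgoals
    parent = list(range(len(relations)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    # edge: two relations of the same source that share a variable
    for i in range(len(relations)):
        for j in range(i + 1, len(relations)):
            if (relations[i][1] == relations[j][1]
                    and set(relations[i][2]) & set(relations[j][2])):
                union(i, j)
    # edge: a labelled expression connects the relations of its source
    # that share a variable with it
    for (_, vs), label in zip(conditions, labels):
        touching = [i for i, (_, src, rvs) in enumerate(relations)
                    if src == label and set(rvs) & set(vs)]
        for i in touching[1:]:
            union(touching[0], i)

    # each maximally connected subgraph is a query segment
    groups = {}
    for i in range(len(relations)):
        groups.setdefault(find(i), []).append(i)
    segments = [{'source': relations[idxs[0]][1],
                 'relations': sorted(relations[i][0] for i in idxs),
                 'vars': set().union(*(set(relations[i][2]) for i in idxs)),
                 'conditions': []}
                for idxs in groups.values()]

    # attach labelled conditions to their segment; the rest stay outer
    residual, residual_vars = [], set()
    for (expr, vs), label in zip(conditions, labels):
        seg = next((s for s in segments
                    if s['source'] == label and set(vs) <= s['vars']), None)
        if seg is not None:
            seg['conditions'].append(expr)
        else:
            residual.append(expr)
            residual_vars |= set(vs)

    # a segment's head: its variables that are required outside it
    for seg in segments:
        outside = set(head_vars) | residual_vars
        for other in segments:
            if other is not seg:
                outside |= other['vars']
        seg['head'] = sorted(seg['vars'] & outside)
    return segments, residual
```

  • Running this on the example above reproduces Qs1 and Qs2, with v4 and v7 correctly dropped from the heads and v3=v5+v6 left to the rewritten outer query.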
  • The next phase is the generation of the data flow graph from the segmented query representation, extending the above example:
  • Generating DFG from the segmented query representation:
  • The approach is discussed only for certain key DFG operators, and the following naming convention is used in the discussion below:
  • T—Target
  • S—Source
  • SQ—Source query segment
  • Q—Query in general
  • ‘{‘<in-flow>’}’<DFG-operator>[‘(‘<parameters>’)’]‘{‘<out-flow>’}’-DFG operator
  • DFG(<query>)-DFG transformation of the query fragment <query>
  • The outer query corresponds to the target, the inner queries correspond to the intermediate nodes, and the source-specific queries correspond to the sources. We generate the DFG job-flow backwards, starting from the target.
  • The following discusses how different query forms map to different DFG operators:
  • Query form DFG operator
    Q(<head-var-list>) :- Q1(<Q1-var-list>), {DFG(Q1(<Q1-var-list>)),
    Q2(<Q2-var-list>),.., Qm(..),  DFG(Q2(<Q2-var-list>)),
    <Q cond>.  ..,
     DFG(Qm(<Qm-var-list>))}
    JOIN(<Q cond>)
    {<head-var-list>}
    Q(v1,..,vj, <aggr>(vk)) :- <body> {DFG{‘Q(v1,..,vj, vk)) :- <body>’)}
    <aggr>(vk, vk-aggr-value)
    {v1,..,vj, vk-aggr-value}
    Q(v1,..,vn) :- SQ1(vi, vj,..vk), SQ2(..),.., {DFG(R1(<arg1>)), DFG(R2(<arg2>)),..,
    SQm(..), <Q cond>.  DFG(Rs(<argS>))}
    SQ1(v1,..,vl) :- R1(<arg1>), R2(..),.., JOIN(<SQ1 cond>)
    Rs(..), <SQ1 cond>. {v1,..,vl}
    Where if possible <cond> should be split
    and pushed down to R* (using the usual
    optimization techniques), resulting in:
    {R1}
    SELECT(<R1 cond>)
    {<R1 columns>}
    PROJECT(<arg1>)
    {<arg1>}
    Q(v1,..,vn) :- <body1> {
    Q(v1,..,vn) :- <body2> DFG(‘Q(v1,..,vn) :- <body1>’),
    .. DFG(‘Q(v1,..,vn) :- <body2>’),..,
    Q(v1,..,vn) :- <bodyk> DFG(‘Q(v1,..,vn) :- <bodyk>’)
    }
    UNION
    {v1,..,vn}
    Q1(..) :- <body1>, Qk(<SQk-var-list>). {DFG(Qk(<Qk-var-list>))}
    Q2(..) :- <body2>, Qk(<SQk-var-list>). SPLIT
    .. {<Qk-var-list-1>, <Qk-var-list-2>,..,<Qk-
    Qn(..) :- <bodyn>, Qk(<SQk-var-list>). var-list-n>}
    Where <Qk-var-list-i> is one of the inputs
    -- corresponding to the subgoal Qk(..) --
    to the JOIN operator corresponding to
    ‘Qi(..) :- <bodyi>, Qk(<Qk-var-list>)’.
    Q(v1,..,vn) :- <body1>, not Q2(vi,..,vk). {
    Refactor this query as below: (
    Q(v1,..,vn) :- {DFG(<query that selects (vi,..,vk)>),
    <query that selects (vi,..,vk)>,  DFG(Q2(vi,..,vk))}
    <query that selects rest of (v1,..,vn)>,  DIFF
    <cond>, {vi,..,vk}
    not Q2(vi,..,vk). ),
    DFG(<query that selects rest of
    (v1,..,vn)>)
    }
    JOIN(<cond>)
    {v1,..,vn}
    We view lookup as a join operation, but on {DFG(<body>)}
    a table that is marked as a lookup table. LOOKUP(LT, vk, vl)
    {v1,..,vl,..,vn}
    Q(v1,..,vl,..,vn) :- <body>, LT(vk, vl)
    Where ‘vk’ is a key variable selected in
    <body>, and LT is a lookup table that
    stores surrogate keys.
    Function application: DFG(<body that selects the rest of
    Functions occur in expression subgoals: (v1,..,vn)>)}
    Q(v1,..,vf,..,vn) :- FUNC(f, (vi,..,vk), vf)
    <body that selects the rest of (v1,..,vn)>, {v1,..,vf,..,vn}
    vf = f(vi,..,vk).
  • As described in the above table DFG can be generated by mapping different query forms to their corresponding DFG operators. The generated DFG can be optimized further by using techniques similar to the ones used in query graph optimization which are well documented in literature.
  • DFG to query translation component: This component provides the following functionality:
  • Given a data flow graph (DFG) that computes a target data view from a set of relations that come from one or more source databases, translate the DFG into an equivalent query on the relations. The translation consists of two procedures as described below:
  • Operator-to-Query Procedure:
  • The query corresponding to each operator (except Source Qualifier instances) has the following format:
      • Select clause contains the names of output ports.
      • From clause contains sub-queries corresponding to the operators whose output ports are connected to the input ports of this operator, with sub-query aliases set to the names of the corresponding input operators.
      • Where clause is derived from the expressions associated with the operator. Its precise form depends on the semantics of the operator type; for example in the case of join operator the expressions are simply dumped into the where clause after replacing all references to input ports by their corresponding subquery qualified names, i.e. <operatorName.portName>.
  • Source Qualifier operator has an associated query which is simply dumped as is.
  • DFG-to-Query Procedure:
      • Formulate the query of the target operator using the procedure described above. The sub-query creation would iteratively traverse backwards towards the sources. Effectively queries corresponding to all the operators are created. We are however interested only in the query corresponding to the Target Operator, as this corresponds to the Query of the DFG.
  • This query is then optimized to remove duplicate sub-queries, after it is converted to an internal representation. This avoids re-evaluating the same sub-query, thereby contributing to performance optimization.
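  • The two procedures above admit a compact recursive sketch. The operator representation below (plain dicts, with only the Source Qualifier, Join and Target cases shown) is a hypothetical stand-in for the internal meta model:

```python
def operator_to_sql(op, operators):
    """Render the query for one DFG operator per the procedure above:
    the select clause lists the output ports, the from clause holds
    sub-queries of the upstream operators (aliased by operator name),
    and the where clause is derived from the operator's expressions.
    A Source Qualifier's associated query is dumped as-is.
    """
    if op['type'] == 'source_qualifier':
        return op['query']
    select = ', '.join(op['out_ports'])
    subs = ['(%s) %s' % (operator_to_sql(operators[name], operators), name)
            for name in op['inputs']]
    sql = 'select %s from %s' % (select, ', '.join(subs))
    if op['type'] == 'join':
        # join expressions reference ports as <operatorName.portName>
        sql += ' where %s' % op['condition']
    return sql

def dfg_to_query(operators, target_name):
    """The query of the DFG is the query of its Target operator."""
    return operator_to_sql(operators[target_name], operators)
```

  • Applied to a DFG with two Source Qualifiers feeding a Join feeding a Target, this unwinds backwards from the target into a single nested query, as the procedure describes.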
  • Discussed next is a matching component, which can be used to identify matching entity instances in two databases. Its functionality is exemplified below:
  • Matching Component: Given two database schemas S1 and S2 that are mapped to a common global schema G, and given a specification to detect duplicate instances in G (i.e. a deduplication specification over G), and given two database instances I1, and I2 of schemas S1 and S2 respectively, output all matching object pairs from I1 and I2 as tuples of the form <S1:<entity>, S2:<entity>, <pk1-value tuple>, <pk2-value tuple>>.
  • Where each tuple identifies an entity instance in I1 and its corresponding matching entity instance in I2. This list of tuples is called an object map.
  • The deduplication specification essentially identifies when two instances of an entity in the global schema refer to the same underlying entity.
  • E.g.
  • Following model exists in global schema:
  • Dept, Emp
  • Dept.(emp(0 . . . *)/dept(1)).Emp
  • The following matching specification states when two employee instances can be considered to refer to the same real-world employee:
  • Match e1,e2:Emp when {
     e1.dob = e2.dob
     and
     approx_match(e1.name, e2.name, 80)
     and
     e1.dept.name = e2.dept.name
    }
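  • A minimal executable reading of this specification is sketched below. The approx_match primitive is assumed to be a string-similarity test that succeeds at or above the given percentage; difflib's ratio is used here purely as a stand-in for whatever matcher an implementation chooses, and the dict encoding of employees is likewise an assumption:

```python
from difflib import SequenceMatcher

def approx_match(a, b, threshold):
    # stand-in similarity measure: percentage of matching characters
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100 >= threshold

def employees_match(e1, e2):
    """Evaluate the 'Match e1,e2:Emp' specification from the text,
    with employees represented as plain dicts."""
    return (e1['dob'] == e2['dob']
            and approx_match(e1['name'], e2['name'], 80)
            and e1['dept_name'] == e2['dept_name'])
```

  • Two employees with the same date of birth, approximately matching names and the same department name are then reported as a pair in the object map.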
  • Platform specific component: A platform specific component translates between internal representations of queries and DFGs and their platform specific representations. One such component exists for each supported platform. They provide the import/export bridge between the tool and third-party platforms, and thus play a key role in the overall architecture.
  • For example, from internal representation of DFG to Informatica ETL representation, or from internal representation of query to Oracle specific representation, and so on.
  • After identifying the essential set of primitive components, what remains is the composition of purpose-specific integration tools, the method for which is discussed henceforth.
  • Composing Purpose Specific Integration Tools
  • We discuss how the modeling framework and the primitive components discussed above can be used to compose integration tools. This is illustrated with a few known tool categories as given below:
  • Enterprise Information Integration (EII) Tool
  • EII tools enable a unified view of data by mapping a set of database schemas to a unified global schema. They provide a query interface in terms of the global model. The tool translates these queries into data source specific queries and assembles the results returned by the data sources. Composition of an EII tool from the primitive components is shown in FIG. 300. The method for constructing EII tool using defined foundational components includes the steps of:
      • Using the conceptual modeling framework to create the data models.
      • Using the mapping component to specify mappings between these models.
      • Using the query translation component to translate a unified model query into an equivalent target query involving relations from across source schemas.
      • Using the ‘query-to-DFG’ component to translate the target query into an equivalent DFG and finally
      • Translating the DFG into third party ETL and executing it or executing the DFG directly using the DFG execution component.
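  • The composition just described can be sketched as a simple pipeline of components. The interfaces below are hypothetical stand-ins for the primitive components (here reduced to callables), meant only to show how an EII tool wires them together:

```python
class EIITool:
    """Sketch of an EII tool composed from the primitive components."""

    def __init__(self, query_translator, query_to_dfg, dfg_executor):
        self._translate = query_translator  # unified query -> source query
        self._to_dfg = query_to_dfg         # source query -> DFG
        self._execute = dfg_executor        # DFG -> result rows

    def answer(self, unified_query):
        # translate the unified model query into source-specific form,
        # turn it into a DFG, and execute the DFG to assemble results
        source_query = self._translate(unified_query)
        dfg = self._to_dfg(source_query)
        return self._execute(dfg)
```

  • The ETL, data synchronization and data migration tools discussed next follow the same pattern, differing only in which components are wired together and in which direction the translations run.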
  • Extract, Transform and Load (ETL) Tool
  • An ETL tool from the primitive components can be composed as shown in FIG. 400, the method describing which is as follows:
      • Use the modeling tool to build warehouse model, source models and global model if any.
      • Use the mapping tool to map the warehouse schema either to the global schema or to the source schemas directly.
      • Treat these mapping views as source queries and use the query translation component to translate these queries into equivalent target queries on data source models.
      • Use the ‘query-to-DFG’ component to translate the target queries into equivalent DFGs and
      • Translate the DFGs into third party ETL or execute the DFGs directly using the DFG execution component.
  • Data Synchronization Tool:
  • A tool to synchronize data sources can be composed as follows:
      • Use the modeling tool to build source models and a unified global model.
      • Use the mapping tool to map the source models to the global model.
      • Use the matching tool to specify entity matching specifications on the global model and generate the object map.
      • For each update on a source entity, consult the object map to get the matching entities in other sources and use the query translator component to translate update statements into equivalent update statements on matching sources, and execute these update statements on the sources as shown in FIG. 500.
  • Data Migration Tool
  • Data migration involves not only database migration but migration of various data access programs as well, such as queries, stored procedures, ETL specifications, etc.
  • We can compose the relevant tool as shown in FIG. 600; the method is detailed as follows:
      • Use the modeling tool to model source schemas and target schemas
      • Use the mapping component to specify mappings between them
      • Form pseudo queries corresponding to each target table (select * from <target_table>) and use the query translation component to translate these queries into source specific queries.
      • Either execute these queries directly (if there is only one source database) or use the ‘query-to-DFG’ component to generate equivalent DFGs (and there from tool specific ETLs if required).
      • Translate queries on the source schema into equivalent queries on the target schema using the query translator component.
  • Similarly, source-specific ETL specifications are translated into equivalent target-specific ETL specifications using the following procedure:
      • Use the ‘DFG-to-query’ component to translate source ETL specification into equivalent composite source queries.
      • Use the query translator component to translate these composite source queries into equivalent target queries.
      • Use the ‘query-to-DFG’ component to translate these target queries into equivalent target DFGs, and use the platform specific ETL translator to translate these DFGs into target specific ETL specifications.
  • The present invention, besides providing a solution for creating purpose-specific integration tools, also discloses a method by which migration from one solution architecture to another can be implemented. Interoperability among the different integration tools available is discussed below.
  • Migrating from one solution architecture to another: The migration process from one solution to another is illustrated by the cases involving currently available integration tools.
  • Migrating from an EII Solution to an ETL Solution
  • In one of the embodiments of the present invention, a method of migrating from an EII solution to an ETL solution using the primitive foundational components is provided. Suppose that instead of executing queries on the virtual global model to extract the required data views, there is a need to materialize those data views into a warehouse using an ETL-like approach, say in order to improve response time. Given below is the method of migrating to the new solution:
      • Translate the requisite global queries into equivalent source queries using the query translation component.
      • Translate these queries into equivalent DFGs using the ‘query-to-DFG’ component.
      • Translate these DFGs into target platform specific ETLs.
  • Migrating from an ETL Solution to an EII Solution
  • In another embodiment of the present invention, a method of migrating from an ETL solution to an EII solution is provided. The method uses purpose-specific tools for such migration, composed from the set of primitive foundational components. Suppose we have a warehouse and an ETL mapping from sources that populates the warehouse, and we want to migrate to a new solution in which the warehouse views are virtual and computed on demand, so as to improve latency. Discussed below is how to migrate to the new solution:
      • Turn the warehouse schema into the global schema.
      • Translate platform specific ETL specifications into equivalent DFGs.
      • Translate each DFG into an equivalent composite source query using the ‘DFG-to-query’ component. Turn this query into a mapping view specification of the corresponding global schema table and finally
      • Turn all warehouse queries into equivalent global schema queries.
  • Hence, the proposed conceptual modelling architecture allows enterprise integration to be modelled at a semantic level, capturing the rules and constraints that govern the behaviour of models and their mappings; and the set of primitive components presented is sufficiently general-purpose that it can be used to compose purpose-specific integration tools.
  • The integration architecture here presented in the disclosure thus enables easy migration from one integration solution to another.
  • The foregoing description of the specific embodiments will so fully reveal the general nature of the invention that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the invention has been described in terms of preferred embodiments, those skilled in the art will recognize that the embodiments of the invention can be practiced with modification within the spirit and scope of the appended claims.

Claims (16)

1) A method for composing integrating tools for facilitating interoperability across integrating platforms, the said method comprising processor implemented steps of:
modeling plurality of data entities at multiple levels of abstraction in a source schema and capturing relationship between the data entities using a query language in conjugation with path expressions;
mapping between the modeled data entities of the source schema to obtain a unified model query using a mapping component;
translating the unified model query of the source schema into equivalent target queries on a target schema and checking for containment relationship between the queries thereof using a query translation component;
translating one or more target queries into equivalent target data flow graphs using a query to DFG translation component and;
executing the data flow graphs directly or translating the target data flow graphs into third party integrating platform and executing it thereof using a DFG to query translation component.
2) The method of claim 1, wherein the modeling of data entities at multiple levels is achieved by conceptual modeling framework using an object model or an entity relationship model as a modeling language.
3) The method of claim 1, wherein the source schema is defined at multiple levels using conceptual model for at least one level and physical level or conceptual model for subsequent levels.
4) The method of claim 1, wherein the query language and the path expressions capture association relationships between the data entities and are translated into a corresponding structured query language following a translation rule.
5) The method of claim 1, wherein the path expressions are governed by a set of rules to support association relationships, said rules including:
selecting, from a Cartesian product of the relations listed in the 'from' clause of the query, the relations that satisfy the 'where' clause and the path expressions, and projecting out the columns specified during selection, wherein the path expressions have implicit existential quantification; and
allowing the path expressions in the 'select' clause of the query provided *:1 (many-to-one) associations are involved.
6) The method of composing platform specific integrating tools as claimed in claim 5, wherein the governing rule allows each association segment of the path expression to introduce two relations in the 'from' clause, corresponding to the association and the associated class, and corresponding join conditions in the 'where' clause of the query.
7) The method of claim 1, wherein the mapping is specified by global-as-view (GAV), local-as-view (LAV), or GLAV mappings.
8) The method of claim 1, wherein the target query is translated into an equivalent target data flow graph in the steps of:
segmenting a given query containing subgoals from multiple sources such that each maximally connected subgraph that contains at least one relation node constitutes a query segment;
checking for an already existing subquery having an equivalent query segment for reuse;
factoring the query segments into subqueries, each executable on a single source; and
constructing the target data flow graph from the segmented query representation, expressing nodes and edges corresponding to the subgoals of the query.
9) The method of composing platform specific integrating tools as claimed in claim 1, wherein the data flow graphs are represented by a meta model, said meta model further defined by data flow graph operators, wherein said operators include a set of input and output ports along with associated expressions.
10) The method of composing platform specific integrating tools as claimed in claim 1, wherein the data flow graph (DFG) to query translation component translates a data flow graph consisting of one target relation from one target database and at least one source relation from at least one source database, wherein the translation is performed by:
formulating an outer query corresponding to the target DFG operator in a suitable format;
formulating subqueries corresponding to each inner operator of the data flow graph in a nested manner; and
optimizing the formulated query by removing duplicate subqueries.
11) The method of composing platform specific integrating tools as claimed in claim 10, wherein the target DFG operator is formulated into its corresponding query using a specified format, said format comprising:
a 'select' clause containing the names of the output ports;
a 'from' clause containing subqueries corresponding to the operators whose output ports are connected to an input port of the operator; and
a 'where' clause derived from an associated expression of the operator, wherein the semantics of said operator decide the form of said clause.
12) The method of claim 1, wherein the composed integrating tools can be an EII tool, an ETL tool, a data migration tool, a data synchronization tool, or a combination thereof.
13) The method of claim 1, wherein the integrating tool, particularly a data integration tool, can be composed in the steps of:
building a source model and a unified global model using a modeling framework;
mapping between the source model and the global model using a mapping component;
specifying entity matching specifications on the global model to generate object map using a matching component;
translating query updates on the source model into equivalent query updates on the matching sources; and
executing said query updates on the sources using a query translator component.
14) The method of claim 13, wherein the matching component takes matching specifications expressed on the global model and matches instances of a class across a plurality of disparate databases by presenting all matching object pairs from said instances as one or more tuples called an object map.
15) The method of claim 1, wherein the integration tool, particularly a data migration tool, is composed in the steps of:
modeling a source schema and a target schema using a modeling component;
mapping between the source schema and the target schema using a mapping component;
formulating queries corresponding to each target table, translating said queries into source specific queries using a query translation component, translating said source specific queries into equivalent data flow graphs using a query to DFG translation component, and executing said data flow graphs to migrate data from the source schema to the target schema.
16) A computer program product comprising a computer usable medium having computer readable program code embodied therein, said computer readable program code adapted to be executed to implement a method for composing integrating tools for facilitating interoperability across integrating platforms, said method comprising:
modeling a plurality of data entities at multiple levels of abstraction in a source schema and capturing relationships between the data entities using a query language in conjunction with path expressions;
mapping between the modeled data entities of the source schema to obtain a unified model query using a mapping component;
translating the unified model query of the source schema into equivalent target queries on a target schema and checking for containment relationships between the queries thereof using a query translation component;
translating one or more target queries into equivalent target data flow graphs using a query to DFG translation component; and
executing the data flow graphs directly, or translating the target data flow graphs into a third-party integrating platform and executing them thereon, using a DFG to query translation component.
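The translation rule recited in claims 4 through 6 can be sketched in code. The following is a minimal, hypothetical illustration, not the patented implementation: the schema metadata, table names, and key layout are invented for the example. Per claim 6, each association segment of a path expression contributes two relations to the 'from' clause (the association's relation and the associated class's relation) plus the corresponding join conditions in the 'where' clause.

```python
# Hypothetical schema metadata: each (class, association) pair maps to an
# association relation linking the class's table to the associated class's table.
ASSOCIATIONS = {
    ("Dept", "employees"): {
        "assoc_table": "dept_emp",       # relation for the association itself
        "target_class": "Emp",           # associated class
        "source_key": ("id", "dept_id"), # (source column, FK in assoc table)
        "target_key": ("id", "emp_id"),  # (target column, FK in assoc table)
    },
}

def translate_path(root_class, path):
    """Expand a path expression into 'from' relations and 'where' joins.

    Per the rule in claim 6, each association segment contributes two
    relations (the association and the associated class) and the
    corresponding join conditions.
    """
    relations = [root_class]
    joins = []
    current = root_class
    for segment in path.split("."):
        meta = ASSOCIATIONS[(current, segment)]
        assoc = meta["assoc_table"]
        target = meta["target_class"]
        relations += [assoc, target]
        s_col, s_fk = meta["source_key"]
        t_col, t_fk = meta["target_key"]
        joins.append(f"{current}.{s_col} = {assoc}.{s_fk}")
        joins.append(f"{assoc}.{t_fk} = {target}.{t_col}")
        current = target
    return relations, joins

rels, conds = translate_path("Dept", "employees")
print("FROM " + ", ".join(rels))
print("WHERE " + " AND ".join(conds))
```

For the path `Dept.employees`, this yields `FROM Dept, dept_emp, Emp` with the joins `Dept.id = dept_emp.dept_id AND dept_emp.emp_id = Emp.id`, matching the two-relations-per-segment rule.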
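The segmentation step of claim 8 (each maximally connected subgraph of same-source subgoals constitutes a query segment) can be approximated with a small union-find sketch. The subgoal representation below is hypothetical; it assumes each subgoal carries its relation name, its source, and the variables it uses, with connectivity defined by shared variables.

```python
from collections import defaultdict

def segment_query(subgoals):
    """Group subgoals into query segments: one segment per maximally
    connected subgraph of same-source subgoals that share variables.

    `subgoals` is a list of (relation, source, variables) tuples.
    """
    n = len(subgoals)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    # Connect subgoals from the same source that share at least one variable.
    for i in range(n):
        for j in range(i + 1, n):
            same_source = subgoals[i][1] == subgoals[j][1]
            shared = set(subgoals[i][2]) & set(subgoals[j][2])
            if same_source and shared:
                union(i, j)

    segments = defaultdict(list)
    for i, sg in enumerate(subgoals):
        segments[find(i)].append(sg)
    return list(segments.values())

subgoals = [
    ("emp",     "src1", ["x", "y"]),  # joins with dept via y, same source
    ("dept",    "src1", ["y", "z"]),
    ("payroll", "src2", ["x", "w"]),  # different source: its own segment
]
for seg in segment_query(subgoals):
    print([relation for relation, _, _ in seg])
```

Here `emp` and `dept` fall into one segment (same source, shared variable `y`) while `payroll` forms its own, so each segment is a subquery executable on a single source, as the claim requires.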
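The query format of claim 11 (a 'select' clause from the operator's output ports, a 'from' clause of nested subqueries for upstream operators, and a 'where' clause from the operator's associated expression) can likewise be illustrated. The operator representation below is hypothetical and simplified to a dictionary; it is a sketch of the nesting idea, not the disclosed meta model.

```python
def operator_to_query(op):
    """Recursively formulate the nested query for a DFG operator.

    `op` is a hypothetical dict with 'out_ports' (output port names),
    'expr' (associated expression or None), and 'inputs' (base relation
    names or upstream operator dicts, which are nested as subqueries).
    """
    select = ", ".join(op["out_ports"])
    froms = []
    for i, src in enumerate(op["inputs"]):
        if isinstance(src, str):
            froms.append(src)  # base relation: use directly
        else:
            # upstream operator: nest its query as an aliased subquery
            froms.append(f"({operator_to_query(src)}) q{i}")
    where = f" WHERE {op['expr']}" if op.get("expr") else ""
    return f"SELECT {select} FROM {', '.join(froms)}{where}"

# Target operator (a filter) fed by one inner operator over a base table.
dfg = {
    "out_ports": ["id", "total"],
    "expr": "total > 100",
    "inputs": [{
        "out_ports": ["id", "total"],
        "expr": None,
        "inputs": ["orders"],
    }],
}
print(operator_to_query(dfg))
```

The outer query corresponds to the target DFG operator and each inner operator becomes a nested subquery in its 'from' clause; a fuller version would also deduplicate repeated subqueries, per claim 10's optimization step.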
US13/252,630 2010-10-18 2011-10-04 Component Based Approach to Building Data Integration Tools Abandoned US20120095957A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/551,766 US10606861B2 (en) 2010-10-18 2014-11-24 Method for building data integration tools to facilitate interoperability of data integration solutions among data integration platforms

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN2893/MUM/2010 2010-10-18
IN2893MU2010 2010-10-18

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US14/551,766 Continuation US10606861B2 (en) 2010-10-18 2014-11-24 Method for building data integration tools to facilitate interoperability of data integration solutions among data integration platforms

Publications (1)

Publication Number Publication Date
US20120095957A1 true US20120095957A1 (en) 2012-04-19

Family

ID=45934979

Family Applications (2)

Application Number Title Priority Date Filing Date
US13/252,630 Abandoned US20120095957A1 (en) 2010-10-18 2011-10-04 Component Based Approach to Building Data Integration Tools
US14/551,766 Active 2033-11-13 US10606861B2 (en) 2010-10-18 2014-11-24 Method for building data integration tools to facilitate interoperability of data integration solutions among data integration platforms


Country Status (1)

Country Link
US (2) US20120095957A1 (en)


Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107273106B (en) * 2016-04-08 2021-07-06 北京三星通信技术研究有限公司 Object information translation and derivative information acquisition method and device
US11630690B2 (en) 2017-12-19 2023-04-18 International Business Machines Corporation Simplifying data mapping in complex flows by defining schemas at convergence points in a workflow
US11593393B1 (en) 2020-05-22 2023-02-28 Cigna Intellectual Property, Inc. Systems and methods for providing automated integration and error resolution of records in complex data systems

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5970490A (en) * 1996-11-05 1999-10-19 Xerox Corporation Integration platform for heterogeneous databases
US20060101073A1 (en) * 2004-10-28 2006-05-11 International Business Machines Corporation Constraint-based XML query rewriting for data integration

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020129145A1 (en) * 2001-03-06 2002-09-12 Accelerate Software Inc. Method and system for real-time querying, retrieval and integration of data from database over a computer network
US20050243604A1 (en) * 2004-03-16 2005-11-03 Ascential Software Corporation Migrating integration processes among data integration platforms
US7246108B2 (en) * 2004-07-27 2007-07-17 Oracle International Corporation Reusing optimized query blocks in query processing
US20070021411A1 (en) * 2005-05-11 2007-01-25 Cloyd James C Supersaturated benzodiazepine solutions and their delivery
US7689582B2 (en) * 2006-03-10 2010-03-30 International Business Machines Corporation Data flow system and method for heterogeneous data integration environments
US9727604B2 (en) * 2006-03-10 2017-08-08 International Business Machines Corporation Generating code for an integrated data system
US8099725B2 (en) * 2006-10-11 2012-01-17 International Business Machines Corporation Method and apparatus for generating code for an extract, transform, and load (ETL) data flow
US7660884B2 (en) * 2006-11-10 2010-02-09 International Business Machines Corporation Apparatus, system, and method for generating a resource utilization description for a parallel data processing system
US8160999B2 (en) * 2006-12-13 2012-04-17 International Business Machines Corporation Method and apparatus for using set based structured query language (SQL) to implement extract, transform, and load (ETL) splitter operation
US8219518B2 (en) * 2007-01-09 2012-07-10 International Business Machines Corporation Method and apparatus for modelling data exchange in a data flow of an extract, transform, and load (ETL) process
US8935702B2 (en) * 2009-09-04 2015-01-13 International Business Machines Corporation Resource optimization for parallel data integration


Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120072391A1 (en) * 2010-09-22 2012-03-22 Alkiviadis Simitsis Apparatus and method for an automatic information integration flow optimizer
US8538912B2 (en) * 2010-09-22 2013-09-17 Hewlett-Packard Development Company, L.P. Apparatus and method for an automatic information integration flow optimizer
US10659287B2 (en) * 2012-05-23 2020-05-19 Denso Corporation Management server
US20150205886A1 (en) * 2012-09-03 2015-07-23 Agfa Healthcare Nv On-demand semantic data warehouse
US10936656B2 (en) * 2012-09-03 2021-03-02 Agfa Healthcare Nv On-demand semantic data warehouse
US20140279914A1 (en) * 2013-03-15 2014-09-18 International Business Machines Corporation Data Migration in a Database Management System
US9378254B2 (en) * 2013-03-15 2016-06-28 International Business Machines Corporation Data migration in a database management system
US20160154830A1 (en) * 2013-07-17 2016-06-02 Qatar Foundation Systems and methods for data integration
US10528532B2 (en) * 2013-07-17 2020-01-07 Qatar Foundation Systems and methods for data integration
US11303305B2 (en) * 2013-10-22 2022-04-12 Nippon Telegraph And Telephone Corporation Sparse graph creation device and sparse graph creation method
US10142415B2 (en) 2014-01-28 2018-11-27 Hewlett Packard Enterprise Development Lp Data migration
WO2015116040A1 (en) * 2014-01-28 2015-08-06 Hewlett-Packard Development Company, L.P. Data migration
US20150317331A1 (en) * 2014-05-02 2015-11-05 DhiLogics Solutions India Pvt. Ltd. Unified platform for application development
US20160125067A1 (en) * 2014-10-31 2016-05-05 International Business Machines Corporation Entity resolution between datasets
US9996607B2 (en) * 2014-10-31 2018-06-12 International Business Machines Corporation Entity resolution between datasets
US20160205211A1 (en) * 2014-11-21 2016-07-14 Wipro Limited Method and system for migrating an interface
US9584614B2 (en) * 2014-11-21 2017-02-28 Wipro Limited Method and system for migrating an interface
CN106021624A (en) * 2016-07-21 2016-10-12 中国农业银行股份有限公司 ETL (extract-transform-load) model generation method and device
CN108399064A (en) * 2017-02-07 2018-08-14 南京南瑞继保电气有限公司 Multilingual entry generation method based on Qt exploitations
US11138157B2 (en) * 2017-08-30 2021-10-05 Jpmorgan Chase Bank, N.A. System and method for identifying business logic and data lineage with machine learning
US11860827B2 (en) 2017-08-30 2024-01-02 Jpmorgan Chase Bank, N.A. System and method for identifying business logic and data lineage with machine learning
US20210334248A1 (en) * 2018-06-18 2021-10-28 Tamr, Inc. Generating and reusing transformations for evolving schema mapping
US10579515B1 (en) * 2018-09-10 2020-03-03 International Business Machines Corporation Recycling segment pages while preserving integrity of memory addressing
WO2020084641A1 (en) * 2018-10-26 2020-04-30 Tata Consultancy Services Limited Systems and methods of data migration in multi-layer model-driven applications
US11593325B2 (en) 2018-10-26 2023-02-28 Tata Consultancy Services Limited Systems and methods of data migration in multi-layer model-driven applications
EP3948564A4 (en) * 2019-03-25 2022-12-28 Datacoral, Inc. Systems and methods for generating, deploying, and managing data infrastructure stacks
US11853314B2 (en) 2019-03-25 2023-12-26 DataCoral, Inc. Systems and methods for generating, deploying, and managing data infrastructure stacks
CN114579822A (en) * 2021-12-13 2022-06-03 北京市建筑设计研究院有限公司 Method and device for pushing modeling tool, electronic equipment and storage medium

Also Published As

Publication number Publication date
US20150081620A1 (en) 2015-03-19
US10606861B2 (en) 2020-03-31


Legal Events

Date Code Title Description
AS Assignment

Owner name: TATA CONSULTANCY SERVICES LIMITED, INDIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:REDDY, SREEDHAR SANNAREDDY;DAS, PRASENJIT;YEDDULA, RAGHAVENDRA REDDY;SIGNING DATES FROM 20111213 TO 20111214;REEL/FRAME:027402/0010

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION