WO2005067396A2 - Data pattern based odbc/ oledb/ jdbc compliant driver - Google Patents


Info

Publication number
WO2005067396A2
WO2005067396A2 · PCT/IN2004/000334
Authority
WO
WIPO (PCT)
Prior art keywords
data
query
driver
repository
execution
Prior art date
Application number
PCT/IN2004/000334
Other languages
French (fr)
Other versions
WO2005067396A3 (en)
Inventor
Vinayak K. Rao
Original Assignee
Vaman Technologies (R & D) Limited
Priority date
Filing date
Publication date
Application filed by Vaman Technologies (R & D) Limited
Publication of WO2005067396A2 (en)
Publication of WO2005067396A3 (en)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25: Integrating or interfacing systems involving database management systems
    • G06F16/252: Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application

Definitions

  • Databases are an important tool for the storage and management of information for businesses. Both relational and non-relational databases exist for this purpose. Examples of relational databases include ORACLE, DB2 and INFORMIX. Examples of non-relational databases include custom databases such as Clipper, FoxPro etc. Database Management Systems (DBMS) provide users the capabilities of controlling read/write access, specifying report generation, and analyzing usage.
  • DBMS Database Management Systems
  • Fig 1 is a block diagram depicting the system using the legacy systems, depicting the (a) use of the shared path on the server and (b) the use of a network interface that does not require sharing, along with various components and protocols involved.
  • FIG. 1 (a) illustrates a network of client computer systems running on various platforms, running applications and sharing the path on the server, to access and share the common data from the clients 100, 105, 110, 115 and 120.
  • legacy applications were programmed with popular searching or sorting techniques but with need-based data definition patterns. These were typical of file-based database applications. Typical examples of such a system are two-tier legacy architectures where the data definition resides in the presentation layer, i.e. on the clients rather than on the server; typical examples are "C", "Pascal" and "Cobol" applications.
  • Such data files residing on the clients comprise a header 125 and a data part 130.
  • the header 125 is optional and may include the Data Definition Language (DDL), and the data includes the Data Manipulation Language (DML) 140.
  • DDL Data Definition Language
  • DML Data Manipulation Language
  • FIG. 1 (b) illustrates a network of client computer systems running on various platforms 150, 155, 160, 165 and 170, a database server 180, and using the network interface from the Network Service 190 that does not require sharing.
  • the DDL got pushed from the clients to the Database Server 180 such as in FoxPro, Dbase, Clipper.
  • These file-based databases could not interface or interact with other applications as the information needed to manipulate data and data information rested within the application code.
  • the problem with each of these databases was that each had data definitions that were non-standard and proprietary, as defined by each application vendor.
  • the interface module translates information and messages received from the back-end database systems such that they are understandable to the appropriate clients, and then transfers the translated information and messages to the appropriate clients.
  • the main function of one such data access middleware, the Open Database Connectivity (ODBC) driver, was to standardize such DDLs across vendors.
  • ODBC Open Database Connectivity
  • the naming convention of DDL varied as per vendor implementation and the Database architecture.
  • Such an access mechanism provided an interface path or mechanism to exchange data as per the DDLs.
  • ODBC is based on and closely aligned with The Open Group standard Structured Query Language (SQL) Call-Level Interface. It allows programs to use SQL requests that will access databases without having to know the proprietary interfaces to the databases. ODBC handles the SQL request and converts it into a request the individual database system understands.
  • SQL Structured Query Language
  • ODBC was created by the SQL Access Group and first released in September 1992. Although Microsoft Windows was the first to provide an ODBC product, versions now exist for UNIX, OS/2, and Macintosh platforms as well.
  • POS Persistent Object Service
  • JDBC Java Database Connectivity
  • the current invention solves these issues without changing the legacy application and simultaneously allowing a data exchange path so that business process automation can be centralized.
  • a data pattern is a stream of bytes that is repetitive. Any persistent harmonic data generated by applications other than databases (non standards compliant databases too) could be converted to a standards compliant database with the pattern based driver interface proposed in this application which maps these harmonic data patterns to classified and accepted standards of ODBC / OLEDB or JDBC.
  • the proposed invention also behaves as and performs the basic operations of a database server without the need for an actual server, i.e. the driver engine itself performs and manipulates data like a database server.
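As a rough illustration of what "a stream of bytes that is repetitive" means in practice, the following sketch guesses the fixed record length of a harmonic byte stream by testing candidate periods. The function name and the crude first-byte heuristic are hypothetical illustrations, not the patented algorithm:

```python
def infer_record_length(stream: bytes, max_len: int = 256):
    """Guess the fixed record length of a repetitive byte stream by
    testing candidate periods: a period qualifies if every record it
    produces starts with the same byte (a very crude 'harmonic' test)."""
    for period in range(1, min(max_len, len(stream) // 2) + 1):
        if len(stream) % period:
            continue  # only periods that tile the stream exactly
        records = [stream[i:i + period] for i in range(0, len(stream), period)]
        if len(records) > 1 and len({r[0] for r in records}) == 1:
            return period
    return None

# e.g. four 3-byte records, each starting with the marker byte 'A':
infer_record_length(b"A12A34A56A78")
```

A real driver would of course use much richer statistics than the first byte, but the idea of deducing structure purely from byte repetition is the same.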
  • FSM Finite State Machine
  • Fig 1 is a block diagram depicting the system using the legacy systems depicting the (a) use of the shared path on the server and (b) the use of network interface that does not require sharing, along with various components and protocols involved.
  • Fig 2 (a) is a block diagram depicting the physical representation of the conventional three- tier architecture.
  • Fig 2 (b) is a block diagram depicting the logical representation of the conventional three- tier architecture.
  • Fig 3 (a) is a block diagram depicting the physical representation of the Data Pattern Driver of the proposed invention.
  • Fig 3 (b) is a block diagram depicting the logical representation of the Data Pattern Driver of the proposed invention.
  • Fig 4 is a block diagram illustrating the physical representation for the working of the preferred embodiment of the present invention.
  • Fig. 5 illustrates the working / configuration of the invention during Driver Preload using the configuration wizard.
  • Fig. 6 illustrates the working / configuration of the invention during Driver Preload using the configuration wizard.
  • Fig. 7 illustrates the working / configuration of the invention during Driver Preload using the configuration wizard.
  • Fig. 8 illustrates the working / configuration of the invention during Driver Preload using the configuration wizard.
  • Fig. 10 illustrates the working / configuration of the invention during Driver Preload using the configuration wizard.
  • Fig. 11 depicts a logical representation of the binary data in the preferred embodiment of the present invention.
  • Fig. 12 illustrates the working / configuration of the invention during Driver Preload using the configuration wizard.
  • Fig. 13 illustrates the working / configuration of the invention during Driver Preload using the configuration wizard.
  • Fig 14 (a and b) is a flow diagram illustrating the Preload phase / DSN configuration of the Data Pattern Driver using the preferred embodiment of the present invention.
  • Fig. 15 is a block diagram depicting the architecture of the preferred embodiment of the present invention.
  • Fig 16 is a flow diagram illustrating the actual driver execution flow process in which execution of the SQL/ODBC commands is carried out when any application uses this driver, using the preferred embodiment invention.
  • Fig. 17 depicts the application of the driver in the preferred embodiment of the present invention.
  • FIG 1 is a block diagram depicting the system using the legacy systems depicting the (a) use of the shared path on the server and (b) the use of network interface that does not require sharing, along with various components and protocols involved.
  • Fig. 1 (a) the clients had to access and share the common data while the data was archived in such database file systems.
  • the data definitions resided in the presentation layer, i.e. on the clients rather than on the server, as depicted in the figure. This data was accessed and shared using a shared path on the server.
  • Open Database Connectivity is an intermediate standard to transfer data between the front-end application and the heterogeneous database. It resolves the Database Information and fetches the result.
  • the current ODBC / OLEDB / JDBC interface specification requires a server or a provider to service its requests. Each database vendor as per their architecture provides these interface standards.
  • Fig 2 shows the conventional interface tier. Fig 2 (a) is a block diagram depicting the physical representation of the conventional three-tier architecture.
  • the application 200 on the client 202 connects to the Database server 204 using the ODBC Driver Manager 206. As depicted in the figure all the data information and manipulation is done by the Database Server 204 and supplied to the client 202 as per the ODBC standards (DDL, DML, DCL etc).
  • the driver may also act as a server that receives the SQL requests from the application, executes them, and sends the results back.
  • a driver that uses the data source's underlying data protocol merely acts as a conduit between an application 200 and the Data Source 208.
  • a driver that maps to a data source's native Database API acts as a conduit and as a translator.
  • the two-tier drivers include drivers for Microsoft SQL Server (which directly manipulate the data protocol) and drivers for Oracle (which map to Oracle's native Database API, OCI).
  • Fig 2 (b) is a block diagram depicting the logical representation of the conventional three- tier architecture.
  • the three-tier Gateway driver connects to a DBMS Server 204 that acts as a gateway to another server that hosts the target data source, instead of directly connecting to the data source as in the case of the two-tier architecture.
  • the gateway server 210 can and generally does connect to multiple servers; each server may contain one or more data sources.
  • three-tier drivers make it easy to create any number of tiers in a system. For example, an application 200 could talk to a gateway 210 that talks to another gateway rather than to the target data source, effectively making the system a four-tier configuration. This is not common practice, however, because most performance gains are achieved by off-loading clients to a network server, which can usually be boosted in capacity to meet the needs of more clients without having to add another gateway machine to the configuration.
  • Fig 3 (a) is a block diagram depicting the physical representation of the Data Pattern Driver of the proposed invention.
  • the proposed interface works as explained in the schematic diagram below.
  • the application 300 at the client-end 302 fetches either the pure data 304 or data as per the LUT 306 using the data definition and the server supported features implemented in driver using ODBC Driver Manager 308. This block is further explained at a later stage.
  • the ODBC driver 308 emulates the server function from the definition provided in the template.
  • the figure depicts the Data Pattern Driver architecture comprising the Client 302, the server 312 and the DBMS Server 320.
  • Fig 3 (b) is a block diagram depicting the logical representation of the Data Pattern Driver of the proposed invention.
  • the application 300 residing on the client-end 302 communicates to the ODBC Driver Manager 308.
  • This ODBC Driver Manager 308 in turn communicates with the Network/Communication Interface 310 to connect to the server 312.
  • the communication happens with the Look Up Table (LUT) 312.
  • the LUT communicates with the Execution Engine 314.
  • This Execution Engine 314 in turn communicates with the Data Pattern Based Driver 316. Further this Data Pattern Based Driver 316 communicates with the Network/Communication Interface 318 that enables to connect the server 312 to the DBMS Server 320.
  • the DBMS Server 320 comprises a DBMS 322 and a Data Store 324.
  • Fig 4 is a block diagram illustrating the physical representation of the working of the preferred embodiment of the present invention.
  • the figure depicts a number of clients 400 connected in the network; the server 410 has the Data Pattern Based Driver 420 configured so as to access the Proprietary Non-ODBC data 430 as well as data from the Database on a Database Server or any data store 440.
  • Driver Preload Phase: This is the phase when the DSN parameters are configured or the DDLs that need to be mapped and translated are defined.
  • This preload phase has a GUI (graphical user interface) wherein the user can configure and select each legacy file, map its DDLs, transform the data pattern to the expected standards compliance, and provide a migration strategy without any data loss during the execution phase of the driver, as shown in the following screen shots.
  • GUI screen snaps
  • Fig. 5 illustrates the working / configuration of the invention during the Driver Preload using the configuration wizard.
  • Fig. 5 shows the screen of the configuration wizard used for the creation of a new data source.
  • the user has the option of selecting a driver 500 for which he wants to set up a data source.
  • the Combo box lists all the available drivers 510.
  • the Data Pattern Driver is selected as illustrated in the screenshot.
  • the "Finish” button 520 is to be clicked to activate the selection.
  • the "Cancel” button 530 is also provided in the event the user wishes to cancel configuring using this wizard.
  • Fig. 6 illustrates the working / configuration of the invention during Driver Preload using the configuration wizard.
  • the Fig. 6 shows the screen of the Data Pattern Driver configuration wizard used for the configuration of the DSN.
  • the user is required to enter any appropriate user-defined name for this DSN 600 along with a field for the description 610.
  • the Directory or file can be selected and opened using the "Directory" button 620. Files residing on any directory can be opened using the "File" button 630. In the event all the files in a directory are to be selected, the entire directory can be selected as depicted in the figure.
  • the "OK" button 670 is to be clicked to activate the selection.
  • the "Cancel” ' button 680 is also provided in the event the user wishes to cancel configuring using this wizard.
  • Fig. 7 illustrates the working / configuration of the invention during Driver Preload using the configuration wizard.
  • Fig. 7 shows the screen of the Data Pattern Definition configuration wizard used for the configuration of the DSN.
  • the user has to select the file type and define the structure 700.
  • the file type and the structure could be either fixed field 710, delimited 720 or multi-DDL 730.
  • the user can further define the Row Delimiter 740 from the drop-down list.
  • the user can choose from either American Standard Code for Information Interchange (ASCII) 750 or Hexadecimal (Hex) 760 or Unicode 770 as depicted in the figure.
  • ASCII American Standard Code for Information Interchange
  • Hex Hexadecimal
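A minimal sketch of how a wizard-entered row delimiter might be decoded depending on the selected notation (ASCII 750, Hex 760 or Unicode 770). The function name and the notation labels are assumptions for illustration, not the patent's API:

```python
def decode_delimiter(value: str, notation: str) -> bytes:
    """Turn a user-entered delimiter string into the raw bytes the
    driver will scan for, based on the selected notation."""
    if notation == "ascii":
        return value.encode("ascii")    # e.g. "," -> b","
    if notation == "hex":
        return bytes.fromhex(value)     # e.g. "0D0A" -> b"\r\n"
    if notation == "unicode":
        return value.encode("utf-8")    # e.g. "§" -> its UTF-8 bytes
    raise ValueError(f"unknown notation: {notation}")
```

Storing the delimiter as bytes (rather than text) matters because legacy files may use non-printable record separators.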
  • Fig. 8 illustrates the working / configuration of the invention during Driver Preload using the configuration wizard.
  • the screen appears as depicted in the figure.
  • the scale 800 can be adjusted as per the user requirements. Clicking on the "Next" button 810 takes the user to the next screen of the Data Pattern Definition.
  • the "Cancel” button 820 is also provided in the event the user wishes to cancel configuring using this wizard.
  • Fig. 9 illustrates the working / configuration of the invention during Driver Preload using the configuration wizard. In the event the user selects the transformation of the type Fixed field 900, the screen appears as shown in the figure. The Field name can be given as per the user's choice.
  • Fig. 10 illustrates the working / configuration of the invention during Driver Preload using the configuration wizard.
  • the Data Pattern Definition wizard is as depicted in the screenshot.
  • the Delimited file type 1000 specifies the delimiter info such as the Row Delimiter 1010, Text Qualifier 1015, First Row contains field names 1020, further having comma 1025, tab 1030, semicolon 1035, other delimiter 1040, and details regarding the number of columns 1045.
  • the screen appears as depicted in the following figure.
  • Clicking on the "Apply" button 1050 takes the user to the next screen of the Data Pattern Definition.
  • the "Cancel” button 1060 is also provided in the event the user wishes to cancel configuring using this wizard.
  • Fig. 11 depicts a logical representation of the binary data in the preferred embodiment of the present invention.
  • the figure depicts Raw Binary Data file 1100 wherein the Data Header 1105 could be optional and the Data stream 1110 defined within the Start of Data (SOD) 1115 and End of Data (EOD) 1120.
  • SOD Start of Data
  • EOD End of Data
  • a data stream is defined from the Start Of File (SOF) 1125 to the End Of File (EOF) 1130. Further the Data Stream can have an optional header 1105 as depicted in the figure.
  • the SOD 1115 and EOD 1120 of the file are defined, which comprise the record (1) 1135. Further, this data byte stream comprises "n" such records 1135. Each such record has a record length defined between the Start of Record (SR) 1140 and the End of Record (ER) 1145. Further, each such record comprises "n" columns 1150. Each of these columns has its own defined data type 1155.
  • the data column length 1155 is defined using parameters 1160 such as start position including the length, Data type (such as string, Date, Number, Discardable, etc.), Scale including the field width, Precision such as format for presentation logic interpretation, Mask reference value, interpretation logic, etc.
  • the data byte stream is defined between the SOF 1125 and the EOF 1130.
  • the first part is the header that can also be optional 1105.
  • a multiple DDL file comprises multiple tables separated by delimiters 1170.
  • the SOD 1115 is defined followed by the table delimiter 1170, and further followed by the table data.
  • tables can be of variable length too.
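The record and column layout of Fig. 11 can be sketched as a small fixed-width parser. The `Column` fields mirror the per-column parameters 1160 described above (start position, length, data type), but the class and function names are illustrative only:

```python
from dataclasses import dataclass

@dataclass
class Column:
    """One column definition, loosely mirroring the parameters 1160."""
    name: str
    start: int    # start position within the record
    length: int   # field width
    kind: str     # "string", "number", "discardable", ...

def parse_records(stream: bytes, record_len: int, columns: list):
    """Split the data stream (between SOD and EOD) into fixed-length
    records, then slice each record into typed columns."""
    rows = []
    usable = len(stream) - len(stream) % record_len
    for off in range(0, usable, record_len):
        rec = stream[off:off + record_len]
        row = {}
        for col in columns:
            raw = rec[col.start:col.start + col.length].decode("ascii").strip()
            row[col.name] = int(raw) if col.kind == "number" else raw
        rows.append(row)
    return rows

# hypothetical 8-byte records: 3-digit id followed by a 5-char name
cols = [Column("id", 0, 3, "number"), Column("name", 3, 5, "string")]
parse_records(b"001Alice002Bob  ", 8, cols)
```

Scale, precision and mask handling from Fig. 11 are omitted here; only the SOD/record/column slicing idea is shown.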
  • Fig. 12 illustrates the working / configuration of the invention during Driver Preload using the configuration wizard.
  • the transformation wizard lists the available column names 1200.
  • the user's selected columns 1210 are displayed including the details such as column name 1220, Data type 1230, Length 1240, Decimal 1250 etc.
  • Clicking on the "Apply” button 1260 takes the to the next screen of the Data Pattern Definition.
  • the "Close” button 1270 is also provided.
  • Fig. 13 illustrates the working / configuration of the invention during Driver Preload using the configuration wizard.
  • the Data Pattern Definition wizard is as depicted in the screenshot.
  • the delimiter information 1310 needs to be given such as the column delimiter, table name in column number, first row contains field names and the table list 1320.
  • Clicking on the "Back" button 1330 takes the user back to the previous Data Pattern Driver Screen.
  • Clicking on the "Next” button 1340 takes the to the next screen of the Data Pattern Definition.
  • the "Cancel” button 1350 is also provided in the event the user wishes to cancel configuring using this wizard.
  • Fig 14 (a and b) is a flow diagram illustrating the Preload phase / DSN configuration of the Data Pattern Driver using the preferred embodiment of the present invention.
  • the isolation of these DDL marker(s) 1420 is carried out.
  • the isolation of the row/record delimiter(s) 1430, which is usually CRLF for text or any other printable or non-printable character, is carried out.
  • the system proceeds to isolate the row/record delimiter(s) 1430.
  • the record length is derived based on the occurrence of delimiter(s).
  • the byte stream pattern is analyzed within the record length to isolate the columns 1440. Further after the analysis of byte stream pattern within record length to isolate the columns 1440, the column count is deduced as the counter for DDL definition 1445.
  • the system deduces the scale and precision based on the length of the associated like-pattern data stream 1455. Further, the system proceeds to isolate the presentation mask, if any 1460.
  • the driver initialization as per the DSN loads the DSN repository as a LUT (lookup table) that is referred upon by any ODBC command that requires data based on the LUT. For example, generally whenever any object information is required or details related to its definition are required the SQL command that query information from the respective server, gets this data from the DSN repository.
  • LUT lookup table
  • This table below shows the source of the ODBC commands.
  • Data source is available as direct access (i.e. disk access) or through a disk / network agent, i.e. as network activity if the data source or path is not local.
  • command given by the application may demand data definition or data itself. This request gets redirected to the source of requested server or is executed by the driver, mapping the lookup table with the execution script, if any as defined by the user.
  • the nature of the object in the command may demand redirection to a different data source, i.e. the object table defined in the source definition maps to a data entity residing outside the data source. For example, linked lists or database-linked data sources. This may demand interpreting two or more data sources and mapping their data type definitions to interpret data streams, i.e. their SQL data types must match (including scale and precision, though the names of these data types vary as per vendor implementation).
  • the DBMS or the non-ODBC compatible byte streams generally do not have a 'VARCHAR' implementation & most of them have a fixed-width character byte stream. Hence a character or text data type may or can be mapped to varchar.
  • the source-to-target translation may not have equivalent support in the target data source, hence data may be lost or may be misinterpreted. This has to be taken care of by the developer or the user by mapping these data definitions and their interpretational entities.
  • the driver supports scripting which can cleanse, interpret or transform source data presentation logic to the target data's expected pattern. This behavior may also help in interpreting the values of data types with limited value range options, e.g. a Boolean data type can be interpreted as True or False, Yes or No, or 1 or 0.
  • in data source 1 (DS 1) true may be interpreted as -1 & false as 0, whereas in data source 2 (DS 2) the interpretation of true may be 1 and false 0.
  • the source byte stream may be delimited, fixed size or may map to more than one pattern of DDL. Hence if the pattern of data saved in the source can be deterministic then a script-logic can be deduced to translate the archived pattern and map these byte streams to any ODBC compliant data type. This basic principle is used to convert and use streams of any heterogeneous data sources and translate to a widely accepted defined standard called as ODBC.
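The boolean-interpretation example above (DS 1 storing true as -1, DS 2 as 1) could be scripted roughly as follows. The map contents come from the text; the function shape and names are assumptions:

```python
# per-source interpretation maps, as in the DS 1 / DS 2 example above
BOOL_MAPS = {
    "DS1": {"-1": True, "0": False},   # DS 1: true persisted as -1
    "DS2": {"1": True, "0": False},    # DS 2: true persisted as 1
}

def to_odbc_bool(raw: str, source: str) -> bool:
    """Translate a source-specific boolean encoding into a single
    ODBC-compliant boolean value, per the source's interpretation map."""
    try:
        return BOOL_MAPS[source][raw.strip()]
    except KeyError:
        raise ValueError(f"{raw!r} is not a known boolean in {source}")
```

The same table-driven shape extends to any limited-value-range type (Yes/No flags, status codes), which is what the driver's cleansing scripts are described as doing.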
  • Fig. 15 is a block diagram depicting the architecture of the preferred embodiment of the present invention.
  • This block diagram comprises the Network Interface 1500, SQL Command Translation Engine 1510, SQL Command Execution Engine 1520, Disk Manager 1530, Data Pattern Definition Translator 1540, Resource Manager 1550, Transaction Manager 1560, Error Handler 1570, ODBC Driver Manager 1570, User Application 1580, and DSN Repository or LUT 1590.
  • the basic communication channel acts as a conduit for data and command transfer between the driver and the application irrespective of the underlying physical hardware such as NIC (network interface card) or DUN (dial-up network) and the bound protocol stack.
  • Most of the drivers have a configuration setting to tune the transport layer such as defining port number or channel name (such as named pipe) as a part of DSN configuration or associated server supported default communication protocol (native interface).
  • the SQL command translation engine 1510 is the primary decision maker that analyzes command input and triggers the execution engine 1520 for delivering the requisite result, based on the nature of the request and on the definition or data availability needed to execute the command.
  • the command execution engine is the heart of the ODBC driver and substitutes the need of a Database server to work upon data to deliver query results.
  • the SQL command is parsed and executed by the execution engine based on the nature of the command. For example, if the command needs data definitions or executes any catalog functions, then the SQL command execution engine uses the DSN repository as the primary source of information and returns results, rather than forwarding the query to a Database server and waiting for a data object definition result. Many commands, such as direct DML statements, may require this engine to physically read the file as per the DSN repository definition and execute the query on the file data as per the conditions in the query.
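The routing decision described above, catalog requests answered from the DSN repository versus DML executed against the file itself, might be sketched like this. The keyword test and all names are simplifications, not the driver's real dispatch logic:

```python
def execute(command: str, dsn_repository: dict, read_file):
    """Route a command the way the execution engine is described:
    catalog/definition requests are answered locally from the DSN
    repository (the LUT), while DML falls through to the file reader."""
    verb = command.strip().split()[0].upper()
    if verb in ("SQLTABLES", "SQLCOLUMNS") or "INFORMATION_SCHEMA" in command.upper():
        return dsn_repository              # no server round-trip needed
    if verb in ("SELECT", "INSERT", "UPDATE", "DELETE"):
        return read_file(command)          # executed against the legacy file
    raise NotImplementedError(verb)

# e.g. a catalog request vs. a DML statement:
execute("SQLTABLES", {"orders": {}}, lambda cmd: None)
execute("SELECT * FROM orders", {}, lambda cmd: "rows-from-file")
```

The point is architectural: metadata never leaves the driver, which is how the driver "substitutes the need of a Database server".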
  • This module of the ODBC driver serves to perform and manipulate all file operations irrespective of the nature of request. All “INSERT” (append), “UPDATE” / “DELETE” (write) or “SELECT” (read) operations which require disk drive manipulations tiered on the legacy file system are part of the Disk Manager Module. In conjunction with the Resource Manager and Data Pattern Definition Translator, the Disk Manager is the prime data sourcing or sinking hub for the ODBC driver. All the basic file operations on legacy file data files like open / close / read / write / seek based on their data pattern definitions is managed by the disk agent.
  • Data Pattern Definition Translator 1540: This module primarily maps necessary or unnecessary data as per the DSN definition. Many legacy file system or resource constraints forced programmers to embed Multiple DDLs (more than one file's data with different definitions) in a single file stream, for lack of OS features to support a large number of simultaneous file operations. Also, the expense incurred for such disk resources forced users to derive results from partial or incomplete definitions (the primary problem of Y2K). Hence much of the archived data was later interpreted or translated into meaningful entities. Also, many applications were sensitive to the format of persistence, since the presentation logic supports a data mask that forced users to adhere to various data capturing conventions; for example, a date can be stored in either of the formats dd/mm/yyyy or mm/dd/yy etc. The data pattern definition translator co-relates the presentation layer to the persisted layer to avoid anomalies in data interpretation.
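The date-mask translation mentioned above (e.g. a persisted dd/mm/yyyy value) can be illustrated with strftime-style masks. The mask notation is an implementation choice for this sketch, not the patent's notation:

```python
from datetime import datetime

def translate_date(raw: str, source_mask: str, target_mask: str) -> str:
    """Re-interpret a persisted date string through its presentation
    mask, as the data pattern definition translator must do to avoid
    anomalies like reading 03/04 as April 3rd vs. March 4th."""
    return datetime.strptime(raw, source_mask).strftime(target_mask)

# a legacy dd/mm/yyyy value rendered as ISO 8601 for an ODBC consumer:
translate_date("31/12/1999", "%d/%m/%Y", "%Y-%m-%d")
```

Without the source mask in the DSN definition, the same byte stream is ambiguous; with it, translation is mechanical.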
  • This module allocates or deallocates i.e. manages and schedules resource used by each command; prior to execution, during execution and upon successful completion of command execution.
  • the resources may comprise memory, disk (swap) or network. It serves to manage and manipulate hardware resources and co-ordinate buffering, caching or swapping as needed by the driver during various states or stages of command execution.
  • Apart from being a driver state-notifier notifying successful or unsuccessful command results, the error handler is also responsible for freeing resources during any exception or abnormal termination of the driver or a command.
  • This ODBC Driver Manager is part of the operating system interface which dictates and abides by the ODBC standards and is a basic data exchange hub provided by the MS OS itself.
  • OS vendors like UNIX and Netware have now adopted the standard, it being widely accepted by developers.
  • the User application is any GUI based front end data capturing module developed by programmers using any RAD (rapid application development tool) like Visual Basic / Power Builder / Visual C++ etc.
  • RAD rapid application development tool
  • the applications use standard DB connectivity interfaces like ODBC / JDBC or OLEDB to establish a communication mechanism for data exchange.
  • Fig 16 is a flow diagram illustrating the actual driver execution flow process in which execution of SQL/ODBC commands is carried out when any application uses this driver, using the preferred embodiment invention. As depicted in the flow diagram, the system waits for the user command 1600. After the command from user application is received 1605, the command is analyzed 1610. After the command analysis 1610, the system proceeds to isolate command as per the state of the driver 1615.
  • the system proceeds to check whether the command requires a resource 1620. In the event the command doesn't require a resource 1620, the system further checks whether the command wants to release a resource 1625. In the event the command wants to release a resource 1625, the resources are freed 1630 accordingly. In the event the command doesn't release a resource 1625, normal execution proceeds 1635. After the resources are freed 1630, normal execution again proceeds 1635.
  • the allocation of the resources is carried out 1640. Further, after either normal execution 1635 or resource allocation 1640, the system checks for initialization 1645. In the event initialization is required 1645, the system proceeds to read the DSN repository and load data pattern definitions, constraints and driver DSN metadata 1650. In the event the driver is not in initialization 1645, the system proceeds to check for execution 1655. In the event the driver is in the execution phase 1655, the metadata are referred to as and when required 1660. Further, the system proceeds to initiate any read or write activity, based on metadata definitions, on the data directly 1665.
  • After the initiation of any read or write activity based on metadata definitions on the data directly 1665, the Driver proceeds to execute the command with arguments as base information to create resultant data 1670. Further, after executing the command with arguments as base information to create resultant data 1670, the driver proceeds to the notification of the status, such as success or error, to the calling application 1675.
• In the initialization phase 1645, the driver reads the DSN repository and loads the data pattern definitions, constraints and driver DSN metadata 1650. After reading the DSN repository 1650, the driver proceeds to create any lookup information required by the driver for DDL or DML command executions 1680. After the creation of this lookup information 1680, the driver proceeds to notify the status, such as success or error, to the calling application 1675.
• Finally, the system proceeds to free resources and restore borrowed resources, per command or for the entire driver instance 1685. After freeing and restoring resources 1685, the system returns control back to the parent process 1690 and then proceeds to notify the status, such as success or error, to the calling application 1675.
• The driver then again waits for a user command 1600.
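The Fig 16 execution flow described above can be sketched, purely as an illustrative assumption, roughly as follows. The class, method and dictionary key names below are hypothetical and are not the actual driver implementation; the reference numerals in the comments map back to the flow diagram.

```python
# Hypothetical sketch of the Fig 16 command-execution flow: wait for a
# command, handle resource acquisition/release, lazily initialize from the
# DSN repository, execute, and notify the caller of success or error.
class DataPatternDriver:
    def __init__(self):
        self.resources = set()
        self.initialized = False
        self.metadata = None

    def execute(self, command):
        """Process one user command (1600-1675) and return a status."""
        if command.get("needs_resource"):                 # 1620
            self.resources.add(command["resource"])       # 1640: allocate
        elif command.get("release_resource"):             # 1625
            self.resources.discard(command["resource"])   # 1630: free

        if not self.initialized:                          # 1645
            # 1650: read DSN repository, load pattern definitions/metadata
            self.metadata = {"dsn": "loaded", "patterns": []}
            self.initialized = True

        # 1655-1670: execution phase, consult metadata and act on the data
        result = {"command": command["sql"],
                  "metadata_used": self.metadata is not None}
        return {"status": "success", "result": result}    # 1675: notify

driver = DataPatternDriver()
status = driver.execute({"sql": "SELECT * FROM file1",
                         "needs_resource": True, "resource": "file1"})
```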
  • Fig. 17 depicts the application of the driver in the preferred embodiment of the present invention.
• The proposed Data Pattern Driver 1705 can be linked by any application programming interface 1700 to get access to more than one source of data, homogeneous or heterogeneous.
• The schematic depicts how the suggested driver interface 1700 and its command execution can execute any join-condition query 1710 across heterogeneous sources.
• This figure illustrates how the proposed driver 1705 can be extended to use other sources of data in conjunction with RDBMS data.
• Many applications require mapping data from various sources and preparing reports or analyses without disturbing the source data, such as an RDBMS in a standards-compliant database 1715 and/or any non-ODBC legacy data 1720 in the form of a pure file system comprising files 1725, as depicted in the diagram.
• Since legacy applications were not functionally rich and newer business environments demanded more analysis, a lot of historical data was used to derive or create data marts or warehouses. This is done without disturbing the legacy application in a live environment, but the data generated by live transactions needs to be interfaced for generating reports.
• The diagram shows a simple query with a join condition 1710 being executed across various sources of data, including a legacy proprietary source without any compliance to ODBC standards 1720, with its file 1 being interpreted as a table 1730. Yet with the proposed data pattern driver the query gets successfully executed, since the Data Pattern Driver 1705 itself delivers the basic server functionalities expected from a database server.
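The join of Fig 17 can be illustrated with a minimal sketch. The table contents, column names and the nested-loop join strategy are assumptions chosen only to show how the driver itself could supply the join across an RDBMS table and a legacy file interpreted as a table:

```python
# Illustrative sketch only: a join-condition query (1710) across an
# RDBMS-backed table and a legacy file "interpreted as table" (1730).
# Both sources are reduced to lists of dicts once the driver has parsed them.
def equi_join(rows_a, rows_b, key):
    """Nested-loop equi-join on a shared key column."""
    return [{**a, **b} for a in rows_a for b in rows_b if a[key] == b[key]]

# Rows as they might arrive from a standards-compliant database (1715) ...
rdbms_rows = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
# ... and from a non-ODBC legacy file parsed by the data pattern driver (1720)
legacy_rows = [{"id": 1, "amount": 100}, {"id": 3, "amount": 50}]

joined = equi_join(rdbms_rows, legacy_rows, "id")
# only id=1 appears in both sources, so one joined row results
```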

Abstract

The present invention relates generally to the field of computer database systems and access to such databases. In particular, the present invention relates to ODBC (Open Database Connectivity) / OLEDB (Object Linking and Embedding) / JDBC (Java Database Connectivity) compliant database clients with respect to database connectivity. More particularly, the present invention relates to extending these popularly accepted data access interfaces to any stream of data pattern, irrespective of its source of generation, from any legacy system or any system where the data stream can be associated with a harmonic, defined pattern of data dictated by these standards.

Description

TITLE OF INVENTION
DATA PATTERN BASED ODBC/ OLEDB/ JDBC COMPLIANT DRIVER
BACKGROUND OF THE INVENTION
Databases are an important tool for the storage and management of information for businesses. Both relational and non-relational databases exist for this purpose. Examples of relational databases include ORACLE, DB2 and INFORMIX. Examples of non-relational databases include custom databases such as Clipper, FoxPro etc. Database Management Systems (DBMS) provide users the capabilities of controlling read/write access, specifying report generation, and analyzing usage.
Some businesses, particularly large organizations, maintain multiple database types, both relational and non-relational. The simultaneous access to these databases then becomes an issue.
Fig 1 is a block diagram depicting the system using the legacy systems, depicting (a) the use of the shared path on the server and (b) the use of a network interface that does not require sharing, along with the various components and protocols involved.
FIG. 1 (a) illustrates a network of client computer systems running on various platforms, running applications and sharing the path on the server, to access and share the common data from the clients 100, 105, 110, 115 and 120. Earlier, a lot of legacy applications were programmed with popular searching or sorting techniques but with need-based data definition patterns. These were typical in file-based database applications; typical examples of such systems are two-tier legacy architectures where the data definition resides in the presentation layer, i.e. on the clients rather than on the server, as in "C", "Pascal" or "Cobol" applications. There are also many real-time applications, such as attendance-recording machines and medical equipment, where the rate of information generation and the nature of the data or the application do not demand the need or overhead of a database engine to persist such data. Yet the generated data needs to be interfaced with a central server or an application needing the archived data. Such data files residing on the clients comprise a header 125 and the data part 130. The header 125 is optional and may include the Data Definition Language (DDL), and the data includes the Data Manipulation Language (DML) 140.
Often, to improve performance, languages like C / Pascal were used to create database applications so as to match real-time data usage requirements, as applications written in these languages run faster and hence performance improves. These languages were not designed for a structured data approach, though searching or sorting mechanisms could be achieved fastest. Whenever persistence of the data was needed, programmers relied heavily on the native Operating System (OS) file system to create data files. These legacy applications worked fine, but the problem was that the data definition was programmed as part of the presentation layer (i.e. the graphical user interface layer) rather than being defined along with the data. Typical C, Pascal or Fortran applications followed these approaches, as many of these application designs were restricted by the limitations of the development tool or resource. These languages were not designed to handle databases, but the developers' creativity led to applications, and end users suffered, as the solution was more important than the tool used for developing it.
Slowly the trend shifted: programmers and tool vendors pushed data definition along with data archival so that external applications could reuse data. FIG. 1 (b) illustrates a network of client computer systems running on various platforms 150, 155, 160, 165 and 170, a database server 180, and using the network interface from the Network Service 190 that does not require sharing. In the later generation of the evolution of such systems, the DDL got pushed from the clients to the Database Server 180, such as in FoxPro, Dbase and Clipper. These file-based databases could not interface or interact with other applications, as the information needed to manipulate data and data information rested within the application code. The problem with each of these databases was that each had data definitions that were non-standard and proprietary, as defined by each application vendor. So, in net effect, the Data Definition Language and the Data Manipulation Language were both at the Database Server's end on the data store 195 and in the proprietary file format, so that any client could access the data such that the network interface did not require any sharing. One conventional approach for achieving this goal is to introduce an interface module like ODBC / OLEDB / JDBC between the clients and the back-end database systems. This interface module, also called database middleware or data access middleware, attempts to provide clients transparent access to the back-end database systems. Generally speaking, the interface module receives data definition and manipulation instructions from clients. The interface module translates these instructions such that they are understandable to the appropriate back-end database systems, and then transfers the translated instructions to the appropriate back-end database systems.
Similarly, the interface module translates information and messages received from the back-end database systems such that they are understandable to the appropriate clients, and then transfers the translated information and messages to the appropriate clients. The main function of one such data access middleware, the Open Database Connectivity (ODBC) driver, was to standardize such DDL. The naming convention of the DDL varied as per the vendor implementation and the database architecture. Such an access mechanism provided an interface path or mechanism to exchange data as per the DDLs.
ODBC is based on and closely aligned with The Open Group standard Structured Query Language (SQL) Call-Level Interface. It allows programs to use SQL requests that will access databases without having to know the proprietary interfaces to the databases. ODBC handles the SQL request and converts it into a request the individual database system understands.
ODBC was created by the SQL Access Group and first released in September 1992. Although Microsoft Windows was the first to provide an ODBC product, versions now exist for UNIX, OS/2, and Macintosh platforms as well.
In the newer distributed object architecture called Common Object Request Broker Architecture (CORBA), the Persistent Object Service (POS) is a superset of both the Call-Level Interface and ODBC. When writing programs in the Java language and using the Java Database Connectivity (JDBC) application program interface, you can use a product that includes a JDBC-ODBC "bridge" program to reach ODBC-accessible databases.
Hence during legacy data migration or bridging various applications during automation centralization phase, a tool was needed which could extract definition from the application based on persistent data created by the application. Further this tool should provide an interface for these persistent data in any ODBC/OLEDB/JDBC compliant format so that contemporary applications or other tools could interface without disturbing the legacy application. This also facilitated smoother technology transition and allowed building of newer applications based on legacy data.
Most of the time these legacy applications were cheaper to maintain, and technology upgrades could not justify the Return On Investment and Total Cost of Ownership demanded by newer technology when a satisfactorily working application existed. There was no need to change a time-tested system; just a central data view or process view was needed to facilitate the data exchange.
Accordingly, there exists a need for a system and method which creates a data bridge or interface that could exchange data and its definition between a legacy application and current technology. There also exists a need for a system and method to provide a common interface to all of the database systems in a manner that the client is under the illusion that they are interacting with a single database. Further, it is desirable that such an invention provides a bridge or interface not just for ODBC but also for JDBC/OLEDB etc. It is also desirable that such an invention provides a comprehensive solution to address the issues that arise from the traditional mechanism. It is also desirable that such an invention provides a data bridge or interface across all network technologies / topologies. Further, what was needed is an improved system and method that solves these issues without changing the legacy application while simultaneously allowing a data exchange path so that business process automation can be centralized.
SUMMARY OF THE INVENTION
The current invention solves these issues without changing the legacy application and simultaneously allowing a data exchange path so that business process automation can be centralized.
A data pattern is a stream of bytes that is repetitive. Any persistent harmonic data generated by applications other than databases (including non-standards-compliant databases) can be converted to a standards-compliant database with the pattern-based driver interface proposed in this application, which maps these harmonic data patterns to the classified and accepted standards of ODBC / OLEDB or JDBC.
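As a purely illustrative sketch of what mapping such a repetitive byte stream to rows and typed columns could look like, consider parsing a fixed-width record layout. The layout below (4-byte id, 10-byte name, 6-byte amount) is an assumption for illustration, not a format defined by the invention:

```python
# Hypothetical fixed-width "data pattern": each 20-byte record repeats the
# same column layout, so the stream can be mapped to typed rows.
RECORD = [("id", 0, 4, int), ("name", 4, 14, str), ("amount", 14, 20, float)]

def parse_stream(stream, reclen=20):
    """Split the byte stream into records and map each to typed columns."""
    rows = []
    for off in range(0, len(stream), reclen):
        rec = stream[off:off + reclen].decode("ascii")
        rows.append({name: typ(rec[a:b].strip())
                     for name, a, b, typ in RECORD})
    return rows

# Two sample records built to the assumed 20-byte layout
data = (b"0001" + b"Alice".ljust(10) + b"000100"
        + b"0002" + b"Bob".ljust(10) + b"000250")
rows = parse_stream(data)
```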
The proposed invention also behaves and performs basic operations of a database server without the need of an actual server, i.e. the driver engine itself performs and manipulates data like a database server.
The entire design is based on state machines and modules comprising various events communicating via messages; that is, it is event driven. Using the Finite State Machine (FSM) concept, the functionality is broken down into a series of events scheduled by a kernel.
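One possible minimal reading of this event-driven, kernel-scheduled design is sketched below. The Kernel class, event names and handler wiring are illustrative assumptions, not the actual module decomposition:

```python
# Hypothetical sketch: functionality broken into events, scheduled by a
# small kernel loop that dispatches messages between modules (FSM style).
from collections import deque

class Kernel:
    def __init__(self):
        self.queue = deque()     # pending events (messages)
        self.handlers = {}       # event name -> handler

    def on(self, event, handler):
        self.handlers[event] = handler

    def post(self, event, payload=None):
        self.queue.append((event, payload))

    def run(self):
        # Dispatch events in order; handlers may post further events.
        while self.queue:
            event, payload = self.queue.popleft()
            self.handlers[event](payload)

kernel = Kernel()
log = []
kernel.on("INIT", lambda p: (log.append("init"),
                             kernel.post("EXEC", "SELECT 1")))
kernel.on("EXEC", lambda p: log.append(f"exec:{p}"))
kernel.post("INIT")
kernel.run()
```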
BRIEF DESCRIPTION OF THE DRAWINGS
The various objects and advantages of the present invention will become apparent to those of ordinary skill in the relevant art after reviewing the following detailed description and accompanying drawings, wherein:
Fig 1 is a block diagram depicting the system using the legacy systems depicting the (a) use of the shared path on the server and (b) the use of network interface that does not require sharing, along with various components and protocols involved.
Fig 2 (a) is a block diagram depicting the physical representation of the conventional three-tier architecture.
Fig 2 (b) is a block diagram depicting the logical representation of the conventional three-tier architecture.
Fig 3 (a) is a block diagram depicting the physical representation of the Data Pattern Driver of the proposed invention.
Fig 3 (b) is a block diagram depicting the logical representation of the Data Pattern Driver of the proposed invention.
Fig 4 is a block diagram illustrating the physical representation for the working of the preferred embodiment of the present invention.
Fig. 5 illustrates the working / configuration of the invention during Driver Preload using the configuration wizard.
Fig. 6 illustrates the working / configuration of the invention during Driver Preload using the configuration wizard.
Fig. 7 illustrates the working / configuration of the invention during Driver Preload using the configuration wizard.
Fig. 8 illustrates the working / configuration of the invention during Driver Preload using the configuration wizard.
Fig. 9 illustrates the working / configuration of the invention during Driver Preload using the configuration wizard.
Fig. 10 illustrates the working / configuration of the invention during Driver Preload using the configuration wizard.
Fig. 11 depicts a logical representation of the binary data in the preferred embodiment of the present invention.
Fig. 12 illustrates the working / configuration of the invention during Driver Preload using the configuration wizard.
Fig. 13 illustrates the working / configuration of the invention during Driver Preload using the configuration wizard.
Fig 14 (a and b) is a flow diagram illustrating the Preload phase / DSN configuration of the Data Pattern Driver using the preferred embodiment of the present invention.
Fig. 15 is a block diagram depicting the architecture of the preferred embodiment of the present invention.
Fig 16 is a flow diagram illustrating the actual driver execution flow process in which execution of the SQL/ODBC commands is carried out when any application uses this driver, in the preferred embodiment of the present invention.
Fig. 17 depicts the application of the driver in the preferred embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
While the present invention is susceptible to embodiment in various forms, there is shown in the drawings and will hereinafter be described a presently preferred embodiment with the understanding that the present disclosure is to be considered an exemplification of the invention and is not intended to limit the invention to the specific embodiment illustrated.
In the present disclosure, the words "a" or "an" are to be taken to include both the singular and the plural. Conversely, any reference to plural items shall, where appropriate, include the singular.
Referring now to the drawings, Fig 1 is a block diagram depicting the system using the legacy systems, depicting (a) the use of the shared path on the server and (b) the use of a network interface that does not require sharing, along with the various components and protocols involved.
In the legacy two-tier architecture, as shown in Fig. 1 (a), the clients had to access and share the common data while the data was archived in such database file systems. The data definitions resided in the presentation layer, i.e. on the clients rather than on the server, as depicted in the figure. This data was accessed and shared using a shared path on the server.
Later, in the next generation of evolution, these DDLs got pushed from the clients onto a server; however, each such database server had non-standard and proprietary data definitions as defined by each application vendor, as depicted in Fig. 1 (b), using the network interface which does not require sharing.
Open Database Connectivity (ODBC) is an intermediate standard to transfer data between the front-end application and the heterogeneous database. It resolves the Database Information and fetches the result.
The current ODBC / OLEDB / JDBC interface specification requires a server or a provider to service its requests. Each database vendor provides these interface standards as per their architecture. Fig 2 shows the conventional interface tier. Fig 2 (a) is a block diagram depicting the physical representation of the conventional three-tier architecture. The application 200 on the client 202 connects to the Database server 204 using the ODBC Driver Manager 206. As depicted in the figure, all the data information and manipulation is done by the Database Server 204 and supplied to the client 202 as per the ODBC standards (DDL, DML, DCL etc.).
The driver may also act as a server that receives the SQL requests, executes them, and sends the results back. A driver that uses the data source's underlying data protocol merely acts as a conduit between an application 200 and the Data Source 208. A driver that maps to a data source's native database API acts as a conduit and as a translator. For example, two-tier drivers include drivers for Microsoft SQL Server (which directly manipulate the data protocol) and drivers for Oracle (which map to Oracle's native database API, OCI).
Fig 2 (b) is a block diagram depicting the logical representation of the conventional three-tier architecture.
The three-tier gateway driver connects to a DBMS Server 204 that acts as a gateway to another server that hosts the target data source, instead of directly connecting to the data source as in the case of the two-tier architecture. As shown in the figure, the gateway server 210 can, and generally does, connect to multiple servers; each server may contain one or more data sources.
In client/server systems using ODBC-based applications to access data in multiple data sources, this three-tier architecture moves a lot of complexity from the application on the client to the DBMS on the server. This can be a tremendous help in simplifying the installation, configuration, and administration of drivers: all clients use a single driver to connect to the gateway 210, and the gateway 210 routes to the appropriate data source-specific driver on the appropriate server. At the client end, the Application 200 communicates through the ODBC Driver Manager 212 with the Gateway Driver 214 and the network/communication interface 216 to connect to the Gateway Server 210. At the Gateway Server end 210, the connection is handled by the Server (Gateway) layer 218 and passed on to the ODBC Driver Manager layer 220. This ODBC Driver Manager 220 in turn communicates with the DBMS Driver 222, which in turn uses the Network/Communication Interface layer 224 to connect to the DBMS Server 226 and the Data Store 228.
The nature of three-tier drivers makes it easy to create any number of tiers in a system. For example, an application 200 could talk to a gateway 210 that talks to another gateway rather than to the target data source, effectively making the system a four-tier configuration. This is not common practice, however, because most performance gains are achieved by off-loading clients to a network server, which can usually be boosted in capacity to meet the needs of more clients without having to add another gateway machine to the configuration.
Fig 3 (a) is a block diagram depicting the physical representation of the Data Pattern Driver of the proposed invention. The proposed interface works as explained in the schematic diagram below.
The application 300 at the client end 302 fetches either the pure data 304 or data as per the LUT 306, using the data definition and the server-supported features implemented in the driver, via the ODBC Driver Manager 308. This block is further explained at a later stage.
The ODBC driver 308 emulates the server function from the definition provided in the template.
The figure depicts the Data Pattern Driver architecture comprising the Client 302, the Server 312 and the DBMS Server 320.
Fig 3 (b) is a block diagram depicting the logical representation of the Data Pattern Driver of the proposed invention. The application 300 residing on the client end 302 communicates with the ODBC Driver Manager 308. This ODBC Driver Manager 308 in turn communicates with the Network/Communication Interface 310 to connect to the server 312.
At the server 312, the communication happens with the Look Up Table (LUT) 312. The LUT communicates with the Execution Engine 314. This Execution Engine 314 in turn communicates with the Data Pattern Based Driver 316. Further, this Data Pattern Based Driver 316 communicates with the Network/Communication Interface 318 that connects the server 312 to the DBMS Server 320.
The DBMS Server 320 comprises a DBMS 322 and a Data Store 324.
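The LUT-to-execution-engine chain described above might be sketched, for illustration only, as follows. The LUT layout, the table name, delimiter and column names are hypothetical assumptions:

```python
# Hypothetical sketch of the Fig 3(b) pipeline: the Look Up Table (LUT)
# resolves a table name to its pattern definition, and the execution engine
# uses that definition to read rows directly from the raw data.
LUT = {
    "file1": {"delimiter": ",", "columns": ["id", "name"]},
}

def execute(query_table, raw_bytes):
    """Resolve the table via the LUT, then parse the raw bytes into rows."""
    meta = LUT[query_table]                      # LUT resolution
    return [dict(zip(meta["columns"], line.split(meta["delimiter"])))
            for line in raw_bytes.decode().splitlines()]

rows = execute("file1", b"1,Alice\n2,Bob")
```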
Fig 4 is a block diagram illustrating the physical representation for the working of the preferred embodiment of the present invention. The figure depicts a number of clients 400 connected in the network; the server 410 has the Data Pattern Based Driver 420 configured so as to enable access to the proprietary non-ODBC data 430 as well as data from the database on a database server or any data store 440.
Basically, the ODBC driver functions are categorized into three entities, and the entire functioning of our proposed invention is explained in four stages: 1. Driver Preload, 2. Initialization, 3. Execution and 4. Termination.
I. Driver Preload Phase: This is the phase when the DSN parameters are configured or the DDLs that need to be mapped and translated are defined. This preload phase has a GUI (graphical user interface) wherein the user can configure and select each legacy file, map its DDLs, transform the data pattern to the expected standards compliance, and provide a migration strategy without any data loss during the execution phase of the driver, as shown in the following screen shots.
I. Driver Preload {Preparation of environment before any ODBC command execution}
Either a DSN-less or DSN-based connection is initiated. For every DSN created, the metadata configuration (CFG or DSN repository) is saved. Such a file includes the source and the nature of the data. This CFG dictates the functional behavior of the driver and its dependency on a server or a data stream.
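A CFG / DSN repository entry of this kind could, purely for illustration, be persisted as shown below. The key names, file name and JSON encoding are assumptions; the actual archival format is the LUT format referenced later in this description:

```python
# Hypothetical DSN repository entry saved at preload: it records the source
# and the nature of the data, which later dictate the driver's behavior.
import json, os, tempfile

cfg = {
    "dsn": "LegacyPayroll",
    "source": "/data/payroll.dat",      # illustrative path
    "nature": "fixed-field",            # fixed field / delimited / multi DDL
    "row_delimiter": "\r\n",
    "columns": [
        {"name": "emp_id", "type": "NUMBER", "start": 0, "length": 6},
    ],
}

# Persist the CFG, then reload it the way the driver would at initialization.
path = os.path.join(tempfile.mkdtemp(), "LegacyPayroll.cfg")
with open(path, "w") as f:
    json.dump(cfg, f)
with open(path) as f:
    loaded = json.load(f)
```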
The screen snaps (GUI) to accomplish this task are shown in Fig 5.
Fig. 5 illustrates the working / configuration of the invention during Driver Preload using the configuration wizard. Fig. 5 shows the screen of the configuration wizard used for the creation of a new data source. The user has the option of selecting a driver 500 for which he wants to set up a data source. The combo box lists the entire available driver list 510. The Data Pattern Driver is selected as illustrated in the screenshot. After the user selects the driver, the "Finish" button 520 is clicked to activate the selection. The "Cancel" button 530 is also provided in the event the user wishes to cancel configuring using this wizard.
Fig. 6 illustrates the working / configuration of the invention during Driver Preload using the configuration wizard. Fig. 6 shows the screen of the Data Pattern Driver configuration wizard used for the configuration of the DSN. The user is required to enter an appropriate user-defined name for this DSN 600, along with a field for the description 610. A directory or file can be selected and opened using the "Directory" button 620. Files residing in any directory can be opened using the "File" button 630. In the event all the files in a directory are to be selected, the entire directory can be selected as depicted in the figure.
After the appropriate files or directory are selected, sorting them as per the file extension 640 is also possible. For these various file names 650, the path 655, definition 660 and transformation 665 can be set.
After the user selects the appropriate file(s) or directory, the "OK" button 670 is clicked to activate the selection. The "Cancel" button 680 is also provided in the event the user wishes to cancel configuring using this wizard.
Fig. 7 illustrates the working / configuration of the invention during Driver Preload using the configuration wizard. Fig. 7 shows the screen of the Data Pattern Definition configuration wizard used for the configuration of the DSN. The user has to select the file type and define the structure 700. The file type and structure can be fixed field 710, delimited 720 or multi DDL 730. The user can further define the Row Delimiter 740 from the drop-down list. The user can choose from either American Standard Code for Information Interchange (ASCII) 750, Hexadecimal (Hex) 760 or Unicode 770, as depicted in the figure.
Clicking on the "Next" button 780 takes the user to the next screen of the Data Pattern Definition. The "Cancel" button 790 is also provided in the event the user wishes to cancel configuring using this wizard.
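The three row-delimiter notations offered by the wizard (ASCII 750, Hex 760, Unicode 770) can be illustrated with a small hypothetical helper that normalizes a user-entered delimiter to raw bytes; the function name and notation labels are assumptions:

```python
# Illustrative sketch: convert a user-entered row delimiter (ASCII, hex or
# Unicode notation, as offered in Fig 7) into the raw bytes used to split
# the data stream into records.
def normalize_delimiter(value, notation):
    if notation == "ascii":
        return value.encode("ascii")
    if notation == "hex":                 # e.g. "0D0A" for CR LF
        return bytes.fromhex(value)
    if notation == "unicode":             # e.g. "\u2028" line separator
        return value.encode("utf-8")
    raise ValueError(f"unknown notation: {notation}")

# A hex-entered CR LF delimiter resolves to the same bytes as ASCII "\r\n"
assert normalize_delimiter("0D0A", "hex") == b"\r\n"
```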
Fig. 8 illustrates the working / configuration of the invention during Driver Preload using the configuration wizard. Once the user selects the ASCII view, the screen appears as depicted in the figure. The scale 800 can be adjusted as per the user requirements. Clicking on the "Next" button 810 takes the user to the next screen of the Data Pattern Definition. The "Cancel" button 820 is also provided in the event the user wishes to cancel configuring using this wizard.
Fig. 9 illustrates the working / configuration of the invention during Driver Preload using the configuration wizard. In the event the user selects the transformation of the type Fixed field 900, the screen appears as shown in the figure. The field name can be given as per the user's choice.
Clicking on the "Back" button 910 takes the user back to the previous Data Pattern Driver screen. Clicking on the "Apply" button 920 takes the user to the next screen of the Data Pattern Definition. The "Cancel" button 930 is also provided in the event the user wishes to cancel configuring using this wizard.
Fig. 10 illustrates the working / configuration of the invention during Driver Preload using the configuration wizard. The Data Pattern Definition wizard is as depicted in the screenshot. In the event the user selects the Delimited file type 1000, he specifies the delimiter info such as the Row Delimiter 1010, Text Qualifier 1015 and whether the First Row contains field names 1020, further choosing comma 1025, tab 1030, semicolon 1035 or other delimiter 1040, and details regarding the number of columns 1045. The screen appears as depicted in the figure.
Clicking on the "Apply" button 1050 takes the user to the next screen of the Data Pattern Definition. The "Cancel" button 1060 is also provided in the event the user wishes to cancel configuring using this wizard.
Fig. 11 depicts a logical representation of the binary data in the preferred embodiment of the present invention. The figure depicts a Raw Binary Data file 1100 wherein the Data Header 1105 is optional and the Data stream 1110 is defined within the Start of Data (SOD) 1115 and End of Data (EOD) 1120.
A data stream is defined from the Start Of File (SOF) 1125 to the End Of File (EOF) 1130. Further, the data stream can have an optional header 1105 as depicted in the figure. The SOD 1115 and EOD 1120 of the file are defined, which comprise record (1) 1135. Further, this data byte stream comprises "n" such records 1135. Each such record has a record length defined between the Start of Record (SR) 1140 and the End of Record (ER) 1145. Further, each such record comprises "n" columns 1150. Each of these columns has its own defined data type 1155. The data column length 1155 is defined using parameters 1160 such as start position including the length; data type such as string, date, number, discardable etc.; scale including the field width; precision such as format for presentation logic interpretation; mask reference value; interpretation logic etc.
Further, depending on the Multiple DDL 1165, the data byte stream is defined between the SOF 1125 and the EOF 1130. The first part is the header, which can also be optional 1105. A multiple DDL file comprises multiple tables separated by delimiters 1170. The SOD 1115 is defined, followed by the table delimiter 1170, and further followed by the table data. In a multiple DDL file, tables can be of variable length too.
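The Fig 11 layout (an optional header, then "n" fixed-length records of typed columns) can be sketched as follows. The 8-byte header, the record structure and the field types are assumptions chosen for illustration only:

```python
# Hypothetical sketch of the Fig 11 single-DDL layout: skip the optional
# header (1105), then unpack fixed-length records (1135) of typed columns.
import struct

HEADER_LEN = 8                       # assumed optional header size
REC = struct.Struct("<I10sf")        # record: uint32 id, 10-byte name, float

def read_records(blob):
    """Skip the header to SOD, then unpack every fixed-length record."""
    body = blob[HEADER_LEN:]
    return [
        {"id": rid, "name": name.rstrip(b"\x00").decode(), "value": val}
        for rid, name, val in REC.iter_unpack(body)
    ]

blob = b"HDR00001" + REC.pack(1, b"Alice", 9.5) + REC.pack(2, b"Bob", 3.0)
records = read_records(blob)
```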
Fig. 12 illustrates the working / configuration of the invention during Driver Preload using the configuration wizard. As depicted, the transformation wizard lists the available column names 1200. The user's selected columns 1210 are displayed, including details such as column name 1220, data type 1230, length 1240, decimal 1250 etc.
Clicking on the "Apply" button 1260 takes the user to the next screen of the Data Pattern Definition. The "Close" button 1270 is also provided.
Fig. 13 illustrates the working / configuration of the invention during Driver Preload using the configuration wizard. The Data Pattern Definition wizard is as depicted in the screenshot. In the event the user selects the file type and defines the structure as Multiple DDL 1300, the delimiter information 1310 needs to be given, such as the column delimiter, the table name in column number, whether the first row contains field names, and the table list 1320. Clicking on the "Back" button 1330 takes the user back to the previous Data Pattern Driver screen. Clicking on the "Next" button 1340 takes the user to the next screen of the Data Pattern Definition. The "Cancel" button 1350 is also provided in the event the user wishes to cancel configuring using this wizard.
Fig 14 (a and b) is a flow diagram illustrating the Preload phase / DSN configuration of the Data Pattern Driver using the preferred embodiment of the present invention.
In the event the file supports Multiple DDL 1415, the isolation of the DDL marker(s) 1420 is carried out. After the isolation of the DDL marker 1420, the isolation of the row/record delimiter(s) 1430, usually CRLF for text or any other printable or non-printable character, is carried out. In the event the data file supports a single DDL 1415, the system proceeds directly to isolate the row/record delimiter(s) 1430.
After the isolation of the row/record delimiter(s) 1430, the record length is derived from the occurrence of the delimiter(s) 1435. The byte stream pattern is then analyzed within the record length to isolate the columns 1440, and the column count is deduced as the counter for the DDL definition 1445.
Next, each sub-entity in the byte stream within the record length is associated with a matching data type whose criteria the data characters satisfy 1450. The system then deduces the scale and precision from the length of the associated like-pattern data stream 1455. The system further isolates the presentation mask, if any 1460, such as "/", ":" or "." within the byte stream, which is generally associated with data types like date, time or float. After this isolation, the transformation logic is derived as per the target data type's conformance constraints 1465. The system then checks whether the column count is zero 1470. If the column count is non-zero, the steps from the data type association 1450 onwards are repeated until the column count equals zero 1470. Once the column count is zero 1470, the data definition and all associated parameters are persisted in the DSN repository as per the LUT archival format 1475. Finally, the system reports that the DSN repository has been successfully created; this repository serves to map and provide data to the SQL commands expected from an ODBC driver by an application that otherwise relies on a database server to provide these definitions 1480.
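The preload steps above (isolate the record delimiter, split records into columns, associate a data type, deduce scale and precision) can be sketched as follows. The type-matching rules here are deliberately simplified assumptions, not the full heuristics of the driver.

```python
import re

def infer_ddl(sample: bytes, col_delim: bytes = b",", rec_delim: bytes = b"\r\n"):
    """Sketch of steps 1430-1455: split the byte stream into records by the
    record delimiter (typically CRLF for text), split records into columns,
    and associate each column with a data type, precision and scale."""
    records = [r for r in sample.split(rec_delim) if r]
    columns = list(zip(*(r.split(col_delim) for r in records)))
    ddl = []
    for col in columns:
        values = [v.decode("ascii") for v in col]
        if all(re.fullmatch(r"-?\d+", v) for v in values):
            dtype, scale = "INTEGER", 0
        elif all(re.fullmatch(r"-?\d+\.\d+", v) for v in values):
            dtype = "DECIMAL"
            scale = max(len(v.split(".")[1]) for v in values)  # digits after "."
        else:
            dtype, scale = "CHAR", 0
        precision = max(len(v) for v in values)  # widest observed like-pattern
        ddl.append((dtype, precision, scale))
    return ddl

sample = b"1,3.50,ab\r\n22,1.125,xyz\r\n"
ddl = infer_ddl(sample)
```

The resulting (type, precision, scale) triples are what the preload phase would persist into the DSN repository as the LUT entries for this file.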
The result of the preload exercise is the DSN repository, as shown below.
DSN repository schematic layout or the Look Up Table (LUT)
Initialization Phase:
The driver initialization, as per the DSN, loads the DSN repository as a LUT (lookup table) that is consulted by any ODBC command requiring data based on the LUT. For example, whenever object information or details of its definition are required, the SQL commands that would normally query this information from the respective server obtain the data from the DSN repository instead.
As summarized in the SQL function table below, the ODBC commands are classified into three major categories. When the driver executes these commands, it uses the LUT or the driver engine (see the block diagram shown later) to execute the ODBC query exactly as if a server were processing the request and responding with the desired information.
The SQL commands comprise: connection commands; ODBC data definition commands (DDL); data manipulation commands (DML); and ODBC installer commands.
This table below shows the source of the ODBC commands.
III. During Execution
The data source is available either through direct access (i.e. disk access) or through a disk / network agent, i.e. via network activity when the data source or path is not local.
The nature of the command given by the application may demand a data definition or the data itself. The request is either redirected to the requested server or executed by the driver, which maps the lookup table against the execution script, if any, as defined by the user. The nature of the object in the command may demand redirection to a different data source, i.e. the object table defined in the source definition maps to a data entity residing outside the data source, for example a linked list or database-linked data sources. This may require interpreting two or more data sources and mapping their data type definitions to interpret the data streams, i.e. their SQL data types match (including scale and precision), but the names of these data types vary per vendor implementation. For example, DBMSs or non-ODBC-compatible byte streams generally have no 'VARCHAR' implementation and most of them use a fixed-width character byte stream; hence a character or text data type may be mapped to VARCHAR.
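The vendor-name mismatch described above can be handled with a mapping keyed on the type class plus its scale and precision. The source type letters and the target names below are illustrative examples (the letters follow the dBase-style convention), not an exhaustive catalogue from the specification.

```python
# Illustrative source-to-target type map: equivalent SQL types carry
# different names per vendor, so mapping is done on type class, width
# and decimals rather than on the vendor's type name. Entries are
# example assumptions, not the driver's full catalogue.
TYPE_MAP = {
    "C": "VARCHAR",  # fixed-width character stream, e.g. dBase 'C'
    "N": "NUMERIC",  # numeric text field
    "D": "DATE",
    "L": "BIT",      # logical true/false
}

def map_type(source_type: str, width: int, decimals: int) -> str:
    """Translate a source type class into an ODBC-compliant declaration,
    carrying scale and precision across so no width is silently lost."""
    target = TYPE_MAP[source_type]
    if target in ("VARCHAR", "NUMERIC"):
        return f"{target}({width},{decimals})" if decimals else f"{target}({width})"
    return target
```

For instance, a fixed-width 30-character field maps to `VARCHAR(30)`, and a numeric field of width 10 with 2 decimals maps to `NUMERIC(10,2)`.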
The source-to-target translation may not have equivalent support in the target data source; hence data may be lost or misinterpreted. The developer or user must therefore take care to map these data definitions and their interpretational entities.
Like any data migration tool that supports ETCL, the driver supports scripting that can cleanse, interpret or transform the source data presentation logic into the pattern expected by the target. This also helps in interpreting the values of data types with a limited range of values; e.g. a Boolean data type can be interpreted as True or False, Yes or No, or 1 or 0.
But in data source 1 (DS1), true may be interpreted as -1 and false as 0, whereas in data source 2 (DS2) true may be interpreted as 1 and false as 0.
The source byte stream may be delimited, of fixed size, or may map to more than one DDL pattern. Hence, if the pattern of the data saved in the source is deterministic, a script-logic can be deduced that translates the archived pattern and maps these byte streams to any ODBC-compliant data type. This basic principle is used to convert streams from any heterogeneous data source and translate them to the widely accepted standard, ODBC.
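The Boolean example above (DS1 archives true as -1, DS2 as 1) can be expressed as a minimal per-source translation rule of the kind such a script-logic would deduce. The rule tables are illustrative assumptions.

```python
# Sketch of the cleansing/transformation scripting described above: the
# same Boolean is archived as -1/0 in one source and 1/0 in another, so
# a per-source rule translates each archived value to one target
# convention. Rule tables are illustrative, not from the specification.
SOURCE_RULES = {
    "DS1": {"-1": True, "0": False},  # data source 1: true archived as -1
    "DS2": {"1": True, "0": False},   # data source 2: true archived as 1
}

def to_target_bool(source: str, raw: str) -> bool:
    """Translate an archived Boolean byte pattern to the target convention."""
    try:
        return SOURCE_RULES[source][raw.strip()]
    except KeyError:
        raise ValueError(f"{raw!r} is not a known Boolean pattern for {source}")
```

Because the archived pattern is deterministic per source, the translation needs no guessing at execution time: the rule is fixed once during preload and applied to every record.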
IV. Unload
This phase relinquishes any resource allocated or in use and restores it to the OS. It maintains a history of any environment change details enforced during the execution phase.
Fig. 15 is a block diagram depicting the architecture of the preferred embodiment of the present invention. The block diagram comprises the Network Interface 1500, SQL Command Translation Engine 1510, SQL Command Execution Engine 1520, Disk Manager 1530, Data Pattern Definition Translator 1540, Resource Manager 1550, Transaction Manager 1560, Error Handler 1570, ODBC Driver Manager 1580, User Application 1585 and DSN Repository or LUT 1590.
Network Interface 1500:
This is the basic communication channel that acts as a conduit for data and command transfer between the driver and the application, irrespective of the underlying physical hardware such as a NIC (network interface card) or DUN (dial-up network) and the bound protocol stack. Most drivers have a configuration setting to tune the transport layer, such as defining a port number or channel name (such as a named pipe), as part of the DSN configuration or of the associated server's default communication protocol (native interface).
SQL Command Translation Engine 1510:
Generally, ODBC drivers depend on the server for most command executions, but our proposition embeds basic server features to make the ODBC driver independent of a database server. We broadly classify the ODBC commands into three categories: those required during or for initialization, command execution, and command or driver termination. The SQL command translation engine is the primary decision maker; it analyzes the command input and triggers the execution engine to deliver the requisite result based on the nature of the request and the availability of the definition or data needed to execute the command.
SQL Command Execution Engine 1520:
The command execution engine is the heart of the ODBC driver and substitutes for a database server in working upon data to deliver query results. The SQL command is parsed and executed by the execution engine according to the nature of the command. For example, if the command needs a data definition or executes any catalog function, the SQL command execution engine uses the DSN repository as the primary source of information and returns results directly, rather than forwarding the query to a database server and waiting for the data object definition result. Many commands, such as direct DML statements, may require this engine to physically read the file as per the DSN repository definition and execute the query on the file data according to the conditions in the query. Most ODBC / OLEDB drivers for non-ODBC-compliant data support only the SQL "SELECT" operation, but in the preferred embodiment of the present invention this engine also supports "INSERT" / "UPDATE" / "DELETE" and other native commands that do not require complex RDBMS functionality. Since the primary objective of this invention is to migrate legacy data and support a smooth technology transition, complex DML features are not supported, but can be added if required.
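The idea of answering a query by reading the file directly under the DSN repository's column definition can be sketched as follows. The DSN structure, the query shape (a SELECT with a WHERE predicate) and the CSV layout are simplified assumptions for illustration.

```python
import csv
import io

# Minimal sketch: a SELECT with a WHERE condition is answered by reading
# the file directly, typing each field per the DSN repository's column
# definition instead of asking a database server. The DSN structure and
# sample data below are assumptions for this sketch.
DSN = {"columns": ["id", "name", "qty"], "types": [int, str, int]}

def select(file_text: str, where):
    """Execute a filtered read over the raw file as per the DSN definition."""
    rows = []
    for raw in csv.reader(io.StringIO(file_text)):
        row = {c: t(v) for c, t, v in zip(DSN["columns"], DSN["types"], raw)}
        if where(row):                 # the query's WHERE condition
            rows.append(row)
    return rows

data = "1,bolt,40\n2,nut,5\n3,washer,120\n"
result = select(data, lambda r: r["qty"] > 30)   # i.e. WHERE qty > 30
```

The same pattern extends to INSERT (append a formatted record), UPDATE and DELETE (rewrite matching records), which is why no server round-trip is needed for these commands.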
Disk Manager 1530:
This module of the ODBC driver performs and manages all file operations irrespective of the nature of the request. All "INSERT" (append), "UPDATE" / "DELETE" (write) or "SELECT" (read) operations that require disk drive manipulation tiered on the legacy file system are part of the Disk Manager module. In conjunction with the Resource Manager and the Data Pattern Definition Translator, the Disk Manager is the prime data sourcing and sinking hub of the ODBC driver. All the basic file operations on legacy data files (open / close / read / write / seek, based on their data pattern definitions) are managed by the disk agent.
Data Pattern Definition Translator 1540:
This module primarily maps the data as per the DSN definition. Many legacy file systems or resource constraints forced programmers to embed Multiple DDL (more than one file's data with different definitions) in a single file stream, for lack of OS features supporting a large number of simultaneous file operations. The expense of such disk resources also forced users to derive results from partial or incomplete definitions (the primary problem of Y2K); hence much of the archived data was later interpreted or translated into meaningful entities. Many applications were also sensitive to the format of persistence, since the presentation logic supports a data mask that forced users to adhere to various data-capture conventions; for example, a date can be stored in formats such as dd/mm/yyyy or mm/dd/yy. The data pattern definition translator correlates the presentation layer with the persisted layer to avoid anomalies in data interpretation.
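The date example above shows why the mask must be stored alongside the data definition: the same persisted digits mean different dates under dd/mm/yyyy versus mm/dd/yy. A minimal sketch, using Python's `strptime` format codes as a stand-in mask notation (an assumption; the driver's own mask syntax is not specified here):

```python
import datetime

# Sketch of presentation-mask translation: the mask persisted with the
# DSN definition tells the driver how to interpret the archived bytes.
def parse_with_mask(raw: str, mask: str) -> datetime.date:
    return datetime.datetime.strptime(raw, mask).date()

# Identical-looking bytes, two different masks, two different dates:
d1 = parse_with_mask("03/02/1999", "%d/%m/%Y")  # dd/mm/yyyy -> 3 Feb 1999
d2 = parse_with_mask("03/02/99", "%m/%d/%y")    # mm/dd/yy   -> 2 Mar 1999
```

Without the mask the two interpretations are indistinguishable, which is exactly the anomaly the translator exists to prevent.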
Resource Manager 1550:
This module allocates and deallocates, i.e. manages and schedules, the resources used by each command: prior to execution, during execution and upon successful completion of command execution. The resources may comprise memory, disk (swap) or network. It serves to manage and manipulate hardware resources and to coordinate buffering, caching or swapping as needed by the driver during the various states or stages of command execution.
Transaction Manager 1560:
Typical legacy databases never had the concept of transactions, but once an interface hooking into contemporary standards arises, transactional management becomes a necessity. The transaction manager, along with the disk agent, manages temporary swap or rollback segment data and indexing data if required (for large volumes). The need to maintain the various states of the data buffer during all its transitional phases between the begin and commit commands is managed by this module. Primarily, it enforces the ACID (A - Atomicity, C - Consistency, I - Isolation, D - Durability) properties during any state entity change.
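The begin/commit buffering described above can be sketched as a rollback buffer layered over a store that has no native transactions. This is a deliberately minimal toy (a dict stands in for the legacy file, and only single-command atomicity is shown).

```python
# Toy sketch of the rollback-segment idea: changes between begin() and
# commit() keep a pre-image, so rollback() can restore every prior state
# over a legacy store that has no native transaction support.
class TransactionBuffer:
    def __init__(self, store: dict):
        self.store = store
        self.undo = None              # pre-image values, one per touched key

    def begin(self):
        self.undo = {}

    def write(self, key, value):
        if self.undo is not None and key not in self.undo:
            self.undo[key] = self.store.get(key)  # keep the pre-image once
        self.store[key] = value

    def commit(self):
        self.undo = None              # discard pre-images; changes persist

    def rollback(self):
        for key, old in self.undo.items():
            if old is None:
                self.store.pop(key, None)  # key did not exist before
            else:
                self.store[key] = old
        self.undo = None

store = {"a": 1}
tx = TransactionBuffer(store)
tx.begin(); tx.write("a", 2); tx.write("b", 3)
tx.rollback()                         # undoes both writes
rolled_back = dict(store)
tx.begin(); tx.write("a", 5); tx.commit()   # committed change persists
```

Keeping only the first pre-image per key is what makes the rollback restore the state as of begin(), regardless of how many times a key is rewritten inside the transaction.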
Error Handler 1570:
Apart from being a driver state notifier reporting successful or unsuccessful command results, the error handler is also responsible for freeing resources upon any exception or abnormal termination of the driver or a command.
ODBC Driver Manager 1580:
The ODBC Driver Manager is part of the operating system interface; it dictates and abides by the ODBC standards and is a basic data exchange hub provided by the Microsoft OS itself. Other OS vendors, such as those of UNIX and Netware, have now adopted the standard, it being widely accepted by developers.
User Application 1585:
The user application is any GUI-based front-end data capturing module developed by programmers using a RAD (rapid application development) tool like Visual Basic / Power Builder / Visual C++ etc. Such applications generally use standard DB connectivity interfaces like ODBC / JDBC or OLEDB to establish a communication mechanism for data exchange.
DSN Repository or LUT 1590:
The persisted data definition and all associated parameters are held in the DSN repository as per the LUT archival format. Further, the system reports on successful DSN repository creation, indicating which server to map and which data to provide to the SQL commands expected from an ODBC driver by an application that otherwise relies on a database server to provide these definitions.
Fig. 16 is a flow diagram illustrating the actual driver execution flow, in which SQL/ODBC commands are executed when any application uses this driver, as per the preferred embodiment of the present invention. As depicted in the flow diagram, the system waits for the user command 1600. After the command from the user application is received 1605, the command is analyzed 1610. After the command analysis 1610, the system isolates the command as per the state of the driver 1615 and then checks whether the command requires a resource 1620. In the event the command does not require a resource 1620, the system checks whether the command wants to release a resource 1625. If the command wants to release a resource 1625, the resources are freed 1630 accordingly and normal execution then proceeds 1635; otherwise normal execution proceeds 1635 directly.
In the event the command requires a resource 1620, the resources are allocated 1640. From either the output of normal execution 1635 or after resource allocation 1640, the system checks for initialization 1645. In the event of the initialization phase 1645, the driver reads the DSN repository and loads the data pattern definitions, constraints and driver DSN metadata 1650, then creates any lookup information required by the driver for DDL or DML command executions 1680, and proceeds to notify the status, such as success or error, to the calling application 1675. In the event the driver is not initializing 1645, the system checks for execution 1655. In the event the driver is in the execution phase 1655, the metadata are referred to as and when required 1660, and any read or write activity based on the metadata definitions is initiated directly on the data 1665. The driver then executes the command, with its arguments as the base information, to create the resultant data 1670, and notifies the status, such as success or error, to the calling application 1675.
In the event of the Unload phase 1683, the system frees resources and restores borrowed resources, per command or for the entire driver instance 1685, and then returns control to the parent process 1690. The status, such as success or error, is then notified to the calling application 1675.
After the notification of the status, such as success or error, to the calling application 1675, the driver again waits for a user command 1600.
Fig. 17 depicts an application of the driver in the preferred embodiment of the present invention. As shown in the figure, where an application uses multiple connections to access heterogeneous data servers, the proposed Data Pattern Driver 1705 can be linked through any application programming interface 1700 to gain access to more than one source of data (homogeneous or heterogeneous). The schematic depicts how the suggested driver interface 1700 and its command execution can execute a join condition query 1710 across heterogeneous sources.
The figure also illustrates how the proposed driver interface 1705 can be extended to use other sources of data in conjunction with RDBMS data. Many applications need to map data from various sources and prepare reports or analyses without disturbing the source data, be it an RDBMS in a standards-compliant database 1715 and/or any non-ODBC legacy data 1720 in the form of a pure file system comprising files 1725, as depicted. Also, since legacy applications were not functionally rich and newer business environments demanded more analysis, much historical data was used to derive or create data marts or warehouses. This is done without disturbing the legacy application in a live environment, yet the data generated by live transactions must be interfaced for generating reports.
The diagram shows a simple query with a join condition 1710 being executed across various sources of data, including a legacy proprietary source with no compliance to ODBC standards 1720, whose file 1 is interpreted as a table 1730. With the proposed data pattern driver the query nevertheless executes successfully, since the Data Pattern Driver 1705 itself delivers the basic server functionality expected from a database server.
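The join scenario can be sketched end to end: a legacy flat file is interpreted as a table per its pattern definition and joined with an RDBMS table. Here an in-memory SQLite database stands in for the standards-compliant RDBMS side; that substitution, and the sample data, are assumptions for illustration.

```python
import sqlite3

# Sketch of the Fig. 17 scenario: rows from a legacy flat file (file 1,
# interpreted as a table per its pattern definition) are joined with an
# RDBMS table. An in-memory SQLite database stands in for the RDBMS.
legacy_file = "101,mouse\n102,keyboard\n"   # non-ODBC legacy data

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (item_id INTEGER, qty INTEGER)")
con.executemany("INSERT INTO orders VALUES (?, ?)", [(101, 5), (102, 2)])

# Interpret file 1 as table 'items' so the join condition can run against it.
con.execute("CREATE TABLE items (item_id INTEGER, name TEXT)")
rows = [line.split(",") for line in legacy_file.splitlines()]
con.executemany("INSERT INTO items VALUES (?, ?)", rows)

joined = con.execute(
    "SELECT i.name, o.qty FROM items i JOIN orders o ON i.item_id = o.item_id "
    "ORDER BY i.name"
).fetchall()
```

Once the legacy bytes are exposed under a relational definition, the join condition needs no special handling: the same SQL runs over both sources.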

Claims

WHAT IS CLAIMED IS: 1. A method for accessing data by a client application from a stream of data patterns, the method comprising: preloading an environment, wherein the preloading step further comprises identifying at least a data source, analyzing a data definition, mapping the data definition and persisting the data definition in a repository; initializing a driver, wherein the initializing step further comprises loading the data definition from the repository; and executing a query received from the client application as per the repository.
2. The method of Claim 1, wherein the preloading step further comprises: connecting to the data source; identifying an object in the data source; identifying the stream of data patterns for each object of the data source; mapping the streams of data patterns of the data source into the repository, wherein the repository created helps to map the data definitions; and persisting the data definition as per the repository archival format.
3. The method of Claim 2, further comprising deriving a transformation logic as per constraints of a target data type.
4. The method of Claim 2, further comprising isolating a plurality of connection requests as per the client-specific communication standard.
5. The method of Claim 1, further comprising database client applications accessing the stream of data patterns.
6. The method of Claim 1, further comprising the client application accessing data from a non-DBMS source, the non-DBMS source comprising a text file or a binary file.
7. The method of Claim 1, wherein the initialization step further comprises: loading the repository for a query execution; identifying and analyzing the query; classifying the query; and creating the repository for the data definition.
8. The method of Claim 1 , wherein the execution step further comprises:
isolating the query; analyzing the query; parsing the query; validating the query as per the definition in the repository; and executing the query as per the operation requested in the query to obtain a result.
9. The method of Claim 8, wherein the execution step further comprises receiving at least a client query from the client application.
10. The method of Claim 8, wherein the execution step further comprises allocating resources necessary for execution.
11. The method of Claim 8, wherein the execution step further comprises executing at least a query, wherein the query comprises a sub-query, wherein a result of the sub-query is required for the execution of the query.
12. The method of Claim 8, wherein the execution step further comprises translating the result to a widely accepted defined standard.
13. The method of Claim 1, further comprising an unload phase relinquishing at least a resource allocated.
14. The method of Claim 1, further comprising an unload phase relinquishing at least a resource in use after the completion of query execution.
15. The method of Claim 1, wherein the unload phase comprises a log entry for the environment change details enforced during the execution.
16. The method of Claim 1, wherein, when the result of the query execution changes at least a data definition, the data definition is updated to the repository.
17. The method of Claim 8, wherein the execution of the query is done by the driver.
18. A method to provide a common interface to a plurality of databases, the method comprising: preloading the environment, wherein a repository is formed; initializing data source connectivity and loading the repository; and executing at least a query.
19. A system for connecting to and accessing a plurality of computer data sources using a client application, the system comprising: a client computer including a client memory, a client processor, and a client transceiver in communication with one another, the client memory including a driver, the driver being used to initialize a connection between at least one client application and at least one data source; the client application, in conjunction with the client processor and transceiver, being used to send a query to the server via the driver, the query comprising at least one parameter; the driver comprising: a command translation engine for analyzing the parameters of the query and executing the query, the command translation engine producing an analyzed parameter; a data pattern definition translator for mapping the data, the data mapping being as per the requisite definition; a repository of the data definition, the repository providing the data definition to a command execution engine for executing the query; and a command execution engine for parsing and executing the query, the command execution engine receiving the analyzed parameter of the query for parsing and executing the query using the repository.
20. The system of Claim 19, wherein the system further comprises a disk manager, the disk manager residing in a driver memory to perform and manage the manipulation of a file operation, irrespective of the nature of the query.
21. The system of Claim 19, wherein the system further comprises a resource manager, the resource manager managing and scheduling resources used by the query.
22. The system of Claim 19, wherein the system further comprises a transaction manager for transactional management, working along with the disk agent for managing temporary swap, rollback segment data and log file data.
23. The system of Claim 19, wherein the system further comprises an error handler for notifying command results and handling exceptions or abnormal termination of the driver or a command.
24. The system of Claim 19, wherein the system further comprises an ODBC driver manager, the ODBC driver manager, being part of the operating system interface, dictating and abiding by the standards and acting as a hub for data exchange, whereby a plurality of database client applications are connected.
PCT/IN2004/000334 2003-10-22 2004-10-25 Data pattern based odbc/ oledb/ jdbc compliant driver WO2005067396A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN1119/MUM/2003 2003-10-22
IN1119MU2003 2003-10-22

Publications (2)

Publication Number Publication Date
WO2005067396A2 true WO2005067396A2 (en) 2005-07-28
WO2005067396A3 WO2005067396A3 (en) 2005-10-27

Family

ID=34779410

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IN2004/000334 WO2005067396A2 (en) 2003-10-22 2004-10-25 Data pattern based odbc/ oledb/ jdbc compliant driver

Country Status (1)

Country Link
WO (1) WO2005067396A2 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5970490A (en) * 1996-11-05 1999-10-19 Xerox Corporation Integration platform for heterogeneous databases
US6151608A (en) * 1998-04-07 2000-11-21 Crystallize, Inc. Method and system for migrating data

Also Published As

Publication number Publication date
WO2005067396A3 (en) 2005-10-27

Similar Documents

Publication Publication Date Title
US6438559B1 (en) System and method for improved serialization of Java objects
US7181474B2 (en) Database communication system and method for communicating with a database
US6266666B1 (en) Component transaction server for developing and deploying transaction- intensive business applications
US6609126B1 (en) System and method for routing database requests to a database and a cache
US7149730B2 (en) Dynamic class inheritance and distributed caching with object relational mapping and cartesian model support in a database manipulation and mapping system
US6356946B1 (en) System and method for serializing Java objects in a tubular data stream
US6999956B2 (en) Dynamic object-driven database manipulation and mapping system
US7676493B2 (en) Incremental approach to an object-relational solution
US8051094B2 (en) Common interface to access catalog information from heterogeneous databases
US6499036B1 (en) Method and apparatus for data item movement between disparate sources and hierarchical, object-oriented representation
US7089566B1 (en) Method for accessing object linking-embedding database data via JAVA database connectivity
US6128611A (en) Internet-enabled generic application program for accessing hierarchical data
WO1999052044A1 (en) A system and method for accessing data stores as objects
KR20060045622A (en) Extraction, transformation and loading designer module of a computerized financial system
JPH0644128A (en) Data base management system and method supporting object directional programming
JP2003323324A (en) Dynamic end user specific customization of application&#39;s physical data layer through data repository abstraction layer
US8639717B2 (en) Providing access to data with user defined table functions
US7421427B2 (en) Method and apparatus for allowing host application data to be accessed via standard database access techniques
US5956727A (en) Heterogeneous database system with data source extensibility and alteration of database functions
US20040181510A1 (en) System and method for cooperative database acceleration
US7076489B2 (en) System and method for enabling efficient multi-protocol database transaction processing
WO2005067396A2 (en) Data pattern based odbc/ oledb/ jdbc compliant driver
WO2004077216A2 (en) System and method for heterogeneous data migration in real-time
Bai JDBC API and JDBC Drivers
US20060031251A1 (en) Apparatus, system, and method for directly addressing a legacy database system

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

122 Ep: pct application non-entry in european phase