WO2002019154A1

WO2002019154A1 - Virtual groups

Info

Publication number: WO2002019154A1
Application number: PCT/US2001/012343
Authority: WO
Inventors: Walter Lindsay
Original assignee: Contivo, Inc.
Priority date: 2000-08-29
Filing date: 2001-04-10
Publication date: 2002-03-07
Also published as: EP1328873A1; AU2001253546A1; JP2004507839A; CA2420401A1

Abstract

A method including disambiguating a source logical structure (610), a target logical structure (620) having a similar format, wherein each logical structure has fields (615, 625), wherein each field has a meaning based on data in the logical structure is disclosed.

Description

VIRTUAL GROUPS

FIELD OF INVENTION

The invention is related to the field of representation and translation of electronic documents.

BACKGROUND OF THE INVENTION

A field in some document forms can have multiple meanings. In some concrete documents, the meaning of a structure in a concrete document is apparent only by examining the data in fields. Such structures often repeat, such as loops in EDI or SAP IDOC documents, or elements that have content lists in XML. Such structures often have their meanings defined by one (or more) fields in the document (in XML, usually by values in attributes).

For example, an EDI 850 (i.e., a purchase order) contains a "segment" (group) called "N1". The hierarchy of an N1 segment can be represented as shown in Table 1 (this is a partial description):

N1 (group)
    N101 (qualifier field)
    N102 (data field)
    N2 (group)
        N201 (data field)
        N202 (data field)
    N3 (group)
        N301 (data field)
        N302 (data field)
    N4 (group)
        N401 (data field)
        N402 (data field)
        N403 (data field)
        N404 (data field)
    NX2 (group)
        NX201 (data field)
        NX202 (data field)
        NX203 (data field)

Table 1

In this case, N101 is a qualifier field, and all other fields hold data. The meaning of every data field here derives from the value in N101. In a concrete document, if N101 is "ST", then all other fields under N1 describe "Ship To" information: N102 might "mean" the name of the person to ship to; N401 is the city name in the ship-to address; and N402 is the state or province in the ship-to address.

However, if in a concrete document N101 is "BT", then all the fields above - N102, N401, N402, etc. - hold "Bill To" information. Their meanings are analogous to when N101 is "ST": for example, if N101 is "BT", then N401 is the "Bill To" city name.

The structure possible under N1 does not vary based on the qualifier code in N101. In a concrete document, the meanings of the fields under N1 depend on their location and on the value of N101.
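The dependence of meaning on both location and qualifier value can be sketched in a few lines of Python. This is only an illustration: the lookup tables and the function name are assumptions made for the example, and the role labels are informal rather than taken from the EDI standard.

```python
# Hypothetical lookup tables; the labels are illustrative only.
QUALIFIER_MEANINGS = {"ST": "Ship To", "BT": "Bill To"}
FIELD_ROLES = {"N102": "name", "N401": "city", "N402": "state or province"}


def field_meaning(n101_value: str, field_name: str) -> str:
    """Combine the qualifier code in N101 with a field's location to get its meaning."""
    party = QUALIFIER_MEANINGS[n101_value]
    role = FIELD_ROLES[field_name]
    return f"{party} {role}"


print(field_meaning("ST", "N401"))  # Ship To city
print(field_meaning("BT", "N401"))  # Bill To city
```

The same field location (N401) yields two different meanings, which is the ambiguity that the rest of the description addresses.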

Conventional solutions for translating one document, such as a customer document, into another document, such as a vendor document, rely on writing customized code to handle such cases. For example, suppose Vendor1 has a mapping tool that Customer1 uses. Customer1 is mapping an EDI 850 (see Table 1 above) to Document2. Customer1 writes code, such as the code shown in Table 2.

if (N101 == "ST")
{
    move N102 to Document2.field1
    move N301 to Document2.field2
    move N401 to Document2.field3
    move N402 to Document2.field4
    move N403 to Document2.field5
}
else if (N101 == "BT")
{
    move N102 to Document2.fielda
    move N301 to Document2.fieldb
    move N401 to Document2.fieldc
    move N402 to Document2.fieldd
    move N403 to Document2.fielde
}

Table 2

Locations in a document can have more than one meaning. This means that mapping is hard to automate. Instead, the mappings must be manually done and require customized code, which does not allow reuse of mapping knowledge and rules.

There are significant disadvantages to using the conventional solutions. For example, both mapping and the mapping rules are one-off. That is, each time a user wants to define how to perform a document translation, similar code must be written and tested. This increases the time needed to define how to translate from the source to the target document.

Another disadvantage is that both mapping and the mapping rules depend on user-written code. This makes it hard to automatically validate the integrity of the mapping. It also sets a minimum bar for the skill level of anyone trying to define a mapping, as they must then know all the document locations that might hold a particular meaning, and must be skilful enough to write the code to handle the case. This imposes a maintenance burden, as fixing a problem in a mapping requires altering code.

Furthermore, the mapping and the mapping rules are translation-language dependent. The code that must be written and tested depends on the underlying translation engine that will translate the documents. Thus, mapping rules will be translation-engine dependent, and a translation defined for one translation engine will likely need adjusting to make the mapping work on a different translation engine. Moving a transform from one translation engine to another is difficult.

The source and target mappings must be significantly different. The code for handling the case described above will differ whether the document is the source or the target document. If one has mapped from A to B, mapping from B to A requires major rework, as the code for the mapping would have to be rewritten using different logic.

Mapping tools previously used superficial similarities in field names or document structure as the basis for automapping. They could not automap to virtual structures, forcing users to write code.

SUMMARY OF THE INVENTION

A method including disambiguating a source logical structure, a target logical structure having a similar format, wherein each logical structure has fields, wherein each field has a meaning based on data in the logical structure is disclosed.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements, and in which:

Figure 1 shows an example of an embodiment of a data structure for a document.

Figure 2 shows another example of an embodiment of a data structure for a document.

Figure 3 illustrates an example of a data structure used to create a virtual group.

Figure 4 is an example of a network that uses virtual groups to translate documents.

Figure 5 is an example of a computer system that uses virtual groups to translate documents.

Figure 6 is an example of a translation system that uses virtual groups to translate documents.

Figure 7 is an embodiment of a method for automatically doing translation using virtual groups.

DETAILED DESCRIPTION

A method of using virtual groups to automatically generate a transform is disclosed. Using virtual groups in concrete documents that have structures that can have multiple meanings allows meta-data of each document to be represented so that each structure has a single meaning. This enables a transform to be automatically generated, as discussed below.

Figure 1 depicts an example of a data structure for part of a document. Group1 has other groups under it: GroupA, GroupB and GroupC. GroupB has other groups under it, including Group1, which has field Field1.1 in it. The full path through this document, from the top as shown down to Field1.1, can be described as:

Group1/GroupB/Group1/Field1.1

Field1.1 holds the qualifier code that helps specify the meaning of other fields somewhere under GroupB. If the document in Figure 1 is the source in a mapping, the data of mapped fields other than Field1.1 will be moved to the target. If Figure 1 describes a target document, Field1.1 will have the correct value filled in when other fields that have data are moved over from the source.

It is important to realize that in many concrete documents a structure like GroupB can repeat many times. Figure 1 therefore shows one example of an embodiment of meta-data. In a concrete document, GroupB can appear multiple times. Each appearance is a particular instance of GroupB. Each instance of GroupB can have different data in its fields, and the meaning of all fields under the group depends on the value of the qualifier field (N101 in the EDI example above).

Enabling and Disabling Virtual Groups

Figure 2 depicts a data structure for the same document as Figure 1, except that GroupB_ST is defined. Thus, if a mapping accesses a field under GroupB_ST, then Field1.1 in the concrete document is known to have "ST" in it. The structure under a virtual group is identical to the structure in its "to virtualize" group. GroupB_ST has under it the same structure as GroupB. The qualifier code for the virtual group appears in the virtual group's name.

GroupB_ST is an enabled virtual group. That is, it appears in the document. Events, such as a user action in the GUI, trigger the virtual group to be enabled: it is inserted into the document, and the substructure under its "to virtualize" group is made available under it. Similarly, a virtual group can be disabled, causing its structure to disappear from the document.
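Enabling and disabling can be pictured as inserting and removing a node in the meta-data tree. The following Python sketch assumes the meta-data is held as nested dictionaries and that the virtual group is placed as a sibling of its "to virtualize" group; both the representation and the function names are assumptions made for illustration, not details given in the description.

```python
import copy

# Meta-data tree for the document of Figure 1, as nested dicts (leaf fields map to None).
document = {
    "Group1": {
        "GroupA": {},
        "GroupB": {"Group1": {"Field1.1": None}, "Group2": {"Field2.2": None}},
        "GroupC": {},
    }
}


def enable_virtual_group(parent: dict, group: str, qualifier: str) -> str:
    """Insert a virtual group next to `group`, exposing a copy of its substructure."""
    name = f"{group}_{qualifier}"
    parent[name] = copy.deepcopy(parent[group])
    return name


def disable_virtual_group(parent: dict, name: str) -> None:
    """Remove a virtual group and its substructure from the meta-data."""
    parent.pop(name, None)


top = document["Group1"]
vg_name = enable_virtual_group(top, "GroupB", "ST")
print(vg_name, sorted(top[vg_name]))
disable_virtual_group(top, vg_name)
print(vg_name in top)
```

Copying the substructure keeps the virtual group's structure identical to that of the group it virtualizes, while leaving the original group untouched.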

Source Data for Defining Virtual Groups

The descriptions of virtual groups are stored in an external data source. Figure 3 illustrates a data structure including the information needed to create the virtual group of Figure 2. The information needed to generate a virtual group is:

• ToVirtualizeGroup - the group to be "virtualized". That is, the group whose structure is represented by the virtual group, and the group that has fields somewhere under it that are modified by the value of the qualifier field.

• QualifierField - the field that holds the qualifier code.

• Qualifier - the qualifier code that, when in the QualifierField in a concrete document, identifies the meaning of the fields under the ToVirtualizeGroup.

• Description - a textual description of the general meaning of the virtual group.
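The four items listed above could be held in a small record type. The sketch below is one possible shape, assuming Python dataclasses; the class name, attribute names, and the example description text are illustrative and not taken from Figure 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualGroupDef:
    to_virtualize_group: str  # path of the group being virtualized
    qualifier_field: str      # path of the qualifier field, relative to that group
    qualifier: str            # qualifier code that selects this meaning
    description: str          # textual description of the virtual group's meaning

    @property
    def name(self) -> str:
        # The qualifier code appears in the virtual group's name, e.g. GroupB_ST.
        group = self.to_virtualize_group.split("/")[-1]
        return f"{group}_{self.qualifier}"


ship_to = VirtualGroupDef("Group1/GroupB", "Group1/Field1.1", "ST", "Ship To information")
print(ship_to.name)  # GroupB_ST
```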

Using Virtual Groups in Source Documents to Automatically Generate a Transform

Users can apply mapping rules to meta-data to map data from a field under a virtual group in the source document to corresponding field(s) in the target document. A field under a virtual group in a source document can be treated just like any other field. Whatever operations - move, or any other mapping rule that might be applied to standard fields - apply to fields in a virtual group as well.

A transform is the code used by a translation engine to convert one concrete document to another. A transform is generated by applying mapping rules to the meta-data of the source and target documents. After mapping rules and meta-data, including virtual groups, are defined, a transform can be automatically generated which performs the following processing on fields under virtual groups defined for a concrete source document:

If the qualifier field holds the qualifier value, map all fields in the group or any subgroup to the target document according to the defined mapping rules.

Else, if the qualifier field holds any other value, do nothing for this instance of the group.

Virtual groups in a source document are conditional - the fields under them exist if and only if the qualifier field in a concrete document under the "to virtualize" group holds the qualifier value. In the example shown in Figure 2, the "to virtualize" group is GroupB, and the qualifier field is Group1/Field1.1. Thus, a mapping from any field under GroupB_ST is conditional on GroupB/Group1/Field1.1 holding the qualifier value "ST".
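The conditional behavior of a source-side virtual group can be sketched as follows. This is a minimal Python illustration, assuming each instance of the "to virtualize" group is a nested dictionary; the helper names, the data values, and the target field name are all hypothetical.

```python
def get_path(node: dict, path: str):
    """Follow a slash-separated path through nested dicts."""
    for part in path.split("/"):
        node = node[part]
    return node


def map_source_instances(instances, qualifier_field, qualifier, rules):
    """Apply mapping rules only to instances whose qualifier field holds the qualifier."""
    target = {}
    for inst in instances:
        if get_path(inst, qualifier_field) != qualifier:
            continue  # any other value: do nothing for this instance
        for src_path, tgt_field in rules.items():
            target[tgt_field] = get_path(inst, src_path)
    return target


# Two instances of GroupB in a concrete document (data values are made up).
group_b_instances = [
    {"Group1": {"Field1.1": "BT"}, "Group2": {"Field2.2": "Boston"}},
    {"Group1": {"Field1.1": "ST"}, "Group2": {"Field2.2": "Denver"}},
]
rules = {"Group2/Field2.2": "ship_to_city"}  # hypothetical target field name
result = map_source_instances(group_b_instances, "Group1/Field1.1", "ST", rules)
print(result)  # {'ship_to_city': 'Denver'}
```

Only the instance whose qualifier field holds "ST" contributes data; the "BT" instance is skipped without any user-written conditional code.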

Note that simple extensions to support default values, etc., are within the scope of the invention. For example, a constant value for a field under a virtual group could be added to the information shown in Figure 3, such that if none of the qualifier fields match the qualifier value, then the field's value is equal to the default.

Using Virtual Groups in Target Documents to Automatically Generate a Transformation

Users can apply mapping rules to meta-data to map from field(s) in the source document to a corresponding field under a virtual group in the target document. A field under a virtual group in a target document can be treated just like any other field. Whatever operations - move, or any other mapping rule that might be applied to standard fields - apply to fields under a virtual group.

A transform is the code used by a translation engine to convert one concrete document into another. A transform is generated by applying mapping rules to meta-data of the source and target documents. After mapping rules and meta-data, including virtual groups, are defined, a transform can be automatically generated which performs the following processing on fields under virtual groups defined for a concrete target document:

For any field in a virtual group or a subgroup of a virtual group, if the qualifier field for the virtual group is already set to a value other than the virtual field's qualifier value do nothing.

Else put the qualifier value in the qualifier field and put the data value into the data field.

For example, in Figure 2, if the user mapped to GroupB_ST/Group2/Field2.2, then the transform code would specify this:

If the current GroupB/Group1/Field1.1 is empty or holds "ST", then put the value from the source document into GroupB/Group2/Field2.2.

Note that simple extensions to support default values, etc., are within the scope of the invention. For example, a constant value for a field under a virtual group could be added to the information in Figure 3, such that if a field has nothing mapped to it, then the field's value is equal to the default.
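The target-side rule described in this section can be sketched in Python. For brevity, the sketch flattens a target group instance into a single dictionary with a "qualifier" key instead of the nested GroupB/Group1/Field1.1 layout; that simplification, the function name, and the data values are assumptions made for the example.

```python
def write_virtual_field(instance: dict, qualifier: str, field: str, value) -> bool:
    """Write `value` into `field` of a target group instance via a virtual group.

    Returns True if the value was written, False if the instance already
    carries a different qualifier value.
    """
    current = instance.get("qualifier")
    if current not in (None, "", qualifier):
        return False  # qualifier field already set to another value: do nothing
    instance["qualifier"] = qualifier  # put the qualifier value in the qualifier field
    instance[field] = value            # put the data value into the data field
    return True


group_b = {}
print(write_virtual_field(group_b, "ST", "Field2.2", "Denver"))  # True
print(write_virtual_field(group_b, "BT", "Field2.2", "Boston"))  # False
print(group_b)  # {'qualifier': 'ST', 'Field2.2': 'Denver'}
```

The first write claims the instance for "ST" and fills in the qualifier automatically; the second write, made through a "BT" virtual group, leaves the instance unchanged.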

Hardware Overview

According to the present invention, a host computer system transmits and receives data over a computer network or standard telephone line. According to one embodiment, the steps of accessing, downloading, and manipulating the data, as well as other aspects of the present invention are implemented by a central processing unit (CPU) in the host computer executing sequences of instructions stored in a memory. The memory may be a random access memory (RAM), read-only memory (ROM), a persistent store, such as a mass storage device, or any combination of these devices. Execution of the sequences of instructions causes the CPU to perform steps according to the present invention.

The instructions may be loaded into the memory of the host computer from a storage device, or from one or more other computer systems over a network connection. For example, a server computer may transmit a sequence of instructions to the host computer in response to a message transmitted to the server over a network by the host. As the host receives the instructions over the network connection, it stores the instructions in memory. The host may store the instructions for later execution or execute the instructions as they arrive over the network connection. In some cases, the downloaded instructions may be directly supported by the CPU. In other cases, the instructions may not be directly executable by the CPU, and may instead be executed by an interpreter that interprets the instructions. In other embodiments, hardwired circuitry may be used in place of, or in combination with, software instructions to implement the present invention. Thus, the present invention is not limited to any specific combination of hardware circuitry and software, nor to any particular source for the instructions executed by the host computer.

Figure 4 illustrates a system 400 in which a host computer 402 is connected to a remote computer 404 through a network 410. The network interface between host computer 402 and remote computer 404 may also include one or more routers, such as routers 406 and 408, which serve to buffer and route the data transmitted between the host and remote computers. Network 410 may be the Internet, a Wide Area Network (WAN), a Local Area Network (LAN), or any combination thereof. The remote computer 404 may be a World-Wide Web (WWW) server that stores data in the form of 'web pages' and transmits these pages as Hypertext Markup Language (HTML) files over the Internet network 410 to host computer 402. To access these files, host computer 402 runs a 'web browser', which is simply an application program for accessing and providing links to web pages available on various Internet sites. Host computer 402 is also configured to communicate to telephone system 412 through a telephone interface, typically a modem.

Figure 5 is a block diagram of a representative networked computer, such as host computer 402 illustrated in Figure 4. The computer system 500 includes a processor 502 coupled through a bus 501 to a random access memory (RAM) 504, a read only memory (ROM) 506, and a mass storage device 507. Mass storage device 507 could be a disk or tape drive for storing data and instructions. A display device 520 for providing visual output is also coupled to processor 502 through bus 501. Keyboard 521 is coupled to bus 501 for communicating information and command selections to processor 502. Another type of user input device is cursor control unit 522, which may be a device such as a mouse or trackball, for communicating direction commands that control cursor movement on display 520. Also coupled to processor 502 through bus 501 is an audio output port 524 for connection to speakers that output audio signals produced by computer 500.

Further coupled to processor 502 through bus 501 is an input/output (I/O) interface 525, and a network interface device 523 for providing a physical and logical connection between computer system 500 and a network. Network interface device 523 is used by various communication applications running on computer 500 for communicating over a network medium and may represent devices such as an ethernet card, ISDN card, or similar devices.

Modem 526 interfaces computer system 500 to a telephone line and translates digital data produced by the computer into analog signals that can be transmitted over standard telephone lines, such as by telephone system 412 in Figure 4. In an embodiment of the present invention, modem 526 provides a hardwired interface to a telephone wall jack, however modem 526 could also represent a wireless modem for communication over cellular telephone networks. It should be noted that the architecture of Figure 5 is provided only for purposes of illustration, and that a host computer used in conjunction with the present invention is not limited to the specific architecture shown.

The network and computer systems shown in Figures 4 and 5 are used by the present invention to automatically generate a transform to convert a first document into a second document. Documents are composed of discrete data elements called fields. A field is a named location in the document that holds data. A group is a named container in the document that holds several fields or subgroups. A map is a recipe that describes how the fields of one document (the source) are transformed into the fields of another (the target). Different document formats have many abstract concepts in common, even though they may represent these abstractions in different ways. For example, a postal address can be represented in a variety of ways, depending on a given document format. The postal address can contain several fields, such as name, street and city. This collection of fields is an example of a group, as discussed with respect to Figures 1 and 2 above.

A mapping engine 650 creates a translation map, as shown in Figure 6. The translation map is used by a translation engine 630 to convert, or translate, a message from a source format to a target format. The translation map is a description of the fields in the source document that will be used to populate a field in the target document, and of the instructions to move and manipulate the source fields to produce the target document. Figure 6 shows an example of the groups and fields of two different documents, a source document format 610 and a target document format 620. In this embodiment, the document is a purchase order. However, the document may convey any information that one person or business wants to send to another person or business. The source group 615 includes the source fields of name, address, city, description, price, quantity, and total. The target group 625 includes the fields name, location, information, cost, number, and amount. Although the formats of the fields in the source and target groups are structurally different, they have similarities and common abstractions such as name, amount, and place to ship the goods. Thus, the names of the fields in groups 615 and 625 may be different, such as "price" and "cost," for example, but the data 617 and 627 contained in these fields is functionally equivalent.
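In its simplest form, a translation map of the kind described above pairs source field names with target field names. The Python sketch below is only an illustration: the description confirms the "price"/"cost" correspondence, while the "quantity"/"number" pairing, the function name, and the data values are assumptions.

```python
# Source field name -> target field name. Only price -> cost is stated in the
# description; the quantity -> number pairing is assumed for illustration.
translation_map = {"name": "name", "price": "cost", "quantity": "number"}


def translate(source: dict, mapping: dict) -> dict:
    """Move each mapped source field into its target field; skip absent fields."""
    return {tgt: source[src] for src, tgt in mapping.items() if src in source}


source_po = {"name": "Acme", "price": 9.5, "quantity": 3}  # made-up data
print(translate(source_po, translation_map))
# {'name': 'Acme', 'cost': 9.5, 'number': 3}
```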

A virtual group that corresponds to the source and target groups 615 and 625 can be used to capture these common abstractions using meta-data. For example, meta-data associated with the source document can be used by the mapping engine to define a virtual group that corresponds to source group 615. The meta-data used to define the virtual group can be obtained from a data structure such as the data structure of Figure 3. After the virtual group is defined, the mapping engine can apply mapping rules to the meta-data associated with the source group, including the virtual group, to automatically generate a transform. The transform is then provided to the translation engine, which uses the transform to convert the source concrete document into the target document.

Figure 7 shows an embodiment of a method for automatically generating a transform using virtual groups. One or more virtual groups for a first document are defined, step 710. The virtual groups are defined using meta-data contained in the data structure of Figure 3. One or more of these virtual groups are enabled, so that the enabled virtual groups appear in the first document, step 720. One or more of the virtual groups may be disabled, so that the disabled virtual groups do not appear in the first document, step 730. Mapping rules to map data from fields in the first document to fields in a second document are defined, step 740. Then, a transform to convert the first document into the second document is automatically generated by applying the mapping rules to the meta-data, including the enabled virtual groups, of the first and second documents.
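The steps of Figure 7 can be walked through with a small Python sketch. The class, its method names, the field paths in the usage lines, and the textual instruction format it emits are all hypothetical; an actual transform would be written in whatever form the chosen translation engine requires.

```python
from dataclasses import dataclass, field


@dataclass
class MappingSession:
    defined: dict = field(default_factory=dict)  # virtual group name -> qualifier
    enabled: set = field(default_factory=set)    # names of enabled virtual groups
    rules: list = field(default_factory=list)    # (source path, target path) pairs

    def define(self, group: str, qualifier: str) -> str:   # step 710
        name = f"{group}_{qualifier}"
        self.defined[name] = qualifier
        return name

    def enable(self, name: str) -> None:                   # step 720
        if name not in self.defined:
            raise KeyError(name)
        self.enabled.add(name)

    def disable(self, name: str) -> None:                  # step 730
        self.enabled.discard(name)

    def add_rule(self, source_path: str, target_path: str) -> None:  # step 740
        self.rules.append((source_path, target_path))

    def generate_transform(self) -> list:
        """Emit engine-neutral instructions; rules under virtual groups become conditional."""
        out = []
        for src, tgt in self.rules:
            head = src.split("/")[0]
            if head in self.defined:
                if head not in self.enabled:
                    raise ValueError(f"{head} is not enabled")
                out.append(f'if qualifier == "{self.defined[head]}": move {src} -> {tgt}')
            else:
                out.append(f"move {src} -> {tgt}")
        return out


session = MappingSession()
vg = session.define("GroupB", "ST")
session.enable(vg)
session.add_rule("GroupB_ST/Group2/Field2.2", "Document2/field3")
session.add_rule("GroupA/FieldA.1", "Document2/field9")
for line in session.generate_transform():
    print(line)
```

The user supplies only definitions and rules; the qualifier test is added during generation rather than written by hand.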

Using virtual groups provides several advantages. First, the method of using virtual groups enables a mapping engine to automatically generate a mapping that describes how to map data between source and target documents. The automatic mapping process enables virtual groups as needed: if it discovers that the mapping requires a field under a virtual group that could be enabled, it enables that virtual group.

Second, merely doing the mapping tends to be sufficient if a virtual group is involved. The user does not need to write code to identify when the data under a group has a particular meaning, or to put the qualifier code into a target virtual group in the target document.

Third, the ability to write code is not compromised. This new technique can co-exist with the older way of doing things. Fourth, transformation instructions for the translation engines can be successfully generated.

Fifth, mapping from document A to B is much closer to mapping from B to A than mapping without virtual groups. Thus, the mapping from B to A has been made closer to the transposition of the mapping from A to B. Mapping one direction then provides most of the information needed to map the other direction. If users had to write code to map from A to B, such a transposition would be far more work. With virtual groups, transposing a mapping is far less work.
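Because fields under virtual groups are addressed like ordinary fields, a set of simple move rules can be transposed mechanically. The Python sketch below assumes one-to-one move rules held as a dictionary of paths; the function name and the target paths are hypothetical.

```python
def transpose(rules: dict) -> dict:
    """Swap source and target paths; assumes the mapping is one-to-one."""
    transposed = {tgt: src for src, tgt in rules.items()}
    if len(transposed) != len(rules):
        raise ValueError("mapping is not one-to-one and cannot be transposed directly")
    return transposed


a_to_b = {
    "GroupB_ST/Group2/Field2.2": "Order/ShipTo/City",  # hypothetical target paths
    "GroupB_BT/Group2/Field2.2": "Order/BillTo/City",
}
b_to_a = transpose(a_to_b)
print(b_to_a["Order/ShipTo/City"])  # GroupB_ST/Group2/Field2.2
```

With hand-written conditional code, the reverse direction would need different logic; here the qualifier handling travels with the virtual group name.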

Sixth, mapping to or from fields under a virtual group is translation-engine independent. The method of using virtual groups merely generates the code appropriate for a particular translation engine when writing out the transform in the way that translation engine requires.

Seventh, without needing to perform complicated analyses of user-written code, the method and apparatus can validate mappings to and from fields under virtual groups, as most cases do not require the user to write code. Because fewer mappings require the user to write code, mapping difference checking is easier. Eighth, a non-programmer can do most of the work of mapping.

Ninth, maps are more translation-engine independent, as less overall code is needed. Tenth, creating a transform is faster, because the automapping generated code has a better hit rate. Eleventh, maps have fewer bugs, because users do not need to write code. Thus, debugging a mapping is faster. Twelfth, time to market is faster for users. Thirteenth, virtual groups work with virtual fields. Fourteenth, nesting of virtual groups works. That is, the meaning of a structure in a concrete document can depend on several qualifier fields as discussed above.

These and other embodiments of the present invention may be realized in accordance with these teachings and it should be evident that various modifications and changes may be made in these teachings without departing from the broader spirit and scope of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than restrictive sense and the invention measured only in terms of the claims.

Claims

What is claimed is:

1. A method comprising: disambiguating fields of a source logical structure or a target logical structure, each field having a meaning based on data in the logical structure.

2. The method of claim 1 further comprising: defining meta-data including virtual groups for the source or target logical structures; and creating a map from one logical structure to the other logical structure using meta-data.

3. The method of claim 2 wherein creating a map further comprises: defining mapping rules to map data from a virtual group in one of the logical structures to a virtual group in the other logical structure.

4. The method of claim 3 further comprising automatically generating a transform from the mapping rules and the meta-data, including the virtual groups.

5. A computer readable medium having instructions which, when executed by a processing system, cause the system to: disambiguate fields of a source logical structure or a target logical structure, each field having a meaning based on data in the logical structure.

6. The medium of claim 5 wherein the executed instructions further cause the system to create a map from one logical structure to the other logical structure using meta-data.

7. The medium of claim 6 wherein the executed instructions further cause the system to define meta-data including virtual groups for the source and target logical structures; and define mapping rules to map data from a virtual group in one of the logical structures to a virtual group in the other logical structure.

8. The medium of claim 7 wherein the executed instructions further cause the system to automatically generate a transform from the mapping rules and the meta-data, including the virtual groups.

9. An apparatus comprising: means for disambiguating fields of a source logical structure; and means for disambiguating fields of a target logical structure; wherein each field has a meaning based on data in the logical structure.

10. The apparatus of claim 9 wherein each of said means for disambiguating further comprises means for creating a map from one logical structure to the other logical structure using meta-data.

11. The apparatus of claim 10 further comprising means for defining meta-data including virtual groups for the source and target logical structures; and means for defining mapping rules to map data from a virtual group in one of the logical structures to a virtual group in the other logical structure.

12. The apparatus of claim 11 further comprising means for automatically generating a transform from the mapping rules and the meta-data, including the virtual groups.