WO2012028978A1 - Method and system for converting digital documents - Google Patents

Publication number
WO2012028978A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
document
file
procedure
document type
Application number
PCT/IB2011/053498
Other languages
French (fr)
Inventor
Enrico Basso
Original Assignee
B + B Holding S.R.L.
Application filed by B + B Holding S.R.L.
Publication of WO2012028978A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/84 Mapping; Conversion

Definitions

  • the invention relates to a method and system for converting digital documents.
  • the main object of the invention is to provide a new electronic method and system for converting documents into digital format.
  • the first procedure, which has at its disposal formatting information to locate data such as string or numeric fields in the document body, extracts the located data from the document body.
  • a characteristic of the invention is to be able to extract the data from the file to be converted by means of a user-specific mapping procedure and map file. This enables him and the processing system to disregard the standard format or formatting, and/or any format and formatting used internally solely by the user.
  • formatting of files is taken to mean the standardisation of information in a specific format and its arrangement, as generally understood.
  • format is taken to mean here the convention used to read, write and interpret or represent the contents of a file.
  • the format of a specific file is, for example, generally indicated by its extension. Examples of formats are a flat file type ASCII or an EXCEL file, a representation of a class in XML format and so on.
  • indexing has a broad meaning and includes all the logical and/or algorithm systems for finding the specific procedures to execute for a user's file using indexing fields. For example, a procedure may be searched for from those of a database individually memorised in a Hard Disk, ROM or permanent support such as a Compact Disk or DVD. Or the procedure may be identified and selected (in systems with Windows® OS) in a dedicated DLL file which contains a series thereof, in an .EXE or .COM file. The procedure is selected (and then executed) by indexing from a program for example by means of IF-THEN, CASE, or SWITCH instruction sequences.
  • the method can be executed in a data processing device such as a dedicated program in a microprocessor, or by a processing device such as a normal computer or server loaded with specific software.
  • the data extracted from the file to be converted can be used in all known processing methods. In particular it can be transmitted for example to a Second User. To do this, and also to facilitate the scalability of the system and reduce the dimensions of user archives, the method may envisage that
  • the intermediate file contains the data which the sender user wishes to send, in a unique format, easy for the system to manage and regardless of the format or formatting used in the file to be converted and in the incoming file. Since the transmissions contain the data to be transferred and not how it is presented or ordered, it is clear that the intermediate file can be processed in very many ways while continuing to contain the nucleus of information to be transferred.
  • the intermediate file may be used as a matrix or data source to generate another file or document with specific formatting and/or format, standard or otherwise. This other document may be received by a recipient user in his preferred format and formatting, regardless of those of the original document.
  • the method comprises
  • the step of identifying a Format of the document body is optional because the system may be single-format but with many Document Types.
  • the second method is conceptually analogous to the first, that is to say used to map data between files and/or areas of memory.
  • the destination file is the file to send to the Second User, while in one case the data is taken from the intermediate file, in another case it is taken directly from the set of data extracted from the file to be converted.
  • the result is always that the Second User receives the data formatted as he prefers.
  • the file to be converted is received by a first remote user and the converted file is sent to a second remote user.
  • the system may operate as a central node and serve remote users, e.g. via a known data network. Each user may therefore send messages to the others using his own standards without bothering about those of the others;
  • the first and/or second procedure is transferred and/or stored in the relative database by the users with whom the procedures are associated. This permits maximum customisation of the system, and independence of the format or formatting of each system user;
  • an interface is arranged with which a user can build a mapping to map the formatting of his own document body, such mapping specifying the position and/or coordinates of, and/or a pointer to, one or more data contained in the document body, and/or sending to said database a map file containing the user-generated mapping so that it can be used by or with said first and/or second procedure.
  • the interface also allows mapping of a user's document body by associating with that document body a map file which is a predefined template and is contained in a database of templates. The use of templates simplifies and speeds up the insertion operation of the mapping;
  • the file received or converted is received or sent by transfer means selected from the many available by a field present in the file.
  • Each user can thus select, directly for example in the message, which transmission means to use for routing/sending and delivering the message;
  • use of the conversion service is granted after validating the user's identity.
  • the file to be converted and/or the converted file and/or the intermediate file is saved in a database which can be consulted remotely by a user. This makes it possible to trace messages and recover their content.
  • a method according to the invention is carried out in/by a data processing device for converting a digital document into another digital document differently formatted and possibly using another format, and comprises
  • the method is advantageously realised by a data processing system adapted to convert a digital document into another digital document differently formatted and possibly using another format, comprising
  • means (such as a motherboard which reads a Hard Disk) for accessing a file comprising a document body to be converted and fields identifying at least a First User, a Second User and a Document Type;
  • a database of data-mapping procedures specific to each User and univocally associated with a Document Type of the User;
  • means (such as a programmed microprocessor) for identifying or selecting or searching in a database, by indexing with at least the First User and Document Type fields, a procedure associated with the Document Type of the First User and executing it, the first procedure having at its disposal formatting information to locate in the document body data such as string or numeric fields, and being able to extract from the document body the located data;
  • means for identifying or selecting or searching in the database, by indexing with at least the Second User and Document Type fields, a second procedure associated with the Document Type of the Second User and executing it, the second procedure being able to generate a document body and, having at its disposal formatting information associated with the Document Type of the Second User, being able to insert said extracted data into the generated document body building it with the formatting associated with the Document Type of the Second User.
  • the procedures may be stored in different databases or in one single database.
  • the databases may be accomplished for example by using archives of procedures on a Hard Disk, or integrated in a program which selects/finds an internal procedure each time from a list or set, for example by means of conditional jumps.
  • the system comprises means (such as a program) for generating an intermediate file containing said located and extracted data, with the advantages already listed for the method.
  • system comprises:
  • mapping procedures for enabling the users to transfer and/or memorise mapping procedures in the relative database, such as a computer or remote terminal connected to the network.
  • the interface may for example be software (a GUI program) or a dedicated terminal provided with a keyboard and video;
  • the interface may have means for mapping a document body of a user by associating with that document body a map file which is a predefined template and is contained in a database of templates;
  • mapping procedure templates associated with standard documents.
  • Fig. 1 shows a block diagram of a system according to the invention
  • Fig. 2 shows a file structure and the logic conversion path for the system in Fig. 1.
  • a digital message 10 is incoming from a remote sender user U1 (such as a computer), and a digital message 20 is outgoing towards another remote recipient user U2 (such as another computer).
  • the message 10 (see Fig. 2) is composed of one or more files (such as an ASCII file and a PDF file), which form a document body B1, and an optional header H1 comprising sender information U1, recipient information U2 and other additional properties such as Document Type or others defined by the user U1.
  • the header data H may be included directly in the document body B1: in this case no further data or files external to the body B1 or header H1 would be necessary.
  • the message 20 may have the same logical structure as the message 10.
  • the user U2 may specify if he wants a header H2 and the data fields it contains. If the data is all inside the document body received by the user U2, very often a header H2 is not needed.
  • the message 10 may be compressed, by means of software present on the remote computer, to make transport lighter, and is preferably also encrypted.
  • the document types in transit in the system 100 may be industrial or commercial documents such as delivery notes, invoices, orders and accessory files such as PDF, images, or diagrams which accompany the document but which are not needed for its correct interpretation.
  • the message 10 has a formatting, and possibly a format, different from that requested by the recipient user U2, and therefore the message 20 has been converted by the system 100 to the format used by the user U2.
  • the diversity of formatting in the messages 10, 20 is therefore what mainly counts.
  • connector modules generally indicated by reference numeral 30. These are executables or web pages or physical means which enable the user to insert or receive files. Among the various connectors, there may be
  • the connectors 30 are configured to monitor several addresses, selecting only the e-mails coming from a given address to avoid spam; and may therefore also host an SMTP server and create dedicated user mailboxes;
  • FTP services which may be scalable by installing them on separate Servers monitoring specific FTP addresses; reading and writing on remote FTP Servers and /or hosting a local FTP Server;
  • the main task of the incoming connector modules 30 is to receive a message 10 and save it inside an incoming database 40, where for example it may be managed by a user U1 via a web portal 32.
  • the user U1 may for example download a message sent to him, view the list of messages sent or not yet sent, receive or delete some of them.
  • the main task of the outgoing connector modules 30 is to pick up a message 20 converted by an outgoing database 60 and send it to the user U2, for example by e-mail, using web services, fax, SMS or FTP.
  • Each connector module 30 is preferably a service in itself to which the databases 40, 60 are visible; this way each service may be delocalised onto various dedicated machines. For example, using the web portal 32 the user U2 may automatically download a message 20 addressed to him, view the list of messages to be sent and not yet sent, delete some or send them or download them again (without having to newly generate them).
  • the heart of the system 100 is a logical architecture 110 which enables it to convert the formatting, and possibly the formats too, of the messages 10, 20 efficiently and with maximum scalability.
  • a Server is preferably used on which appropriate software runs.
  • modules and procedures will be generally understood to mean software or programs or routines, preferably specialised in one function.
  • Such software processes or programs run on one or more microprocessors and associated hardware devices, such as RAM or ROM memories (known and not shown), of the Server 110.
  • a control module 70 scans the database 40 checking for the presence of new messages 10 received. When it finds one it activates a selector module 80 associated with a sender or recipient, indicated in the header of the message 10, to perform any required formatting translation and, if necessary, format translation.
  • the task of the module 70 is to obtain from the module 30 the list of messages completely received, and create a selector module 80 to process each message.
  • the selector 80 receives as a command string the univocal identification of a message received and analyses it to see if the associated files need to be translated or not.
  • the module 70 checks that the modules 80 (a different one for each document format regardless of the user, e.g. EXCEL, PDF, TXT etc.) do not take too long to execute their task.
  • the selector module 80 has the task of identifying the formatting to be used
  • the module 80 does not need to know which recipient U2 receives the document, since it is assigned only to translate it into a target formatting which may, for example, be an intermediate format, see below, or that of the user U2.
  • the module 80 then activates a formatting (and format) conversion procedure to which it sends the information and document body to be converted.
  • the Recipient and Document Type fields are read from the body B1 or from the header H1 and then passed to the selector module 80 and the conversion procedure univocally associated with it.
  • the data relative to the user U2 preferences Document Type and Format and Means of transport
  • the sender U1 does not know how the recipient wishes to receive.
  • a message 20 to be sent will be picked up from the database 60 by a transmission /reception module 30 which periodically scans the messages in the database 60, and sent to the correct user U2.
  • For each file contained in the received document body B1, the conversion service or procedure extracts, for example from the header H1, the following:
  • the Document Type received T1 (e.g. an "invoice" or "delivery note" or "order" document type).
  • the format F1 may be an independent field or be a datum associated with the Document Type T1.
  • the mapping procedure found in the database 90 comprises a map file or database which contains the information/instructions/coordinates which locate and identify, inside a file of the document body B1, the data, its formatting and position, and is or may be used to activate or be used for a data extraction/injection function from/into a file
  • the map file relative for example to an EXCEL or TXT file contains for example the line-column co-ordinates.
  • the mapping procedure can extract the data from the received file (the document body B1) which must be involved in the formatting (and format) translation.
  • using the map file relative to the recipient U2, the data extracted from the received file can be inserted in the converted file.
  • the UDX file is either processed in real time or may be sent for example to a remote server closer to the user U2.
  • a second mapping procedure MP2 with associated map file is retrieved from a database, such as the same database 90, for the user U2 relative to the same Document Type T1 sent by the user U1.
  • MP2 = g(U2, T1), with g(.) being a general search function in a database of mapping procedures and associated map files, relative to the users.
  • the file format desired by the user U2 for the type T1 may for example be information contained inside the (univocal) mapping procedure for that type T1.
  • transferring the UDX file makes it possible to limit the size of the database 90: on each single machine only the served users need be stored.
  • the function f(.) will search (in the example) only among the sender users U1, while the function g(.) will search only among the recipient users U2.
  • the two users U1, U2 will insert their own two mapping procedures and relative map files of an invoice, respectively MP1 and MP2, in the system 100.
  • U1 uses the PDF format and U2 uses the EXCEL format.
  • the user using PDF is the sender.
  • the system 100 receives the PDF, extracts the relevant data thanks to the mapping procedure MP1 and builds an EXCEL file, inserting the extracted data in the positions indicated by the mapping procedure MP2.
  • For the user receiving the file in EXCEL it is as if the sender had written it natively, in the format and with the formatting of the recipient.
  • the map files associated with a mapping procedure may be either (i) an integral part of the procedure such as DDL files, or separate and univocally selected databases or databases incorporated in the procedure body but still constructed as collections of compact data, or (ii) codified within the program itself composing the procedure, for example codified in the instructions to be run.
  • the system according to the invention has the great advantage of being indifferent to the Document Type of the user.
  • a user may create as many types of documents as he wishes.
  • the definition "Invoice Document Type" for example identifies a specific class of documents, which is used to recognise the Document Types to be converted.
  • Each member of a Document Type class, such as the "Invoice Document" class, is characterised for example by a conversion/mapping procedure having one map file, one format field and one field to define which transport means to use to transfer the document, univocally associated with it.
  • the user U2 could define an Invoice Document Type 1, an Invoice Document Type 2, an Invoice Document Type 3 etc. all belonging to the "Invoice Document Type" class.
  • the user U2 may therefore receive in the three Document Types the same invoice sent by U1 with his Document Type.
  • the system or the user needs only to store in the database 90 three different mapping procedures (and the map files) for the three different Document Types of the recipient U2.
  • For each incoming document from a sender user to be sent to a recipient, the logical system 110:
  • using mapping procedures indexed by fields contained in the message in transit is worthwhile because it enables the system 110 to easily add or remove users, by simply updating the relative databases.
  • the procedures may, for example, be indexed within a single management program by means of conditional instructions such as IF-THEN or SWITCH.
  • the connector modules are generic (e-mail, FTP, web service etc) and deal only with receiving and sending the files, there is not one for each user.
  • the selector modules are also generic and deal only with recognising and reading/writing the document format (PDF, EXCEL, TXT etc.). Only the translator modules are characterised by the user mapping present in the database.
  • Plug-in applications can be created and installed in the system 110, dedicated to specific formatting translations and inserted in place of the automatic translation mechanisms. For example, in the case in which the documents to be translated are too complex or where there are calculations with particular algorithms or logics, an application may be created which is installed as a plug-in and which will translate the documents from one Type and Format to another. In this case the selector module recognises that it must execute a plug-in and, instead of starting the translation, starts the plug-in which performs the translation, following a particular calculation logic.
  • the UDX need not necessarily be created. It may be merely a decomposition in memory of the document body B1 needed for the time necessary to retrieve the mapping procedure or routine which recomposes the converted file.
  • the message 10 contains in the header H1 at least the information of the user U1, with whom a Document Type is associated.
  • These fields may also be transformed into other forms, for example a document identification TAG which must be somehow specified, for example in the subject of the message 10 if arriving by e-mail;
  • the sender user field U1 may also be identified for example by his user name and password in the case of transport via web services, by his e-mail address in the case of transport via e-mail, by the folder name in the case of transport via FTP, or with the fax number or ID of the sender fax.
  • the system 110 may have conditional access. For example each user is recognised by the system 110 (also) by means of the sender/recipient field. Only if he provides valid access credentials, such as valid login and password account details received when subscribing to the service, may he operate in the system, sending or receiving messages.
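The identification of the sender by transport and the conditional access described in the last two items can be sketched as follows. This is only an illustrative sketch: the names SENDER_FIELD, REGISTERED and resolve_sender, the envelope keys and the sample addresses are assumptions that do not appear in the text, and password validation is left out.

```python
from typing import Dict, Tuple

# Which envelope attribute identifies the sender for each transport,
# following the text: user name for web services, e-mail address for e-mail,
# folder name for FTP, fax number for fax. Key names are hypothetical.
SENDER_FIELD: Dict[str, str] = {
    "web_service": "username",
    "email": "from_address",
    "ftp": "folder",
    "fax": "fax_number",
}

# Hypothetical registry of subscribed users, keyed by (transport, identifier).
REGISTERED: Dict[Tuple[str, str], str] = {
    ("email", "billing@u1.example"): "U1",
    ("ftp", "u1_inbox"): "U1",
    ("fax", "+390000000000"): "U2",
}


def resolve_sender(transport: str, envelope: Dict[str, str]) -> str:
    """Return the system user for an incoming message, or refuse access."""
    identifier = envelope[SENDER_FIELD[transport]]
    try:
        return REGISTERED[(transport, identifier)]
    except KeyError:
        # Unknown senders may not operate in the system.
        raise PermissionError(
            f"unknown sender {identifier!r} on {transport}"
        ) from None


sender = resolve_sender("email", {"from_address": "billing@u1.example"})
```

A real deployment would also verify credentials for the web-service transport before accepting the message.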

Abstract

A method is described to be carried out in/by a data processing device (110) for converting a digital document (10) into another digital document (20) differently formatted and possibly using another format, comprising - accessing a file to be converted (10) comprising a document body (B1) to be converted, - reading fields (U1, U2, T1) identifying at least a First User and a Document Type; - by indexing at least with the First User and Document Type fields, identifying or selecting or searching a first procedure associated with the Document Type of the First User, - executing the first procedure, which, by having at its disposal formatting information to locate in the document body data such as string or numeric fields, extracts from the document body the located data.

Description

METHOD AND SYSTEM FOR CONVERTING DIGITAL DOCUMENTS
The invention relates to a method and system for converting digital documents.
Conversion systems of digital documents are known, such as US6397232. In these systems there is the problem that digital documents of different formats cannot be converted in those cases, for example, where the format of the file to be converted is not known a priori, or when the file extension, denoting the format, is incorrect, non-existent or invented by a user, that is to say non-standard.
The user cannot make use of personalised formats because the available format translation systems only work with standard formats. If a user wishes to convert a file written in his own format into a file of a different format, he is forced to write the converter himself. And sometimes this is not possible because the destination format is owned by another person, who does not wish to divulge confidential information.
The main object of the invention is to provide a new electronic method and system for converting documents into digital format.
Other objects are to resolve all or some of the aforesaid problems so that a user
- does not have to use an imposed communication standard or format to send or to receive;
- can use his own format or even invent one in transmission and reception without bothering about those used by the other users with whom he is exchanging documents;
- can use a transport/receipt method at will to deliver/receive documents without bothering about which transport/reception system is used by the recipient.
Such objects are achieved by a method defined in claim 1, that is to say carried out in/by a data processing device for converting a digital document into another digital document differently formatted and possibly using another format, comprising
- accessing a file to be converted comprising a document body to be converted,
- reading fields identifying at least a First User and a Document Type;
- identifying or selecting or searching a first procedure associated with the Document Type of the First User by indexing at least with the First User and Document Type fields,
- executing the first procedure, which, by having at its disposal formatting information to locate data such as string or numeric fields in the document body, extracts the located data from the document body.
A characteristic of the invention is to be able to extract the data from the file to be converted by means of a user-specific mapping procedure and map file. This enables him and the processing system to disregard the standard format or formatting, and/or any format and formatting used internally solely by the user.
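As an illustration only, the steps above can be sketched in a few lines of Python. The sketch assumes a flat text document body and a map file that stores, for each field, a line index and a column range; the names PROCEDURES and extract, the message keys and the sample invoice are all hypothetical and are not taken from the text.

```python
from typing import Dict, Tuple

# Hypothetical map file: field name -> (line index, start column, end column).
MapFile = Dict[str, Tuple[int, int, int]]

# Hypothetical procedure database, indexed by (First User, Document Type).
PROCEDURES: Dict[Tuple[str, str], MapFile] = {
    ("U1", "invoice"): {
        "number": (0, 9, 14),
        "total": (1, 7, 13),
    },
}


def extract(message: dict) -> Dict[str, str]:
    """Select the first procedure by indexing, then extract the located data."""
    key = (message["first_user"], message["document_type"])
    map_file = PROCEDURES[key]
    lines = message["body"].splitlines()
    return {
        name: lines[row][start:end].strip()
        for name, (row, start, end) in map_file.items()
    }


message = {
    "first_user": "U1",
    "document_type": "invoice",
    "body": "INVOICE: 00042\nTOTAL: 199.90\n",
}
data = extract(message)
```

Because the coordinates live in the user's own map file, the extraction does not depend on any standard layout: a different user could register a different map for the same Document Type.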
In this text the word "formatting" of files is taken to mean the standardisation of information in a specific format and its arrangement, as generally understood. The word "format" is taken to mean here the convention used to read, write and interpret or represent the contents of a file. The format of a specific file is, for example, generally indicated by its extension. Examples of formats are a flat file type ASCII or an EXCEL file, a representation of a class in XML format and so on.
In this text "indexing" has a broad meaning and includes all the logical and/or algorithm systems for finding the specific procedures to execute for a user's file using indexing fields. For example, a procedure may be searched for from those of a database individually memorised in a Hard Disk, ROM or permanent support such as a Compact Disk or DVD. Or the procedure may be identified and selected (in systems with Windows® OS) in a dedicated DLL file which contains a series thereof, in an .EXE or .COM file. The procedure is selected (and then executed) by indexing from a program for example by means of IF-THEN, CASE, or SWITCH instruction sequences.
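The two styles of indexing mentioned here, a chain of conditional instructions and a lookup in a stored collection of procedures, can be compared in a short sketch. It is a hedged illustration: the procedure names and the TABLE dictionary are assumptions, and a real system might load the procedures from a database or a library file instead.

```python
from typing import Callable, Dict, Tuple

Procedure = Callable[[str], dict]


def extract_u1_invoice(body: str) -> dict:
    # Placeholder procedure; a real one would apply the user's map file.
    return {"source": "U1 invoice", "length": len(body)}


def extract_u1_order(body: str) -> dict:
    return {"source": "U1 order", "length": len(body)}


def select_by_conditionals(user: str, doc_type: str) -> Procedure:
    """Indexing with IF-THEN style instruction sequences."""
    if user == "U1" and doc_type == "invoice":
        return extract_u1_invoice
    if user == "U1" and doc_type == "order":
        return extract_u1_order
    raise LookupError(f"no procedure for {(user, doc_type)}")


# Indexing with a stored table, standing in for a procedure database.
TABLE: Dict[Tuple[str, str], Procedure] = {
    ("U1", "invoice"): extract_u1_invoice,
    ("U1", "order"): extract_u1_order,
}


def select_by_table(user: str, doc_type: str) -> Procedure:
    try:
        return TABLE[(user, doc_type)]
    except KeyError:
        raise LookupError(f"no procedure for {(user, doc_type)}") from None


procedure = select_by_table("U1", "order")
result = procedure("ORDER 7")
```

Both selectors return the same procedure for the same indexing fields; the table form makes adding or removing a user a data change rather than a code change.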
The method can be executed in a data processing device such as a dedicated program in a microprocessor, or by a processing device such as a normal computer or server loaded with specific software.
The data extracted from the file to be converted can be used in all known processing methods. In particular it can be transmitted for example to a Second User. To do this, and also to facilitate the scalability of the system and reduce the dimensions of user archives, the method may envisage that
- a field that identifies a Second User is read from the file to be converted,
- the located data is extracted and grouped into a set,
- an intermediate file is generated with the data contained in the set;
- at least the Second User and Document Type fields are associated with the intermediate file, and
- the intermediate file is transferred to another user system.
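A minimal sketch of the intermediate file follows. The text does not prescribe its encoding, so JSON is used here purely as an assumption, and the function names and keys are hypothetical.

```python
import json
from typing import Dict


def build_intermediate(second_user: str, document_type: str,
                       extracted: Dict[str, str]) -> str:
    """Group the extracted data into a set and serialise it together with
    the Second User and Document Type fields (JSON is an assumed encoding)."""
    record = {
        "second_user": second_user,
        "document_type": document_type,
        "data": extracted,
    }
    return json.dumps(record, sort_keys=True)


def read_intermediate(text: str) -> dict:
    """What the receiving system would do before running the second procedure."""
    return json.loads(text)


intermediate = build_intermediate(
    "U2", "invoice", {"number": "00042", "total": "199.90"}
)
restored = read_intermediate(intermediate)
```

The serialised record carries only the data and the indexing fields, not the sender's layout, which is what allows it to be transferred to another system and reformatted there.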
The intermediate file contains the data which the sender user wishes to send, in a unique format, easy for the system to manage and regardless of the format or formatting used in the file to be converted and in the incoming file. Since the transmissions contain the data to be transferred and not how it is presented or ordered, it is clear that the intermediate file can be processed in very many ways while continuing to contain the nucleus of information to be transferred. In particular the intermediate file may be used as a matrix or data source to generate another file or document with specific formatting and/or format, standard or otherwise. This other document may be received by a recipient user in his preferred format and formatting, regardless of those of the original document. To such purpose the method comprises
- optionally identifying a Format of the document body by reading a field relative to the file to be converted or by deducing it from the Document Type field,
- identifying or selecting or searching a second procedure associated with the Document Type of a Second User document by indexing at least with the Second User and Document Type fields extracted from the file to be converted or from the intermediate file,
- executing the second procedure, which
has at its disposal the Format information of the Second User,
is capable of generating a document body in the relative format, and has formatting information associated with the Document Type of the Second User at its disposal, and
is able to insert the data extracted from the file to be converted or from the intermediate file into the generated document, building it with the formatting associated with the Document Type of the Second User.
The step of identifying a Format of the document body is optional because the system may be single-format but with many Document Types.
The second method is conceptually analogous to the first, that is to say used to map data between files and/or areas of memory. The destination file is the file to send to the Second User, while in one case the data is taken from the intermediate file, in another case it is taken directly from the set of data extracted from the file to be converted. The result is always that the Second User receives the data formatted as he prefers.
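The insertion side can be sketched in the same spirit. The example assumes the recipient's map file gives a (row, column) cell for each field and uses CSV text as a stand-in for a spreadsheet format; the function name build_body and the sample map are hypothetical.

```python
import csv
import io
from typing import Dict, Tuple

# Hypothetical recipient map file: field name -> (row, column) of the target cell.
CellMap = Dict[str, Tuple[int, int]]


def build_body(data: Dict[str, str], cell_map: CellMap) -> str:
    """Generate a document body and insert each datum at the mapped position."""
    n_rows = max(row for row, _ in cell_map.values()) + 1
    n_cols = max(col for _, col in cell_map.values()) + 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for name, (row, col) in cell_map.items():
        grid[row][col] = data[name]
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(grid)
    return buffer.getvalue()


u2_invoice_map: CellMap = {"number": (0, 1), "total": (2, 1)}
body = build_body({"number": "00042", "total": "199.90"}, u2_invoice_map)
```

The same function works whether the data dictionary comes from an intermediate file or directly from the extraction step, which mirrors the two cases described above.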
Other variations of the method are contained in the dependent claims, for example
- the first and/or second procedure are searched for in a procedure database by indexing. This makes it possible to organise users together with their respective desired formatting, to transfer entire user databases between system modules, and to manage efficiently additions, deletions or modifications by users (if the databases are accessible to them) or of users by system administrators;
- the file to be converted is received by a first remote user and the converted file is sent to a second remote user. The system may operate as a central node and serve remote users, e.g. via a known data network. Each user may therefore send messages to the others using his own standards without bothering about those of the others;
- the first and/or second procedure is transferred and/or stored in the relative database by the users with whom the procedures are associated. This permits maximum customisation of the system, and independence of the format or formatting of each system user;
- to make it easier for users to insert the formatting mapping procedure, an interface is arranged with which a user can build a mapping to map the formatting of his own document body, such mapping specifying the position and/or coordinates of, and/or a pointer to, one or more data contained in the document body, and/or sending to said database a map file containing the user-generated mapping so that it can be used by or with said first and/or second procedure. The interface also allows mapping of a user's document body by associating with that document body a map file which is a predefined template and is contained in a database of templates. The use of templates simplifies and speeds up the insertion operation of the mapping;
- the file received or converted is received or sent by transfer means selected, from among those available, by a field present in the file. Each user can thus select, for example directly in the message, which transmission means to use for routing, sending and delivering the message;
- use of the conversion service is granted after validating the user's identity. As well as ensuring the security of transmissions, this makes it possible to verify and monitor system access, possibly offering the service against payment or as a resource limited by access criteria;
- the file to be converted and/or the converted file and/or the intermediate file is saved in a database which can be consulted remotely by a user. This makes it possible to trace messages and recover their content.
For a typically multi-user system, a method according to the invention is carried out in/by a data processing device for converting a digital document into another digital document differently formatted and possibly using another format, and comprises
- accessing a file comprising a document body to be converted and fields identifying at least a First User, a Second User and a Document Type;
- identifying or selecting or searching in a database of procedures, by indexing with at least the First User and Document Type fields, a first procedure associated with the Document Type of the First User and executing it, the first procedure having at its disposal formatting information to locate in the document body data such as string or numeric fields, and being able to extract the located data from the document body;
- identifying or selecting or searching in a database of procedures, by indexing with at least the Second User and Document Type fields, a second procedure associated with the Document Type of the Second User and executing it, the second procedure being able to generate a document body and, having at its disposal formatting information associated with the Document Type of the Second User, being able to insert said extracted data into the generated document building it with the formatting associated with the Document Type of the Second User.
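Purely by way of non-limiting illustration, the three steps listed above might be sketched in Python as follows. The user names, the "key=value" and semicolon layouts, and the names FIRST_PROCEDURES, SECOND_PROCEDURES and convert are assumptions introduced for the example; they are not part of the described system.

```python
from typing import Callable, Dict, Tuple

Extractor = Callable[[str], Dict[str, str]]
Generator = Callable[[Dict[str, str]], str]


def extract_alice_invoice(body: str) -> Dict[str, str]:
    """First procedure (assumed layout): locate 'key=value' fields in the body."""
    return dict(line.split("=", 1) for line in body.splitlines() if "=" in line)


def generate_bob_invoice(data: Dict[str, str]) -> str:
    """Second procedure (assumed layout): rebuild the body as 'number;total'."""
    return f"{data['number']};{data['total']}"


# Hypothetical procedure databases indexed by (user, document type).
FIRST_PROCEDURES: Dict[Tuple[str, str], Extractor] = {
    ("alice", "invoice"): extract_alice_invoice,
}
SECOND_PROCEDURES: Dict[Tuple[str, str], Generator] = {
    ("bob", "invoice"): generate_bob_invoice,
}


def convert(message: dict) -> str:
    """Read the identifying fields, extract the data, then regenerate the body."""
    first_user = message["first_user"]
    second_user = message["second_user"]
    doc_type = message["document_type"]
    extracted = FIRST_PROCEDURES[(first_user, doc_type)](message["body"])
    return SECOND_PROCEDURES[(second_user, doc_type)](extracted)
```

In this sketch neither procedure knows anything about the other user's layout; the only shared element is the set of extracted field names.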
The method is advantageously realised by a data processing system adapted to convert a digital document into another digital document differently formatted and possibly using another format, comprising
- means (such as a motherboard which reads a Hard Disk) for accessing a file comprising a document body to be converted and fields identifying at least a First User, a Second User and a Document Type;
- a database of data-mapping procedures specific to each User and univocally associated with a Document Type of the User,
- means (such as a programmed microprocessor) for identifying or selecting or searching in a database, by indexing with at least the First User and Document Type fields, a procedure associated with the Document Type of the First User and executing it, the first procedure having at its disposal formatting information to locate in the document body data such as string or numeric fields, and being able to extract from the document body the located data;
- means for identifying or selecting or searching in the database, by indexing with at least the Second User and Document Type fields, a second procedure associated with the Document Type of the Second User and executing it, the second procedure being able to generate a document body and, having at its disposal formatting information associated with the Document Type of the Second User, being able to insert said extracted data into the generated document body building it with the formatting associated with the Document Type of the Second User.
The procedures may be stored in different databases or in one single database.
The databases may be implemented, for example, as archives of procedures on a Hard Disk, or be integrated in a program which selects/finds an internal procedure each time from a list or set, for example by means of conditional jumps.
Advantageously the system comprises means (such as a program) for generating an intermediate file containing said located and extracted data, with the advantages already listed for the method.
In preferred variations the system comprises:
- means (for enabling the users) to transfer and/or store mapping procedures in the relative database, such as a computer or remote terminal connected to the network. In particular there may be an interface with which a user may build a mapping to map the formatting of his own document body, the mapping specifying the position and/or coordinates of, and/or a pointer to, one or more data contained in the document body. The interface may for example be software (a GUI program) or a dedicated terminal provided with a keyboard and display;
- the interface may have means for mapping a document body of a user by associating with that document body a map file which is a predefined template and is contained in a database of templates;
- means for sending a map file containing the generated mapping to said database, by means of which the user can load the procedure onto the system;
- a database of predefined mapping procedure templates associated with standard documents.
Other advantages of the invention will be clearer from the following description of a preferred embodiment, together with the appended drawing wherein
Fig. 1 shows a block diagram of a system according to the invention;
Fig. 2 shows a file structure and the logical conversion path for the system in Fig. 1.
See Fig. 1 for the block representation and logical form of a system 100 according to the invention. A digital message 10 is incoming from a remote sender user U1 (such as a computer), and a digital message 20 is outgoing towards another remote recipient user U2 (such as another computer). The message 10 (see Fig. 2) is composed of one or more files (such as an ASCII file and a PDF file), which form a document body B1, and an optional header H1 comprising sender information U1, recipient information U2 and other additional properties such as Document Type or others defined by the user U1. The header data H1 may be included directly in the document body B1: in this case no further data or files external to the body B1 or header H1 would be necessary. The message 20 may have the same logical structure as the message 10. The user U2 may specify whether he wants a header H2 and which data fields it contains. If the data is all inside the document body received by the user U2, very often a header H2 is not needed.
Optionally, the message 10 may be compressed by software present on the remote computer to make transport lighter, and preferably also encrypted.
The document types in transit in the system 100 may be industrial or commercial documents such as delivery notes, invoices, orders and accessory files such as PDF, images, or diagrams which accompany the document but which are not needed for its correct interpretation.
It is assumed here that the message 10 has a formatting, and possibly a format, different from that requested by the recipient user U2, and that the message 20 has therefore been converted by the system 100 to the format used by the user U2. The difference in formatting between the messages 10, 20 is therefore what mainly counts.
The messages reach and leave the system 100 by means of connector modules, generally indicated by reference numeral 30. These are executables or web pages or physical means which enable the user to insert or receive files. Among the various connectors, there may be
- e-mail services, scalable by installing them on separate Servers monitoring specific e-mail addresses. The connectors 30 are configured to monitor several addresses, selecting only the e-mails coming from a given address to avoid spam; and may therefore also host an SMTP server and create dedicated user mailboxes;
- FTP services, which may be scalable by installing them on separate Servers monitoring specific FTP addresses; reading and writing on remote FTP Servers and /or hosting a local FTP Server;
- Fax;
- SMS, service which may be scalable by installing it on separate Servers monitoring specific telephone numbers.
The main task of the incoming connector modules 30 is to receive a message 10 and save it inside an incoming database 40, where for example it may be managed by a user U1 via a web portal 32. The user U1 may for example download a message sent to him, view the list of messages sent or not yet sent, receive or delete some of them.
In the same way the main task of the outgoing connector modules 30 is to pick up a converted message 20 from an outgoing database 60 and send it to the user U2, for example by e-mail, using web services, fax, SMS or FTP.
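As a hedged illustration of how an outgoing connector might be chosen from a field carried with the message, consider the following Python sketch. The field names "transport", "address" and "body", the CONNECTORS table and the sent_log list (which stands in for real network transmission) are all assumptions made for the example.

```python
from typing import Callable, Dict, List

sent_log: List[str] = []  # stands in for real e-mail/FTP/fax/SMS transmission


def _make_connector(name: str) -> Callable[[str, str], None]:
    """Build a dummy connector that only records what it would have sent."""
    def send(address: str, payload: str) -> None:
        sent_log.append(f"{name}->{address}:{len(payload)} chars")
    return send


# Hypothetical set of generic outgoing connectors, one per transport means.
CONNECTORS: Dict[str, Callable[[str, str], None]] = {
    name: _make_connector(name) for name in ("email", "ftp", "fax", "sms")
}


def deliver(message: dict) -> None:
    """Route a converted message through the connector named in its field."""
    transport = message["transport"]
    if transport not in CONNECTORS:
        raise ValueError(f"no connector for transport {transport!r}")
    CONNECTORS[transport](message["address"], message["body"])
```

The connectors in this sketch are generic: none of them is tied to a particular user, only to a transport means.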
Each connector module 30 is preferably a service in itself to which the databases 40, 60 are visible; in this way each service may be relocated onto various dedicated machines. For example, using the web portal 32 the user U2 may automatically download a message 20 addressed to him, view the list of messages to be sent and not yet sent, delete some of them, send them or download them again (without having to generate them anew).
The heart of the system 100 is a logical architecture 110 which enables it to convert the formatting, and where applicable the formats too, of the messages 10, 20 efficiently and with maximum scalability. It is preferably implemented as a Server on which appropriate software runs.
Hereinafter modules and procedures will be generally understood to mean software or programs or routines, preferably specialised in one function. Such software processes or programs run on one or more microprocessors and associated hardware devices, such as RAM or ROM memories (known and not shown), of the Server 110.
A control module 70 scans the database 40, checking for the presence of newly received messages 10. When it finds one it activates a selector module 80 associated with a sender or recipient, indicated in the header of the message 10, to perform any formatting translation and, if necessary, format translation.
In particular the task of the module 70 is to obtain from the module 30 the list of messages completely received, and to create a selector module 80 to process each message. The selector 80 receives as a command string the unique identifier of a received message and analyses it to see whether the associated files need to be translated or not.
In addition the module 70 checks that the modules 80 (a different one for each document format regardless of the user, e.g. EXCEL, PDF, TXT etc.) do not take too long to execute their task.
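The behaviour of the control module 70 described above might, as one possible sketch, look like the following Python function. The message fields "id", "format" and "complete", the function name control_scan and the use of a thread pool are assumptions made for illustration; a thread that exceeds the time limit is only reported here, not forcibly stopped, which a real system would have to handle separately.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable, Dict, List


def control_scan(incoming: List[dict],
                 selectors: Dict[str, Callable[[dict], str]],
                 timeout_s: float = 5.0) -> Dict[str, str]:
    """Dispatch each completely received message to the selector for its format.

    Returns a map of message id -> 'done', 'timeout' or 'no selector'.
    """
    status: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=4) as pool:
        pending = {}
        for msg in incoming:
            if not msg.get("complete"):
                continue  # only completely received messages are processed
            selector = selectors.get(msg["format"])
            if selector is None:
                status[msg["id"]] = "no selector"
                continue
            pending[msg["id"]] = pool.submit(selector, msg)
        for msg_id, future in pending.items():
            try:
                future.result(timeout=timeout_s)
                status[msg_id] = "done"
            except FutureTimeout:
                status[msg_id] = "timeout"  # the selector took too long
    return status
```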
The selector module 80 has the task of identifying the formatting to be used (and the digital format). The module 80 does not need to know which recipient U2 receives the document, since it is assigned only to translate it into a target formatting which may, for example, be an intermediate format (see below) or that of the user U2. The module 80 then activates a formatting (and format) conversion procedure, to which it sends the information and the document body to be converted. The Recipient and Document Type fields are read from the body B1 or from the header H1 and then passed to the selector module 80 and to the conversion procedure univocally associated with it. The data relating to the preferences of the user U2 (Document Type, Format and Means of transport) are looked up in a database. The sender U1 does not know how the recipient wishes to receive the document; he might know implicitly when sending by fax or SMS, because he must insert the telephone number. The converted message is then saved in the database 60. A message 20 to be sent is picked up from the database 60 by a transmission/reception module 30, which periodically scans the messages in the database 60, and is sent to the correct user U2.
Having described the general architecture of the system 100 and of the server 110, we now describe the specific logical-hardware structure which enables not just the translation of the formatting (and format) but also allows each user U1, U2 to send and receive documents in his own format, with his own formatting and means of communication, without concerning himself with the format, formatting and means of communication used by the other party. See also Fig. 2.
For each file contained in the received document body B1, the conversion service or procedure extracts, for example from the header H1, the following:
- a recipient field U2;
- a sender field U1;
- the format F1 of the file used in the document B1 of the sender U1 (e.g. PDF or TXT), and
- the Document Type received T1 (e.g. an "invoice" or "delivery note" or "order" document type).
The format F1 may be an independent field or be a datum associated with the Document Type T1.
It then searches for a mapping procedure in a database or table 90, indexing the search with
- the sender field U1, and
- the Document Type received T1,
- and optionally also the format F1.
In fact a univocal association with a Document Type T1 of the user U1 may be sufficient, in which case the Document Type also defines the format.
The mapping procedure found in the database 90 comprises a map file or database which contains the information/instructions/coordinates which locate and identify, inside a file of the document body B1, the data, its formatting and its position, and which is or may be used to drive a data extraction/injection function from/into a file. The map file relating, for example, to an EXCEL or TXT file contains, for example, the line-column coordinates.
By means of the map file relating to the sender U1, the mapping procedure can extract from the received file (the document body B1) the data which must be involved in the formatting (and format) translation. By means of the map file relating to the recipient U2, the data extracted from the received file can be inserted in the converted file.
Denoting by MP1 the mapping function or procedure together with its associated map file, we can write MP1 = f(U1, T1) or MP1 = f(U1, F1, T1), with f(.) being a general search function in a database of mapping procedures and associated map files, relative to the users.
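A minimal sketch of such a search function, assuming the table 90 is held as an in-memory dictionary, is given below in Python. The key layout (user, format or None, document type), the table contents and the fallback order are assumptions chosen for illustration only.

```python
from typing import Dict, Optional, Tuple

# Hypothetical table 90: keys are (user, format or None, document type).
ProcedureKey = Tuple[str, Optional[str], str]
PROCEDURE_DB: Dict[ProcedureKey, str] = {
    ("U1", None, "invoice"): "MP1_invoice",
    ("U1", "PDF", "order"): "MP1_order_pdf",
}


def f(user: str, doc_type: str, fmt: Optional[str] = None) -> str:
    """Search by (user, format, type) first, then by (user, type) alone."""
    if fmt is not None and (user, fmt, doc_type) in PROCEDURE_DB:
        return PROCEDURE_DB[(user, fmt, doc_type)]
    if (user, None, doc_type) in PROCEDURE_DB:
        return PROCEDURE_DB[(user, None, doc_type)]
    raise KeyError(f"no mapping procedure for {user!r}, {doc_type!r}, {fmt!r}")
```

The entry stored without a format corresponds to the case in which the Document Type alone defines the format.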
By then executing the mapping procedure MP1, an intermediate UDX file or metafile is generated containing all the data, contained in the document body B1, which the user U1 wishes to transmit to the user U2. That is, UDX = MP1(B1): the mapping procedure MP1 is applied to the document body B1 (by launching it or passing to it, for example, a handle to B1).
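For a TXT document body, the extraction step could for instance be sketched as below in Python. The map file layout (field name mapped to a zero-based line, start column and exclusive end column), the field names and the function name mp1 are assumptions introduced for the example; the description itself only states that the map file contains line-column coordinates.

```python
from typing import Dict, Tuple

# Hypothetical map file for a TXT invoice:
# field -> (line, start column, end column), zero-based, end column exclusive.
MAP_FILE_MP1: Dict[str, Tuple[int, int, int]] = {
    "invoice_number": (0, 9, 13),
    "total": (1, 7, 13),
}


def mp1(body: str, map_file: Dict[str, Tuple[int, int, int]]) -> Dict[str, str]:
    """Apply the mapping to the document body and return the UDX data set."""
    lines = body.splitlines()
    udx: Dict[str, str] = {}
    for field, (row, start, end) in map_file.items():
        # Slice the located span and drop the padding around the value.
        udx[field] = lines[row][start:end].strip()
    return udx
```

Here the UDX is kept as an in-memory dictionary; it could equally be serialised to a file before being passed on.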
The UDX file is either processed in real time or may be sent, for example, to a remote server closer to the user U2. In any case a second mapping procedure MP2 with its associated map file is retrieved from a database, such as the same database 90, for the user U2 and for the same Document Type T1 sent by the user U1. In expression terms, one has MP2 = g(U2, T1), with g(.) being a general search function in a database of mapping procedures and associated map files, relative to the users.
The file format desired by the user U2 for the type T1 may for example be information contained inside the (univocal) mapping procedure for that type T1.
Note that transferring the UDX file makes it possible to limit the size of the database 90: on each single machine only the users served need be stored. In other words, the function f(.) will search (in the example) only among the sender users U1, while the function g(.) will search only among the recipient users U2.
The procedure MP2, with its map file, is used to generate a converted document body B2 for the user U2, simply by executing B2 = MP2(UDX). That is to say, the mapping procedure sequentially reads the data in the UDX file and formats it in a file according to the instructions/information read in the map file of MP2.
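The generation step might be sketched as follows in Python, using CSV text as a simple stand-in for a spreadsheet format. The map file layout (field name mapped to a row and column of the output grid), the fixed LABELS of the recipient's layout and the function name mp2 are assumptions for illustration.

```python
import csv
import io
from typing import Dict, Tuple

# Hypothetical map file for the recipient: field -> (row, column) in the output grid.
MAP_FILE_MP2: Dict[str, Tuple[int, int]] = {
    "invoice_number": (0, 1),
    "total": (1, 1),
}
# Fixed text belonging to the recipient's own layout (assumed).
LABELS: Dict[Tuple[int, int], str] = {(0, 0): "Number", (1, 0): "Amount"}


def mp2(udx: Dict[str, str], map_file: Dict[str, Tuple[int, int]]) -> str:
    """Build the converted body B2 by placing each UDX value at its mapped cell."""
    cells = dict(LABELS)
    for field, position in map_file.items():
        cells[position] = udx[field]
    n_rows = max(r for r, _ in cells) + 1
    n_cols = max(c for _, c in cells) + 1
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for r in range(n_rows):
        writer.writerow([cells.get((r, c), "") for c in range(n_cols)])
    return out.getvalue()
```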
For example, the two users U1, U2 each insert in the system 100 their own mapping procedure, with its map file, for an invoice: MP1 and MP2 respectively. U1 uses the PDF format and U2 uses the EXCEL format. Let us imagine that the user using PDF is the sender. The system 100 receives the PDF, extracts the relevant data by means of the mapping procedure MP1 and builds an EXCEL file, inserting the extracted data in the positions indicated by the mapping procedure MP2. For the user receiving the file in EXCEL it is as if the sender had written it natively in the format and with the formatting of the recipient.
The map files associated with a mapping procedure may be either (i) an integral part of the procedure such as DDL files, or separate and univocally selected databases or databases incorporated in the procedure body but still constructed as collections of compact data, or (ii) codified within the program itself composing the procedure, for example codified in the instructions to be run.
The system according to the invention has the great advantage of being indifferent to the Document Type of the user. A user may create as many types of documents as he wishes. For each user the definition "Invoice Document Type", for example, identifies a specific class of documents, which is used to recognise the Document Types to be converted. There is not necessarily only one Document Type for each user, even though usually one is enough.
Each member of a Document Type class, such as the "Invoice Document" class, is characterised for example by a conversion/mapping procedure having one map file, one format field and one field to define what transport means to use to transfer the document univocally associated with it.
For example, the user U2 could define an Invoice Document Type 1, an Invoice Document Type 2, an Invoice Document Type 3 etc., all belonging to the "Invoice Document Type" class. The user U2 may therefore receive in the three Document Types the same invoice sent by U1 with his Document Type. The system, or user, needs only to store in the database 90 three different mapping procedures (and the map files) for the three different Document Types of the recipient U2.
In brief, for each incoming document from a sender user to be sent to a recipient, the logical system 110:
- determines, from among all the Document Type classes defined for that sender user, the Document Type class to which the document belongs;
- selects, for the identified class, a specific procedure or function for the received Document Type which permits, thanks to the mapping, the extraction from the received document of the data to be converted, as defined by the user;
- selects a corresponding second procedure or function of the same class but this time relative to the recipient user; and
- executes a formatting conversion (and, if necessary, a format conversion) by launching the second procedure, in which the recipient user's formatting rules for that Document Type are described. Such rules are used to build a file for the recipient from the data extracted from the incoming document.
If the fields U2, U1, F1 and T1 are not in a header H1, but inside the main document body B1, then only their position inside the document body B1 need be defined by means of the mapping. The system is thus informed that those are the fields which also identify the transmission. In practice the header H1 is incorporated inside the body B1, and the same thing may be done for the file to be transmitted to the user U2.
It should be noted that using mapping procedures indexed by fields contained in the message in transit is worthwhile because it enables the system 110 to easily add or remove users, by simply updating the relative databases. However, the procedures may, for example, be indexed within a single management program by means of conditional instructions such as IF-THEN or SWITCH.
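The two alternatives just mentioned can be contrasted in a short Python sketch. The registry dictionary, the helper names and the trivial upper-case and lower-case "procedures" are assumptions used only to make the difference visible.

```python
from typing import Callable, Dict, Tuple

Procedure = Callable[[str], str]
registry: Dict[Tuple[str, str], Procedure] = {}


def register(user: str, doc_type: str, procedure: Procedure) -> None:
    """Add or replace a user's procedure without touching the dispatch code."""
    registry[(user, doc_type)] = procedure


def unregister(user: str) -> None:
    """Remove every procedure belonging to a user."""
    for key in [k for k in registry if k[0] == user]:
        del registry[key]


def dispatch_indexed(user: str, doc_type: str, body: str) -> str:
    """Indexed variant: the procedure is found by a database-style lookup."""
    return registry[(user, doc_type)](body)


def dispatch_conditional(user: str, doc_type: str, body: str) -> str:
    """Conditional variant: procedures are selected by IF-THEN instructions."""
    if user == "U1" and doc_type == "invoice":
        return body.upper()
    if user == "U2" and doc_type == "invoice":
        return body.lower()
    raise KeyError((user, doc_type))
```

With the indexed variant, adding or removing a user is a data update; with the conditional variant it requires a change to the management program itself.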
As can be seen, the connector modules are generic (e-mail, FTP, web service etc.) and deal only with receiving and sending the files; there is not one for each user. The selector modules are also generic and deal only with recognising and reading/writing the document format (PDF, EXCEL, TXT etc.). Only the translator modules are characterised by the user mapping present in the database.
There are other ways of performing the functions of the system 100, for example without using connector modules and/or using other software structures or algorithms.
Plug-in applications can be created and installed in the system 110, dedicated to specific formatting translations and inserted in place of the automatic translation mechanisms. For example, where the documents to be translated are too complex or where there are calculations with particular algorithms or logics, an application may be created which is installed as a plug-in and which will translate the documents from one Type and Format to another. In this case the selector module recognises that it must execute a plug-in and, instead of starting the translation, starts the plug-in which performs the translation following a particular calculation logic.
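One way the selector's choice between a plug-in and the automatic translation might be expressed is sketched below in Python. The PLUGINS table, its (source type, target type) key and the function names are assumptions made for the example.

```python
from typing import Callable, Dict, Tuple

Translator = Callable[[str], str]

# Hypothetical table of installed plug-ins, keyed by (source type, target type).
PLUGINS: Dict[Tuple[str, str], Translator] = {}


def install_plugin(source_type: str, target_type: str, plugin: Translator) -> None:
    """Register a plug-in for one (source type, target type) pair."""
    PLUGINS[(source_type, target_type)] = plugin


def translate(source_type: str, target_type: str, body: str,
              automatic: Translator) -> str:
    """Run the plug-in if one is installed, otherwise the automatic translation."""
    translator = PLUGINS.get((source_type, target_type), automatic)
    return translator(body)
```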
The UDX file need not necessarily be created. It may be merely a decomposition in memory of the document body B1, held for the time needed to retrieve the mapping procedure or routine which recomposes the converted file.
It was said that the message 10 contains in the header H1 at least the information of the user U1, with whom a Document Type is associated. These fields may also take other forms, for example a document identification TAG which must be specified somewhere, for example
- in the subject of the message 10 if arriving by e-mail;
- in the body B1;
- with the name of the file/files attached or contained in the body B1, which gives rise to an association between the file name and the Document Type, to which a certain format corresponds;
- in one or more fields of those contained in the file to be sent.
The sender user field U1 may also be identified for example by his user name and password in the case of transport via web services, by his e-mail address in the case of transport via e-mail, by the folder name in the case of transport via FTP, or with the fax number or ID of the sender fax.
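By way of a hedged example, deriving the sender field from transport-specific metadata might be sketched as follows in Python. The metadata key names, the transport labels and the normalisation to lower case are assumptions; password verification for web services is deliberately left out of the sketch.

```python
from typing import Dict

# Hypothetical metadata key that carries the sender identity for each transport.
IDENTITY_KEY: Dict[str, str] = {
    "web": "username",
    "email": "from_address",
    "ftp": "folder",
    "fax": "fax_number",
}


def sender_field(transport: str, metadata: Dict[str, str]) -> str:
    """Return a normalised sender identifier for the given transport."""
    try:
        key = IDENTITY_KEY[transport]
    except KeyError:
        raise ValueError(f"unknown transport: {transport!r}") from None
    return f"{transport}:{metadata[key].strip().lower()}"
```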
The system 110 may have conditional access. For example each user is recognised by the system 110 (also) by means of the sender/recipient field. Only if he provides valid access credentials, such as a valid login and password received when subscribing to the service, may he operate in the system, sending or receiving messages.

Claims

1. Method carried out in/by a data processing device (110) for converting a digital document (10) into another digital document (20) differently formatted and possibly using another format, comprising
- accessing a file to be converted (10) comprising a document body (B1) to be converted,
- reading fields (U1, U2, T1) identifying at least a First User and a Document Type;
- by indexing at least with the First User and Document Type fields, identifying or selecting or searching a first procedure associated with the Document Type of the First User,
- executing the first procedure, which, by having at its disposal formatting information to locate in the document body data such as string or numeric fields, extracts from the document body the located data.
2. Method according to claim 1, wherein
- from the file to be converted a field (U2) that identifies a Second User is read,
- the localized data are extracted and grouped into a set,
- with the data contained in the set an intermediate file is generated;
- at least the Second User and Document Type field is associated with the intermediate file, and
- the intermediate file is transferred to another user system.
3. Method according to claim 1 or 2, comprising
- optionally identifying a format (F1) of the document body by reading a field relative to the file to be converted or by deducing it from the Document Type field,
- identifying or selecting or searching a second procedure associated with the Document Type of a Second User document by indexing at least with the Second User and Document Type fields extracted from the file to be converted or the intermediate file,
- executing the second procedure, which
has at its disposal the Format information of the Second User,
is capable of generating a document in the relative format, and has formatting information associated with the Document Type of the Second User, and
is able to insert the data extracted from the file to be converted or from the intermediate file into the generated document building it with the formatting associated with the Document Type of the Second User.
4. Method according to one of the preceding claims, wherein the first and/or second procedure are searched in a database of procedures (90) by indexing.
5. Method according to claim 4, wherein the first and/or second procedure is transferred and/or stored in its relative database by the same users the procedures are associated to.
6. Method according to one of the claims above, wherein an interface is arranged with which a user can
- build a mapping to map the formatting of his own document body, the mapping specifying the position and/or coordinates of, and/or a pointer to, one or more data contained in the document body, and/or
- send in said database a map file containing the user-generated mapping so that it can be used by or with said first and/or second procedure.
7. Method according to claim 6, wherein the interface allows mapping a document body of a user by associating with that document body a map file which is a predefined template and is contained in a database of templates.
8. Method performed in a data processing device (110) for converting a digital document (10) into another digital document (20) differently formatted and possibly using another format, comprising
- accessing a file to be converted (10) comprising a document body (B1) to be converted and fields (U1, U2, T1) identifying at least a First User, a Second User and a Document Type;
- identifying or selecting or searching in a database (90) of procedures, by indexing with at least the First User and Document Type fields, a first procedure associated with the Document Type of the First User and executing it, the first procedure having at its disposal formatting information to locate in the document body data such as string or numeric fields, and being able to extract from the document body the located data;
- identifying or selecting or searching in a database of procedures (90), by indexing with at least the Second User and Document Type fields, a second procedure associated with the Document Type of the Second User and executing it, the second procedure being able to generate a document body and, having at its disposal formatting information associated with the Document Type of the Second User, being able to insert said extracted data into the generated document building it with the formatting associated to the Document Type of the Second User.
9. Data processing system (110) adapted to convert a digital document (10) into another digital document (20) differently formatted and possibly using another format, comprising
- means to access a file (10) comprising a document body (B1) to be converted and fields (U1, U2, T1) identifying at least a First User, a Second User and a Document Type;
- a database (90) of data-mapping procedures specific for each User and univocally associated with a Document Type of the User,
- means for identifying or selecting or searching in the database (90), by indexing with at least the First User and Document Type fields, a first procedure associated with the Document Type of the First User and for executing it, the first procedure having at its disposal formatting information to locate in the document body data such as string or numeric fields, and being able to extract from the document body the located data;
- means for searching or identifying or selecting in a database of procedures (90), by indexing with at least the Second User and Document Type fields, and for executing a second procedure associated with the Document Type of the Second User, the second procedure being able to generate a document body and, by having at its disposal formatting information associated with the Document Type of the Second User, being able to insert said extracted data into the generated document building it with the formatting associated to the Document Type of the Second User.
10. System according to claim 9, comprising means for generating an intermediate file containing the located and extracted data.
PCT/IB2011/053498 2010-09-03 2011-08-05 Method and system for converting digital documents WO2012028978A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IT000122A ITTV20100122A1 (en) 2010-09-03 2010-09-03 METHOD AND SYSTEM FOR CONVERTING DIGITAL DOCUMENTS
ITTV2010A000122 2010-09-03

Publications (1)

Publication Number Publication Date
WO2012028978A1 true WO2012028978A1 (en) 2012-03-08

Family

ID=43666913

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2011/053498 WO2012028978A1 (en) 2010-09-03 2011-08-05 Method and system for converting digital documents

Country Status (2)

Country Link
IT (1) ITTV20100122A1 (en)
WO (1) WO2012028978A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001040967A2 (en) * 1999-11-29 2001-06-07 Medical Data Services Gmbh Filtering medical information based on document type and requesting user access rights
FR2811782A1 (en) * 2000-07-12 2002-01-18 Jaxo Europ Conversion of documents organized in a tree structure by selective traversal of the structure, uses program invoking templates to convert HTML pages to alternate formats without prior translation to XML
US6397232B1 (en) 2001-02-02 2002-05-28 Acer Inc. Method and system for translating the format of the content of document file
US20030233344A1 (en) * 2002-06-13 2003-12-18 Kuno Harumi A. Apparatus and method for responding to search requests for stored documents
EP1998261A1 (en) * 2007-05-31 2008-12-03 Research In Motion Limited Method and apparatus for processing XML for display on a mobile device

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104321738A (en) * 2012-03-19 2015-01-28 因特伟特公司 Document processing
US20150039707A1 (en) * 2012-03-19 2015-02-05 Intuit Inc. Document processing
US10528626B2 (en) * 2012-03-19 2020-01-07 Intuit Inc. Document processing

Also Published As

Publication number Publication date
ITTV20100122A1 (en) 2012-03-04


Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 11763989; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 11763989; Country of ref document: EP; Kind code of ref document: A1)