WO2007040380A1 - Method of generating a printed signature in order to secure the contents of text documents

Publication number
WO2007040380A1
Authority
WO
WIPO (PCT)
Prior art keywords
signature
document
word
stage
printed
Prior art date
Application number
PCT/MX2005/000089
Other languages
Spanish (es)
French (fr)
Inventor
Sergio Antonio Fernandez Orozco
Leo Hendrik Reyes Lozano
Original Assignee
Fernandez Orozco Sergio Antoni
Priority date
Filing date
Publication date
Application filed by Fernandez Orozco Sergio Antoni
Priority to PCT/MX2005/000089
Publication of WO2007040380A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N1/00 Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N1/32 Circuits or arrangements for control or supervision between transmitter and receiver or between image input and image output device, e.g. between a still-image camera and its memory or between a still-image camera and a printer device
    • H04N1/32101 Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/22 Character recognition characterised by the type of writing
    • G06V30/226 Character recognition characterised by the type of writing of cursive writing
    • G06V30/2264 Character recognition characterised by the type of writing of cursive writing using word shape
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N2201/00 Indexing scheme relating to scanning, transmission or reproduction of documents or the like, and to details thereof
    • H04N2201/32 Circuits or arrangements for control or supervision between transmitter and receiver or between image input and image output device, e.g. between a still-image camera and its memory or between a still-image camera and a printer device
    • H04N2201/3201 Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title
    • H04N2201/3225 Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title of data relating to an image, a page or a document
    • H04N2201/3233 Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title of data relating to an image, a page or a document of authentication information, e.g. digital signature, watermark
    • H04N2201/3236 Details of authentication information generation

Definitions

  • This invention relates to the area of security and integrity of the content of printed text documents.
  • The objective of this invention is to protect the content of documents printed on common paper against malicious changes and counterfeiting, while tolerating slight alterations of the document that do not change the message written on it. It describes a system and method that generate a signature that can be attached in printed form to a text document to ensure that its content has not been maliciously modified. In addition, a system and method are described for verifying whether a document signed by this procedure has been modified.
  • US 6,764,000 describes a system that uses a scanner to identify points of interest or indicia (such as watermarks, holograms, serial numbers, patterns, colors, etc.) present in the document to be secured. These indicia are compared against a previously built database. If enough indicia are similar to those stored, the document is classified as authentic. Unlike that method, the present invention does not require a database to authenticate the document. In addition, the invention generates a signature from the content or message written in the original document. US 6,934,845 describes an invention that uses an encoding based on the blank spaces of a document to hide a signature that allows the document to be authenticated.
  • The invention of US 6,934,845 requires a document in electronic format to generate the signature, and the signature is not generated from the content of the document.
  • By contrast, the present invention does not require the document in electronic format and uses the content of the document to generate the authentication information.
  • Nor does the present invention use an encoding based on blank spaces; its signature is clearly visible.
  • US 6,427,921 describes a method and system that superimposes various types of patterns on the original image to be secured in order to produce a watermark. Unlike this method and system, the present invention uses the content of the document to generate the signature that will authenticate it.
  • Patent HK1028662 describes a method for securing printed documents that requires a laser to illuminate the questioned document and then verifies that the reflected pattern meets certain characteristics.
  • The present invention does not require laser technology.
  • US Patent 6,785,405 describes a system that uses digital images of printed documents to verify their authenticity. The system segments the images and then compares these segments with images contained in a database, obtaining a correlation number for each type of document and segment. This phase serves to categorize the document. With this information, the system reads the authentication information that the document itself must carry and uses it to verify that the document is original. Unlike this system, the present invention does not require segmentation or correlation. In addition, it does not require a database with known images of documents.
  • The SafePaper technology from Alp Vision uses an invisible watermark to secure a document. Unlike the HP technology, this watermark can be read with a scanner, but it disappears when the document is copied or reproduced by other means. Moreover, this system can only sign electronic documents. The present invention allows documents to be copied as long as the content remains intact.
  • There are several methods for attaching a digital signature to a printed document. S. Bhattacharjee and M. Kutter describe one of these algorithms in "Compression Tolerant Image Authentication", IEEE Int. Conference on Image Processing, USA, pp. 435-439, 1998.
  • Zhang describes a method (called Iterative Closest Point, or ICP) for aligning two-dimensional shapes in the article "Iterative Point Matching for Registration of Free-Form Curves and Surfaces", International Journal of Computer Vision, Vol. 13, No. 2, pp. 119-152, 1994.
  • Veltkamp and Hagedoorn describe similar algorithms (such as Chamfer Matching) in "State-of-the-art in shape matching", technical report UU-CS-1999-27, Utrecht University, Netherlands, 1999.
  • Shape alignment algorithms cannot be used by themselves to generate secure documents. In addition to these techniques, our invention uses other algorithms to achieve this goal.
  • The method presented in this document recognizes the patterns and repetitions of the words of a text document in order to generate a signature that can be attached to the paper on which the document is printed. At no point is an optical character recognizer (OCR) used, since OCR systems tend to suffer from failures (due to noise in the digital image) or limitations (due to the need for word dictionaries).
  • Our method has the advantage of not requiring a digital document.
  • Methods based on correlation and wavelets use textures, which are very sensitive to non-malicious changes such as the rotation, translation and scaling that may occur when a document is scanned.
  • The method presented here is robust to this kind of change. In addition, it does not require special materials such as holograms, special inks or laser systems to sign a document.
  • The invention allows the document to be copied and transmitted by any electronic, electrical or mechanical method as long as the written message has not been modified.
  • Figure 1 shows a diagram of the method for printing secure documents.
  • Figure 2 shows a diagram of the method to verify the integrity of a signed document.
  • Figure 3 shows a diagram of the internal parts of the signature generation stage.
  • The method to ensure the integrity of the content of printed text documents consists of:
  • A method for printing secure documents which, in turn, is made up of an image acquisition stage 3 that converts a printed text document 1 into a digital image (hereinafter simply referred to as "image").
  • This stage can be implemented with any digitizing device such as a scanner, a digital copier, a multifunction printer, a digital camera, etc.
  • The image obtained in the acquisition stage passes to a signature generation stage 5, which generates the element that ensures the integrity of the document (hereinafter simply called “signature”).
  • The sub-stages that make up this stage are described in detail later.
  • The signature generated by this stage is sent along with the image to the signature attachment and printing stage 6, which appends the signature to the original document and thus produces a secure document printed on paper 7.
  • The printing stage can be implemented with a device such as a printer, digital copier, plotter or similar.
  • This section of the system may also contain a rasterization stage (raster image processor) 4 that converts an electronic document 2 (such as those generated by conventional programs such as Microsoft Word or Excel) into a digital image, so that this image can be passed to the signature generation stage 5.
  • Raster image processors are usually an integral part of the hardware of many printers or can be obtained with the drivers of these devices.
  • A method for the verification of secure documents which, in turn, is made up of an image acquisition stage 9 (such as the one described in the previous section) that converts a printed document 8 into a digital image.
  • This image is passed to a signature analysis stage 10, which extracts the signature contained in the document.
  • This stage can be a simple two-dimensional barcode reader.
  • The image of the document also goes to the signature generation stage 11 (described in detail in the next section of this document), which obtains the integrity check element (signature) from the textual content of the document.
  • The signatures extracted by these modules are passed to the document certification stage 12, which verifies that both signatures coincide and, if so, issues a certificate of authenticity 13 for the signed document.
  • Two signature generation stages (5 and 11), each of which is made up, in turn, of a binarization stage 14 that converts a color or grayscale image to black and white.
  • A noise elimination stage 15 that removes the noise from the image binarized in the previous module.
  • A horizontal alignment stage 16 that corrects the inclination of the image so that the lines of text appear horizontal.
  • A word segmentation stage 17 that finds the two-dimensional limits of each word in the document.
  • A stroke identification stage 18 that obtains the dominant strokes of each word obtained in the previous module.
  • A line alignment stage 19 that aligns the words obtained in the previous modules with each other.
  • A combination and encryption stage 20 that combines the information obtained in these modules into a signature that can be attached to the original document to secure its content.
  • The user scans a printed document 1 at the image acquisition stage 3 or, through an electronic text editor (such as Word or Excel), generates a file 2 that is converted into an image by the raster image processor 4.
  • The image is passed to the signature generation stage 5.
  • The signature generator first binarizes the image in step 14 to obtain a black and white bitmap.
  • The binarization method can be any of those described in "Comparison of Some Thresholding Algorithms for Text/Background Segmentation in Difficult Document Images" by Leedham, Yan, Takru and Tan, published in The Seventh International Conference on Document Analysis and Recognition, Vol. 2, p. 859.
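  • A minimal sketch of the binarization in step 14, assuming the image is a list of rows of 0-255 grayscale values. Otsu's global threshold is used here only as a simple stand-in; it is not claimed to be one of the algorithms compared in the cited paper, and the function names are illustrative.

```python
def otsu_threshold(gray):
    """Return the gray level that maximizes between-class variance (Otsu)."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # pixels with value <= t (dark class)
    sum0 = 0  # sum of the values of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        between = w0 * w1 * (m0 - m1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(gray):
    """Map a grayscale image to a bitmap: 1 = ink (dark), 0 = paper (light)."""
    t = otsu_threshold(gray)
    return [[1 if value <= t else 0 for value in row] for row in gray]
```

    A global threshold is the simplest choice; unevenly lit scans would likely need one of the adaptive methods the text refers to.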
  • Step 15 then eliminates the small specks produced by noise in the measurement system.
  • Gonzalez and Woods describe several such algorithms in Chapter 5, "Image Restoration", of Digital Image Processing, Second Edition, USA, New Jersey, Addison-Wesley, 2002.
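  • One simple way to remove small specks, sketched under the assumption that the bitmap uses 1 for ink and 0 for paper. Erasing small connected components is a stand-in chosen for brevity, not a filter taken from the cited chapter; `min_size` is a hypothetical tuning parameter.

```python
from collections import deque


def remove_small_components(binary, min_size=3):
    """Erase 8-connected groups of ink pixels that have fewer than min_size pixels."""
    height, width = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    seen = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if out[y][x] != 1 or seen[y][x]:
                continue
            # Collect one connected component with a breadth-first search.
            component, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and out[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Components below the size limit are treated as noise.
            if len(component) < min_size:
                for cy, cx in component:
                    out[cy][cx] = 0
    return out
```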
  • The clean image is then aligned so that the lines of text appear horizontal in the image (step 16).
  • Some procedures to achieve this are described in Chapter 9: Optical Character Recognition in Algorithms for Image Processing and Computer Vision, by J. R. Parker, Wiley, 1996.
  • Once the document has been aligned in this way, it is passed to the word segmenter 17.
  • First, a vertical histogram of the image is obtained. This histogram contains one entry for each pixel row of the image, and each entry stores the number of black pixels in that row. In this way, the lines of text appear as maxima and the blank spaces between lines as minima. The beginning and end of each line can then be identified by a simple differential analysis.
  • The beginning of each line is marked by a transition from a small value to a large value, and the end of each line by a transition from a large value to a small one. One skilled in the art can readily implement an algorithm that performs this identification.
  • After identifying the lines of text, the segmenter obtains a horizontal histogram for each line, which stores the number of black pixels per column within that line. This histogram is used to identify the beginning and end of each word with a procedure similar to the one described for lines. In this way, the segmenter produces a list of words by line, where each word is assigned a unique number (coordinate).
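  • The two histogram passes described above can be sketched as follows, assuming a bitmap with 1 for ink and 0 for paper. The `min_gap` parameter (the smallest blank width treated as a word break rather than a gap between letters) is an assumption added for the example; the text does not specify how the two kinds of gap are told apart.

```python
def find_runs(profile):
    """Return (start, end) pairs, end exclusive, of consecutive non-zero entries."""
    runs, start = [], None
    for i, count in enumerate(profile):
        if count > 0 and start is None:
            start = i                # small -> large transition: a run begins
        elif count == 0 and start is not None:
            runs.append((start, i))  # large -> small transition: the run ends
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def merge_close(runs, min_gap):
    """Join runs separated by fewer than min_gap blank entries."""
    merged = []
    for start, end in runs:
        if merged and start - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


def segment_words(binary, min_gap=2):
    """Return, per text line, a list of word boxes (top, bottom, left, right)."""
    width = len(binary[0])
    row_hist = [sum(row) for row in binary]  # black pixels per row
    lines = []
    for top, bottom in find_runs(row_hist):
        # Black pixels per column, restricted to the rows of this text line.
        col_hist = [sum(binary[y][x] for y in range(top, bottom)) for x in range(width)]
        spans = merge_close(find_runs(col_hist), min_gap)
        lines.append([(top, bottom, left, right) for left, right in spans])
    return lines
```

    The position of a box in the returned nested list (line index, word index) can serve as the word's coordinate.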
  • The positions of each word and the image are passed to the stroke identification stage 18.
  • This stage analyzes each word individually to obtain its most representative straight line segments (strokes). This is achieved, in particular, through tensor voting techniques (described in detail in "A Computational Framework for Segmentation and Grouping" by Medioni, Lee and Tang, Elsevier, 2000); although the use of alternative techniques such as the Hough transform (US Patent 3,069,654, "Method and Means for Recognizing Complex Patterns"), active contours (snakes, such as those described in the article "Snakes, Shapes, and Gradient Vector Flow", IEEE Transactions on Image Processing, 7(3), pp. 359-369, March 1998, by C.
  • The strokes identified in this way are stored in a list of strokes per word and are passed, together with the positions of each word, to the line alignment stage 19. At this stage, all occurrences of each word in the rest of the document are found. To detect an occurrence of a word, the word is superimposed on a target word to be tested. If the main strokes of the original word can be properly aligned with those of the target word, then there is an occurrence.
  • One algorithm for this alignment is the ICP described by Zhang in "Iterative Point Matching for Registration of Free-Form Curves and Surfaces".
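  • A much simplified sketch of the occurrence test, assuming each word is represented by a list of 2D stroke points (for example, segment endpoints). It is a translation-only variant of the ICP idea: it starts by matching centroids and refines with nearest-point correspondences, whereas full ICP also estimates rotation. The `tolerance` value and all function names are hypothetical.

```python
import math


def centroid(points):
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def nearest(p, points):
    return min(points, key=lambda q: math.dist(p, q))


def alignment_residual(source, target, iterations=10):
    """Translate source onto target and return the mean nearest-point distance."""
    (sx, sy), (tx, ty) = centroid(source), centroid(target)
    moved = [(x + tx - sx, y + ty - sy) for x, y in source]  # centroid start
    for _ in range(iterations):
        pairs = [(p, nearest(p, target)) for p in moved]
        # For fixed correspondences, the best translation is the mean offset.
        dx = sum(q[0] - p[0] for p, q in pairs) / len(pairs)
        dy = sum(q[1] - p[1] for p, q in pairs) / len(pairs)
        if abs(dx) < 1e-9 and abs(dy) < 1e-9:
            break
        moved = [(x + dx, y + dy) for x, y in moved]
    return sum(math.dist(p, nearest(p, target)) for p in moved) / len(moved)


def is_occurrence(word, candidate, tolerance=0.5):
    """Treat candidate as a repetition of word if both directions align closely."""
    # Checking both directions keeps a partial shape from matching a larger one.
    forward = alignment_residual(word, candidate)
    backward = alignment_residual(candidate, word)
    return max(forward, backward) <= tolerance
```

    Comparing stroke geometry in this way needs no knowledge of which characters the word contains.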
  • The list of strokes of each word and the list of repetitions of each word are appended to the list of words by line in step 20 to generate a series of numbers that uniquely identifies each document.
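  • To illustrate what a list of repetitions could look like, the sketch below assigns to every word the index of the first earlier word it matches. The encoding and the `same_shape` predicate are assumptions made for the example; the text does not fix a concrete layout for the series of numbers. Plain strings stand in for word shapes in the usage example.

```python
def repetition_series(words, same_shape):
    """For each word, return the index of the first word with the same shape."""
    series = []
    for i, word in enumerate(words):
        first = i  # a word that matches nothing earlier refers to itself
        for j in range(i):
            if same_shape(words[j], word):
                first = j
                break
        series.append(first)
    return series


# Strings stand in for word shapes; a real matcher would compare strokes.
shapes = ["pay", "the", "sum", "to", "the", "bank"]
series = repetition_series(shapes, lambda a, b: a == b)
```

    Here `series` records that the fifth word repeats the second one, without recording what either word says.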
  • This series of numbers can itself be encoded as a two-dimensional barcode (or some other form of binary image coding); usually, however, it is compressed and a cyclic redundancy code (CRC) of the result is computed, and this CRC is normally used as the signature. It is also possible to compute the CRC of the original (uncompressed) series. The signature thus obtained can be as small as a 16-bit number. This number is finally printed as a conventional or two-dimensional barcode on the printed image to be secured. All these methods (compression and CRC) are widely known and used.
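  • A sketch of the compress-then-CRC step using only the Python standard library. The choice of zlib for compression, of the CRC-CCITT polynomial (via `binascii.crc_hqx`) and of 32-bit big-endian packing are assumptions; the text names neither a compressor nor a particular CRC.

```python
import binascii
import struct
import zlib


def signature_16bit(numbers, compress=True):
    """Pack a series of non-negative integers and reduce it to a 16-bit CRC."""
    # Each number is stored as an unsigned 32-bit big-endian integer.
    payload = struct.pack(f">{len(numbers)}I", *numbers)
    if compress:
        payload = zlib.compress(payload)
    # crc_hqx computes a 16-bit CRC-CCITT; 0xFFFF is the chosen start value.
    return binascii.crc_hqx(payload, 0xFFFF)


signature = signature_16bit([0, 1, 2, 3, 1, 5])
```

    A 16-bit value fits easily in a small barcode, but it can take only 65,536 distinct values, so two different documents may occasionally share a signature.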
  • Step 20 may also use the information from the main strokes of some critical sections of the document (such as the figures on a check) to generate the signature, or as an annex to the regular signature described in the previous paragraph. It is also possible to systematically combine the list of words by line, the list of strokes by word (for the critical sections of the document) and the list of repetitions, according to the needs of each particular document or the user's instructions, to generate the signature. Additionally, the signature generated with any of the procedures described (the sequence of numbers that describe the words and their repetitions, the sequence of the main strokes of the words, or the systematic combination of all of them) can be encrypted with the public key of the issuer of the document to give the document greater security. The encrypted signature can then be processed normally (it is compressed and its CRC is obtained).
  • The signature obtained by this method is sent together with the image to the signature attachment and printing stage 6.
  • This stage converts the signature to a small image (a two-dimensional barcode, a widely known and used technology) and places it in some unused part of the original document (for example, at the edges of the page).
  • The modified image is printed to finally generate a secure document 7.
  • To verify the integrity of a document 8, it is scanned and converted into an image in step 9. This image is sent to a signature analysis stage 10 that simply reads and interprets the signature printed on the document.
  • This stage can be implemented with a two-dimensional barcode reader, the use of which is widespread.
  • The image is also sent to the signature generation stage 11 (whose operation has already been explained above) to obtain the signature from the content of document 8.
  • Both signatures are compared in the certification stage 12.
  • If the content of the document has been modified, the signatures will be different.
  • In that case, the certifier 12 responds by indicating that the document has been altered. Otherwise, an integrity certificate 13 is issued, indicating that the document is true to the original.
  • The method of generating the signature constitutes the novelty of the invention.
  • The novelty of the method is that, through computer vision algorithms, the textual content of the printed document is analyzed to generate a signature without using a character recognizer (OCR). That is, all the information necessary to verify the integrity of the document is found in the document itself, and the signature can be as small as a 16-bit number. Instead of recognizing individual characters, the shape and position of the words in the text are used as a signature to ensure that the content has not been modified.

Abstract

The invention relates to a method of generating secure documents. The invention is characterised by the way in which the signatures used to secure the documents are generated. Said signatures are formed from the text content of a printed document without requiring an electronic transcription of the text. The inventive method is based on automatically recognising the shape of the words and locating the positions at which the words are repeated, without necessarily knowing their meaning. In this way, the document can be copied or altered by any means as long as the original message contained in the document is not changed. Finally, the method can be used to secure documents printed on normal paper.

Description

METHOD FOR GENERATING A PRINTED SIGNATURE TO SECURE THE CONTENTS OF TEXT DOCUMENTS
Technical Field of the Invention
This invention relates to the area of security and integrity of the content of printed text documents. The objective of this invention is to protect the content of documents printed on common paper against malicious changes and counterfeiting, while tolerating slight alterations of the document that do not change the message written on it. It describes a system and method that generate a signature that can be attached in printed form to a text document to ensure that its content has not been maliciously modified. In addition, a system and method are described for verifying whether a document signed by this procedure has been modified. The main characteristics of these methods are that they do not require special technologies or materials (such as lasers, holograms or plastics); the original document does not have to be digital; the signed document can be copied without losing the properties of the signature; the signature is clearly visible; optical character recognition (as in an OCR) is not required; and only the part that contains text can be protected.
Background of the Invention
At present, there are many methods to protect electronic documents against malicious modifications. However, there have been relatively few advances in protecting physical documents. The protections available for the latter include watermarks, bar codes, special papers, holograms and printed digital signatures or watermarks.
Any document whose content must be guaranteed not to have been modified will benefit from the technology set forth in this document. Examples of such documents are bank checks, deeds, wills, notarized documents, passports, etc. A brief summary of the state of the art related to the present invention is presented below.
US patents 4,835,028, 6,414,761 and 4,630,845 describe techniques that require special papers or the addition of optical or magnetic elements that are difficult to reproduce (holograms or magnetic strips, respectively) to ensure the authenticity and integrity of a document. This type of protection has the disadvantage of requiring special materials. The present invention works with any type of paper and still maintains the same level of protection.
US 6,496,933 describes a method that produces a mark in the form of an image that can be added to a document to ensure its authenticity. However, the main disadvantage of that invention with respect to the one presented here is that it requires the document to be originally in an electronic format, whereas the invention described here can work with printed documents.
US 6,764,000 describes a system that uses a scanner to identify points of interest or indicia (such as watermarks, holograms, serial numbers, patterns, colors, etc.) present in the document to be secured. These indicia are compared against a previously built database. If enough indicia are similar to those stored, the document is classified as authentic. Unlike that method, the present invention does not require a database to authenticate the document. In addition, the invention generates a signature from the content or message written in the original document.
US 6,934,845 describes an invention that uses an encoding based on the blank spaces of a document to hide a signature that allows the document to be authenticated. That invention requires a document in electronic format to generate the signature, and the signature is not generated from the content of the document. By contrast, the present invention does not require the document in electronic format and uses the content of the document to generate the authentication information. Nor does the present invention use an encoding based on blank spaces; its signature is clearly visible.
US 6,427,921 describes a method and system that superimposes various types of patterns on the original image to be secured in order to produce a watermark. Unlike this method and system, the present invention uses the content of the document to generate the signature that will authenticate it.
Patent HK1028662 describes a method for securing printed documents that requires a laser to illuminate the questioned document and then verifies that the reflected pattern meets certain characteristics. The present invention does not require laser technology.
US Patent 6,785,405 describes a system that uses digital images of printed documents to verify their authenticity. The system segments the images and then compares these segments with images contained in a database, obtaining a correlation number for each type of document and segment. This phase serves to categorize the document. With this information, the system reads the authentication information that the document itself must carry and uses it to verify that the document is original. Unlike this system, the present invention does not require segmentation or correlation. In addition, it does not require a database with known images of documents.
US Patent 3,069,654, "Method and Means for Recognizing Complex Patterns", describes a method for detecting figures in images. However, this method cannot be used by itself to protect the content of text documents. HP's Safe Paper technology uses a watermark that is visible only when a plastic card is placed over the signed document. This watermark can be printed on any document using ordinary printers and inks, but it cannot be reproduced by ordinary means, so the documents cannot be copied. Unlike this system, the invention presented here does not use watermarks or require special plastics to verify the integrity of the document. Moreover, the invention allows multiple copies of a document to be made as long as the content of the document is not altered. Documents generated with this system can therefore be sent by electronic or electromechanical means (such as facsimile). The similarly named SafePaper technology from Alp Vision uses an invisible watermark to secure a document. Unlike the HP technology, this watermark can be read with a scanner, but it disappears when the document is copied or reproduced by other means. In addition, this system can only sign electronic documents. The present invention allows documents to be copied as long as the content remains intact.

In the scientific literature, several methods exist for attaching a digital signature to a printed document. S. Bhattacharjee and M. Kutter describe one such algorithm in "Compression Tolerant Image Authentication", IEEE Int. Conference on Image Processing, USA, pp. 435-439, 1998. This algorithm uses wavelets to generate a signature of a digital document. J. Fridrich, in "Methods for Detecting Changes in Digital Images", Proc. IEEE Int. Workshop on Intell. Signal Processing and Communication Systems, 1998, uses a spread-spectrum signal to generate signatures of digital documents.
C.-Y. Lin and S.-F. Chang, in "A Robust Image Authentication Method Surviving JPEG Lossy Compression", SPIE Storage and Retrieval of Image/Video Database, Vol. 3312, San Jose, 1998, use wavelets to secure the content of digital documents. None of the articles cited so far works with printed documents. In addition, the present system does not use wavelets and is based on encoding the content of the text printed in the document.
Baoshi Zhu, Jiankang Wu and Mohan Kankanhalli, in "Print Signatures for Document Authentication", Proceedings of the 10th ACM Conference on Computer and Communications Security, Washington D.C., USA, pp. 145-154, 2003, use the randomness intrinsic to the laser printing process to verify the authenticity of a printed document. The present invention does not require a specific printing technology. In "Comparison of Some Thresholding Algorithms for Text/Background Segmentation in Difficult Document Images", published in The Seventh International Conference on Document Analysis and Recognition, Vol. 2, pg. 859, Leedham, Yan, Takru and Tan describe several binarization algorithms for the segmentation of text documents. However, binarization by itself cannot be used to secure the content of printed text documents. Our invention uses other techniques besides binarization to achieve this goal.
González and Woods, in Chapter 5, Image Restoration, of Digital Image Processing, Second Edition, U.S.A., New Jersey, Addison-Wesley, 2002, describe several image noise removal algorithms. However, none of these algorithms can be used by itself to ensure the integrity of text documents. In addition to these algorithms, our invention uses other techniques to produce secure documents.
The chapter "Optical Character Recognition" of the book Algorithms for Image Processing and Computer Vision, by J. R. Parker, Wiley, 1996, describes an algorithm that aligns the text of an image with the image's horizontal rows. However, this algorithm cannot be used by itself to secure the content of text documents. Our invention uses other techniques in addition to this one to achieve that goal. "A Computational Framework for Segmentation and Grouping", by Medioni, Lee and Tang, Elsevier 2000, describes algorithms that find oriented line segments using a methodology called tensor voting. However, tensor voting by itself cannot be used to secure text documents. The present invention uses other techniques in addition to it to reach this end. Zhang describes a method (called Iterative Closest Point, or ICP) for aligning two-dimensional figures in the article "Iterative Point Matching for Registration of Free-Form Curves and Surfaces", International Journal of Computer Vision, Vol. 13, No. 2, pp. 119-152, 1994. Veltkamp and Hagedoorn describe similar algorithms (such as chamfer matching) in "State-of-the-art in shape matching", technical report UU-CS-1999-27, Utrecht University, Netherlands, 1999. Figures can also be aligned by an exhaustive search that simply tries every possible way in which the figures can coincide. This method is usually inefficient and is therefore rarely mentioned in the literature, but its implementation is obvious. However, figure alignment algorithms cannot be used by themselves to generate secure documents. In addition to these techniques, our invention uses other algorithms to achieve this goal.
C. Xu and J. L. Prince describe a shape recognition and vectorization algorithm (commonly called snakes or active contours) in "Snakes, Shapes, and Gradient Vector Flow", IEEE Transactions on Image Processing, 7(3), pg. 359-369, March 1998. However, this algorithm cannot be used by itself to obtain secure documents. Our invention uses similar techniques in one stage of the method, but it uses other algorithms besides this one to produce secure documents. Bernd Jähne describes several local orientation recognition algorithms in Chapter 13 of his book "Digital Image Processing", Springer-Verlag, 1997. These algorithms are based on the Fourier transform, gradient analysis, tensor representation, local wave numbers and phase, the Hilbert transform and the Hilbert filter, quadrature filters, Gabor filters, and variants of these. However, all these methods produce only the local orientation of an image. Other algorithms are needed in addition to produce secure documents, as the present invention does.
A. J. Menezes, P. C. van Oorschot, and S. A. Vanstone describe cyclic redundancy check (CRC) algorithms and public- and private-key encryption in the book "Handbook of Applied Cryptography", CRC Press, 1997. However, these algorithms cannot be used directly to verify the integrity of printed documents, because they require the original byte sequence to be identical at all times. This condition is not met when a printed document is scanned: lighting conditions may vary and produce a different byte sequence on each scan. Our invention works despite these variations in lighting. Finally, it should be noted that this patent is an extension of patent application PCT/MX2005/000019 by Sergio Fernández. That document claims a system and method for the secure printing of documents. Our invention differs from it in the method used to secure the documents (the signature).
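The byte-identity requirement of a CRC mentioned above can be illustrated with a short Python sketch. The pixel values, the threshold of 128, and the helper names `binarize` and `crc_matches` are assumptions made for illustration only. Binarization stands in here for the content-derived features that the invention actually uses, and `zlib.crc32` stands in for any CRC.

```python
import zlib

def binarize(pixels, threshold=128):
    """Reduce 8-bit grayscale pixel values to 1 (ink) or 0 (paper)."""
    return bytes(1 if p < threshold else 0 for p in pixels)

def crc_matches(a, b):
    """True when two byte sequences have the same CRC-32."""
    return zlib.crc32(a) == zlib.crc32(b)

# Hypothetical pixel values for the same four pixels under two lighting conditions.
scan_a = bytes([250, 12, 8, 245])
scan_b = bytes([240, 20, 16, 238])

raw_ok = crc_matches(scan_a, scan_b)                         # CRC over raw scan bytes
binary_ok = crc_matches(binarize(scan_a), binarize(scan_b))  # CRC over binarized pixels
```

In this sketch the CRC of the raw scans differs, while the CRC of the binarized pixels agrees, because both scans reduce to the same ink/paper pattern.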
The method presented in this document is based on recognizing the patterns and repetitions of the words of a text document in order to generate a signature that can be attached to the paper that carries the document. At no point is an optical character recognizer (OCR) used, since OCR systems tend to fail (because of noise in the digital image) or to be limited (because they need word dictionaries). Our method has the advantage that it does not require a digital document. Methods based on correlation and wavelets use textures, which are very sensitive to non-malicious changes such as the rotation, translation and scaling that may occur when a document is scanned. The method presented here is robust to this kind of change. In addition, it does not require special materials such as holograms, special inks or laser systems to sign a document. Finally, unlike some of the technologies described, the invention allows the document to be copied and transmitted by any electronic, electrical or mechanical method as long as the written message has not been modified.

Brief description of the figures
Figure 1 shows a diagram of the method for printing secure documents. Figure 2 shows a diagram of the method for verifying the integrity of a signed document. Figure 3 shows a diagram of the internal parts of the signature generation stage.
Detailed description of the invention.
With reference to these figures, the method for ensuring the integrity of the content of printed text documents consists of:
A method for printing secure documents, which in turn consists of an image acquisition stage 3 that converts a printed text document 1 into a digital image (hereinafter simply "image"). This stage can be implemented with any digitizing device, such as a scanner, a digital copier, a multifunction printer or a digital camera. The image obtained in the acquisition stage passes to a signature generation stage 5, which generates the element that ensures the integrity of the document (hereinafter simply "signature"). The sub-stages that make up this stage are described in detail later. The signature generated by this stage is sent, together with the image, to the signature appending and printing stage 6, which appends the signature to the original document and thus produces a secure document printed on paper 7. The printing stage can be implemented with a device such as a printer, digital copier, plotter or similar. Alternatively, this section of the system may contain an image rasterization stage (raster image processor) 4, which converts an electronic document 2 (such as those generated by conventional programs like Microsoft Word or Excel) into a digital image so that the image can be passed to the signature generation stage 5. These raster processors are usually an integral part of the hardware of many printers, or they can be obtained with the driver for these devices.
A method for verifying secure documents, which in turn consists of an image acquisition stage 9 (like the one described in the previous section) that converts a printed document 8 into a digital image. This image is passed to a signature analysis stage 10, which extracts the signature contained in the document. This stage can be a simple two-dimensional barcode reader. The image of the document also passes to the signature generation stage 11 (described in detail in the next section of this document), which obtains the integrity check element (signature) from the textual content of the document. The signatures extracted by these modules are passed to the document certification stage 13, which verifies that both signatures match and finally issues a certificate of authenticity 13 for the signed document.

Two signature generation stages (5 and 11), which in turn consist of: a binarization stage 14 that converts a color or grayscale image to black and white; a noise removal stage 15 that removes noise from the image binarized in the previous module; a horizontal alignment stage 16 that corrects the skew of the image so that the lines of text appear horizontal; a word segmentation stage 17 that finds the two-dimensional bounds of each word in the document; a stroke identification stage 18 that obtains the dominant strokes of each word obtained in the previous module; a stroke alignment stage 19 that aligns the words obtained in the previous modules with one another; and, finally, a combination and encryption stage 20 that combines the information obtained in these modules into a signature that can be appended to the original document to secure its content.
Method for printing secure documents.
To generate a secure document, the user scans a printed document 1 in the image acquisition stage 3, or else uses an electronic text editor (such as Word or Excel) to generate a file 2 that the raster image processor 4 converts into an image. The image is passed to the signature generation method 5.
The signature generator first binarizes the image in stage 14 to obtain a black-and-white bitmap. The binarization method can be any of those described in "Comparison of Some Thresholding Algorithms for Text/Background Segmentation in Difficult Document Images" by Leedham, Yan, Takru and Tan, published in The Seventh International Conference on Document Analysis and Recognition, Vol. 2, pg. 859. Once the image has been binarized, stage 15 removes the small specks produced by noise in the measurement system. González and Woods describe several such algorithms in Chapter 5, Image Restoration, of Digital Image Processing, Second Edition, U.S.A., New Jersey, Addison-Wesley, 2002.
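Stages 14 and 15 can be sketched as follows in Python. This is a minimal illustration, not the method of the cited references: it assumes a fixed global threshold and treats any 4-connected group of black pixels smaller than `min_size` as noise. The function names and both parameters are hypothetical.

```python
from collections import deque

def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to a bitmap: 1 = black ink, 0 = white."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def remove_specks(bitmap, min_size=3):
    """Erase 4-connected groups of black pixels smaller than min_size (assumed to be noise)."""
    h = len(bitmap)
    w = len(bitmap[0]) if h else 0
    out = [row[:] for row in bitmap]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if out[y][x] == 0 or seen[y][x]:
                continue
            # Collect the connected component that contains (y, x).
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and out[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) < min_size:
                for cy, cx in component:
                    out[cy][cx] = 0
    return out
```

A real implementation would likely choose the threshold adaptively, as the thresholding algorithms cited above do, since a fixed value is sensitive to scanner exposure.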
The clean image is then aligned so that the lines of text appear horizontal in the image (stage 16). Some procedures for achieving this are described in Chapter 9, Optical Character Recognition, of Algorithms for Image Processing and Computer Vision, by J. R. Parker, Wiley, 1996. Once the document has been aligned in this way, it is passed to the word segmenter 17. This stage computes a vertical histogram of the image. The histogram contains one entry for each pixel row of the image, and each entry stores the number of black pixels in that row. Text lines therefore appear as maxima and the blank spaces between lines as minima. It is then easy to identify the beginning and end of each text line by a simple differential analysis: the beginning of a text line is marked by a transition from a small value to a large value, and the end of a text line by a transition from a large value to a small value. One skilled in the art can easily implement an algorithm that performs this identification.

After identifying the text lines, the segmenter computes a horizontal histogram for each text line, which stores the number of black pixels in each column of that line. This histogram is used once more to identify the beginning and end of each word, with a procedure similar to the one described in the previous paragraph. In this way, the segmenter produces a list of words per text line, in which each word is assigned a unique number (coordinate). This numbering depends only on the content of the document itself and is not altered by circumstantial changes (such as a change in image resolution, translation, rotation, or copying methods that do not alter the content). Other techniques, such as wavelets, are affected by this kind of harmless transformation.
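The two-pass histogram segmentation described above can be sketched in Python as follows. The names `find_runs` and `segment_words` are hypothetical, and the `word_gap` parameter (the minimum number of empty columns that separates two words rather than two letters) is an assumption that the source does not specify.

```python
def find_runs(profile, min_gap=1):
    """Return (start, end) pairs of runs of non-zero entries, end exclusive.

    A run ends only after at least min_gap consecutive zero entries.
    """
    runs = []
    start = None
    zeros = 0
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i          # transition from small to large: a run begins
            zeros = 0
        elif start is not None:
            zeros += 1
            if zeros >= min_gap:   # transition from large to small: the run ends
                runs.append((start, i - zeros + 1))
                start = None
                zeros = 0
    if start is not None:
        runs.append((start, len(profile) - zeros))
    return runs

def segment_words(bitmap, word_gap=2):
    """Return (number, top, bottom, left, right) boxes in reading order.

    bitmap is a list of pixel rows with 1 = black and 0 = white.
    """
    row_profile = [sum(row) for row in bitmap]            # black pixels per pixel row
    words = []
    for top, bottom in find_runs(row_profile):
        band = bitmap[top:bottom]                         # one text line
        col_profile = [sum(col) for col in zip(*band)]    # black pixels per column
        for left, right in find_runs(col_profile, min_gap=word_gap):
            words.append((len(words), top, bottom, left, right))
    return words
```

The word number is simply the position of the word in reading order, so it does not change when the page is scaled or shifted, provided that the same lines and words are found.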
The position of each word and the image are passed to the stroke identification stage 18. This stage analyzes each word individually to obtain its most representative straight line segments (strokes). This is achieved, in particular, through tensor voting techniques (described in detail in "A Computational Framework for Segmentation and Grouping" by Medioni, Lee and Tang, Elsevier 2000), although alternative techniques are not ruled out, such as the Hough transform (US Patent 3,069,654, "Method and Means for Recognizing Complex Patterns"), active contours (snakes, as described in the article "Snakes, Shapes, and Gradient Vector Flow", IEEE Transactions on Image Processing, 7(3), pg. 359-369, March 1998, by C. Xu and J. L. Prince), or any of the techniques described in the background section of this document. These technologies have been in use for several years and are widely known to any expert in the area. Tensor voting produces one stroke for each pixel of the image. To compress this data, the next step groups the strokes according to the affinity of their direction. This is done with a depth-first search whose termination criterion is an excessive angle difference between pairs of strokes. Depth-first search is a graph traversal technique that has been widely known and used for many years. Once again, implementing a depth-first search with these characteristics is a trivial task for any programmer.
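The grouping of per-pixel strokes by direction can be sketched in Python as follows. This is a minimal illustration under stated assumptions: strokes are given as a mapping from pixel coordinates to an orientation in degrees, neighbors are the 8 adjacent pixels, and the 15-degree limit `max_diff` is an arbitrary illustrative value. The function names are hypothetical, and the tensor voting step that would produce the orientations is not shown.

```python
def angle_diff(a, b):
    """Smallest difference between two undirected orientations, in degrees (0 to 90)."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)

def group_strokes(strokes, max_diff=15.0):
    """Group adjacent pixels whose stroke orientations are similar.

    strokes maps (x, y) -> orientation in degrees.
    Returns a list of groups, each a sorted list of (x, y) pixels.
    """
    visited = set()
    groups = []
    for seed in sorted(strokes):
        if seed in visited:
            continue
        visited.add(seed)
        stack = [seed]            # explicit stack: depth-first traversal
        group = []
        while stack:
            x, y = stack.pop()
            group.append((x, y))
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (x + dx, y + dy)
                    if nb not in strokes or nb in visited:
                        continue
                    # Stop expanding when the pair of strokes differs too much in angle.
                    if angle_diff(strokes[(x, y)], strokes[nb]) <= max_diff:
                        visited.add(nb)
                        stack.append(nb)
        groups.append(sorted(group))
    return groups
```

Each resulting group can then be summarized by a single line segment, for example by its two extreme pixels, which gives the compressed list of strokes per word.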
The strokes identified in this way are stored in a list of strokes per word and are passed, together with the position of each word, to the stroke alignment stage 19. In this stage, all occurrences of each word in the rest of the document are found. To detect an occurrence of a word elsewhere in the document, the word is placed over a candidate target word. If the main strokes of the original word can be properly aligned with those of the target word, then an occurrence exists. A variety of algorithms allow this alignment to be made, such as the one described by Zhang (ICP) in "Iterative Point Matching for Registration of Free-Form Curves and Surfaces", International Journal of Computer Vision, Vol. 13, No. 2, pp. 119-152, 1994; or any of the methods described by Veltkamp and Hagedoorn (such as Chamfer Matching) in "State-of-the-art in shape matching", technical report UU-CS-1999-27, Utrecht University, Netherlands, 1999. These algorithms have been in use for some years and are easy for an expert in the field to implement. At the end of this stage there is a list of repetitions that indicates the positions at which each word of the text occurs again. It can be argued that if the text contains no repeated words, then it can easily be altered without the change being noticeable in the signature. However, a meaningful document does not usually have this characteristic. In general, therefore, any change in the position or frequency of the words produces a different list, which in turn produces a different signature, and the alteration is detectable.
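The occurrence test described above can be illustrated with a translation-only variant of ICP (the claims mention a modified ICP restricted to translations). The sketch below is a simplified illustration under stated assumptions: each word is reduced to a small list of 2D stroke points, and the function names, the brute-force nearest-neighbour search and the `max_residual` threshold are hypothetical choices, not values taken from the patent.

```python
import math

def nearest(point, candidates):
    """Return the candidate closest to point (brute force)."""
    px, py = point
    return min(candidates, key=lambda q: (q[0] - px) ** 2 + (q[1] - py) ** 2)

def align_translation_only(src, dst, max_iters=20, tol=1e-6):
    """Translation-only ICP: estimate (tx, ty) moving src onto dst.

    Returns (tx, ty, mean_residual), where mean_residual is the average
    distance from each translated src point to its nearest dst point.
    """
    tx = ty = 0.0
    for _ in range(max_iters):
        dx = dy = 0.0
        for x, y in src:
            qx, qy = nearest((x + tx, y + ty), dst)
            dx += qx - (x + tx)
            dy += qy - (y + ty)
        # The best translation for fixed correspondences is the mean offset.
        step_x, step_y = dx / len(src), dy / len(src)
        tx += step_x
        ty += step_y
        if math.hypot(step_x, step_y) < tol:
            break
    residual = sum(
        math.dist((x + tx, y + ty), nearest((x + tx, y + ty), dst)) for x, y in src
    ) / len(src)
    return tx, ty, residual

def is_occurrence(src, dst, max_residual=1.0):
    """Assumed decision rule: the words match if the strokes align closely."""
    return align_translation_only(src, dst)[2] <= max_residual
```

Because the matching is one-sided (source points to target points), a practical version would probably also check the alignment in the reverse direction so that a short word does not match part of a longer one.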
Finally, the list of strokes of each word and the list of repetitions of each word are appended to the list of words per line in stage 20 to generate a series of numbers that uniquely identifies each document. This series of numbers can itself be encoded as a two-dimensional barcode (or some other form of binary image encoding); usually, however, it is compressed and a cyclic redundancy code (CRC) is computed from the result, and this CRC is normally used as the signature. It is also possible to compute the CRC of the original (uncompressed) signature. The signature thus obtained can be as small as a 16-bit number. This number is finally printed as an ordinary or two-dimensional barcode on the printed image that is to be secured. All these methods (compression and CRC) are widely known and used.
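The reduction of the number series to a 16-bit signature might look like the sketch below. The patent does not name a specific compressor or CRC polynomial, so the use of `zlib` and of the CRC-CCITT polynomial (via Python's `binascii.crc_hqx`), the 32-bit big-endian packing, and the function name `signature_16bit` are all assumptions made for illustration.

```python
import binascii
import struct
import zlib

def signature_16bit(numbers, compress=True):
    """Reduce a series of non-negative integers to a 16-bit CRC.

    numbers: the combined lists (words per line, strokes, repetitions),
             assumed here to fit in unsigned 32-bit integers.
    compress: if True, the CRC is taken over the compressed series;
              if False, over the raw (uncompressed) series.
    """
    payload = struct.pack(f">{len(numbers)}I", *numbers)
    if compress:
        payload = zlib.compress(payload, 9)
    # crc_hqx computes a 16-bit CRC with the CRC-CCITT polynomial 0x1021.
    return binascii.crc_hqx(payload, 0xFFFF)
```

Signer and verifier would need to use exactly the same packing, compressor and CRC parameters, since the comparison is made on the final 16-bit value. A 16-bit CRC can take only 65,536 distinct values, so different documents may occasionally share a signature.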
Alternatively, stage 20 may use the information from the main strokes of some critical sections of the document (such as the figures on a check) to generate the signature, or as an annex to the regular signature described in the previous paragraph. It is also possible to systematically interleave the list of words per line, the list of strokes per word (for the critical sections of the document) and the list of repetitions, according to the needs of each particular document or the user's instructions, to generate the signature. Additionally, the signature generated with any of the procedures described (the sequence of numbers that describes the words and their repetitions, the sequence of the main strokes of the words, or the systematic combination of all of these) can be encrypted with the public key of the issuer of the document to make the document more secure. The encrypted signature can then be processed normally (it is compressed and its CRC is computed).

It should be noted that procedures similar to those described for stages 14, 15, 16 and 17 are already used in some optical character recognition (OCR) systems. The novelty is that, instead of trying to recognize characters to form meaningful words (as an OCR does), the present process recognizes arbitrary patterns and their repetitions. An OCR has the disadvantages that it does not work for every typeface (font) and that it needs a dictionary for each language to disambiguate some words. Our invention differs from an OCR in that it requires no dictionary and works for any language and typeface. Moreover, because it does not require word recognition, the invention can readily work with handwritten text, which is still considered a difficult problem for OCR.
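The systematic interleaving of the three lists mentioned for stage 20 is not specified further in the text. One simple possibility, shown here purely as an assumed example, is a round-robin merge in which one element is taken from each list in turn; the function name `interleave` is hypothetical.

```python
from itertools import zip_longest

def interleave(*lists):
    """Round-robin merge of several lists; shorter lists simply run out."""
    missing = object()  # sentinel that cannot collide with real values
    return [
        item
        for group in zip_longest(*lists, fillvalue=missing)
        for item in group
        if item is not missing
    ]

# Example with made-up values for the three lists.
words_per_line = [1, 2, 3]
strokes = [10, 20]
repetitions = [100]
combined = interleave(words_per_line, strokes, repetitions)
```

Any other deterministic ordering would serve equally well, provided the generator and the verifier apply the same rule.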
Finally, the signature obtained by this method is sent together with the image to the signature appending and printing stage 6. This stage converts the signature into a small image (a two-dimensional barcode, a widely known and used technology) and appends it to some unused part of the original document (for example, the page margins). The modified image is then printed to produce a secure document 7.
Method for the verification of secure documents.
To verify the integrity of a document 8, it is scanned and converted into an image in stage 9. This image is sent to a signature analysis stage 10, which simply reads and interprets the signature printed on the document. This stage can be implemented with a two-dimensional barcode reader, a widely available device.
Additionally, after the signature has been read with a barcode reader, it may be necessary to decrypt it with the issuer's private key in order to verify that the integrity of the signature itself has not been compromised. This is also done in stage 10. In parallel with this stage, the image is sent to the signature generation stage 11 (whose operation has already been explained above) to obtain the signature from the content of document 8.
Both signatures are compared in the certification stage 12. When the document has been maliciously modified, the signatures will differ. In that case, the certifier 12 responds by indicating that the document has been altered. Otherwise, an integrity certificate 13 is issued, indicating that the document is faithful to the original.

The method of generating the signature constitutes the novelty of the invention. What is new is that computer vision algorithms are used to analyze the textual content of the printed document and generate a signature without using optical character recognition (OCR). That is, all the information needed to verify the integrity of the document is contained in the document itself, and the signature can be as small as a 16-bit number. Instead of recognizing individual characters, the shape and position of the words in the text are used as the signature to ensure that the content has not been modified. No other current system can sign documents in this way without using an OCR.

On the other hand, systems that generate digital signatures based on wavelets have the disadvantage that the generated signature is usually relatively large. This size makes it difficult to include the signature in the blank areas of the original document (normally only the margins of the text). The advantage of our method over wavelet-based methods is that the signature can be reduced to 16 bits, which makes it easy to include in the blank areas of any printed text document. In addition, wavelet-based methods do not usually tolerate alterations that preserve the content of the document, such as a change of resolution or a translation or rotation of the image. Our invention works despite such changes.

In addition to all of the above, our invention does not need a set of word dictionaries. It does not require the original document in electronic format. It does not require special materials in the paper or attached to it. It does not need colored plastics to make the signature visible. It does not require lasers to illuminate the document and detect the signature. Finally, the method can be used with texts in any language.
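The comparison carried out in the certification stage 12 can be summarised in a few lines. The sketch below is only illustrative: the barcode reading of stage 10 and the signature generation of stage 11 are replaced by hypothetical stand-ins (`read_printed_signature` decodes a 2-byte payload, `recompute_signature` takes a CRC over already-extracted content bytes), and the return strings are invented for the example.

```python
import binascii

def read_printed_signature(barcode_payload: bytes) -> int:
    """Stand-in for stage 10: decode a 2-byte big-endian barcode payload."""
    return int.from_bytes(barcode_payload, "big")

def recompute_signature(content: bytes) -> int:
    """Stand-in for stage 11: 16-bit CRC over the extracted content bytes."""
    return binascii.crc_hqx(content, 0xFFFF)

def certify(printed_signature: int, recomputed_signature: int) -> str:
    """Stage 12: compare the printed signature with the recomputed one."""
    if printed_signature == recomputed_signature:
        return "INTACT"   # an integrity certificate would be issued
    return "ALTERED"      # the document does not match its signature

# Signing side: compute the signature and store it as the barcode payload.
original = b"pay 100 to bearer"
payload = recompute_signature(original).to_bytes(2, "big")

# Verification side: an unchanged and a tampered version of the content.
ok = certify(read_printed_signature(payload), recompute_signature(original))
bad = certify(read_printed_signature(payload), recompute_signature(b"pay 900 to bearer"))
```

In the example the tampered text differs from the original in a single byte, a change that a 16-bit CRC always detects; larger edits are detected with high probability rather than certainty.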

Claims

1. A method for generating a printed signature, comprising the stages of binarization, noise removal, horizontal alignment, and word segmentation to generate a list of words per line in which each word has a unique associated coordinate, characterized by the stages of:

a) Identification of the word strokes. In this stage, tensor voting is used to obtain the local orientation at each pixel. The local orientations are then grouped by a depth-first search according to the affinity of their angle. This is done to identify the most representative orientations of each word and their relative positions within it. This stage produces a list of strokes for each word.

b) Stroke alignment. Using the most representative strokes of each word, an attempt is made to align the strokes of each word with the rest of the document. The positions where this alignment is possible mark the repetitions of that word. The positions where each word is repeated are stored in a list of repetitions. This stage is implemented by a modified version of the ICP algorithm that uses only translations to align the strokes.

c) Combination and encryption. In this stage the list of words per line, the list of repetitions, and the list of strokes of each word are combined to create the signature. The signature may consist of the simple concatenation of these lists, of the concatenation of two of them, or of only one of them. It is also possible to systematically interleave parts of these three lists to generate the signature, according to the instructions given by the user or the characteristics of each document.

d) Signature appending and printing. In this stage, the signature obtained with this method is converted into a two-dimensional barcode by known techniques and is printed on the original document using a printer or similar device.
2. A method for generating a printed signature according to claim 1, characterized in that the identification of strokes is done by means of the Hough transform, active contours (snakes), the Fourier transform, gradient analysis, tensor representation, local wave numbers and phase, the Hilbert transform, the Hilbert filter, quadrature filters, Gabor filters, and variants thereof, or some other similar method that allows the local orientation of an image to be obtained.

3. A method for generating a printed signature according to claims 1 and 2, characterized in that the alignment of strokes is done by exhaustive search, Chamfer Matching, active contours (snakes) or some similar algorithm.
4. A method for generating a printed signature according to claims 1, 2 and 3, characterized in that the signature is further protected by encryption with the public key of the issuer of the document.
5. A method for generating a printed signature according to claims 1, 2, 3 and 4, characterized in that the size of the signature is reduced using compression algorithms or cyclic redundancy codes (CRC).
PCT/MX2005/000089 2005-10-04 2005-10-04 Method of generating a printed signature in order to secure the contents of text documents WO2007040380A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/MX2005/000089 WO2007040380A1 (en) 2005-10-04 2005-10-04 Method of generating a printed signature in order to secure the contents of text documents

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/MX2005/000089 WO2007040380A1 (en) 2005-10-04 2005-10-04 Method of generating a printed signature in order to secure the contents of text documents

Publications (1)

Publication Number Publication Date
WO2007040380A1 true WO2007040380A1 (en) 2007-04-12

Family

ID=37906381

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/MX2005/000089 WO2007040380A1 (en) 2005-10-04 2005-10-04 Method of generating a printed signature in order to secure the contents of text documents

Country Status (1)

Country Link
WO (1) WO2007040380A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3200255A2 (en) 2016-01-06 2017-08-02 Konica Minolta, Inc. Organic electroluminescent element, method for producing organic electroluminescent element, display, and lighting device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4641347A (en) * 1983-07-18 1987-02-03 Pitney Bowes Inc. System for printing encrypted messages with a character generator and bar-code representation
EP0702322A2 (en) * 1994-09-12 1996-03-20 Adobe Systems Inc. Method and apparatus for identifying words described in a portable electronic document
US6279828B1 (en) * 1999-03-01 2001-08-28 Shawwen Fann One dimensional bar coding for multibyte character
US20010047476A1 (en) * 2000-05-25 2001-11-29 Jonathan Yen Authenticatable graphical bar codes

Similar Documents

Publication Publication Date Title
US9594993B2 (en) Two dimensional barcode and method of authentication of such barcode
ES2758791T3 (en) Document assurance procedure and device
KR100405828B1 (en) Apparatus and method for producing a document which is capable of preventing a forgery or an alteration of itself, and apparatus and method for authenticating the document
US8190901B2 (en) Layered security in digital watermarking
CN110073368B (en) Method for authenticating an illustration
ES2802448T3 (en) Device and process to protect a digital document, and corresponding process to verify the authenticity of a hard copy
KR20110028311A (en) Method and device for identifying a printing plate for a document
RU2458395C2 (en) Methods and apparatus for ensuring integrity and authenticity of documents
WO2009149408A2 (en) Method, system, and computer-accessible medium for authentication of paper using a speckle pattern
HUE026760T2 (en) Secure item identification and authentication system and method based on unclonable features
EP2048867A1 (en) Method and system for generation and verification of a digital seal on an analog document
Tan et al. Print-Scan Resilient Text Image Watermarking Based on Stroke Direction Modulation for Chinese Document Authentication.
Richter et al. Forensic analysis and anonymisation of printed documents
Picard et al. Towards fraud-proof id documents using multiple data hiding technologies and biometrics
US20050206158A1 (en) Certificate issuing method and certificate verifying method
WO2015073830A1 (en) System and method for printing a hidden and secure barcode
Noore et al. Embedding biometric identifiers in 2D barcodes for improved security
Eskenazi et al. When document security brings new challenges to document analysis
EP2812848B1 (en) Forensic verification utilizing forensic markings inside halftones
US20080164328A1 (en) Tamper detection of documents using encoded dots
WO2007040380A1 (en) Method of generating a printed signature in order to secure the contents of text documents
Mayer et al. Fundamentals and applications of hardcopy communication
US9361516B2 (en) Forensic verification utilizing halftone boundaries
RU2446464C2 (en) Method and system for embedding and extracting hidden data in printed documents
Sale et al. Graduation certificate verification model: a preliminary study

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC OF 220708

122 Ep: pct application non-entry in european phase

Ref document number: 05803573

Country of ref document: EP

Kind code of ref document: A1