US20020191847A1 - Portable text capturing method and device therefor - Google Patents
- Publication number
- US20020191847A1 (application Ser. No. 10/214,291)
- Authority
- US
- United States
- Prior art keywords
- image
- viewfinder
- imaging device
- portable imaging
- user input
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/142—Image acquisition using hand-held instruments; Constructional details of the instruments
- G06V30/1444—Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields
Description
- 1. Field of the Invention
- The present invention relates generally to a digital camera, and more particularly, to a system integral with the digital camera for identifying, translating, and recording text in images.
- 2. Description of Related Art
- It is well known to use scanners, such as flatbed scanners, to capture and convert bitmap images of documents to text or structured documents. In some implementations of document scanners, the portion of the bitmap image that contains text is selected during a pre-scan pass of the document. The selected portion of the bitmap image is then re-scanned at a higher resolution and post-processed. The post-processing of the selected portion of the higher resolution image involves the application of selected image processing functions to clean and identify textual and formatting content of the scanned document. An example of a post-processing application is TextBridge® (sold by ScanSoft, Inc.), which is capable of converting scanned images into simple ASCII text documents or formatted documents with tables and pictures.
- Performing a pre-scan pass and then rescanning an image to record document content with a handheld imaging device such as a digital camera, however, is not practical. A problem encountered when using digital cameras to record textual content (in documents, for example) is that digital cameras generally do not have a high enough resolution to guarantee that the textual and formatting content in the recorded bitmap image will be properly detected by a post-processing application. Some digital cameras attempt to solve this problem by including a text-mode feature that is adapted to sharpen text features in a recorded image. Examples of digital cameras with a text-mode feature are the PowerShot 600 digital camera by Canon and the RDC-2E digital camera by Ricoh.
- However, even with the text-mode feature, the recorded images may not be of sufficient resolution for post-processing applications such as TextBridge® to identify textual and other formatting content in a recorded image. Consequently, it is not until an image has been recorded using a digital camera and downloaded to a post-processing device such as a computer that it is known whether the recorded image can be properly analyzed to identify textual and formatting content in the image. In addition, because there is no manner in which to identify the portion of the bitmap image that is of interest for post-processing analysis at the time it is recorded with a digital camera, the identifying information must be remembered and input at the time the image is post-processed.
- Accordingly, it would be advantageous to provide a digital imaging device that overcomes these and other problems of recording digital images that consist of textual and formatting content. In particular, it would be advantageous to provide a digital camera that alerts a user when it is not likely that the digital camera is capable of recording an image with sufficient resolution to evaluate the recorded image for textual and formatting content. It would also be advantageous if such an improved digital camera provided a user with the ability to identify and preview those regions of the recorded image that contain textual data. It would be further advantageous if such a digital camera provided translation of detected textual data from one language to another.
- In accordance with the present invention, there is provided a method and a portable imaging device therefor for capturing text. Initially, an image recorded with an imaging unit is displayed on a viewfinder of the portable imaging device. A first user input is received from a shutter release button. The first user input is adjusted using a pointing device for identifying a first position within the displayed image on the viewfinder. In response to the first user input, the image displayed on the viewfinder is recorded in a memory of the portable imaging unit. In addition, a second user input is received from the shutter release button. The second user input is also adjusted using the pointing device for identifying a second position within the displayed image on the viewfinder. Finally, an image segment is extracted from the image stored in the memory using the first position and the second position and examined to identify textual content.
- In accordance with one aspect of the invention, an error rate for the textual content identified in the image segment is determined. A warning indicator is displayed on the viewfinder when the estimated error rate exceeds a threshold value. The purpose of the warning indicator is to alert the user of the portable imaging device when a recorded image cannot be accurately post-processed for the identification of textual or other formatting content. In accordance with another aspect of the invention, textual content is translated from one language to another. In one embodiment, the language from which to translate is determined using a GPS system.
- These and other aspects of the invention will become apparent from the following description read in conjunction with the accompanying drawings wherein the same reference numerals have been applied to like parts and in which:
- FIG. 1 illustrates a perspective view of a portable imaging device according to one embodiment of the invention;
- FIG. 2 is a schematic block diagram of the internal hardware of the device of FIG. 1;
- FIG. 3 schematically illustrates the sequence of steps for operating the portable imaging device shown in FIG. 1 in accordance with the present invention;
- FIG. 4 shows the processing steps for implementing the INITIALIZE routine referenced in FIG. 3;
- FIG. 5 shows the processing steps for implementing the REPOSITION routine referenced in FIG. 3;
- FIG. 6 shows the processing steps for implementing the CAPTURE routine referenced in FIG. 3;
- FIG. 7 shows the processing steps for implementing the REMOVE SKEW routine referenced in FIG. 6;
- FIG. 8 shows the processing steps for implementing the FIND MARGINS routine referenced in FIG. 6;
- FIG. 9 illustrates an example of dilation of two lines of text in an image;
- FIG. 10 illustrates an example of one manner of computing the distance between two points using a seed point;
- FIG. 11 shows the processing steps for implementing the FIND TEXT OBJECTS routine referenced in FIG. 6;
- FIG. 12 shows the processing steps for implementing the UPDATE routine referenced in FIG. 3;
- FIG. 13 shows an example of an image displayed in the viewfinder after performing the UPDATE routine set forth in FIG. 12;
- FIG. 14 shows the processing steps for implementing the OCR routine referenced in FIG. 3;
- FIG. 15 shows the processing steps for implementing the STORE routine referenced in FIG. 3;
- FIG. 16 shows the processing steps for implementing the DISPLAY TEXT routine referenced in FIG. 3;
- FIG. 17 shows the image displayed in the viewfinder with text overlaid on the original image after performing the DISPLAY TEXT routine in FIG. 16;
- FIG. 18 shows the processing steps for implementing the SCROLL routine referenced in FIG. 3; and
- FIGS. 19 to 22 illustrate an example of an image displayed in the viewfinder while performing a single word selection routine in accordance with one embodiment of the invention.
- FIG. 1 illustrates a perspective view of a portable imaging device 2 according to one embodiment of the invention. The portable imaging device 2 includes a viewfinder or display 4, a shutter release button 6, an imaging unit 8, and a pointing device 10. In the embodiment shown in FIG. 1, the viewfinder 4 is a flat panel display, such as a conventional LCD (Liquid Crystal Display) panel. The shutter release button 6 has two user-selectable positions (e.g., a half-press position and a full-press position) and operates in accordance with conventional camera technology. The imaging unit 8 includes a lens and an image array and digitization circuit. Part of the image array and digitization circuit is a two-dimensional CCD (Charge-Coupled Device) array. In operation, images are focused onto the two-dimensional CCD array by the lens and output from the CCD array for display on viewfinder 4.
- In accordance with one aspect of the invention, a user identifies graphical features, such as text, captured by the imaging unit 8 and displayed on the viewfinder 4 with the pointing device 10. The pointing device 10 allows a user of the portable imaging device 2 to move cursor crosshairs (i.e., a pointer) displayed on the viewfinder 4 (see, for example, U.S. Pat. Nos. 5,489,900; 5,708,562; or 5,694,123). In the embodiment shown in FIG. 1, the pointing device 10 is a pointing stick, such as the TrackPoint® developed by IBM Corporation. In an alternate embodiment, the pointing device 10 is a touchpad or a trackball, or a combination of a pointing stick, a touchpad, or a trackball.
- FIG. 2 is a schematic block diagram of the internal hardware of the portable imaging device 2 illustrated in FIG. 1. In the embodiment shown in FIG. 2, a CPU (central processing unit) 21, a speaker 30, a GPS (Global Positioning System) 23, memory 25 (e.g., ROM and/or RAM), and output port 31 are coupled to a common bus 27. The image array and digitization circuit in the imaging unit 8 generates digital images and supplies digital image data to bus 27 via interface (I/F) 28 a. Digital images are output for display on the viewfinder 4 from bus 27 via display driver 24. The user operable devices (i.e., pointing device 10 and shutter release button 6) are also coupled to bus 27 for providing user inputs for processing by the CPU 21 via suitable interfaces 28 c and 28 d. CPU 21 is adapted to output image data, text data, and audio data recorded in memory 25 to output port 31 or speaker 30 via interfaces 28 f and 28 b, respectively.
- FIG. 3 schematically illustrates the sequence of steps for operating the portable imaging device 2 in accordance with the present invention. Initially, the operating mode of the portable imaging device is set to one of an image mode, a text mode, or an image-plus-text mode. Subsequently, a translation mode is set to one of a no-translate mode, an auto-translate mode, or a select-language mode. It will be appreciated by those skilled in the art that the portable imaging device 2 defaults to the no-translate mode when the operating mode is set to image mode. In one embodiment, stepping through a menu displayed on viewfinder 4 enables a user to set these modes of operation and translation. Alternatively, the portable imaging device could include individual operation and translation mode switches (not shown) for enabling a user to set these modes. When the portable imaging device is set to image mode, the pointing device 10 is disabled.
- Generally, the sequence of operations set forth in FIG. 3 includes four state transitions (i.e., one (1), two (2), three (3), four (4)) and eight state transition routines (five between states: INITIALIZE, CAPTURE, OCR (Optical Character Recognition), STORE, and DISPLAY TEXT; and three within a state: REPOSITION, UPDATE, and SCROLL). As set forth below, the steps of the REPOSITION, UPDATE, OCR, DISPLAY TEXT, and SCROLL routines are not performed when the portable imaging device is set to image mode.
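- For illustration, the FIG. 3 control flow can be restated as a transition table. This is a paraphrase of the description, not code from the patent: the event names (move, half_press, full_press, release, timeout) are labels assumed here for the user actions described above.

```python
# Sketch of the FIG. 3 state machine. States are the patent's numbers 1-4;
# routines are the patent's names. Event names are assumed labels.
TRANSITIONS = {
    # (state, event): (routines run, next state)
    (1, "move"):       (["REPOSITION"], 1),
    (1, "half_press"): (["CAPTURE"], 2),
    (2, "move"):       (["UPDATE"], 2),
    (2, "release"):    (["INITIALIZE"], 1),
    (2, "full_press"): (["OCR"], 3),            # UPDATE runs first if the
                                                # crosshairs never moved
    (3, "timeout"):    (["DISPLAY TEXT"], 4),   # shutter held in full-press
    (3, "release"):    (["STORE", "INITIALIZE"], 1),
    (4, "move"):       (["SCROLL"], 4),
    (4, "release"):    (["STORE", "INITIALIZE"], 1),
}

def step(state, event):
    """Run one event; unknown events leave the state unchanged."""
    routines, nxt = TRANSITIONS.get((state, event), ([], state))
    return routines, nxt

state = 1
for event in ["half_press", "move", "full_press", "timeout", "move", "release"]:
    routines, state = step(state, event)
    print(f"{event:>10} -> {routines} -> state {state}")
```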
- After setting the operating mode and the translation modes, an INITIALIZE routine is invoked to initialize the sequence of operations for performing image and/or text capture in accordance with the present invention. FIG. 4 shows the processing steps for performing the INITIALIZE routine referenced in FIG. 3. The INITIALIZE routine includes the step of setting (step s2) the viewfinder 4 to update continuously from the imaging array (e.g., live video). If the portable imaging device 2 is in image mode (step s3), the INITIALIZE routine terminates; otherwise, the cursor crosshairs are positioned at the center of the viewfinder 4 (step s4). In a preferred embodiment, the position of the cursor crosshairs, which is controlled with the pointing device 10, is indicated to the user in the viewfinder 4 by the intersection of two lines. FIG. 13 illustrates an example of a pair of crosshairs 90 and 92 displayed on viewfinder 4. One cross-hair 90 is vertical and extends the entire depth of the viewfinder 4; the other cross-hair 92 is horizontal and extends the entire width of the viewfinder 4. It will be appreciated that in alternate embodiments the cursor crosshairs can be implemented using any number of different pointers known in the art for identifying objects displayed on the viewfinder 4.
- As set forth in FIG. 3, any movement of the position of the cursor crosshairs on viewfinder 4 by the user with the pointing device 10 while in state one (1), and while the shutter is not depressed, invokes a REPOSITION routine. FIG. 5 sets forth the processing steps for implementing the REPOSITION routine referenced in FIG. 3. Initially, if the portable imaging device 2 is in image mode (step s5), the routine terminates; otherwise (step s6), the X,Y coordinates that identify movement of the cursor cross-hair position on the viewfinder 4 are recorded as the current X,Y coordinates. After performing step s6, the current X,Y coordinates defined by the user's movement of the pointing device 10 are used to redraw the cursor crosshairs (step s8) on the viewfinder 4.
- Returning again to FIG. 3, when the user half-presses the shutter release button 6 while in state one (1), a CAPTURE routine is invoked. FIG. 6 shows the processing steps for implementing the CAPTURE routine referenced in FIG. 3. Initially (step s10), the contents of the imaging array in the imaging unit 8 are transferred to a location identified as "image store" in the memory 25. Subsequently, the contents of the image store are displayed on the viewfinder 4. Step s11 effectively freezes the image on the viewfinder 4 for further operations by the user. If the portable imaging device 2 is in image mode (step s12), the routine terminates; otherwise, the routine continues at step s14. At step s14, the cursor crosshairs are superimposed on the image displayed on the viewfinder 4 at the current X,Y coordinates. Next, the current cursor crosshair X,Y coordinates are stored (step s15) in Start-X and Start-Y registers located in the memory 25. Because it is likely that the user has not been able to perfectly align the field of view of the device 2 with the text to be captured, skew is removed at step s16.
memory 25 are copied into a location inmemory 25 identified as “deskewed store”. Then (step s142), for a range of possible skew angles (e.g., −5 to +5° in steps of 0.1°), and using the image stored in deskewed store, there are performed the steps of: rotating the image; summing the pixel values on each scanline; and calculating the variance in pixel value sums. A SkewAngle is then identified as the angle that gives rise to the greatest variance. The next step is for the contents of image store to be copied into deskewed store (step s144). Then, the contents of the deskewed store are rotated (step s146) by a negative value of SkewAngle, where SkewAngle is the angle determined at step s142. Finally, a rotation operation (step s148) by a negative value of SkewAngle is performed on coordinates Start-X and Start-Y, and the results stored in Deskewed-Start-X and Deskewed-Start-Y registers located in thememory 25 for further use. - Returning to FIG. 6, it will be seen that the REMOVE SKEW routine is followed by the FIND MARGINS routine (step s17). FIG. 8 shows the processing steps for implementing the FIND MARGINS routine referenced in FIG. 6. In the FIND MARGINS routine, the columns of white space to the left and to the right of the text are found. First, the image in deskewed store is dilated (step s160) in order to merge adjacent lines of text. An example of the dilation of two lines of text is illustrated in FIG. 9. Next, by searching right and down for black pixels a seed point in the text is found (step s162). Then, operations to find the left margin are performed (step s164): using the seed point obtained in step s162, a step is made to the left and the distance to the nearest black pixel up and down is determined. FIG. 10 illustrates an example of one manner of computing the distance between two points using a seed point. If the distance “h” between the pixels exceeds hmin, this is treated as a margin and the stepping halts; otherwise, a further step left is made. The next step (s166) is a repetition of the procedure in step s164, but for the right margin. The margin positions are then set (step s168) as the limits of a horizontal scan performed by FIND TEXT OBJECTS routine of step s19.
- Returning to FIG. 6, it will be seen that the FIND MARGINS routine is followed by the FIND TEXT OBJECTS routine (step s19). FIG. 11 shows the processing steps for implementing the FIND TEXT OBJECTS routine of FIG. 6. In the FIND TEXT OBJECTS routine, bounding boxes for words and text-lines are found. The procedure commences (step s180) by building a list of connected components in deskewed store within the margins determined in step s17. Text-line lists are then built (step s182) from connected components overlapping each other in the Y direction; and a histogram of gaps between components in the text-line lists is then constructed (step s184). The next step is to derive (step s186) the width of inter-character and inter-word spaces from the histogram peaks, the details of which are set forth in U.S. patent application Ser. No. 09/081,266. Then, words are formed from sets of components delimited by inter-word-sized spaces (step s188). From this, a list of bounding boxes for words on each line is built (step s190). In an alternate embodiment, step s18 is performed instead of steps s16, s17 and s19 in FIG. 6. At step s18, an OCR application such as TextBridge® is invoked to locate positions of margins and bounding boxes of text objects in the image stored in deskewed store.
- Referring again to FIG. 6, after finding text objects at step s19, a determination is made as to whether it is likely that the image captured in image store has sufficient quality for an OCR application to accurately identify textual or formatting content therein. In accordance with this aspect of the invention, a user of the portable imaging device is warned before recording the image in image store by fully-pressing the shutter release button that it is likely that the OCR application will produce inaccurate results. This enables the user to perform corrective action (e.g., improving the light on the object being recorded) to improve the performance of the OCR application before recording the desired image. More specifically, an error rate estimate is computed (step s20) to determine whether to warn the user of potential OCR inaccuracies. The error rate estimate is computed by measuring the blur and/or noise in the text objects located at steps s18 or s19. The blur of an image can be measure using a technique as disclosed by Lagendijk et al., in “Maximum Likelihood Image and Blur Identification: A Unifying Approach,” Optical Engineering, May 1990, pp. 422-435, which is incorporated herein by reference. The noise can be measured using a technique as disclosed by Galatsanos et al., in “Methods for Choosing the Regularization Parameter and Estimating the Noise Variance in Image Restoration and Their Relation,” IEEE Trans. on Image Processing, July 1992, pp. 322-336, which is incorporate herein by reference.
- In addition, the error rate estimate can be supplemented by measuring the contrast and the text size of text objects located at step s19. The contrast of text objects can be measured from a histogram of windowed variance. A histogram of windowed variance can be generated by computing the variance of windows of pixels (e.g., between 7×7 and 20×20 pixels) in a captured image. Subsequently, a threshold value is computed from this histogram. The threshold value is chosen to discriminate between high and low variance. One method for determining a suitable threshold value between high and low variance is the Otsu thresholding method, which is disclosed by Trier et al., in “Goal-Directed Evaluation Of Binarization Methods,” IEEE Transactions On Pattern Analysis and Machine Intelligence, Vol. 17, No. 12, pp. 1191-1201, 1995, which is incorporated herein by reference. Finally, the ratio of the mean variance of windows identified as having a high variance to the mean variance of the windows identified as having a low variance is computed. This ratio provides an approximate signal to noise ratio that can then be used as an estimate of image contrast.
- Furthermore at step s19, an approximate value for text size can be found during de-skewing when there are several lines of text. For example, this can be done by computing the average distance, in pixels, between peaks in the pixel value sums (i.e., the sum of the pixel values on each scanline), to gain the line-to-line distance in pixels. Because of inter-line gaps, text size will typically be slightly less than this distance. It will be appreciated by those skilled in the art that there exist other methods for establishing what the value of “slightly less” should be. If the error rate estimate measured at step s20 exceeds a predetermined threshold value (step s22), then a warning indicator is displayed on viewfinder 4 (step s24). The warning indicator displayed on
viewfinder 4 at step s24 is a text message, an error symbol, or a warning light. Alternatively, the warning indicator is an audible signal output throughspeaker 30. - Returning to FIG. 3, it can be seen that once the CAPTURE routine is completed, state two (2) is reached. In state two (2), any movement of the cursor crosshairs position by the user using
pointing device 10 invokes an UPDATE routine. FIG. 12 shows the processing steps for implementing the UPDATE routine referenced in FIG. 3. If theportable imaging device 2 is in image mode (s100), the routine terminates; otherwise, the routine continues at step s101. First, the X,Y coordinates (i.e., the current coordinates) of thepointing device 10 are read (step s101); and the cursor crosshairs are redrawn (step s102) at the current X,Y coordinates. Then, the current X,Y coordinates are rotated (step s104) by the negative value of SkewAngle, and the results stored in Deskewed-Current-X and Deskewed-Current-Y. Next, the word bounding box containing Deskewed-Start-X and Deskewed Start-Y are located and stored in Start-Word (step s106). This step is then repeated (step s108), but using Deskewed-Current-X and Deskewed-Current-Y, and the result stored in Current-Word. To display feedback to the user, images of the text are displayed (step s110) in which all words from Start-Word to Current-Word are highlighted (e.g., reversed out). FIG. 13 shows the image (containing highlighted text) displayed in the viewfinder after performing the UPDATE routine of FIG. 12. - Returning to FIG. 3, it will be seen that while the system is in state two (2), a shutter release operation by the user causes the re-initialization of the system, and a return to state one (1). In contrast, a full-press of the
shutter release button 6 causes a state transition and the execution of the OCR routine on text selected by the user of the portable imaging device with thepointing device 10. However, before performing the steps of the OCR routine, which are set forth in detail in FIG. 14, a determination is made as to whether the cursor crosshairs position has moved since the shutter was half-pressed (i.e., while in state two (2)). If the cursor crosshairs position did not move while in state two (2), then the UPDATE routine set forth in FIG. 12 is invoked before invoking the OCR routine; otherwise, the OCR routine is immediately invoked. - FIG. 14 shows the processing steps for implementing the OCR routine referenced in FIG. 3. If the
portable imaging device 2 is in image mode (s120), the routine terminates; otherwise, it continues at step s121. Initially (step s121), the selected region (including the text matter the user wishes to convert) is copied from deskewed store into “text store” in thememory 25 for subsequent processing. Subsequently, the image in text store is thresholded (step s122) to generate a binary image, using techniques known in the art, as disclosed for example in U.S. patent application Ser. No. 09/081,269, which is hereby incorporated by reference. As disclosed therein, such conversion may include resolution enhancement. The resulting binary image is then passed to an OCR application, such as TextBridge®, to convert the binary image to (ASCII) text for further use and/or manipulation (step s124). If in auto-translate mode or select-language mode (step s126), the text output from the OCR application is translated (step s128). The text identified at step s124, whether translated at step s126 or not, is stored in text store (step s129). Referring again to FIG. 3, once the OCR routine completes the system transitions to state three (3). - As can be seen in FIG. 3, while the system is in states three (3) or four (4), a shutter release operation by the user causes a STORE routine to be invoked, followed by the re-initialization of the system, and a return to state one (1). FIG. 15 shows the processing steps for implementing the STORE routine of FIG. 3. First (step s130), if the
- As can be seen in FIG. 3, while the system is in states three (3) or four (4), a shutter release operation by the user causes a STORE routine to be invoked, followed by the re-initialization of the system and a return to state one (1). FIG. 15 shows the processing steps for implementing the STORE routine of FIG. 3. First (step s130), if the portable imaging device 2 is in image mode, the routine jumps to step s133; otherwise, step s131 is executed. At step s131, the OCRed (ASCII) text stored in the text store is copied to a location of the memory 25 identified as a “text buffer” for later readout (e.g., through uploading to the user's PC). Next (step s132), if the portable imaging device is set to image-plus-text mode, step s133 is performed; otherwise, the routine terminates. At step s133, the image contents of the image store are copied to a location in the memory 25 identified as an “image buffer” for later readout, for example, to the user's computer coupled to output port 31.
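- The mode-dependent copying can be sketched as follows, where device is a hypothetical container holding the stores, buffers, and mode flag named above.

```python
def store_routine(device):
    """Sketch of FIG. 15: copy stores to buffers according to the device mode."""
    if device.mode != "image":                        # text or image-plus-text
        device.text_buffer = device.text_store        # step s131
    if device.mode in ("image", "image-plus-text"):
        device.image_buffer = device.image_store      # step s133
```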
- When the shutter release button 6 is held down in the full-press position by the user for a time-out period, while the system is in state three (3), a DISPLAY TEXT routine is invoked to display the OCRed results in the viewfinder, as set forth in FIG. 3. FIG. 16 shows the processing steps for implementing the DISPLAY TEXT routine referenced in FIG. 3. If the portable imaging device 2 is in image mode (s150), the routine terminates; otherwise, the routine continues at step s151. The displayed text, which corresponds to the content of the text store, is first merged (step s151) into the image displayed on the viewfinder 4. Then, the start-line is set (step s152) to one, in case of further operations such as scrolling through the image. FIG. 17 illustrates one manner in which to present the results in the text store to the user on the viewfinder 4. As illustrated in FIG. 17, the results stored in the text store are overlaid on the original image stored in the image store.
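- One plausible rendering of the merge in step s151 uses Pillow to overlay the text store's contents on the image store's image; the coordinates and styling are illustrative assumptions.

```python
from PIL import Image, ImageDraw

def display_text(image_store: Image.Image, text_store: str,
                 start_line: int = 1) -> Image.Image:
    """Sketch of FIG. 16: merge text onto the viewfinder image (step s151).

    start_line is initialized to one (step s152) for later scrolling."""
    frame = image_store.copy()
    draw = ImageDraw.Draw(frame)
    lines = text_store.splitlines()[start_line - 1:]
    draw.multiline_text((10, 10), "\n".join(lines), fill="white")
    return frame
```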
- Once the DISPLAY TEXT routine is completed, the system transitions to state four (4), as illustrated in FIG. 3. While in this state, the user, by moving the cursor crosshairs position via the pointing device 10, can scroll through the text displayed in the viewfinder 4. As set forth in FIG. 3, any movement of the cursor crosshairs position invokes the SCROLL routine. FIG. 18 shows the processing steps for implementing the SCROLL routine referenced in FIG. 3. If the portable imaging device 2 is in image mode (s170), the routine terminates; otherwise, step s171 is performed. Initially, a test is made (step s171) to determine whether the cursor crosshairs position has moved up or down from its prior position. If there is movement, the Start-line is incremented or decremented accordingly (step s172). Using the new Start-line, the text from the text store is merged for display on the viewfinder 4 (step s174).
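- A minimal sketch of the Start-line bookkeeping in steps s171 through s174 follows; the caller is assumed to re-merge the text (as in the display sketch above) after each adjustment.

```python
def scroll(start_line: int, cursor_dy: int, total_lines: int) -> int:
    """Sketch of FIG. 18: adjust Start-line from vertical crosshair motion."""
    if cursor_dy < 0:                                  # crosshairs moved up
        start_line = max(1, start_line - 1)
    elif cursor_dy > 0:                                # crosshairs moved down
        start_line = min(total_lines, start_line + 1)
    return start_line  # caller re-merges text, e.g., display_text(..., start_line)
```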
- FIGS. 19 to 22 illustrate the images displayed in the viewfinder after executing a single word selection routine in accordance with one embodiment of the invention. In operation, the single word selection routine uses the pointing device 10 or shutter release button 6 to emulate a double mouse-button click on a conventional computer (preceded, if necessary, by a suitable mode selection, i.e., Single-Word, by the user). Thus, with the cursor crosshairs centered in the display of the viewfinder 4, a single word in a document, or more likely on a distant object (seen at a distance in FIG. 19), may be selected and converted. For example, with the crosshairs coincident with the word ("FERFI" in FIG. 20, which is Hungarian for "MEN"), the double click selects the word, and holding the shutter release down until a time-out (e.g., a second or two) causes an image of the word to be captured and OCRed.
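- The hit-test behind single-word selection might look like the following sketch, assuming word bounding boxes of the kind used in the UPDATE sketch; the time-out value is an assumed example.

```python
def select_single_word(boxes, viewfinder_size, hold_seconds,
                       timeout: float = 1.5):
    """Sketch of single-word selection: pick the word under the centered
    crosshairs once the shutter has been held past the time-out."""
    cx, cy = viewfinder_size[0] / 2, viewfinder_size[1] / 2
    hit = next((b for b in boxes
                if b.x0 <= cx <= b.x1 and b.y0 <= cy <= b.y1), None)
    if hit is not None and hold_seconds >= timeout:
        return hit  # caller crops this region and passes it to OCR
    return None
```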
viewfinder 4, as shown in FIG. 21. In another embodiment, when the portable imaging device 2 is in translate mode, which is available when in text mode or image-plus-text mode, the portable imaging device 2 translates the word after OCRing into a desired language and displays it in the top left corner of the viewfinder 4, as illustrated in FIG. 22. When in select-language mode, the language to translate to is specified by the user on a menu displayed on the viewfinder 4. In contrast, when in auto-translate mode, clues as to which language to translate from are provided by coordinates received from GPS 23 through interface 28e, or by a language guesser as disclosed, for example, in "Comparing Two Language Identification Schemes," Proceedings of the 3rd International Conference on the Statistical Analysis of Textual Data (JADT'95), Rome, Italy, December 1995. The language from which to translate can be specified using a default value stored in memory 25 and/or by the user on a menu displayed on the viewfinder 4.
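- A heavily hedged sketch of source-language guessing in auto-translate mode follows; the reverse-geocoding callable and the country-to-language table are illustrative placeholders, not components disclosed in the patent.

```python
def guess_source_language(gps_coords, country_of, default: str = "en") -> str:
    """Guess the language to translate from using GPS coordinates as a clue,
    falling back to a stored default value."""
    table = {"HU": "hu", "FR": "fr", "DE": "de"}  # sample mapping only
    country = country_of(gps_coords)  # hypothetical reverse-geocoder
    return table.get(country, default)
```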
viewfinder 4. In this alternate embodiment, audio data stored in the memory 25, representing the word identified after OCRing or after translation into a desired language, is output through speaker 30. In yet another alternate embodiment, this function is combined with a rangefinder to determine the distance of the recorded text from the user and to generate speech that combines both pieces (i.e., text and distance) of information. For example, an object in an image captured at 50 feet and OCRed as "bus station" could be combined and output through speaker 30 as "50 feet from the bus station."
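- The text-plus-distance combination from the example above reduces to simple string assembly before speech synthesis; a sketch, with the function name being illustrative:

```python
def announce(distance_feet: float, ocr_text: str) -> str:
    """Combine rangefinder distance with OCRed text for speech output."""
    return f"{distance_feet:.0f} feet from the {ocr_text.strip().lower()}"

# announce(50, "BUS STATION") -> "50 feet from the bus station"
```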
- To recapitulate, the present invention involves integrating a finger-operated pointing device into a portable imaging device, which includes interactive segmentation (using the camera viewfinder for feedback) and OCR applications. In the digital imaging device, images containing textual and formatting constructs are captured by an imaging unit and displayed by a display device. The integrated, user-operable pointing device allows a user of the portable imaging device to determine whether textual and formatting content in the image can be properly analyzed by OCR applications.
- In one embodiment, the portable imaging device is operated by performing the steps of: (a) displaying successive images captured by the imaging unit on the display device, each image being defined by grayscale and/or color data; (b) receiving a first user input defining the start of a selection and a first position within the displayed image; (c) in response to the first user input, freezing the displayed image; (d) receiving at least one further user input, including a final user input defining the end of the selection; (e) extracting from the frozen displayed image a selected image having extremities defined by the first and final user inputs; and (f) performing an optical character recognition operation on data defining the selected image to generate text data, the text data defining text corresponding to text matter within the selected image.
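- Taken together, steps (a) through (f) suggest a pipeline along these lines; the device methods are hypothetical stand-ins for the hardware and event handling, and the helpers from the earlier sketches would fill in the OCR step.

```python
def capture_text(device, ocr_fn):
    """Sketch of steps (a)-(f); assumes the first input lies above and to
    the left of the final input so the crop box is well-formed."""
    frame = device.live_frame()           # (a) successive images displayed
    first_xy = device.wait_first_input()  # (b) start of selection
    device.freeze(frame)                  # (c) freeze the displayed image
    final_xy = device.wait_final_input()  # (d) end of selection
    region = frame.crop((*first_xy, *final_xy))  # (e) extract selected image
    return ocr_fn(region)                 # (f) OCR the selected image
```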
- The portable imaging device has the following advantages: 1) text is scanned, OCRed, and visually checked on the spot, so that any problems with image quality are discovered at once rather than later when uploading to a PC; 2) the pointing device allows just the required portion of the document image to be selected and stored, and interactive segmentation allows just the words or paragraphs of interest to be selected; 3) the ability to store as text allows many more document pages to be stored locally before uploading; 4) lengthy documents can be captured with the aid of a recirculating document feeder; and 5) text can be captured off physical objects, e.g., serial numbers off product labels, or names off signs or conference badges.
- The invention has been described with reference to a particular embodiment. Modifications and alterations will occur to others upon reading and understanding this specification taken together with the drawings. The embodiments are but examples, and various alternatives, modifications, variations, or improvements may be made by those skilled in the art from this teaching, which are intended to be encompassed by the following claims.
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/214,291 US20020191847A1 (en) | 1998-05-06 | 2002-08-08 | Portable text capturing method and device therefor |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB9809679.5 | 1998-05-06 | ||
GBGB9809679.5A GB9809679D0 (en) | 1998-05-06 | 1998-05-06 | Portable text capturing method and device therefor |
US09/304,659 US6473523B1 (en) | 1998-05-06 | 1999-05-04 | Portable text capturing method and device therefor |
US10/214,291 US20020191847A1 (en) | 1998-05-06 | 2002-08-08 | Portable text capturing method and device therefor |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/304,659 Division US6473523B1 (en) | 1998-05-06 | 1999-05-04 | Portable text capturing method and device therefor |
Publications (1)
Publication Number | Publication Date |
---|---|
US20020191847A1 true US20020191847A1 (en) | 2002-12-19 |
Family
ID=10831547
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/304,659 Expired - Lifetime US6473523B1 (en) | 1998-05-06 | 1999-05-04 | Portable text capturing method and device therefor |
US10/214,291 Abandoned US20020191847A1 (en) | 1998-05-06 | 2002-08-08 | Portable text capturing method and device therefor |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/304,659 Expired - Lifetime US6473523B1 (en) | 1998-05-06 | 1999-05-04 | Portable text capturing method and device therefor |
Country Status (2)
Country | Link |
---|---|
US (2) | US6473523B1 (en) |
GB (1) | GB9809679D0 (en) |
Cited By (64)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020171664A1 (en) * | 2001-03-28 | 2002-11-21 | International Business Machines Corporation | Image rotation with substantially no aliasing error |
US20050243179A1 (en) * | 2002-05-28 | 2005-11-03 | Nikon Corporation | Electronic camera |
US20060072820A1 (en) * | 2004-10-05 | 2006-04-06 | Nokia Corporation | System and method for checking framing and sharpness of a digital image |
EP1703445A1 (en) * | 2004-01-08 | 2006-09-20 | NEC Corporation | Character recognition device, mobile communication system, mobile terminal device, fixed station device, character recognition method, and character recognition program |
US20060293874A1 (en) * | 2005-06-27 | 2006-12-28 | Microsoft Corporation | Translation and capture architecture for output of conversational utterances |
CN1310181C (en) * | 2004-09-15 | 2007-04-11 | 北京中星微电子有限公司 | Optical character identifying treating method for mobile terminal with camera |
US20070116363A1 (en) * | 2005-11-22 | 2007-05-24 | Fuji Xerox Co., Ltd. | Image processing device, image processing method, and storage medium storing image processing program |
US20070133874A1 (en) * | 2005-12-12 | 2007-06-14 | Xerox Corporation | Personal information retrieval using knowledge bases for optical character recognition correction |
WO2007082534A1 (en) * | 2006-01-17 | 2007-07-26 | Flemming Ast | Mobile unit with camera and optical character recognition, optionally for conversion of imaged text into comprehensible speech |
US20070234203A1 (en) * | 2006-03-29 | 2007-10-04 | Joshua Shagam | Generating image-based reflowable files for rendering on various sized displays |
US20080233980A1 (en) * | 2007-03-22 | 2008-09-25 | Sony Ericsson Mobile Communications Ab | Translation and display of text in picture |
US20080267535A1 (en) * | 2006-03-28 | 2008-10-30 | Goodwin Robert L | Efficient processing of non-reflow content in a digital image |
US20090050701A1 (en) * | 2007-08-21 | 2009-02-26 | Symbol Technologies, Inc. | Reader with Optical Character Recognition |
US20100023517A1 (en) * | 2008-07-28 | 2010-01-28 | V Raja | Method and system for extracting data-points from a data file |
US20100188419A1 (en) * | 2009-01-28 | 2010-07-29 | Google Inc. | Selective display of ocr'ed text and corresponding images from publications on a client device |
US20110014944A1 (en) * | 2009-07-18 | 2011-01-20 | Abbyy Software Ltd. | Text processing method for a digital camera |
US20110081083A1 (en) * | 2009-10-07 | 2011-04-07 | Google Inc. | Gesture-based selective text recognition |
US20110123115A1 (en) * | 2009-11-25 | 2011-05-26 | Google Inc. | On-Screen Guideline-Based Selective Text Recognition |
US7990556B2 (en) | 2004-12-03 | 2011-08-02 | Google Inc. | Association of a portable scanner with input/output and storage devices |
US8005720B2 (en) | 2004-02-15 | 2011-08-23 | Google Inc. | Applying scanned information to identify content |
US8023738B1 (en) | 2006-03-28 | 2011-09-20 | Amazon Technologies, Inc. | Generating reflow files from digital images for rendering on various sized displays |
US8081849B2 (en) | 2004-12-03 | 2011-12-20 | Google Inc. | Portable scanning and memory device |
US8146156B2 (en) | 2004-04-01 | 2012-03-27 | Google Inc. | Archive of text captures from rendered documents |
US8179563B2 (en) | 2004-08-23 | 2012-05-15 | Google Inc. | Portable scanning device |
US20120130704A1 (en) * | 2010-11-23 | 2012-05-24 | Inventec Corporation | Real-time translation method for mobile device |
US8261094B2 (en) | 2004-04-19 | 2012-09-04 | Google Inc. | Secure data gathering from rendered documents |
US20120330643A1 (en) * | 2010-06-04 | 2012-12-27 | John Frei | System and method for translation |
US8346620B2 (en) | 2004-07-19 | 2013-01-01 | Google Inc. | Automatic modification of web pages |
US8413048B1 (en) | 2006-03-28 | 2013-04-02 | Amazon Technologies, Inc. | Processing digital images including headers and footers into reflow content |
US8418055B2 (en) | 2009-02-18 | 2013-04-09 | Google Inc. | Identifying a document by performing spectral analysis on the contents of the document |
US8442813B1 (en) | 2009-02-05 | 2013-05-14 | Google Inc. | Methods and systems for assessing the quality of automatically generated text |
US8442331B2 (en) | 2004-02-15 | 2013-05-14 | Google Inc. | Capturing text from rendered documents using supplemental information |
US8447066B2 (en) | 2009-03-12 | 2013-05-21 | Google Inc. | Performing actions based on capturing information from rendered documents, such as documents under copyright |
US8447111B2 (en) | 2004-04-01 | 2013-05-21 | Google Inc. | Triggering actions in response to optically or acoustically capturing keywords from a rendered document |
US8489624B2 (en) | 2004-05-17 | 2013-07-16 | Google, Inc. | Processing techniques for text capture from a rendered document |
US8499236B1 (en) | 2010-01-21 | 2013-07-30 | Amazon Technologies, Inc. | Systems and methods for presenting reflowable content on a display |
US8572480B1 (en) | 2008-05-30 | 2013-10-29 | Amazon Technologies, Inc. | Editing the sequential flow of a page |
US8600196B2 (en) | 2006-09-08 | 2013-12-03 | Google Inc. | Optical scanners, such as hand-held optical scanners |
US8619147B2 (en) | 2004-02-15 | 2013-12-31 | Google Inc. | Handheld device for capturing text from both a document printed on paper and a document displayed on a dynamic display device |
US8620760B2 (en) | 2004-04-01 | 2013-12-31 | Google Inc. | Methods and systems for initiating application processes by data capture from rendered documents |
US8621349B2 (en) | 2004-04-01 | 2013-12-31 | Google Inc. | Publishing techniques for adding value to a rendered document |
US8620083B2 (en) | 2004-12-03 | 2013-12-31 | Google Inc. | Method and system for character recognition |
US8713418B2 (en) | 2004-04-12 | 2014-04-29 | Google Inc. | Adding value to a rendered document |
US8782516B1 (en) | 2007-12-21 | 2014-07-15 | Amazon Technologies, Inc. | Content style detection |
US8793162B2 (en) | 2004-04-01 | 2014-07-29 | Google Inc. | Adding information or functionality to a rendered document via association with an electronic counterpart |
US8799303B2 (en) | 2004-02-15 | 2014-08-05 | Google Inc. | Establishing an interactive environment for rendered documents |
US8874504B2 (en) | 2004-12-03 | 2014-10-28 | Google Inc. | Processing techniques for visual capture data from a rendered document |
US8892495B2 (en) | 1991-12-23 | 2014-11-18 | Blanding Hovenweep, Llc | Adaptive pattern recognition based controller apparatus and method and human-interface therefore |
US8903759B2 (en) * | 2004-12-03 | 2014-12-02 | Google Inc. | Determining actions involving captured information and electronic content associated with rendered documents |
US8990235B2 (en) | 2009-03-12 | 2015-03-24 | Google Inc. | Automatically providing content associated with captured information, such as information captured in real-time |
US9081799B2 (en) | 2009-12-04 | 2015-07-14 | Google Inc. | Using gestalt information to identify locations in printed information |
US9116890B2 (en) | 2004-04-01 | 2015-08-25 | Google Inc. | Triggering actions in response to optically or acoustically capturing keywords from a rendered document |
US20150242096A1 (en) * | 2003-04-18 | 2015-08-27 | International Business Machines Corporation | Enabling a visually impaired or blind person to have access to information printed on a physical document |
US9143638B2 (en) | 2004-04-01 | 2015-09-22 | Google Inc. | Data capture from rendered documents using handheld device |
US9208133B2 (en) | 2006-09-29 | 2015-12-08 | Amazon Technologies, Inc. | Optimizing typographical content for transmission and display |
US9229911B1 (en) | 2008-09-30 | 2016-01-05 | Amazon Technologies, Inc. | Detecting continuation of flow of a page |
US9251428B2 (en) | 2009-07-18 | 2016-02-02 | Abbyy Development Llc | Entering information through an OCR-enabled viewfinder |
US9268852B2 (en) | 2004-02-15 | 2016-02-23 | Google Inc. | Search engines and systems with handheld document data capture devices |
US9323784B2 (en) | 2009-12-09 | 2016-04-26 | Google Inc. | Image search using text-based elements within the contents of images |
US9454764B2 (en) | 2004-04-01 | 2016-09-27 | Google Inc. | Contextual dynamic advertising based upon captured rendered text |
US9535563B2 (en) | 1999-02-01 | 2017-01-03 | Blanding Hovenweep, Llc | Internet appliance system and method |
US10346703B2 (en) | 2014-11-06 | 2019-07-09 | Alibaba Group Holding Limited | Method and apparatus for information recognition |
US10769431B2 (en) | 2004-09-27 | 2020-09-08 | Google Llc | Handheld device for capturing text from both a document printed on paper and a document displayed on a dynamic display device |
US20220258027A1 (en) * | 2021-02-16 | 2022-08-18 | Caddie Snap, LLC | Scoring method and system |
Families Citing this family (61)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7428569B1 (en) * | 1999-05-14 | 2008-09-23 | Sony Corporation | Information processing apparatus, information processing method, and provision medium |
US7406214B2 (en) | 1999-05-19 | 2008-07-29 | Digimarc Corporation | Methods and devices employing optical sensors and/or steganography |
US7069508B1 (en) | 2000-07-13 | 2006-06-27 | Language Technologies, Inc. | System and method for formatting text according to linguistic, visual and psychological variables |
US7346489B1 (en) | 1999-07-16 | 2008-03-18 | Language Technologies, Inc. | System and method of determining phrasing in text |
JP3373811B2 (en) * | 1999-08-06 | 2003-02-04 | インターナショナル・ビジネス・マシーンズ・コーポレーション | Method and apparatus for embedding and detecting watermark information in black and white binary document image |
US8391851B2 (en) | 1999-11-03 | 2013-03-05 | Digimarc Corporation | Gestural techniques with wireless mobile phone devices |
US6640010B2 (en) | 1999-11-12 | 2003-10-28 | Xerox Corporation | Word-to-word selection on images |
US20010032070A1 (en) * | 2000-01-10 | 2001-10-18 | Mordechai Teicher | Apparatus and method for translating visual text |
US6823084B2 (en) * | 2000-09-22 | 2004-11-23 | Sri International | Method and apparatus for portably recognizing text in an image sequence of scene imagery |
JP4095243B2 (en) * | 2000-11-28 | 2008-06-04 | キヤノン株式会社 | A storage medium storing a URL acquisition and processing system and method and a program for executing the method. |
US20020101614A1 (en) * | 2001-01-29 | 2002-08-01 | Imes Edward Peter | Text only feature for a digital copier |
US7117430B2 (en) * | 2001-02-27 | 2006-10-03 | Microsoft Corporation | Spreadsheet error checker |
US6937747B2 (en) * | 2001-09-24 | 2005-08-30 | Hewlett Packard Development Company, L.P. | System and method for capturing non-audible information for processing |
US7053939B2 (en) * | 2001-10-17 | 2006-05-30 | Hewlett-Packard Development Company, L.P. | Automatic document detection method and system |
US6922487B2 (en) * | 2001-11-02 | 2005-07-26 | Xerox Corporation | Method and apparatus for capturing text images |
US20030113015A1 (en) * | 2001-12-18 | 2003-06-19 | Toshiaki Tanaka | Method and apparatus for extracting text information from moving image |
JP4566740B2 (en) * | 2002-08-07 | 2010-10-20 | パナソニック株式会社 | Mobile terminal device |
KR100515959B1 (en) * | 2002-10-29 | 2005-09-23 | 삼성테크윈 주식회사 | Method of controlling camera for user having defective sight |
US7194693B2 (en) * | 2002-10-29 | 2007-03-20 | International Business Machines Corporation | Apparatus and method for automatically highlighting text in an electronic document |
US20040210444A1 (en) * | 2003-04-17 | 2004-10-21 | International Business Machines Corporation | System and method for translating languages using portable display device |
JP4269811B2 (en) * | 2003-07-09 | 2009-05-27 | 株式会社日立製作所 | mobile phone |
US20050083411A1 (en) * | 2003-10-16 | 2005-04-21 | Cozier Robert P. | Device driven share system and method |
US20050083425A1 (en) * | 2003-10-16 | 2005-04-21 | Cozier Robert P. | Using audio in a customized menu |
WO2005048188A2 (en) * | 2003-11-11 | 2005-05-26 | Sri International | Method and apparatus for capturing paper-based information on a mobile computing device |
US7325735B2 (en) * | 2004-04-02 | 2008-02-05 | K-Nfb Reading Technology, Inc. | Directed reading mode for portable reading machine |
US8036895B2 (en) * | 2004-04-02 | 2011-10-11 | K-Nfb Reading Technology, Inc. | Cooperative processing for portable reading machine |
US7840033B2 (en) * | 2004-04-02 | 2010-11-23 | K-Nfb Reading Technology, Inc. | Text stitching from multiple images |
US8320708B2 (en) | 2004-04-02 | 2012-11-27 | K-Nfb Reading Technology, Inc. | Tilt adjustment for optical character recognition in portable reading machine |
US7659915B2 (en) * | 2004-04-02 | 2010-02-09 | K-Nfb Reading Technology, Inc. | Portable reading device with mode processing |
US7505056B2 (en) * | 2004-04-02 | 2009-03-17 | K-Nfb Reading Technology, Inc. | Mode processing in portable reading machine |
US9236043B2 (en) * | 2004-04-02 | 2016-01-12 | Knfb Reader, Llc | Document mode processing for portable reading machine enabling document navigation |
US20060020486A1 (en) * | 2004-04-02 | 2006-01-26 | Kurzweil Raymond C | Machine and method to assist user in selecting clothing |
US7641108B2 (en) * | 2004-04-02 | 2010-01-05 | K-Nfb Reading Technology, Inc. | Device and method to assist user in conducting a transaction with a machine |
US7627142B2 (en) * | 2004-04-02 | 2009-12-01 | K-Nfb Reading Technology, Inc. | Gesture processing with low resolution images with high resolution processing for optical character recognition for a reading machine |
US7629989B2 (en) * | 2004-04-02 | 2009-12-08 | K-Nfb Reading Technology, Inc. | Reducing processing latency in optical character recognition for portable reading machine |
US8873890B2 (en) * | 2004-04-02 | 2014-10-28 | K-Nfb Reading Technology, Inc. | Image resizing for optical character recognition in portable reading machine |
US8249309B2 (en) * | 2004-04-02 | 2012-08-21 | K-Nfb Reading Technology, Inc. | Image evaluation for reading mode in a reading machine |
US9811728B2 (en) * | 2004-04-12 | 2017-11-07 | Google Inc. | Adding value to a rendered document |
JP4574313B2 (en) * | 2004-10-04 | 2010-11-04 | キヤノン株式会社 | Image processing apparatus and method |
US20070150512A1 (en) * | 2005-12-15 | 2007-06-28 | Microsoft Corporation | Collaborative meeting assistant |
US8050498B2 (en) | 2006-07-21 | 2011-11-01 | Adobe Systems Incorporated | Live coherent image selection to differentiate foreground and background pixels |
US20080094496A1 (en) * | 2006-10-24 | 2008-04-24 | Kong Qiao Wang | Mobile communication terminal |
US8041555B2 (en) * | 2007-08-15 | 2011-10-18 | International Business Machines Corporation | Language translation based on a location of a wireless device |
TW200910875A (en) * | 2007-08-29 | 2009-03-01 | Inventec Appliances Corp | Method and system for instantly translating text within an image |
CN101122953B (en) * | 2007-09-21 | 2010-11-17 | 北京大学 | Picture words segmentation method |
US8725490B2 (en) * | 2007-10-18 | 2014-05-13 | Yahoo! Inc. | Virtual universal translator for a mobile device with a camera |
EP2189926B1 (en) * | 2008-11-21 | 2012-09-19 | beyo GmbH | Method for providing camera-based services using a portable communication device of a user and portable communication device of a user |
US20100128994A1 (en) * | 2008-11-24 | 2010-05-27 | Jan Scott Zwolinski | Personal dictionary and translator device |
KR20100064533A (en) * | 2008-12-05 | 2010-06-15 | 삼성전자주식회사 | Apparatus and method for automatic character resizing using camera |
US8938211B2 (en) | 2008-12-22 | 2015-01-20 | Qualcomm Incorporated | Providing and utilizing maps in location determination based on RSSI and RTT data |
US20100157848A1 (en) * | 2008-12-22 | 2010-06-24 | Qualcomm Incorporated | Method and apparatus for providing and utilizing local maps and annotations in location determination |
US8938355B2 (en) * | 2009-03-13 | 2015-01-20 | Qualcomm Incorporated | Human assisted techniques for providing local maps and location-specific annotated data |
KR20110051073A (en) * | 2009-11-09 | 2011-05-17 | 엘지전자 주식회사 | Method of executing application program in portable terminal |
US9092674B2 (en) * | 2011-06-23 | 2015-07-28 | International Business Machines Corporation | Method for enhanced location based and context sensitive augmented reality translation |
US9734132B1 (en) * | 2011-12-20 | 2017-08-15 | Amazon Technologies, Inc. | Alignment and reflow of displayed character images |
US9080882B2 (en) | 2012-03-02 | 2015-07-14 | Qualcomm Incorporated | Visual OCR for positioning |
US9858271B2 (en) * | 2012-11-30 | 2018-01-02 | Ricoh Company, Ltd. | System and method for translating content between devices |
US9037450B2 (en) | 2012-12-14 | 2015-05-19 | Microsoft Technology Licensing, Llc | Text overlay techniques in realtime translation |
KR20150060338A (en) * | 2013-11-26 | 2015-06-03 | 삼성전자주식회사 | Electronic device and method for recogniting character in electronic device |
CN104967749A (en) * | 2015-07-29 | 2015-10-07 | 努比亚技术有限公司 | Device and method for processing picture and text information |
US10701261B2 (en) | 2016-08-01 | 2020-06-30 | International Business Machines Corporation | Method, system and computer program product for selective image capture |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5402171A (en) * | 1992-09-11 | 1995-03-28 | Kabushiki Kaisha Toshiba | Electronic still camera with improved picture resolution by image shifting in a parallelogram arrangement |
US5473344A (en) * | 1994-01-06 | 1995-12-05 | Microsoft Corporation | 3-D cursor positioning device |
US5477264A (en) * | 1994-03-29 | 1995-12-19 | Eastman Kodak Company | Electronic imaging system using a removable software-enhanced storage device |
DE19509062C2 (en) * | 1994-05-11 | 1997-02-13 | Bruker Analytische Messtechnik | NMR sample holder and method for filling the sample holder |
TW347503B (en) * | 1995-11-15 | 1998-12-11 | Hitachi Ltd | Character recognition translation system and voice recognition translation system |
US5960114A (en) * | 1996-10-28 | 1999-09-28 | International Business Machines Corporation | Process for identifying and capturing text |
GB9711022D0 (en) * | 1997-05-28 | 1997-07-23 | Rank Xerox Ltd | Text/image selection from document images |
-
1998
- 1998-05-06 GB GBGB9809679.5A patent/GB9809679D0/en not_active Ceased
-
1999
- 1999-05-04 US US09/304,659 patent/US6473523B1/en not_active Expired - Lifetime
-
2002
- 2002-08-08 US US10/214,291 patent/US20020191847A1/en not_active Abandoned
Cited By (114)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8892495B2 (en) | 1991-12-23 | 2014-11-18 | Blanding Hovenweep, Llc | Adaptive pattern recognition based controller apparatus and method and human-interface therefore |
US9535563B2 (en) | 1999-02-01 | 2017-01-03 | Blanding Hovenweep, Llc | Internet appliance system and method |
US20020171664A1 (en) * | 2001-03-28 | 2002-11-21 | International Business Machines Corporation | Image rotation with substantially no aliasing error |
US7411593B2 (en) * | 2001-03-28 | 2008-08-12 | International Business Machines Corporation | Image rotation with substantially no aliasing error |
US7489343B2 (en) * | 2002-05-28 | 2009-02-10 | Nikon Corporation | Electronic camera |
US20050243179A1 (en) * | 2002-05-28 | 2005-11-03 | Nikon Corporation | Electronic camera |
US10614729B2 (en) | 2003-04-18 | 2020-04-07 | International Business Machines Corporation | Enabling a visually impaired or blind person to have access to information printed on a physical document |
US10276065B2 (en) * | 2003-04-18 | 2019-04-30 | International Business Machines Corporation | Enabling a visually impaired or blind person to have access to information printed on a physical document |
US20150242096A1 (en) * | 2003-04-18 | 2015-08-27 | International Business Machines Corporation | Enabling a visually impaired or blind person to have access to information printed on a physical document |
US20110081084A1 (en) * | 2004-01-08 | 2011-04-07 | Nec Corporation | Character recognition device, mobile communication system, mobile terminal device, fixed station device, character recognition method and character recognition program |
EP1703445A1 (en) * | 2004-01-08 | 2006-09-20 | NEC Corporation | Character recognition device, mobile communication system, mobile terminal device, fixed station device, character recognition method, and character recognition program |
EP1703445A4 (en) * | 2004-01-08 | 2011-07-27 | Nec Corp | Character recognition device, mobile communication system, mobile terminal device, fixed station device, character recognition method, and character recognition program |
US8135218B2 (en) | 2004-01-08 | 2012-03-13 | Nec Corporation | Character recognition device, mobile communication system, mobile terminal device, fixed station device, character recognition method and character recognition program |
US8619147B2 (en) | 2004-02-15 | 2013-12-31 | Google Inc. | Handheld device for capturing text from both a document printed on paper and a document displayed on a dynamic display device |
US8064700B2 (en) | 2004-02-15 | 2011-11-22 | Google Inc. | Method and system for character recognition |
US8019648B2 (en) | 2004-02-15 | 2011-09-13 | Google Inc. | Search engines and systems with handheld document data capture devices |
US9268852B2 (en) | 2004-02-15 | 2016-02-23 | Google Inc. | Search engines and systems with handheld document data capture devices |
US8005720B2 (en) | 2004-02-15 | 2011-08-23 | Google Inc. | Applying scanned information to identify content |
US8442331B2 (en) | 2004-02-15 | 2013-05-14 | Google Inc. | Capturing text from rendered documents using supplemental information |
US8799303B2 (en) | 2004-02-15 | 2014-08-05 | Google Inc. | Establishing an interactive environment for rendered documents |
US8515816B2 (en) | 2004-02-15 | 2013-08-20 | Google Inc. | Aggregate analysis of text captures performed by multiple users from rendered documents |
US8214387B2 (en) | 2004-02-15 | 2012-07-03 | Google Inc. | Document enhancement system and method |
US8831365B2 (en) | 2004-02-15 | 2014-09-09 | Google Inc. | Capturing text from rendered documents using supplement information |
US8447144B2 (en) | 2004-02-15 | 2013-05-21 | Google Inc. | Data capture from rendered documents using handheld device |
US9514134B2 (en) | 2004-04-01 | 2016-12-06 | Google Inc. | Triggering actions in response to optically or acoustically capturing keywords from a rendered document |
US8505090B2 (en) | 2004-04-01 | 2013-08-06 | Google Inc. | Archive of text captures from rendered documents |
US8793162B2 (en) | 2004-04-01 | 2014-07-29 | Google Inc. | Adding information or functionality to a rendered document via association with an electronic counterpart |
US8447111B2 (en) | 2004-04-01 | 2013-05-21 | Google Inc. | Triggering actions in response to optically or acoustically capturing keywords from a rendered document |
US9116890B2 (en) | 2004-04-01 | 2015-08-25 | Google Inc. | Triggering actions in response to optically or acoustically capturing keywords from a rendered document |
US8781228B2 (en) | 2004-04-01 | 2014-07-15 | Google Inc. | Triggering actions in response to optically or acoustically capturing keywords from a rendered document |
US9143638B2 (en) | 2004-04-01 | 2015-09-22 | Google Inc. | Data capture from rendered documents using handheld device |
US8620760B2 (en) | 2004-04-01 | 2013-12-31 | Google Inc. | Methods and systems for initiating application processes by data capture from rendered documents |
US8146156B2 (en) | 2004-04-01 | 2012-03-27 | Google Inc. | Archive of text captures from rendered documents |
US9454764B2 (en) | 2004-04-01 | 2016-09-27 | Google Inc. | Contextual dynamic advertising based upon captured rendered text |
US8621349B2 (en) | 2004-04-01 | 2013-12-31 | Google Inc. | Publishing techniques for adding value to a rendered document |
US8619287B2 (en) | 2004-04-01 | 2013-12-31 | Google Inc. | System and method for information gathering utilizing form identifiers |
US9633013B2 (en) | 2004-04-01 | 2017-04-25 | Google Inc. | Triggering actions in response to optically or acoustically capturing keywords from a rendered document |
US8713418B2 (en) | 2004-04-12 | 2014-04-29 | Google Inc. | Adding value to a rendered document |
US8261094B2 (en) | 2004-04-19 | 2012-09-04 | Google Inc. | Secure data gathering from rendered documents |
US9030699B2 (en) | 2004-04-19 | 2015-05-12 | Google Inc. | Association of a portable scanner with input/output and storage devices |
US8799099B2 (en) | 2004-05-17 | 2014-08-05 | Google Inc. | Processing techniques for text capture from a rendered document |
US8489624B2 (en) | 2004-05-17 | 2013-07-16 | Google, Inc. | Processing techniques for text capture from a rendered document |
US9275051B2 (en) | 2004-07-19 | 2016-03-01 | Google Inc. | Automatic modification of web pages |
US8346620B2 (en) | 2004-07-19 | 2013-01-01 | Google Inc. | Automatic modification of web pages |
US8179563B2 (en) | 2004-08-23 | 2012-05-15 | Google Inc. | Portable scanning device |
CN1310181C (en) * | 2004-09-15 | 2007-04-11 | 北京中星微电子有限公司 | Optical character identifying treating method for mobile terminal with camera |
US10769431B2 (en) | 2004-09-27 | 2020-09-08 | Google Llc | Handheld device for capturing text from both a document printed on paper and a document displayed on a dynamic display device |
WO2006038092A1 (en) * | 2004-10-05 | 2006-04-13 | Nokia Corporation | System and method for checking framing and sharpness of a digital image |
US20060072820A1 (en) * | 2004-10-05 | 2006-04-06 | Nokia Corporation | System and method for checking framing and sharpness of a digital image |
US7990556B2 (en) | 2004-12-03 | 2011-08-02 | Google Inc. | Association of a portable scanner with input/output and storage devices |
US8081849B2 (en) | 2004-12-03 | 2011-12-20 | Google Inc. | Portable scanning and memory device |
US8874504B2 (en) | 2004-12-03 | 2014-10-28 | Google Inc. | Processing techniques for visual capture data from a rendered document |
US8620083B2 (en) | 2004-12-03 | 2013-12-31 | Google Inc. | Method and system for character recognition |
US8903759B2 (en) * | 2004-12-03 | 2014-12-02 | Google Inc. | Determining actions involving captured information and electronic content associated with rendered documents |
US8953886B2 (en) | 2004-12-03 | 2015-02-10 | Google Inc. | Method and system for character recognition |
US20060293874A1 (en) * | 2005-06-27 | 2006-12-28 | Microsoft Corporation | Translation and capture architecture for output of conversational utterances |
US7991607B2 (en) * | 2005-06-27 | 2011-08-02 | Microsoft Corporation | Translation and capture architecture for output of conversational utterances |
US20070116363A1 (en) * | 2005-11-22 | 2007-05-24 | Fuji Xerox Co., Ltd. | Image processing device, image processing method, and storage medium storing image processing program |
AU2006235826B2 (en) * | 2005-11-22 | 2010-01-28 | Fujifilm Business Innovation Corp. | Image processing device, image processing method, and storage medium storing image processing program |
US20070133874A1 (en) * | 2005-12-12 | 2007-06-14 | Xerox Corporation | Personal information retrieval using knowledge bases for optical character recognition correction |
US7826665B2 (en) | 2005-12-12 | 2010-11-02 | Xerox Corporation | Personal information retrieval using knowledge bases for optical character recognition correction |
WO2007082534A1 (en) * | 2006-01-17 | 2007-07-26 | Flemming Ast | Mobile unit with camera and optical character recognition, optionally for conversion of imaged text into comprehensible speech |
US7961987B2 (en) | 2006-03-28 | 2011-06-14 | Amazon Technologies, Inc. | Efficient processing of non-reflow content in a digital image |
US8413048B1 (en) | 2006-03-28 | 2013-04-02 | Amazon Technologies, Inc. | Processing digital images including headers and footers into reflow content |
US8023738B1 (en) | 2006-03-28 | 2011-09-20 | Amazon Technologies, Inc. | Generating reflow files from digital images for rendering on various sized displays |
US20080267535A1 (en) * | 2006-03-28 | 2008-10-30 | Goodwin Robert L | Efficient processing of non-reflow content in a digital image |
US7966557B2 (en) * | 2006-03-29 | 2011-06-21 | Amazon Technologies, Inc. | Generating image-based reflowable files for rendering on various sized displays |
US8566707B1 (en) | 2006-03-29 | 2013-10-22 | Amazon Technologies, Inc. | Generating image-based reflowable files for rendering on various sized displays |
US20070234203A1 (en) * | 2006-03-29 | 2007-10-04 | Joshua Shagam | Generating image-based reflowable files for rendering on various sized displays |
US8600196B2 (en) | 2006-09-08 | 2013-12-03 | Google Inc. | Optical scanners, such as hand-held optical scanners |
US9208133B2 (en) | 2006-09-29 | 2015-12-08 | Amazon Technologies, Inc. | Optimizing typographical content for transmission and display |
EP2434433A3 (en) * | 2007-03-22 | 2012-09-12 | Sony Ericsson Mobile Communications AB | Translations and display of text in picture |
US8144990B2 (en) * | 2007-03-22 | 2012-03-27 | Sony Ericsson Mobile Communications Ab | Translation and display of text in picture |
US20080233980A1 (en) * | 2007-03-22 | 2008-09-25 | Sony Ericsson Mobile Communications Ab | Translation and display of text in picture |
US9773197B2 (en) * | 2007-03-22 | 2017-09-26 | Sony Corporation | Translation and display of text in picture |
US20120163668A1 (en) * | 2007-03-22 | 2012-06-28 | Sony Ericsson Mobile Communications Ab | Translation and display of text in picture |
US20180018544A1 (en) * | 2007-03-22 | 2018-01-18 | Sony Mobile Communications Inc. | Translation and display of text in picture |
CN102866991A (en) * | 2007-03-22 | 2013-01-09 | 索尼爱立信移动通讯股份有限公司 | Translation and display of text in picture |
US10943158B2 (en) | 2007-03-22 | 2021-03-09 | Sony Corporation | Translation and display of text in picture |
US20090050701A1 (en) * | 2007-08-21 | 2009-02-26 | Symbol Technologies, Inc. | Reader with Optical Character Recognition |
US8783570B2 (en) * | 2007-08-21 | 2014-07-22 | Symbol Technologies, Inc. | Reader with optical character recognition |
US8782516B1 (en) | 2007-12-21 | 2014-07-15 | Amazon Technologies, Inc. | Content style detection |
US8572480B1 (en) | 2008-05-30 | 2013-10-29 | Amazon Technologies, Inc. | Editing the sequential flow of a page |
US20100023517A1 (en) * | 2008-07-28 | 2010-01-28 | V Raja | Method and system for extracting data-points from a data file |
US9229911B1 (en) | 2008-09-30 | 2016-01-05 | Amazon Technologies, Inc. | Detecting continuation of flow of a page |
US8373724B2 (en) * | 2009-01-28 | 2013-02-12 | Google Inc. | Selective display of OCR'ed text and corresponding images from publications on a client device |
CN104134057A (en) * | 2009-01-28 | 2014-11-05 | 谷歌公司 | Selective display of OCR'ed text and corresponding images from publications on a client device |
US20100188419A1 (en) * | 2009-01-28 | 2010-07-29 | Google Inc. | Selective display of ocr'ed text and corresponding images from publications on a client device |
CN102301380A (en) * | 2009-01-28 | 2011-12-28 | 谷歌公司 | Selective display of ocr'ed text and corresponding images from publications on a client device |
US9280952B2 (en) | 2009-01-28 | 2016-03-08 | Google Inc. | Selective display of OCR'ed text and corresponding images from publications on a client device |
US8675012B2 (en) * | 2009-01-28 | 2014-03-18 | Google Inc. | Selective display of OCR'ed text and corresponding images from publications on a client device |
WO2010088182A1 (en) * | 2009-01-28 | 2010-08-05 | Google Inc. | Selective display of ocr'ed text and corresponding images from publications on a client device |
US8682648B2 (en) | 2009-02-05 | 2014-03-25 | Google Inc. | Methods and systems for assessing the quality of automatically generated text |
US8442813B1 (en) | 2009-02-05 | 2013-05-14 | Google Inc. | Methods and systems for assessing the quality of automatically generated text |
US8638363B2 (en) | 2009-02-18 | 2014-01-28 | Google Inc. | Automatically capturing information, such as capturing information using a document-aware device |
US8418055B2 (en) | 2009-02-18 | 2013-04-09 | Google Inc. | Identifying a document by performing spectral analysis on the contents of the document |
US9075779B2 (en) | 2009-03-12 | 2015-07-07 | Google Inc. | Performing actions based on capturing information from rendered documents, such as documents under copyright |
US8990235B2 (en) | 2009-03-12 | 2015-03-24 | Google Inc. | Automatically providing content associated with captured information, such as information captured in real-time |
US8447066B2 (en) | 2009-03-12 | 2013-05-21 | Google Inc. | Performing actions based on capturing information from rendered documents, such as documents under copyright |
US9251428B2 (en) | 2009-07-18 | 2016-02-02 | Abbyy Development Llc | Entering information through an OCR-enabled viewfinder |
US9055161B2 (en) * | 2009-07-18 | 2015-06-09 | Abbyy Development Llc | Text processing method for a digital camera |
US20110014944A1 (en) * | 2009-07-18 | 2011-01-20 | Abbyy Software Ltd. | Text processing method for a digital camera |
US8520983B2 (en) | 2009-10-07 | 2013-08-27 | Google Inc. | Gesture-based selective text recognition |
KR101304084B1 (en) | 2009-10-07 | 2013-09-10 | 구글 인코포레이티드 | Gesture-based selective text recognition |
US20110081083A1 (en) * | 2009-10-07 | 2011-04-07 | Google Inc. | Gesture-based selective text recognition |
US8515185B2 (en) * | 2009-11-25 | 2013-08-20 | Google Inc. | On-screen guideline-based selective text recognition |
US20110123115A1 (en) * | 2009-11-25 | 2011-05-26 | Google Inc. | On-Screen Guideline-Based Selective Text Recognition |
US9081799B2 (en) | 2009-12-04 | 2015-07-14 | Google Inc. | Using gestalt information to identify locations in printed information |
US9323784B2 (en) | 2009-12-09 | 2016-04-26 | Google Inc. | Image search using text-based elements within the contents of images |
US8499236B1 (en) | 2010-01-21 | 2013-07-30 | Amazon Technologies, Inc. | Systems and methods for presenting reflowable content on a display |
US20120330643A1 (en) * | 2010-06-04 | 2012-12-27 | John Frei | System and method for translation |
US20120130704A1 (en) * | 2010-11-23 | 2012-05-24 | Inventec Corporation | Real-time translation method for mobile device |
US10346703B2 (en) | 2014-11-06 | 2019-07-09 | Alibaba Group Holding Limited | Method and apparatus for information recognition |
US20220258027A1 (en) * | 2021-02-16 | 2022-08-18 | Caddie Snap, LLC | Scoring method and system |
Also Published As
Publication number | Publication date |
---|---|
US6473523B1 (en) | 2002-10-29 |
GB9809679D0 (en) | 1998-07-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6473523B1 (en) | Portable text capturing method and device therefor | |
JP2940960B2 (en) | Image tilt detection method and correction method, and image information processing apparatus | |
US6774889B1 (en) | System and method for transforming an ordinary computer monitor screen into a touch screen | |
US6178270B1 (en) | Method and apparatus for selecting text and image data from video images | |
JP2986383B2 (en) | Method and apparatus for correcting skew for line scan images | |
US7031553B2 (en) | Method and apparatus for recognizing text in an image sequence of scene imagery | |
CN101667251B (en) | OCR recognition method and device with auxiliary positioning function | |
JP5896245B2 (en) | How to crop a text image | |
CN108549643B (en) | Translation processing method and device | |
EP1091320A2 (en) | Processing multiple digital images | |
US20090040215A1 (en) | Interpreting Sign Language Gestures | |
RU2631765C1 (en) | Method and system of correcting perspective distortions in images occupying double-page spread | |
CN111091590A (en) | Image processing method, image processing device, storage medium and electronic equipment | |
CN112115936A (en) | Text recognition method and device, storage medium and electronic equipment | |
CN111612696B (en) | Image stitching method, device, medium and electronic equipment | |
CN111291661A (en) | Method and equipment for identifying text content of icons in screen | |
WO2001003416A1 (en) | Border eliminating device, border eliminating method, and authoring device | |
US5517586A (en) | Method and apparatus for automatically specifying a portion of text from a bitmap image of the text | |
CN114121179B (en) | Extraction method and extraction device of chemical structural formula | |
US20020012468A1 (en) | Document recognition apparatus and method | |
JP2985935B2 (en) | Handwritten character / graphic reader | |
Dave et al. | OCR Text Detector and Audio Convertor | |
JPH1153539A (en) | Circular pattern discriminating method and storage medium | |
CN116883461B (en) | Method for acquiring clear document image and terminal device thereof | |
JP2000011192A (en) | Inter-image positioning method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: JPMORGAN CHASE BANK, AS COLLATERAL AGENT, TEXAS Free format text: SECURITY AGREEMENT;ASSIGNOR:XEROX CORPORATION;REEL/FRAME:015134/0476 Effective date: 20030625 |
|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |
|
AS | Assignment |
Owner name: MAJANDRO LLC, TEXAS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:XEROX CORPORATION;REEL/FRAME:053258/0949 Effective date: 20200413 |
|
AS | Assignment |
Owner name: MILESTONE IP LLC, TEXAS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MAJANDRO LLC;REEL/FRAME:053543/0971 Effective date: 20200815 |
|
AS | Assignment |
Owner name: XEROX CORPORATION, CONNECTICUT Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:JPMORGAN CHASE BANK, N.A. AS SUCCESSOR-IN-INTEREST ADMINISTRATIVE AGENT AND COLLATERAL AGENT TO JPMORGAN CHASE BANK;REEL/FRAME:066728/0193 Effective date: 20220822 |