US20080131006A1 - Pure adversarial approach for identifying text content in images - Google Patents
- Publication number
- US20080131006A1 (U.S. application Ser. No. 11/893,921)
- Authority
- US
- United States
- Prior art keywords
- image
- blocks
- search term
- character
- ocr
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/414—Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L51/00—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
- H04L51/21—Monitoring or handling of messages
- H04L51/212—Monitoring or handling of messages using filtering or selective blocking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Definitions
- the present invention relates generally to computer security, and more particularly but not exclusively to methods and apparatus for identifying text content in images.
- Electronic mail (“email”) has become a common medium for sending messages over a computer network, such as the Internet.
- email is relatively convenient, fast, and cost-effective compared to traditional mail. It is thus no surprise that a lot of businesses and home computer users have some form of email access.
- unscrupulous advertisers, also known as “spammers,” have resorted to mass emailings of advertisements over the Internet. These mass emails, which are also referred to as “spam emails” or simply “spam,” are sent to computer users regardless of whether they asked for them or not. Spam includes any unsolicited email, not just advertisements. Spam is not only a nuisance, but also poses an economic burden.
- spam messages can be distinguished from normal or legitimate messages in at least two ways.
- one way is by the inappropriate content, e.g., words such as “Viagra”, “free”, “online prescriptions,” etc.
- such content may be detected by keyword and statistical filters (e.g., see Sahami M., Dumais S., Heckerman D., and Horvitz E., “A Bayesian Approach to Filtering Junk E-mail,” AAAI'98 Workshop on Learning for Text Categorization, 27 Jul. 1998, Madison, Wis.).
- another way is by the domains in the spam's URLs (uniform resource locators).
- the domains and links in the spam can be compared to databases of known bad domains and links (e.g., see <http://www.surbl.org/>).
- a spam email where the inappropriate content and URLs are embedded in an image may be harder to classify because the email itself does not contain obvious spammy textual content and does not have a link/domain that can be looked up in a database of bad links/domains.
- optical character recognition (OCR) may be employed to extract text embedded in images.
- the present invention provides a novel and effective approach for identifying content in an image even when the image has anti-OCR features.
- an image and a search term are input to a pure adversarial OCR module configured to search the image for presence of the search term.
- the image may be extracted from an email by an email processing engine.
- the OCR module may split the image into several character-blocks that each has a reasonable probability of containing a character (e.g., an ASCII character).
- the OCR module may form a sequence of blocks that represent a candidate match for the search term and estimate the probability of a match between the sequence of blocks and the search term.
- the OCR module may be configured to output whether or not the search term is found in the image and, if applicable, the location of the search term in the image.
- Embodiments of the present invention may be employed in a variety of applications including, but not limited to, antispam, anti-phishing, email scanning for confidential or prohibited information, etc.
- FIG. 1 shows an example image included in a spam.
- FIG. 2 shows text extracted from the image of FIG. 1 by optical character recognition.
- FIG. 3 shows a schematic diagram of a computer in accordance with an embodiment of the present invention.
- FIG. 4 shows a flow diagram of a method of identifying inappropriate text content in images in accordance with an embodiment of the present invention.
- FIG. 5 shows a flow diagram of a method of identifying inappropriate text content in images in accordance with another embodiment of the present invention.
- FIG. 6 shows a spam image included in an email and processed using the method of FIG. 5 .
- FIG. 7 shows inappropriate text content found in the spam image of FIG. 6 using the method of FIG. 5 .
- FIG. 8 shows a flow diagram of a method of identifying inappropriate text content in images in accordance with yet another embodiment of the present invention.
- FIGS. 9A and 9B illustrate conventional OCR processing.
- FIGS. 10A-10F show example images that contain anti-OCR features.
- FIGS. 11 , 14 , and 15 show example character-blocks.
- FIG. 12 shows a schematic diagram of a computer in accordance with an embodiment of the present invention.
- FIGS. 13A and 13B illustrate a pure adversarial OCR processing in accordance with an embodiment of the present invention.
- FIG. 1 shows an example image included in a spam.
- the spam image of FIG. 1 includes anti-OCR features in the form of an irregular background, fonts, and color scheme to confuse an OCR module.
- FIG. 2 shows the text extracted from the image of FIG. 1 using conventional OCR process.
- the anti-OCR features fooled the OCR module enough to make the text largely unintelligible, making it difficult to determine if the image contains inappropriate content, such as content commonly used in spam emails.
- the computer 300 may have fewer or more components to meet the needs of a particular application.
- the computer 300 may include a processor 101 , such as those from the Intel Corporation or Advanced Micro Devices, for example.
- the computer 300 may have one or more buses 103 coupling its various components.
- the computer 300 may include one or more user input devices 102 (e.g., keyboard, mouse), one or more data storage devices 106 (e.g., hard drive, optical disk, USB memory), a display monitor 104 (e.g., LCD, flat panel monitor, CRT), a computer network interface 105 (e.g., network adapter, modem), and a main memory 108 (e.g., RAM).
- the main memory 108 includes an antispam engine 320 , an OCR module 321 , expressions 322 , images 323 , and emails 324 .
- the components shown as being in the main memory 108 may be loaded from a data storage device 106 for execution or reading by the processor 101 .
- the emails 324 may be received over the Internet by way of the computer network interface 105 , buffered in the data storage device 106 , and then loaded onto the main memory 108 for processing by the antispam engine 320 .
- the antispam engine 320 may be stored in the data storage device 106 and then loaded onto the main memory 108 to provide antispam functionalities in the computer 300 .
- the antispam engine 320 may comprise computer-readable program code for identifying spam emails or other data with inappropriate content, which may comprise text that includes one or more words and phrases identified in the expressions 322 .
- the antispam engine 320 may be configured to extract an image 323 from an email 324 , use the OCR module 321 to extract text from the image 323 , and process the extracted text output to determine if the image 323 includes inappropriate content, such as an expression 322 .
- the antispam engine 320 may be configured to determine if one or more expressions in the expressions 322 are present in the extracted text.
- the antispam engine 320 may also be configured to directly process the image 323 , without having to extract text from the image 323 , to determine whether or not the image 323 includes inappropriate content. For example, the antispam engine 320 may directly compare the expressions 322 to sections of the image 323 . The antispam engine 320 may deem emails 324 with inappropriate content as spam.
- the OCR module 321 may comprise computer-readable program code for extracting text from an image.
- the OCR module 321 may be configured to receive an image in the form of an image file or other representation and process the image to generate text from the image.
- the OCR module 321 may comprise a conventional OCR module.
- the OCR module 321 is employed to extract embedded texts from the images 323 , which in turn are extracted from the emails 324 .
- the expressions 322 may comprise words, phrases, terms, or other character combinations or strings that may be present in spam images. Examples of such expressions may include “brokers,” “companyname” (particular companies), “currentprice,” “5daytarget,” “strongbuy,” “symbol,” “tradingalert” and so on.
- the expressions 322 may be obtained from samples of confirmed spam emails, for example.
- embodiments of the present invention are adversarial in that they select an expression from the expressions 322 and specifically look for the selected expression in the image, either directly or from the text output of the OCR module 321 . That is, instead of extracting text from an image and querying whether the extracted text is in a listing of expressions, embodiments of the present invention ask the question of whether a particular expression is in an image.
- the adversarial approach allows for better accuracy in identifying inappropriate content in images in that it focuses the search on a particular expression, allowing for more accurate reading of text embedded in images.
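The search-first control flow described above can be sketched as follows. This is an illustrative outline, not the patent's implementation: the function names and the toy scoring hook are assumptions, standing in for the shape-matching and sequence-scoring steps described later.

```python
# Sketch of the "pure adversarial" control flow: instead of OCR-ing the
# whole image and checking the output text, ask, for each known spammy
# expression, whether that expression appears in the image.
# `score_fn` is a hypothetical stand-in for the scoring steps.

def find_expressions_in_image(image, expressions, score_fn, threshold):
    """Return (expression, score) pairs whose match score clears the threshold."""
    hits = []
    for expression in expressions:
        score = score_fn(image, expression)  # higher = closer match
        if score > threshold:
            hits.append((expression, score))
    return hits

# Toy example: the "image" is a plain string and the score is simply the
# fraction of the expression's characters present in it (illustration only).
if __name__ == "__main__":
    fake_image = "strongbuy tradingalert"
    toy_score = lambda img, expr: sum(c in img for c in expr) / len(expr)
    print(find_expressions_in_image(
        fake_image, ["strongbuy", "viagra"], toy_score, 0.9))
```

In a real system the scoring hook would be one of the image- or OCR-text-based scoring procedures described below, and the threshold would be tuned empirically as the patent suggests.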
- the emails 324 may comprise emails received over the computer network interface 105 or other means.
- the images 323 may comprise images extracted from the emails 324 .
- the images 323 may be in any conventional image format including JPEG, TIFF, etc.
- FIG. 4 shows a flow diagram of a method 400 of identifying inappropriate text content in images in accordance with an embodiment of the present invention.
- FIG. 4 is explained using the components shown in FIG. 3 . Other components may also be used without detracting from the merits of the present invention.
- the method 400 starts after the antispam engine 320 extracts an image 323 from an email 324 .
- the antispam engine 320 selects an expression from the expressions 322 (step 401 ).
- the antispam engine 320 determines if there is a section of the image 323 that corresponds to the start and end of the selected expression (step 402 ). That is, the selected expression is used as a basis in finding a corresponding section.
- the antispam engine 320 may determine if the image 323 includes a section that looks similar to the selected expression 322 in terms of shape.
- the antispam engine 320 compares the selected expression 322 to the section to determine the closeness of the selected expression 322 to the section.
- this is performed by the antispam engine 320 by scoring the section against the selected expression (step 403 ).
- the score may reflect how close the selected expression 322 is to the section. For example, the higher the score, the higher the likelihood that the selected expression 322 matches the section.
- a minimum threshold indicative of the amount of correspondence required to obtain a match between an expression 322 and a section may be predetermined. The value of the threshold may be obtained and optimized empirically. If the score is higher than the threshold, the antispam engine 320 may deem the selected expression 322 as being close enough to the section that a match is obtained, i.e., the selected expression 322 is deemed found in the image 323 (step 404 ).
- the antispam engine 320 records that the selected expression was found at the location of the section in the image 323 .
- the antispam engine 320 may repeat the above-described process for each of the expressions 322 (step 405 ).
- a separate scoring procedure may be performed for all identified expressions 322 to determine whether or not the image is a spam image.
- the antispam engine 320 may employ conventional text-based algorithms to determine if the identified expressions 322 are sufficient to deem the image 323 a spam image.
- the email 324 from which a spam image was extracted may be deemed as spam.
- FIG. 5 shows a flow diagram of a method 500 of identifying inappropriate text content in images in accordance with another embodiment of the present invention.
- FIG. 5 is explained using the components shown in FIG. 3 . Other components may also be used without detracting from the merits of the present invention.
- the method 500 starts after the antispam engine 320 extracts an image 323 from an email 324 .
- the OCR module 321 then extracts text from the image, hereinafter referred to as “OCR text output” (step 501 ).
- the antispam engine 320 selects an expression from the expressions 322 (step 502 ). Using the selected expression as a reference, the antispam engine 320 finds an occurrence in the OCR text output that is suitably similar to the selected expression 322 (step 503 ). For example, the antispam engine 320 may find one or more occurrences in the OCR text output that could match the beginning and end of the selected expression 322 in terms of shape. Conventional shape matching algorithms may be employed to perform the step 503 .
- the antispam engine may employ the shape matching algorithm disclosed in the publication “Shape Matching and Object Recognition Using Shape Contexts”, S. Belongie, J. Malik, and J. Puzicha, IEEE Transactions on PAMI, Vol. 24, No. 4, April 2002.
- Other shape matching algorithms may also be employed without detracting from the merits of the present invention.
- the antispam engine 320 determines the closeness of the selected expression 322 to each found occurrence, such as by assigning a score indicative of how well the selected expression 322 matches each found occurrence in the OCR text output (step 504 ). For example, the higher the score, the higher the likelihood the selected expression 322 matches the found occurrence.
- the similarity between the selected expression 322 and a found occurrence may be scored, for example, using the edit distance algorithm or the Viterbi algorithm (e.g., see “Using Lexigraphical Distancing to Block Spam”, Jonathan Oliver, in Presentation of the Second MIT Spam Conference, Cambridge, Mass., 2005 and “Spam deobfuscation using a hidden Markov model”, Honglak Lee and Andrew Y. Ng, in Proceedings of the Second Conference on Email and Anti-Spam (CEAS 2005)). Other scoring algorithms may also be used without detracting from the merits of the present invention.
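The edit-distance option mentioned above can be sketched as follows. The normalization into a [0, 1] similarity is an assumption for illustration; the patent leaves the exact scoring formula and threshold to the implementer.

```python
# Minimal sketch of edit-distance scoring between a selected expression
# and an occurrence found in the OCR text output.

def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def similarity(expression: str, occurrence: str) -> float:
    """Score in [0, 1]; 1.0 means an exact match (assumed normalization)."""
    d = edit_distance(expression, occurrence)
    return 1.0 - d / max(len(expression), len(occurrence))

# OCR might misread "tradingalert" as "trad1ngaIert"; the score stays
# high despite the two misread characters:
print(similarity("tradingalert", "trad1ngaIert"))
```

A score like this would then be compared against the empirically tuned threshold described in the surrounding text.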
- a minimum threshold indicative of the amount of correspondence required to obtain a match between an expression 322 and a found occurrence may be predetermined.
- the value of the threshold may be obtained and optimized empirically. If the score of the step 504 is higher than the threshold, the antispam engine 320 may deem the selected expression 322 as being close enough to the occurrence that a match is obtained, i.e., the selected expression 322 is deemed found in the image 323 (step 505 ). In that case, the antispam engine 320 records that the selected expression was found at the location of the occurrence in the image 323 . For each image 323 , the antispam engine 320 may repeat the above-described process for each of the expressions 322 (step 506 ).
- a separate scoring procedure may be performed for all identified expressions 322 to determine whether or not the image is a spam image. For example, once the expressions 322 present in the image 323 have been identified, the antispam engine 320 may employ conventional text-based algorithms to determine if the identified expressions 322 are sufficient to deem the image 323 a spam image. The email 324 from which a spam image was extracted may be deemed as spam.
- FIG. 6 shows a spam image included in an email and processed using the method 500 .
- FIG. 7 shows the inappropriate text content found by the method 500 on the spam image of FIG. 6 . Note that the inappropriate text content, which is included in a list of expressions 322 , has been simplified for ease of processing by removing spaces between phrases.
- FIG. 8 shows a flow diagram of a method 800 of identifying inappropriate text content in images in accordance with yet another embodiment of the present invention.
- FIG. 8 is explained using the components shown in FIG. 3 . Other components may also be used without detracting from the merits of the present invention.
- the method 800 starts after the antispam engine 320 extracts an image 323 from an email 324 .
- the antispam engine 320 selects an expression from the expressions 322 (step 801 ).
- the antispam engine 320 finds a section in the image 323 that is suitably similar to the selected expression 322 (step 802 ).
- the antispam engine 320 may find a section in the image 323 that could match the beginning and end of the selected expression 322 in terms of shape.
- a shape matching algorithm such as that previously mentioned with reference to step 503 of FIG. 5 or other conventional shape matching algorithm, may be employed to perform the step 802 .
- the antispam engine 320 builds a text string directly (i.e., without first converting the image to text by OCR, for example) from the section of the image and then scores the text string against the selected expression to determine the closeness of the selected expression 322 to the found section (step 803 ). The higher the resulting score, the higher the likelihood the selected expression 322 matches the section.
- the antispam engine 320 may process the section of the image 323 between the potential start and end points that could match the selected expression 322 .
- the pixel blocks in between the potential start and end points are then assigned probabilities of being the characters under consideration (for example the characters in the ASCII character set).
- the pixel blocks in between the potential start and end points are then scored using the aforementioned edit distance algorithm or Viterbi algorithm to determine the similarity of the selected expression 322 to the found section.
- a minimum threshold indicative of the amount of correspondence required to obtain a match between an expression 322 and a found section may be predetermined.
- the value of the threshold may be obtained and optimized empirically. If the score of the similarity between the selected expression 322 and the found section of the image 323 is higher than the threshold, the antispam engine 320 may deem the selected expression 322 as being close enough to the found section that there is a match, i.e., the selected expression 322 is deemed found in the image 323 (step 804 ). In that case, the antispam engine 320 records that the selected expression was found at the location of the section in the image 323 .
- the antispam engine 320 may repeat the above-described process for each of the expressions 322 (step 805 ).
- a separate scoring procedure may be performed for all identified expressions 322 to determine whether or not an image is a spam image. For example, once the expressions 322 present in the image 323 have been identified, the antispam engine 320 may employ conventional text-based algorithms to determine if the identified expressions 322 are sufficient to deem the image 323 a spam image.
- the email 324 from which a spam image was extracted may be deemed as spam.
- embodiments of the present invention may be employed in applications other than antispam. This is because the above-disclosed techniques may be employed to identify text content in images in general, the images being present in various types of messages including emails, web page postings, electronic documents, and so on.
- the components shown in FIG. 3 may be configured for other applications including anti-phishing, identification of confidential information in emails, identification of communications that breach policies or regulations in emails, and other computer security applications involving identification of text content in images.
- links to phishing sites may be included in the expressions 322 .
- the antispam engine 320 may be configured to determine if an image included in an email has text content matching a link to a phishing site included in the expressions 322 .
- Confidential e.g., company trade secret information or intellectual property
- prohibited e.g., text content that is against policy or regulation
- FIGS. 9A and 9B illustrate conventional OCR processing 900 for identifying text content in an image.
- OCR processing 900 takes an image as an input and outputs text found in the image.
- the OCR processing 900 is similar to that of the GOCR and Tesseract OCR systems.
- FIG. 9B shows a flow diagram of the OCR processing 900 .
- the OCR processing 900 may be divided into several phases, labeled 901 - 906 in FIG. 9B .
- Phases 902 , 903 and 904 may be performed in different order depending on the OCR application. In some applications, phases 902 , 903 and 904 may be interspersed with each other.
- OCR processing 900 begins with processing the image to split it into one or more character-blocks or other regions, each character-block potentially representing one or more characters (phase 901 ).
- the character-blocks are then processed to identify the most likely character (e.g., letters, digits, or symbols) the character-blocks represent (phase 902 ).
- Phase 902 may be performed using a variety of techniques, including handcrafted code (e.g., see GOCR) or statistical approaches (e.g., see Cheng-Lin Liu and Hiromichi Fujisawa, “Classification and Learning for Character Recognition: Comparison of Methods and Remaining Problems”).
- Phase 902 will be most accurate if the character-blocks formed in phase 901 reflect single characters or the pixels set in the character-blocks are similar or match the pixels of the intended character.
- character-blocks that are difficult to identify in phase 902 may be grouped together into a single character-block or split apart into several character-blocks to make it easier to identify the possible character included in the character-block (phase 903 ).
- Character-blocks constituting a line of text are then identified (phase 904 ).
- a string is formed by concatenating the most likely characters represented (phase 905 ).
- a post processing step may be performed on the output (from phase 905 ), such as spell check and other correction steps (phase 906 ).
- embodiments of the present invention may be employed to identify terms, phrases, and other text in images in a variety of applications including in antispam, anti-phishing, and email processing to identify unauthorized emailing of confidential information or other information that breaches some policy or regulation.
- an email may be created to include anti-OCR features to defeat OCR-based approaches.
- Conventional OCR processing approaches, such as the OCR processing 900 , may be easily confused by these anti-OCR features, hence the need for the present invention.
- FIGS. 10A-10F show example images containing anti-OCR features.
- FIG. 10A shows an image with angled writing.
- FIG. 10B shows an image having a blurred background.
- FIG. 10C shows an image with cursive-like writing to make it difficult to form coherent character-blocks as in phase 901 of OCR processing 900 .
- forming coherent character-blocks is difficult in that case because the letters often touch at the bottom, so with this image, the character-blocks often contain two or more characters.
- FIG. 10D shows an image with underlined letters to lower the accuracy of identifying characters in character-blocks as in phase 902 of OCR processing 900 .
- FIG. 10D also has characters that go up and down to lower the accuracy of identifying character-blocks that constitute a line of text as in phase 904 of OCR processing 900 .
- FIG. 10E shows an image having dots and speckles to increase the number of potential character-blocks and to lower the accuracy of identifying characters in character-blocks as in phase 902 of OCR processing 900 , since the speckles and dots make it unclear which letter is intended.
- FIG. 10F shows an image with small gaps in the letters. For example, by clever use of a dark blue font, an OCR system may be tricked into identifying an “m” as two letters that look like an “n” and an “l” as in the pixel configuration of the character-block 941 of FIG. 11 .
- a pure adversarial OCR system may be employed to increase the accuracy of identifying search terms in images.
- a pure adversarial OCR system in accordance with an embodiment of the present invention is now described beginning with FIG. 12 .
- FIG. 12 shows a schematic diagram of a computer 930 in accordance with an embodiment of the present invention.
- the computer 930 is the same as the computer 300 of FIG. 3 , except for the use of an email processing engine 325 and a pure adversarial OCR module 326 instead of the antispam engine 320 and the OCR module 321 .
- the email processing engine 325 may comprise computer-readable program code for processing an email to perform one or more of a variety of applications including antispam, anti-phishing, checking for confidential or other information for regulation or policy enforcement, and so on.
- the email processing engine 325 may be configured to extract an image 323 from an email 324 and use the adversarial OCR module 326 to identify text in the image 323 .
- the email processing engine 325 may comprise conventional email processing software that uses OCR to identify text in images.
- the email processing engine 325 may comprise conventional antispam software that would receive an email, extract an image from the email, forward the image to the adversarial OCR module 326 to identify text in the image, and score the email based on the identified text.
- the pure adversarial OCR module 326 may comprise computer-readable program code for extracting search terms and expressions from an image using a pure adversarial OCR approach.
- the adversarial OCR module 326 may be configured to receive an image in the form of an image file or other representation from the email processing engine 325 (or other programs), and process the image to identify text present in the image.
- the adversarial OCR module 326 may process an image using a pure adversarial OCR processing 920 described with reference to FIGS. 13A and 13B .
- the other components of the computer 930 have already been described with reference to the computer 300 of FIG. 3 .
- FIGS. 13A and 13B illustrate the pure adversarial OCR processing 920 in accordance with an embodiment of the present invention.
- the pure adversarial OCR processing 920 takes as inputs an image and search terms, and outputs the search terms found (if any) in the image and location of found search terms in the image.
- the search terms comprise the expressions 322 . That is, the OCR processing 920 may take in an image and expressions 322 , look for the expressions 322 in the image, and provide information on the location of expressions 322 found in the image. This is in marked contrast to conventional OCR processing where an image is taken as an input and the OCR processing outputs text found in the image.
- the pure adversarial OCR processing 920 may be performed in multiple phases or steps, as shown in the flow diagram of FIG. 13B .
- processing 920 begins by splitting the input image into character-blocks or other regions potentially having characters.
- Each character-block may comprise pixel information of a single character (e.g., ASCII character) or multiple characters.
- Phase 921 may be performed using a variety of techniques without detracting from the merits of the present invention.
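One plausible way to perform phase 921 (the patent does not mandate a specific method) is to binarize the image, group connected dark pixels into components, and take each component's bounding box as a candidate character-block:

```python
# Hypothetical phase 921 sketch: connected-component analysis over a
# binarized image. Each component's bounding box becomes a candidate
# character-block; a block may hold one character or several.

def character_blocks(bitmap):
    """bitmap: list of rows of 0/1; returns bounding boxes (r0, c0, r1, c1)."""
    rows, cols = len(bitmap), len(bitmap[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if bitmap[r][c] and not seen[r][c]:
                # Flood-fill one connected component (4-connectivity).
                stack, r0, c0, r1, c1 = [(r, c)], r, c, r, c
                seen[r][c] = True
                while stack:
                    y, x = stack.pop()
                    r0, c0 = min(r0, y), min(c0, x)
                    r1, c1 = max(r1, y), max(c1, x)
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and bitmap[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                boxes.append((r0, c0, r1, c1))
    return boxes

# Two separate strokes yield two candidate character-blocks:
grid = [[1, 1, 0, 0, 1],
        [1, 0, 0, 0, 1],
        [0, 0, 0, 0, 1]]
print(character_blocks(grid))  # [(0, 0, 1, 1), (0, 4, 2, 4)]
```

Anti-OCR tricks such as touching letters or speckles would distort these components, which is why the later phases work with probabilities over blocks rather than hard character decisions.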
- in phase 922 , the probability that each character-block formed in phase 921 contains a character, such as various letters, digits, or symbols, is calculated. Note that phase 922 does not necessarily require identification of the particular character that may be present in a character-block. This advantageously makes the OCR processing 920 more robust compared to conventional OCR processing.
- Phase 922 may be performed using handcrafted code as in GOCR or by using statistical approaches (e.g., see Cheng-Lin Liu and Hiromichi Fujisawa, “Classification and Learning for Character Recognition: Comparison of Methods and Remaining Problems”).
- the character-block 942 might get assigned a reasonable probability (e.g., greater than 0.9) of being either the character “B”, “8”, or “&”.
- This probability calculation may be performed using a support vector machine (SVM) by training an SVM using annotated data sets, taking the SVM score, and then normalizing the SVM score to obtain a probability estimate.
- Other techniques for calculating the probability that the character-blocks contain characters may also be employed without detracting from the merits of the present invention.
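The SVM-score normalization mentioned above is commonly done with a Platt-style sigmoid fitted on annotated data. The sketch below assumes that approach; the coefficients are illustrative stand-ins for values fitted during training, not numbers from the patent.

```python
import math

# Sketch of turning a raw SVM decision score into a probability estimate
# (phase 922). Platt scaling fits a sigmoid to held-out annotated data;
# A and B below are hypothetical fitted coefficients.

def svm_score_to_probability(score, A=-1.5, B=0.0):
    """Platt-style sigmoid: P(char | block) = 1 / (1 + exp(A*score + B))."""
    return 1.0 / (1.0 + math.exp(A * score + B))

print(svm_score_to_probability(0.0))  # score on the margin -> 0.5
print(svm_score_to_probability(2.0))  # well above margin -> high probability
```

The resulting per-character probabilities are what phases 924 and 925 consume when scoring a candidate sequence of blocks against a search term.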
- Phase 923 is an optional phase.
- character-blocks that are difficult to identify in phase 922 may be grouped together into a single character-block or split apart into several character-blocks.
- for example, two adjacent character-blocks that are each difficult to identify can be combined.
- the character-blocks 943 and 944 may be merged into the character-block 941 of FIG. 11 .
- the probability that character-block 941 contains a character may then be recalculated. Similar rules may be applied to split a single character-block to several character-blocks.
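A merge rule for phase 923 might look like the following sketch. The low-confidence and proximity criteria, and the block representation as bounding-box dictionaries, are assumptions; the patent only states that hard-to-identify blocks may be combined or split and then re-scored.

```python
# Sketch of the optional phase 923 merge rule: if two adjacent blocks are
# each hard to identify (low best-character probability) and sit close
# together, combine them into one block so phase 922 can re-score it.
# `prob_fn` is a placeholder for the phase 922 classifier.

def merge_uncertain_blocks(blocks, prob_fn, low=0.5, max_gap=2):
    merged, i = [], 0
    while i < len(blocks):
        block = blocks[i]
        nxt = blocks[i + 1] if i + 1 < len(blocks) else None
        if (nxt is not None
                and prob_fn(block) < low and prob_fn(nxt) < low
                and nxt["x0"] - block["x1"] <= max_gap):
            # Combine the two uncertain neighbors into one wider block.
            merged.append({"x0": block["x0"], "x1": nxt["x1"]})
            i += 2
        else:
            merged.append(block)
            i += 1
    return merged

# Toy classifier: wide blocks are "confident", narrow ones are not.
prob = lambda b: 0.9 if b["x1"] - b["x0"] >= 5 else 0.3
blocks = [{"x0": 0, "x1": 3}, {"x0": 4, "x1": 7}, {"x0": 10, "x1": 20}]
print(merge_uncertain_blocks(blocks, prob))
```

An analogous rule could split a single wide, uncertain block into several candidates, as the text above notes.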
- in phase 924 , a candidate sequence of character-blocks is identified.
- Phase 924 may be performed by identifying one or more character-blocks that are likely to match the start of the search term, and identifying one or more character-blocks that are likely to match the end of the search term.
- in phase 925 , the similarity of the identified candidate sequence (from phase 924 ) to the input search terms is calculated. For example, a similarity score indicative of the similarity of a search term to the candidate sequence may be calculated and compared to a similarity threshold. The search term may be deemed to be present in the image if the similarity score is greater than the threshold.
- the threshold may be determined empirically, for example.
- Phase 925 may be performed using various techniques including a dynamic programming approach or the Viterbi algorithm (e.g., see “Dynamic Programming Algorithm for Sequence Alignment”, by Lloyd Allison at Internet URL <http://www.csse.monash.edu.au/~lloyd/tildeStrings/Notes/DPA.html>). Other techniques for evaluating similarities may also be used without detracting from the merits of the present invention.
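A dynamic programming realization of phase 925 might look like the sketch below. The use of log-probabilities and a fixed gap penalty are assumptions for illustration; the patent does not fix an exact scoring formula (its own worked example reports a final score of 13.96 on a different scale).

```python
import math

# Sketch of phase 925: align a candidate sequence of character-blocks
# (each with per-character probability estimates from phase 922) against
# a search term with dynamic programming. Gaps model speckle blocks or
# split/merged characters. GAP and the log-probability scoring are
# assumptions, not the patent's formula.

GAP = -2.0  # penalty for skipping a block or a search-term character

def sequence_score(blocks, term):
    """blocks: list of {char: probability}; returns the best alignment score."""
    n, m = len(blocks), len(term)
    dp = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if dp[i][j] == -math.inf:
                continue
            if i < n:  # skip this block (e.g., a speckle)
                dp[i + 1][j] = max(dp[i + 1][j], dp[i][j] + GAP)
            if j < m:  # skip this term character (e.g., merged blocks)
                dp[i][j + 1] = max(dp[i][j + 1], dp[i][j] + GAP)
            if i < n and j < m:  # match block i to term character j
                p = blocks[i].get(term[j], 1e-6)
                dp[i + 1][j + 1] = max(dp[i + 1][j + 1], dp[i][j] + math.log(p))
    return dp[n][m]

# Ambiguous blocks still align well to "sym" without ever deciding which
# single character each block "really" contains:
blocks = [{"s": 0.9, "5": 0.4},
          {"y": 0.8, "v": 0.5},
          {"m": 0.7, "n": 0.6}]
print(sequence_score(blocks, "sym"))  # ~ log(0.9) + log(0.8) + log(0.7)
```

Because the score aggregates probabilities over whole sequences, ambiguous characters (an “l” read as “n” plus a gap, for instance) degrade the score only slightly instead of breaking the match outright.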
- To illustrate phase 925, consider matching a candidate sequence of character-blocks that have the following probability estimates calculated in phase 922.
- In this example, the final score for the sequence of character-blocks against the search term "symbol" is 13.96. This final score may be good enough to deem the image as having the search term "symbol" in it.
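The scoring of phase 925 can be illustrated with a simplified sketch: given phase-922 style probability estimates for each character-block, one possible score for a candidate sequence is the sum, per position, of the estimate that the block contains the corresponding character of the search term. The block estimates below are invented for illustration, and a complete implementation would also handle the merges, splits, and gaps discussed for phase 923.

```python
def sequence_match_score(block_probs, term):
    """Score a candidate sequence of character-blocks against a search
    term without deciding which single character each block contains.
    block_probs: one dict per character-block, mapping candidate
    characters to phase-922 style probability estimates.
    Returns the sum of per-position estimates (0.0 on length mismatch);
    a simplified stand-in for the DP/Viterbi scoring of phase 925."""
    if len(block_probs) != len(term):
        return 0.0
    return sum(probs.get(ch, 0.0) for probs, ch in zip(block_probs, term))
```

For instance, a block that could plausibly be "s" or "5" still contributes its "s" estimate when matched against the first character of "symbol".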
- The location of "symbol" may be output by the processing 920 based on the location of the character-blocks forming the search term. That is, the location of the found search term is the location of the corresponding sequence of character-blocks in the image (e.g., defined by pixel location).
- The pure adversarial approach takes an image and search terms as inputs, and outputs the search terms found in the image and the locations of the search terms.
- Pure adversarial OCR processing does not necessarily require establishing which letter, digit, or symbol a character-block contains.
- In contrast, traditional OCR approaches require determination of which letter, digit, or symbol is in a character-block. This makes traditional OCR approaches vulnerable to anti-OCR features that use confusing and ambiguous characters, such as an upper case "I", a vertical bar, a lower case "L", and an exclamation point, to name a few examples.
- In phase 922 of the processing 920, distinguishing between the characters that may be in a character-block is not critical, and hence typically not performed. This is because the processing 920 does not require conversion of an image into text to determine if a search term is present in the image. The processing 920 allows for determination of whether or not a search term is present in an image by working directly with the image. Phase 925 of the processing 920 allows lines of text containing any of the aforementioned ambiguous characters to be matched to search terms without identifying the particular ambiguous character in a particular character-block.
Abstract
Description
- This application is a continuation-in-part of U.S. patent application Ser. No. 11/803,963, filed on May 16, 2007, which claims the benefit of U.S. Provisional Application No. 60/872,928, filed on Dec. 4, 2006.
- This application claims the benefit of U.S. Provisional Application No. 60/872,928, filed on Dec. 4, 2006.
- The above-identified U.S. Provisional and Patent Applications are incorporated herein by reference in their entirety.
- 1. Field of the Invention
- The present invention relates generally to computer security, and more particularly but not exclusively to methods and apparatus for identifying text content in images.
- 2. Description of the Background Art
- Electronic mail (“email”) has become a relatively common means of communication among individuals with access to a computer network, such as the Internet. Among its advantages, email is relatively convenient, fast, and cost-effective compared to traditional mail. It is thus no surprise that a lot of businesses and home computer users have some form of email access. Unfortunately, the features that make email popular also lead to its abuse. Specifically, unscrupulous advertisers, also known as “spammers,” have resorted to mass emailings of advertisements over the Internet. These mass emails, which are also referred to as “spam emails” or simply “spam,” are sent to computer users regardless of whether they asked for them or not. Spam includes any unsolicited email, not just advertisements. Spam is not only a nuisance, but also poses an economic burden.
- Previously, the majority of spam consisted of text and images that were linked to websites. More recently, spammers are sending spam with an image containing the inappropriate content (i.e., the unsolicited message). The reason for embedding inappropriate content in an image is that text-based spam messages can be distinguished from normal or legitimate messages in at least two ways. First, the inappropriate content (e.g., words such as “Viagra”, “free”, “online prescriptions,” etc.) can be readily detected by keyword and statistical filters (e.g., see Sahami M., Dumais S., Heckerman D., and Horvitz E., “A Bayesian Approach to Filtering Junk E-mail,” AAAI'98 Workshop on Learning for Text Categorization, 27 Jul. 1998, Madison, Wis.). Second, the domains in URLs (uniform resource locators) in the spam can be compared to databases of known bad domains and links (e.g., see Internet URL <http://www.surbl.org/>).
- In contrast, a spam email where the inappropriate content and URLs are embedded in an image may be harder to classify because the email itself does not contain obvious spammy textual content and does not have a link/domain that can be looked up in a database of bad links/domains.
- Using OCR (optical character recognition) techniques to identify spam images (i.e., images having embedded spammy content) has been proposed because OCR can be used to identify text in images. In general, use of OCR for anti-spam applications would involve performing OCR on an image to extract text from the image, scoring the extracted text, and comparing the score to a threshold to determine if the image contains spammy content. Examples of anti-spam applications that may incorporate OCR functionality include the SpamAssassin and Barracuda Networks spam filters. Spammers responded to OCR solutions in spam filters with images deliberately designed with anti-OCR features. Other approaches to combat spam images include flesh-tone analysis and use of regular expressions.
- The present invention provides a novel and effective approach for identifying content in an image even when the image has anti-OCR features.
- In one embodiment, an image and a search term are input to a pure adversarial OCR module configured to search the image for presence of the search term. The image may be extracted from an email by an email processing engine. The OCR module may split the image into several character-blocks, each of which has a reasonable probability of containing a character (e.g., an ASCII character). The OCR module may form a sequence of blocks that represent a candidate match for the search term and estimate the probability of a match between the sequence of blocks and the search term. The OCR module may be configured to output whether or not the search term is found in the image and, if applicable, the location of the search term in the image. Embodiments of the present invention may be employed in a variety of applications including, but not limited to, antispam, anti-phishing, email scanning for confidential or prohibited information, etc.
- These and other features of the present invention will be readily apparent to persons of ordinary skill in the art upon reading the entirety of this disclosure, which includes the accompanying drawings and claims.
- FIG. 1 shows an example image included in a spam.
- FIG. 2 shows text extracted from the image of FIG. 1 by optical character recognition.
- FIG. 3 shows a schematic diagram of a computer in accordance with an embodiment of the present invention.
- FIG. 4 shows a flow diagram of a method of identifying inappropriate text content in images in accordance with an embodiment of the present invention.
- FIG. 5 shows a flow diagram of a method of identifying inappropriate text content in images in accordance with another embodiment of the present invention.
- FIG. 6 shows a spam image included in an email and processed using the method of FIG. 5.
- FIG. 7 shows inappropriate text content found in the spam image of FIG. 6 using the method of FIG. 5.
- FIG. 8 shows a flow diagram of a method of identifying inappropriate text content in images in accordance with yet another embodiment of the present invention.
- FIGS. 9A and 9B illustrate conventional OCR processing.
- FIGS. 10A-10F show example images that contain anti-OCR features.
- FIGS. 11, 14, and 15 show example character-blocks.
- FIG. 12 shows a schematic diagram of a computer in accordance with an embodiment of the present invention.
- FIGS. 13A and 13B illustrate a pure adversarial OCR processing in accordance with an embodiment of the present invention.
- The use of the same reference label in different drawings indicates the same or like components.
- In the present disclosure, numerous specific details are provided, such as examples of apparatus, components, and methods, to provide a thorough understanding of embodiments of the invention. Persons of ordinary skill in the art will recognize, however, that the invention can be practiced without one or more of the specific details. In other instances, well-known details are not shown or described to avoid obscuring aspects of the invention.
-
FIG. 1 shows an example image included in a spam. The spam image of FIG. 1 includes anti-OCR features in the form of an irregular background, fonts, and color scheme to confuse an OCR module. FIG. 2 shows the text extracted from the image of FIG. 1 using a conventional OCR process. The anti-OCR features fooled the OCR module enough to make the text largely unintelligible, making it difficult to determine if the image contains inappropriate content, such as those commonly used in spam emails. - Referring now to
FIG. 3, there is shown a schematic diagram of a computer 300 in accordance with an embodiment of the present invention. The computer 300 may have fewer or more components to meet the needs of a particular application. The computer 300 may include a processor 101, such as those from the Intel Corporation or Advanced Micro Devices, for example. The computer 300 may have one or more buses 103 coupling its various components. The computer 300 may include one or more user input devices 102 (e.g., keyboard, mouse), one or more data storage devices 106 (e.g., hard drive, optical disk, USB memory), a display monitor 104 (e.g., LCD, flat panel monitor, CRT), a computer network interface 105 (e.g., network adapter, modem), and a main memory 108 (e.g., RAM). In the example of FIG. 3, the main memory 108 includes an antispam engine 320, an OCR module 321, expressions 322, images 323, and emails 324. The components shown as being in the main memory 108 may be loaded from a data storage device 106 for execution or reading by the processor 101. For example, the emails 324 may be received over the Internet by way of the computer network interface 105, buffered in the data storage device 106, and then loaded onto the main memory 108 for processing by the antispam engine 320. Similarly, the antispam engine 320 may be stored in the data storage device 106 and then loaded onto the main memory 108 to provide antispam functionalities in the computer 300. - The
antispam engine 320 may comprise computer-readable program code for identifying spam emails or other data with inappropriate content, which may comprise text that includes one or more words and phrases identified in the expressions 322. The antispam engine 320 may be configured to extract an image 323 from an email 324, use the OCR module 321 to extract text from the image 323, and process the extracted text output to determine if the image 323 includes inappropriate content, such as an expression 322. For example, the antispam engine 320 may be configured to determine if one or more expressions in the expressions 322 are present in the extracted text. The antispam engine 320 may also be configured to directly process the image 323, without having to extract text from the image 323, to determine whether or not the image 323 includes inappropriate content. For example, the antispam engine 320 may directly compare the expressions 322 to sections of the image 323. The antispam engine 320 may deem emails 324 with inappropriate content as spam. - The
OCR module 321 may comprise computer-readable program code for extracting text from an image. The OCR module 321 may be configured to receive an image in the form of an image file or other representation and process the image to generate text from the image. The OCR module 321 may comprise a conventional OCR module. In one embodiment, the OCR module 321 is employed to extract embedded texts from the images 323, which in turn are extracted from the emails 324. - The
expressions 322 may comprise words, phrases, terms, or other character combinations or strings that may be present in spam images. Examples of such expressions may include “brokers,” “companyname” (particular companies), “currentprice,” “5daytarget,” “strongbuy,” “symbol,” “tradingalert” and so on. The expressions 322 may be obtained from samples of confirmed spam emails, for example. - As will be more apparent below, embodiments of the present invention are adversarial in that they select an expression from the
expressions 322 and specifically look for the selected expression in the image, either directly or from the text output of the OCR module 321. That is, instead of extracting text from an image and querying whether the extracted text is in a listing of expressions, embodiments of the present invention ask the question of whether a particular expression is in an image. The adversarial approach allows for better accuracy in identifying inappropriate content in images in that it focuses the search on a particular expression, allowing for more accurate reading of text embedded in images. - The
emails 324 may comprise emails received over the computer network interface 105 or other means. The images 323 may comprise images extracted from the emails 324. The images 323 may be in any conventional image format including JPEG, TIFF, etc. -
FIG. 4 shows a flow diagram of a method 400 of identifying inappropriate text content in images in accordance with an embodiment of the present invention. FIG. 4 is explained using the components shown in FIG. 3. Other components may also be used without detracting from the merits of the present invention. - The
method 400 starts after the antispam engine 320 extracts an image 323 from an email 324. The antispam engine 320 then selects an expression from the expressions 322 (step 401). Using the selected expression as a reference, the antispam engine 320 determines if there is a section of the image 323 that corresponds to the start and end of the selected expression (step 402). That is, the selected expression is used as a basis in finding a corresponding section. For example, the antispam engine 320 may determine if the image 323 includes a section that looks similar to the selected expression 322 in terms of shape. The antispam engine 320 then compares the selected expression 322 to the section to determine the closeness of the selected expression 322 to the section. In one embodiment, this is performed by the antispam engine 320 by scoring the section against the selected expression (step 403). The score may reflect how close the selected expression 322 is to the section. For example, the higher the score, the higher the likelihood that the selected expression 322 matches the section. A minimum threshold indicative of the amount of correspondence required to obtain a match between an expression 322 and a section may be predetermined. The value of the threshold may be obtained and optimized empirically. If the score is higher than the threshold, the antispam engine 320 may deem the selected expression 322 as being close enough to the section that a match is obtained, i.e., the selected expression 322 is deemed found in the image 323 (step 404). In that case, the antispam engine 320 records that the selected expression was found at the location of the section in the image 323. For each image 323, the antispam engine 320 may repeat the above-described process for each of the expressions 322 (step 405). A separate scoring procedure may be performed for all identified expressions 322 to determine whether or not the image is a spam image.
For example, once the expressions 322 present in the image 323 have been identified, the antispam engine 320 may employ conventional text-based algorithms to determine if the identified expressions 322 are sufficient to deem the image 323 a spam image. The email 324 from which a spam image was extracted may be deemed as spam. -
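The per-expression flow of the method 400 (steps 401-405) can be sketched as follows. The callables find_section and score_section are hypothetical stand-ins for the shape-matching step and the scoring step; they are not named in this disclosure.

```python
def find_expressions_in_image(image, expressions, find_section, score_section, threshold):
    """Adversarial search (method 400 sketch): for each known expression,
    look for a candidate image section (step 402), score it (step 403),
    and record expressions whose score clears the threshold (step 404).
    Returns a list of (expression, section) pairs found in the image."""
    found = []
    for expression in expressions:               # steps 401 and 405 loop
        section = find_section(image, expression)
        if section is None:
            continue
        if score_section(section, expression) > threshold:
            found.append((expression, section))
    return found
```

A separate text-based scoring pass over the returned expressions would then decide whether the image as a whole is a spam image.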
FIG. 5 shows a flow diagram of a method 500 of identifying inappropriate text content in images in accordance with another embodiment of the present invention. FIG. 5 is explained using the components shown in FIG. 3. Other components may also be used without detracting from the merits of the present invention. - The
method 500 starts after the antispam engine 320 extracts an image 323 from an email 324. The OCR module 321 then extracts text from the image, hereinafter referred to as "OCR text output" (step 501). The antispam engine 320 selects an expression from the expressions 322 (step 502). Using the selected expression as a reference, the antispam engine 320 finds an occurrence in the OCR text output that is suitably similar to the selected expression 322 (step 503). For example, the antispam engine 320 may find one or more occurrences in the OCR text output that could match the beginning and end of the selected expression 322 in terms of shape. Conventional shape matching algorithms may be employed to perform the step 503. For example, the antispam engine may employ the shape matching algorithm disclosed in the publication "Shape Matching and Object Recognition Using Shape Contexts", S. Belongie, J. Malik, and J. Puzicha, IEEE Transactions on PAMI, Vol. 24, No. 4, April 2002. Other shape matching algorithms may also be employed without detracting from the merits of the present invention. - The
antispam engine 320 determines the closeness of the selected expression 322 to each found occurrence, such as by assigning a score indicative of how well the selected expression 322 matches each found occurrence in the OCR text output (step 504). For example, the higher the score, the higher the likelihood the selected expression 322 matches the found occurrence. The similarity between the selected expression 322 and a found occurrence may be scored, for example, using the edit distance algorithm or the Viterbi algorithm (e.g., see "Using Lexigraphical Distancing to Block Spam", Jonathan Oliver, in Presentation of the Second MIT Spam Conference, Cambridge, Mass., 2005 and "Spam deobfuscation using a hidden Markov model", Honglak Lee and Andrew Y. Ng, in Proceedings of the Second Conference on Email and Anti-Spam (CEAS 2005)). Other scoring algorithms may also be used without detracting from the merits of the present invention. - In the
method 500, a minimum threshold indicative of the amount of correspondence required to obtain a match between an expression 322 and a found occurrence may be predetermined. The value of the threshold may be obtained and optimized empirically. If the score of the step 504 is higher than the threshold, the antispam engine 320 may deem the selected expression 322 as being close enough to the occurrence that a match is obtained, i.e., the selected expression 322 is deemed found in the image 323 (step 505). In that case, the antispam engine 320 records that the selected expression was found at the location of the occurrence in the image 323. For each image 323, the antispam engine 320 may repeat the above-described process for each of the expressions 322 (step 506). A separate scoring procedure may be performed for all identified expressions 322 to determine whether or not the image is a spam image. For example, once the expressions 322 present in the image 323 have been identified, the antispam engine 320 may employ conventional text-based algorithms to determine if the identified expressions 322 are sufficient to deem the image 323 a spam image. The email 324 from which a spam image was extracted may be deemed as spam. -
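The edit distance scoring mentioned in step 504 may be illustrated with the standard Levenshtein dynamic program. This is a minimal sketch; a practical scorer would convert the distance into a similarity score and could weight visually confusable substitutions (e.g., "l" versus "1") less heavily.

```python
def edit_distance(a, b):
    """Levenshtein edit distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b,
    computed with a single rolling row of the DP table."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))            # dp[j] = distance(a[:0], b[:j])
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i         # prev holds the old diagonal cell
        for j in range(1, n + 1):
            cur = dp[j]
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[j] = min(dp[j] + 1,     # delete a[i-1]
                        dp[j - 1] + 1, # insert b[j-1]
                        prev + cost)   # substitute or match
            prev = cur
    return dp[n]
```

For example, "syrnbol" (an OCR misreading of "symbol" where "m" was read as "rn") is only two edits away from "symbol", so a distance-based threshold can still match it.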
FIG. 6 shows a spam image included in an email and processed using the method 500. FIG. 7 shows the inappropriate text content found by the method 500 on the spam image of FIG. 6. Note that the inappropriate text content, which is included in a list of expressions 322, has been simplified for ease of processing by removing spaces between phrases. -
FIG. 8 shows a flow diagram of a method 800 of identifying inappropriate text content in images in accordance with yet another embodiment of the present invention. FIG. 8 is explained using the components shown in FIG. 3. Other components may also be used without detracting from the merits of the present invention. - The
method 800 starts after the antispam engine 320 extracts an image 323 from an email 324. The antispam engine 320 then selects an expression from the expressions 322 (step 801). The antispam engine 320 finds a section in the image 323 that is suitably similar to the selected expression 322 (step 802). For example, the antispam engine 320 may find a section in the image 323 that could match the beginning and end of the selected expression 322 in terms of shape. A shape matching algorithm, such as that previously mentioned with reference to step 503 of FIG. 5 or other conventional shape matching algorithm, may be employed to perform the step 802. - The
antispam engine 320 builds a text string directly (i.e., without first converting the image to text by OCR, for example) from the section of the image and then scores the text string against the selected expression to determine the closeness of the selected expression 322 to the found section (step 803). The higher the resulting score, the higher the likelihood the selected expression 322 matches the section. For example, to identify the text string, the antispam engine 320 may process the section of the image 323 between the potential start and end points that could match the selected expression 322. The pixel blocks in between the potential start and end points (regions of connected pixels) are then assigned probabilities of being the characters under consideration (for example, the characters in the ASCII character set). The pixel blocks in between the potential start and end points are then scored using the aforementioned edit distance algorithm or Viterbi algorithm to determine the similarity of the selected expression 322 to the found section. - In the
method 800, a minimum threshold indicative of the amount of correspondence required to obtain a match between an expression 322 and a found section may be predetermined. The value of the threshold may be obtained and optimized empirically. If the score of the similarity between the selected expression 322 and the found section of the image 323 is higher than the threshold, the antispam engine 320 may deem the selected expression 322 as being close enough to the found section that there is a match, i.e., the selected expression 322 is deemed found in the image 323 (step 804). In that case, the antispam engine 320 records that the selected expression was found at the location of the section in the image 323. For each image 323, the antispam engine 320 may repeat the above-described process for each of the expressions 322 (step 805). A separate scoring procedure may be performed for all identified expressions 322 to determine whether or not an image is a spam image. For example, once the expressions 322 present in the image 323 have been identified, the antispam engine 320 may employ conventional text-based algorithms to determine if the identified expressions 322 are sufficient to deem the image 323 a spam image. The email 324 from which a spam image was extracted may be deemed as spam. - In light of the present disclosure, those of ordinary skill in the art will appreciate that embodiments of the present invention may be employed in applications other than antispam. This is because the above-disclosed techniques may be employed to identify text content in images in general, the images being present in various types of messages including emails, web page postings, electronic documents, and so on. For example, the components shown in
FIG. 3 may be configured for other applications including anti-phishing, identification of confidential information in emails, identification of communications that breach policies or regulations in emails, and other computer security applications involving identification of text content in images. For anti-phishing applications, links to phishing sites may be included in the expressions 322. In that case, the antispam engine 320 may be configured to determine if an image included in an email has text content matching a link to a phishing site included in the expressions 322. Confidential (e.g., company trade secret information or intellectual property) or prohibited (e.g., text content that is against policy or regulation) information may also be included in the expressions 322 so that the antispam engine 320 may determine if such information is present in an image included in an email message. -
FIGS. 9A and 9B illustrate conventional OCR processing 900 for identifying text content in an image. As shown in FIG. 9A, OCR processing 900 takes an image as an input and outputs text found in the image. The OCR processing 900 is similar to GOCR and Tesseract OCR systems. -
FIG. 9B shows a flow diagram of the OCR processing 900. The OCR processing 900 may be divided into several phases, labeled 901-906 in FIG. 9B. Phases 903 and 906 are optional. -
OCR processing 900 begins with processing the image to split it into one or more character-blocks or other regions, each character-block potentially representing one or more characters (phase 901). The character-blocks are then processed to identify the most likely character (e.g., letters, digits, or symbols) the character-blocks represent (phase 902). This phase, phase 902, may be performed using a variety of techniques including handcrafted code (e.g., see GOCR) or using statistical approaches (e.g., see Cheng-Lin Liu and Hiromichi Fujisawa, "Classification and Learning for Character Recognition: Comparison of Methods and Remaining Problems"). Phase 902 will be most accurate if the character-blocks formed in phase 901 reflect single characters or the pixels set in the character-blocks are similar or match the pixels of the intended character. - Optionally, character-blocks that are difficult to identify in
phase 902 may be grouped together into a single character-block or split apart into several character-blocks to make it easier to identify the possible character included in the character-block (phase 903). Character-blocks constituting a line of text are then identified (phase 904). For each line of text identified (in phase 904), a string is formed by concatenating the most likely characters represented (phase 905). Optionally, a post-processing step may be performed on the output (from phase 905), such as spell check and other correction steps (phase 906). - As can be appreciated, embodiments of the present invention may be employed to identify terms, phrases, and other text in images in a variety of applications including in antispam, anti-phishing, and email processing to identify unauthorized emailing of confidential information or other information that breaches some policy or regulation. In these applications, an email may be created to include anti-OCR features to defeat OCR-based approaches. Conventional OCR processing approaches,
such as the OCR processing 900, may be easily confused by these anti-OCR features, hence the need for the present invention. -
FIGS. 10A-10F show example images containing anti-OCR features. FIG. 10A shows an image with angled writing. FIG. 10B shows an image having a blurred background. FIG. 10C shows an image with cursive-like writing to make it difficult to form coherent character-blocks as in phase 901 of OCR processing 900. Forming coherent character-blocks is difficult in that case because the letters often touch at the bottom, so with this image the character-blocks often contain two or more characters. FIG. 10D shows an image with underlined letters to lower the accuracy of identifying characters in character-blocks as in phase 902 of OCR processing 900. The image of FIG. 10D also has characters that go up and down to lower the accuracy of identifying character-blocks that constitute a line of text as in phase 904 of OCR processing 900. FIG. 10E shows an image having dots and speckles to increase the number of potential character-blocks and to lower the accuracy of identifying characters in character-blocks as in phase 902 of OCR processing 900, since the speckles and dots make it unclear which letter is intended. FIG. 10F shows an image with small gaps in the letters. For example, by clever use of a dark blue font, an OCR system may be tricked into identifying an "m" as two letters that look like an "n" and an "l" as in the pixel configuration of the character-block 941 of FIG. 11. - A pure adversarial OCR system may be employed to increase the accuracy of identifying search terms in images. A pure adversarial OCR system in accordance with an embodiment of the present invention is now described beginning with
FIG. 12. -
FIG. 12 shows a schematic diagram of a computer 930 in accordance with an embodiment of the present invention. The computer 930 is the same as the computer 300 of FIG. 3, except for the use of an email processing engine 325 and a pure adversarial OCR module 326 instead of the antispam engine 320 and the OCR module 321. - The
email processing engine 325 may comprise computer-readable program code for processing an email to perform one or more of a variety of applications including antispam, anti-phishing, checking for confidential or other information for regulation or policy enforcement, and so on. The email processing engine 325 may be configured to extract an image 323 from an email 324 and use the adversarial OCR module 326 to identify text in the image 323. The email processing engine 325 may comprise conventional email processing software that uses OCR to identify text in images. For example, the email processing engine 325 may comprise conventional antispam software that would receive an email, extract an image from the email, forward the image to the adversarial OCR module 326 to identify text in the image, and score the email based on the identified text. - The pure
adversarial OCR module 326 may comprise computer-readable program code for extracting search terms and expressions from an image using a pure adversarial OCR approach. The adversarial OCR module 326 may be configured to receive an image in the form of an image file or other representation from the email processing engine 325 (or other programs), and process the image to identify text present in the image. The adversarial OCR module 326 may process an image using a pure adversarial OCR processing 920 described with reference to FIGS. 13A and 13B. The other components of the computer 930 have already been described with reference to the computer 300 of FIG. 3. -
FIGS. 13A and 13B illustrate the pure adversarial OCR processing 920 in accordance with an embodiment of the present invention. As shown in FIG. 13A, the pure adversarial OCR processing 920 takes as inputs an image and search terms, and outputs the search terms found (if any) in the image and the location of found search terms in the image. In one embodiment, the search terms comprise the expressions 322. That is, the OCR processing 920 may take in an image and expressions 322, look for the expressions 322 in the image, and provide information on the location of expressions 322 found in the image. This is in marked contrast to conventional OCR processing, where an image is taken as an input and the OCR processing outputs text found in the image. - The pure
adversarial OCR processing 920 may be performed in multiple phases or steps, as shown in the flow diagram of FIG. 13B. In phase 921, the processing 920 begins by splitting the input image into character-blocks or other regions potentially having characters. Each character-block may comprise pixel information of a single character (e.g., ASCII character) or multiple characters. One way of performing phase 921 is to:
- a) Grayscale the image.
- b) Determine pixels which are "set"; a set pixel is likely to be part of a character. This can be done by straightforward approaches such as selecting a threshold and defining any pixel with a value above this threshold as being set. Alternatively, a criterion based on the pixel value and surrounding pixels can be applied to determine if the pixel is set.
- c) Go through each pixel that is set; if the current pixel does not belong to an existing character-block, create a new character-block. Define all pixels that are connected to the current pixel through set pixels as belonging to the current character-block. Two pixels may be deemed connected if they are both set and they are adjacent either vertically or horizontally. Optionally, two pixels may also be deemed connected if they touch diagonally.
-
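The splitting steps above can be sketched as a connected-component pass over a grayscale image. This is a minimal illustration, not the patent's implementation; the function name, the 2-D-list image representation, and the fixed threshold of 128 are assumptions for the example, and only 4-connectivity (the non-optional rule in step c) is used.

```python
from collections import deque

def find_character_blocks(image, threshold=128):
    """Split a grayscale image (2-D list of 0-255 ints) into
    character-blocks: groups of connected "set" pixels.

    A pixel is "set" when its value exceeds the threshold (step b),
    and two set pixels belong to the same block when they are
    vertically or horizontally adjacent (step c, 4-connectivity).
    """
    rows, cols = len(image), len(image[0])
    is_set = [[image[r][c] > threshold for c in range(cols)] for r in range(rows)]
    visited = [[False] * cols for _ in range(rows)]
    blocks = []
    for r in range(rows):
        for c in range(cols):
            if is_set[r][c] and not visited[r][c]:
                # New character-block: flood-fill all connected set pixels.
                block, queue = [], deque([(r, c)])
                visited[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    block.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols \
                                and is_set[ny][nx] and not visited[ny][nx]:
                            visited[ny][nx] = True
                            queue.append((ny, nx))
                blocks.append(sorted(block))
    return blocks
```

Each returned block is a list of (row, column) pixel coordinates; two separate vertical strokes in an image therefore come back as two blocks.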
Phase 921 may also be performed using other techniques without detracting from the merits of the present invention. - In
phase 922, the probability that each character-block formed in phase 921 contains a character, such as various letters, digits, or symbols, is calculated. Note that phase 922 does not necessarily require identification of the particular character that may be present in a character-block. This advantageously makes the OCR processing 920 more robust compared to conventional OCR processing. -
Phase 922 may be performed using handcrafted code as in GOCR or by using statistical approaches (e.g., see Cheng-Lin Liu and Hiromichi Fujisawa, “Classification and Learning for Character Recognition: Comparison of Methods and Remaining Problems”). For example, referring to FIG. 14, the character-block 942 might get assigned a reasonable probability (e.g., greater than 0.9) of being either the character “B”, “8”, or “&”. This probability calculation may be performed using a support vector machine (SVM) by training an SVM using annotated data sets, taking the SVM score, and then normalizing the SVM score to obtain a probability estimate. Other techniques for calculating the probability that the character-blocks contain characters may also be employed without detracting from the merits of the present invention. -
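The normalization step above can be illustrated as follows. The patent only says that raw SVM scores are normalized into probability estimates, not how; the softmax mapping and the function name below are assumptions chosen as one common way to do it.

```python
import math

def scores_to_probabilities(raw_scores):
    """Map raw per-character classifier scores (e.g. SVM margins)
    to probability estimates via a softmax normalization.

    `raw_scores` maps candidate characters to real-valued scores;
    the result maps the same characters to values in (0, 1) that
    sum to 1.  Softmax is one common choice -- the patent does not
    specify the normalization.
    """
    m = max(raw_scores.values())  # subtract the max for numeric stability
    exps = {ch: math.exp(s - m) for ch, s in raw_scores.items()}
    total = sum(exps.values())
    return {ch: e / total for ch, e in exps.items()}
```

For a block like character-block 942, close raw scores for “B”, “8”, and “&” yield comparably high probabilities for all three, which is exactly the ambiguity that later phases are designed to tolerate.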
Phase 923 is an optional phase. In phase 923, character-blocks that are difficult to identify in phase 922 may be grouped together into a single character-block or split apart into several character-blocks. - If two character-blocks are close together (a single pixel apart in the example of FIG. 15), one or both of them are difficult to identify (e.g., receiving low probabilities of being assigned to characters), and combining the two character-blocks results in a character-block having a higher probability of being a character, then the two character-blocks can be combined. For example, referring to FIG. 15, the character-blocks may be combined into a single character-block like the character-blocks 941 of FIG. 11. The probability that the character-block 941 contains a character may then be recalculated. Similar rules may be applied to split a single character-block into several character-blocks. - In
phase 924, a candidate sequence of character-blocks is identified. Phase 924 may be performed by identifying one or more character-blocks that are likely to match the start of the search term, and identifying one or more character-blocks that are likely to match the end of the search term. - In
phase 925, the similarity of the candidate sequence identified in phase 924 to the input search terms is calculated. For example, a similarity score indicative of the similarity of a search term to the candidate sequence may be calculated and compared to a similarity threshold. The search term may be deemed to be present in the image if the similarity score is greater than the threshold. The threshold may be determined empirically, for example. Phase 925 may be performed using various techniques, including a dynamic programming approach or the Viterbi algorithm (e.g., see “Dynamic Programming Algorithm for Sequence Alignment” by Lloyd Allison at Internet URL <http://www.csse.monash.edu.au/˜lloyd/tildeStrings/Notes/DPA.html>). Other techniques for evaluating similarities may also be used without detracting from the merits of the present invention. - To illustrate
phase 925, consider matching a candidate sequence of character-blocks that have the following probability estimates calculated in phase 922. -
- CB 1. Prob(S/s/5)=80%
- CB 2. Prob(y)=80%
- CB 2. Prob(g/j)=15%
- CB 3. Prob(m)=80%
- CB 3. Prob(n)=15%
- CB 4. Prob(B/8/&)=80%
- CB 4. Prob(E)=15%
- CB 5. Prob(o/O/0)=80%
- CB 5. Prob(Q/C)=15%
- CB 6. Prob(l/i/|/I/!)=80%
- CB 6. Prob(:)=15%
Where “CB 1” is the first character-block, having a probability of 80% of containing the character “S”, “s”, or “5”; “CB 2” is the second character-block (following CB 1), having a probability of 80% of containing the character “y” and a probability of 15% of containing the character “g” or “j”; “CB 3” is the third character-block (following CB 2), having a probability of 80% of containing the character “m” and a probability of 15% of containing the character “n”; and so on. Forming a matrix that scores this sequence of character-blocks against the search term “symbol” may result in the matrix of Table 1.
TABLE 1

 | CB 1 | CB 2 | CB 3 | CB 4 | CB 5 | CB 6 |
---|---|---|---|---|---|---|
80% | S/s/5 | y | m | B/8/& | o/O/0 | l/i/\|/I/! |
15% | | g/j | n | E | Q/C | : |
s | 0.00 | 7.91 | 15.81 | 26.42 | 34.33 | 42.23 |
y | 10.02 | 1.23 | 9.14 | 19.75 | 27.66 | 35.56 |
m | 20.04 | 11.26 | 2.47 | 13.08 | 20.98 | 28.89 |
b | 30.07 | 21.28 | 12.49 | 11.49 | 19.40 | 27.31 |
o | 40.09 | 31.30 | 22.51 | 21.51 | 12.73 | 20.63 |
l | 50.11 | 41.32 | 32.54 | 31.54 | 22.75 | 13.96 |
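A dynamic-programming alignment of this general shape can be sketched as follows. The cost model is illustrative, not the patent's: substituting a term character against a block costs its negative log probability (so confident matches are nearly free), and skipping a character or a block costs a fixed gap penalty. The function name, the gap cost, and the probability floor are assumptions, and the sketch does not reproduce Table 1's exact values, whose parameterization the patent does not spell out.

```python
import math

def alignment_score(term, block_probs, gap_cost=10.0, floor=1e-3):
    """Edit-distance-style alignment of a search term against a
    left-to-right sequence of character-block probability dicts.

    Substitution cost is -log P(character | block); insertions and
    deletions cost `gap_cost`.  Lower totals mean better matches.
    """
    n, m = len(term), len(block_probs)
    # dp[i][j]: best cost of matching term[:i] against blocks[:j].
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap_cost
    for j in range(1, m + 1):
        dp[0][j] = j * gap_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Case-insensitive lookup: OCR confusions often cross case
            # (e.g. "B" in a block matching "b" in the search term).
            p = max((pr for ch, pr in block_probs[j - 1].items()
                     if ch.lower() == term[i - 1].lower()), default=0.0)
            cost = -math.log(max(p, floor))
            dp[i][j] = min(dp[i - 1][j - 1] + cost,   # substitute
                           dp[i - 1][j] + gap_cost,   # skip a term character
                           dp[i][j - 1] + gap_cost)   # skip a block
    return dp[n][m]
```

With probability estimates like the CB 1-CB 6 example, a matching term such as "symbol" scores far lower than an unrelated term, and comparing the total against an empirically chosen threshold plays the role of the phase-925 similarity test.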
The scores in Table 1 are calculated using the algorithm from the “Dynamic Programming Algorithm for Sequence Alignment” by Lloyd Allison. From Table 1, the final score for the sequence of character-blocks against the search term “symbol” is 13.96. This final score may be good enough to deem the image as having the search term “symbol” in it. The location of “symbol” may be output by the processing 920 based on the location of the character-blocks forming the search term. That is, the location of the found search term is the location of the corresponding sequence of character-blocks in the image (e.g., defined by pixel location). - As can be appreciated, the pure adversarial approach takes an image and search terms as inputs, and outputs the search terms found in the image and the locations of the search terms. This advantageously provides a more accurate identification of search terms compared to conventional OCR approaches. For example, pure adversarial OCR processing does not necessarily require establishment of which letter, digit, or symbol a character-block contains. In contrast, traditional OCR approaches require determination of which letter, digit, or symbol is in a character-block. This makes traditional OCR approaches vulnerable to anti-OCR features that use confusing and ambiguous characters, such as an upper case “I”, a vertical bar, a lower case “l” (a lower case “L”), and an exclamation point, to name a few examples. Note that distinguishing between characters that may be in a character-block is not critical, and hence typically not performed, in phase 922 of the processing 920. This is because the processing 920 does not require conversion of an image into text to determine if a search term is present in the image. The processing 920 allows for determination of whether or not a search term is present in an image by working directly with the image. Phase 925 of the processing 920 allows lines of text containing any of the aforementioned ambiguous characters to be matched to search terms without particularly identifying a particular ambiguous character in a particular character-block. - Improved techniques for identifying text content in images have been disclosed. While specific embodiments of the present invention have been provided, it is to be understood that these embodiments are for illustration purposes and not limiting. Many additional embodiments will be apparent to persons of ordinary skill in the art reading this disclosure.
Claims (20)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/893,921 US8045808B2 (en) | 2006-12-04 | 2007-08-16 | Pure adversarial approach for identifying text content in images |
PCT/JP2007/071448 WO2008068987A1 (en) | 2006-12-04 | 2007-10-30 | Pure adversarial approach for identifying text content in images |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US87292806P | 2006-12-04 | 2006-12-04 | |
US11/803,963 US8098939B2 (en) | 2006-12-04 | 2007-05-16 | Adversarial approach for identifying inappropriate text content in images |
US11/893,921 US8045808B2 (en) | 2006-12-04 | 2007-08-16 | Pure adversarial approach for identifying text content in images |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/803,963 Continuation-In-Part US8098939B2 (en) | 2006-12-04 | 2007-05-16 | Adversarial approach for identifying inappropriate text content in images |
Publications (2)
Publication Number | Publication Date |
---|---|
US20080131006A1 true US20080131006A1 (en) | 2008-06-05 |
US8045808B2 US8045808B2 (en) | 2011-10-25 |
Family
ID=39123064
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/893,921 Active 2030-05-09 US8045808B2 (en) | 2006-12-04 | 2007-08-16 | Pure adversarial approach for identifying text content in images |
Country Status (2)
Country | Link |
---|---|
US (1) | US8045808B2 (en) |
WO (1) | WO2008068987A1 (en) |
Cited By (166)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090077617A1 (en) * | 2007-09-13 | 2009-03-19 | Levow Zachary S | Automated generation of spam-detection rules using optical character recognition and identifications of common features |
US20100158395A1 (en) * | 2008-12-19 | 2010-06-24 | Yahoo! Inc., A Delaware Corporation | Method and system for detecting image spam |
US20110213850A1 (en) * | 2008-08-21 | 2011-09-01 | Yamaha Corporation | Relay apparatus, relay method and recording medium |
CN102298696A (en) * | 2010-06-28 | 2011-12-28 | 方正国际软件(北京)有限公司 | Character recognition method and system |
US20120023566A1 (en) * | 2008-04-21 | 2012-01-26 | Sentrybay Limited | Fraudulent Page Detection |
US20120308138A1 (en) * | 2011-06-03 | 2012-12-06 | Apple Inc | Multi-resolution spatial feature extraction for automatic handwriting recognition |
US20130041655A1 (en) * | 2010-01-29 | 2013-02-14 | Ipar, Llc | Systems and Methods for Word Offensiveness Detection and Processing Using Weighted Dictionaries and Normalization |
US20130163823A1 (en) * | 2006-04-04 | 2013-06-27 | Cyclops Technologies, Inc. | Image Capture and Recognition System Having Real-Time Secure Communication |
US20130163822A1 (en) * | 2006-04-04 | 2013-06-27 | Cyclops Technologies, Inc. | Airborne Image Capture and Recognition System |
US8527436B2 (en) | 2010-08-30 | 2013-09-03 | Stratify, Inc. | Automated parsing of e-mail messages |
US20140156678A1 (en) * | 2008-12-31 | 2014-06-05 | Sonicwall, Inc. | Image based spam blocking |
US8892446B2 (en) | 2010-01-18 | 2014-11-18 | Apple Inc. | Service orchestration for intelligent automated assistant |
US20140369567A1 (en) * | 2006-04-04 | 2014-12-18 | Cyclops Technologies, Inc. | Authorized Access Using Image Capture and Recognition System |
US20140369566A1 (en) * | 2006-04-04 | 2014-12-18 | Cyclops Technologies, Inc. | Perimeter Image Capture and Recognition System |
US8977255B2 (en) | 2007-04-03 | 2015-03-10 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US9190062B2 (en) | 2010-02-25 | 2015-11-17 | Apple Inc. | User profiling for voice input processing |
US9262612B2 (en) | 2011-03-21 | 2016-02-16 | Apple Inc. | Device access using voice authentication |
US9300784B2 (en) | 2013-06-13 | 2016-03-29 | Apple Inc. | System and method for emergency calls initiated by voice command |
US9330720B2 (en) | 2008-01-03 | 2016-05-03 | Apple Inc. | Methods and apparatus for altering audio output signals |
US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US9368114B2 (en) | 2013-03-14 | 2016-06-14 | Apple Inc. | Context-sensitive handling of interruptions |
US9430463B2 (en) | 2014-05-30 | 2016-08-30 | Apple Inc. | Exemplar-based natural language processing |
US9431006B2 (en) | 2009-07-02 | 2016-08-30 | Apple Inc. | Methods and apparatuses for automatic speech recognition |
US9483461B2 (en) | 2012-03-06 | 2016-11-01 | Apple Inc. | Handling speech synthesis of content for multiple languages |
US9495129B2 (en) | 2012-06-29 | 2016-11-15 | Apple Inc. | Device, method, and user interface for voice-activated navigation and browsing of a document |
US9502031B2 (en) | 2014-05-27 | 2016-11-22 | Apple Inc. | Method for supporting dynamic grammars in WFST-based ASR |
US9535906B2 (en) | 2008-07-31 | 2017-01-03 | Apple Inc. | Mobile device having human language translation capability with positional feedback |
US9576574B2 (en) | 2012-09-10 | 2017-02-21 | Apple Inc. | Context-sensitive handling of interruptions by intelligent digital assistant |
US9582608B2 (en) | 2013-06-07 | 2017-02-28 | Apple Inc. | Unified ranking with entropy-weighted information for phrase-based semantic auto-completion |
US9620105B2 (en) | 2014-05-15 | 2017-04-11 | Apple Inc. | Analyzing audio input for efficient speech and music recognition |
US9620104B2 (en) | 2013-06-07 | 2017-04-11 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
US9626955B2 (en) | 2008-04-05 | 2017-04-18 | Apple Inc. | Intelligent text-to-speech conversion |
US9633674B2 (en) | 2013-06-07 | 2017-04-25 | Apple Inc. | System and method for detecting errors in interactions with a voice-based digital assistant |
US9633004B2 (en) | 2014-05-30 | 2017-04-25 | Apple Inc. | Better resolution when referencing to concepts |
US9646609B2 (en) | 2014-09-30 | 2017-05-09 | Apple Inc. | Caching apparatus for serving phonetic pronunciations |
US9646614B2 (en) | 2000-03-16 | 2017-05-09 | Apple Inc. | Fast, language-independent method for user authentication by voice |
US9668121B2 (en) | 2014-09-30 | 2017-05-30 | Apple Inc. | Social reminders |
US9697822B1 (en) | 2013-03-15 | 2017-07-04 | Apple Inc. | System and method for updating an adaptive speech recognition model |
US9697820B2 (en) | 2015-09-24 | 2017-07-04 | Apple Inc. | Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks |
US9711141B2 (en) | 2014-12-09 | 2017-07-18 | Apple Inc. | Disambiguating heteronyms in speech synthesis |
US9715875B2 (en) | 2014-05-30 | 2017-07-25 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US9721566B2 (en) | 2015-03-08 | 2017-08-01 | Apple Inc. | Competing devices responding to voice triggers |
US9734193B2 (en) | 2014-05-30 | 2017-08-15 | Apple Inc. | Determining domain salience ranking from ambiguous words in natural speech |
US9760559B2 (en) | 2014-05-30 | 2017-09-12 | Apple Inc. | Predictive text input |
US9785630B2 (en) | 2014-05-30 | 2017-10-10 | Apple Inc. | Text prediction using combined word N-gram and unigram language models |
US9798393B2 (en) | 2011-08-29 | 2017-10-24 | Apple Inc. | Text correction processing |
US9818400B2 (en) | 2014-09-11 | 2017-11-14 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US9842101B2 (en) | 2014-05-30 | 2017-12-12 | Apple Inc. | Predictive conversion of language input |
US9842105B2 (en) | 2015-04-16 | 2017-12-12 | Apple Inc. | Parsimonious continuous-space phrase representations for natural language processing |
US9858925B2 (en) | 2009-06-05 | 2018-01-02 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant |
US9865280B2 (en) | 2015-03-06 | 2018-01-09 | Apple Inc. | Structured dictation using intelligent automated assistants |
US9886953B2 (en) | 2015-03-08 | 2018-02-06 | Apple Inc. | Virtual assistant activation |
US9886432B2 (en) | 2014-09-30 | 2018-02-06 | Apple Inc. | Parsimonious handling of word inflection via categorical stem + suffix N-gram language models |
US9899019B2 (en) | 2015-03-18 | 2018-02-20 | Apple Inc. | Systems and methods for structured stem and suffix language models |
US9922642B2 (en) | 2013-03-15 | 2018-03-20 | Apple Inc. | Training an at least partial voice command system |
US9934775B2 (en) | 2016-05-26 | 2018-04-03 | Apple Inc. | Unit-selection text-to-speech synthesis based on predicted concatenation parameters |
US9953088B2 (en) | 2012-05-14 | 2018-04-24 | Apple Inc. | Crowd sourcing information to fulfill user requests |
US9959870B2 (en) | 2008-12-11 | 2018-05-01 | Apple Inc. | Speech recognition involving a mobile device |
US9966068B2 (en) | 2013-06-08 | 2018-05-08 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
US9966065B2 (en) | 2014-05-30 | 2018-05-08 | Apple Inc. | Multi-command single utterance input method |
US9972304B2 (en) | 2016-06-03 | 2018-05-15 | Apple Inc. | Privacy preserving distributed evaluation framework for embedded personalized systems |
US9971774B2 (en) | 2012-09-19 | 2018-05-15 | Apple Inc. | Voice-based media searching |
US10043516B2 (en) | 2016-09-23 | 2018-08-07 | Apple Inc. | Intelligent automated assistant |
US10049668B2 (en) | 2015-12-02 | 2018-08-14 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
US10049663B2 (en) | 2016-06-08 | 2018-08-14 | Apple, Inc. | Intelligent automated assistant for media exploration |
US10057736B2 (en) | 2011-06-03 | 2018-08-21 | Apple Inc. | Active transport based notifications |
US10067938B2 (en) | 2016-06-10 | 2018-09-04 | Apple Inc. | Multilingual word prediction |
US10074360B2 (en) | 2014-09-30 | 2018-09-11 | Apple Inc. | Providing an indication of the suitability of speech recognition |
US10079014B2 (en) | 2012-06-08 | 2018-09-18 | Apple Inc. | Name recognition system |
US10078631B2 (en) | 2014-05-30 | 2018-09-18 | Apple Inc. | Entropy-guided text prediction using combined word and character n-gram language models |
US10083688B2 (en) | 2015-05-27 | 2018-09-25 | Apple Inc. | Device voice control for selecting a displayed affordance |
US10089072B2 (en) | 2016-06-11 | 2018-10-02 | Apple Inc. | Intelligent device arbitration and control |
US10101822B2 (en) | 2015-06-05 | 2018-10-16 | Apple Inc. | Language input correction |
US10127220B2 (en) | 2015-06-04 | 2018-11-13 | Apple Inc. | Language identification from short strings |
US10127911B2 (en) | 2014-09-30 | 2018-11-13 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
US10134385B2 (en) | 2012-03-02 | 2018-11-20 | Apple Inc. | Systems and methods for name pronunciation |
US10170123B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Intelligent assistant for home automation |
US10176167B2 (en) | 2013-06-09 | 2019-01-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
US10176500B1 (en) * | 2013-05-29 | 2019-01-08 | A9.Com, Inc. | Content classification based on data recognition |
US10186254B2 (en) | 2015-06-07 | 2019-01-22 | Apple Inc. | Context-based endpoint detection |
US10185542B2 (en) | 2013-06-09 | 2019-01-22 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
US10192552B2 (en) | 2016-06-10 | 2019-01-29 | Apple Inc. | Digital assistant providing whispered speech |
US10199051B2 (en) | 2013-02-07 | 2019-02-05 | Apple Inc. | Voice trigger for a digital assistant |
US10223066B2 (en) | 2015-12-23 | 2019-03-05 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US10241644B2 (en) | 2011-06-03 | 2019-03-26 | Apple Inc. | Actionable reminder entries |
US10241752B2 (en) | 2011-09-30 | 2019-03-26 | Apple Inc. | Interface for a virtual digital assistant |
US10249300B2 (en) | 2016-06-06 | 2019-04-02 | Apple Inc. | Intelligent list reading |
US10255907B2 (en) | 2015-06-07 | 2019-04-09 | Apple Inc. | Automatic accent detection using acoustic models |
US10269345B2 (en) | 2016-06-11 | 2019-04-23 | Apple Inc. | Intelligent task discovery |
US10276170B2 (en) | 2010-01-18 | 2019-04-30 | Apple Inc. | Intelligent automated assistant |
US10289433B2 (en) | 2014-05-30 | 2019-05-14 | Apple Inc. | Domain specific language for encoding assistant dialog |
US10297253B2 (en) | 2016-06-11 | 2019-05-21 | Apple Inc. | Application integration with a digital assistant |
US10303715B2 (en) | 2017-05-16 | 2019-05-28 | Apple Inc. | Intelligent automated assistant for media exploration |
US10311144B2 (en) | 2017-05-16 | 2019-06-04 | Apple Inc. | Emoji word sense disambiguation |
US10318871B2 (en) | 2005-09-08 | 2019-06-11 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US10332518B2 (en) | 2017-05-09 | 2019-06-25 | Apple Inc. | User interface for correcting recognition errors |
US10354011B2 (en) | 2016-06-09 | 2019-07-16 | Apple Inc. | Intelligent automated assistant in a home environment |
US10356243B2 (en) | 2015-06-05 | 2019-07-16 | Apple Inc. | Virtual assistant aided communication with 3rd party service in a communication session |
US10366158B2 (en) | 2015-09-29 | 2019-07-30 | Apple Inc. | Efficient word encoding for recurrent neural network language models |
US10395654B2 (en) | 2017-05-11 | 2019-08-27 | Apple Inc. | Text normalization based on a data-driven learning network |
US10403283B1 (en) | 2018-06-01 | 2019-09-03 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US10403278B2 (en) | 2017-05-16 | 2019-09-03 | Apple Inc. | Methods and systems for phonetic matching in digital assistant services |
US10410637B2 (en) | 2017-05-12 | 2019-09-10 | Apple Inc. | User-specific acoustic models |
US10417266B2 (en) | 2017-05-09 | 2019-09-17 | Apple Inc. | Context-aware ranking of intelligent response suggestions |
US10446141B2 (en) | 2014-08-28 | 2019-10-15 | Apple Inc. | Automatic speech recognition based on user feedback |
US10446143B2 (en) | 2016-03-14 | 2019-10-15 | Apple Inc. | Identification of voice inputs providing credentials |
US10445429B2 (en) | 2017-09-21 | 2019-10-15 | Apple Inc. | Natural language understanding using vocabularies with compressed serialized tries |
CN110414527A (en) * | 2019-07-31 | 2019-11-05 | 北京字节跳动网络技术有限公司 | Character identifying method, device, storage medium and electronic equipment |
US10474753B2 (en) | 2016-09-07 | 2019-11-12 | Apple Inc. | Language identification using recurrent neural networks |
US10482874B2 (en) | 2017-05-15 | 2019-11-19 | Apple Inc. | Hierarchical belief states for digital assistants |
US10490187B2 (en) | 2016-06-10 | 2019-11-26 | Apple Inc. | Digital assistant providing automated status report |
US10496705B1 (en) | 2018-06-03 | 2019-12-03 | Apple Inc. | Accelerated task performance |
US10496753B2 (en) | 2010-01-18 | 2019-12-03 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US10509862B2 (en) | 2016-06-10 | 2019-12-17 | Apple Inc. | Dynamic phrase expansion of language input |
US10521466B2 (en) | 2016-06-11 | 2019-12-31 | Apple Inc. | Data driven natural language event detection and classification |
US10552013B2 (en) | 2014-12-02 | 2020-02-04 | Apple Inc. | Data detection |
US10553209B2 (en) | 2010-01-18 | 2020-02-04 | Apple Inc. | Systems and methods for hands-free notification summaries |
US10567477B2 (en) | 2015-03-08 | 2020-02-18 | Apple Inc. | Virtual assistant continuity |
US10592095B2 (en) | 2014-05-23 | 2020-03-17 | Apple Inc. | Instantaneous speaking of content on touch devices |
US10592604B2 (en) | 2018-03-12 | 2020-03-17 | Apple Inc. | Inverse text normalization for automatic speech recognition |
US10593346B2 (en) | 2016-12-22 | 2020-03-17 | Apple Inc. | Rank-reduced token representation for automatic speech recognition |
US10636424B2 (en) | 2017-11-30 | 2020-04-28 | Apple Inc. | Multi-turn canned dialog |
US10643611B2 (en) | 2008-10-02 | 2020-05-05 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US10652394B2 (en) | 2013-03-14 | 2020-05-12 | Apple Inc. | System and method for processing voicemail |
US10657328B2 (en) | 2017-06-02 | 2020-05-19 | Apple Inc. | Multi-task recurrent neural network architecture for efficient morphology handling in neural language modeling |
US10659851B2 (en) | 2014-06-30 | 2020-05-19 | Apple Inc. | Real-time digital assistant knowledge updates |
US10671428B2 (en) | 2015-09-08 | 2020-06-02 | Apple Inc. | Distributed personal assistant |
US10672399B2 (en) | 2011-06-03 | 2020-06-02 | Apple Inc. | Switching between text data and audio data based on a mapping |
US10679605B2 (en) | 2010-01-18 | 2020-06-09 | Apple Inc. | Hands-free list-reading by intelligent automated assistant |
US10684703B2 (en) | 2018-06-01 | 2020-06-16 | Apple Inc. | Attention aware virtual assistant dismissal |
US10691473B2 (en) | 2015-11-06 | 2020-06-23 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10706373B2 (en) | 2011-06-03 | 2020-07-07 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US10705794B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US10726832B2 (en) | 2017-05-11 | 2020-07-28 | Apple Inc. | Maintaining privacy of personal information |
US10733993B2 (en) | 2016-06-10 | 2020-08-04 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10733375B2 (en) | 2018-01-31 | 2020-08-04 | Apple Inc. | Knowledge-based framework for improving natural language understanding |
US10733982B2 (en) | 2018-01-08 | 2020-08-04 | Apple Inc. | Multi-directional dialog |
US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
US10755051B2 (en) | 2017-09-29 | 2020-08-25 | Apple Inc. | Rule-based natural language processing |
US10755703B2 (en) | 2017-05-11 | 2020-08-25 | Apple Inc. | Offline personal assistant |
US10762293B2 (en) | 2010-12-22 | 2020-09-01 | Apple Inc. | Using parts-of-speech tagging and named entity recognition for spelling correction |
US10789041B2 (en) | 2014-09-12 | 2020-09-29 | Apple Inc. | Dynamic thresholds for always listening speech trigger |
US10789959B2 (en) | 2018-03-02 | 2020-09-29 | Apple Inc. | Training speaker recognition models for digital assistants |
US10791176B2 (en) | 2017-05-12 | 2020-09-29 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US10791216B2 (en) | 2013-08-06 | 2020-09-29 | Apple Inc. | Auto-activating smart responses based on activities from remote devices |
US10789945B2 (en) | 2017-05-12 | 2020-09-29 | Apple Inc. | Low-latency intelligent automated assistant |
US10810274B2 (en) | 2017-05-15 | 2020-10-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
US10818288B2 (en) | 2018-03-26 | 2020-10-27 | Apple Inc. | Natural assistant interaction |
US10892996B2 (en) | 2018-06-01 | 2021-01-12 | Apple Inc. | Variable latency device coordination |
US10909331B2 (en) | 2018-03-30 | 2021-02-02 | Apple Inc. | Implicit identification of translation payload with neural machine translation |
US10928918B2 (en) | 2018-05-07 | 2021-02-23 | Apple Inc. | Raise to speak |
US10984780B2 (en) | 2018-05-21 | 2021-04-20 | Apple Inc. | Global semantic word embeddings using bi-directional recurrent neural networks |
US11010550B2 (en) | 2015-09-29 | 2021-05-18 | Apple Inc. | Unified language modeling framework for word prediction, auto-completion and auto-correction |
US11023513B2 (en) | 2007-12-20 | 2021-06-01 | Apple Inc. | Method and apparatus for searching using an active ontology |
US11025565B2 (en) | 2015-06-07 | 2021-06-01 | Apple Inc. | Personalized prediction of responses for instant messaging |
US11145294B2 (en) | 2018-05-07 | 2021-10-12 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US11204787B2 (en) | 2017-01-09 | 2021-12-21 | Apple Inc. | Application integration with a digital assistant |
US11217255B2 (en) | 2017-05-16 | 2022-01-04 | Apple Inc. | Far-field extension for digital assistant services |
US11231904B2 (en) | 2015-03-06 | 2022-01-25 | Apple Inc. | Reducing response latency of intelligent automated assistants |
US11281993B2 (en) | 2016-12-05 | 2022-03-22 | Apple Inc. | Model and ensemble compression for metric learning |
US11301477B2 (en) | 2017-05-12 | 2022-04-12 | Apple Inc. | Feedback analysis of a digital assistant |
US11314370B2 (en) | 2013-12-06 | 2022-04-26 | Apple Inc. | Method for extracting salient dialog usage from live data |
US11386266B2 (en) | 2018-06-01 | 2022-07-12 | Apple Inc. | Text correction |
US11495218B2 (en) | 2018-06-01 | 2022-11-08 | Apple Inc. | Virtual assistant operation in multi-device environments |
US11587559B2 (en) | 2015-09-30 | 2023-02-21 | Apple Inc. | Intelligent device identification |
US20240005365A1 (en) * | 2022-06-30 | 2024-01-04 | Constant Contact, Inc. | Email Subject Line Generation Method |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4626777B2 (en) * | 2008-03-14 | 2011-02-09 | 富士ゼロックス株式会社 | Information processing apparatus and information processing program |
US8189924B2 (en) * | 2008-10-15 | 2012-05-29 | Yahoo! Inc. | Phishing abuse recognition in web pages |
US8358843B2 (en) * | 2011-01-31 | 2013-01-22 | Yahoo! Inc. | Techniques including URL recognition and applications |
US10262236B2 (en) | 2017-05-02 | 2019-04-16 | General Electric Company | Neural network training image generation system |
US11108714B1 (en) * | 2020-07-29 | 2021-08-31 | Vmware, Inc. | Integration of an email client with hosted applications |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050216564A1 (en) * | 2004-03-11 | 2005-09-29 | Myers Gregory K | Method and apparatus for analysis of electronic communications containing imagery |
US20080008348A1 (en) * | 2006-02-01 | 2008-01-10 | Markmonitor Inc. | Detecting online abuse in images |
-
2007
- 2007-08-16 US US11/893,921 patent/US8045808B2/en active Active
- 2007-10-30 WO PCT/JP2007/071448 patent/WO2008068987A1/en active Application Filing
Cited By (242)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9646614B2 (en) | 2000-03-16 | 2017-05-09 | Apple Inc. | Fast, language-independent method for user authentication by voice |
US10318871B2 (en) | 2005-09-08 | 2019-06-11 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US20130163823A1 (en) * | 2006-04-04 | 2013-06-27 | Cyclops Technologies, Inc. | Image Capture and Recognition System Having Real-Time Secure Communication |
US20140369567A1 (en) * | 2006-04-04 | 2014-12-18 | Cyclops Technologies, Inc. | Authorized Access Using Image Capture and Recognition System |
US20140369566A1 (en) * | 2006-04-04 | 2014-12-18 | Cyclops Technologies, Inc. | Perimeter Image Capture and Recognition System |
US20130163822A1 (en) * | 2006-04-04 | 2013-06-27 | Cyclops Technologies, Inc. | Airborne Image Capture and Recognition System |
US9117447B2 (en) | 2006-09-08 | 2015-08-25 | Apple Inc. | Using event alert text as input to an automated assistant |
US8930191B2 (en) | 2006-09-08 | 2015-01-06 | Apple Inc. | Paraphrasing of user requests and results by automated digital assistant |
US8942986B2 (en) | 2006-09-08 | 2015-01-27 | Apple Inc. | Determining user intent based on ontologies of domains |
US8977255B2 (en) | 2007-04-03 | 2015-03-10 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US10568032B2 (en) | 2007-04-03 | 2020-02-18 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US20090077617A1 (en) * | 2007-09-13 | 2009-03-19 | Levow Zachary S | Automated generation of spam-detection rules using optical character recognition and identifications of common features |
US11023513B2 (en) | 2007-12-20 | 2021-06-01 | Apple Inc. | Method and apparatus for searching using an active ontology |
US10381016B2 (en) | 2008-01-03 | 2019-08-13 | Apple Inc. | Methods and apparatus for altering audio output signals |
US9330720B2 (en) | 2008-01-03 | 2016-05-03 | Apple Inc. | Methods and apparatus for altering audio output signals |
US9626955B2 (en) | 2008-04-05 | 2017-04-18 | Apple Inc. | Intelligent text-to-speech conversion |
US9865248B2 (en) | 2008-04-05 | 2018-01-09 | Apple Inc. | Intelligent text-to-speech conversion |
US8806622B2 (en) * | 2008-04-21 | 2014-08-12 | Sentrybay Limited | Fraudulent page detection |
US20120023566A1 (en) * | 2008-04-21 | 2012-01-26 | Sentrybay Limited | Fraudulent Page Detection |
US10108612B2 (en) | 2008-07-31 | 2018-10-23 | Apple Inc. | Mobile device having human language translation capability with positional feedback |
US9535906B2 (en) | 2008-07-31 | 2017-01-03 | Apple Inc. | Mobile device having human language translation capability with positional feedback |
US8676907B2 (en) * | 2008-08-21 | 2014-03-18 | Yamaha Corporation | Relay apparatus, relay method and recording medium |
US20110213850A1 (en) * | 2008-08-21 | 2011-09-01 | Yamaha Corporation | Relay apparatus, relay method and recording medium |
US10643611B2 (en) | 2008-10-02 | 2020-05-05 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US11348582B2 (en) | 2008-10-02 | 2022-05-31 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US9959870B2 (en) | 2008-12-11 | 2018-05-01 | Apple Inc. | Speech recognition involving a mobile device |
US8731284B2 (en) * | 2008-12-19 | 2014-05-20 | Yahoo! Inc. | Method and system for detecting image spam |
US20100158395A1 (en) * | 2008-12-19 | 2010-06-24 | Yahoo! Inc., A Delaware Corporation | Method and system for detecting image spam |
US20140156678A1 (en) * | 2008-12-31 | 2014-06-05 | Sonicwall, Inc. | Image based spam blocking |
US9489452B2 (en) * | 2008-12-31 | 2016-11-08 | Dell Software Inc. | Image based spam blocking |
US10204157B2 (en) | 2008-12-31 | 2019-02-12 | Sonicwall Inc. | Image based spam blocking |
US9858925B2 (en) | 2009-06-05 | 2018-01-02 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant |
US10795541B2 (en) | 2009-06-05 | 2020-10-06 | Apple Inc. | Intelligent organization of tasks items |
US11080012B2 (en) | 2009-06-05 | 2021-08-03 | Apple Inc. | Interface for a virtual digital assistant |
US10475446B2 (en) | 2009-06-05 | 2019-11-12 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant |
US9431006B2 (en) | 2009-07-02 | 2016-08-30 | Apple Inc. | Methods and apparatuses for automatic speech recognition |
US10283110B2 (en) | 2009-07-02 | 2019-05-07 | Apple Inc. | Methods and apparatuses for automatic speech recognition |
US10553209B2 (en) | 2010-01-18 | 2020-02-04 | Apple Inc. | Systems and methods for hands-free notification summaries |
US10706841B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Task flow identification based on user intent |
US8903716B2 (en) | 2010-01-18 | 2014-12-02 | Apple Inc. | Personalized vocabulary for digital assistant |
US10705794B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US10679605B2 (en) | 2010-01-18 | 2020-06-09 | Apple Inc. | Hands-free list-reading by intelligent automated assistant |
US8892446B2 (en) | 2010-01-18 | 2014-11-18 | Apple Inc. | Service orchestration for intelligent automated assistant |
US9548050B2 (en) | 2010-01-18 | 2017-01-17 | Apple Inc. | Intelligent automated assistant |
US9318108B2 (en) | 2010-01-18 | 2016-04-19 | Apple Inc. | Intelligent automated assistant |
US10496753B2 (en) | 2010-01-18 | 2019-12-03 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US10276170B2 (en) | 2010-01-18 | 2019-04-30 | Apple Inc. | Intelligent automated assistant |
US11423886B2 (en) | 2010-01-18 | 2022-08-23 | Apple Inc. | Task flow identification based on user intent |
US9703872B2 (en) * | 2010-01-29 | 2017-07-11 | Ipar, Llc | Systems and methods for word offensiveness detection and processing using weighted dictionaries and normalization |
CN107402948A (en) * | 2010-01-29 | 2017-11-28 | Ipar, Llc | Systems and methods for word offensiveness detection and processing
US10534827B2 (en) | 2010-01-29 | 2020-01-14 | Ipar, Llc | Systems and methods for word offensiveness detection and processing using weighted dictionaries and normalization |
US20130041655A1 (en) * | 2010-01-29 | 2013-02-14 | Ipar, Llc | Systems and Methods for Word Offensiveness Detection and Processing Using Weighted Dictionaries and Normalization |
US10049675B2 (en) | 2010-02-25 | 2018-08-14 | Apple Inc. | User profiling for voice input processing |
US10692504B2 (en) | 2010-02-25 | 2020-06-23 | Apple Inc. | User profiling for voice input processing |
US9190062B2 (en) | 2010-02-25 | 2015-11-17 | Apple Inc. | User profiling for voice input processing |
US9633660B2 (en) | 2010-02-25 | 2017-04-25 | Apple Inc. | User profiling for voice input processing |
CN102298696A (en) * | 2010-06-28 | 2011-12-28 | Founder International Software (Beijing) Co., Ltd. | Character recognition method and system
US8527436B2 (en) | 2010-08-30 | 2013-09-03 | Stratify, Inc. | Automated parsing of e-mail messages |
US10762293B2 (en) | 2010-12-22 | 2020-09-01 | Apple Inc. | Using parts-of-speech tagging and named entity recognition for spelling correction |
US9262612B2 (en) | 2011-03-21 | 2016-02-16 | Apple Inc. | Device access using voice authentication |
US10102359B2 (en) | 2011-03-21 | 2018-10-16 | Apple Inc. | Device access using voice authentication |
US10417405B2 (en) | 2011-03-21 | 2019-09-17 | Apple Inc. | Device access using voice authentication |
US10706373B2 (en) | 2011-06-03 | 2020-07-07 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US10057736B2 (en) | 2011-06-03 | 2018-08-21 | Apple Inc. | Active transport based notifications |
US20120308138A1 (en) * | 2011-06-03 | 2012-12-06 | Apple Inc | Multi-resolution spatial feature extraction for automatic handwriting recognition |
US10241644B2 (en) | 2011-06-03 | 2019-03-26 | Apple Inc. | Actionable reminder entries |
US10672399B2 (en) | 2011-06-03 | 2020-06-02 | Apple Inc. | Switching between text data and audio data based on a mapping |
US8989492B2 (en) * | 2011-06-03 | 2015-03-24 | Apple Inc. | Multi-resolution spatial feature extraction for automatic handwriting recognition |
US11350253B2 (en) | 2011-06-03 | 2022-05-31 | Apple Inc. | Active transport based notifications |
US11120372B2 (en) | 2011-06-03 | 2021-09-14 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US9798393B2 (en) | 2011-08-29 | 2017-10-24 | Apple Inc. | Text correction processing |
US10241752B2 (en) | 2011-09-30 | 2019-03-26 | Apple Inc. | Interface for a virtual digital assistant |
US11069336B2 (en) | 2012-03-02 | 2021-07-20 | Apple Inc. | Systems and methods for name pronunciation |
US10134385B2 (en) | 2012-03-02 | 2018-11-20 | Apple Inc. | Systems and methods for name pronunciation |
US9483461B2 (en) | 2012-03-06 | 2016-11-01 | Apple Inc. | Handling speech synthesis of content for multiple languages |
US9953088B2 (en) | 2012-05-14 | 2018-04-24 | Apple Inc. | Crowd sourcing information to fulfill user requests |
US10079014B2 (en) | 2012-06-08 | 2018-09-18 | Apple Inc. | Name recognition system |
US9495129B2 (en) | 2012-06-29 | 2016-11-15 | Apple Inc. | Device, method, and user interface for voice-activated navigation and browsing of a document |
US9576574B2 (en) | 2012-09-10 | 2017-02-21 | Apple Inc. | Context-sensitive handling of interruptions by intelligent digital assistant |
US9971774B2 (en) | 2012-09-19 | 2018-05-15 | Apple Inc. | Voice-based media searching |
US10199051B2 (en) | 2013-02-07 | 2019-02-05 | Apple Inc. | Voice trigger for a digital assistant |
US10978090B2 (en) | 2013-02-07 | 2021-04-13 | Apple Inc. | Voice trigger for a digital assistant |
US11388291B2 (en) | 2013-03-14 | 2022-07-12 | Apple Inc. | System and method for processing voicemail |
US9368114B2 (en) | 2013-03-14 | 2016-06-14 | Apple Inc. | Context-sensitive handling of interruptions |
US10652394B2 (en) | 2013-03-14 | 2020-05-12 | Apple Inc. | System and method for processing voicemail |
US9697822B1 (en) | 2013-03-15 | 2017-07-04 | Apple Inc. | System and method for updating an adaptive speech recognition model |
US9922642B2 (en) | 2013-03-15 | 2018-03-20 | Apple Inc. | Training an at least partial voice command system |
US10176500B1 (en) * | 2013-05-29 | 2019-01-08 | A9.Com, Inc. | Content classification based on data recognition |
US9633674B2 (en) | 2013-06-07 | 2017-04-25 | Apple Inc. | System and method for detecting errors in interactions with a voice-based digital assistant |
US9582608B2 (en) | 2013-06-07 | 2017-02-28 | Apple Inc. | Unified ranking with entropy-weighted information for phrase-based semantic auto-completion |
US9966060B2 (en) | 2013-06-07 | 2018-05-08 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
US9620104B2 (en) | 2013-06-07 | 2017-04-11 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
US9966068B2 (en) | 2013-06-08 | 2018-05-08 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
US10657961B2 (en) | 2013-06-08 | 2020-05-19 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
US10769385B2 (en) | 2013-06-09 | 2020-09-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
US11048473B2 (en) | 2013-06-09 | 2021-06-29 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
US10176167B2 (en) | 2013-06-09 | 2019-01-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
US10185542B2 (en) | 2013-06-09 | 2019-01-22 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
US9300784B2 (en) | 2013-06-13 | 2016-03-29 | Apple Inc. | System and method for emergency calls initiated by voice command |
US10791216B2 (en) | 2013-08-06 | 2020-09-29 | Apple Inc. | Auto-activating smart responses based on activities from remote devices |
US11314370B2 (en) | 2013-12-06 | 2022-04-26 | Apple Inc. | Method for extracting salient dialog usage from live data |
US9620105B2 (en) | 2014-05-15 | 2017-04-11 | Apple Inc. | Analyzing audio input for efficient speech and music recognition |
US10592095B2 (en) | 2014-05-23 | 2020-03-17 | Apple Inc. | Instantaneous speaking of content on touch devices |
US9502031B2 (en) | 2014-05-27 | 2016-11-22 | Apple Inc. | Method for supporting dynamic grammars in WFST-based ASR |
US10714095B2 (en) | 2014-05-30 | 2020-07-14 | Apple Inc. | Intelligent assistant for home automation |
US9760559B2 (en) | 2014-05-30 | 2017-09-12 | Apple Inc. | Predictive text input |
US10699717B2 (en) | 2014-05-30 | 2020-06-30 | Apple Inc. | Intelligent assistant for home automation |
US9966065B2 (en) | 2014-05-30 | 2018-05-08 | Apple Inc. | Multi-command single utterance input method |
US10170123B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Intelligent assistant for home automation |
US10169329B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Exemplar-based natural language processing |
US11133008B2 (en) | 2014-05-30 | 2021-09-28 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US10657966B2 (en) | 2014-05-30 | 2020-05-19 | Apple Inc. | Better resolution when referencing to concepts |
US9842101B2 (en) | 2014-05-30 | 2017-12-12 | Apple Inc. | Predictive conversion of language input |
US10417344B2 (en) | 2014-05-30 | 2019-09-17 | Apple Inc. | Exemplar-based natural language processing |
US11257504B2 (en) | 2014-05-30 | 2022-02-22 | Apple Inc. | Intelligent assistant for home automation |
US10289433B2 (en) | 2014-05-30 | 2019-05-14 | Apple Inc. | Domain specific language for encoding assistant dialog |
US9785630B2 (en) | 2014-05-30 | 2017-10-10 | Apple Inc. | Text prediction using combined word N-gram and unigram language models |
US10083690B2 (en) | 2014-05-30 | 2018-09-25 | Apple Inc. | Better resolution when referencing to concepts |
US9715875B2 (en) | 2014-05-30 | 2017-07-25 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US10497365B2 (en) | 2014-05-30 | 2019-12-03 | Apple Inc. | Multi-command single utterance input method |
US9734193B2 (en) | 2014-05-30 | 2017-08-15 | Apple Inc. | Determining domain salience ranking from ambiguous words in natural speech |
US9430463B2 (en) | 2014-05-30 | 2016-08-30 | Apple Inc. | Exemplar-based natural language processing |
US10078631B2 (en) | 2014-05-30 | 2018-09-18 | Apple Inc. | Entropy-guided text prediction using combined word and character n-gram language models |
US9633004B2 (en) | 2014-05-30 | 2017-04-25 | Apple Inc. | Better resolution when referencing to concepts |
US10904611B2 (en) | 2014-06-30 | 2021-01-26 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US9668024B2 (en) | 2014-06-30 | 2017-05-30 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US10659851B2 (en) | 2014-06-30 | 2020-05-19 | Apple Inc. | Real-time digital assistant knowledge updates |
US10446141B2 (en) | 2014-08-28 | 2019-10-15 | Apple Inc. | Automatic speech recognition based on user feedback |
US10431204B2 (en) | 2014-09-11 | 2019-10-01 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US9818400B2 (en) | 2014-09-11 | 2017-11-14 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US10789041B2 (en) | 2014-09-12 | 2020-09-29 | Apple Inc. | Dynamic thresholds for always listening speech trigger |
US10453443B2 (en) | 2014-09-30 | 2019-10-22 | Apple Inc. | Providing an indication of the suitability of speech recognition |
US9986419B2 (en) | 2014-09-30 | 2018-05-29 | Apple Inc. | Social reminders |
US10074360B2 (en) | 2014-09-30 | 2018-09-11 | Apple Inc. | Providing an indication of the suitability of speech recognition |
US9668121B2 (en) | 2014-09-30 | 2017-05-30 | Apple Inc. | Social reminders |
US10438595B2 (en) | 2014-09-30 | 2019-10-08 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
US10390213B2 (en) | 2014-09-30 | 2019-08-20 | Apple Inc. | Social reminders |
US10127911B2 (en) | 2014-09-30 | 2018-11-13 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
US9646609B2 (en) | 2014-09-30 | 2017-05-09 | Apple Inc. | Caching apparatus for serving phonetic pronunciations |
US9886432B2 (en) | 2014-09-30 | 2018-02-06 | Apple Inc. | Parsimonious handling of word inflection via categorical stem + suffix N-gram language models |
US11556230B2 (en) | 2014-12-02 | 2023-01-17 | Apple Inc. | Data detection |
US10552013B2 (en) | 2014-12-02 | 2020-02-04 | Apple Inc. | Data detection |
US9711141B2 (en) | 2014-12-09 | 2017-07-18 | Apple Inc. | Disambiguating heteronyms in speech synthesis |
US9865280B2 (en) | 2015-03-06 | 2018-01-09 | Apple Inc. | Structured dictation using intelligent automated assistants |
US11231904B2 (en) | 2015-03-06 | 2022-01-25 | Apple Inc. | Reducing response latency of intelligent automated assistants |
US11087759B2 (en) | 2015-03-08 | 2021-08-10 | Apple Inc. | Virtual assistant activation |
US10567477B2 (en) | 2015-03-08 | 2020-02-18 | Apple Inc. | Virtual assistant continuity |
US10529332B2 (en) | 2015-03-08 | 2020-01-07 | Apple Inc. | Virtual assistant activation |
US10311871B2 (en) | 2015-03-08 | 2019-06-04 | Apple Inc. | Competing devices responding to voice triggers |
US9886953B2 (en) | 2015-03-08 | 2018-02-06 | Apple Inc. | Virtual assistant activation |
US9721566B2 (en) | 2015-03-08 | 2017-08-01 | Apple Inc. | Competing devices responding to voice triggers |
US9899019B2 (en) | 2015-03-18 | 2018-02-20 | Apple Inc. | Systems and methods for structured stem and suffix language models |
US9842105B2 (en) | 2015-04-16 | 2017-12-12 | Apple Inc. | Parsimonious continuous-space phrase representations for natural language processing |
US10083688B2 (en) | 2015-05-27 | 2018-09-25 | Apple Inc. | Device voice control for selecting a displayed affordance |
US11127397B2 (en) | 2015-05-27 | 2021-09-21 | Apple Inc. | Device voice control |
US10127220B2 (en) | 2015-06-04 | 2018-11-13 | Apple Inc. | Language identification from short strings |
US10101822B2 (en) | 2015-06-05 | 2018-10-16 | Apple Inc. | Language input correction |
US10356243B2 (en) | 2015-06-05 | 2019-07-16 | Apple Inc. | Virtual assistant aided communication with 3rd party service in a communication session |
US11025565B2 (en) | 2015-06-07 | 2021-06-01 | Apple Inc. | Personalized prediction of responses for instant messaging |
US10255907B2 (en) | 2015-06-07 | 2019-04-09 | Apple Inc. | Automatic accent detection using acoustic models |
US10186254B2 (en) | 2015-06-07 | 2019-01-22 | Apple Inc. | Context-based endpoint detection |
US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
US11500672B2 (en) | 2015-09-08 | 2022-11-15 | Apple Inc. | Distributed personal assistant |
US10671428B2 (en) | 2015-09-08 | 2020-06-02 | Apple Inc. | Distributed personal assistant |
US9697820B2 (en) | 2015-09-24 | 2017-07-04 | Apple Inc. | Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks |
US11010550B2 (en) | 2015-09-29 | 2021-05-18 | Apple Inc. | Unified language modeling framework for word prediction, auto-completion and auto-correction |
US10366158B2 (en) | 2015-09-29 | 2019-07-30 | Apple Inc. | Efficient word encoding for recurrent neural network language models |
US11587559B2 (en) | 2015-09-30 | 2023-02-21 | Apple Inc. | Intelligent device identification |
US11526368B2 (en) | 2015-11-06 | 2022-12-13 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10691473B2 (en) | 2015-11-06 | 2020-06-23 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10049668B2 (en) | 2015-12-02 | 2018-08-14 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
US10354652B2 (en) | 2015-12-02 | 2019-07-16 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
US10223066B2 (en) | 2015-12-23 | 2019-03-05 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US10446143B2 (en) | 2016-03-14 | 2019-10-15 | Apple Inc. | Identification of voice inputs providing credentials |
US9934775B2 (en) | 2016-05-26 | 2018-04-03 | Apple Inc. | Unit-selection text-to-speech synthesis based on predicted concatenation parameters |
US9972304B2 (en) | 2016-06-03 | 2018-05-15 | Apple Inc. | Privacy preserving distributed evaluation framework for embedded personalized systems |
US10249300B2 (en) | 2016-06-06 | 2019-04-02 | Apple Inc. | Intelligent list reading |
US11069347B2 (en) | 2016-06-08 | 2021-07-20 | Apple Inc. | Intelligent automated assistant for media exploration |
US10049663B2 (en) | 2016-06-08 | 2018-08-14 | Apple, Inc. | Intelligent automated assistant for media exploration |
US10354011B2 (en) | 2016-06-09 | 2019-07-16 | Apple Inc. | Intelligent automated assistant in a home environment |
US10192552B2 (en) | 2016-06-10 | 2019-01-29 | Apple Inc. | Digital assistant providing whispered speech |
US10509862B2 (en) | 2016-06-10 | 2019-12-17 | Apple Inc. | Dynamic phrase expansion of language input |
US10067938B2 (en) | 2016-06-10 | 2018-09-04 | Apple Inc. | Multilingual word prediction |
US10733993B2 (en) | 2016-06-10 | 2020-08-04 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10490187B2 (en) | 2016-06-10 | 2019-11-26 | Apple Inc. | Digital assistant providing automated status report |
US11037565B2 (en) | 2016-06-10 | 2021-06-15 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10580409B2 (en) | 2016-06-11 | 2020-03-03 | Apple Inc. | Application integration with a digital assistant |
US10521466B2 (en) | 2016-06-11 | 2019-12-31 | Apple Inc. | Data driven natural language event detection and classification |
US11152002B2 (en) | 2016-06-11 | 2021-10-19 | Apple Inc. | Application integration with a digital assistant |
US10089072B2 (en) | 2016-06-11 | 2018-10-02 | Apple Inc. | Intelligent device arbitration and control |
US10942702B2 (en) | 2016-06-11 | 2021-03-09 | Apple Inc. | Intelligent device arbitration and control |
US10297253B2 (en) | 2016-06-11 | 2019-05-21 | Apple Inc. | Application integration with a digital assistant |
US10269345B2 (en) | 2016-06-11 | 2019-04-23 | Apple Inc. | Intelligent task discovery |
US10474753B2 (en) | 2016-09-07 | 2019-11-12 | Apple Inc. | Language identification using recurrent neural networks |
US10553215B2 (en) | 2016-09-23 | 2020-02-04 | Apple Inc. | Intelligent automated assistant |
US10043516B2 (en) | 2016-09-23 | 2018-08-07 | Apple Inc. | Intelligent automated assistant |
US11281993B2 (en) | 2016-12-05 | 2022-03-22 | Apple Inc. | Model and ensemble compression for metric learning |
US10593346B2 (en) | 2016-12-22 | 2020-03-17 | Apple Inc. | Rank-reduced token representation for automatic speech recognition |
US11204787B2 (en) | 2017-01-09 | 2021-12-21 | Apple Inc. | Application integration with a digital assistant |
US10332518B2 (en) | 2017-05-09 | 2019-06-25 | Apple Inc. | User interface for correcting recognition errors |
US10417266B2 (en) | 2017-05-09 | 2019-09-17 | Apple Inc. | Context-aware ranking of intelligent response suggestions |
US10395654B2 (en) | 2017-05-11 | 2019-08-27 | Apple Inc. | Text normalization based on a data-driven learning network |
US10847142B2 (en) | 2017-05-11 | 2020-11-24 | Apple Inc. | Maintaining privacy of personal information |
US10726832B2 (en) | 2017-05-11 | 2020-07-28 | Apple Inc. | Maintaining privacy of personal information |
US10755703B2 (en) | 2017-05-11 | 2020-08-25 | Apple Inc. | Offline personal assistant |
US11405466B2 (en) | 2017-05-12 | 2022-08-02 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US10410637B2 (en) | 2017-05-12 | 2019-09-10 | Apple Inc. | User-specific acoustic models |
US11301477B2 (en) | 2017-05-12 | 2022-04-12 | Apple Inc. | Feedback analysis of a digital assistant |
US10789945B2 (en) | 2017-05-12 | 2020-09-29 | Apple Inc. | Low-latency intelligent automated assistant |
US10791176B2 (en) | 2017-05-12 | 2020-09-29 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US10810274B2 (en) | 2017-05-15 | 2020-10-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
US10482874B2 (en) | 2017-05-15 | 2019-11-19 | Apple Inc. | Hierarchical belief states for digital assistants |
US11217255B2 (en) | 2017-05-16 | 2022-01-04 | Apple Inc. | Far-field extension for digital assistant services |
US10403278B2 (en) | 2017-05-16 | 2019-09-03 | Apple Inc. | Methods and systems for phonetic matching in digital assistant services |
US10303715B2 (en) | 2017-05-16 | 2019-05-28 | Apple Inc. | Intelligent automated assistant for media exploration |
US10311144B2 (en) | 2017-05-16 | 2019-06-04 | Apple Inc. | Emoji word sense disambiguation |
US10657328B2 (en) | 2017-06-02 | 2020-05-19 | Apple Inc. | Multi-task recurrent neural network architecture for efficient morphology handling in neural language modeling |
US10445429B2 (en) | 2017-09-21 | 2019-10-15 | Apple Inc. | Natural language understanding using vocabularies with compressed serialized tries |
US10755051B2 (en) | 2017-09-29 | 2020-08-25 | Apple Inc. | Rule-based natural language processing |
US10636424B2 (en) | 2017-11-30 | 2020-04-28 | Apple Inc. | Multi-turn canned dialog |
US10733982B2 (en) | 2018-01-08 | 2020-08-04 | Apple Inc. | Multi-directional dialog |
US10733375B2 (en) | 2018-01-31 | 2020-08-04 | Apple Inc. | Knowledge-based framework for improving natural language understanding |
US10789959B2 (en) | 2018-03-02 | 2020-09-29 | Apple Inc. | Training speaker recognition models for digital assistants |
US10592604B2 (en) | 2018-03-12 | 2020-03-17 | Apple Inc. | Inverse text normalization for automatic speech recognition |
US10818288B2 (en) | 2018-03-26 | 2020-10-27 | Apple Inc. | Natural assistant interaction |
US10909331B2 (en) | 2018-03-30 | 2021-02-02 | Apple Inc. | Implicit identification of translation payload with neural machine translation |
US11145294B2 (en) | 2018-05-07 | 2021-10-12 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US10928918B2 (en) | 2018-05-07 | 2021-02-23 | Apple Inc. | Raise to speak |
US10984780B2 (en) | 2018-05-21 | 2021-04-20 | Apple Inc. | Global semantic word embeddings using bi-directional recurrent neural networks |
US10684703B2 (en) | 2018-06-01 | 2020-06-16 | Apple Inc. | Attention aware virtual assistant dismissal |
US10403283B1 (en) | 2018-06-01 | 2019-09-03 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US11386266B2 (en) | 2018-06-01 | 2022-07-12 | Apple Inc. | Text correction |
US10984798B2 (en) | 2018-06-01 | 2021-04-20 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US11495218B2 (en) | 2018-06-01 | 2022-11-08 | Apple Inc. | Virtual assistant operation in multi-device environments |
US10892996B2 (en) | 2018-06-01 | 2021-01-12 | Apple Inc. | Variable latency device coordination |
US11009970B2 (en) | 2018-06-01 | 2021-05-18 | Apple Inc. | Attention aware virtual assistant dismissal |
US10944859B2 (en) | 2018-06-03 | 2021-03-09 | Apple Inc. | Accelerated task performance |
US10496705B1 (en) | 2018-06-03 | 2019-12-03 | Apple Inc. | Accelerated task performance |
US10504518B1 (en) | 2018-06-03 | 2019-12-10 | Apple Inc. | Accelerated task performance |
CN110414527A (en) * | 2019-07-31 | 2019-11-05 | Beijing ByteDance Network Technology Co., Ltd. | Character recognition method, apparatus, storage medium and electronic device
US20240005365A1 (en) * | 2022-06-30 | 2024-01-04 | Constant Contact, Inc. | Email Subject Line Generation Method |
Also Published As
Publication number | Publication date |
---|---|
WO2008068987A1 (en) | 2008-06-12 |
US8045808B2 (en) | 2011-10-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8045808B2 (en) | Pure adversarial approach for identifying text content in images | |
US8098939B2 (en) | Adversarial approach for identifying inappropriate text content in images | |
US8489689B1 (en) | Apparatus and method for obfuscation detection within a spam filtering model | |
US8112484B1 (en) | Apparatus and method for auxiliary classification for generating features for a spam filtering model | |
EP2803031B1 (en) | Machine-learning based classification of user accounts based on email addresses and other account information | |
Gansterer et al. | E-mail classification for phishing defense | |
Attar et al. | A survey of image spamming and filtering techniques | |
JP2007529075A (en) | Method and apparatus for analyzing electronic communications containing images | |
CN113055386A (en) | Method and device for identifying and analyzing attack organization | |
Khan et al. | Cyber security using Arabic CAPTCHA scheme. |
Hayati et al. | Evaluation of spam detection and prevention frameworks for email and image spam: a state of art | |
US8699796B1 (en) | Identifying sensitive expressions in images for languages with large alphabets | |
Li et al. | Detection method of phishing email based on persuasion principle | |
CN112948725A (en) | Phishing website URL detection method and system based on machine learning | |
JP2006293573A (en) | Electronic mail processor, electronic mail filtering method and electronic mail filtering program | |
Kumar et al. | SVM with Gaussian kernel-based image spam detection on textual features | |
Xiujuan et al. | Detecting spear-phishing emails based on authentication | |
CN114691869A (en) | User label generation method and system | |
US11647046B2 (en) | Fuzzy inclusion based impersonation detection | |
Chiraratanasopha et al. | Detecting fraud job recruitment using features reflecting from real-world knowledge of fraud | |
Dangwal et al. | Feature selection for machine learning-based phishing websites detection | |
Shirazi et al. | A machine-learning based unbiased phishing detection approach | |
Tsikerdekis et al. | Detecting online content deception | |
CN114915468A (en) | Intelligent analysis and detection method for network crime based on knowledge graph | |
CN113746814A (en) | Mail processing method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: TREND MICRO INCORPORATED, JAPAN Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:OLIVER, JONATHAN JAMES;REEL/FRAME:019782/0404 Effective date: 20070813 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
FPAY | Fee payment |
Year of fee payment: 4 |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 8 |
|
AS | Assignment |
Owner name: TREND MICRO INCORPORATED, TEXAS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:TREND MICRO INCORPORATED;REEL/FRAME:061686/0520 Effective date: 20220914 |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 12 |