US20130145241A1 - Automated augmentation of text, web and physical environments using multimedia content - Google Patents

Automated augmentation of text, web and physical environments using multimedia content

Info

Publication number
US20130145241A1
Authority
US
United States
Prior art keywords
multimedia
keywords
text
pieces
user device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US13/693,246
Inventor
Ahmed Salama
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to US13/693,246 priority Critical patent/US20130145241A1/en
Publication of US20130145241A1 publication Critical patent/US20130145241A1/en
Priority to US16/551,508 priority patent/US11256848B2/en
Priority to US17/651,783 priority patent/US20220171915A1/en
Abandoned legal-status Critical Current

Classifications

    • G06F17/211
    • G06F40/103: Formatting, i.e. changing of presentation of documents (G06F40/00 Handling natural language data; G06F40/10 Text processing)
    • G06F16/134: Distributed indices (G06F16/00 Information retrieval; G06F16/13 File access structures)
    • G06F16/9032: Query formulation (G06F16/00 Information retrieval; G06F16/903 Querying)
    • G06F16/94: Hypermedia (G06F16/00 Information retrieval; G06F16/93 Document management systems)

Definitions

  • the unused grid spaces may be filled up with placeholder images, which may include solid colors or randomly selected placeholder images from a database of placeholder images. This provides an aesthetically pleasing layout even if there are not enough images to populate the entire page.
  • a different grid-based layout may be randomly generated for each displayable page. This may be done in two different ways. The first is when each page includes image containers whose size may be randomly generated, forming a mosaic of squares including images and solid colors. The second is when preprogrammed templates are stored on the user device or on a remote server providing a layout for an image mosaic.
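  • By way of illustration, the following Python sketch (not part of the patent; the container structure, merge probability, and palette are assumptions) generates a random mosaic of square image containers of the kind described in the first approach, filling leftover cells with solid-color placeholders:

```python
import random

def generate_mosaic(cols, rows, num_images, palette=("#223344", "#445566", "#667788")):
    """Randomly tile a cols x rows grid with 1x1 and 2x2 square containers."""
    occupied = [[False] * cols for _ in range(rows)]
    containers, images_left = [], num_images
    for r in range(rows):
        for c in range(cols):
            if occupied[r][c]:
                continue
            # Occasionally try a 2x2 container; otherwise fall back to 1x1.
            big_fits = (r + 1 < rows and c + 1 < cols
                        and not occupied[r][c + 1]
                        and not occupied[r + 1][c]
                        and not occupied[r + 1][c + 1])
            size = 2 if big_fits and random.random() < 0.3 else 1
            for dr in range(size):
                for dc in range(size):
                    occupied[r + dr][c + dc] = True
            if images_left > 0:
                containers.append({"row": r, "col": c, "size": size, "kind": "image"})
                images_left -= 1
            else:
                # Unused space becomes a solid-color placeholder tile.
                containers.append({"row": r, "col": c, "size": size,
                                   "kind": "placeholder", "color": random.choice(palette)})
    return containers
```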
  • the user device may display a text document in the form of an image.
  • the user device may have a digital camera, which may be used to capture some printed text documents.
  • the image may be subjected to an optical character recognition process to derive terms used therein. These terms may then be used to generate keywords and search queries.
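  • A minimal sketch of this capture-to-keywords step, assuming the pytesseract OCR library as a stand-in for whatever recognition engine the device embeds:

```python
from collections import Counter

from PIL import Image
import pytesseract  # assumed OCR dependency; any embedded OCR engine could stand in

def keywords_from_capture(image_path, min_count=2):
    """Recognize text in a captured page image and keep recurring terms."""
    text = pytesseract.image_to_string(Image.open(image_path))
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    counts = Counter(w for w in words if len(w) > 3)
    # Keep terms that recur, mirroring the frequency heuristic used for keywords.
    return [w for w, n in counts.most_common() if n >= min_count]
```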
  • multimedia content may be added to the first occurrence, or all occurrences of a term in the displayed portion of the text, whichever option may be selected by the user. If the user wishes the multimedia content to be added to all occurrences of the term in the displayed portion of the text, he or she may specify that a different image or video is added to each occurrence of the term.
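  • A small sketch of the occurrence options just described (the function and result structure are illustrative assumptions):

```python
import re

def attach_media(text, term, media, all_occurrences=False):
    """Pair media with the first or with every occurrence of a term."""
    if not media:
        return []
    positions = [m.start() for m in re.finditer(rf"\b{re.escape(term)}\b", text, re.I)]
    if not all_occurrences:
        positions = positions[:1]
    # Cycle through the available pieces so repeated occurrences can each
    # receive a different image or video, as described above.
    return [(pos, media[i % len(media)]) for i, pos in enumerate(positions)]
```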
  • a tablet device may embed both e-books and web pages and may also be used as a reader for blogs, articles, and other online and offline documents.
  • hyperlinks to relevant resources on the Internet may be provided within the text. Images, videos, and keywords may be linked to their sources. Links may be presented by placing a button over the image, which may reveal the link or open a URL associated with the link.
  • prepackaged eBooks may come bundled with accompanying multimedia content and layouts so that no analysis or multimedia content search is required when viewing the eBook.
  • This prepackaged eBook may be downloaded from a remote server or transferred via a disk drive or the Internet.
  • the prepackaged eBook may include image files, video files, template files, graphic files, text files, eBook files, and any other files applicable to view the eBook in an offline mode (e.g., having no connection to a WAN).
  • when a single sentence within a portion of the displayed text contains multiple occurrences of contextual data, such as, for example, a location, a date, and a name, the entire sentence may be used for performing a semantic search for relevant multimedia content.
  • multimedia content may be added to user-generated presentations and essays. Images and videos may be added instantly using a predetermined template or layout as the user types a word that may be used for generating a keyword in accordance with the embodiments described herein. Keywords may also be predefined by the user, stored in the keywords database, and used to conduct searches for multimedia content when reproduced by the user in the text of a presentation or essay. This approach may be facilitated by software integrated within a word processor, e.g. Microsoft Word, as a plug-in or used as a standalone application.
  • portions of the text to which multimedia content has been added or individual images or videos may be shared via a social network. For example, this may be done by highlighting an image or quote to be shared and pressing a “share” button.
  • FIG. 1 shows a block diagram illustrating a system environment 100 suitable for adding multimedia content to displayable text, web and physical environments.
  • the system environment 100 may comprise a user device 102 with a browser 112, which may provide the ability to browse the Internet and interact with various websites, e.g., the web search engine 106.
  • a user device may include a computer or a laptop.
  • the user device 102 may be a mobile device that includes a mobile application 114.
  • a mobile device may include a tablet computer, a handheld cellular phone, a mobile phone, a smart phone, a PDA, a handheld device having wireless connection capability, or any other electronic device.
  • the system environment 100 may further include a system for adding multimedia content to displayable text, web and physical environments 104; a search engine 106, including a web search engine, such as Google or Bing, a third-party image search engine, or a local search engine embedded within the user device; a web-based data storage 108 that allows storing and retrieving the digital media content; and a network (e.g., the Internet) 110.
  • the network 110 may couple the aforementioned modules; it is a network of data processing nodes interconnected for the purpose of data communication and may be utilized to communicatively couple various components of the system environment 100.
  • the network 110 may include the Internet or any other network capable of communicating data between devices.
  • the network 110 may include links using technologies such as Ethernet, 802.11, worldwide interoperability for microwave access (WiMAX), 2G/3G/4G mobile communications protocols, digital subscriber line (DSL), asynchronous transfer mode (ATM), InfiniBand, PCI Express Advanced Switching, etc.
  • the networking protocols used on the network 110 may include multiprotocol label switching (MPLS), the transmission control protocol/Internet protocol (TCP/IP), the User Datagram Protocol (UDP), the hypertext transport protocol (HTTP), the simple mail transfer protocol (SMTP), the file transfer protocol (FTP), etc.
  • the data exchanged over the network 110 may be represented using technologies or formats including image data in binary form (e.g., Portable Network Graphics (PNG)), the hypertext markup language (HTML), the extensible markup language (XML), etc.
  • all or some links may be encrypted using conventional encryption technologies such as secure sockets layer (SSL), transport layer security (TLS), virtual private networks (VPNs), Internet Protocol security (IPsec), etc.
  • the entities on the network 110 may use custom or dedicated data communications technologies instead of, or in addition to, the ones described above.
  • the user device 102 may include a computer, a laptop, a tablet computer, a portable computing device, a PDA, a handheld cellular phone, a mobile phone, a smart phone, a handheld device having a wireless connection capability, or any other electronic device.
  • the user device 102 may comprise a browser that provides the ability to browse the Internet.
  • the user device 102 may be used to communicate with the search engine 106. Accordingly, the user device 102 may receive or transmit data, such as search queries, via a wired or a wireless network.
  • the system for adding multimedia content to displayable text, web and physical environments 104 may be used for adding multimedia content to the displayable text, web and physical environments by interacting with the search engine 106 and the web-based data storage 108, which may provide multimedia content as users read text documents. More specifically, the system for adding multimedia content to text, web and physical environments 104 may analyze the portion of the text displayed using the user device 102, generate keywords based on the more frequently occurring terms (e.g., nouns), phrases, or sentences, generate search queries based on these keywords, and send them to the search engine 106.
  • the search engine 106 may conduct a search for multimedia content and store the multimedia content in the web-based data storage 108.
  • the user device 102 and the system for adding multimedia content to displayable text, web and physical environments 104 may interact with the web-based data storage 108 to retrieve the multimedia content, which may then be displayed on the user device screen by the system for adding multimedia content to displayable text, web and physical environments 104 as embedded in the portion of the text displayed, or in a separate portion or portions of the user device screen (not shown).
  • the system environment 100 may comprise an intermediary server (not shown).
  • the intermediary server may be configured to manage and parse all search queries from the user device and manage image resizing, image fetching, image searches, keyword extraction, account management, account details storage, template storage, template generation, uploading of templates and user generated data, and so forth.
  • the intermediary server may be also configured to provide updates to eBook templates on the user device.
  • FIG. 2 is a block diagram of the system for adding multimedia content to text, web and physical environments 104.
  • the system for adding multimedia content to text, web and physical environments 104 may comprise a communication module 210, a determining module 220, an analyzing module 230, a search query generator 240, a retrieving module 250, a displaying module 260, an optical character recognition module 270, a terms database 280, and a keywords database 290.
  • the communication module 210 may be configured to provide user interactions with the system for adding multimedia content to displayable text, web and physical environments 104, as well as to provide interaction between the different modules of this system.
  • the determining module 220 may be configured to determine which of the terms may be used as keywords, to generate such keywords, and to store the keywords in the keywords database 290.
  • the analyzing module 230 may be configured to analyze the portion of the text displayed in order to identify terms (e.g. nouns), phrases, and sentences, thereby enabling their further use as keywords.
  • the analyzing module 230 may be configured to analyze dates mentioned in the portion of the text displayed or in the entire article/book to establish a time context for the images being sought.
  • the search query generator 240 may be configured to convert keywords, formulated based on the terms selected by the analyzing and determining modules 230, 220 and stored in the keywords database 290, into search queries.
  • the retrieving module 250 may be configured, in some embodiments, to retrieve multimedia content from the web-based data storage 108, while in some other embodiments, the retrieving module 250 may be configured to retrieve multimedia content from the multimedia content database 340 shown in FIG. 3.
  • the displaying module 260 may be configured to display multimedia content retrieved by the retrieving module 250.
  • the optical character recognition module 270 may be configured to scan a portion of a printed text document into an image containing a text to be processed by the analyzing module 230 and the determining module 220.
  • the terms database 280 may be configured to store the terms that were identified in the displayed portion of the text by the analyzing module 230 and suitable to be used as keywords by the determining module 220.
  • the keywords database 290 may be configured to store keywords formulated based on the terms that were selected by the analyzing module 230 and the determining module 220.
  • FIG. 3 is a block diagram of the user device 102 with the system for adding multimedia content to displayable text, web and physical environments.
  • the optical character recognition module 314 may be used to scan the printed text document into an image containing text, and the image with the text is further subjected to optical character recognition to generate one or more keywords.
  • the user device 102 may also comprise a digital camera 316 to capture the displayed portion of the text.
  • the digital camera may be attached to a head-mountable display device, such as a heads-up display.
  • a person wearing the head-mountable display device with a built-in camera may view the surroundings and, by means of the camera and a computing device, may scan the environment, such as advertisements, signs, billboards, magazines, and newspapers, for text.
  • the scanned material may be further processed by the computer device to generate keywords and provide relevant real-time images and other multimedia on the heads-up display.
  • Interaction with the system for adding multimedia content to displayable text, web and physical environments 104 may be performed using the communication module 302.
  • the analyzing module 306 may analyze the portion of the text displayed to identify those terms (e.g., nouns), phrases, or sentences which occur in this portion of the text more frequently.
  • the determining module 304 may then determine which of the found terms may be used as keywords, generate such keywords, and store them in the keywords database 290.
  • the identified terms may be stored in the terms database 320 and the keywords generated based on the identified terms may be stored in the keywords database 330.
  • the search query generator 308 may convert the stored keywords into search queries, which may be sent to the search engine 106. In some embodiments, these search queries may be used to search for multimedia content in the multimedia content database 340.
  • the retrieving module 310 may interact with the web-based data storage 108 to retrieve the stored multimedia content. In other embodiments, the retrieving module 310 may retrieve multimedia content from the multimedia content database 340. The retrieved multimedia content is then visualized by the displaying module 312.
  • FIG. 4 is a process flow diagram showing a method 400 for adding multimedia content to displayable text, web and physical environments, according to an exemplary embodiment.
  • the method 400 may be performed by processing logic that may comprise hardware, software (such as software run on a general-purpose computer system or a hand-held device), or a combination of both.
  • the method 400 may be applied using the various modules discussed above with reference to FIG. 3.
  • Each of these modules may include processing logic. It will be appreciated that the examples of the foregoing modules may be virtual, and the instructions said to be executed by a module may, in fact, be retrieved and executed by a processor.
  • the foregoing modules may also include memory cards, servers, and/or computer discs. Although the various modules may be configured to perform some or all of the various steps described herein, fewer or more modules may be provided and still fall within the scope of the exemplary embodiments.
  • the method 400 may commence at operation 402 with determining that a user device displays a portion of a text document of a predetermined format.
  • the text may be subjected to an optical character recognition process that may commence at operation 404.
  • the portion of the text displayed may be analyzed by the analyzing module 230 to identify the terms (e.g., nouns), phrases, and sentences.
  • the determining module 220 may then determine which of the found terms may be used for generating keywords, and may then generate keywords based on these terms at operation 408.
  • the search query generator 240 may be used to generate at least one search query, which may then be sent to the search engine 106.
  • the displayed text content may be transmitted to a server for processing.
  • the processing of the text to generate keywords and the sourcing of images may take place on a remote web server (in the cloud), and the server may transmit back to the user device a combination of both text and images, either packaged together or separately, so that the user device can display the text and accompanying images without having to conduct the search queries or the keyword generation locally.
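  • A hedged sketch of that round trip; the endpoint URL and response fields are hypothetical placeholders, not part of the disclosure:

```python
import requests

def augment_remotely(page_text, server="https://example.com/augment"):
    """Send the displayed text to a cloud service and get text plus images back."""
    resp = requests.post(server, json={"text": page_text}, timeout=10)
    resp.raise_for_status()
    payload = resp.json()  # e.g. {"text": "...", "images": ["...url...", ...]}
    return payload["text"], payload["images"]
```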
  • the determining module 220 may rely on cloud-based keyword identification and extraction as well as cloud-based image retrieval.
  • the retrieving module may retrieve the multimedia content, associated with one or more keywords, from the search engine 106 and store the one or more keywords in local or remote data storage at operation 414.
  • the displaying module displays the stored multimedia content concurrently with the corresponding portion of the text on the user device screen.
  • FIG. 5 is a process flow diagram showing a method 500 for optical character recognition of a text from text, web and physical environments, according to an example embodiment.
  • the method 500 may commence in operation 502 with scanning text associated with text, web and physical environments.
  • text, web, and physical environments may be scanned using a head-mountable display device with a built-in digital camera or a tablet device.
  • the scanned text may be parsed. During the parsing, the scanned text may be analyzed to separate the text from graphics and to detect the presence of columns and headlines.
  • the shapes of individual characters are recognized via a character recognition process in operation 506. Character recognition may be performed on any number of character fonts.
  • context analysis may be performed to divide the text into words.
  • recognized characters may be formatted for output in operation 510.
  • the text may be electronically searched for keywords to retrieve, from data storages, pieces of multimedia associated with the one or more keywords.
  • FIG. 6 is a block diagram illustrating application of a head-mountable display device for optical character recognition of a text from a physical environment, in accordance with certain embodiments.
  • a user wearing a head-mountable display device 602 with a built-in digital camera may view a physical environment 610.
  • the digital camera may scan the surroundings, such as advertisements, signs, billboards, magazines, newspapers and so forth, for text content.
  • the text content may then be processed using the optical character recognition module 270 to retrieve keywords and display relevant pieces of multimedia content, such as images 604, 606, and 608, on a display of the head-mountable display device 602.
  • FIG. 7 is a block diagram illustrating the application of a tablet device for optical character recognition of a text from a text environment, in accordance with certain embodiments.
  • a tablet device 702 may be used to scan the text environment 704.
  • the text environment 704 may include any printed documents, newspapers, magazines and so forth.
  • the text scanned from the text environment may then be processed using the optical character recognition module 270 to retrieve keywords and display relevant pieces of multimedia content on the screen of the tablet device 702.
  • FIG. 8 is a block diagram showing a user device 800 having a browser within which a text 802 and the corresponding images 804 are displayed in a grid-like fashion.
  • the displayed portion of the text 802 is located within the left-hand section of the user device screen.
  • Two larger rectangular images 804, 804c are positioned, respectively, at the top and the bottom of the right-hand section of the user-device screen, while two smaller-sized square images 804a, 804b are positioned next to each other in the middle of the right-hand section of the user device screen.
  • FIG. 9 is a block diagram showing a graphical user interface of a user device 900, displaying a text 902 and the corresponding multimedia content (images 904, 904a and videos 906, 906a) adjacent to the text in a grid-like fashion.
  • the displayed portion of the text 902 is located in the left-hand section of the user device screen. Adjacent to the displayed portion of the text 902 are two videos and two images.
  • the videos 906, 906a are represented by larger-sized rectangular images and positioned, respectively, on top and at the bottom of the right-hand section of the user-device screen, while the images 904, 904a, which are smaller in size and square-shaped, are positioned next to each other in between the videos 906, 906a, in the middle of the right-hand section of the user device screen.
  • FIG. 10 shows a diagrammatic representation of a computing device for a machine in the example electronic form of a computer system 1000, within which a set of instructions for causing the machine to perform any one or more of the methodologies discussed herein can be executed.
  • the machine operates as a standalone device or can be connected (e.g., networked) to other machines.
  • the machine can operate in the capacity of a server or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment.
  • the machine can be a personal computer (PC), a tablet PC, a set-top box (STB), a PDA, a cellular telephone, a portable music player (e.g., a portable hard drive audio device, such as a Moving Picture Experts Group Audio Layer 3 (MP3) player), a web appliance, a network router, a switch, a bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine.
  • Further, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.
  • the example computer system 1000 includes a processor or multiple processors 1002 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), or both), and a main memory 1004 and a static memory 1006, which communicate with each other via a bus 1008.
  • the computer system 1000 may further include a video display unit 1010 (e.g., a liquid crystal display (LCD) or cathode ray tube (CRT)).
  • the computer system 1000 may also include at least one input device 1012, such as an alphanumeric input device (e.g., a keyboard), a cursor control device (e.g., a mouse), a microphone, a digital camera, a video camera, and so forth.
  • the computer system 1000 may also include a disk drive unit 1014, a signal generation device 1016 (e.g., a speaker), and a network interface device 1018.
  • the disk drive unit 1014 may include a computer-readable medium 1020, which may store one or more sets of instructions and data structures (e.g., instructions 1022) embodying or utilized by any one or more of the methodologies or functions described herein.
  • the instructions 1022 may also reside, completely or at least partially, within the main memory 1004 and/or within the processors 1002 during execution thereof by the computer system 1000.
  • the main memory 1004 and the processors 1002 may also constitute machine-readable media.
  • the instructions 1022 may further be transmitted or received over the network 110 via the network interface device 1018 utilizing any one of a number of well-known transfer protocols (e.g., Hyper Text Transfer Protocol (HTTP), CAN, Serial, and Modbus).
  • While the computer-readable medium 1020 is shown in an example embodiment to be a single medium, the term “computer-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions.
  • the term “computer-readable medium” shall also be taken to include any medium that is capable of storing, encoding, or carrying a set of instructions for execution by the machine and that causes the machine to perform any one or more of the methodologies of the present application, or that is capable of storing, encoding, or carrying data structures utilized by or associated with such a set of instructions.
  • the term “computer-readable medium” shall accordingly be taken to include, but not be limited to, solid-state memories, optical and magnetic media. Such media may also include, without limitation, hard disks, floppy disks, flash memory cards, digital video disks, random access memory (RAM), read only memory (ROM), and the like.
  • the example embodiments described herein may be implemented in an operating environment comprising computer-executable instructions (e.g., software) installed on a computer, in hardware, or in a combination of software and hardware.
  • the computer-executable instructions may be written in a computer programming language or can be embodied in firmware logic. If written in a programming language conforming to a recognized standard, such instructions may be executed on a variety of hardware platforms and for interfaces to a variety of operating systems.
  • such languages may include, for example, Hypertext Markup Language (HTML), Extensible Markup Language (XML), Extensible Stylesheet Language (XSL), Document Style Semantics and Specification Language (DSSSL), Cascading Style Sheets (CSS), Synchronized Multimedia Integration Language (SMIL), Wireless Markup Language (WML), Java™, Jini™, C, C++, Perl, UNIX Shell, Visual Basic or Visual Basic Script, Virtual Reality Markup Language (VRML), ColdFusion™, or other compilers, assemblers, interpreters, or other computer languages or platforms.
  • the disclosed technique thus provides a useful tool enabling users to read text content that is automatically augmented with relevant multimedia content, without having to search the web for illustrations themselves.

Abstract

Provided is an example method for automated augmentation of text, web and physical environments using multimedia content. The method may comprise determining that a user device displays a portion of a text document in a predetermined format, analyzing the portion of the text document to generate one or more keywords, generating, based on the one or more keywords, at least one search query for a multimedia content search via a search engine, retrieving, from one or more data storages, one or more pieces of multimedia associated with the one or more keywords, and enabling the user device to display the one or more pieces of multimedia concurrently with the portion of the text document, wherein the one or more pieces of multimedia are displayed on the user device according to predetermined user settings.

Description

    FIELD
  • This disclosure relates generally to data processing and, more particularly, to methods and systems for automated augmentation of text, web and physical environments using multimedia content.
  • BACKGROUND
  • With the advent and rapid spread of electronic book (e-book) readers, tablet personal computers (PCs) and other hand-held devices, e-books are becoming very popular and the number of available e-books is steadily growing. The devices that can be used to view e-books and other textual content, such as websites, utilize various formats, which are not limited to plain text and can also be used to display multimedia content, such as videos and images. However, e-books and web content, such as blogs, are often insufficiently or poorly illustrated. Because illustrations promote better understanding of the contents, users may be forced to search the web for illustrations. This may be time-consuming and inconvenient.
  • SUMMARY
  • This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
  • Provided are methods and systems for automated augmentation of text, web and physical environments using multimedia content. In some example embodiments, a method for automated augmentation of text, web and physical environments using multimedia content comprises determining that a user device displays a portion of a text document in a predetermined format, analyzing the portion of the text document to generate keywords, generating, based on the keywords, at least one search query for a multimedia content search via a search engine, retrieving, from one or more data storages, one or more pieces of multimedia associated with one or more keywords, and enabling the user device to display the one or more pieces of multimedia concurrently with the portion of the text document, wherein the pieces of multimedia are displayed on the user device according to predetermined user settings.
  • In some example embodiments, analyzing the portion of the text document to generate the keywords comprises parsing the portion of the text document to generate a plurality of terms and generating keywords based on the plurality of terms. In some example embodiments, generating the keywords comprises selecting the plurality of terms, which appear in the portion of the text document more than a predetermined number of times.
  • In some example embodiments, keywords may be generated based on contextual data occurring in a displayed portion of a text, such as, for example, a date, a location, or a name. In some example embodiments, the keywords may be generated based on those terms in the displayed portion of the text that start with an uppercase letter.
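  • As a rough illustration of these contextual heuristics (the regular expressions and filters below are assumptions, not taken from the disclosure), a Python sketch might look like:

```python
import re

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
DATE_RE = re.compile(rf"\b(?:\d{{1,2}}\s+)?(?:{MONTHS})\s+\d{{4}}\b")
PROPER_RE = re.compile(r"[A-Z][a-z]{2,}")

def contextual_keywords(text):
    """Collect dates plus capitalized terms (candidate names and locations)."""
    keywords = {m.group(0) for m in DATE_RE.finditer(text)}
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = sentence.split()
        # Skip the sentence-initial word to cut down on false positives.
        keywords.update(w.strip(",.;:") for w in words[1:]
                        if PROPER_RE.fullmatch(w.strip(",.;:")))
    return keywords
```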
  • In some example embodiments, the method may further comprise applying an optical character recognition process to an image containing a text to retrieve the plurality of terms. In some example embodiments, the predetermined format of the text document may include an e-book reader text format or a web page document format. In some example embodiments, a portion of the text document is related to a virtual page of an e-book.
  • In some example embodiments, the multimedia content may comprise a text, a still image, an icon, an animated image, a video, and an audio. In some example embodiments, the pieces of multimedia are displayed on the user device as a mosaic. In some example embodiments, data storages are selected from a remote database, a web site, a local database, or a cache of the user device. In some example embodiments, the search engine is selected from a third party image search engine, a web search engine, or a local search engine embedded within the user device. In some example embodiments, the method may further comprise storing the pieces of multimedia associated with the keywords to local or remote data storage.
  • In some example embodiments, the method may further comprise generating unique identifiers associated with each piece of multimedia from the set of pieces of multimedia and corresponding keywords, wherein the unique identifiers are stored in the local or the remote data storage along with corresponding pieces of multimedia and the corresponding keywords.
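  • One possible realization of this identifier scheme, sketched in Python with a content hash standing in for whatever unique-identifier generator an implementation chooses:

```python
import hashlib

def store_with_identifier(store, media_url, keywords):
    """Store a piece of multimedia and its keywords under a stable unique ID."""
    key = media_url + "|" + ",".join(sorted(keywords))
    uid = hashlib.sha256(key.encode("utf-8")).hexdigest()
    store[uid] = {"media": media_url, "keywords": sorted(keywords)}
    return uid

catalog = {}  # stands in for the local or remote data storage
store_with_identifier(catalog, "http://example.com/eiffel.jpg", ["Paris", "Eiffel Tower"])
```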
  • In some example embodiments, the predetermined user settings define a number of displayable pieces of multimedia per a portion of the text document, a number of the displayable pieces of multimedia per a slideshow, types of the displayable pieces of multimedia, a size of the user device screen allocated for displaying the pieces of multimedia, a number of slideshows, and a grid style used for arrangement of multiple pieces of multimedia on the user device screen using a dynamic layout program.
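  • A compact way to model such settings (field names and defaults are illustrative, not from the disclosure):

```python
from dataclasses import dataclass

@dataclass
class DisplaySettings:
    media_per_page: int = 4             # displayable pieces per text portion
    media_per_slideshow: int = 10
    slideshow_count: int = 1
    allowed_types: tuple = ("image", "video")
    media_screen_fraction: float = 0.4  # share of the screen given to media
    grid_style: str = "mosaic"          # consumed by the dynamic layout program
```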
  • In some example embodiments, the method may further comprise enabling a user device to capture an image for further displaying, wherein the captured image at least in part relates to the portion of the text document. In some example embodiments, pieces of multimedia are displayed dynamically depending on a currently displayable portion of the text document.
  • In some example embodiments, a web-based data storage with publicly available multimedia content may be used by the technology described herein to source multimedia content. Images, videos, and other multimedia content may be used. Such multimedia content or electronic links thereto may be added by the users manually to their profiles. The latter may enable sourcing multimedia content using the proposed method with greater relevance.
  • In some example embodiments, multimedia content may be added to e-books and other displayable text documents using the method proposed herein during the design stage, i.e. the related multimedia content may be embedded prior to their publication. The user may be able to select to display or hide the embedded multimedia content.
  • In some example embodiments, the displayed relevant multimedia content may be clicked to display the corresponding caption or some related information.
  • In further exemplary embodiments, modules, subsystems, or devices may be adapted to perform the recited steps. Other features and exemplary embodiments are described below.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • Embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements and in which:
  • FIG. 1 shows a block diagram illustrating a system environment suitable for automated augmentation of text, web and physical environments using multimedia content, in accordance with certain embodiments.
  • FIG. 2 is a diagram of a system for automated augmentation of text, web and physical environments using multimedia content, in accordance with certain embodiments.
  • FIG. 3 shows a user device with an embedded system for automated augmentation of text, web and physical environments using multimedia content, in accordance with certain embodiments.
  • FIG. 4 is a process flow diagram showing a method for automated augmentation of text, web and physical environments using multimedia content, in accordance with certain embodiments.
  • FIG. 5 is a process flow diagram showing a method for optical character recognition of a text from text, web, and physical environments, in accordance with certain embodiments.
  • FIG. 6 is a block diagram illustrating the application of a head-mountable display device for optical character recognition of a text from a physical environment, in accordance with certain embodiments.
  • FIG. 7 is a block diagram illustrating the application of a tablet device for optical character recognition of a text from a text environment, in accordance with certain embodiments.
  • FIG. 8 is a graphical user interface of a user device displaying a text and the corresponding multimedia content within a browser, in accordance with certain embodiments.
  • FIG. 9 is an illustration of a graphical user interface of a user device, displaying a text and the corresponding multimedia content within a mobile application, in accordance with certain embodiments.
  • FIG. 10 is a diagrammatic representation of an example machine in the form of a computer system within which a set of instructions, for the machine to perform any one or more of the methodologies discussed herein, is executed.
  • DETAILED DESCRIPTION
  • The following detailed description includes references to the accompanying drawings, which form a part of the detailed description. The drawings show illustrations in accordance with example embodiments. These example embodiments, which are also referred to herein as “examples,” are described in enough detail to enable those skilled in the art to practice the present subject matter. The embodiments can be combined, other embodiments can be utilized, or structural, logical and electrical changes can be made without departing from the scope of what is claimed. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope is defined by the appended claims and their equivalents.
  • In this document, the terms “a” or “an” are used, as is common in patent documents, to include one or more than one. In this document, the term “or” is used to refer to a nonexclusive “or,” such that “A or B” includes “A but not B,” “B but not A,” and “A and B,” unless otherwise indicated. Furthermore, all publications, patents, and patent documents referred to in this document are incorporated by reference herein in their entirety, as though individually incorporated by reference. In the event of inconsistent usages between this document and those documents so incorporated by reference, the usage in the incorporated reference(s) should be considered supplementary to that of this document; for irreconcilable inconsistencies, the usage in this document controls.
  • In accordance with various embodiments and the corresponding disclosure thereof, computer-implemented methods and systems for automated augmentation of text, web and physical environments using multimedia content are provided. The disclosed methods and systems provide a unique way to improve the experience of reading text-only content, such as e-books or text-based websites, by adding visual supplemental content associated with the text.
  • The methods disclosed herein may be implemented in various types of electronic user devices including portable e-book readers, tablet PCs, laptops, mobile and smart phones, personal digital assistants (PDAs), computers, and any other electronic devices configured to display digital content and interact with remote servers via a network such as the Internet.
  • In some example embodiments, the user devices may merely embed a web browser allowing users to browse web sites through the Internet and virtually interact with a remote system for automated augmentation of text, web and physical environments using multimedia content. In this case, when a user opens, for example, a text-only web page, the system for automated augmentation of text, web and physical environments may automatically enable a user device to display a text of a website and corresponding images or video content concurrently on the same screen.
  • In some embodiments, the user device may embed software allowing adding multimedia content to text, web and physical environments without the necessity of interacting with any remote systems. In this case, the user device may include an internal data storage providing multimedia content when users read text documents. In some additional and alternative embodiments, there may be provided hybrid systems such that the user device may embed software, which may allow adding multimedia content to text, web and physical environments with or without interaction with remote systems.
  • According to various example embodiments, once the user device is requested to display a text only document of any kind or a document or web page having a dominant text part, either software installed in the user device or in a remote server causes the user device to display both the text part and multimedia content simultaneously. The multimedia content may relate to a still image, an icon, an animated image, a video and an audio, or any combination thereof. In some example embodiments, the multimedia content comprises multiple images or video (audio-video), which may be displayed sequentially, i.e. as a slideshow, or simultaneously, as a mosaic, a grid-based layout, or in any combination thereof.
  • In some example embodiments, a transparent image, such as a 24-bit transparent Alpha Portable Network Graphics (PNG) image, or a shadow may be used to overlay the pictures to simulate an illusion of depth. This approach may be applied to all images to make them appear in theme and aesthetically pleasing.
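  • A minimal sketch of this compositing step using the Pillow imaging library; the overlay file name is a placeholder:

```python
from PIL import Image

def apply_depth_overlay(picture_path, overlay_path="shadow_overlay.png"):
    """Composite a 24-bit transparent PNG (e.g. an inner shadow) over a picture."""
    picture = Image.open(picture_path).convert("RGBA")
    overlay = Image.open(overlay_path).convert("RGBA").resize(picture.size)
    return Image.alpha_composite(picture, overlay)
```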
  • In some example embodiments, a video may be cropped and its size adjusted to fit within a designed grid-based layout of a displayable page. For example, a horizontal landscape video may be cropped so that it fits within a vertical page, or within a square on the page. A video may be automatically played when a user lands on a page, or automatically stopped when a user moves on to a different page, while retaining the time position of the video, so that when the user comes back to that page, the video continues playing where it left off. The video may also contain volume controls overlaid graphically so that the user may adjust the volume of a video clip. The video may be played back in a looped fashion so that it keeps repeating indefinitely without the need for replay. Formats used for videos and motion may include animated GIFs, compressed video, vector animation, or any other video formats.
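  • As a sketch of the cropping arithmetic only (the helper name, cell sizes, and resume store are illustrative), the source rectangle for a crop-to-fill fit can be computed from the video and grid-cell dimensions:

      def center_crop_box(src_w, src_h, cell_w, cell_h):
          # Largest centered source rectangle with the cell's aspect
          # ratio, so the scaled video fills the cell without distortion
          # (e.g., a 1920x1080 landscape video inside a 400x400 square).
          scale = max(cell_w / src_w, cell_h / src_h)
          crop_w, crop_h = int(cell_w / scale), int(cell_h / scale)
          left, top = (src_w - crop_w) // 2, (src_h - crop_h) // 2
          return (left, top, left + crop_w, top + crop_h)

      # Remembering playback positions per page so a video resumes
      # where it left off when the user returns:
      resume_positions = {}  # (page_id, video_id) -> seconds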
  • Multimedia content may be retrieved either from a user device memory (e.g., a local data store or a database) or from a remote server. A process for retrieving multimedia content may involve an analysis of the text currently displayable on the user device. For example, a currently displayable page of an e-book may be analyzed. As a result of the text analysis, a number of terms may be generated. Such terms may relate to words (e.g., nouns), phrases, or sentences. Further, the terms may be used to generate keywords. To this end, it may be determined which terms appear more frequently in the text, and those terms are used to construct keywords. Those skilled in the art will appreciate that various methods for generating keywords may be utilized.
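  • As a minimal sketch of one such method (frequency counting over a toy stopword list; both are illustrative simplifications, not the disclosed algorithm):

      import re
      from collections import Counter

      STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in",
                   "is", "it", "that", "for", "on", "with", "as"}

      def generate_keywords(text, top_n=5):
          # Tokenize, drop stopwords, and keep the most frequent
          # remaining terms as keyword candidates.
          terms = re.findall(r"[a-z][a-z'-]+", text.lower())
          counts = Counter(t for t in terms if t not in STOPWORDS)
          return [term for term, _ in counts.most_common(top_n)]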
  • Once the keywords for a portion of the displayable text content are generated, search queries may be formulated. Such search queries may then be used in a local database or a remote search engine to find and retrieve multimedia content associated with the keywords. When the multimedia content is retrieved from remote servers, it may be stored locally on the user device to enable faster access on subsequent requests. Once multimedia content is retrieved, it may be displayed on the user device screen along with the corresponding text portion. For example, software used for visualization of text portions may include widgets to embed multimedia content. In some other examples, a part of the user device screen may be virtually allocated for displaying the multimedia content. It should be understood that other ways of displaying text content and multimedia content exist.
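  • Assuming a generic JSON image-search endpoint (the URL and response shape below are placeholders, not a real API), the query formulation and retrieval step might look like:

      import json
      import urllib.parse
      import urllib.request

      def search_images(keywords,
                        endpoint="https://example.com/imagesearch"):
          # Join the keywords into a single query string and fetch
          # candidate image records from the (hypothetical) endpoint.
          query = urllib.parse.urlencode({"q": " ".join(keywords)})
          with urllib.request.urlopen(endpoint + "?" + query) as resp:
              return json.loads(resp.read())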
  • In some example embodiments, keyword extraction and the accompanying multimedia search may yield inaccurate results. Therefore, the user may have an option to delete an image or video that appears irrelevant. When the user deletes an image, the next best-fitting search result may replace the deleted image. The user may repeat the deletion process until an appropriate image is selected. The selected image may be saved so that, when a similar query is performed, it yields more accurate results. Such intelligent learning may be employed to provide better accuracy for image results.
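  • One way such learning could be structured (a sketch under the assumption of an in-memory store, not the disclosed implementation):

      class ResultRefiner:
          """Ranked results per query: a deletion promotes the next
          candidate, and a confirmed pick is remembered for reuse."""

          def __init__(self):
              self.preferred = {}  # query string -> confirmed image URL

          def delete(self, ranked_results, deleted_url):
              # Drop the irrelevant image; the next best result fills in.
              ranked_results.remove(deleted_url)
              return ranked_results[0] if ranked_results else None

          def confirm(self, query, url):
              self.preferred[query] = url

          def best(self, query, ranked_results):
              # A previously confirmed image wins over fresh results.
              return self.preferred.get(query, ranked_results[0])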
  • In some example embodiments, the user may assign custom images from search results to particular keywords in a book or a text. These assignments may be stored in a central database, which may be shared with other users, so that the users have a better experience with their own books. User devices may be configured to check the central database for new keyword data or multimedia content each time a page or an eBook is loaded, which would help improve the user experience with the page.
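  • A central assignment store of this kind could be sketched with an ordinary relational table (the schema and helper names are hypothetical):

      import sqlite3

      def open_assignment_store(path="assignments.db"):
          # Shared keyword-to-image assignments that devices consult
          # when a page or eBook is loaded.
          db = sqlite3.connect(path)
          db.execute("""CREATE TABLE IF NOT EXISTS assignments
                        (keyword TEXT PRIMARY KEY, image_url TEXT)""")
          return db

      def assign(db, keyword, image_url):
          db.execute("INSERT OR REPLACE INTO assignments VALUES (?, ?)",
                     (keyword, image_url))
          db.commit()

      def lookup(db, keyword):
          row = db.execute("SELECT image_url FROM assignments "
                           "WHERE keyword = ?", (keyword,)).fetchone()
          return row[0] if row else None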
  • In some example embodiments, if there are not enough images to fill up the entire grid-based layout on the displayable page, the unused grid spaces may be filled with placeholder images, which may include solid colors or randomly selected placeholder images from a database of placeholder images. This provides an aesthetically pleasing layout even when there are not enough images to populate the entire page.
  • In some example embodiments, in order to lay out multiple images on a displayable page automatically, a different grid-based layout may be randomly generated for each displayable page. This may be done in two ways. In the first, each page includes image containers whose sizes are randomly generated, forming a mosaic of squares containing images and solid colors. In the second, preprogrammed templates providing a layout for an image mosaic are stored on the user device or on a remote server.
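  • The first approach, together with the placeholder fill described above, might be sketched as follows (grid dimensions and the 2x2 probability are illustrative):

      import random

      def random_mosaic(columns=4, rows=4, seed=None):
          # Tile a columns x rows grid with 1x1 and 2x2 containers at
          # random; cells left over once images run out can be filled
          # with solid-color placeholders.
          rng = random.Random(seed)
          taken = [[False] * columns for _ in range(rows)]
          cells = []
          for r in range(rows):
              for c in range(columns):
                  if taken[r][c]:
                      continue
                  fits = (r + 1 < rows and c + 1 < columns
                          and not taken[r][c + 1]
                          and not taken[r + 1][c]
                          and not taken[r + 1][c + 1])
                  if fits and rng.random() < 0.5:
                      for dr in (0, 1):
                          for dc in (0, 1):
                              taken[r + dr][c + dc] = True
                      cells.append((r, c, 2, 2))  # row, col, h, w
                  else:
                      taken[r][c] = True
                      cells.append((r, c, 1, 1))
          return cells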
  • In yet more embodiments, the user device may display a text document in the form of an image. For example, the user device may have a digital camera, which may be used to capture printed text documents. Once such an image containing text is displayed on the user device screen, the image may be subjected to an optical character recognition process to derive the terms used therein. These terms may then be used to generate keywords and search queries.
  • In some example embodiments, multimedia content may be added to the first occurrence, or all occurrences of a term in the displayed portion of the text, whichever option may be selected by the user. If the user wishes the multimedia content to be added to all occurrences of the term in the displayed portion of the text, he or she may specify that a different image or video is added to each occurrence of the term.
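  • Locating the spans to decorate could be as simple as the following sketch (the helper name is illustrative):

      import re

      def term_occurrences(text, term, first_only=True):
          # Character spans where the term occurs, so multimedia can be
          # attached at the first occurrence or at every occurrence.
          spans = [m.span() for m in re.finditer(re.escape(term), text,
                                                 flags=re.IGNORECASE)]
          return spans[:1] if first_only else spans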
  • In some example embodiments, the methods described herein may be utilized within a tablet device environment. A tablet device may embed both e-books and web pages and may also be used as a reader for blogs, articles, and other online and offline documents.
  • In some example embodiments, in addition to displaying relevant multimedia content, hyperlinks to relevant resources on the Internet may be provided within the text. Images, videos, and keywords may be linked to their sources. Links may be presented by placing a button over the image, which may reveal the link or open a URL associated with the link.
  • In some example embodiments, prepackaged eBooks may come bundled with accompanying multimedia content and layouts so that no analysis or multimedia content search is required when viewing the eBook. Such a prepackaged eBook may be downloaded from a remote server or transferred via a disk drive or the Internet. The prepackaged eBook may include image files, video files, template files, graphic files, text files, eBook files, and any other files needed to view the eBook in an offline mode (e.g., with no connection to a WAN).
  • It should be understood that methods to conduct searches for relevant multimedia content are not limited to the methods described herein, which are provided merely as examples. Other example search methods may be used.
  • In some example embodiments, if a single sentence within a portion of the displayed text contains multiple occurrences of contextual data, such as, for example, a location, a date, and a name, the entire sentence may be used for performing a semantic search for relevant multimedia content.
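  • A crude heuristic for spotting such sentences (the regular expressions below are illustrative stand-ins for a real named-entity recognizer) might be:

      import re

      PATTERNS = {
          "name": r"\b[A-Z][a-z]+ [A-Z][a-z]+\b",
          "year": r"\b(1[5-9]|20)\d{2}\b",
          "month": r"\b(January|February|March|April|May|June|July"
                   r"|August|September|October|November|December)\b",
      }

      def contextual_sentences(text):
          # Yield sentences containing at least two kinds of contextual
          # data; each is used whole as a semantic search query.
          for sentence in re.split(r"(?<=[.!?])\s+", text):
              kinds = [k for k, p in PATTERNS.items()
                       if re.search(p, sentence)]
              if len(kinds) >= 2:
                  yield sentence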
  • In some example embodiments, multimedia content may be added to user-generated presentations and essays. Images and videos may be added instantly using a predetermined template or layout as the user types a word that may be used for generating a keyword in accordance with the embodiments described herein. Keywords may also be predefined by the user, stored in the keywords database, and used to conduct searches for multimedia content when reproduced by the user in the text of a presentation or essay. This approach may be facilitated by software integrated within a word processor, e.g. Microsoft Word, as a plug-in or used as a standalone application.
  • In some embodiments, portions of the text to which multimedia content has been added or individual images or videos may be shared via a social network. For example, this may be done by highlighting an image or quote to be shared and pressing a “share” button.
  • Accordingly, unique computer-implemented methods for adding multimedia content to displayable text, web and physical environments are disclosed. The operations of such methods may be implemented by software modules integrated with a user device, a remote server, or a combination thereof. The present techniques provide a useful means for improving the overall experience of reading books or text-only documents.
  • Referring now to the drawings, FIG. 1 shows a block diagram illustrating a system environment 100 suitable for adding multimedia content to displayable text, web and physical environments. The system environment 100 may comprise a user device 102 with a browser 112, which may provide the ability to browse the Internet and interact with various websites, e.g. web search engine 106. A user device may include a computer or a laptop. In some embodiments, the user device 102 may be a mobile device that includes a mobile application 114. A mobile device may include a tablet computer, a handheld cellular phone, a mobile phone, a smart phone, a PDA, a handheld device having wireless connection capability, or any other electronic device. The system environment 100 may further include a system for adding multimedia content to displayable text, web and physical environments 104, a search engine 106, including a web search engine, such as Google or Bing, a third-party image search engine, or a local search engine embedded within the user device, a web-based data storage 108 that allows storing and retrieving the digital media content, and a network (e.g. the Internet) 110.
  • The network 110 couples the aforementioned modules and is a network of data processing nodes interconnected for the purpose of data communication, utilized to communicatively couple the various components of the system environment 100. The network 110 may include the Internet or any other network capable of communicating data between devices.
  • The network 110 may include links using technologies such as Ethernet, 802.11, worldwide interoperability for microwave access (WiMAX), 2G/3G/4G mobile communications protocols, digital subscriber line (DSL), asynchronous transfer mode (ATM), InfiniBand, PCI Express Advanced Switching, etc. Similarly, the networking protocols used on the network 110 may include multiprotocol label switching (MPLS), the transmission control protocol/Internet protocol (TCP/IP), the User Datagram Protocol (UDP), the hypertext transport protocol (HTTP), the simple mail transfer protocol (SMTP), the file transfer protocol (FTP), etc. The data exchanged over the network 110 may be represented using technologies or formats including image data in binary form (e.g., Portable Network Graphics (PNG)), the hypertext markup language (HTML), the extensible markup language (XML), etc. In addition, all or some links may be encrypted using conventional encryption technologies such as secure sockets layer (SSL), transport layer security (TLS), virtual private networks (VPNs), Internet Protocol security (IPsec), etc. In some embodiments, the entities on the network 110 may use custom or dedicated data communications technologies instead of, or in addition to, the ones described above.
  • The user device 102 may include a computer, a laptop, a tablet computer, a portable computing device, a PDA, a handheld cellular phone, a mobile phone, a smart phone, a handheld device having a wireless connection capability, or any other electronic device. In various embodiments, the user device 102 may comprise a browser that provides the ability to browse the Internet. The user device 102 may be used to communicate with the search engine 106. Accordingly, the user device 102 may receive or transmit data, such as search queries, via a wired or a wireless network.
  • The system for adding multimedia content to displayable text, web and physical environments 104, according to exemplary embodiments disclosed herein, may be used for adding multimedia content to the displayable text, web and physical environments by interacting with the search engine 106 and the web-based data storage 108, which may provide multimedia content as users read text documents. More specifically, the system for adding multimedia content to text, web and physical environments 104 may analyze the portion of the text displayed using the user device 102, generate keywords based on the more frequently occurring terms (e.g., nouns), phrases, or sentences, generate search queries based on these keywords, and send them to the search engine 106.
  • In some embodiments, the search engine 106 may conduct a search for multimedia content and store the multimedia content in the web-based data storage 108. In some embodiments, the user device 102 and the system for adding multimedia content to displayable text, web and physical environments 104 may interact with the web-based data storage 108 to retrieve the multimedia content, which may then be displayed on the user device screen by the system for adding multimedia content to displayable text, web physical environments 104 as embedded in the portion of the text displayed, or in a separate portion or portions of the user device screen (not shown).
  • In some example embodiments, the system environment 100 may comprise an intermediary server (not shown). The intermediary server may be configured to manage and parse all search queries from the user device and manage image resizing, image fetching, image searches, keyword extraction, account management, account details storage, template storage, template generation, uploading of templates and user generated data, and so forth. The intermediary server may be also configured to provide updates to eBook templates on the user device.
  • FIG. 2 is a block diagram of the system for adding multimedia content to text, web and physical environments 104. In the shown embodiment, the system for adding multimedia content to text, web and physical environments 104 may comprise a communication module 210, a determining module 220, an analyzing module 230, a search query generator 240, a retrieving module 250, a displaying module 260, an optical character recognition module 270, a terms database 280, and a keywords database 290.
  • The communication module 210 may be configured to provide user interactions with the system for adding multimedia content to displayable text, web and physical environments 104, as well as to provide interaction between the different modules of this system. The determining module 220 may be configured to determine which of the terms may be used as keywords, to generate such keywords and to store the keywords in the keywords database 290.
  • The analyzing module 230 may be configured to analyze the portion of the text displayed in order to identify terms (e.g., nouns), phrases, and sentences, thereby enabling their further use as keywords. The analyzing module 230 may be configured to analyze dates mentioned in the portion of the text displayed, or in the entire article/book, to establish a time context for the images being sought. The search query generator 240 may be configured to convert into search queries the keywords formulated based on the terms selected by the analyzing and determining modules 230, 220 and stored in the keywords database 290.
  • The retrieving module 250 may be configured, in some embodiments, to retrieve multimedia content from the web-based data storage 108, while in some other embodiments, the retrieving module 250 may be configured to retrieve multimedia content from the multimedia content database 340 shown in FIG. 3.
  • The displaying module 260 may be configured to display multimedia content retrieved by the retrieving module 250. The optical character recognition module 270 may be configured to scan a portion of a printed text document into an image containing a text to be processed by the analyzing module 230 and the determining module 220.
  • The terms database 280 may be configured to store the terms that were identified in the displayed portion of the text by the analyzing module 230 and suitable to be used as keywords by the determining module 220. The keywords database 290 may be configured to store keywords, formulated based on the terms that were selected by the analyzing module 230 and the determining module 220.
  • FIG. 3 is a block diagram of the user device 102 with the system for adding multimedia content to displayable text, web and physical environments. In some embodiments, specifically when a printed text document is used, the optical character recognition module 314 may be used to scan the printed text document into an image containing text, and the image with the text is further subjected to optical character recognition to generate one or more keywords. To this end, the user device 102 may also comprise a digital camera 316 to capture the displayed portion of the text.
  • In some example embodiments, the digital camera may be attached to a head-mountable display device, such as a heads-up display. For example, a person wearing the head-mountable display device with an in-built camera may view the surroundings and by means of the camera and computer device may scan the environment such as advertisements, signs, billboards, magazines, and newspapers for text. The scanned material may be further processed by the computer device to generate keywords and provide relevant real-time images and other multimedia on the heads-up display.
  • Interaction with the system for adding multimedia content to displayable text, web and physical environments 104 may be performed using the communication module 302. The analyzing module 306 may analyze the portion of the text displayed to identify those terms (e.g., nouns), phrases, or sentences that occur in this portion of the text more frequently. The determining module 304 may then determine which of the found terms may be used as keywords and generate such keywords. The identified terms may be stored in the terms database 320, and the keywords generated based on the identified terms may be stored in the keywords database 330.
  • The search query generator 308 may convert the stored keywords into search queries, which may be sent to the search engine 106. In some embodiments, these search queries may be used to search for multimedia content in the multimedia content database 340.
  • In some embodiments, the retrieving module 310 may interact with the web-based data storage 108 to retrieve the stored multimedia content. In other embodiments, the retrieving module 310 may retrieve multimedia content from the multimedia content database 340. The retrieved multimedia content is then visualized by the displaying module 312.
  • FIG. 4 is a process flow diagram showing a method 400 for adding multimedia content to displayable text, web and physical environments, according to an exemplary embodiment. The method 400 may be performed by processing logic that may comprise hardware, software (such as software run on a general-purpose computer system or a hand-held device), or a combination of both.
  • The method 400 may be applied using the various modules discussed above with reference to FIG. 3. Each of these modules may include processing logic. It will be appreciated that the examples of the foregoing modules may be virtual, and the instructions said to be executed by a module may, in fact, be retrieved and executed by a processor. The foregoing modules may also include memory cards, servers, and/or computer discs. Although the various modules may be configured to perform some or all of the various steps described herein, fewer or more modules may be provided and still fall within the scope of the exemplary embodiments.
  • As shown in FIG. 4, the method 400 may commence at operation 402 with determining that a user device displays a portion of a text document of a predetermined format. In some embodiments, if the text is a printed text document and a digital camera is used to capture it, the text may be subjected to an optical character recognition process that may commence at operation 404.
  • At operation 406 the portion of the text displayed may be analyzed by the analyzing module 230 to identify the terms (e.g. nouns), phrases, and sentences. The determining module 220 may then determine which of the found terms may be used for generating keywords, and may then generate keywords based on these terms at operation 408. At operation 410, the search query generator 240 may be used to generate at least one search query, which may then be sent to the search engine 106.
  • In some example embodiments, the displayed text content may be transmitted to a server for processing. The processing of the text to generate keywords and the sourcing of images may take place on a remote web server (in the cloud), and the server may transmit back to the user device a combination of both text and images, either packaged together or separately, so that the user device can display the text and accompanying images without having to conduct the search queries or the keyword generation locally. Thus, the determining module 220 may rely on cloud-based keyword identification and extraction as well as cloud-based image retrieval.
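  • Such a cloud round trip might be sketched as a small server endpoint (Flask is used purely for illustration; the route and payload shape are assumptions, and generate_keywords and search_images refer to the earlier sketches):

      from flask import Flask, jsonify, request

      app = Flask(__name__)

      @app.route("/augment", methods=["POST"])
      def augment():
          # Receive the displayed text, extract keywords and source
          # images server-side, and return text plus images packaged
          # together so the device does no local processing.
          text = request.get_json()["text"]
          keywords = generate_keywords(text)   # sketch above
          images = search_images(keywords)     # sketch above
          return jsonify({"text": text, "keywords": keywords,
                          "images": images})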
  • At operation 412, the retrieving module may retrieve the multimedia content associated with the one or more keywords from the search engine 106 and, at operation 414, store the retrieved multimedia content in a local or remote data storage. At operation 416, the displaying module displays the stored multimedia content concurrently with the corresponding portion of the text on the user device screen.
  • FIG. 5 is a process flow diagram showing a method 500 for optical character recognition of a text from text, web and physical environments, according to an example embodiment. As shown in FIG. 5, the method 500 may commence in operation 502 with scanning text associated with text, web and physical environments. In one example embodiment, text, web, and physical environments may be scanned using a head-mountable display device with a built-in digital camera or a tablet device. In operation 504, the scanned text may be parsed. During the parsing, the scanned text may be analyzed to separate the text from graphics and to detect the presence of columns and headlines. After the text is parsed, the shapes of individual characters are recognized via a character recognition process in operation 506. Character recognition may be performed on any number of character fonts. In operation 508, context analysis may be performed to divide the text into words. Finally, the recognized characters may be formatted for output in operation 510. After the optical character recognition is performed, the text may be electronically searched for keywords to retrieve, from data storages, pieces of multimedia associated with the one or more keywords.
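  • One possible front end for this flow (assuming the open-source Tesseract engine via the pytesseract wrapper, which performs the parsing, character recognition, and context analysis of operations 504-510 internally):

      from PIL import Image
      import pytesseract  # requires a local Tesseract installation

      def ocr_keywords(image_path, top_n=5):
          # Recognize text in a captured image, then reuse the keyword
          # generator sketched earlier to drive multimedia retrieval.
          text = pytesseract.image_to_string(Image.open(image_path))
          return generate_keywords(text, top_n=top_n)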
  • FIG. 6 is a block diagram illustrating application of a head-mountable display device for optical character recognition of a text from a physical environment, in accordance with certain embodiments. As shown in FIG. 6, a user wearing a head-mountable display device 602, with a built-in digital camera, may view a physical environment 610. The digital camera may scan the surroundings, such as advertisements, signs, billboards, magazines, newspapers and so forth, for text content. The text content may then be processed using optical character recognition module 270 to retrieve keywords and display relevant pieces of multimedia content, such as images 604, 606, and 608, on a display of the head-mountable display device 602.
  • FIG. 7 is a block diagram illustrating the application of a tablet device for optical character recognition of a text from a text environment, in accordance with certain embodiments. As shown in FIG. 7, a tablet device 702 may be used to scan text environment 704. The text environment 704 may include any printed documents, newspapers, magazines and so forth. The text scanned from the text environment may then be processed using optical character recognition module 270 to retrieve keywords and display relevant pieces of multimedia content on the screen of the tablet device 702.
  • FIG. 8 is a block diagram showing a user device 800 having a browser within which a text 802 and the corresponding images 804 are displayed in a grid-like fashion. The displayed portion of the text 802 is located within the left-hand section of the user device screen. To the right of the displayed portion of the text 802 four images are displayed. Two larger rectangular images 804, 804 c are positioned, respectively, at the top and the bottom of the right-hand section of the user-device screen, while two smaller-sized square images 804 a, 804 b are positioned next to each other in the middle of the right-hand section of the user device screen.
  • FIG. 9 is a block diagram showing a graphical user interface of a user device 900, displaying a text 902 and the corresponding multimedia content ( images 904, 904 a and videos 906, 906 a) adjacent to the text in a grid-like fashion. The displayed portion of the text 902 is located in the left-hand section of the user device screen. Adjacent to the displayed portion of the text 902 are two videos and two images. The videos 906, 906 a are represented by larger-sized rectangular images and positioned, respectively, on top and at the bottom of the right-hand section of the user-device screen, while the images 904, 904 a, which are smaller in size and square-shaped, are positioned next to each other in between the videos 906, 906 a, in the middle of the right-hand section of the user device screen.
  • FIG. 10 shows a diagrammatic representation of a computing device for a machine in the example electronic form of a computer system 1000, within which a set of instructions for causing the machine to perform any one or more of the methodologies discussed herein can be executed. In example embodiments, the machine operates as a standalone device or can be connected (e.g., networked) to other machines. In a networked deployment, the machine can operate in the capacity of a server or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. The machine can be a personal computer (PC), a tablet PC, a set-top box (STB), a PDA, a cellular telephone, a portable music player (e.g., a portable hard drive audio device, such as a Moving Picture Experts Group Audio Layer 3 (MP3) player), a web appliance, a network router, a switch, a bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term "machine" shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.
  • The example computer system 1000 includes a processor or multiple processors 1002 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), or both), and a main memory 1004 and a static memory 1006, which communicate with each other via a bus 1008. The computer system 1000 may further include a video display unit 1010 (e.g., a liquid crystal display (LCD) or cathode ray tube (CRT)). The computer system 1000 may also include at least one input device 1012, such as an alphanumeric input device (e.g., a keyboard), a cursor control device (e.g., a mouse), a microphone, a digital camera, a video camera, and so forth. The computer system 1000 may also include a disk drive unit 1014, a signal generation device 1016 (e.g., a speaker), and a network interface device 1018.
  • The disk drive unit 1014 may include a computer-readable medium 1020 which may store one or more sets of instructions and data structures (e.g., instructions 1022) embodying or utilized by any one or more of the methodologies or functions described herein. The instructions 1022 may also reside, completely or at least partially, within the main memory 1004 and/or within the processors 1002 during execution thereof by the computer system 1000. The main memory 1004 and the processors 1002 may also constitute machine-readable media.
  • The instructions 1022 may further be transmitted or received over the network 110 via the network interface device 1018 utilizing any one of a number of well-known transfer protocols (e.g., Hyper Text Transfer Protocol (HTTP), CAN, Serial, and Modbus).
  • While the computer-readable medium 1020 is shown in an example embodiment to be a single medium, the term “computer-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term “computer-readable medium” shall also be taken to include any medium that is capable of storing, encoding, or carrying a set of instructions for execution by the machine and that causes the machine to perform any one or more of the methodologies of the present application, or that is capable of storing, encoding, or carrying data structures utilized by or associated with such a set of instructions. The term “computer-readable medium” shall accordingly be taken to include, but not be limited to, solid-state memories, optical and magnetic media. Such media may also include, without limitation, hard disks, floppy disks, flash memory cards, digital video disks, random access memory (RAM), read only memory (ROM), and the like.
  • The example embodiments described herein may be implemented in an operating environment comprising computer-executable instructions (e.g., software) installed on a computer, in hardware, or in a combination of software and hardware. The computer-executable instructions may be written in a computer programming language or can be embodied in firmware logic. If written in a programming language conforming to a recognized standard, such instructions may be executed on a variety of hardware platforms and for interfaces to a variety of operating systems. Although not limited thereto, computer software programs for implementing the present method may be written in any number of suitable programming languages such as, for example, Hypertext Markup Language (HTML), Dynamic HTML, Extensible Markup Language (XML), Extensible Stylesheet Language (XSL), Document Style Semantics and Specification Language (DSSSL), Cascading Style Sheets (CSS), Synchronized Multimedia Integration Language (SMIL), Wireless Markup Language (WML), Java™, Jini™, C, C++, Perl, UNIX Shell, Visual Basic or Visual Basic Script, Virtual Reality Markup Language (VRML), ColdFusion™ or other compilers, assemblers, interpreters or other computer languages or platforms.
  • Thus, methods and systems for automated augmentation of text, web and physical environments using multimedia content have been described. The disclosed technique provides a useful tool for improving the experience of reading e-books, web pages and other predominantly textual content by automatically supplementing the text with relevant multimedia content.
  • Although embodiments have been described with reference to specific example embodiments, it will be evident that various modifications and changes can be made to these example embodiments without departing from the broader spirit and scope of the present application. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.

Claims (20)

What is claimed is:
1. A computer-implemented method for automated augmentation of text, web and physical environments using multimedia content, the method comprising:
determining that a user device displays a portion of the text document in a predetermined format;
analyzing the portion of the text document to generate one or more keywords;
generating, based on the one or more keywords, at least one search query for a multimedia content search via a search engine;
retrieving, from one or more data storages, one or more pieces of multimedia associated with the one or more keywords; and
enabling the user device to display the one or more pieces of multimedia concurrently with the portion of the text document, wherein the one or more pieces of multimedia are displayed on the user device according to predetermined user settings.
2. The method of claim 1, wherein the analyzing of the portion of the text document to generate the one or more keywords includes:
parsing the portion of the text document;
identifying terms that start with an uppercase letter;
identifying terms related to contextual data; and
identifying sentences that contain multiple contextual data to be used as keyword combinations.
3. The method of claim 1, wherein the analysis of the portion of the text document to generate the one or more keywords is carried out on a remote web server.
4. The method of claim 1, wherein the displayable text document is a user-generated presentation or essay.
5. The method of claim 1, wherein the one or more keywords are predetermined by the user and stored in the keywords database.
6. The method of claim 1, further comprising adding, to the portion of the text document, hyperlinks to web resources related to the one or more keywords.
7. The method of claim 1, further comprising applying an optical character recognition to process an image containing text to generate the one or more keywords.
8. The method of claim 1, wherein the data storage includes one or more of the following: a remote database, a web site, a local database, and a cache of the user device.
9. The method of claim 1, wherein the search engine includes one or more of the following: a third party image search engine, a web search engine, and a local search engine embedded within the user device.
10. The method of claim 1, further comprising storing the pieces of multimedia associated with the one or more keywords to a local or remote data storage.
11. The method of claim 1, further comprising generating one or more unique identifiers associated with a piece of multimedia and the one or more keywords associated with the piece of multimedia, wherein the one or more unique identifiers are stored in the local or remote data storage along with the piece of multimedia and the one or more keywords.
12. The method of claim 1, wherein the predetermined user settings include one or more of the following: a first number of displayable pieces of multimedia per a portion of the text document, a second number of the displayable pieces of multimedia per a slideshow, types of the displayable pieces of multimedia, a size of the user device screen allocated for displaying the pieces of multimedia, a number of slideshows, and a grid style used for arrangement of multiple pieces of multimedia on the user device screen.
13. The method of claim 1, wherein the pieces of multimedia are displayed dynamically depending on a currently displayable portion of the text document.
14. The method of claim 1, further comprising sharing the text, to which multimedia content has been added, or individual pieces of multimedia via a social network.
15. A system for automated augmentation of text, web and physical environments using multimedia content, the system comprising:
a determining module configured to determine whether a user device displays a portion of the text document in a predetermined format;
an analyzing module configured to analyze the portion of the text document to generate one or more keywords;
a search query generator configured to generate, based on the one or more keywords, at least one search query for a multimedia content search via a search engine;
a retrieving module configured to retrieve, from one or more data storages, one or more pieces of multimedia associated with the one or more keywords; and
a displaying module configured to enable the user device to display the one or more pieces of multimedia concurrently with the portion of the text document, wherein the one or more pieces of multimedia are displayed on the user device according to predetermined user settings.
16. The system of claim 15, wherein the analyzing module is further configured to:
parse the portion of the text document;
identify terms that start with an uppercase letter;
identify terms that constitute contextual data; and
identify sentences that contain multiple contextual data to be used as keyword combinations.
17. The system of claim 15, wherein the displaying module is further configured to add to the portion of the text document hyperlinks to web resources related to the one or more keywords.
18. The system of claim 15, further comprising an optical character recognition module configured to process an image containing text to generate the one or more keywords.
19. The system of claim 15, wherein the one or more pieces of multimedia are displayed dynamically depending on a currently displayable portion of the text document.
20. A computer-readable medium having instructions stored thereon, which when executed by one or more computers, causes the one or more computers to:
determine that a user device displays a portion of the text document in a predetermined format;
analyze the portion of the text document to generate one or more keywords;
generate, based on the one or more keywords, at least one search query for a multimedia content search via a search engine;
retrieve, from one or more data storages, one or more pieces of multimedia associated with the one or more keywords; and
enable the user device to display the one or more pieces of multimedia concurrently with the portion of the text document, wherein the one or more pieces of multimedia are displayed on the user device according to predetermined user settings.
US13/693,246 2011-12-04 2012-12-04 Automated augmentation of text, web and physical environments using multimedia content Abandoned US20130145241A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US13/693,246 US20130145241A1 (en) 2011-12-04 2012-12-04 Automated augmentation of text, web and physical environments using multimedia content
US16/551,508 US11256848B2 (en) 2011-12-04 2019-08-26 Automated augmentation of text, web and physical environments using multimedia content
US17/651,783 US20220171915A1 (en) 2011-12-04 2022-02-18 Automated augmentation of text, web and physical environments using multimedia content

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201161566659P 2011-12-04 2011-12-04
US13/693,246 US20130145241A1 (en) 2011-12-04 2012-12-04 Automated augmentation of text, web and physical environments using multimedia content

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/551,508 Continuation US11256848B2 (en) 2011-12-04 2019-08-26 Automated augmentation of text, web and physical environments using multimedia content

Publications (1)

Publication Number Publication Date
US20130145241A1 true US20130145241A1 (en) 2013-06-06

Family

ID=48524907

Family Applications (3)

Application Number Title Priority Date Filing Date
US13/693,246 Abandoned US20130145241A1 (en) 2011-12-04 2012-12-04 Automated augmentation of text, web and physical environments using multimedia content
US16/551,508 Active US11256848B2 (en) 2011-12-04 2019-08-26 Automated augmentation of text, web and physical environments using multimedia content
US17/651,783 Abandoned US20220171915A1 (en) 2011-12-04 2022-02-18 Automated augmentation of text, web and physical environments using multimedia content

Family Applications After (2)

Application Number Title Priority Date Filing Date
US16/551,508 Active US11256848B2 (en) 2011-12-04 2019-08-26 Automated augmentation of text, web and physical environments using multimedia content
US17/651,783 Abandoned US20220171915A1 (en) 2011-12-04 2022-02-18 Automated augmentation of text, web and physical environments using multimedia content

Country Status (1)

Country Link
US (3) US20130145241A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11933986B2 (en) * 2022-03-11 2024-03-19 Bank Of America Corporation Apparatus and methods to extract data with smart glasses

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2002340033A1 (en) 2001-09-25 2003-04-07 Wildseed, Ltd. Wireless mobile image messaging
US8634696B2 (en) 2004-12-15 2014-01-21 Nikon Corporation Image reproduction system
US20090276500A1 (en) * 2005-09-21 2009-11-05 Amit Vishram Karmarkar Microblog search engine system and method
US8756528B2 (en) 2006-05-08 2014-06-17 Ascom (Sweden) Ab System and method of customizing video display layouts having dynamic icons
US8661035B2 (en) * 2006-12-29 2014-02-25 International Business Machines Corporation Content management system and method
US8406531B2 (en) 2008-05-15 2013-03-26 Yahoo! Inc. Data access based on content of image recorded by a mobile device
KR101023389B1 (en) * 2009-02-23 2011-03-18 삼성전자주식회사 Apparatus and method for improving performance of character recognition
US9418136B1 (en) * 2009-03-31 2016-08-16 Cellco Partnership Method and system for matching descriptive text for a multimedia content in a vendor's catalog with descriptive text for a multimedia content in media store's catalog
US20120179704A1 (en) * 2009-09-16 2012-07-12 Nanyang Technological University Textual query based multimedia retrieval system
US9323784B2 (en) * 2009-12-09 2016-04-26 Google Inc. Image search using text-based elements within the contents of images
US20120137237A1 (en) * 2010-08-13 2012-05-31 Sony Corporation System and method for digital image and video manipulation and transfer
US9275079B2 (en) 2011-06-02 2016-03-01 Google Inc. Method and apparatus for semantic association of images with augmentation data
AU2011204946C1 (en) * 2011-07-22 2012-07-26 Microsoft Technology Licensing, Llc Automatic text scrolling on a head-mounted display
US20130145241A1 (en) 2011-12-04 2013-06-06 Ahmed Salama Automated augmentation of text, web and physical environments using multimedia content

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6904560B1 (en) * 2000-03-23 2005-06-07 Adobe Systems Incorporated Identifying key images in a document in correspondence to document text
US20080082497A1 (en) * 2006-09-29 2008-04-03 Leblang Jonathan A Method and system for identifying and displaying images in response to search queries
US20080086453A1 (en) * 2006-10-05 2008-04-10 Fabian-Baber, Inc. Method and apparatus for correlating the results of a computer network text search with relevant multimedia files
US20090235150A1 (en) * 2008-03-17 2009-09-17 Digitalsmiths Corporation Systems and methods for dynamically creating hyperlinks associated with relevant multimedia content
US20110213655A1 (en) * 2009-01-24 2011-09-01 Kontera Technologies, Inc. Hybrid contextual advertising and related content analysis and display techniques
US20110179010A1 (en) * 2010-01-15 2011-07-21 Hulu Llc Method and apparatus for providing supplemental video content for third party websites
US20120163707A1 (en) * 2010-12-28 2012-06-28 Microsoft Corporation Matching text to images
US20150170333A1 (en) * 2011-08-31 2015-06-18 Google Inc. Grouping And Presenting Images

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yeh et al., "A Case for Query by Image and Text Content: Searching Computer Help using Screenshots and Keywords", March 28-April 1, 2011, International World Wide Web Conference Committee (IW3C2), ACM 978-1-4503-0632-4/11/03, 10 pages *

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11256848B2 (en) 2011-12-04 2022-02-22 Ahmed Salama Automated augmentation of text, web and physical environments using multimedia content
US20130174002A1 (en) * 2012-01-03 2013-07-04 International Business Machines Corporation Database Field Extraction for Contextual Collaboration
US9141715B2 (en) * 2012-01-03 2015-09-22 International Business Machines Corporation Automated hyperlinking in electronic communication
US10102187B2 (en) * 2012-05-15 2018-10-16 Google Llc Extensible framework for ereader tools, including named entity information
US20150248380A1 (en) * 2012-05-15 2015-09-03 Google Inc. Extensible framework for ereader tools, including named entity information
US20140047332A1 (en) * 2012-08-08 2014-02-13 Microsoft Corporation E-reader systems
US20140101542A1 (en) * 2012-10-09 2014-04-10 Microsoft Corporation Automated data visualization about selected text
US9940307B2 (en) * 2012-12-31 2018-04-10 Adobe Systems Incorporated Augmenting text with multimedia assets
US20140189501A1 (en) * 2012-12-31 2014-07-03 Adobe Systems Incorporated Augmenting Text With Multimedia Assets
US11822868B2 (en) * 2012-12-31 2023-11-21 Adobe Inc. Augmenting text with multimedia assets
US20150193061A1 (en) * 2013-01-29 2015-07-09 Google Inc. User's computing experience based on the user's computing activity
US20140297678A1 (en) * 2013-03-27 2014-10-02 Cherif Atia Algreatly Method for searching and sorting digital data
WO2015017525A1 (en) * 2013-07-30 2015-02-05 Haiku Deck, Inc. Automatically evaluating content to create multimedia presentation
US9792276B2 (en) * 2013-12-13 2017-10-17 International Business Machines Corporation Content availability for natural language processing tasks
US9830316B2 (en) 2013-12-13 2017-11-28 International Business Machines Corporation Content availability for natural language processing tasks
US20150169545A1 (en) * 2013-12-13 2015-06-18 International Business Machines Corporation Content Availability for Natural Language Processing Tasks
US10073819B2 (en) * 2014-05-30 2018-09-11 Hewlett-Packard Development Company, L.P. Media table for a digital document
US20160068002A1 (en) * 2014-09-05 2016-03-10 Todd Keller Hybrid print-electronic book
US20160070782A1 (en) * 2014-09-10 2016-03-10 Microsoft Corporation Associating content items with document sections
US10216833B2 (en) * 2014-09-10 2019-02-26 Microsoft Technology Licensing, Llc Associating content items with document sections
US9984486B2 (en) 2015-03-10 2018-05-29 Alibaba Group Holding Limited Method and apparatus for voice information augmentation and displaying, picture categorization and retrieving
US11127116B2 (en) * 2015-12-01 2021-09-21 Sony Corporation Surgery control apparatus, surgery control method, program, and surgery system
US20180268523A1 (en) * 2015-12-01 2018-09-20 Sony Corporation Surgery control apparatus, surgery control method, program, and surgery system
US11354509B2 (en) * 2017-06-05 2022-06-07 Deepmind Technologies Limited Action selection based on environment observations and textual instructions
US20220318516A1 (en) * 2017-06-05 2022-10-06 Deepmind Technologies Limited Action selection based on environment observations and textual instructions
EP3547160A1 (en) * 2018-03-27 2019-10-02 Nokia Technologies Oy Creation of rich content from textual content
WO2019185689A1 (en) * 2018-03-27 2019-10-03 Nokia Technologies Oy Creation of rich content from textual content
CN110275860A (en) * 2019-06-24 2019-09-24 深圳市理约云信息管理有限公司 A kind of system and method recording instruction process
US20210193109A1 (en) * 2019-12-23 2021-06-24 Adobe Inc. Automatically Associating Context-based Sounds With Text
US11727913B2 (en) * 2019-12-23 2023-08-15 Adobe Inc. Automatically associating context-based sounds with text

Also Published As

Publication number Publication date
US20220171915A1 (en) 2022-06-02
US20200193081A1 (en) 2020-06-18
US11256848B2 (en) 2022-02-22

Similar Documents

Publication Publication Date Title
US11256848B2 (en) Automated augmentation of text, web and physical environments using multimedia content
US9552212B2 (en) Caching intermediate data for scroll view rendering
TWI590157B (en) Compressed serialization of data for communication from a client-side application
US10769353B2 (en) Dynamic streaming content provided by server and client-side tracking application
US20100095198A1 (en) Shared comments for online document collaboration
US9940396B1 (en) Mining potential user actions from a web page
JP2015526808A (en) Creating variations when converting data to consumer content
JP2014029701A (en) Document processing for mobile devices
EP2859467A1 (en) Transforming data into consumable content
EP3008622A2 (en) Interaction of web content with an electronic application document
US20150227276A1 (en) Method and system for providing an interactive user guide on a webpage
US20140164915A1 (en) Conversion of non-book documents for consistency in e-reader experience
US10289747B2 (en) Dynamic file concatenation
WO2016018683A1 (en) Image based search to identify objects in documents
US20140164366A1 (en) Flat book to rich book conversion in e-readers
US20210019360A1 (en) Crowdsourcing-based structure data/knowledge extraction
US9648381B2 (en) Method and system for managing display of web-based content on portable communication devices
CN113557504A (en) System and method for improved search and categorization of media content items based on their destinations
JP5955186B2 (en) Information processing device
Mahdavi et al. Web transcoding for mobile devices using a tag-based technique
Rachovski et al. Conceptual model of an application for automated generation of webpage mobile versions
US10922476B1 (en) Resource-efficient generation of visual layout information associated with network-accessible documents
US11880424B1 (en) Image generation from HTML data using incremental caching
NL2025417B1 (en) Intelligent Content Identification and Transformation
US20230376555A1 (en) Generating a Snippet Packet Based on a Selection of a Portion of a Web Page

Legal Events

Date Code Title Description
STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION