WO2002037469A2 - Speech generating system and method - Google Patents

Speech generating system and method

Info

Publication number
WO2002037469A2
Authority
WO
WIPO (PCT)
Prior art keywords
data
selecting
audio
user
text
Prior art date
Application number
PCT/IL2001/001009
Other languages
French (fr)
Other versions
WO2002037469A3 (en)
Inventor
Zeev Lavi
Moshe Gilad
Ronen Dvashi
Original Assignee
Infinity Voice Holdings Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Infinity Voice Holdings Ltd. filed Critical Infinity Voice Holdings Ltd.
Priority to AU2002214227A priority Critical patent/AU2002214227A1/en
Publication of WO2002037469A2 publication Critical patent/WO2002037469A2/en
Publication of WO2002037469A3 publication Critical patent/WO2002037469A3/en

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00: Speech synthesis; Text to speech systems
    • G10L13/08: Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination

Definitions

  • the present invention relates to the field of text to speech conversion.
  • Advertisement targeting and personalization of WWW interactions generally require a user profile which is typically maintained at the behest of the site owner or the viewer. Such a profile may also be maintained by a third party advertiser that targets advertisements to a viewer through the WWW page or via other means.
  • An aspect of some embodiments of the invention relates to reading out and annotating WWW pages using computer generated speech.
  • a part of a WWW page is translated to a viewer's preferred language before being read out.
  • animation associated with the speech is added, for example an image of a stick figure waving appendages or lip movements for a face figure.
  • the animation may be determined automatically or it may be added manually to the page or an associated database.
  • an animation template database associates particular animation or an animation layout with a WWW page layout.
  • the WWW page layout may be determined, for example, by analyzing the WWW page and/or by the WWW page being created using an authoring system that uses standard layouts.
  • information other than or additional to animation is associated with a page template.
  • the read out subject matter comprises information missing from the WWW page, for example supplied by other means or information that is removed from the WWW page during conversion to a different screen format (e.g., a cellular telephone screen).
  • the added animation is added to support the understanding of a converted WWW page.
  • An aspect of some embodiments of the invention relates to the generation of sound effects in a speech rendering of a WWW page to indicate objects of interest and/or links.
  • a "bong" sound is generated prior to presenting a link, an image of interest, a matched search word and/or other objects of interest.
  • An aspect of some embodiments of the invention relates to a method of combining speech and sounds.
  • a sound effects set (possibly including speech) is mixed with an output of a text-to-speech generator, after the speech is generated, and before sounds are provided to a sound card.
  • a stereo signal is provided as an output.
  • An aspect of some embodiments of the invention relates to an audio-based automated salesman that is not associated with a particular site being viewed.
  • the salesman can suggest various products from the site, or from competing sites, to the viewer.
  • the salesman is not an agent of the viewer or of the site, but may receive a commission from the site and/or the viewer.
  • the salesman converses in the native language of the viewer, optionally providing a transaction and/or explanation of the products on the site.
  • a viewer may interact with the salesman using audio and/or visual tools and may, in some embodiments of the invention, conclude a sale via the salesman.
  • An aspect of some embodiments of the invention relates to a method of text presentation, in which a text to speech conversion system retrieves a previously prepared audio file, in replacement for a text segment and/or a non-readable element.
  • the element is an advertisement.
  • An aspect of some embodiments of the invention relates to a method of analyzing a WWW page for read out.
  • a target page is divided into readable elements, each of which can be selected to be read out, and a plurality of unreadable elements that can be removed.
  • the readable elements are automatically categorized, for example "menu", "link list" and "article headline".
  • the categorized elements are grouped into groups.
  • the groups are selected, for example pre-selected or ad-hoc, so that a resulting voice menu and/or voice menu structure used to read out the page has desired properties, for example, a minimum or maximum number of elements.
  • associations between the categories are predefined, to assist grouping of the categorized elements using a logical scheme.
  • a relative level of association between two categories can determine, for example, whether the two categories will be merged for a particular page, or whether other two categories will be merged.
  • the number of elements in a category may determine if it is to be merged (e.g., low numbers are merged, so that voice menus are not wasted).
  • the categorization of an item may be changed, for example if more than one categorization fits an item (or group of items), and the different categorization affords a more desirable menu structure.
  • a method of analyzing a WWW site for readout comprising: parsing the site to identify items for which to generate an audible indication; categorizing the identified items by category; grouping the categories; and generating at least one voice menu based on said grouping, wherein said grouping comprises grouping so that at least some of the generated menus have a desirable property.
  • said desirable property comprises a minimum number of elements in a menu.
  • said desirable property comprises a maximum number of elements in a menu.
  • grouping comprises grouping based on pre-defined associations of categories.
  • grouping comprises ordering said categories for presentation.
  • the generated menus include a main menu and sub menus.
  • said main menu is shorter than 10 items.
  • said main menu is shorter than 7 items.
  • said main menu is shorter than 5 items.
  • generating voice menus comprises merging the items in at least two categories into a single category.
  • grouping comprises changing the categorization of an item to achieve the desired property.
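The claimed analysis steps (parse, categorize, group, generate voice menus with a desirable property) lend themselves to a short sketch. The following Python fragment is illustrative only; the constants, function names and merging policy are assumptions, not taken from the patent:

```python
# Illustrative sketch of the claimed grouping step. MIN_ITEMS/MAX_ITEMS
# and the merging policy are assumptions, not taken from the patent.

MIN_ITEMS = 2   # menus shorter than this are merged with a neighbor
MAX_ITEMS = 7   # e.g., a main menu "shorter than 7 items"

def group_categories(categories):
    """categories: dict of category name -> list of readable items.
    Returns a list of (menu_title, items) pairs."""
    menus, pending_title, pending_items = [], None, []
    for name, items in categories.items():
        if len(items) < MIN_ITEMS:
            # Too small for its own menu: accumulate into a merged menu.
            pending_items.extend(items)
            pending_title = f"{pending_title} & {name}" if pending_title else name
            if len(pending_items) >= MIN_ITEMS:
                menus.append((pending_title, pending_items))
                pending_title, pending_items = None, []
        else:
            # Large categories are split into chunks of at most MAX_ITEMS.
            for i in range(0, len(items), MAX_ITEMS):
                chunk = items[i:i + MAX_ITEMS]
                title = name if len(items) <= MAX_ITEMS else f"{name} #{i // MAX_ITEMS + 1}"
                menus.append((title, chunk))
    if pending_items:  # leftover small group becomes its own menu
        menus.append((pending_title, pending_items))
    return menus

menus = group_categories({
    "main headline": ["Top story"],
    "advertisement": ["Sponsor clip"],
    "subject headlines": ["H1", "H2", "H3", "H4", "H5", "H6", "H7", "H8"],
})
```

Here the two single-item categories merge into one menu and the eight headlines split across two menus, so no voice menu exceeds the maximum.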
  • a method of audio browsing of data that includes text data comprising: selecting from a remote database, by a user, data including text data to be provided in an audio manner; automatically providing to said user, audio corresponding to said selected data; determining at least an indication of a content of said selected data; and automatically providing to said user, data in audio manner and relating to said determined indication.
  • selecting comprises selecting data by selecting a page.
  • selecting comprises selecting data by selecting a WWW site.
  • selecting comprises selecting data from a menu.
  • selecting comprises selecting using a telephone handset with no visual display assistance.
  • selecting comprises selecting using a telephone handset with a limited display incapable of satisfactorily displaying the data in a visual manner.
  • selecting comprises selecting using a cellular telephone.
  • said data comprises a text segment.
  • said data comprises an article.
  • said data comprises an audio clip.
  • said corresponding audio comprises a text to speech rendition of said text.
  • said corresponding audio comprises a translation of said text.
  • said corresponding audio comprises a recording of a human reading of said text.
  • determining at least an indication comprises matching a keyword of said data.
  • determining at least an indication comprises identifying a source of said data.
  • determining at least an indication comprises matching said data to a template.
  • said relating data comprises a help message.
  • said relating data comprises an unsolicited sales offer.
  • said relating data comprises a comparison with data from a different source.
  • said relating data comprises an unsolicited comment.
  • said relating data comprises an advertisement.
  • said relating data comprises audio of an interactive sales program.
  • said relating data is provided locally to said user. Alternatively or additionally, said relating data is provided to compensate for lack of visual display quality. Alternatively or additionally, said relating data is provided to compensate for data which is not presented and not selected by the user for audio presentation. In an exemplary embodiment of the invention, said relating data is provided in a language native to said user and other than a language of said data.
  • said relating data is personalized to match at least one attribute of said user.
  • said related data is sounded after said corresponding audio is sounded.
  • said related data is requested by said user.
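The audio-browsing claims above (determine an indication of content, then provide related audio) can be illustrated with a minimal keyword matcher. The keyword table, clip names and matching rule are hypothetical:

```python
# A minimal, hypothetical sketch of "determine an indication of content,
# then provide related audio": keywords found in the selected text pick
# a related clip. The keyword table and file names are illustrative.

RELATED_CLIPS = {
    "weather": "umbrella_ad.wav",
    "stock":   "broker_ad.wav",
    "travel":  "airline_ad.wav",
}

def related_clip(selected_text):
    words = {w.strip(".,").lower() for w in selected_text.split()}
    for keyword, clip in RELATED_CLIPS.items():
        if keyword in words:
            return clip
    return None  # no indication matched; play nothing extra

clip = related_clip("Stock markets rallied today on strong earnings.")
```

A real system would also consider the source of the data and page templates, as the claims list, rather than keywords alone.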
  • Fig. 1 is a schematic diagram of a configuration including an Internet speech generator, in accordance with an exemplary embodiment of the invention.
  • Fig. 2 is a schematic block diagram of a speech and sound mixing system, in accordance with an exemplary embodiment of the invention.
  • Fig. 3A is an exemplary WWW page to be read out in accordance with an exemplary embodiment of the invention.
  • Fig. 3B is a flowchart of a process of processing the page of Fig. 3A, in accordance with an exemplary embodiment of the invention.
  • Fig. 3C is a flowchart of a process of reading out the page of Fig. 3A, in accordance with an exemplary embodiment of the invention.
  • Fig. 4 is a block diagram of a cell-phone configuration, in accordance with an exemplary embodiment of the invention.
  • Fig. 5 is a schematic block diagram of a system topology, in accordance with an exemplary embodiment of the invention.
  • Fig. 1 is a schematic diagram of a configuration 100 including an Internet speech generator 108, in accordance with an exemplary embodiment of the invention.
  • Configuration 100 includes a viewer that browses a target site 106 via an Internet 104, for example using a browser executing on a general purpose computer, as known in the art or using other display tools.
  • Speech generator 108 generates speech and/or animation annotations for viewer 102.
  • speech generator 108 is on a separate computer from both viewer 102 and site 106.
  • some or all of the functionality of generator 108 may be located at viewer 102 and/or target site 106 and/or distributed between several computers, connected for example via Internet 104.
  • speech generator 108 is provided on a LAN that interconnects several viewers 102 and/or at an ISP or on a proxy server, for example one that serves a plurality of people with a same language need.
  • the speech generated by speech generator 108 may be transmitted over the network in one or more forms, for example:
  • audio (e.g., using standard methods)
  • codes (e.g., syllables, phonemic codes, optionally with phonic concatenation hints)
  • a program (e.g., a Java applet)
  • the speech generation is performed on the computer of viewer 102.
  • speech generator 108 comprises a page analyzer 110 that analyses the WWW page information at site 106. Exemplary types of analysis include selecting which text to convert to speech and link and advertisement detection. Speech annotation or conversion is performed by a speech annotator/converter 112. In an exemplary embodiment of the invention, this converter comprises a standard text-to-speech converter software module or unit. Alternatively or additionally, speech generator 108 comprises an animation annotator 114, for example for adding animation annotations to the WWW page or to the speech generated for the page. Alternatively or additionally, other multimedia elements may be added, for example, audio clips.
  • An optional database 116 may be provided, for example for storing page templates (described below), for storing associations of animation with speech, for storing sound clips (e.g., to replace text and/or advertisements) and/or for storing help messages (described below).
  • Speech generator 108 may interact with site 106 in various ways, including, for example, site 106 may retrieve the annotations from generator 108, for incorporation in its output; site 106 may be retrieved by viewer 102 via generator 108 or a separate server (not shown) that annotates the contents of site 106; or viewer 102 may retrieve site 106 and annotations from generator 108 in parallel, and add the annotations at viewer 102.
  • Fig. 2 is a schematic block diagram of a speech and sound mixing system 200, in accordance with an exemplary embodiment of the invention.
  • HTML data (202) is parsed to yield plain text (204).
  • a text to speech generator 206 converts the plain text into audio signals, to be outputted as sound-waves (208) at viewer 102.
  • HTML is a marked-up text file.
  • some of the mark ups are used to modify the audio output at viewer 102.
  • the mark-ups are passed to text-to-speech generator 206. However, this may require dedicated generator software and/or a special preprocessor for converting the text mark-ups into command parameters of generator 206.
  • the mark-ups are converted into audio effects and/or speech using a separate channel, which is mixed using a mixer 210 to form part of sounds 208.
  • links (212) cause a wave generator 214 to generate a "bong" sound preceding the recitation of the link.
  • wave generator 214 reads out the link using a phonetic readout, rather than as English.
  • the mark-up path is used to control various parameters of mixer 210, for example volume, speed and voice type (e.g., woman or child).
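The mixing path described above (effects mixed with the text-to-speech output after generation, before the sound card) might look as follows in toy form. Pure-Python 16-bit PCM mixing is shown only for illustration; a production system would use the platform sound API or numpy:

```python
# Toy sketch of the post-TTS mixing stage (mixer 210): effect samples
# are summed with speech samples and hard-clipped to the 16-bit PCM
# range before being handed to the sound card.

PCM_MAX, PCM_MIN = 32767, -32768

def mix(speech, effects, effects_gain=0.5):
    """speech, effects: lists of 16-bit samples; the shorter is zero-padded."""
    n = max(len(speech), len(effects))
    speech = speech + [0] * (n - len(speech))
    effects = effects + [0] * (n - len(effects))
    out = []
    for s, e in zip(speech, effects):
        v = int(s + effects_gain * e)
        out.append(max(PCM_MIN, min(PCM_MAX, v)))  # hard clip
    return out

mixed = mix([1000, 2000, 30000], [2000, 2000, 20000, 4000])
```

A stereo output, as the summary mentions, could be obtained by running this per channel with different gains.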
  • the generation of speech or other sound effects is automatic when the page is retrieved and/or displayed, for example immediately, or after a delay.
  • the audio may be generated and/or presented when viewer 102 interacts with a display.
  • links or active buttons are substituted for text with which audio is associated.
  • the viewer's browser can detect the interaction with such areas.
  • the audio is played.
  • the pages are based on a template of the page structure. Although over 2 billion WWW pages are extant at present, many of the pages match one of a small number of formats.
  • templates for these pages associate one or more of the following with the page format, for automatic generation of text (e.g., for annotations), speech and/or animation:
  • the WWW page is analyzed, for example using methods known in the art, for example to detect links (e.g., based on HTML tags), headlines (e.g., based on relative or absolute font size).
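The analysis just described (links detected from HTML tags, headlines from heading tags or font size) can be sketched with the standard html.parser module. The tag set and font-size threshold are illustrative assumptions:

```python
# Sketch of the page analysis described above: links are detected from
# HTML anchor tags and headlines from heading tags / large fonts. The
# thresholds are illustrative assumptions, not from the patent.

from html.parser import HTMLParser

class PageAnalyzer(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links, self.headlines = [], []
        self._in_headline = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        # Treat <h1>-<h3> (or a large old-style <font size>) as headlines.
        if tag in ("h1", "h2", "h3") or (tag == "font" and int(attrs.get("size", 0) or 0) >= 5):
            self._in_headline = True

    def handle_endtag(self, tag):
        if tag in ("h1", "h2", "h3", "font"):
            self._in_headline = False

    def handle_data(self, data):
        if self._in_headline and data.strip():
            self.headlines.append(data.strip())

analyzer = PageAnalyzer()
analyzer.feed('<h1>Top Story</h1><p>Body text</p><a href="/more">more</a>'
              '<font size="6">Old-style headline</font>')
```

The collected links and headlines would then feed the categorization and grouping steps described for Fig. 3B.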
  • a user interface is provided for generator 108, for example to allow a user to set one or more of the following parameters:
  • speech characteristics such as speed, volume and voice type
  • Speech generator 108 may operate in various operational modes, including, for example, one or more of the following modes:
  • Unsolicited explanations may be provided, for example, based on a user profile that can indicate a viewer's past queries. Such a user profile and/or explanations may be stored locally to viewer 102 or remotely, for example at site 106 or at generator 108.
  • Unsolicited or solicited offer of a product which is the same as shown or is related to that shown on the WWW page.
  • in some cases the product is offered from the same vendor, in others from another vendor, for example under competing terms.
  • the contents of a competing site 106' for that product are provided to the viewer using audio.
  • a response from the user may be received using a speech input, which may be processed, for example at viewer 102, at generator 108 and/or at a separate speech recognition server.
  • speech input may be used for other exemplary operational modes as well.
  • DTMF input may be used.
  • speech generator 108 completes a transaction with the viewer, for example regarding the contents of the currently displayed page, or otherwise.
  • Animation can be added as a stand-alone element, or it may be associated with speech output or replace it, for example being in sign language.
  • the animation is that of a stick figure emphasizing the text or speech output.
  • the animation is that of a face, synchronized to the audio sounds.
  • Help is provided in response to a user request, for example, a user clicking on an item or an image of a product.
  • a software component at viewer 102 detects that a user clicked on a word and forwards this word (or image) to generator 108, for generating a "help" message.
  • Fig. 3A is an exemplary WWW page 300 to be read out in accordance with an exemplary embodiment of the invention.
  • Page 300, which is similar in structure to some news pages, includes readable and non-readable elements, elements at different levels of interest and elements having different levels of relevance to the page.
  • page 300 can include an article 306, having an image 308, a headline 310 and text paragraphs 312.
  • page 300 includes a link list 304 (e.g., a single item comprising multiple display and/or HTML elements), an auto-install control 302, other controls 314 and 316, an advertisement 326, a plurality of subject headlines (for other articles) 318 and lists of headlines 320.
  • page 300 may include a secondary article, for example including an image 322 and a headline 324.
  • page 300 is a multi-article page
  • many WWW pages include only a single article, which includes, for example, one or more images, titles and subtitles, text paragraphs and controls, usually at the start and/or end of the page.
  • Fig. 3B is a flowchart of a process 330 of processing page 300, in accordance with an exemplary embodiment of the invention.
  • the "nature" of the site is optionally recognized. For example, different variations and/or processing steps are performed on different types of WWW pages. Examples of WWW page types, include: News, portals, e-commerce, etc.
  • the page type is recognized by comparing the page address against a catalog of site and/or page types. Alternatively or additionally, the page address or site title may include keywords that identify the site type (e.g., checkout pages).
  • non-readable portions such as HTML commands (or other language commands, for other page description languages), images, text input boxes, pull-down lists and controls are removed. Optionally, some such portions may be retained, for example to allow a user to select them using a special menu.
  • a text tag portion may be retained, for example an image title, so the user can be aware of what is missing from the page.
  • an image or other non-readable item may be requested by a user to be forwarded to him, for example, by e-mail, to a cellular telephone display and/or a fax.
  • the readable parts are categorized.
  • the categorization is based on the type of read-out to be applied. Alternatively or additionally, the categorization is based on the hierarchical order of read-out. Exemplary categories include: headline, banner, main menu bar, links list, mail address, articles, sub-articles and tables.
  • a "type name" tag is added to each readable part, in HTML code or other parsed stream.
  • the names used are based on the identification of the site nature. Alternatively, standard nomenclature may be used.
  • the display elements may be categorized using various methods. For example, some WWW sites include tags, such as "headline", "menu" or "link". Although different tags may be used by different sites, a plurality of tags can be combined in a single category (or group, below). Alternatively or additionally, regular expressions or other rules may be used. In another example, a set of contiguous links is identified as a link list. If the text associated with the link is a multiword phrase, it is assumed the links are headlines. In another example, a sequence of paragraphs of same size font, possibly with headlines in another font, is recognized as an article. A multitude of text parsing engines are known in the art, for which a skilled practitioner may define recognition and categorization rules.
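The rules above (contiguous links form a link list, multiword link text suggests headlines, runs of paragraphs form an article) can be encoded directly. This sketch assumes the page has already been parsed into (kind, text) pairs; all names are illustrative:

```python
# Illustrative encoding of the categorization rules described above.
# Each parsed element is a (kind, text) pair; runs of contiguous links
# become a "link list" (or "headlines" when the link text is a
# multiword phrase), and runs of paragraphs become an "article".

def categorize(elements):
    categories, i = [], 0
    while i < len(elements):
        kind = elements[i][0]
        j = i
        while j < len(elements) and elements[j][0] == kind:
            j += 1  # extend the run of same-kind elements
        run = elements[i:j]
        if kind == "link":
            # Multiword link text suggests the links are headlines.
            multiword = all(len(text.split()) > 1 for _, text in run)
            categories.append(("headlines" if multiword else "link list", run))
        elif kind == "paragraph":
            categories.append(("article", run))
        else:
            categories.append((kind, run))
        i = j
    return categories

cats = categorize([
    ("link", "Home"), ("link", "Sports"),
    ("paragraph", "First paragraph..."), ("paragraph", "Second..."),
    ("link", "Mayor opens new bridge"), ("link", "Storm heads north"),
])
```

In practice the font-size cue mentioned above would also feed these rules, and templates could override them for known page types.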
  • rules and expressions may be used in addition to or instead of the use of templates.
  • a frames-like approach (as once used in AI) is used to assist in recognizing elements in a page of a certain type.
  • the readable parts are optionally organized in the order in which they will be presented in the menu.
  • the order may be a property of the site type. Alternatively or additionally, the order may be determined based on the number of each element of each type. Alternatively or additionally, the order may be at least partly random. Alternatively or additionally, the order may be based on a perceived relative importance of different items. Perceived importance may be determined, for example, based on selection (for readout) statistics (e.g., order, frequency) of this or other users.
  • each menu should not have too many options.
  • several readable parts are grouped together (340), so that the number of options will not be too great, for example, not over 5, 6, 7 or 8 readable groups.
  • very short menus are also undesirable, as they increase the total number of menus, so categories with few items are grouped together too.
  • the titles of the menus and/or menu elements are generated in real-time, to match the grouping of categories.
  • the categories are selected so that they can be naturally combined into single menu elements in various manners.
  • the final menu set can thus be, for example, a function of the number of elements in each category, their relative perceived importance and the particular categories available on the page.
  • the determined names and/or statistics of the site are stored in a database (342) for use the next time the page is read out.
  • a manual setup step for the page is triggered (344), for example based on the number of requests for the page and/or based on complaints.
  • page 300 is divided into the following groups: “menus”, “advertisement”, “main headline”, “links list”, “subject headlines #1”, “subject headlines #2”, and “secondary article”.
  • the advertisement is read out, without prompting the user.
  • alternatively, an audio advertisement (e.g., a wav file) may be played.
  • Fig. 3C is a flowchart 350 of an exemplary process of reading out an arbitrary page.
  • a site or page is chosen.
  • a user sets up a limited number of favorite sites.
  • the site is selected from a hierarchical list provided by the system.
  • the user enters the site address or a keyword by voice input.
  • the user uses the telephone keys to enter the site address and/or keywords.
  • the address may be ambiguous, however, such ambiguity may be settled, for example, by comparing the entry against a catalog of favorite and/or common sites.
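Settling an ambiguous keypad entry against a catalog of favorite sites, as suggested above, can be sketched by mapping catalog addresses to their phone-keypad digit strings. The mapping and matching rule are assumptions for illustration:

```python
# Hypothetical sketch of settling an ambiguous telephone-keypad entry
# against a catalog of favorite sites: catalog addresses are mapped to
# keypad digit strings and prefix-matched against the keyed digits.

LETTER_TO_DIGIT = dict(zip("abcdefghijklmnopqrstuvwxyz",
                           "22233344455566677778889999"))

def to_digits(name):
    # Non-letters (dots, digits) are skipped in this toy encoding.
    return "".join(LETTER_TO_DIGIT.get(ch, "") for ch in name.lower())

def resolve(keyed_digits, catalog):
    """Return catalog entries whose keypad encoding starts with the entry;
    more than one match means the entry is still ambiguous."""
    return [site for site in catalog if to_digits(site).startswith(keyed_digits)]

matches = resolve("266", ["cnn.com", "bnn.org", "amazon.com"])
```

Here "266" matches both "cnn.com" and "bnn.org", so the system would read the remaining candidates out for the user to choose between.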
  • advertisements on the site are played.
  • the system requests an audio clip to replace the text and/or image, from the advertisement provider.
  • text to speech methods are used.
  • the page is analyzed for readable and non-readable parts; this may take place, for example, between 352 and 354.
  • the order of readout and/or other readout properties can be a user-associated preference.
  • different preferences are associated with different pages, even for a same user.
  • a short menu of options is read out to a user. Responsive to the list, the user may dig deeper into the hierarchical structure of the site (e.g., alternative pages, sub-articles). Alternatively or additionally, the system may read out an article or part of an article (358), before returning to the options list.
  • the listing may change to reflect the fact that some articles have been read, for example, by putting them last in the list and/or using a different bong sound before the read and unread articles.
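The re-listing behavior described in this bullet can be sketched as a simple reordering, with a different cue sound for items already heard. The sound file names and data shapes are hypothetical:

```python
# Sketch of the re-listing behavior described above: articles already
# heard move to the end of the option list and get a different cue
# ("bong") sound. Sound names and tuple layout are illustrative.

def order_options(articles, read_ids):
    """articles: list of (id, title); returns (cue, id, title) tuples
    with unread articles first."""
    unread = [("bong_new.wav", i, t) for i, t in articles if i not in read_ids]
    read = [("bong_read.wav", i, t) for i, t in articles if i in read_ids]
    return unread + read

options = order_options(
    [(1, "Main article"), (2, "Weather"), (3, "Sports")],
    read_ids={2},
)
```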
  • the page includes tags indicating for which articles and/or other readable or non-readable page elements there is a previously prepared audio equivalent, at the WWW site server and/or at a different location.
  • tone keys are used to navigate the option lists of 356.
  • a user can activate the keys (or use a voice command) during a read out, for example, to bookmark, to stop, to fast forward, rewind, to receive help, to follow a link, to activate a preset utility, to go down a level in hierarchy or to go up a level in hierarchy.
  • the keys for these and/or other actions may be preset and/or read out to the user, as one of the options.
  • a key is active for an item while it is being read and for a short time after, possibly even after a next item is being read.
  • Table Ia shows a process of site analysis (generally corresponding to Fig. 3B), in accordance with an exemplary embodiment of the invention.
  • Table Ib shows the application of this method to a particular CNN main page.
  • Table IIa shows the steps in an exemplary process of reading out a page in accordance with an exemplary embodiment of the invention.
  • the system will read the banner to the user (either by reading the text in the banner or by playing the clip or other audio file of the banner).
  • the system will offer the user the articles to hear: "Press 1 for main article, press 2 for sub articles, press 0 to return "
  • some articles may be available only to members, which may require a payment authorization act or a log-in act. Alternatively, such acts may be implicit.
  • the system may warn the user of the cost of reading out an article. Possibly, the system detects one or more price-quotes on the WWW page and reads them out, for example as part of the menu.
  • Various databases, for example, have a standard record structure that includes a title, a link and a price quote. Such a structure may be used to drive parsing that detects the quote.
  • Fig. 4 is a block diagram of a cell-phone configuration 400, in accordance with an exemplary embodiment of the invention.
  • Information from a source site 402 is transmitted, for example over the Internet or via a dedicated line to a cellular operator 401.
  • the content is converted at operator 401, at source site 402 or intermediate between them, using a converter 404, which converts the format and/or level of detail from a format suitable for personal computers to a format suitable for cellular telephones. This conversion may be in real-time or it may be off-line.
  • a text to speech converter and/or annotator 406 preferably converts parts of the converted content to speech or adds a layer of audio annotations.
  • the annotations are designed to compensate for content removed or made less desirable by converter 404.
  • the converted and annotated content is then transmitted to a cellular telephone 408, using methods known in the art.
  • converter 404 and converter 406 are combined, for example, to convert an HTML page into a hybrid image and audio content.
  • the cellular telephone may serve as a browsing terminal in a configuration as shown in Fig. 1, possibly with no special allowance being made for cellular conversion, if any.
  • the cellular conversion may be performed after the audio annotations are added.
  • FIG. 5 is a more detailed schematic block diagram 500 of a system topology, in accordance with an exemplary embodiment of the invention.
  • a source 502 comprises, for example, one or more of a public web service 510, a hosted web service 512 and a corporate Intranet or Extranet.
  • the data from source 502 is provided to a gateway server 504, optionally through a proxy 516.
  • Gateway 504 may utilize, for example, multiple language/voice generation and/or translation engines 506.
  • An optional language ID engine 522 may be used to determine the language of the site, for example using methods known in the art, such as word recognition, character sets, language tags, letter frequency, page title and a language previously associated with the page address.
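Of the language-identification methods listed for engine 522, letter frequency is the easiest to sketch. The profiles below are toy stand-ins for real frequency tables, and the scoring rule (sum of squared differences) is an assumption:

```python
# Toy sketch of language ID engine 522 via one of the listed methods,
# letter frequency. The tiny profiles are illustrative stand-ins for
# real frequency tables, and the scoring rule is an assumption.

from collections import Counter

PROFILES = {
    "english": {"e": 0.13, "t": 0.09, "a": 0.08, "w": 0.02},
    "french":  {"e": 0.15, "t": 0.07, "a": 0.08, "w": 0.0001},
}

def identify_language(text):
    letters = [c for c in text.lower() if c.isalpha()]
    counts = Counter(letters)
    total = len(letters) or 1
    best, best_score = None, float("inf")
    for lang, profile in PROFILES.items():
        # Sum of squared differences over the profiled letters.
        score = sum((counts.get(ch, 0) / total - freq) ** 2
                    for ch, freq in profile.items())
        if score < best_score:
            best, best_score = lang, score
    return best

lang = identify_language("What will the weather be like")
```

A deployed engine would combine this with the other listed signals (character sets, language tags, page title) rather than rely on frequencies alone.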
  • a data collection server 520 may be optionally provided for tracking usage of the system and/or for billing.
  • a telephone system 508, including a base station 526, a telephone company operating system 528 and a network 530, may be used as a user input and output device.
  • an Interactive Voice Response system 524 is used for receiving user input commands, by gateway server 504.
  • server 504 includes an application backbone and framework, to which are attached various software and/or hardware modules, for example, a telephony module, a network resource management module, a customization database module, a billing database module, e-mail and Intranet servers, ASR (automatic speech recognition) and TTS (text to speech) modules, an optimization engine (e.g., for aggregating page elements into menus), a web engine, a language server, an interactive ad server and/or content proxy servers.

Abstract

A method of analyzing a WWW site for readout, in which audible items are grouped (340) so as to generate voice menus (342) having a desirable property. Also disclosed is a method of audio browsing in which additional related audio data is sounded to a user.

Description

SPEECH GENERATING SYSTEM AND METHOD
FIELD OF THE INVENTION
The present invention relates to the field of text to speech conversion.
BACKGROUND OF THE INVENTION
Conversion of text to speech is known, including for reading out WWW pages, e.g., for the visually impaired. Many WWW pages include audio segments, for download or even for immediate playback.
The use of advertising on the Internet, including as part of WWW pages, is well known. A particular subject being studied by many operators in the field of Internet advertising is the targeting of advertisements. Advertisement targeting and personalization of WWW interactions generally require a user profile which is typically maintained at the behest of the site owner or the viewer. Such a profile may also be maintained by a third party advertiser that targets advertisements to a viewer through the WWW page or via other means.
PCT publication WO 98/44424, the disclosure of which is incorporated herein by reference, describes an automatic converter that modifies HTML pages by replacing text output objects by audio objects or by text objects containing translated subject matter.
PCT publication WO 00/07372, the disclosure of which is incorporated herein by reference, describes an automatic annotator that can add advertisements as an audio annotation on a stream that may include Internet information or can add ASL (American sign language) gestures to an audio stream, based on an automated voice recognition of the audio channel contents.
SUMMARY OF THE INVENTION
An aspect of some embodiments of the invention relates to reading out and annotating WWW pages using computer generated speech. In an exemplary embodiment of the invention, a part of a WWW page is translated to a viewer's preferred language before being read out. Alternatively or additionally, animation associated with the speech is added, for example an image of a stick figure waving appendages or lip movements for a face figure. The animation may be determined automatically or it may be added manually to the page or an associated database. Optionally, an animation template database associates particular animation or an animation layout with a WWW page layout. The WWW page layout may be determined, for example, by analyzing the WWW page and/or by the WWW page being created using an authoring system that uses standard layouts. Optionally, information other than or additional to animation is associated with a page template. Optionally, the read out subject matter comprises information missing from the WWW page, for example supplied by other means or information that is removed from the WWW page during conversion to a different screen format (e.g., a cellular telephone screen).
Optionally, the added animation is added to support the understanding of a converted WWW page.
An aspect of some embodiments of the invention relates to the generation of sound effects in a speech rendering of a WWW page to indicate objects of interest and/or links. In an exemplary embodiment of the invention, a "bong" sound is generated prior to presenting a link, an image of interest, a matched search word and/or other objects of interest.

An aspect of some embodiments of the invention relates to a method of combining speech and sounds. In an exemplary embodiment of the invention, a sound effects set (possibly including speech) is mixed with an output of a text-to-speech generator, after the speech is generated and before the sounds are provided to a sound card. Alternatively or additionally, a stereo signal is provided as an output.

An aspect of some embodiments of the invention relates to an audio-based automated salesman that is not associated with a particular site being viewed. In an exemplary embodiment of the invention, when a user enters a site, the salesman can suggest various products from the site, or from competing sites, to the viewer. In an exemplary embodiment of the invention, the salesman is not an agent of the viewer or of the site, but may receive a commission from the site and/or the viewer. Alternatively or additionally, the salesman converses in the native language of the viewer, optionally providing a translation and/or explanation of the products on the site. A viewer may interact with the salesman using audio and/or visual tools and may, in some embodiments of the invention, conclude a sale via the salesman.

An aspect of some embodiments of the invention relates to a method of text presentation, in which a text to speech conversion system retrieves a previously prepared audio file, in replacement for a text segment and/or a non-readable element. In an exemplary embodiment of the invention, the element is an advertisement.
An aspect of some embodiments of the invention relates to a method of analyzing a WWW page for read out. In an exemplary embodiment of the invention, a target page is divided into readable elements, each of which can be selected to be read out, and a plurality of unreadable elements that can be removed. In an exemplary embodiment of the invention, the readable elements are automatically categorized, for example "menu", "link list" and "article headline". Optionally, the categorized elements are grouped into groups. In an exemplary embodiment of the invention, the groups are selected, for example pre-selected or ad-hoc, so that a resulting voice menu and/or voice menu structure used to read out the page has desired properties, for example, a minimum or maximum number of elements. In an exemplary embodiment of the invention, associations between the categories are predefined, to assist grouping of the categorized elements using a logical scheme. Thus, a relative level of association between two categories can determine, for example, whether the two categories will be merged for a particular page, or whether two other categories will be merged. Alternatively or additionally, the number of elements in a category may determine if it is to be merged (e.g., low numbers are merged, so that voice menus are not wasted). In some embodiments of the invention, the categorization of an item may be changed, for example if more than one categorization fits an item (or group of items), and the different categorization affords a more desirable menu structure.
There is thus provided in accordance with an exemplary embodiment of the invention, a method of analyzing a WWW site for readout, comprising: parsing the site to identify items for which to generate an audible indication; categorizing the identified items by category; grouping the categories; and generating at least one voice menu based on said grouping, wherein said grouping comprises grouping so that at least some of the generated menus have a desirable property. Optionally, said desirable property comprises a minimum number of elements in a menu. Alternatively or additionally, said desirable property comprises a maximum number of elements in a menu.
In an exemplary embodiment of the invention, grouping comprises grouping based on pre-defined associations of categories. Alternatively or additionally, grouping comprises ordering said categories for presentation.
In an exemplary embodiment of the invention, the generated menus include a main menu and sub menus. Optionally, said main menu is shorter than 10 items. Optionally, said main menu is shorter than 7 items. Optionally, said main menu is shorter than 5 items.
Optionally, generating voice menus comprises merging the items in at least two categories into a single category.
In an exemplary embodiment of the invention, grouping comprises changing the categorization of an item to achieve the desired property.
There is also provided in accordance with an exemplary embodiment of the invention, a method of audio browsing of data that includes text data, comprising: selecting from a remote database, by a user, data including text data to be provided in an audio manner; automatically providing to said user, audio corresponding to said selected data; determining at least an indication of a content of said selected data; and automatically providing to said user, data in audio manner and relating to said determined indication. Optionally, selecting comprises selecting data by selecting a page. Alternatively or additionally, selecting comprises selecting data by selecting a WWW site. Alternatively or additionally, selecting comprises selecting data from a menu. Alternatively or additionally, selecting comprises selecting using a telephone handset with no visual display assistance. Alternatively or additionally, selecting comprises selecting using a telephone handset with a limited display incapable of satisfactorily displaying the data in a visual manner. Alternatively or additionally, selecting comprises selecting using a cellular telephone. In an exemplary embodiment of the invention, said data comprises a text segment. Alternatively or additionally, said data comprises an article. Alternatively or additionally, said data comprises an audio clip.
In an exemplary embodiment of the invention, said corresponding audio comprises a text to speech rendition of said text. Alternatively or additionally, said corresponding audio comprises a translation of said text. Alternatively or additionally, said corresponding audio comprises a recording of a human reading of said text. In an exemplary embodiment of the invention, determining at least an indication comprises matching a keyword of said data. Alternatively or additionally, determining at least an indication comprises identifying a source of said data. Alternatively or additionally, determining at least an indication comprises matching said data to a template.
In an exemplary embodiment of the invention, said relating data comprises a help message. Alternatively or additionally, said relating data comprises an unsolicited sales offer. Alternatively or additionally, said relating data comprises a comparison with data from a different source. Alternatively or additionally, said relating data comprises an unsolicited comment. Alternatively or additionally, said relating data comprises an advertisement. Alternatively or additionally, said relating data comprises audio of an interactive sales program.
In an exemplary embodiment of the invention, said relating data is provided locally to said user. Alternatively or additionally, said relating data is provided to compensate for lack of visual display quality. Alternatively or additionally, said relating data is provided to compensate for data which is not presented and not selected by the user for audio presentation. In an exemplary embodiment of the invention, said relating data is provided in a language native to said user and other than a language of said data.
In an exemplary embodiment of the invention, said relating data is personalized to match at least one attribute of said user. In an exemplary embodiment of the invention, said related data is sounded after said corresponding audio is sounded.
In an exemplary embodiment of the invention, said related data is requested by said user.
BRIEF DESCRIPTION OF THE FIGURES

Fig. 1 is a schematic diagram of a configuration including an Internet speech generator, in accordance with an exemplary embodiment of the invention;
Fig. 2 is a schematic block diagram of a speech and sound mixing system, in accordance with an exemplary embodiment of the invention;
Fig. 3A is an exemplary WWW page to be read out in accordance with an exemplary embodiment of the invention;
Fig. 3B is a flowchart of a process of processing the page of Fig. 3A, in accordance with an exemplary embodiment of the invention;
Fig. 3C is a flowchart of a process of reading out the page of Fig. 3A, in accordance with an exemplary embodiment of the invention;

Fig. 4 is a block diagram of a cell-phone configuration, in accordance with an exemplary embodiment of the invention; and
Fig. 5 is a schematic block diagram of a system topology, in accordance with an exemplary embodiment of the invention.
DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS

Fig. 1 is a schematic diagram of a configuration 100 including an Internet speech generator 108, in accordance with an exemplary embodiment of the invention. Configuration 100 includes a viewer 102 that browses a target site 106 via an Internet 104, for example using a browser executing on a general purpose computer, as known in the art, or using other display tools. Speech generator 108 generates speech and/or animation annotations for viewer 102. As shown, speech generator 108 is on a separate computer from both viewer 102 and site 106. Alternatively, some or all of the functionality of generator 108 may be located at viewer 102 and/or target site 106 and/or distributed between several computers, connected for example via Internet 104. In some embodiments of the invention, speech generator 108 is provided on a LAN that interconnects several viewers 102 and/or at an ISP or on a proxy server, for example one that serves a plurality of people with a same language need.
The speech generated by speech generator 108 may be transmitted over the network (LAN or Internet) as audio (e.g., using standard methods) or as codes (e.g., syllables, phonemic codes, optionally with phonic concatenation hints) for a program (e.g., a Java applet) on viewer 102 to convert into audio. Alternatively, the speech generation is performed on the computer of viewer 102.
In an exemplary embodiment of the invention, speech generator 108 comprises a page analyzer 110 that analyses the WWW page information at site 106. Exemplary types of analysis include selecting which text to convert to speech, and detection of links and advertisements. Speech annotation or conversion is performed by a speech annotator/converter 112. In an exemplary embodiment of the invention, this converter comprises a standard text-to-speech converter software module or unit. Alternatively or additionally, speech generator 108 comprises an animation annotator 114, for example for adding animation annotations to the WWW page or to the speech generated for the page. Alternatively or additionally, other multimedia elements may be added, for example, audio clips.
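The kind of page analysis performed by page analyzer 110 can be sketched, for illustration only, with a simple HTML parser that separates readable text runs from link targets. The class structure and example markup below are illustrative assumptions and are not part of the original disclosure.

```python
# Minimal sketch of a page analyzer: collect readable text and links
# from an HTML page.  Names and structure are illustrative only.
from html.parser import HTMLParser

class PageAnalyzer(HTMLParser):
    """Collects readable text segments and (href, anchor text) links."""
    def __init__(self):
        super().__init__()
        self.texts = []        # readable text segments, in document order
        self.links = []        # (href, anchor text) pairs
        self._in_link = False
        self._link_href = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self._link_href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_link:
            self.links.append((self._link_href, text))
        else:
            self.texts.append(text)

analyzer = PageAnalyzer()
analyzer.feed('<h1>News</h1><p>Main story.</p><a href="/sports">Sports</a>')
# analyzer.texts -> ['News', 'Main story.']
# analyzer.links -> [('/sports', 'Sports')]
```

The separated link list could then feed the "bong" annotation and menu generation stages described below.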
An optional database 116 may be provided, for example for storing page templates (described below), for storing associations of animation with speech, for storing sound clips (e.g., to replace text and/or advertisements) and/or for storing help messages (described below).
Speech generator 108 may interact with site 106 in various ways, including, for example, site 106 may retrieve the annotations from generator 108, for incorporation in its output; site 106 may be retrieved by viewer 102 via generator 108 or a separate server (not shown) that annotates the contents of site 106; or viewer 102 may retrieve site 106 and annotations from generator 108 in parallel, and add the annotations at viewer 102.
Fig. 2 is a schematic block diagram of a speech and sound mixing system 200, in accordance with an exemplary embodiment of the invention. HTML data (202) is parsed to yield plain text (204). A text to speech generator 206 converts the plain text into audio signals, to be outputted as sound-waves (208) at viewer 102. HTML is a marked-up text file. In an exemplary embodiment of the invention, some of the mark-ups are used to modify the audio output at viewer 102. In one exemplary embodiment of the invention, the mark-ups are passed to text-to-speech generator 206. However, this may require dedicated generator software and/or a special preprocessor for converting the text mark-ups into command parameters of generator 206. In an exemplary embodiment of the invention, the mark-ups are converted into audio effects and/or speech using a separate channel, which is mixed using a mixer 210 to form part of sounds 208. In an exemplary embodiment of the invention, links (212) cause a wave generator 214 to generate a "bong" sound preceding the recitation of the link. Alternatively or additionally, wave generator 214 reads out the link using a phonetic readout, rather than as English. Alternatively or additionally, the mark-up path is used to control various parameters of mixer 210, for example volume, speed and voice type (e.g., woman or child).
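The mixing step of mixer 210 can be sketched, for illustration, as a sample-by-sample sum of a sound-effect channel (e.g., the "bong" preceding a link) with the text-to-speech channel, with clipping, before the sounds are provided to a sound card. The sample values and 16-bit range below are illustrative assumptions.

```python
# Simplified sketch of the mixer of Fig. 2: sum two equal-rate sample
# streams, pad the shorter with silence, and clip to a 16-bit range.
# The sample values below are invented for illustration.

def mix_channels(speech, effects, peak=32767):
    """Mix two sample streams into one, clipping at +/- peak."""
    length = max(len(speech), len(effects))
    speech = speech + [0] * (length - len(speech))    # pad with silence
    effects = effects + [0] * (length - len(effects))
    mixed = []
    for s, e in zip(speech, effects):
        v = s + e
        mixed.append(max(-peak, min(peak, v)))        # clip to 16-bit range
    return mixed

speech = [1000, 2000, 3000]
bong = [30000, 31000]            # short effect, louder than the speech
out = mix_channels(speech, bong)
# out -> [31000, 32767, 3000]    (second sample clipped at 32767)
```

In a real system the two streams would be resampled to a common rate before mixing; that step is omitted here.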
In an exemplary embodiment of the invention, the generation of speech or other sound effects is automatic when the page is retrieved and/or displayed, for example immediate, or after a delay. Alternatively or additionally, the audio may be generated and/or presented when viewer 102 interacts with a display. In one example, links or active buttons are substituted for text with which audio is associated. Alternatively or additionally, the viewer's browser can detect the interaction with such areas. In another example, when a user clicks on a page portion that has associated audio, the audio is played.

In an exemplary embodiment of the invention, the pages are based on a template of the page structure. Although over 2 billion WWW pages are extant at present, many of the pages match one of a small number of formats. Typically, this is because there are a small number of accepted WWW page formats. Alternatively or additionally, many pages are generated using standardized tools that include formats. In an exemplary embodiment of the invention, templates for these pages associate one or more of the following with the page format, for automatic generation of text (e.g., for annotations), speech and/or animation:
(a) order of reading;
(b) identification of important vs. less important material;
(c) locations suitable for showing animation; and
(d) advertising.
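The template idea of items (a)-(d) above can be sketched, for illustration, as a lookup table keyed by page layout. The layout name, field names and element names below are hypothetical and not taken from the original disclosure.

```python
# Sketch of a template database: each known page layout is associated
# with a reading order, an importance set, animation slots and ad
# positions.  All names and values here are illustrative assumptions.

PAGE_TEMPLATES = {
    "news-front-page": {
        "reading_order": ["main_headline", "main_article",
                          "subject_headlines", "links_list"],
        "important": {"main_headline", "main_article"},
        "animation_slots": ["beside_main_article"],
        "ad_positions": ["before_main_article"],
    },
}

def reading_plan(layout, elements):
    """Order a page's elements per its template, skipping element
    types the template does not mention or the page does not have."""
    template = PAGE_TEMPLATES[layout]
    return [e for kind in template["reading_order"]
            for e in elements.get(kind, [])]

plan = reading_plan("news-front-page",
                    {"links_list": ["Sports", "Weather"],
                     "main_headline": ["Election results"]})
# plan -> ['Election results', 'Sports', 'Weather']
```

A matching template would be selected after the page format is recognized, as described in the following paragraph.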
Alternatively or additionally, the WWW page is analyzed, for example using methods known in the art, for example to detect links (e.g., based on HTML tags) and headlines (e.g., based on relative or absolute font size).
In an exemplary embodiment of the invention, a user interface is provided for generator 108, for example to allow a user to set one or more of the following parameters:
(a) speech characteristics, such as speed, volume and voice type;
(b) type and existence of animation; and
(c) existence and/or parameters of language translation.

In an exemplary embodiment of the invention, the user interface is via a WWW page.

Speech generator 108 may operate in various operational modes, including, for example, one or more of the following modes:
(a) Unsolicited explanations. Such explanations may be provided, for example, based on a user profile that can indicate a viewer's past queries. Such a user profile and/or explanations may be stored locally to viewer 102 or remotely, for example at site 106 or at generator 108.
(b) Unsolicited or solicited offer of a product which is the same as shown or is related to that shown on the WWW page. In some cases, the product is offered from the same vendor, in others from another vendor, for example under competing terms. Optionally, the contents of a competing site 106' for that product are provided to the viewer using audio.
In an exemplary embodiment of the invention, a response from the user may be received using a speech input, which may be processed, for example at viewer 102, at generator 108 and/or at a separate speech recognition server. Such speech input may be used for other exemplary operational modes as well. Alternatively or additionally, DTMF input may be used.
(c) Translation and recitations of various parts of site 106, for example automatically, or on selection of a text portion or other display object by a user.
(d) Perform transaction. In an exemplary embodiment of the invention, speech generator 108 completes a transaction with the viewer, for example regarding the contents of the currently displayed page or otherwise.
(e) Ask viewer 102 questions, for example, what is it that the viewer likes about the displayed page and/or product.
(f) Add animation. Animation can be added as a stand alone element or it may be associated with speech output or replace it, for example being in sign language. In an exemplary embodiment of the invention, the animation is that of a stick figure emphasizing the text or speech output. Alternatively or additionally, the animation is that of a face, synchronized to the audio sounds.
(g) Help. Help, as opposed to unsolicited explanations, is in response to a user request, for example, a user clicking on an item or an image of a product. In some embodiments of the invention, a software component at viewer 102 detects that a user clicked on a word and forwards this word (or image) to generator 108, for generating a "help" message.
Fig. 3A is an exemplary WWW page 300 to be read out in accordance with an exemplary embodiment of the invention. Page 300, which is similar in structure to some news pages, includes readable and non-readable elements, elements at different levels of interest and elements having different levels of relevance to the page. For example, page 300 can include an article 306, having an image 308, a headline 310 and text paragraphs 312. In addition, page 300 includes a link list 304 (e.g., a single item comprising multiple display and/or HTML elements), an auto-install control 302, other controls 314 and 316, an advertisement 326, a plurality of subject headlines (for other articles) 318 and lists of headlines 320. In addition, page 300 may include a secondary article, for example including an image 322 and a headline 324.
Although page 300 is a multi-article page, many WWW pages include only a single article, which includes, for example, one or more images, titles and subtitles, text paragraphs and controls, usually at the start and/or end of the page.
Fig. 3B is a flowchart of a process 330 of processing page 300, in accordance with an exemplary embodiment of the invention.
At 332, the "nature" of the site is optionally recognized. For example, different variations and/or processing steps are performed on different types of WWW pages. Examples of WWW page types include: news, portals, e-commerce, etc. In an exemplary embodiment of the invention, the page type is recognized by comparing the page address against a catalog of site and/or page types. Alternatively or additionally, the page address or site title may include keywords that identify the site type (e.g., checkout pages).

At 334, non-readable portions, such as HTML commands (or other language commands, for other page description languages), images, text input boxes, pull-down lists and controls are removed. Optionally, some such portions may be retained, for example to allow a user to select them using a special menu. Alternatively or additionally, a text tag portion may be retained, for example an image title, so the user can be aware of what is missing from the page. In an exemplary embodiment of the invention, an image or other non-readable item may be requested by a user to be forwarded to him, for example, by e-mail, to a cellular telephone display and/or a fax.
At 336, the readable parts are categorized. In an exemplary embodiment of the invention, the categorization is based on the type of read-out to be applied. Alternatively or additionally, the categorization is based on the hierarchical order of read-out. Exemplary categories include: headline, banner, main menu bar, links list, mail address, articles, sub-articles and tables. In an exemplary embodiment of the invention, a "type name" tag is added to each readable part, in HTML code or other parsed stream. Optionally, the names used are based on the identification of the site nature. Alternatively, standard nomenclature may be used.
The display elements may be categorized using various methods. For example, some WWW sites include tags, such as "headline", "menu" or "link". Although different tags may be used by different sites, a plurality of tags can be combined in a single category (or group, below). Alternatively or additionally, regular expressions or other rules may be used. In another example, a set of contiguous links is identified as a link list. If the text associated with the links is a multiword phrase, it is assumed the links are headlines. In another example, a sequence of paragraphs of same size font, possibly with headlines in another font, is recognized as an article. A multitude of text parsing engines are known in the art, for which a skilled practitioner may define recognition and categorization rules. The use of rules and expressions may be in addition to or instead of the use of templates. In an exemplary embodiment of the invention, a frames-like approach (as once used in AI) is used to assist in recognizing elements in a page of a certain type.

At 338, the readable parts are optionally organized, in the order in which they will be presented in the menu. The order may be a property of the site type. Alternatively or additionally, the order may be determined based on the number of elements of each type. Alternatively or additionally, the order may be at least partly random. Alternatively or additionally, the order may be based on a perceived relative importance of different items. Perceived importance may be determined, for example, based on selection (for readout) statistics (e.g., order, frequency) of this or other users.
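One of the categorization rules mentioned above — a contiguous run of links whose anchor texts are multiword phrases is taken to be a headline list, otherwise a plain link list — can be sketched as follows. The rule is a simplified illustration; real rule sets would be far richer.

```python
# Sketch of one categorization rule from the description: a run of
# contiguous links is a "headline list" when every anchor text is a
# multiword phrase, otherwise a plain "link list".  Category names
# are illustrative.

def categorize_link_run(anchor_texts):
    """Categorize a contiguous run of links by their anchor texts."""
    if anchor_texts and all(len(t.split()) > 1 for t in anchor_texts):
        return "headline list"
    return "link list"

assert categorize_link_run(["Home", "Sports", "Weather"]) == "link list"
assert categorize_link_run(
    ["Election results announced", "Storm hits the coast"]) == "headline list"
```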
In an exemplary embodiment of the invention, the user operates the system using a telephone. Thus, each menu cannot have too many options. In an exemplary embodiment of the invention, several readable parts are grouped together (340), so that the number of options will not be too great, for example, not over 5, 6, 7 or 8 readable groups. Alternatively or additionally, very short menus are undesirable, as they increase the total number of menus. So items with short menus are grouped together too.
Possibly, the titles of the menus and/or menu elements are generated in real-time, to match the grouping of categories. In an exemplary embodiment of the invention, the categories are selected so that they can be naturally combined into single menu elements in various manners. Optionally, the desirability of associating two or more particular categories into a single menu is predefined. The final menu set can thus be, for example, a function of the number of elements in each category, their relative perceived importance and the particular categories available on the page. Optionally, the determined names and/or statistics of the site are stored in a database (342) for use the next time the page is read out.
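The grouping step (340) can be sketched, for illustration, as merging any category with too few elements into its most closely associated neighbour, using a predefined association table. The threshold, association values and category names below are invented for illustration.

```python
# Sketch of menu grouping (340): categories shorter than MIN_ITEMS are
# merged into their most associated remaining category, so no voice
# menu is wasted on one or two items.  All values are illustrative.

MIN_ITEMS = 2   # assumed minimum menu length

# predefined association between category pairs (higher = merge first)
ASSOCIATION = {
    ("links list", "subject headlines"): 3,
    ("main headline", "subject headlines"): 2,
}

def group_categories(categories):
    """categories: dict of category name -> list of items."""
    merged = dict(categories)
    for name in list(merged):
        if len(merged[name]) >= MIN_ITEMS or len(merged) == 1:
            continue
        # pick the most associated partner still present
        partners = sorted(
            (c for c in merged if c != name),
            key=lambda c: ASSOCIATION.get((name, c),
                          ASSOCIATION.get((c, name), 0)),
            reverse=True)
        target = partners[0]
        merged[target] = merged[target] + merged.pop(name)
    return merged

menus = group_categories({
    "main headline": ["Election results"],
    "subject headlines": ["Sports", "Weather", "Business"],
    "links list": ["About", "Contact"],
})
# the one-item "main headline" is merged into "subject headlines"
```

A fuller implementation would also enforce the maximum menu length by splitting long categories into sub-menus.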
Optionally, a manual setup step for the page is triggered (344), for example based on the number of requests for the page and/or based on complaints.

In an exemplary embodiment of the invention, page 300 is divided into the following groups: "menus", "advertisement", "main headline", "links list", "subject headlines #1", "subject headlines #2", and "secondary article". In an exemplary embodiment of the invention, the advertisement is read out, without prompting the user. Optionally, an audio advertisement (e.g., a wav file) is provided by the advertisement provider instead of the text advertisement.

Fig. 3C is a flowchart 350 of an exemplary process of reading out an arbitrary page 300.
At 352, a site or page is chosen. In an exemplary embodiment of the invention, a user sets up a limited number of favorite sites. Alternatively or additionally, the site is selected from a hierarchical list provided by the system. Alternatively or additionally, the user enters the site address or a keyword by voice input. Alternatively or additionally, the user uses the telephone keys to enter the site address and/or keywords. In a tone telephone, the address may be ambiguous; however, such ambiguity may be settled, for example, by comparing the entry against a catalog of favorite and/or common sites.
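The keypad ambiguity mentioned above can be sketched, for illustration, by mapping each letter of a site name to its standard phone key and comparing the user's digit string against a catalog. The catalog contents are invented for illustration.

```python
# Sketch of settling an ambiguous tone-telephone entry: map site names
# onto keypad digits and match the typed digit string against a catalog
# of favorite and/or common sites.  The catalog is illustrative.

KEYPAD = {}
for digit, letters in [("2", "abc"), ("3", "def"), ("4", "ghi"),
                       ("5", "jkl"), ("6", "mno"), ("7", "pqrs"),
                       ("8", "tuv"), ("9", "wxyz")]:
    for ch in letters:
        KEYPAD[ch] = digit

def to_digits(name):
    """Spell a site name as the digits of a standard phone keypad."""
    return "".join(KEYPAD.get(ch, "") for ch in name.lower())

def resolve(entry, catalog):
    """Return the catalog sites whose names match the typed digits."""
    return [site for site in catalog if to_digits(site) == entry]

catalog = ["cnn", "bbc", "ann"]
# "cnn" and "ann" both spell 266, so the entry is ambiguous:
matches = resolve("266", catalog)
# matches -> ['cnn', 'ann']
```

When more than one site matches, the system could read the candidates out as a short menu for the user to pick from.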
At 354, advertisements on the site are played. In an exemplary embodiment of the invention, the system requests an audio clip from the advertisement provider, to replace the text and/or image. Alternatively, text to speech methods are used.
In the method of Fig. 3B, the page is analyzed for readable and non-readable parts; this may take place, for example, between 352 and 354. In an exemplary embodiment of the invention, the order of readout and/or other readout properties can be a user-associated preference. Optionally, different preferences are associated with different pages, even for a same user.
In an exemplary embodiment of the invention, a short menu of options is read out to a user. Responsive to the list, the user may dig deeper into the hierarchical structure of the site (e.g., alternative pages, sub articles). Alternatively or additionally, the system may read out an article or part of an article (358), before returning to the options list. The listing may change to reflect the fact that some articles have been read, for example, by putting them last in the list and/or using a different bong sound before the read and unread articles.
Optionally, some of the articles may be retrieved as audio files. In an exemplary embodiment of the invention, the page includes tags indicating for which articles and/or other readable or non-readable page elements there is a previously prepared audio equivalent, at the WWW site server and/or at a different location.
Once reading is completed, the user can exit (360).
In an exemplary embodiment of the invention, tone keys are used to navigate the option lists of 356. Alternatively or additionally, a user can activate the keys (or use a voice command) during a read out, for example, to bookmark, to stop, to fast forward, to rewind, to receive help, to follow a link, to activate a preset utility, to go down a level in hierarchy or to go up a level in hierarchy. The keys for these and/or other actions may be preset and/or read out to the user, as one of the options. Optionally, a key is active for an item while it is being read and for a short time after, possibly even after a next item is being read.
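The last behavior above — a key staying active for an item shortly after the next item begins — can be sketched, for illustration, with a readout schedule and a grace period. The grace period value and the simplified policy (a press within the grace window always targets the previous item) are assumptions for illustration.

```python
# Sketch of key timing during readout: a key press applies to the item
# currently being read, or to the previous item if the press comes
# within a short grace period after the next item started.  GRACE and
# the selection policy are illustrative assumptions.

GRACE = 1.5  # seconds a finished item remains selectable (assumed)

def item_for_keypress(press_time, schedule):
    """schedule: list of (start_time, item) pairs in readout order.
    Returns the item a key press at press_time applies to."""
    current = None
    previous = None
    for start, item in schedule:
        if press_time >= start:
            previous = current
            current = (start, item)
        else:
            break
    if current is None:
        return None
    start, item = current
    # shortly after a new item begins, the previous item is still active
    if previous is not None and press_time - start < GRACE:
        return previous[1]
    return item

schedule = [(0.0, "headline"), (5.0, "article")]
# a press at 5.8 s, just after the article began, targets the headline:
# item_for_keypress(5.8, schedule) -> 'headline'
```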
Following are tables showing examples of system messages and system readout, for reading out a WWW site (CNN in this example), in accordance with an exemplary embodiment of the invention. Table Ia shows a process of site analysis (generally corresponding to Fig. 3B), in accordance with an exemplary embodiment of the invention. Table Ib shows the application of this method to a particular CNN main page.
[Tables Ia and Ib appear as images in the original publication.]

TABLE Ib
Table IIa shows the steps in an exemplary process of reading out a page in accordance with an exemplary embodiment of the invention.
[The first part of Table IIa appears as an image in the original publication; the remainder of its text follows.]
seconds". Banner reading:
3. If there is a banner in the site, the system will read the banner to the user (either by reading the text in the banner or by playing the clip or other audio file of the banner).
4. Before reading / playing the Banner, the system will announce to the user: "The site is processed and will be read in a few seconds"
Article choosing:
5. The system will offer the user the articles to hear: "Press 1 for main article, press 2 for sub articles, press 0 to return "
6. If the user clicks "1" - the system reads to him (after the "Text to speech" operation) the main article title and the whole article. After reading the article, the system will ask the user for a next action by repeating the previous message.
7. If the user clicks "2" - the system will read the first "sub title" and then will announce "press 1 to hear article, press pound for next sub title, press star for previous article, press 0 to return".
8. If the article is the first one read, then the "*" option may not be offered. If it is the last one read, then the "#" option may not be offered.
9. If the user requested to hear the sub article, then the system will read the whole article to him. At the end, the system will return and read the previous message: "press 1 to hear article, press pound for next sub title, press star for previous article, press 0 to return".
End of process:
10. If the user asks to return (he clicks on "0") then the system stops and returns to the main menu.
Output: Web site is read to the customer
TABLE IIa

Table IIb shows the reading out of a main page of the CNN site (September 3, 2000, at 22:00 Israel time). As the user can choose various options of parts of the site to hear, and in order to simplify the presentation, several possibilities will be described.
[Table IIb appears as an image in the original publication.]
In some embodiments of the invention, some articles may be available only to members, which may require a payment authorization act or a log-in act. Alternatively, such acts may be implicit. The system may warn the user of the cost of reading out an article. Possibly, the system detects one or more price-quotes on the WWW page and reads them out, for example as part of the menu. Various databases, for example, have a standard record structure that includes a title, a link and a price quote. Such a structure may be used to drive parsing that detects the quote.
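Detecting a price quote in such a title/link/price record can be sketched, for illustration, with a simple pattern match. The record format and the dollar-amount pattern below are illustrative assumptions, not part of the original disclosure.

```python
# Sketch of price-quote detection in a record with a standard
# title/link/price structure.  The pattern and sample record are
# illustrative only.
import re

PRICE = re.compile(r"\$\s?\d+(?:\.\d{2})?")  # e.g. "$29.99" or "$5"

def extract_quotes(record_text):
    """Return all price quotes found in a record, for read-out."""
    return PRICE.findall(record_text)

record = 'Wireless headset - <a href="/item/42">details</a> - $29.99'
quotes = extract_quotes(record)
# quotes -> ['$29.99']
```

The detected quote could then be read out as part of the voice menu, or used to warn the user of a cost before an article is read.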
Fig. 4 is a block diagram of a cell-phone configuration 400, in accordance with an exemplary embodiment of the invention. Information from a source site 402 is transmitted, for example over the Internet or via a dedicated line, to a cellular operator 401. The content is converted at operator 401, at source site 402 or intermediate between them, using a converter 404, which converts the format and/or level of detail from a format suitable for personal computers to a format suitable for cellular telephones. This conversion may be in real-time or it may be off-line.
A text to speech converter and/or annotator 406 preferably converts parts of the converted content to speech or adds a layer of audio annotations. In an exemplary embodiment of the invention, the annotations are designed to compensate for content removed or made less desirable by converter 404. The converted and annotated content is then transmitted to a cellular telephone 408, using methods known in the art. Alternatively, converter 404 and converter 406 are combined, for example, to convert an HTML page into a hybrid image and audio content. Alternatively, the cellular telephone may serve as a browsing terminal in a configuration as shown in Fig. 1, possibly with no special allowance being made for cellular conversion, if any. For example, the cellular conversion may be performed after the audio annotations are added.
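The two-stage path through converter 404 and converter/annotator 406 can be pictured as a small pipeline. This is a sketch only; the stage functions, and the "desktop-only" marker used to stand in for content unsuited to a phone, are hypothetical placeholders for the conversion and annotation described above:

```python
def to_cellular_format(content):
    """Stage 1 (converter 404): reduce the level of detail.
    Here it simply drops lines marked as desktop-only - an
    assumed stand-in for real format conversion."""
    return [line for line in content.splitlines()
            if "desktop-only" not in line]

def annotate_with_audio(lines, tts):
    """Stage 2 (converter/annotator 406): attach a layer of audio
    annotations, which can compensate for detail removed in stage 1."""
    return [(line, tts(line)) for line in lines]

def convert_page(content, tts):
    """Full pipeline: format conversion, then audio annotation."""
    return annotate_with_audio(to_cellular_format(content), tts)
```

As the text notes, the two stages may equally be combined into one converter, or run in the opposite order (annotation first, cellular conversion after).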
Fig. 5 is a more detailed schematic block diagram 500 of a system topology, in accordance with an exemplary embodiment of the invention. A source 502 comprises, for example, one or more of a public web service 510, a hosted web service 512 and a corporate Intranet or Extranet.
The data from source 502 is provided to a gateway server 504, optionally through a proxy 516. Gateway 504 may utilize, for example, multiple language/voice generation and/or translation engines 506. An optional language ID engine 522 may be used to determine the language of the site, for example using methods known in the art, such as word recognition, character sets, language tags, letter frequency, page title and a language previously associated with the page address. A data collection server 520 may optionally be provided for tracking usage of the system and/or for billing. A telephone system 508, including a base station 526, a telephone company operating system 528 and a network 530, may be used as a user input and output device. In an exemplary embodiment of the invention, an Interactive Voice Response system 524 is used by gateway server 504 for receiving user input commands.
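One of the cues listed for language ID engine 522, letter frequency, can be sketched as a nearest-profile classifier. The profiles below are tiny toy values over a shared handful of letters, purely for illustration; real profiles would be estimated from large corpora:

```python
from collections import Counter

# Toy reference profiles (assumed values, not real corpus statistics).
PROFILES = {
    "english": {"e": 0.13, "t": 0.09, "n": 0.07},
    "german":  {"e": 0.17, "t": 0.06, "n": 0.10},
}

def identify_language(text):
    """Return the language whose letter-frequency profile is closest
    (by summed absolute difference) to the observed frequencies."""
    letters = [c for c in text.lower() if c.isalpha()]
    counts = Counter(letters)
    total = len(letters) or 1  # avoid division by zero on empty input

    def distance(profile):
        return sum(abs(counts[ch] / total - freq)
                   for ch, freq in profile.items())

    return min(PROFILES, key=lambda lang: distance(PROFILES[lang]))
```

In the system described, such a classifier would be only one vote alongside word recognition, character sets, language tags and the page's history.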
In an exemplary embodiment of the invention, server 504 includes an application backbone and framework, to which are attached various software and/or hardware modules, for example, a telephony module, a network resource management module, a customization database module, a billing database module, e-mail and Intranet servers, ASR (automatic speech recognition) and TTS (text to speech) modules, an optimization engine (e.g., for aggregating page elements into menus), a web engine, a language server, an interactive ad server and/or content proxy servers.

It will be appreciated that the above described methods of web site annotation and readout may be varied in many ways, including changing the order of steps, changing which steps are performed on-line or off-line, such as table or index preparation, and varying the exact implementation used, which can include various hardware and software combinations. In addition, a multiplicity of features has been described. It should be appreciated that different features may be combined in different ways. In particular, not all the features are necessary in every exemplary embodiment of the invention. Software as described herein is preferably provided on a computer readable medium, such as a diskette or an optical disk. Alternatively or additionally, it may be stored on a computer, for example in a main memory or on a hard disk, both of which are also computer readable media. Where methods have been described, computer hardware programmed to perform the methods is also within the scope of the description. When used in the following claims, the terms "comprises", "includes", "have" and their conjugates mean "including but not limited to".
It will be appreciated by a person skilled in the art that the present invention is not limited by what has thus far been described. Rather, the scope of the present invention is limited only by the following claims.

Claims

1. A method of analyzing a WWW site for readout, comprising:
parsing the site to identify items for which to generate an audible indication;
categorizing the identified items by category;
grouping the categories; and
generating at least one voice menu based on said grouping,
wherein said grouping comprises grouping so that at least some of the at least one generated menu has a desirable property.
2. A method according to claim 1, wherein said desirable property comprises a minimum number of elements in a menu.
3. A method according to claim 1, wherein said desirable property comprises a maximum number of elements in a menu.
4. A method according to claim 1, wherein grouping comprises grouping based on predefined associations of categories.
5. A method according to claim 1, wherein grouping comprises ordering said categories for presentation.
6. A method according to claim 1, wherein said at least one menu comprises a main menu and sub menus.
7. A method according to claim 6, wherein said main menu is shorter than 10 items.
8. A method according to claim 6, wherein said main menu is shorter than 7 items.
9. A method according to claim 6, wherein said main menu is shorter than 5 items.
10. A method according to claim 1, wherein generating at least one voice menu comprises merging the items in at least two categories into a single category.
11. A method according to claim 1, wherein grouping comprises changing the categorization of an item to achieve the desired property.
12. A method of audio browsing of data that includes text data, comprising:
selecting from a remote database, by a user, data including text data to be provided in an audio manner;
automatically providing to said user, audio corresponding to said selected data;
determining at least an indication of a content of said selected data; and
automatically providing to said user, data in audio manner and relating to said determined indication.
13. A method according to claim 12, wherein selecting comprises selecting data by selecting a page.
14. A method according to claim 12, wherein selecting comprises selecting data by selecting a WWW site.
15. A method according to claim 12, wherein selecting comprises selecting data from a menu.
16. A method according to claim 12, wherein selecting comprises selecting using a telephone handset with no visual display assistance.
17. A method according to claim 12, wherein selecting comprises selecting using a telephone handset with a limited display incapable of satisfactorily displaying the data in a visual manner.
18. A method according to claim 12, wherein selecting comprises selecting using a cellular telephone.
19. A method according to claim 12, wherein said data comprises a text segment.
20. A method according to claim 12, wherein said data comprises an article.
21. A method according to claim 12, wherein said data comprises an audio clip.
22. A method according to claim 12, wherein said corresponding audio comprises a text to speech rendition of said text.
23. A method according to claim 12, wherein said corresponding audio comprises a translation of said text.
24. A method according to claim 12, wherein said corresponding audio comprises a recording of a human reading of said text.
25. A method according to claim 12, wherein determining at least an indication comprises matching a keyword of said data.
26. A method according to claim 12, wherein determining at least an indication comprises identifying a source of said data.
27. A method according to claim 12, wherein determining at least an indication comprises matching said data to a template.
28. A method according to claim 12, wherein said relating data comprises an advertisement.
29. A method according to claim 12, wherein said relating data comprises a help message.
30. A method according to claim 12, wherein said relating data comprises an unsolicited sales offer.
31. A method according to claim 12, wherein said relating data comprises a comparison with data from a different source.
32. A method according to claim 12, wherein said relating data comprises an unsolicited comment.
33. A method according to claim 12, wherein said relating data comprises audio of an interactive sales program.
34. A method according to claim 12, wherein said relating data is provided locally to said user.
35. A method according to claim 12, wherein said relating data is provided to compensate for lack of visual display quality.
36. A method according to claim 12, wherein said relating data is provided to compensate for data which is not presented and not selected by the user for audio presentation.
37. A method according to claim 12, wherein said relating data is provided in a language native to said user and other than a language of said data.
38. A method according to claim 12, wherein said relating data is personalized to match at least one attribute of said user.
39. A method according to claim 12, wherein said related data is sounded after said corresponding audio is sounded.
40. A method according to claim 12, wherein said related data is requested by said user.
PCT/IL2001/001009 2000-10-30 2001-10-30 Speech generating system and method WO2002037469A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2002214227A AU2002214227A1 (en) 2000-10-30 2001-10-30 Speech generating system and method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IL139347 2000-10-30
IL13934700A IL139347A0 (en) 2000-10-30 2000-10-30 Speech generating system and method

Publications (2)

Publication Number Publication Date
WO2002037469A2 true WO2002037469A2 (en) 2002-05-10
WO2002037469A3 WO2002037469A3 (en) 2002-08-29

Family

ID=11074769

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IL2001/001009 WO2002037469A2 (en) 2000-10-30 2001-10-30 Speech generating system and method

Country Status (3)

Country Link
AU (1) AU2002214227A1 (en)
IL (1) IL139347A0 (en)
WO (1) WO2002037469A2 (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5333237A (en) * 1989-10-10 1994-07-26 Hughes Aircraft Company Hypermedia structured knowledge base system
US5463713A (en) * 1991-05-07 1995-10-31 Kabushiki Kaisha Meidensha Synthesis of speech from text
US5884262A (en) * 1996-03-28 1999-03-16 Bell Atlantic Network Services, Inc. Computer network audio access and conversion system


US9430463B2 (en) 2014-05-30 2016-08-30 Apple Inc. Exemplar-based natural language processing
US11257504B2 (en) 2014-05-30 2022-02-22 Apple Inc. Intelligent assistant for home automation
US9338493B2 (en) 2014-06-30 2016-05-10 Apple Inc. Intelligent automated assistant for TV user interactions
US10904611B2 (en) 2014-06-30 2021-01-26 Apple Inc. Intelligent automated assistant for TV user interactions
US9668024B2 (en) 2014-06-30 2017-05-30 Apple Inc. Intelligent automated assistant for TV user interactions
US10659851B2 (en) 2014-06-30 2020-05-19 Apple Inc. Real-time digital assistant knowledge updates
US10446141B2 (en) 2014-08-28 2019-10-15 Apple Inc. Automatic speech recognition based on user feedback
US9818400B2 (en) 2014-09-11 2017-11-14 Apple Inc. Method and apparatus for discovering trending terms in speech requests
US10431204B2 (en) 2014-09-11 2019-10-01 Apple Inc. Method and apparatus for discovering trending terms in speech requests
US10074360B2 (en) 2014-09-30 2018-09-11 Apple Inc. Providing an indication of the suitability of speech recognition
US9986419B2 (en) 2014-09-30 2018-05-29 Apple Inc. Social reminders
US9668121B2 (en) 2014-09-30 2017-05-30 Apple Inc. Social reminders
US9886432B2 (en) 2014-09-30 2018-02-06 Apple Inc. Parsimonious handling of word inflection via categorical stem + suffix N-gram language models
US9646609B2 (en) 2014-09-30 2017-05-09 Apple Inc. Caching apparatus for serving phonetic pronunciations
US10127911B2 (en) 2014-09-30 2018-11-13 Apple Inc. Speaker identification and unsupervised speaker adaptation techniques
US11556230B2 (en) 2014-12-02 2023-01-17 Apple Inc. Data detection
US10552013B2 (en) 2014-12-02 2020-02-04 Apple Inc. Data detection
US9711141B2 (en) 2014-12-09 2017-07-18 Apple Inc. Disambiguating heteronyms in speech synthesis
US9865280B2 (en) 2015-03-06 2018-01-09 Apple Inc. Structured dictation using intelligent automated assistants
US10311871B2 (en) 2015-03-08 2019-06-04 Apple Inc. Competing devices responding to voice triggers
US9886953B2 (en) 2015-03-08 2018-02-06 Apple Inc. Virtual assistant activation
US9721566B2 (en) 2015-03-08 2017-08-01 Apple Inc. Competing devices responding to voice triggers
US10567477B2 (en) 2015-03-08 2020-02-18 Apple Inc. Virtual assistant continuity
US11087759B2 (en) 2015-03-08 2021-08-10 Apple Inc. Virtual assistant activation
US9899019B2 (en) 2015-03-18 2018-02-20 Apple Inc. Systems and methods for structured stem and suffix language models
US9842105B2 (en) 2015-04-16 2017-12-12 Apple Inc. Parsimonious continuous-space phrase representations for natural language processing
US10083688B2 (en) 2015-05-27 2018-09-25 Apple Inc. Device voice control for selecting a displayed affordance
US10127220B2 (en) 2015-06-04 2018-11-13 Apple Inc. Language identification from short strings
US10356243B2 (en) 2015-06-05 2019-07-16 Apple Inc. Virtual assistant aided communication with 3rd party service in a communication session
US10101822B2 (en) 2015-06-05 2018-10-16 Apple Inc. Language input correction
US10186254B2 (en) 2015-06-07 2019-01-22 Apple Inc. Context-based endpoint detection
US10255907B2 (en) 2015-06-07 2019-04-09 Apple Inc. Automatic accent detection using acoustic models
US11025565B2 (en) 2015-06-07 2021-06-01 Apple Inc. Personalized prediction of responses for instant messaging
US10747498B2 (en) 2015-09-08 2020-08-18 Apple Inc. Zero latency digital assistant
US11500672B2 (en) 2015-09-08 2022-11-15 Apple Inc. Distributed personal assistant
US10671428B2 (en) 2015-09-08 2020-06-02 Apple Inc. Distributed personal assistant
US9697820B2 (en) 2015-09-24 2017-07-04 Apple Inc. Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks
US10366158B2 (en) 2015-09-29 2019-07-30 Apple Inc. Efficient word encoding for recurrent neural network language models
US11010550B2 (en) 2015-09-29 2021-05-18 Apple Inc. Unified language modeling framework for word prediction, auto-completion and auto-correction
US11587559B2 (en) 2015-09-30 2023-02-21 Apple Inc. Intelligent device identification
US10691473B2 (en) 2015-11-06 2020-06-23 Apple Inc. Intelligent automated assistant in a messaging environment
US11526368B2 (en) 2015-11-06 2022-12-13 Apple Inc. Intelligent automated assistant in a messaging environment
US10049668B2 (en) 2015-12-02 2018-08-14 Apple Inc. Applying neural network language models to weighted finite state transducers for automatic speech recognition
US10223066B2 (en) 2015-12-23 2019-03-05 Apple Inc. Proactive assistance based on dialog communication between devices
US10446143B2 (en) 2016-03-14 2019-10-15 Apple Inc. Identification of voice inputs providing credentials
US9934775B2 (en) 2016-05-26 2018-04-03 Apple Inc. Unit-selection text-to-speech synthesis based on predicted concatenation parameters
US9972304B2 (en) 2016-06-03 2018-05-15 Apple Inc. Privacy preserving distributed evaluation framework for embedded personalized systems
US10249300B2 (en) 2016-06-06 2019-04-02 Apple Inc. Intelligent list reading
US11069347B2 (en) 2016-06-08 2021-07-20 Apple Inc. Intelligent automated assistant for media exploration
US10049663B2 (en) 2016-06-08 2018-08-14 Apple, Inc. Intelligent automated assistant for media exploration
US10354011B2 (en) 2016-06-09 2019-07-16 Apple Inc. Intelligent automated assistant in a home environment
US10067938B2 (en) 2016-06-10 2018-09-04 Apple Inc. Multilingual word prediction
US10490187B2 (en) 2016-06-10 2019-11-26 Apple Inc. Digital assistant providing automated status report
US10192552B2 (en) 2016-06-10 2019-01-29 Apple Inc. Digital assistant providing whispered speech
US10509862B2 (en) 2016-06-10 2019-12-17 Apple Inc. Dynamic phrase expansion of language input
US10733993B2 (en) 2016-06-10 2020-08-04 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US11037565B2 (en) 2016-06-10 2021-06-15 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US10297253B2 (en) 2016-06-11 2019-05-21 Apple Inc. Application integration with a digital assistant
US11152002B2 (en) 2016-06-11 2021-10-19 Apple Inc. Application integration with a digital assistant
US10269345B2 (en) 2016-06-11 2019-04-23 Apple Inc. Intelligent task discovery
US10089072B2 (en) 2016-06-11 2018-10-02 Apple Inc. Intelligent device arbitration and control
US10521466B2 (en) 2016-06-11 2019-12-31 Apple Inc. Data driven natural language event detection and classification
US10553215B2 (en) 2016-09-23 2020-02-04 Apple Inc. Intelligent automated assistant
US10043516B2 (en) 2016-09-23 2018-08-07 Apple Inc. Intelligent automated assistant
US10593346B2 (en) 2016-12-22 2020-03-17 Apple Inc. Rank-reduced token representation for automatic speech recognition
US10755703B2 (en) 2017-05-11 2020-08-25 Apple Inc. Offline personal assistant
US11405466B2 (en) 2017-05-12 2022-08-02 Apple Inc. Synchronization and task delegation of a digital assistant
US10410637B2 (en) 2017-05-12 2019-09-10 Apple Inc. User-specific acoustic models
US10791176B2 (en) 2017-05-12 2020-09-29 Apple Inc. Synchronization and task delegation of a digital assistant
US10810274B2 (en) 2017-05-15 2020-10-20 Apple Inc. Optimizing dialogue policy decisions for digital assistants using implicit feedback
US10482874B2 (en) 2017-05-15 2019-11-19 Apple Inc. Hierarchical belief states for digital assistants
US11217255B2 (en) 2017-05-16 2022-01-04 Apple Inc. Far-field extension for digital assistant services

Also Published As

Publication number Publication date
IL139347A0 (en) 2001-11-25
AU2002214227A1 (en) 2002-05-15
WO2002037469A3 (en) 2002-08-29

Similar Documents

Publication Publication Date Title
WO2002037469A2 (en) Speech generating system and method
US8849895B2 (en) Associating user selected content management directives with user selected ratings
US8510277B2 (en) Informing a user of a content management directive associated with a rating
US8849659B2 (en) Spoken mobile engine for analyzing a multimedia data stream
US9092542B2 (en) Podcasting content associated with a user account
US6885736B2 (en) System and method for providing and using universally accessible voice and speech data files
US8001490B2 (en) System, method and computer program product for a content publisher for wireless devices
EP0848373B1 (en) A system for interactive communication
CN100568241C (en) Method and system for centralized content management
US20070214148A1 (en) Invoking content management directives
US6771743B1 (en) Voice processing system, method and computer program product having common source for internet world wide web pages and voice applications
US20020097261A1 (en) Apparatus and method for simple wide-area network navigation
US20060155769A1 (en) Serving signals
JP2008027454A (en) System and method for using voice over telephone to access, process, and carry out transaction over internet
US20070208564A1 (en) Telephone based search system
WO2001014999A2 (en) System and method for structured news release generation and distribution
KR20040035589A (en) System for providing information converted in response to search request
WO2002063460A2 (en) Method and system for automatically creating voice xml file
KR20010085572A (en) Electronic bulletin board system and mail server
JPH11232192A (en) Data processing system and method for archiving and accessing electronic message
JP3789614B2 (en) Browser system, voice proxy server, link item reading method, and storage medium storing link item reading program
US20040150676A1 (en) Apparatus and method for simple wide-area network navigation
US10672037B1 (en) Automatic generation of electronic advertising messages containing one or more automatically selected stock photography images
US7272659B2 (en) Information rewriting method, recording medium storing information rewriting program and information terminal device
KR20050045650A (en) Information suppling system and method with info-box

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP