US20080240379A1 - Automatic retrieval and presentation of information relevant to the context of a user's conversation - Google Patents

Automatic retrieval and presentation of information relevant to the context of a user's conversation

Info

Publication number
US20080240379A1
Authority
US
United States
Prior art keywords
party
conversation
information
parameter
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US11/882,479
Inventor
Ariel Maislos
Ruben Maislos
Eran Arbel
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Pudding Ltd
Original Assignee
Pudding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pudding Ltd filed Critical Pudding Ltd
Priority to US11/882,479 priority Critical patent/US20080240379A1/en
Assigned to PUDDING LTD reassignment PUDDING LTD ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: ARBEL, ERAN, MAISLOS, ARIEL, MAISLOS, RUBEN
Publication of US20080240379A1 publication Critical patent/US20080240379A1/en
Abandoned legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/487 Arrangements for providing information services, e.g. recorded voice services or time announcements
    • H04M3/493 Interactive information services, e.g. directory enquiries; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
    • H04M3/4938 Interactive information services, e.g. directory enquiries; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals comprising a voice browser which renders and interprets, e.g. VoiceXML
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/74 Browsing; Visualisation therefor
    • G06F16/748 Hypervideo
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/1066 Session management
    • H04L65/1083 In-session procedures
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M7/00 Arrangements for interconnection between switching centres
    • H04M7/0024 Services and arrangements where telephone services are combined with data services
    • H04M7/0036 Services and arrangements where telephone services are combined with data services where the data service is an information service
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M2201/00 Electronic components, circuits, software, systems or apparatus used in telephone systems
    • H04M2201/40 Electronic components, circuits, software, systems or apparatus used in telephone systems using speech recognition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M2203/00 Aspects of automatic or semi-automatic exchanges
    • H04M2203/65 Aspects of automatic or semi-automatic exchanges related to applications where calls are combined with other types of communication
    • H04M2203/655 Combination of telephone service and social networking

Definitions

  • the present invention relates to techniques for information retrieval and presentation.
  • the present inventors are now disclosing a technique wherein a multi-party voice conversation is monitored (i.e. by monitoring electronic media content of the multi-party voice conversation), and in accordance with at least one feature of the electronic media content, information is retrieved and presented to at least one conversation party of the multi-party voice conversation.
  • Exemplary information sources from which information is retrieved include but are not limited to search engines, news services, images or video banks, RSS feeds, and blogs.
  • the information source may be local (for example, the local file system of a desktop computer or PDA) and/or may be remote (for example, a remote “Internet” search engine accessible via the Internet).
  • an “implicit information retrieval request” may be formulated, thereby relieving the user of any requirement to explicitly formulate an information retrieval request and direct that information retrieval request to an information retrieval service.
  • the present inventors are now disclosing that the nature of the information retrieval and/or presentation of the retrieved information may be adapted to certain detectable features of the conversation and/or features of the conversation participants.
  • a demographic profile of a given user may be generated (i.e. either from detectable features of the conversation and/or other information sources).
  • two individuals are speaking to each other in English (for example, using a “Skype” connection, or on cell phones), but one of the individuals has a Spanish accent.
  • the individual with the Spanish accent may be presented with retrieved Spanish-language information (for example, from a Spanish-language newswire retrieved using “keywords” translated from the English language conversation).
  • two users are speaking about applying to law-school.
  • One speaker is younger (say less than 25 years old) and another speaker is over 40.
  • the “age demographic” of the speakers is detected from electronic media content of the multi-party conversation, and the older user may be served an article about law-school essay strategies for older law-school applicants, while the younger user may be served a profile from a dating website for college-aged students interested in dating a pre-law major.
  • the Boston-based user may be provided information about New England law schools while the Florida-based user may be provided information about Florida law schools. This is an example of retrieving information according to a location of a participant in a multi-party conversation.
  • a man and woman may be speaking about movies, and the “gender demographic” is detected.
  • the man may be served information (for example, movie starting times) about movies popular with men (for example, horror movies, action movies, etc) while the woman may be served information about movies popular with women (for example, romance movies). If the man is located on the “north side of town” and the woman on the “south side of town,” the man may be provided information about movie start times on the “north side” while the woman is provided information about movie start times on the “south side.”
  • information may be retrieved and/or presented in accordance with an emotion of one or more conversation-participants. For example, if it is detected that a person is angry, a link to anger-management material may be presented. In a similar example, if it is detected that a person is angry, a link to a clip of relaxing music may be presented.
  • songs of a rock-and-roll band may be pre-categorized as “happy songs” or “sad songs.” If one or both of the conversation-participants are detected as “happy” (for example, according to key words, body language, and/or voice tones), then links to clips of “happy songs” are presented.
  • information may be retrieved and/or presented in accordance with a “conversation participants relation.”
  • in one example of a “conversation participants relation,” if it is determined or assessed that two conversation participants are spouses or lovers, then when they speak about a particular rock-and-roll band, links to clips of “love songs” from this band are presented to the users.
  • the “most popular” songs from the band may be presented to the users instead, and the “romantic” songs may be filtered out.
  • information may be retrieved and/or presented in accordance with a physiological status feature of the user.
  • for example, if it is detected that a user may have the flu, a link to a Wikipedia article or a Medscape article about the flu may be presented to the user.
  • information may be retrieved and/or presented in accordance with one or more personality traits or a personality-profile feature of the user.
  • an “extroverted” or “people-oriented” person would, when discussing a certain city with a friend, receive information about “people-oriented” activities that are done in groups. Conversely, an “introverted” person may receive information about activities done in solitude.
  • the method includes the steps of: a) monitoring a multi-party voice conversation not directed at the entity doing the monitoring; and b) in accordance with content of the monitored voice conversation, retrieving and presenting information to at least one party of the multi-party voice conversation.
  • Non-limiting examples of information items that may be retrieved include but are not limited to i) a social-network profile; ii) a weather forecast; iii) a traffic forecast; iv) a Wikipedia entry; v) a news article; vi) an online forum entry; vii) a blog entry; viii) a social bookmarking web service entry; ix) a music clip; and x) a film clip.
  • the retrieving includes assigning a keyword weight in accordance with a demographic parameter of a given party of the multi-party voice conversation estimated from electronic media of the multi-party conversation, the estimated demographic parameter being selected from the group consisting of: i) an age parameter; ii) a gender parameter; and iii) an ethnicity parameter.
  • the retrieving includes selecting or emphasizing an information-source from a plurality of candidate information-sources in accordance with a demographic parameter of a given party of the multi-party voice conversation estimated from electronic media of the multi-party conversation, the estimated demographic parameter being selected from the group consisting of: i) an age parameter; ii) a gender parameter; and iii) an ethnicity parameter.
  • the retrieving includes effecting a disambiguation in accordance with a demographic parameter of a given party of the multi-party voice conversation estimated from electronic media of the multi-party conversation, the estimated demographic parameter being selected from the group consisting of: i) an age parameter; ii) a gender parameter; and iii) an ethnicity parameter.
  • the assigning includes assigning a keyword weight in accordance with a speech delivery feature of a given party of the multi-party voice conversation estimated from electronic media of the multi-party conversation, the speech delivery feature being selected from the group consisting of: i) a loudness parameter; ii) a speech tempo parameter; and iii) an emotional outburst parameter.
  • the retrieving includes selecting or emphasizing an information-source from a plurality of candidate information-sources in accordance with a geographic location of a given party of the multi-party voice conversation estimated from electronic media of the multi-party conversation.
  • the retrieving includes selecting or emphasizing an information-source from a plurality of candidate information-sources in accordance with an accent feature of at least one given party of the multi-party voice conversation.
  • the retrieving includes assigning a keyword weight in accordance with a demographic parameter of a given party of the multi-party voice conversation estimated from electronic media of the multi-party conversation, the estimated demographic parameter being selected from the group consisting of: i) an age parameter; ii) a gender parameter; and iii) an ethnicity parameter.
  • the information-presenting for a first set of words extracted from the multi-party conversation includes displacing earlier-presented retrieved information associated with a second, earlier set of words extracted from the multi-party conversation in accordance with relative speech delivery parameters of the first and second sets of extracted words, the speech delivery feature being selected from the group consisting of: i) a loudness parameter; ii) a speech tempo parameter; and iii) an emotional outburst parameter.
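  • The following is a minimal Python sketch of how keyword weights of the kind described in the items above might be assigned; the weighting values, thresholds, and field names are illustrative assumptions and are not taken from the patent.

```python
# Hypothetical sketch: weight extracted keywords using an estimated demographic
# parameter and per-keyword speech delivery features (loudness, emotional outburst).

def weight_keywords(keywords, demographic, delivery):
    """Return {keyword: weight}.

    keywords    -- list of (keyword, base_weight) pairs extracted from the conversation
    demographic -- e.g. {"age": 45, "gender": "f"} estimated from the media content
    delivery    -- per-keyword delivery features, e.g.
                   {"loudness_db": {"essay": 78}, "outburst": {"law school": True}}
    """
    weighted = {}
    for keyword, base in keywords:
        weight = base
        if delivery.get("loudness_db", {}).get(keyword, 0) > 70:
            weight *= 1.5          # spoken loudly -> treat as more important
        if delivery.get("outburst", {}).get(keyword, False):
            weight *= 2.0          # accompanied by laughing/crying -> more important
        if demographic.get("age", 0) >= 40 and keyword in {"law school", "essay"}:
            weight *= 1.2          # demographic-dependent boost (illustrative only)
        weighted[keyword] = weight
    return weighted


if __name__ == "__main__":
    kws = [("law school", 1.0), ("essay", 0.8), ("yankees", 0.5)]
    print(weight_keywords(kws, {"age": 45}, {"loudness_db": {"essay": 78}}))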
  • the multi-party voice conversation is carried out between a plurality of client terminal devices communicating via a wide-area network, and for a given client device of the client device plurality: i) the information retrieval is carried out for incoming content relative to the given client device; and ii) the information presenting is on a display screen of the given client device.
  • a method of providing information-retrieval services comprising: a) monitoring a terminal device for incoming media content and outgoing media content of a multi-party conversation; and b) in accordance with the incoming media content, retrieving information over a remote network and presenting the retrieved information on the monitored terminal device.
  • the retrieving includes sending content of the multi-party conversation to an Internet search engine
  • the presenting includes presenting search results from the Internet search engine.
  • the retrieving includes retrieving at least one of: i) a social-network profile; ii) a weather forecast; iii) a traffic forecast; iv) a Wikipedia entry; v) a news article; vi) an online forum entry; vii) a blog entry; viii) a social bookmarking web service entry; ix) a music clip; and x) a film clip.
  • a method of providing information-retrieval services comprising: a) monitoring a given terminal client device for an incoming or outgoing remote call; b) upon detecting the incoming or outgoing remote call, sending content of the detected incoming call or outgoing call over a wide-area network to a search engine; and c) presenting search results from the search engine on the monitored terminal device.
  • the at least one feature of the electronic media content includes at least one speech delivery feature, i.e. a feature describing how a given set of words is delivered by a given speaker.
  • speech delivery features include but are not limited to: accent features (i.e. which may be indicative, for example, of whether or not a person is a native speaker and/or of an ethnic origin), speech tempo features (i.e. which may be indicative of a mood or emotional state), voice pitch features (i.e. which may be indicative, for example, of an age of a speaker), voice loudness features, voice inflection features (i.e. which may be indicative of a mood including but not limited to angry, confused, excited, joking, sad, sarcastic, serious, etc) and an emotional outburst feature (defined here as a presence of laughing and/or crying).
  • a speaker speaks some sentences or words loudly, or in an excited state, while other sentences or words are spoken more quietly.
  • different words are given a different “weight” according to an assigned importance, and words or phrases spoken “loudly” or in an “excited state” are given a higher weight than words or phrases spoken quietly.
  • the multi-party conversation is a video conversation
  • the at least one feature of the electronic media content includes a video content feature
  • Exemplary video content features include but are not limited to:
  • visible physical characteristic of a person in an image including but not limited to indications of a size of a person and/or a person's weight and/or a person's height and/or eye color and/or hair color and/or complexion;
  • a detected physical movement feature for example, a body-movement feature including but not limited to a feature indicative of hand gestures or other gestures associated with speaking.
  • the at least one feature of the electronic media content includes at least one key-word feature indicative of a presence and/or absence of key words or key phrases in the spoken content, and the information search and/or retrieval is carried out in accordance with the at least one key-word feature.
  • the key-word feature is determined by using a speech-to-text converter for extracting text.
  • the extracted text is then analyzed for the presence of key words or phrases.
  • the electronic media content may be compared with sound clips that include the key words or phrases.
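  • As a concrete illustration of the text route described above (speech-to-text followed by text analysis), the following minimal Python sketch scans an already-extracted transcript for a small, hypothetical list of key phrases; a real system would obtain the transcript from a speech-to-text converter, which is outside this sketch.

```python
# Hypothetical sketch: detect key phrases in a transcript produced elsewhere.
import re

KEY_PHRASES = ["millard fillmore", "law school", "yankees", "flu"]  # illustrative list

def find_key_phrases(transcript, phrases=KEY_PHRASES):
    """Return {phrase: match_count} for key phrases present in the transcript."""
    text = transcript.lower()
    hits = {}
    for phrase in phrases:
        # Word-boundary match so "flu" does not match "fluent".
        count = len(re.findall(r"\b" + re.escape(phrase) + r"\b", text))
        if count:
            hits[phrase] = count
    return hits

if __name__ == "__main__":
    print(find_key_phrases("I didn't know you were such a Millard Fillmore expert!"))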
  • the at least one feature of the electronic media content includes at least one topic category feature—for example, a feature indicative of whether a topic of a conversation or portion thereof matches one or more topic categories selected from a plurality of topic categories, for example, including but not limited to sports (i.e. a conversation related to sports), romance (i.e. a romantic conversation), business (i.e. a business conversation), current events, etc.
  • the at least one feature of the electronic media content includes at least one topic change feature.
  • Exemplary topic change features include but are not limited to a topic change frequency, an impending topic change likelihood, an estimated time until a next topic change, and a time since a previous topic change.
  • retrieved information is displayed to a user, and when the conversation topic changes, previously-displayed information associated with a ‘previous topic’ is either removed from the user display and replaced with newer information, or is “scrolled down” or displayed less prominently.
  • the rate at which new information (i.e. in accordance with newer topic of the conversation) replaces older information can be adjusted in accordance with a number of factors, for example, the personality of one or more users (for example, with impulsive users, displayed retrieved information is replaced faster), an emotion associated with one or more words, and other factors.
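  • One possible way to realize the display behaviour described above—newer results displace older ones, with a replacement rate tuned per user—is sketched below in Python; the timing values and the “impulsive” rule are hypothetical, not the patent's.

```python
# Hypothetical sketch: a small result panel where newer retrieved items displace
# older ones, and the minimum on-screen time depends on the user's personality.
from collections import deque
import time

class ResultPanel:
    def __init__(self, impulsive=False, max_items=5):
        # Impulsive users see older results replaced sooner (illustrative values).
        self.min_age_before_displace = 10.0 if impulsive else 45.0  # seconds
        self.items = deque(maxlen=max_items)

    def push(self, topic, retrieved_item, now=None):
        now = time.time() if now is None else now
        if self.items:
            last_topic, _, shown_at = self.items[-1]
            if last_topic != topic and (now - shown_at) < self.min_age_before_displace:
                return  # keep the earlier topic's results on screen a bit longer
        self.items.append((topic, retrieved_item, now))

    def visible(self):
        # Most recent first; older entries are effectively "scrolled down".
        return [item for _, item, _ in reversed(self.items)]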
  • the at least one feature of the electronic media content includes at least one ‘demographic property’ feature indicative of and/or derived from at least one demographic property or estimated demographic property (for example, age, gender, etc) of a person involved in the multi-party conversation (for example, a speaker). For example, two users who are over the age of 30 who speak about “Madonna” may be served a link to music clips of Madonna's songs from the 1980s, while teenagers may be served a link to a music clip of one of Madonna's more recently released songs.
  • two users with a demographic profile of “devout catholic” may be served an image of the blessed virgin Mary.
  • Exemplary demographic property features include but are not limited to gender features (for example, related to voice pitch or hair length or any other gender features), educational level features (for example, related to spoken vocabulary words used), a household income feature (for example, related to educational level features and/or key words related to expenditures and/or images of room furnishings), a weight feature (for example, related to overweight/underweight—e.g. related to size in an image or breathing rate, where obese individuals are more likely to breathe at a faster rate), age features (for example, related to an image of a balding head or gray hair and/or vocabulary choice and/or voice pitch), and ethnicity (for example, related to skin color and/or accent and/or vocabulary choice).
  • Another feature that, in some embodiments, may indicate a person's demography is the use (or lack of usage) of certain expressions, including but not limited to profanity. For example, people from certain regions or age groups may be more likely to use profanity (or a certain type), while those from other regions or age groups may be less likely to use profanity (or a certain type).
  • Demographic property features may be derived from audio and/or video features and/or word content features.
  • Exemplary features from which demographic property features may be derived include but are not limited to: idiom features (for example, certain ethnic groups or people from certain regions of the United States may tend to use certain idioms), accent features, grammar compliance features (for example, more highly educated people are less likely to make grammatical errors), and sentence length features (for example, more highly educated people are more likely to use longer or more ‘complicated’ sentences).
  • people associated with the more highly educated demographic group are more likely to be served content or links to content from the “New York Times” (i.e. a publication with more “complicated” writing and vocabulary) while a “less educated user” is served content or links to content from the “New York Post” (i.e. a publication with simpler writing and vocabulary).
  • the at least one feature of the electronic media content includes at least one ‘physiological feature’ indicative of and/or derived from at least one physiological property or estimated physiological property of a person involved in the multi-party conversation (for example, a speaker)—i.e. as derived from the electronic media content of the multi-party conversation.
  • Exemplary physiological parameters include but are not limited to breathing parameters (for example, breathing rate or changes in breathing rate), a sweat parameter (for example, indicative of whether a subject is sweating or how much—this may be determined, for example, by analyzing a ‘shininess’ of a subject's skin), a coughing parameter (i.e. a presence or absence of coughing, a loudness or rate of coughing, a regularity or irregularity of patterns of coughing), a voice-hoarseness parameter, and a body-twitching parameter (for example, twitching of the entire body due to, for example, chills, or twitching of a given body part—for example, twitching of an eyebrow).
  • a person may twitch a body part when nervous or lying. If it is assessed that a user or speaker is “lying” this could also influence search results.
  • the at least one feature of the electronic media content includes at least one ‘background item feature’ indicative of and/or derived from background sounds and/or a background image. It is noted that the background sounds may be transmitted along with the voice of the conversation, and thus may be included within the electronic media content of the conversation.
  • for example, if a dog barking is detected in the background, a news article about recently-passed local ordinances regulating dog-ownership may be displayed.
  • the background sound may be determined or identified, for example, by comparing the electronic media content of the conversation with one or more sound clips that include the sound it is desired to detect. These sound clips may thus serve as a ‘template.’
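  • The following is a minimal Python sketch of the ‘template’ comparison described above, using a normalized cross-correlation over raw samples as a stand-in for whatever acoustic matching a production system would use; the clip names and threshold are hypothetical.

```python
# Hypothetical sketch: identify a background sound by sliding each reference clip
# ("template") across the conversation audio and keeping the best-normalized match.
import numpy as np

def best_template_match(audio, templates, threshold=0.6):
    """Return (name, score) of the best-matching background-sound template, or None."""
    best = None
    for name, clip in templates.items():
        if len(clip) > len(audio):
            continue
        corr = np.correlate(audio, clip, mode="valid")
        sliding_energy = np.convolve(audio ** 2, np.ones(len(clip)), mode="valid")
        norm = np.linalg.norm(clip) * np.sqrt(sliding_energy)
        score = float(np.max(corr / np.maximum(norm, 1e-9)))
        if best is None or score > best[1]:
            best = (name, score)
    return best if best and best[1] >= threshold else None

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bark = rng.standard_normal(400)                      # pretend "dog bark" clip
    audio = np.concatenate([0.1 * rng.standard_normal(1000), bark,
                            0.1 * rng.standard_normal(500)])
    templates = {"dog_bark": bark, "baby_cry": rng.standard_normal(400)}
    print(best_template_match(audio, templates))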
  • in another example, an item (i.e. a good or service) or other object detected in the background may be used: for example, a news article about the Pope may be provided, or a link to a religious blog may be provided.
  • the at least one feature of the electronic media content includes at least one temporal and/or spatial localization feature indicative of and/or derived from a specific location or time.
  • for example, a Philadelphia-located user (for example, having a phone number in the 215 area code) may be served different sports stories (for example, from a newswire) than a Baltimore-located user (for example, having a phone number in the 301 area code).
  • This localization feature may be determined from the electronic media of the multi-party conversation.
  • this localization feature may be determined from data from an external source, for example, GPS and/or mobile-phone triangulation.
  • another example of an ‘external source’ of localization information is a dialed telephone number.
  • certain area codes or exchanges may be associated (but not always) with certain physical locations.
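  • A minimal sketch of the dialed-number idea above: a tiny, hypothetical area-code table maps the first digits of a North American number to a coarse location (real numbering-plan data would be far larger and is not always reliable).

```python
# Hypothetical sketch: coarse localization from a dialed telephone number.
AREA_CODE_LOCATIONS = {
    "215": "Philadelphia, PA",
    "301": "Maryland",
    "216": "Cleveland, OH",
    "617": "Boston, MA",
}

def locate_by_phone_number(number):
    """Guess a coarse location from a North American number, else None."""
    digits = "".join(ch for ch in number if ch.isdigit())
    if digits.startswith("1"):          # strip country code
        digits = digits[1:]
    return AREA_CODE_LOCATIONS.get(digits[:3])

if __name__ == "__main__":
    print(locate_by_phone_number("+1 (215) 555-0100"))   # Philadelphia, PA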
  • the at least one feature of the electronic media content includes at least one ‘historical feature’ indicative of electronic media content of a previous multi-party conversation and/or an earlier time period in the conversation—for example, electronic media content whose age is at least, for example, 5 minutes, or 30 minutes, or one hour, or 12 hours, or one day, or several days, or a week, or several weeks.
  • the at least one feature of the electronic media content includes at least one ‘deviation feature.’
  • Exemplary deviation features of the electronic media content of the multi-party conversation include but are not limited to:
  • a) historical deviation features, i.e. a feature of a given subject or person that changes temporally so that at a given time, the behavior of the feature differs from its previously-observed behavior.
  • a certain subject or individual usually speaks slowly, and at a later time, this behavior ‘deviates’ when the subject or individual speaks quickly.
  • a typically soft-spoken individual speaks with a louder voice.
  • an individual who 3 months ago was observed (e.g. via electronic media content) to be of average or above-average weight is obese.
  • This individual may be served a Wikipedia link about weight-loss.
  • a user who is consistently obese may not be served the link in order not to “annoy” the user.
  • a person who is normally polite may become angry and rude—this may be an example of ‘user behavior features.’
  • inter-subject deviation features for example, a ‘well-educated’ person associated with a group of lesser educated persons (for example, speaking together in the same multi-party conversation), or a ‘loud-spoken’ person associated with a group of ‘soft-spoken’ persons, or ‘Southern-accented’ person associated with a group of persons with Boston accents, etc. If distinct conversations are recorded, then historical deviation features associated with a single conversation are referred to as intra-conversation deviation features, while historical deviation features associated with distinct conversations are referred to as inter-conversation deviation features.
  • voice-property deviation features, for example, an accent deviation feature, a voice pitch deviation feature, a voice loudness deviation feature, and/or a speech rate deviation feature. This may relate to user-group deviation features as well as historical deviation features.
  • physiological deviation features, for example, breathing rate deviation features and weight deviation features—these may relate to user-group deviation features as well as historical deviation features.
  • vocabulary or word-choice deviation features, for example, profanity deviation features indicating use of profanity—these may relate to user-group deviation features as well as historical deviation features.
  • person-versus-physical-location for example, a person with a Southern accent whose location is determined to be in a Northern city (e.g. Boston) might be provided with a hotel coupon
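  • The following minimal Python sketch illustrates one way a historical deviation feature of the kind listed above could be computed: the current value of a per-speaker measurement (speech tempo, in this example) is compared against that speaker's previously observed values; the z-score threshold is a hypothetical choice.

```python
# Hypothetical sketch: flag a historical deviation when the current observation
# differs from the speaker's own history by more than a few standard deviations.
import statistics

def historical_deviation(history, current, z_threshold=2.0):
    """Return (z_score, deviates) comparing `current` against past observations."""
    if len(history) < 2:
        return 0.0, False          # not enough history to judge deviation
    mean = statistics.mean(history)
    stdev = statistics.stdev(history) or 1e-9
    z = (current - mean) / stdev
    return z, abs(z) >= z_threshold

if __name__ == "__main__":
    past_tempo_wpm = [105, 110, 102, 108, 107]        # speaker usually talks slowly
    print(historical_deviation(past_tempo_wpm, 175))  # now talking much faster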
  • the at least one feature of the electronic media content includes at least one ‘person-recognition feature.’ This may be useful, for example, for providing pertinent retrieved information targeted for a specific person.
  • the person-recognition feature allows access to a database of person-specific data where the person-recognition feature functions, at least in part, as a ‘key’ of the database.
  • the ‘data’ may be previously-provided data about the person, for example, demographic data or other data, that is provided in any manner, for example, derived from electronic media of a previous conversation, or in any other manner.
  • this may obviate the need for users to explicitly provide account information and/or to log in in order to receive ‘personalized’ retrieved information.
  • the user simply uses the service, and the user's voice is recognized from a voice-print. Once the system recognizes the specific user, it is possible to present retrieved information in accordance with previously-stored data describing preferences of the specific user.
  • Exemplary ‘person-recognition’ features include but are not limited to biometric features (for example, voice-print or facial features) or other person visual appearance features, for example, the presence or absence of a specific article of clothing.
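  • A minimal sketch of the ‘person-recognition feature as database key’ idea above: a voice-print (represented here simply as a numeric vector; how it is computed is out of scope) is matched against enrolled prints and, on a sufficiently close match, the stored preferences for that user are returned. The names and similarity threshold are hypothetical.

```python
# Hypothetical sketch: nearest-neighbour lookup of a voice-print against enrolled
# prints, returning previously stored per-user data on a confident match.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1e-12
    nb = math.sqrt(sum(y * y for y in b)) or 1e-12
    return dot / (na * nb)

def lookup_user(voice_print, enrolled, min_similarity=0.85):
    """Return (user_id, profile) of the closest enrolled voice-print, or None."""
    best_id, best_sim = None, -1.0
    for user_id, (stored_print, profile) in enrolled.items():
        sim = cosine(voice_print, stored_print)
        if sim > best_sim:
            best_id, best_sim = user_id, sim
    if best_sim >= min_similarity:
        return best_id, enrolled[best_id][1]
    return None  # unknown speaker: no personalization without explicit login

if __name__ == "__main__":
    enrolled = {"alice": ([0.9, 0.1, 0.3], {"prefers": "spanish_news"}),
                "bob":   ([0.1, 0.8, 0.5], {"prefers": "sports"})}
    print(lookup_user([0.88, 0.12, 0.31], enrolled))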
  • the at least one feature of the electronic media content includes at least one ‘person-influence feature.’
  • for example, in a conversation between a boss and a subordinate, the boss may have more influence and may function as a so-called gatekeeper.
  • the ‘influencing statement’ may be assigned more importance. For example, if party ‘A’ says ‘we should spend more money on clothes’ and party ‘B’ responds by saying ‘I agree’ this could imbue party A's statement with additional importance, because it was an ‘influential statement.’
  • a user has several conversations in one day.
  • the first conversation is with an “influential person” who may be “important” for example, a client/boss to whom the user of a device shows deference.
  • Previous search results may be cleared from a display screen or scrolled down, and replaced with search results that relate to the conversation with “important” person.
  • the user may speak with a less “influential” person—for example, a child.
  • previously-displayed retrieved information (for example, retrieved in accordance with the first conversation) is not replaced with information retrieved from the second conversation.
  • the retrieval and/or presentation of information includes presenting information to a first individual (for example, person ‘A’) in accordance with one or more feature of media content from a second individual different from the first individual (for example, person ‘B’).
  • Some embodiments of the present invention provide apparatus for retrieving and presenting information.
  • the apparatus may be operative to implement any method or any step of any method disclosed herein.
  • the apparatus may be implemented using any combination of software and/or hardware.
  • the data storage may be implemented using any combination of volatile and/or non-volatile memory, and may reside in a single device or reside on a plurality of devices either locally or over a wide area.
  • the aforementioned apparatus may be provided as a single client device (for example, as a handset or laptop or desktop configured to present retrieved information in accordance with the electronic media content).
  • the ‘data storage’ is volatile and/or non-volatile memory of the client device—for example, where outgoing and incoming content is digitally stored in the client device or a peripheral storage device of the client device.
  • the apparatus may be distributed on a plurality of devices for example with a ‘client-server’ architecture.
  • FIGS. 1A-1C describe exemplary use scenarios.
  • FIGS. 2A-2D, 4, and 5A-5C provide flow charts of exemplary techniques for locating, retrieving and/or presenting information related to electronic media content of a multi-party conversation.
  • FIG. 3 describes an exemplary technique for computing one or more features of electronic media content including voice content.
  • FIG. 6 provides a block diagram of an exemplary system for retrieving and presenting information in accordance with some embodiments of the present invention.
  • FIG. 7 describes an exemplary system for providing electronic media content of a multi-party conversation.
  • FIGS. 8-14 describe exemplary systems for computing various features.
  • Embodiments of the present invention relate to a technique for retrieving and displaying information in accordance with the context and/or content of voice content including but not limited to voice content transmitted over a telecommunications network in the context of a multiparty conversation.
  • the present use scenarios and many other examples relate to the case where the multi-party conversation is transmitted via a telecommunications network (e.g. circuit switched and/or packet switched).
  • two or more people are conversing ‘in the same room’ and the conversation is recorded by a single microphone or a plurality of microphones (and optionally one or more cameras) deployed ‘locally’ without any need for transmitting content of the conversation via a telecommunications network.
  • in the exemplary scenario of FIG. 1A, a first user (i.e. ‘party 1’) speaks on a “car phone” (i.e. a mobile phone mounted in a car, for example, in proximity of an onboard navigator system) with a second user (i.e. ‘party 2’) who converses using VOIP software residing on the desktop, such as Skype® software.
  • retrieved information is served to party 1 in accordance with content of the conversation.
  • Party 2 mentions “Millard Fillmore”
  • information about Millard Fillmore is retrieved and displayed on a client device associated with “party 1”—either the “small screen” of Party 1's car-mounted cellphone or the “larger screen” of Party 1's onboard navigator device.
  • there is no need for “Party 1” to provide any search query whatsoever—a conversation is monitored that is not directed to the entity doing the monitoring; rather, the words of Party 1 are directed exclusively to other co-conversationalist(s)—in this case Party 2—and the words of Party 2 are directed exclusively to Party 1.
  • Party 2 “knows” that Party 1 is driving and cannot key in a search query, for example, into a standard Internet search engine.
  • Party 1 unexpectedly knows extensive information about Millard Fillmore (i.e. a rather exotic topic) and succeeds in surprising Party 2.
  • party 1 is located in Cleveland and party 2 is located in Boston.
  • Party 2 is driving in a region of the city where a building was recently on fire. They are discussing the building fire.
  • in the example of FIG. 1B, after the word “fire” is mentioned, a news story about a fire is displayed on a screen of user 1.
  • the fire is not a major fire, and at the time, a number of small fires are being handled in different cities throughout the United States. Thus, a certain amount of “disambiguation” is required in order to serve information about the “correct fire.”
  • party 1 proposes going to a Yankees game.
  • Party 2 does not mention anything specific about the Yankees. Nevertheless, information about the Yankees (for example, an article about the history of the Yankees, or a news story about their latest game) is retrieved and served to the client terminal device of party 2. This is one example of information being retrieved and served (i.e. to the “cellphone” of party 2) in accordance with “incoming” (i.e. incoming to the “cellphone” client terminal device of Party 2) electronic media content of the multi-party conversation.
  • ‘providing’ of media or media content includes one or more of the following: (i) receiving the media content (for example, at a server cluster comprising at least one server, for example, operative to analyze the media content, and/or at a proxy); (ii) sending the media content; (iii) generating the media content (for example, carried out at a client device such as a cell phone and/or PC); (iv) intercepting the media content; and (v) handling media content, for example, on the client device, on a proxy or server.
  • a ‘multi-party’ voice conversation includes two or more parties, for example, where each party communicates using a respective client device including but not limited to a desktop, laptop, cell-phone, and personal digital assistant (PDA).
  • the electronic media content from the multi-party conversation is provided from a single client device (for example, a single cell phone or desktop).
  • the media from the multi-party conversation includes content from different client devices.
  • the electronic media content from the multi-party conversation is from a single speaker or a single user.
  • the electronic media content from the multi-party conversation is from multiple speakers.
  • the electronic media content may be provided as streaming content.
  • streaming audio (and optionally video) content may be intercepted, for example, as transmitted over a telecommunications network (for example, a packet switched or circuit switched network).
  • the conversation is monitored on an ongoing basis during a certain time period.
  • the electronic media content is pre-stored content, for example, stored in any combination of volatile and non-volatile memory.
  • ‘presenting of retrieved information in accordance with at least one feature’ includes one or more of the following:
  • configuring a client device (i.e. a screen of a client device) to display the retrieved information. This configuring may be accomplished, for example, by displaying the retrieved information using an email client and/or a web browser and/or any other client residing on the client device;
  • FIG. 2A refers to an exemplary technique for retrieving and presenting information in accordance with content of a multi-party conversation.
  • in step S 109, electronic digital media content including spoken or voice content (e.g. of a multi-party audio conversation) is provided—e.g. received and/or intercepted and/or handled.
  • in step S 111, one or more aspects of the electronic voice content (for example, content of the multi-party audio conversation) are analyzed, and/or context features are computed.
  • the words of the conversation are extracted from the voice conversation and the words are analyzed, for example, for a presence of key phrases.
  • an accent of one or more parties to the conversation is detected. If, for example, one party has a ‘Texas accent’ then this increases a likelihood that the party will receive (for example, on her terminal such as a cellphone or desktop) information from a Texas-based online newspaper or magazine.
  • the multi-party conversation is a ‘video conversation’ (i.e. voice plus video).
  • a conversation participant is wearing, for example, a hat or jacket associated with a certain sports team (for example, a particular baseball team), and if that sports team is scheduled to play an “away game” in a different city, a local weather forecast or traffic forecast associated with the game may be presented either to the “fan” or to a co-conversationalist (for example, using a different client terminal device) who could then “impress” the “fan” with his knowledge.
  • in step S 113, one or more operations are carried out to retrieve and present information in accordance with results of the analysis of step S 111.
  • the information may be retrieved from any source, including but not limited to online search engines, news services (for example, newswires or “news sites” like www.cnn.com or www.nytimes.com), images or video banks, RSS feeds, weather or traffic forecasts, Youtube® clips, sports statistics, Digg, social editing sites, music banks, shopping sites such as Amazon, Del.icio.us, and blogs.
  • the information source may be local (for example, the local file system of a desktop computer or PDA) and/or may be remote (for example, a remote “Internet” search engine accessible via the Internet).
  • advertisement information may be served together with the retrieved information
  • the retrieved information includes information other than advertisements, such as: Wikipedia entries, entries from social networks (such as dating sites, MySpace, LinkedIn, etc), news articles, blogs, video or audio clips, or just about any form of information.
  • FIG. 2B presents a flow-chart of a technique where outgoing and/or incoming content is monitored S 411 , and in accordance with the content, information is retrieved and presented S 415 .
  • accomplishing this in accordance with “incoming content” was discussed with reference to FIG. 1C.
  • FIG. 2C provides a flow-chart wherein a terminal device is monitored S 411 for an incoming and/or outgoing call with another client terminal device.
  • information is retrieved in accordance with incoming and/or outgoing content of the multi-party conversation and presented.
  • FIG. 2D provides a flow chart of an exemplary technique where: (i) a first information retrieval and presentation is carried out in accordance with a first “batch” of content or words (S 411 and S 415); and (ii) when the topic changes or another event occurs S 425 (for example, a speaker gets excited about something, raises his or her voice, looks up, repeats a phrase, etc—for example, beyond some threshold), information may be retrieved and presented (i.e. by displacing the previously-retrieved information from the first batch of electronic media content) in accordance with content S 429 of a “second batch” of content or words.
  • the “earlier” information may be scrolled down.
  • a “link” or interface element “pointing” to most recent content may be re-configured to, upon user invocation, provide the retrieved information for the “second batch” of content rather than the “first batch” of content, after, for example, the topic has changed and/or the user or conversation-participant has indicated a particular emotion or body language, etc.
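  • One simple, purely illustrative way to approximate the ‘topic changes’ test S 425 from text alone is sketched below in Python: term-frequency vectors of consecutive batches of words are compared by cosine similarity, and a low similarity is treated as a topic change; the threshold is hypothetical.

```python
# Hypothetical sketch: detect a topic change between two batches of words by
# comparing their term-frequency vectors.
from collections import Counter
import math

def cosine_sim(c1, c2):
    common = set(c1) & set(c2)
    dot = sum(c1[w] * c2[w] for w in common)
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def topic_changed(prev_batch, new_batch, threshold=0.2):
    return cosine_sim(Counter(prev_batch.lower().split()),
                      Counter(new_batch.lower().split())) < threshold

if __name__ == "__main__":
    print(topic_changed("the yankees game on saturday was great",
                        "did you see the fire downtown near the office"))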
  • FIG. 3 provides exemplary types of features that are computed or assessed S 111 when analyzing the electronic media content. These features include but are not limited to speech delivery features S 151, video features S 155, conversation topic parameters or features S 159, key word(s) feature S 161, demographic parameters or features S 163, health or physiological parameters or features S 167, background features S 169, localization parameters or features S 175, influence features S 175, history features S 179, and deviation features S 183.
  • in some embodiments, it may be assessed (i.e. determined and/or estimated) S 163 whether a conversation participant is a member of a certain demographic group, from a current multi-party conversation (i.e. voice and optionally video) and/or historical conversations. This information may then be used to more effectively retrieve and present “pertinent” information to the user and/or an associate of the user.
  • Relevant demographic groups include but are not limited to: (i) age; (ii) gender; (iii) educational level; (iv) household income; (v) ethnic group and/or national origin; (vi) medical condition.
  • the age of a conversation participant is determined in accordance with a number of features, including but not limited to one or more of the following: speech content features and speech delivery features.
  • the user's physical appearance can also be indicative of a user's age and/or gender. For example, gray hair may indicate an older person, facial hair may indicate a male, etc.
  • Information retrieval and/or presentation can be customized using this demographic parameter as well. For example, if it is assumed that a conversationalist is college educated, then the information retrieval and/or presentation may be adjusted accordingly.
  • this feature also may be assessed or determined using one or more of speech content features and speech delivery features.
  • for example, if a child is detected in the background of a first user's “cell phone,” the first user may be served “child-oriented” content (for example, a link to a Sesame Street clip) and/or “parent-oriented content” (for example, an article from Parenting magazine online)—for instance, the first user may be served an article about popular movies for young children.
  • if the conversation then shifts to the topic of vacations, and a dog barking is detected in the background of the second “cell phone,” then the second user on the second cell phone may be served an article about popular “pet-friendly” vacation destinations.
  • one example of speech content features includes slang or idioms that tend to be used by a particular ethnic group or by non-native English speakers whose mother tongue is a specific language (or who come from a certain area of the world).
  • one example of speech delivery features relates to a speaker's accent.
  • the skilled artisan is referred, for example, to US 2004/0096050, incorporated herein by reference in its entirety, and to US 2006/0067508, incorporated herein by reference in its entirety.
  • a user's medical condition may be assessed in accordance with one or more audio and/or video features.
  • breathing sounds may be analyzed, and breathing rate may be determined. This may be indicative of whether or not a person has some sort of respiratory ailment, and data from a medical database could be presented to the user.
  • breathing sounds may determine user emotions and/or user interest in a topic.
  • Biometric Data (for Example, Voice-Print Data) and Demographic Data—with Reference to FIG. 4
  • the system may determine from a first conversation (or set of conversations) specific data about a given user with a certain level of certainty.
  • the earlier demographic profile may be refined in a later conversation by gathering more ‘input data points.’
  • the user may be averse to giving ‘account information’—for example, because there is a desire not to inconvenience the user.
  • in step S 211, content (i.e. voice content and optionally video content) of a multi-party conversation is analyzed and one or more biometric parameters or features (for example, a voice print or face ‘print’) are computed.
  • the results of the analysis and optionally demographic data are stored and are associated with a user identity and/or voice print data.
  • the identity of the user is determined and/or the user is associated with the previous conversation using voice print data based on analysis of voice and/or video content S 215 .
  • the previous demographic information of the user is available.
  • the demographic profile is refined by analyzing the second conversation.
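  • The refinement loop summarized above might look, in a very reduced form, like the following Python sketch: each conversation yields (parameter, value, confidence) estimates keyed by the recognized identity, and later conversations update the stored profile. The confidence-weighted combination rule is a hypothetical choice, not the patent's.

```python
# Hypothetical sketch: merge demographic estimates from a new conversation into a
# stored per-user profile keyed by voice-print identity.
def update_profile(profile, new_estimates):
    """profile and new_estimates map parameter -> (value, confidence in [0, 1])."""
    for param, (value, conf) in new_estimates.items():
        if param not in profile:
            profile[param] = (value, conf)
            continue
        old_value, old_conf = profile[param]
        total = old_conf + conf
        if isinstance(value, (int, float)):
            merged = (old_value * old_conf + value * conf) / total   # weighted average
        else:
            merged = value if conf > old_conf else old_value          # keep the surer label
        profile[param] = (merged, min(1.0, total))
    return profile

if __name__ == "__main__":
    stored = {"age": (30.0, 0.4), "gender": ("f", 0.6)}
    print(update_profile(stored, {"age": (38.0, 0.5), "ethnicity": ("hispanic", 0.3)}))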
  • FIG. 5A provides a flow chart of an exemplary technique for retrieving and providing information.
  • certain words are given “weights” in the information retrieval according to one or more features of a conversation participant. For example, if it is determined that a given conversation-participant is “dominant” in the conversation (i.e. either from a personality profile or from the interaction between conversation-participants), words spoken by this participant may be given a greater weight in information retrieval or search.
  • words spoken excitedly and/or with certain body language may be given greater weight.
  • FIG. 5B relates to a technique where a term disambiguation S 309 may be carried out in accordance with one or more features of a conversation participant. For example, if it is assessed that a person is an avid investor or computer enthusiast, then the word “apple” may be handled by retrieving information related to Apple Computer.
  • in the case of “Madonna,” this could refer either to the “Virgin Mary” or to a singer. If it is assessed that a conversation participant is an avid Catholic, it is more likely the former. If it is assessed that a conversation participant likes pop-music (for example, from background sounds, age demographics, slang, etc), then Madonna is more likely to refer to the singer.
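  • A minimal Python sketch of such a disambiguation step: the ambiguous term is resolved toward the sense whose cue words overlap most with what is known (or assessed) about the participant; the sense and cue lists are hypothetical.

```python
# Hypothetical sketch: resolve an ambiguous term using participant interests.
SENSES = {
    "apple":   {"fruit":     {"pie", "orchard", "eat"},
                "company":   {"investor", "stock", "computer", "iphone"}},
    "madonna": {"singer":    {"pop", "concert", "album"},
                "religious": {"catholic", "church", "mary"}},
}

def disambiguate(term, participant_interests):
    """Pick the sense of `term` with the most overlap; ties resolve arbitrarily."""
    candidates = SENSES.get(term.lower())
    if not candidates:
        return None
    return max(candidates, key=lambda s: len(candidates[s] & participant_interests))

if __name__ == "__main__":
    print(disambiguate("apple", {"investor", "stock", "golf"}))    # -> "company"
    print(disambiguate("madonna", {"catholic", "church"}))          # -> "religious"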
  • words are given greater “weight” or priority in accordance with body language and/or speech delivery features.
  • FIG. 6 provides a block diagram of an exemplary system 100 for retrieval and presentation of information in accordance with some embodiments of the present invention.
  • the apparatus or system, or any component thereof may reside on any location within a computer network (or single computer device) i.e. on the client terminal device 10 , on a server or cluster of servers (not shown), proxy, gateway, etc.
  • Any component may be implemented using any combination of hardware (for example, non-volatile memory, volatile memory, CPUs, computer devices, etc) and/or software—for example, coded in any language including but not limited to machine language, assembler, C, C++, Java, C#, Perl etc.
  • the exemplary system 100 may include an input 110 for receiving one or more digitized audio and/or visual waveforms, a speech recognition engine 154 (for converting a live or recorded speech signal to a sequence of words), one or more feature extractor(s) 118, a historical data storage 142, and a historical data storage updating engine 150.
  • any element in FIG. 6 may be implemented as any combination of software and/or hardware.
  • any element in FIG. 6 and any element described in the present disclosure may either reside on or within a single computer device, or be distributed over a plurality of devices in a local or wide-area network.
  • Audio and/or Video Input 110
  • the media input 110 for receiving a digitized waveform is a streaming input. This may be useful for ‘eavesdropping’ on a multi-party conversation in substantially real time.
  • ‘substantially real time’ refers to time with no more than a predetermined time delay, for example, a delay of at most 15 seconds, or at most 1 minute, or at most 5 minutes, or at most 30 minutes, or at most 60 minutes.
  • a multi-party conversation is conducted using client devices or communication terminals 10 (i.e. N terminals, where N is greater than or equal to two) via the Internet 2 .
  • VOIP software such as Skype® software resides on each terminal 10 .
  • ‘streaming media input’ 110 may reside as a ‘distributed component’ where an input for each party of the multi-party conversation resides on a respective client device 10 .
  • streaming media signal input 110 may reside at least in part ‘in the cloud’ (for example, at one or more servers deployed over wide-area and/or publicly accessible network such as the Internet 20 ).
  • audio streaming signals and/or video streaming signals of the conversation may be intercepted as they are transmitted over the Internet.
  • input 110 does not necessarily receive or handle a streaming signal.
  • stored digital audio and/or video waveforms may be provided, stored in non-volatile memory (including but not limited to flash, magnetic and optical media) or in volatile memory.
  • the multiparty conversation is not required to be a VOIP conversation.
  • two or more parties are speaking to each other in the same room, and this conversation is recorded (for example, using a single microphone, or more than one microphone).
  • the system 100 may include a ‘voice-print’ identifier (not shown) for determining an identity of a speaking party (or for distinguishing between speech of more than one person).
  • At least one communication device is a cellular telephone communicating over a cellular network.
  • two or more parties may converse over a ‘traditional’ circuit-switched phone network, and the audio sounds may be streamed to the information retrieval and presentation system 100 and/or provided as recorded digital media stored in volatile and/or non-volatile memory.
  • FIG. 8 provides a block diagram of several exemplary feature extractor(s)—this is not intended as comprehensive but just to describe a few feature extractor(s). These include: text feature extractor(s) 210 for computing one or more features of the words extracted by speech recognition engine 154 (i.e. features of the words spoken); speech delivery features extractor(s) 220 for determining features of how words are spoken; speaker visual appearance feature extractor(s) 230 (i.e. provided in some embodiments where video as well as audio signals are analyzed); and background features (i.e. relating to background sounds or noises and/or background images).
  • the feature extractors may employ any technique for feature extraction of media content known in the art, including but not limited to heuristic techniques and/or ‘statistical AI’ and/or ‘data mining techniques’ and/or ‘machine learning techniques’ where a training set is first provided to a classifier or feature calculation engine.
  • the training may be supervised or unsupervised.
  • Exemplary techniques include but are not limited to tree techniques (for example binary trees), regression techniques, Hidden Markov Models, Neural Networks, and meta-techniques such as boosting or bagging.
  • this statistical model is created in accordance with previously collected “training” data.
  • a scoring system is created.
  • a voting model for combining more than one technique is used.
  • a first feature may be determined in accordance with a different feature, thus facilitating ‘feature combining.’
  • one or more feature extractors or calculation engine may be operative to effect one or more ‘classification operations’—e.g. determining a gender of a speaker, age range, ethnicity, income, and many other possible classification operations.
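  • As an illustration of the ‘voting model for combining more than one technique’ mentioned above, the Python sketch below lets several simple classifiers vote on an age-range classification; the individual rules (pitch, vocabulary and tempo thresholds) are hypothetical stand-ins for whatever trained models a real system would use.

```python
# Hypothetical sketch: majority voting over several weak classifiers.
from collections import Counter

def pitch_vote(f):
    # Higher average pitch used as a weak cue for the younger group (illustrative).
    return "under_30" if f.get("mean_pitch_hz", 0) > 200 else "over_30"

def vocab_vote(f):
    # Slang terms as a weak vocabulary-choice cue (illustrative).
    return "under_30" if f.get("words", set()) & {"lol", "dude", "awesome"} else "over_30"

def tempo_vote(f):
    return "under_30" if f.get("tempo_wpm", 0) > 160 else "over_30"

def vote(features, classifiers=(pitch_vote, vocab_vote, tempo_vote)):
    """Return the class chosen by the majority of the individual classifiers."""
    ballots = Counter(clf(features) for clf in classifiers)
    return ballots.most_common(1)[0][0]

if __name__ == "__main__":
    print(vote({"mean_pitch_hz": 215, "words": {"dude", "game"}, "tempo_wpm": 150}))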
  • Each element described in FIG. 8 is described in further detail below.
  • FIG. 9 provides a block diagram of exemplary text feature extractors.
  • a phrase detector 260 may identify certain phrases or expressions spoken by a participant in a conversation.
  • this may indicate a current desire or preference. For example, if a speaker says “I am quite hungry” this may indicate that a food product ad should be sent to the speaker.
  • a speaker may use certain idioms that indicate general desire or preference rather than a desire at a specific moment. For example, a speaker may make a general statement regarding a preference for American cars, or a professing love for his children, or a distaste for a certain sport or activity. These phrases may be detected and stored as part of a speaker profile, for example, in historical data storage 142 .
  • the speaker profile built from detecting these phrases, and optionally performing statistical analysis, may be useful for present or future provisioning of ads to the speaker or to another person associated with the speaker.
  • the phrase detector 260 may include, for example, a database of pre-determined words or phrases or regular expressions.
  • the computational cost associated with analyzing text to determine the appearance of certain regular phrases may increase with the size of the set of phrases.
  • the exact set of phrases may be determined by various business considerations.
  • certain sponsors may ‘purchase’ the right to include certain phrases relevant for the sponsor's product in the set of words or regular expressions.
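  • By way of illustration only, the following Python sketch shows one way phrase detector 260 might be realized as a small database of regular expressions mapped to preference categories; the specific patterns and categories are hypothetical examples, not taken from the specification.

```python
import re

# A hypothetical sketch of phrase detector 260: pre-determined regular expressions
# mapped to the preference category they signal.
PHRASE_DATABASE = {
    r"\bi(?:'m| am) (?:quite |really )?hungry\b": "food",
    r"\bi (?:love|prefer) american cars\b": "automobiles",
    r"\bi love my (?:kids|children)\b": "family",
}

def detect_phrases(transcript: str) -> list[str]:
    """Return preference categories whose phrases appear in the extracted text."""
    text = transcript.lower()
    return [category for pattern, category in PHRASE_DATABASE.items()
            if re.search(pattern, text)]

# Example transcript produced by the speech recognition engine.
print(detect_phrases("Let's wrap up soon, I am quite hungry"))  # ['food']
```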
  • the text feature extractor(s) 210 may be used to provide a demographic profile of a given speaker. For example, usage of certain phrases may be indicative of an ethnic group or a national origin of a given speaker. As will be described below, this may be determined using some sort of statistical model, or some sort of heuristics, or some sort of scoring system.
  • in one example, pre-determined conversation ‘training sets’ of more educated people and conversation ‘training sets’ of less educated people are provided. For each training set, frequencies of various words may be computed, and a language model of word (or word combination) frequencies may be constructed.
  • This principle could be applied using pre-determined ‘training sets’ for native English speakers vs. non-native English speakers, training sets for different ethnic groups, and training sets for people from different regions.
  • This principle may also be used for different conversation ‘types.’ For example, conversations related to computer technologies would tend to provide an elevated frequency for one set of words, romantic conversations would tend to provide an elevated frequency for another set of words, etc.
  • various training sets can be prepared. For a given segment of analyzed conversation, word frequencies (or word combination frequencies) can then be compared with the frequencies of one or more training sets.
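  • A minimal sketch of such a comparison follows, assuming tiny illustrative training corpora; a unigram language model is built per training set and the analyzed segment is assigned to the training set under which it is most likely. The corpora, smoothing floor and category names are invented for illustration.

```python
from collections import Counter
import math

# Build a unigram 'language model' of word frequencies from a training corpus.
def language_model(corpus: list[str]) -> dict[str, float]:
    words = " ".join(corpus).lower().split()
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

# Score a conversation segment against a model (with a crude smoothing floor).
def log_likelihood(segment: str, model: dict[str, float], floor: float = 1e-6) -> float:
    return sum(math.log(model.get(w, floor)) for w in segment.lower().split())

models = {
    "technology": language_model(["the server crashed again", "reboot the router"]),
    "romance": language_model(["i miss you so much", "see you tonight darling"]),
}

segment = "the router needs a reboot"
best = max(models, key=lambda name: log_likelihood(segment, models[name]))
print(best)  # 'technology'
```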
  • a part-of-speech (POS) tagger 264 is provided.
  • FIG. 10 provides a block diagram of an exemplary system 220 for detecting one or more speech delivery features. This includes an accent detector 302, tone detector 306, speech tempo detector 310, and speech volume detector 314 (i.e. for detecting loudness or softness).
  • speech delivery feature extractor 220 or any component thereof may be pre-trained with ‘training data’ from a training set.
  • FIG. 11 provides a block diagram of an exemplary system 230 for detecting speaker appearance features—i.e. for video media content for the case where the multi-party conversation includes both voice and video. This includes body gesture feature extractor(s) 352 and physical appearance feature extractor 356.
  • FIG. 12 provides a block diagram of an exemplary background feature extractor(s) 250 .
  • This includes (i) audio background features extractor 402 for extracting various features of background sounds or noise, including but not limited to specific sounds or noises such as pet sounds, an indication of background talking, an ambient noise level, a stability of an ambient noise level, etc; and (ii) visual background features extractor 406 which may, for example, identify certain items or features in the room, for example, certain products or brands present in a room.
  • FIG. 13 provides a block diagram of additional feature extractors 118 for determining one or more features of the electronic media content of the conversations. Certain features may be ‘combined features’ or ‘derived features’ derived from one or more other features.
  • a conversation harmony level classifier, for example, for determining if a conversation is friendly or unfriendly and to what extent;
  • a deviation feature calculation engine 456; a feature engine for demographic feature(s) 460; a feature engine for physiological status 464; and a feature engine for conversation participants relation status 468 (for example, family members, business partners, friends).
  • FIG. 14 provides a block diagram of exemplary demographic feature calculators or classifiers. This includes gender classifier 502, ethnic group classifier 506, income level classifier 510, age classifier 514, national/regional origin classifier 518, tastes (for example, clothes and goods) classifier 522, educational level classifier 526, marital status classifier 530, job status classifier 534 (i.e. employed vs. unemployed, manager vs. employee, etc), and religion classifier 538 (i.e. Jewish, Christian, Malawi, Muslim, etc).
  • a religion of a person is detected, for example, using key-words, accent and/or speaker location.
  • One example relates to a speaker who often speaks about Jewish topics, or who may often listen to Klezmer music or Yiddish music in the background.
  • certain recipes may be presented to the speaker—if the speaker is Jewish, recipes that include pork may be filtered out.
  • each of the verbs, “comprise” “include” and “have”, and conjugates thereof are used to indicate that the object or objects of the verb are not necessarily a complete listing of members, components, elements or parts of the subject or subjects of the verb.
  • an element means one element or more than one element.

Abstract

Methods, apparatus and computer-code for electronically retrieving and presenting information are disclosed herein. In some embodiments, information is retrieved and presented in accordance with at least one feature of electronic media content of a multi-party conversation. Optionally, the multi-party conversation is a video conversation and at least one feature is a video content feature. Exemplary features include but are not limited to speech delivery features, key word features, topic features, background sound or image features, deviation features and biometric features.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This patent application claims the benefit of U.S. Provisional Patent Application No. 60/821,272 filed Aug. 2, 2006 by the present inventors, and U.S. Provisional Patent Application No. 60/824,323 filed Sep. 1, 2006 by the present inventors.
  • FIELD OF THE INVENTION
  • The present invention relates to techniques for information retrieval and presentation.
  • BACKGROUND AND RELATED ART
  • Knowledge bases contain enormous amounts of information on any topic imaginable. To tap this information, however, users need to explicitly issue a search request. The explicit search process requires the user to:
  • (i) realize that he needs a specific piece of information;
  • (ii) select the information source(s); and
  • (iii) formulate a query expression and execute it against the information source(s).
  • The following published patent applications provide potentially relevant background material: US 2006/0167747; US 2003/0195801; US 2006/0188855; US 2002/0062481; and US 2005/0234779. All references cited herein are incorporated by reference in their entirety. Citation of a reference does not constitute an admission that the reference is prior art.
  • SUMMARY
  • The present inventors are now disclosing a technique wherein a multi-party voice conversation is monitored (i.e. by monitoring electronic media content of the multi-party voice conversation), and in accordance with at least one feature of the electronic media content, information is retrieved and presented to at least one conversation party of the multi-party voice conversation.
  • Exemplary information sources from which information is retrieved include but are not limited to search engines, news services, images or video banks, RSS feeds, and blogs. The information source may be local (for example, the local file system of a desktop computer or PDA) and/or may be remote (for example, a remote “Internet” search engine accessible via the Internet).
  • Not wishing to be bound by any theory, it is noted that by monitoring the multi-party voice conversation that is not directed to an entity doing the monitoring, an “implicit information retrieval request” may be formulated, thereby relieving the user of any requirement to explicitly formulate an information retrieval request and direct that information retrieval request to an information retrieval service.
  • Furthermore, the present inventors are now disclosing that the nature of the information retrieval and/or presentation of the retrieved information may be adapted to certain detectable features of the conversation and/or features of the conversation participants.
  • In one example, a demographic profile of a given user may be generated (i.e. either from detectable features of the conversation and/or other information sources). Thus, in one particular example, two individuals are speaking to each other in English (for example, using a “Skype” connection, or on cell phones), but one of the individuals has a Spanish accent. According to this example, the individual with the Spanish accent may be presented with retrieved Spanish-language information (for example, from a Spanish-language newswire retrieved using “keywords” translated from the English language conversation).
  • In another example related to retrieval and/or presentation of information in accordance with a demographic profile, two users are speaking about applying to law-school. One speaker is younger (say less than 25 years old) and another speaker is over 40. The “age demographic” of the speakers is detected from electronic media content of the multi-party conversation, and the older user may be served an article about law-school essay strategies for older law-school applicants, while the younger user may be served a profile from a dating website for college-aged students interested in dating a pre-law major.
  • If, for example, one user is speaking on a cell phone in Boston and the other user is speaking on a cell phone in Florida, the Boston-based user may be provided information about New England law schools while the Florida-based user may be provided information about Florida law schools. This is an example of retrieving information according to a location of a participant in a multi-party conversation.
  • In another example related to retrieval and/or presentation of information in accordance with a demographic profile, a man and a woman may be speaking about movies, and the “gender demographic” is detected. The man may be served information (for example, movie starting times) about movies popular with men (for example, horror movies, action movies, etc) while the woman may be served information about movies popular with women (for example, romance movies). If the man is located on the “north side of town” and the woman on the “south side of town,” the man may be provided information about movie start times on the “north side” while the woman is provided information about movie start times on the “south side.”
  • In another example, information may be retrieved and/or presented in accordance with an emotion of one or more conversation-participants. For example, if it is detected that a person is angry, a link to anger-management material may be presented. In a similar example, if it is detected that a person is angry, a link to a clip of relaxing music may be presented.
  • In another example related to emotion-based information retrieval, if two people are speaking about a given rock-and-roll band, links to clips of the band's music may be presented. In one variation, certain songs of the rock-and-roll band may be pre-categorized as “happy songs” or “sad songs.” If one or both of the conversation-participants are detected as “happy” (for example, according to key words, body language, and/or voice tones), then links to clips of “happy songs” are presented.
  • In another example, information may be retrieved and/or presented in accordance with a “conversation participants relation.” Thus, if it is determined or assessed that two conversation participants are spouses or lovers, when they speak about the particular rock-and-roll band, links to clips to “love songs” from this band are presented to the users. Alternatively, if it is determined that two conversation participants are not friends or lovers but only business acquaintances, the “most popular” songs from the band may be presented to the users instead, and the “romantic” songs may be filtered out.
  • In another example, information may be retrieved and/or presented in accordance with a physiological status feature of the user. In this example, if a user coughs often during the conversation, a link to a Wikipedia article or a Medscape article about the flu may be presented to the user.
  • In another example, information may be retrieved and/or presented in accordance with one or more personality traits or personality-profile features of the user. According to one particular example, an “extroverted” or “people-oriented” person might, when discussing a certain city with a friend, receive information about “people-oriented” activities that are done in groups. Conversely, an “introverted” person may receive information about activities done in solitude.
  • It is now disclosed for the first time a method of providing information-retrieval services. The method includes the steps of: a) monitoring a multi-party voice conversation not directed at the entity doing the monitoring; and b) in accordance with content of the monitored voice conversation, retrieving and presenting information to at least one party of the multi-party voice conversation.
  • Non-limiting examples of information items that may be retrieved include but are not limited to i) a social-network profile; ii) a weather forecast; iii) a traffic forecast; iv) a Wikipedia entry; v) a news article; vi) an online forum entry; vii) a blog entry; viii) a social bookmarking web service entry; ix) a music clip; and x) a film clip.
  • According to some embodiments, the retrieving includes assigning a keyword weight in accordance with a demographic parameter of a given party of the multi-party voice conversation estimated from electronic media of the multi-party conversation, the estimated demographic parameter being selected from the group consisting of: i) an age parameter; ii) a gender parameter; and iii) an ethnicity parameter.
  • According to some embodiments, the retrieving includes selecting or emphasizing an information-source from a plurality of candidate information-sources in accordance with a demographic parameter of a given party of the multi-party voice conversation estimated from electronic media of the multi-party conversation, the estimated demographic parameter being selected from the group consisting of: i) an age parameter; ii) a gender parameter; and iii) an ethnicity parameter.
  • According to some embodiments, the retrieving includes effecting a disambiguation in accordance with a demographic parameter of a given party of the multi-party voice conversation estimated from electronic media of the multi-party conversation, the estimated demographic parameter being selected from the group consisting of: i) an age parameter; ii) a gender parameter; and iii) an ethnicity parameter.
  • According to some embodiments, the assigning includes assigning a keyword weight in accordance with a speech delivery feature of a given party of the multi-party voice conversation estimated from electronic media of the multi-party conversation, the speech delivery feature being selected from the group consisting of: i) a loudness parameter; ii) a speech tempo parameter; and iii) an emotional outburst parameter.
  • According to some embodiments, the retrieving includes selecting or emphasizing an information-source from a plurality of candidate information-sources in accordance with a geographic location of a given party of the multi-party voice conversation estimated from electronic media of the multi-party conversation.
  • According to some embodiments, the retrieving includes selecting or emphasizing an information-source from a plurality of candidate information-sources in accordance with an accent feature of at least one given party of the multi-party voice conversation.
  • According to some embodiments, the retrieving includes assigning a keyword weight in accordance with a demographic parameter of a given party of the multi-party voice conversation estimated from electronic media of the multi-party conversation, the estimated demographic parameter being selected from the group consisting of: i) an age parameter; ii) a gender parameter; and iii) an ethnicity parameter.
  • According to some embodiments, the information-presenting for a first set of words extracted from the multi-party conversation includes displacing earlier-presented retrieved information associated with a second, earlier set of words extracted from the multi-party conversation in accordance with relative speech delivery parameters of the first and second sets of extracted words, the speech delivery parameter being selected from the group consisting of: i) a loudness parameter; ii) a speech tempo parameter; and iii) an emotional outburst parameter.
  • According to some embodiments, the multi-party voice conversation is carried out between a plurality of client terminal devices communicating via a wide-area network, and for a given client device of the client device plurality: i) the information retrieval is carried out for incoming content relative to the given client device; and ii) the information presenting is on a display screen of the given client device.
  • It is now disclosed for the first time a method of providing information-retrieval services, the method comprising: a) monitoring a terminal device for incoming media content and outgoing media content of a multi-party conversation; and b) in accordance with the incoming media content, retrieving information over a remote network and presenting the retrieved information on the monitored terminal device.
  • According to some embodiments, the retrieving includes sending content of the multi-party conversation to an Internet search engine, and the presenting includes presenting search results from the Internet search engine.
  • According to some embodiments, the retrieving includes retrieving at least one of: i) a social-network profile; ii) a weather forecast; iii) a traffic forecast; iv) a Wikipedia entry; v) a news article; vi) an online forum entry; vii) a blog entry; viii) a social bookmarking web service entry; ix) a music clip; and x) a film clip.
  • It is now disclosed for the first time a method of providing information-retrieval services, the method comprising: a) monitoring a given terminal client device for an incoming or outgoing remote call; b) upon detecting the incoming or outgoing remote call, sending content of the detected incoming call or outgoing call over a wide-area network to a search engine; and c) presenting search results from the search engine on the monitored terminal device.
  • A Discussion of Various Features of Electronic Media Content
  • According to some embodiments, the at least one feature of the electronic media content includes at least one speech delivery feature, i.e. describing how a given set of words is delivered by a given speaker. Exemplary speech delivery features include but are not limited to: accent features (i.e. which may be indicative, for example, of whether or not a person is a native speaker and/or of an ethnic origin), speech tempo features (i.e. which may be indicative of a mood or emotional state), voice pitch features (i.e. which may be indicative, for example, of an age of a speaker), voice loudness features, voice inflection features (i.e. which may be indicative of a mood including but not limited to angry, confused, excited, joking, sad, sarcastic, serious, etc) and an emotional outburst feature (defined here as a presence of laughing and/or crying).
  • In another example, a speaker speaks some sentences or words loudly, or in an excited state, while other sentences or words are spoken more quietly. According to this example, when retrieving and/or presenting information, different words are given a different “weight” in accordance with an assigned importance, and words or phrases spoken “loudly” or in an “excited state” are given a higher weight than words or phrases spoken quietly.
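  • The following Python sketch illustrates one possible way of assigning such weights, assuming hypothetical per-word loudness measurements (in dB); the baseline value and the weighting rule are illustrative assumptions only.

```python
# A minimal sketch of weighting extracted words by how loudly they were spoken,
# so that 'loud' or 'excited' words dominate the implicit retrieval query.
def weight_keywords(words_with_loudness, baseline_db=60.0):
    weights = {}
    for word, loudness_db in words_with_loudness:
        # Words spoken above the baseline receive proportionally higher weight.
        weights[word] = max(1.0, loudness_db / baseline_db)
    return weights

# Hypothetical per-word loudness measurements from the conversation.
spoken = [("vacation", 72.0), ("maybe", 58.0), ("Hawaii", 75.0)]
query_weights = weight_keywords(spoken)

# Order query terms so the most emphasized words lead the retrieval request.
print(sorted(query_weights, key=query_weights.get, reverse=True))
# ['Hawaii', 'vacation', 'maybe']
```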
  • In some embodiments, the multi-party conversation is a video conversation, and the at least one feature of the electronic media content includes a video content feature.
  • Exemplary video content features include but are not limited to:
  • i) visible physical characteristic of a person in an image—including but not limited to indications of a size of a person and/or a person's weight and/or a person's height and/or eye color and/or hair color and/or complexion;
  • ii) features of objects or persons in the ‘background’—i.e. background objects other than a given speaker—for example, including but not limited to room furnishing features and a number of people in the room simultaneously with the speaker;
  • iii) a detected physical movement feature—for example, a body-movement feature including but not limited to a feature indicative of hand gestures or other gestures associated with speaking.
  • According to some embodiments, the at least one feature of the electronic media content includes at least one key word feature indicative of a presence and/or absence of key words or key phrases in the spoken content, and the information search and/or retrieval is carried out in accordance with the at least one key word feature.
  • In one example, the key words feature is determined by using a speech-to-text converter for extracting text. The extracted text is then analyzed for the presence of key words or phrases. Alternatively or additionally, the electronic media content may be compared with sound clips that include the key words or phrases.
  • According to some embodiments, the at least one feature of the electronic media content includes at least one topic category feature—for example, a feature indicative of whether a topic of a conversation or portion thereof matches one or more topic categories selected from a plurality of topic categories, for example, including but not limited to sports (i.e. a conversation related to sports), romance (i.e. a romantic conversation), business (i.e. a business conversation), current events, etc.
  • According to some embodiments, the at least one feature of the electronic media content includes at least one topic change feature. Exemplary topic change features include but are not limited to a topic change frequency, an impending topic change likelihood, an estimated time until a next topic change, and a time since a previous topic change.
  • Thus in one example, retrieved information is displayed to a user, and when the conversation topic changes, previously-displayed information associated with a ‘previous topic’ is either removed from the user display and replaced with newer information, or is “scrolled down” or displayed less prominently. The rate at which new information (i.e. in accordance with newer topic of the conversation) replaces older information can be adjusted in accordance with a number of factors, for example, the personality of one or more users (for example, with impulsive users, displayed retrieved information is replaced faster), an emotion associated with one or more words, and other factors.
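  • One possible realization of this display behavior is sketched below: a bounded display buffer in which newly retrieved items displace, or scroll down, items associated with earlier topics. The buffer size and rendering are illustrative assumptions only.

```python
from collections import deque

# A minimal sketch of a display buffer that 'scrolls down' previously retrieved
# items when the conversation topic changes. The maximum number of displayed
# items could be tuned per user (e.g. replaced faster for 'impulsive' users).
class ResultDisplay:
    def __init__(self, max_items: int = 3):
        self.items = deque(maxlen=max_items)  # the oldest items fall off the display

    def on_new_topic(self, topic: str, retrieved: list[str]) -> None:
        for item in retrieved:
            self.items.appendleft((topic, item))  # newest topic shown most prominently

    def render(self) -> None:
        for topic, item in self.items:
            print(f"[{topic}] {item}")

display = ResultDisplay(max_items=3)
display.on_new_topic("law school", ["Essay strategies for applicants"])
display.on_new_topic("movies", ["Tonight's showtimes", "New releases"])
display.render()  # the 'law school' result is pushed toward the bottom
```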
  • In some embodiments, the at least one feature of the electronic media content includes at least one ‘demographic property’ feature indicative of and/or derived from at least one demographic property or estimated demographic property (for example, age, gender, etc) of a person involved in the multi-party conversation (for example, a speaker). For example, two users who are over the age of 30 who speak about “Madonna” may be served a link to music clips of Madonna's songs from the 1980s, while teenagers may be served a link to a music clip of one of Madonna's more recently released songs.
  • On the other hand, two users with a demographic profile of “devout Catholic” may be served an image of the Blessed Virgin Mary.
  • Exemplary demographic property features include but are not limited to gender features (for example, related to voice pitch or hair length or any other gender features), educational level features (for example, related to spoken vocabulary words used), household income features (for example, related to educational level features and/or key words related to expenditures and/or images of room furnishings), a weight feature (for example, related to being overweight or underweight—e.g. related to size in an image or breathing rate, where obese individuals are more likely to breathe at a faster rate), age features (for example, related to an image of a balding head or gray hair and/or vocabulary choice and/or voice pitch), and ethnicity features (for example, related to skin color and/or accent and/or vocabulary choice). Another feature that, in some embodiments, may indicate a person's demography is the use (or lack of usage) of certain expressions, including but not limited to profanity. For example, people from certain regions or age groups may be more likely to use profanity (or a certain type), while those from other regions or age groups may be less likely to use profanity (or a certain type).
  • Not wishing to be bound by theory, it is noted that there are some situations where it is possible to perform ‘on the fly demographic profiling’ (i.e. obtaining demographic features derived from the media content), obviating the need for ‘explicitly provided’ demographic data, for example, from questionnaires or purchased demographic data. This may allow, for example, targeting of more appropriate or more pertinent information.
  • Demographic property features may be derived from audio and/or video features and/or word content features. Exemplary features from which demographic property features may be derived include but are not limited to: idiom features (for example, certain ethnic groups or people from certain regions of the United States may tend to use certain idioms), accent features, grammar compliance features (for example, more highly educated people are less likely to make grammatical errors), and sentence length features (for example, more highly educated people are more likely to use longer or more ‘complicated’ sentences).
  • In one example related to “educational level,” people associated with the more highly educated demographic group are more likely to be served content or links to content from the “New York Times” (i.e. a publication with more “complicated” writing and vocabulary) while a “less educated user” is served content or links to content from the “New York Post” (i.e. a publication with simpler writing and vocabulary).
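  • The following heuristic sketch illustrates how an educational-level estimate might be derived from sentence-length and vocabulary features and then used to select between the two information sources mentioned above; the word list, thresholds and scoring rule are invented for illustration.

```python
# A minimal heuristic sketch of deriving an 'educational level' demographic feature
# from sentence-length and vocabulary features of the extracted text.
ADVANCED_VOCABULARY = {"notwithstanding", "paradigm", "empirical", "ubiquitous"}

def education_score(transcript: str) -> float:
    sentences = [s.split() for s in transcript.split(".") if s.strip()]
    avg_len = sum(len(s) for s in sentences) / len(sentences)
    advanced = sum(1 for s in sentences for w in s if w.lower() in ADVANCED_VOCABULARY)
    # Longer sentences and rarer vocabulary push the score higher.
    return 0.1 * avg_len + 1.0 * advanced

def pick_source(score: float) -> str:
    # Select or emphasize an information source in accordance with the estimate.
    return "New York Times" if score > 1.5 else "New York Post"

text = "The empirical evidence is ubiquitous. Notwithstanding that, the paradigm holds."
print(pick_source(education_score(text)))  # 'New York Times'
```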
  • In some embodiments, the at least one feature of the electronic media content includes at least one ‘physiological feature’ indicative of and/or derived from at least one physiological property or estimated physiological property of a person involved in the multi-party conversation (for example, a speaker)—i.e. as derived from the electronic media content of the multi-party conversation.
  • Exemplary physiological parameters include but are not limited to breathing parameters (for example, breathing rate or changes in breathing rate), a sweat parameter (for example, indicative of whether a subject is sweating or how much—this may be determined, for example, by analyzing a ‘shininess’ of a subject's skin), a coughing parameter (i.e. a presence or absence of coughing, a loudness or rate of coughing, a regularity or irregularity of coughing patterns), a voice-hoarseness parameter, and a body-twitching parameter (for example, twitching of the entire body due to, for example, chills, or twitching of a given body part—for example, twitching of an eyebrow).
  • In one example, if the user is “excited” when speaking certain key words, this could cause the user to be served information where the key words spoken when excited are given extra “weight” in any information search or retrieval or display.
  • In another example, a person may twitch a body part when nervous or lying. If it is assessed that a user or speaker is “lying” this could also influence search results.
  • In some embodiments, the at least one feature of the electronic media content includes at least one ‘background item feature’ indicative of and/or derived from background sounds and/or a background image. It is noted that the background sounds may be transmitted along with the voice of the conversation, and thus may be included within the electronic media content of the conversation.
  • In one example, if a dog is barking in the background and this is detected, a news article about recently-passed local ordinances regulating dog-ownership may be displayed.
  • The background sound may be determined or identified, for example, by comparing the electronic media content of the conversation with one or more sound clips that include the sound it is desired to detect. These sound clips may thus serve as a ‘template.’
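  • A minimal sketch of such template matching follows, using normalized cross-correlation over synthetic placeholder signals; a deployed system would operate on real PCM audio and a library of stored sound-clip templates.

```python
import numpy as np

# A minimal sketch of detecting a background sound (e.g. a dog barking) by
# correlating the conversation audio against a stored sound-clip 'template'.
def matches_template(audio: np.ndarray, template: np.ndarray, threshold: float = 0.8) -> bool:
    # Normalized cross-correlation: values near 1.0 mean the template occurs in the audio.
    correlation = np.correlate(audio, template, mode="valid")
    window_norms = np.sqrt(
        np.convolve(audio ** 2, np.ones(len(template)), mode="valid")
    )
    norm = np.linalg.norm(template) * window_norms
    score = np.max(correlation / np.maximum(norm, 1e-12))
    return score >= threshold

# Synthetic stand-ins for a stored 'bark' clip and the conversation's background audio.
rng = np.random.default_rng(0)
bark_template = rng.standard_normal(100)
background = np.concatenate(
    [rng.standard_normal(500), bark_template, rng.standard_normal(500)]
)
print(matches_template(background, bark_template))  # True: the template is present
```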
  • In another example, if a certain furniture item (for example, an ‘expensive’ furniture item) is detected in the background of a video conversation, an item (i.e. good or service) appropriate for the ‘upscale’ income group may be provided.
  • If it is determined that a user is affluent, then when the user mentions “boat,” information about yachts may be displayed to the user. Conversely, a less-affluent user that discusses boats in a conversation may be provided information related to ferry cruises or fishing.
  • In yet another example, if an image of a crucifix is detected in the background of a video conversation, a news article about the Pope may be provided, or a link to a Catholic blog may be provided.
  • In some embodiments, the at least one feature of the electronic media content includes at least one temporal and/or spatial localization feature indicative of and/or derived from a specific location or time. Thus, in one example, when a Philadelphia-located user (for example, having a phone number in the 215 area code) discusses “sports” he/she is served sports stories (for example, from a newswire) about a recent Phillies or Eagles game, while a Baltimore-located user (for example, having a phone number in the 301 area code) is served sports stories about a recent Orioles or Ravens game.
  • This localization feature may be determined from the electronic media of the multi-party conversation.
  • Alternatively or additionally, this localization feature may be determined from data from an external source, for example, GPS and/or mobile phone triangulation.
  • Another example of an ‘external source’ for localization information is a dialed telephone number. For example, certain area codes or exchanges may be associated (but not always) with certain physical locations.
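  • By way of example only, the sketch below maps an area code extracted from a dialed number to a region and then to a local sports newswire, following the Philadelphia/Baltimore example above; the lookup tables and source names are illustrative assumptions.

```python
# A hypothetical sketch of deriving a localization feature from the area code of a
# dialed telephone number and using it to emphasize a local information source.
AREA_CODE_REGIONS = {"215": "Philadelphia", "301": "Baltimore"}

LOCAL_SPORTS_SOURCES = {
    "Philadelphia": "Phillies/Eagles newswire",
    "Baltimore": "Orioles/Ravens newswire",
}

def local_source_for(phone_number: str, default: str = "national sports wire") -> str:
    digits = "".join(ch for ch in phone_number if ch.isdigit())
    # Drop a leading country code '1' if an 11-digit number was dialed.
    digits = digits[1:] if digits.startswith("1") and len(digits) == 11 else digits
    region = AREA_CODE_REGIONS.get(digits[:3])
    return LOCAL_SPORTS_SOURCES.get(region, default)

print(local_source_for("(215) 555-0100"))  # 'Phillies/Eagles newswire'
```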
  • In some embodiments, the at least one feature of the electronic media content includes at least one ‘historical feature’ indicative of electronic media content of a previous multi-party conversation and/or an earlier time period in the conversation—for example, electronic media content whose age is at least, for example, 5 minutes, or 30 minutes, or one hour, or 12 hours, or one day, or several days, or a week, or several weeks.
  • In some embodiments, the at least one feature of the electronic media content includes at least one ‘deviation feature.’ Exemplary deviation features of the electronic media content of the multi-party conversation include but are not limited to:
  • a) historical deviation features—i.e. a feature of a given subject or person that changes temporally so that at a given time, the behavior of the feature differs from its previously-observed behavior (a code sketch of this type of feature follows item (f) below). Thus, in one example, a certain subject or individual usually speaks slowly, and at a later time, this behavior ‘deviates’ when the subject or individual speaks quickly. In another example, a typically soft-spoken individual speaks with a louder voice.
  • In another example, an individual who 3 months ago was observed (e.g. via electronic media content) to be of average or above-average weight is now obese. This individual may be served a Wikipedia link about weight-loss. In contrast, a user who is consistently obese may not be served the link in order not to “annoy” the user.
  • In another example, a person who is normally polite may become angry and rude—this may be an example of ‘user behavior features.’
  • b) inter-subject deviation features—for example, a ‘well-educated’ person associated with a group of lesser educated persons (for example, speaking together in the same multi-party conversation), or a ‘loud-spoken’ person associated with a group of ‘soft-spoken’ persons, or a ‘Southern-accented’ person associated with a group of persons with Boston accents, etc. If distinct conversations are recorded, then historical deviation features associated with a single conversation are referred to as intra-conversation deviation features, while historical deviation features associated with distinct conversations are referred to as inter-conversation deviation features.
  • c) voice-property deviation features—for example, an accent deviation feature, a voice pitch deviation feature, a voice loudness deviation feature, and/or a speech rate deviation feature. This may relate to user-group deviation features as well as historical deviation features.
  • d) physiological deviation features—for example, breathing rate deviation features and weight deviation features—this may relate to user-group deviation features as well as historical deviation features.
  • e) vocabulary or word-choice deviation features—for example, profanity deviation features indicating use of profanity—this may relate to user-group deviation features as well as historical deviation features.
  • f) person-versus-physical-location features—for example, a person with a Southern accent whose location is determined to be in a Northern city (e.g. Boston) might be provided with a hotel coupon.
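  • The sketch below illustrates the historical deviation feature of item (a): a conversation segment is flagged when its speech tempo departs sharply from the speaker's previously observed tempo. The stored tempo values and the two-standard-deviation threshold are illustrative assumptions.

```python
from statistics import mean, stdev

# A minimal sketch of a historical deviation feature based on speech tempo.
def tempo_deviation(history_wpm: list[float], current_wpm: float) -> float:
    """Return how many standard deviations the current tempo is from the history."""
    mu, sigma = mean(history_wpm), stdev(history_wpm)
    return (current_wpm - mu) / sigma if sigma else 0.0

# Hypothetical stored observations: a usually slow, soft-spoken speaker.
history = [110.0, 115.0, 108.0, 112.0]
print(tempo_deviation(history, 175.0) > 2.0)  # True: an unusually fast segment
```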
  • In some embodiments, the at least one feature of the electronic media content includes at least one ‘person-recognition feature.’ This may be useful, for example, for providing pertinent retrieved information targeted for a specific person. Thus, in one example, the person-recognition feature allows access to a database of person-specific data where the person-recognition feature functions, at least in part, as a ‘key’ of the database. In one example, the ‘data’ may be previously-provided data about the person, for example, demographic data or other data, that is provided in any manner, for example, derived from electronic media of a previous conversation, or in any other manner.
  • In some embodiments, this may obviate the need for users to explicitly provide account information and/or to log in to receive ‘personalized’ retrieved information. Thus, in one example, the user simply uses the service, and the user's voice is recognized from a voice-print. Once the system recognizes the specific user, it is possible to present retrieved information in accordance with previously-stored data describing preferences of the specific user.
  • Exemplary ‘person-recognition’ features include but are not limited to biometric features (for example, voice-print or facial features) or other person visual appearance features, for example, the presence or absence of a specific article of clothing.
  • It is noted that the possibility of recognizing a person via a ‘person-recognition’ feature does not rule out the possibility of using more ‘conventional’ techniques—for example, logins, passwords, PINs, etc.
  • In some embodiments, the at least one feature of the electronic media content includes at least one ‘person-influence feature.’ Thus, it is recognized that during certain conversations, certain individuals may have more influence than others—for example, in a conversation between a boss and an employee, the boss may have more influence and may function as a so-called gatekeeper. For example, if one party of the conversation makes a certain statement, and this statement appears to influence one or more other parties of the conversation, the ‘influencing statement’ may be assigned more importance. For example, if party ‘A’ says ‘we should spend more money on clothes’ and party ‘B’ responds by saying ‘I agree’ this could imbue party A's statement with additional importance, because it was an ‘influential statement.’
  • In one example, a user has several conversations in one day. The first conversation is with an “influential person” who may be “important”, for example, a client/boss to whom the user of a device shows deference. When the conversation with the “important” person begins, previous search results may be cleared from a display screen or scrolled down, and replaced with search results that relate to the conversation with the “important” person. Subsequently, the user may speak with a less “influential” person—for example, a child. In this example, during the second, subsequent conversation, previously-displayed retrieved information (for example, retrieved in accordance with the first conversation) is not replaced with information retrieved from the second conversation.
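  • The sketch below illustrates one possible person-influence feature: keywords of a statement that is immediately followed by another party's agreement receive extra weight. The agreement phrases and the boost factor are illustrative assumptions only.

```python
# A minimal sketch of a 'person-influence feature': an 'influential statement'
# (one that draws agreement from another party) has its keywords boosted.
AGREEMENT_PHRASES = ("i agree", "you're right", "good idea", "ok, let's do that")

def influence_weights(turns: list[tuple[str, str]], boost: float = 2.0) -> dict[str, float]:
    """turns: list of (speaker, utterance) pairs in conversation order."""
    weights: dict[str, float] = {}
    for i, (speaker, utterance) in enumerate(turns):
        followed_by_agreement = (
            i + 1 < len(turns)
            and turns[i + 1][0] != speaker
            and turns[i + 1][1].lower().startswith(AGREEMENT_PHRASES)
        )
        factor = boost if followed_by_agreement else 1.0
        for word in utterance.lower().split():
            weights[word] = max(weights.get(word, 0.0), factor)
    return weights

turns = [("A", "we should spend more money on clothes"), ("B", "I agree")]
print(influence_weights(turns)["clothes"])  # 2.0: the influential statement is boosted
```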
  • In some embodiments, the retrieval and/or presentation of information includes presenting information to a first individual (for example, person ‘A’) in accordance with one or more features of media content from a second individual different from the first individual (for example, person ‘B’).
  • Apparatus for Retrieving Information
  • Some embodiments of the present invention provide apparatus for retrieving and presenting information. The apparatus may be operative to implement any method or any step of any method disclosed herein. The apparatus may be implemented using any combination of software and/or hardware.
  • The data storage may be implemented using any combination of volatile and/or non-volatile memory, and may reside in a single device or reside on a plurality of devices either locally or over a wide area.
  • The aforementioned apparatus may be provided as a single client device (for example, as a handset or laptop or desktop configured to present retrieved information in accordance with the electronic media content). In this example, the ‘data storage’ is volatile and/or non-volatile memory of the client device—for example, where outgoing and incoming content is digitally stored in the client device or a peripheral storage device of the client device.
  • Alternatively or additionally, the apparatus may be distributed on a plurality of devices for example with a ‘client-server’ architecture.
  • These and further embodiments will be apparent from the detailed description and examples that follow.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • While the invention is described herein by way of example for several embodiments and illustrative drawings, those skilled in the art will recognize that the invention is not limited to the embodiments or drawings described. It should be understood that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the invention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present invention. As used throughout this application, the word “may” is used in a permissive sense (i.e., meaning “having the potential to”), rather than the mandatory sense (i.e. meaning “must”).
  • FIGS. 1A-1C describe exemplary use scenarios.
  • FIGS. 2A-2D, 4, and 5A-5C provide flow charts of exemplary techniques for locating, retrieving and/or presenting information related to electronic media content of a multi-party conversation.
  • FIG. 3 describes an exemplary technique for computing one or more features of electronic media content including voice content.
  • FIG. 6 provides a block diagram of an exemplary system for retrieving and presenting information in accordance with some embodiments of the present invention.
  • FIG. 7 describes an exemplary system for providing electronic media content of a multi-party conversation.
  • FIGS. 8-14 describe exemplary systems for computing various features.
  • DETAILED DESCRIPTION OF EMBODIMENTS
  • The present invention will now be described in terms of specific, example embodiments. It is to be understood that the invention is not limited to the example embodiments disclosed. It should also be understood that not every feature of the presently disclosed apparatus, device and computer-readable code for information retrieval and presentation is necessary to implement the invention as claimed in any particular one of the appended claims. Various elements and features of devices are described to fully enable the invention. It should also be understood that throughout this disclosure, where a process or method is shown or described, the steps of the method may be performed in any order or simultaneously, unless it is clear from the context that one step depends on another being performed first.
  • Embodiments of the present invention relate to a technique for retrieving and displaying information in accordance with the context and/or content of voice content including but not limited to voice content transmitted over a telecommunications network in the context of a multiparty conversation.
  • Certain examples related to this technique are now explained in terms of exemplary use scenarios. After presentation of the use scenarios, various embodiments of the present invention will be described with reference to flow-charts and block diagrams. It is noted that the use scenarios relate to the specific case where the retrieved information is presented ‘visually’ by the client device. In other examples, the information may be presented by audio means—for example, before, during or following a call or conversation.
  • Also, it is noted that the present use scenarios and many other examples relate to the case where the multi-party conversation is transmitted via a telecommunications network (e.g. circuit switched and/or packet switched). In other examples, two or more people are conversing ‘in the same room’ and the conversation is recorded by a single microphone or a plurality of microphones (and optionally one or more cameras) deployed ‘locally’ without any need for transmitting content of the conversation via a telecommunications network.
  • Use Scenario 1 Example of FIG. 1A
  • According to this scenario, a first user (i.e. ‘party 1’) of a “car phone” (i.e. a mobile phone mounted in a car, for example, in proximity of an onboard navigator system) converses with a second user (i.e. ‘party 2’) using VOIP software residing on the desktop, such as Skype® software.
  • In this example, at time t=t1, retrieved information is served to party 1 in accordance with content of the conversation. In the example of FIG. 1A, when Party 2 mentions “Millard Fillmore,” information about Millard Fillmore (for example, from a search engine or Wikipedia article) is retrieved and displayed on a client device associated with “party 1”—either the “small screen” of “Party 1”'s car-mounted cellphone or the “larger screen” of party 1's onboard navigator device.
  • It is noted that in the Example of FIG. 1A, there is no need for “Party 1” to provide any search query whatsoever—a conversation is monitored that is not directed to the entity doing the monitoring but rather, the words of Party 1 are directed exclusively to other co-conversationalist(s)—in this case Party 2, and the words of Party 2 are directed exclusively to Party 1.
  • In the example of FIG. 1A, Party 2 “knows” that Party 1 is driving and cannot key in a search query, for example into a standard Internet search engine. Thus, when Party 1 unexpectedly knows extensive information about Millard Fillmore (i.e. a rather exotic topic), Party 1 succeeds in surprising Party 2.
  • It is noted the decision to search on “Millard Fillmore” rather than “school” may be made using natural language processing techniques—for example, language-model based techniques discussed below.
  • Use Scenario 2 Example of FIG. 1B
  • In this example, party 1 is located in Cleveland and party 2 is located in Boston. Party 2 is driving in a region of the city where a building was recently on fire. They are discussing the building fire. In the example of FIG. 1B, after the word “fire” is mentioned, a news story about a fire is displayed on a screen of user 1. The fire is not a major fire, and at the time, a number of small fires are being handled in different cities throughout the United States. Thus, a certain amount of “disambiguation” is required in order to serve information about the “correct fire.”
  • In the example of FIG. 1B, it is possible to detect the location of party 2 (i.e. Boston) (for example, using a phone number or other technique) and to serve the “correct” local news story to the device of party 1.
  • Use Scenario 3 Example of FIG. 1C
  • In this example, party 1 proposes going to a Yankees game. Party 2 does not mention anything specific about the Yankees. Nevertheless, information about the Yankees (for example, an article about the history of the Yankees, or a news story about their latest game) is retrieved and served to the client terminal device of party 2. This is one example of information being retrieved and served (i.e. to the “cellphone” of party 2) in accordance with “incoming” (i.e. incoming to the “cellphone” client terminal device of Party 2) electronic media content of the multi-party conversation.
  • SOME BRIEF DEFINITIONS
  • As used herein, ‘providing’ of media or media content includes one or more of the following: (i) receiving the media content (for example, at a server cluster comprising at least one server, for example, operative to analyze the media content, and/or at a proxy); (ii) sending the media content; (iii) generating the media content (for example, carried out at a client device such as a cell phone and/or PC); (iv) intercepting the media content; and (v) handling media content, for example, on the client device, on a proxy or server.
  • As used herein, a ‘multi-party’ voice conversation includes two or more parties, for example, where each party communicates using a respective client device including but not limited to a desktop, laptop, cell-phone, and personal digital assistant (PDA).
  • In one example, the electronic media content from the multi-party conversation is provided from a single client device (for example, a single cell phone or desktop). In another example, the media from the multi-party conversation includes content from different client devices.
    Similarly, in one example, the electronic media content from the multi-party conversation is from a single speaker or a single user. Alternatively, in another example, the electronic media content from the multi-party conversation is from multiple speakers.
    The electronic media content may be provided as streaming content. For example, streaming audio (and optionally video) content may be intercepted, for example, as transmitted over a telecommunications network (for example, a packet switched or circuit switched network). Thus, in some embodiments, the conversation is monitored on an ongoing basis during a certain time period.
    Alternatively or additionally, the electronic media content is pre-stored content, for example, stored in any combination of volatile and non-volatile memory.
  • As used herein, ‘presenting of retrieved information in accordance with at least one feature’ includes one or more of the following:
  • i) configuring a client device (i.e. a screen of a client device) to display the retrieved information such that display of the client device displays the retrieved information in accordance with the feature of media content. This configuring may be accomplished, for example, by displaying the retrieved information using an email client and/or a web browser and/or any other client residing on the client device;
  • ii) sending or directing or targeting the retrieved information to a client device in accordance with the feature of the media content (for example, from a client to a server, via an email message, an SMS or any other method);
  • DETAILED DESCRIPTION OF BLOCK DIAGRAMS AND FLOW CHARTS
  • FIG. 2A refers to an exemplary technique for retrieving and presenting information in accordance with content of a multi-party conversation.
  • In step S109, electronic digital media content including spoken or voice content (e.g. of a multi-party audio conversation) is provided—e.g. received and/or intercepted and/or handled.
  • In step S111, one or more aspects of the electronic voice content (for example, content of a multi-party audio conversation) are analyzed, or context features are computed. In one example, the words of the conversation are extracted from the voice conversation and the words are analyzed, for example, for a presence of key phrases.
  • In another example, discussed further below, an accent of one or more parties to the conversation is detected. If, for example, one party has a ‘Texas accent’ then this increases a likelihood that the party will receive (for example, on her terminal such as a cellphone or desktop) information from a Texas-based online newspaper or magazine.
  • In another example, the multi-party conversation is a ‘video conversation’ (i.e. voice plus video). In a particular example, if a conversation participant is wearing, for example, a hat or jacket associated with a certain sports team (for example, a particular baseball team), and if that sports team is scheduled to play an “away game” in a different city, a local weather forecast or traffic forecast associated with the game may be presented either to the “fan” or to a co-conversationalist (for example, using a different client terminal device) who could then “impress” the “fan” with his knowledge.
  • In step S113, one or more operations are carried out to retrieve and present information in accordance with results of the analysis of step S111.
  • The information may be retrieved from any source, including but not limited to online search engines, news services (for example, newswires or “news sites” like www.cnn.com or www.nytimes.com), image or video banks, RSS feeds, weather or traffic forecasts, Youtube® clips, sports statistics, Digg, social editing sites, music banks, shopping sites such as Amazon, Del.icio.us, and blogs. The information source may be local (for example, the local file system of a desktop computer or PDA) and/or may be remote (for example, a remote “Internet” search engine accessible via the Internet).
  • Although advertisement information may be served together with the retrieved information, in many examples, the retrieved information includes information other than advertisements, such as: Wikipedia entries, entries from social networks (such as dating sites, myspace, LinkedIn, etc), news articles, blogs, video or audio clips, or just about any form of information.
  • FIG. 2B presents a flow-chart of a technique where outgoing and/or incoming content is monitored S411, and in accordance with the content, information is retrieved and presented S415. One example of how this is accomplished in accordance with “incoming content” was discussed with reference to FIG. 1C.
  • FIG. 2C provides a flow-chart wherein a terminal device is monitored S411 for an incoming and/or outgoing call with another client terminal device. In the event that an incoming and/or outgoing call or a “connection” is detected S415, information is retrieved in accordance with incoming and/or outgoing content of the multi-party conversation and presented.
  • It is known that a conversation can “flow” and in many conversations, multiple topics are discussed. FIG. 2D provides a flow chart of an exemplary technique where: (i) a first information retrieval and presentation is carried out in accordance with a first “batch” of content or words (S411 and S415); and (ii) when the topic changes or another event occurs S425 (for example, a speaker gets excited about something, raises his or her voice, looks up, repeats a phrase, etc—for example, beyond some threshold), information may be retrieved and presented (i.e. by displacing the previously-retrieved information from the first batch of electronic media content) in accordance with content S429 of a “second batch” of content or words.
  • In one example, the “earlier” information may be scrolled down. Alternatively or additionally, a “link” or interface element “pointing” to most recent content may be re-configured to, upon user invocation, provide the retrieved information for the “second batch” of content rather than the “first batch” of content, after, for example, the topic has changed and/or the user or conversation-participant has indicated a particular emotion or body language, etc.
  • Obtaining a Demographic Profile of a Conversation Participant from Audio and/or Video Data Relating to a Multi-Party Voice and Optionally Video Conversation (with reference to FIG. 3)
  • FIG. 3 provides exemplary types of features that are computed or assessed S111 when analyzing the electronic media content. These features include but are not limited to speech delivery features S151, video features S155, conversation topic parameters or features S159, key word(s) features S161, demographic parameters or features S163, health or physiological parameters or features S167, background features S169, localization parameters or features S175, influence features S175, history features S179, and deviation features S183.
  • Thus, in some embodiments, by analyzing and/or monitoring a multi-party conversation (i.e. voice and optionally video), it is possible to assess (i.e. determine and/or estimate) S163 if a conversation participant is a member of a certain demographic group from a current conversation and/or historical conversations. This information may then be used to more effectively retrieve and present “pertinent” information to the user and/or an associate of the user.
  • Relevant demographic groups include but are not limited to: (i) age; (ii) gender; (iii) educational level; (iv) household income; (v) ethnic group and/or national origin; (vi) medical condition.
  • (i) age/(ii) gender—in some embodiments, the age of a conversation participant is determined in accordance with a number of features, including but not limited to one or more of the following: speech content features and speech delivery features.
      • A) Speech content features—after converting voice content into text, the text may be analyzed for the presence of certain words or phrases. This may be predicated, for example, on the assumption that teenagers use certain slang or idioms unlikely to be used by older members of the population (and vice-versa).
      • B) Speech delivery features—in one example, one or more speech delivery features such as the voice pitch or speech rate (for example, measured in words/minute) of a child and/or adolescent may be different than the speech delivery features of a young adult or elderly person.
  • The skilled artisan is referred to, for example, US 20050286705, incorporated herein by reference in its entirety, which provides examples of certain techniques for extracting certain voice characteristics (e.g. language/dialect/accent, age group, gender).
  • In one example related to video conversations, the user's physical appearance can also be indicative of a user's age and/or gender. For example, gray hair may indicate an older person, facial hair may indicate a male, etc.
  • Once an age or gender of a conversation participant is assessed, it is possible to target retrieved information to the participant (or an associate thereof) accordingly.
  • (ii) educational level—in general, more educated people (i.e. college educated people) tend to use a different set of vocabulary words than less educated people.
  • Information retrieval and/or presentation can be customized using this demographic parameter as well. For example, if it is assumed that a conversationalist is college educated, then information sources appropriate for a college-educated reader may be selected or emphasized (for example, as in the “New York Times” example above).
  • (iv) ethnic group and/or national origin—this feature also may be assessed or determined using one or more of speech content features and speech delivery features.
  • (v) number of children per household—this may be observable from background ‘voices’ or noise or from a background image.
  • In one example, if background noise indicative of a presence of children is detected in the background (for example, from voice pitch or a baby crying), then “child-oriented” content (for example, a link to a Sesame Street clip) or “parent-oriented” content (for example, an article from Parenting magazine online) may be presented.
  • Thus, in one example, if two people are discussing movies, each on a respective cell phone, and a baby crying is detected in the background for the first “cell phone” then the first user may be served an article about popular movies for young children.
  • If the conversation then shifts to the topic of vacations, and a dog barking is detected in the background for the second “cell phone” then the second user on the second cell phone may be served an article about popular “pet-friendly” vacation destinations.
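  • As a rough illustration of the two preceding examples, the following sketch maps a detected background-sound label plus the current conversation topic to a content category. The rule table and label names are assumptions used only for illustration.

```python
# Illustrative sketch: background sound + topic -> per-participant content.
CONTENT_RULES = {
    ("movies", "baby_crying"): "popular movies for young children",
    ("vacations", "dog_barking"): "pet-friendly vacation destinations",
}

def select_content(topic: str, background_label: str) -> str:
    """Pick content for a participant from the topic and background sound."""
    return CONTENT_RULES.get((topic, background_label),
                             f"general article about {topic}")

# Example: the first cell-phone user in the scenario above
print(select_content("movies", "baby_crying"))
# -> popular movies for young children
```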
  • One example of ‘speech content features’ includes slang or idioms that tend to be used by a particular ethnic group or non-native English speakers whose mother tongue is a specific language (or who come from a certain area of the world).
  • One example of ‘speech delivery features’ relates to a speaker's accent. The skilled artisan is referred, for example, to US 2004/0096050, incorporated herein by reference in its entirety, and to US 2006/0067508, incorporated herein by reference in its entirety.
  • (vi) medical condition—In some embodiments, a user's medical condition (either temporary or chronic) may be assessed in accordance with one or more audio and/or video features.
  • In one example, breathing sounds may be analyzed, and breathing rate may be determined. This may be indicative of whether or not a person has some sort of respiratory ailment, and data from a medical database could be presented to the user.
  • Alternatively, breathing sounds may be used to assess user emotions and/or user interest in a topic.
  • Storing Biometric Data (for Example, Voice-Print Data) and Demographic Data (with Reference to FIG. 4)
  • Sometimes it may be convenient to store data about previous conversations and to associate this data with user account information. Thus, the system may determine from a first conversation (or set of conversations) specific data about a given user with a certain level of certainty.
  • Later, when the user engages in a second multi-party conversation, it may be advantageous to access the earlier-stored demographic data in order to provide to the user pertinent information. Thus, there is no need for the system to re-profile the given user.
  • In another example, the earlier demographic profile may be refined in a later conversation by gathering more ‘input data points.’
  • In some embodiments, the user may be averse to giving ‘account information’—for example, because there is a desire not to inconvenience the user.
  • Nevertheless, it may be advantageous to maintain a ‘voice print’ database which would allow identifying a given user from his or her ‘voice print.’
  • Recognizing an identity of a user from a voice print is known in the art—the skilled artisan is referred to, for example, US 2006/0188076; US 2005/0131706; US 2003/0125944; and US 2002/0152078, each of which is incorporated herein by reference in its entirety.
  • Thus, in step S211, content (i.e. voice content and optionally video content) of a multi-party conversation is analyzed and one or more biometric parameters or features (for example, a voice print or face ‘print’) are computed. The results of the analysis, and optionally demographic data, are stored and associated with a user identity and/or voice print data.
  • During a second conversation, the identity of the user is determined and/or the user is associated with the previous conversation using voice print data based on analysis of voice and/or video content S215. At this point, the previous demographic information of the user is available.
  • Optionally, the demographic profile is refined by analyzing the second conversation.
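  • A minimal sketch of this enroll-then-reidentify flow follows; the embedding vectors, threshold, and function names (enroll, identify) are hypothetical, and nearest-neighbour cosine matching is used here only as one plausible way to compare voice prints.

```python
# Illustrative sketch: store a voice-print vector with demographic data after
# a first conversation (step S211), then re-identify the speaker in a later
# conversation (step S215) by nearest-neighbour matching.
import math

voiceprint_db = []  # list of (voiceprint_vector, demographic_dict)

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def enroll(voiceprint, demographics):
    """Store biometric features together with assessed demographic data."""
    voiceprint_db.append((voiceprint, demographics))

def identify(voiceprint, threshold=0.85):
    """Match a new voice print against stored ones; return a stored profile."""
    best = max(voiceprint_db, key=lambda rec: cosine(rec[0], voiceprint),
               default=None)
    if best and cosine(best[0], voiceprint) >= threshold:
        return best[1]          # previously assessed demographic profile
    return None                 # unknown speaker; profile built from scratch

enroll([0.12, 0.88, 0.45], {"age_group": "adult", "gender": "female"})
print(identify([0.11, 0.90, 0.44]))   # close match -> returns stored profile
```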
  • Techniques for Retrieving and/or Presenting Information in Accordance with a Multi-Party Conversation
  • FIG. 5A provides a flow chart of an exemplary technique for retrieving and providing information. In the example of FIG. 5A, certain words are given “weights” in the information retrieval according to one or more features of a conversation participant. For example, if it is determined that a given conversation-participant is “dominant” in the conversation (i.e. either from a personality profile or from the interaction between conversation-participants), words spoken by this participant may be given a greater weight in information retrieval or search.
  • In another example, words spoken excitedly and/or with certain body language may be given greater weight.
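  • A simple sketch of such a weighting scheme is shown below; the field names and the additive boost are assumptions, not the specific weighting used by the disclosed system.

```python
# Illustrative sketch: boost query terms spoken by a dominant participant or
# spoken excitedly before submitting them for information retrieval.
def weight_terms(utterances):
    """utterances: list of dicts with 'speaker_dominance' (0..1),
    'excitement' (0..1) and 'words'. Returns a term -> weight mapping."""
    weights = {}
    for u in utterances:
        boost = 1.0 + u["speaker_dominance"] + u["excitement"]
        for w in u["words"]:
            weights[w] = weights.get(w, 0.0) + boost
    return weights

query_weights = weight_terms([
    {"speaker_dominance": 0.9, "excitement": 0.2, "words": ["ski", "trip"]},
    {"speaker_dominance": 0.1, "excitement": 0.8, "words": ["hotel"]},
])
print(query_weights)   # dominant/excited speech contributes larger weights
```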
  • FIG. 5B relates to a technique where term disambiguation S309 may be carried out in accordance with one or more features of a conversation participant. For example, if it is assessed that a person is an avid investor or computer enthusiast, then the word “apple” may be handled by retrieving information related to Apple Computer.
  • Another example relates to the word Madonna—this could refer either to the “Virgin Mary” or to a singer. If it is assessed that a conversation participant is a devout Catholic, it is more likely the former. If it is assessed that a conversation participant likes pop music (for example, from background sounds, age demographics, slang, etc.), then Madonna is more likely to refer to the singer.
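  • The following sketch illustrates one way such profile-based disambiguation could work; the sense inventory and interest labels are made-up assumptions mirroring the “apple” and “Madonna” examples above.

```python
# Illustrative sketch: pick the word sense whose associated interests best
# match the conversation participant's assessed profile.
SENSES = {
    "apple":   {"Apple Computer": {"investor", "computer_enthusiast"},
                "apple (fruit)":  {"cooking", "gardening"}},
    "madonna": {"Madonna (singer)": {"pop_music"},
                "Virgin Mary":      {"catholic"}},
}

def disambiguate(term, participant_interests):
    """Return the candidate sense with the largest interest overlap."""
    candidates = SENSES.get(term.lower(), {})
    if not candidates:
        return term
    return max(candidates,
               key=lambda sense: len(candidates[sense] & participant_interests))

print(disambiguate("apple", {"investor"}))        # -> Apple Computer
print(disambiguate("madonna", {"pop_music"}))     # -> Madonna (singer)
```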
  • In the exemplary technique of FIG. 5C, words are given greater “weight” or priority in accordance with body language and/or speech delivery features.
  • Discussion of Exemplary Apparatus
  • FIG. 6 provides a block diagram of an exemplary system 100 for retrieval and presentation of information in accordance with some embodiments of the present invention. The apparatus or system, or any component thereof, may reside at any location within a computer network (or single computer device), i.e. on the client terminal device 10, on a server or cluster of servers (not shown), a proxy, a gateway, etc. Any component may be implemented using any combination of hardware (for example, non-volatile memory, volatile memory, CPUs, computer devices, etc.) and/or software—for example, coded in any language including but not limited to machine language, assembler, C, C++, Java, C#, Perl, etc.
  • The exemplary system 100 may include an input 110 for receiving one or more digitized audio and/or visual waveforms, a speech recognition engine 154 (for converting a live or recorded speech signal to a sequence of words), one or more feature extractor(s) 118, a historical data storage 142, and a historical data storage updating engine 150.
  • Exemplary implementations of each of the aforementioned components are described below.
  • It is appreciated that not every component in FIG. 6 (or any other component described in any figure or in the text of the present disclosure) must be present in every embodiment. Any element in FIG. 6, and any element described in the present disclosure, may be implemented as any combination of software and/or hardware. Furthermore, any element in FIG. 6 and any element described in the present disclosure may either reside on or within a single computer device, or be distributed over a plurality of devices in a local or wide-area network.
  • Audio and/or Video Input 110
  • In some embodiments, the media input 110 for receiving a digitized waveform is a streaming input. This may be useful for ‘eavesdropping’ on a multi-party conversation in substantially real time. In some embodiments, ‘substantially real time’ refers to time with no more than a predetermined delay, for example, a delay of at most 15 seconds, or at most 1 minute, or at most 5 minutes, or at most 30 minutes, or at most 60 minutes.
  • In FIG. 7, a multi-party conversation is conducted using client devices or communication terminals 10 (i.e. N terminals, where N is greater than or equal to two) via the Internet 2. In one example, VOIP software such as Skype® software resides on each terminal 10.
  • In one example, ‘streaming media input’ 110 may reside as a ‘distributed component’ where an input for each party of the multi-party conversation resides on a respective client device 10. Alternatively or additionally, streaming media signal input 110 may reside at least in part ‘in the cloud’ (for example, at one or more servers deployed over a wide-area and/or publicly accessible network such as the Internet 20). Thus, according to this implementation, audio streaming signals and/or video streaming signals of the conversation may be intercepted as they are transmitted over the Internet.
  • In yet another example, input 110 does not necessarily receive or handle a streaming signal. In one example, stored digital audio and/or video waveforms may be provided in non-volatile memory (including but not limited to flash, magnetic and optical media) or in volatile memory.
  • It is also noted, with reference to FIG. 7, that the multiparty conversation is not required to be a VOIP conversation. In yet another example, two or more parties are speaking to each other in the same room, and this conversation is recorded (for example, using a single microphone, or more than one microphone). In this example, the system 100 may include a ‘voice-print’ identifier (not shown) for determining an identity of a speaking party (or for distinguishing between speech of more than one person).
  • In yet another example, at least one communication device is a cellular telephone communicating over a cellular network.
  • In yet another example, two or more parties may converse over a ‘traditional’ circuit-switched phone network, and the audio sounds may be streamed to information retrieval and presentation system 100 and/or provided as recorded digital media stored in volatile and/or non-volatile memory.
  • Feature Extractor(s) 118
  • FIG. 8 provides a block diagram of several exemplary feature extractor(s)—this is not intended to be comprehensive but merely describes a few feature extractor(s). These include: text feature extractor(s) 210 for computing one or more features of the words extracted by speech recognition engine 154 (i.e. features of the words spoken); speech delivery feature extractor(s) 220 for determining features of how words are spoken; speaker visual appearance feature extractor(s) 230 (i.e. provided in some embodiments where video as well as audio signals are analyzed); and background feature extractor(s) (i.e. relating to background sounds or noises and/or background images).
  • It is noted that the feature extractors may employ any technique for feature extraction of media content known in the art, including but not limited to heuristic techniques and/or ‘statistical AI’ and/or ‘data mining’ and/or ‘machine learning’ techniques where a training set is first provided to a classifier or feature calculation engine. The training may be supervised or unsupervised.
  • Exemplary techniques include but are not limited to tree techniques (for example, binary trees), regression techniques, Hidden Markov Models, Neural Networks, and meta-techniques such as boosting or bagging. In specific embodiments, a statistical model is created in accordance with previously collected “training” data. In some embodiments, a scoring system is created. In some embodiments, a voting model for combining more than one technique is used.
  • Appropriate statistical techniques are well known in the art and are described in a large number of well-known sources including, for example, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations by Ian H. Witten and Eibe Frank (Morgan Kaufmann, October 1999), the entirety of which is herein incorporated by reference.
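  • As a brief, non-limiting illustration of the “train a classifier on labelled features” approach described above, the sketch below uses scikit-learn (one possible toolkit, not named in the disclosure) with boosted trees; the feature vectors and labels are invented stand-ins.

```python
# Illustrative sketch: boosting (one of the listed meta-techniques) applied
# to hypothetical per-speaker features.
from sklearn.ensemble import GradientBoostingClassifier

# Each row: [mean_pitch_hz, words_per_minute, slang_hits]  (assumed features)
X_train = [[210.0, 160, 3], [120.0, 130, 0], [250.0, 170, 4], [110.0, 100, 0]]
y_train = ["younger", "older", "younger", "older"]   # assumed labels

clf = GradientBoostingClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)

print(clf.predict([[230.0, 150, 2]]))   # classify a new speaker's features
```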
  • It is noted that in exemplary embodiments a first feature may be determined in accordance with a different feature, thus facilitating ‘feature combining.’
  • In some embodiments, one or more feature extractors or calculation engines may be operative to effect one or more ‘classification operations’—e.g. determining a gender of a speaker, age range, ethnicity, income, and many other possible classification operations.
  • Each element described in FIG. 8 is described in further detail below.
  • Text Feature Extractor(s) 210
  • FIG. 9 provides a block diagram of exemplary text feature extractors. Thus, certain phrases or expressions spoken by a participant in a conversation may be identified by a phrase detector 260.
  • In one example, when a speaker uses a certain phrase, this may indicate a current desire or preference. For example, if a speaker says “I am quite hungry,” this may indicate that a food product ad should be sent to the speaker.
  • In another example, a speaker may use certain idioms that indicate general desire or preference rather than a desire at a specific moment. For example, a speaker may make a general statement regarding a preference for American cars, or a professing love for his children, or a distaste for a certain sport or activity. These phrases may be detected and stored as part of a speaker profile, for example, in historical data storage 142.
  • The speaker profile built from detecting these phrases, and optionally performing statistical analysis, may be useful for present or future provisioning of ads to the speaker or to another person associated with the speaker.
  • The phrase detector 260 may include, for example, a database of pre-determined words or phrases or regular expressions.
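  • A minimal sketch of such a phrase detector built from a database of pre-determined regular expressions follows; the patterns and labels are illustrative assumptions only.

```python
# Illustrative sketch: detect pre-determined phrases (cf. phrase detector 260).
import re

PHRASE_PATTERNS = {
    "hunger_now":    re.compile(r"\bI(?:'m| am) (?:quite |really )?hungry\b", re.I),
    "likes_us_cars": re.compile(r"\bI (?:love|prefer) American cars\b", re.I),
}

def detect_phrases(transcript: str):
    """Return the labels of all pre-determined phrases found in the text."""
    return [label for label, pat in PHRASE_PATTERNS.items()
            if pat.search(transcript)]

print(detect_phrases("Honestly, I am quite hungry right now."))
# -> ['hunger_now']
```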
  • In one example, it is recognized that the computational cost associated with analyzing text to determine the appearance of certain regular phrases (i.e. from a pre-determined set) may increase with the size of the set of phrases.
  • Thus, the exact set of phrases may be determined by various business considerations. In one example, certain sponsors may ‘purchase’ the right to include certain phrases relevant for the sponsor's product in the set of words or regular expressions.
  • In another example, the text feature extractor(s) 210 may be used to provide a demographic profile of a given speaker. For example, usage of certain phrases may be indicative of an ethnic group or a national origin of a given speaker. As will be described below, this may be determined using some sort of statistical model, or some sort of heuristics, or some sort of scoring system.
  • In some embodiments, it may be useful to analyze frequencies of words (or word combinations) in a given segment of conversation using a language model engine 256.
  • For example, it is recognized that more educated people tend to use a different set of vocabulary in their speech than less educated people. Thus, it is possible to prepare pre-determined conversation ‘training sets’ of more educated people and conversation ‘training sets’ of less educated people. For each training set, frequencies of various words may be computed. For each pre-determined conversation ‘training set,’ a language model of word (or word combination) frequencies may be constructed.
  • According to this example, when a segment of conversation is analyzed, it is possible (i.e. for a given speaker or speakers) to compare the frequencies of word usage in the analyzed segment of conversation, and to determine whether the frequency table more closely matches the training set of more educated people or less educated people, in order to obtain demographic data (i.e. an estimate of the speaker's educational level).
  • This principle could be applied using pre-determined ‘training sets’ for native English speakers vs. non-native English speakers, training sets for different ethnic groups, and training sets for people from different regions. This principle may also be used for different conversation ‘types.’ For example, conversations related to computer technologies would tend to provide an elevated frequency for one set of words, romantic conversations would tend to provide an elevated frequency for another set of words, etc. Thus, for different conversation types, or conversation topics, various training sets can be prepared. For a given segment of analyzed conversation, word frequencies (or word combination frequencies) can then be compared with the frequencies of one or more training sets.
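  • The sketch below illustrates the word-frequency comparison just described: a frequency profile is built for each training set and a conversation segment is assigned to the closest one. Cosine similarity is used here as one plausible measure, and the tiny ‘training sets’ are made-up placeholders.

```python
# Illustrative sketch: compare a segment's word frequencies against
# per-group language-model profiles.
from collections import Counter
import math

def freq_profile(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    common = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in common)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical, tiny 'training sets'
profiles = {
    "more_educated": freq_profile("consequently the analysis indicates furthermore"),
    "less_educated": freq_profile("yeah so like I dunno it was kinda good"),
}

def closest_profile(segment: str) -> str:
    seg = freq_profile(segment)
    return max(profiles, key=lambda name: cosine(profiles[name], seg))

print(closest_profile("so yeah it was like kinda fun I dunno"))
# -> less_educated
```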
  • The same principle described for word frequencies can also be applied to sentence structures—i.e. certain pre-determined demographic groups or conversation types may be associated with certain sentence structures. Thus, in some embodiments, a part of speech (POS) tagger 264 is provided.
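  • By way of illustration, the sketch below applies the same matching idea to sentence structure by comparing part-of-speech tag bigrams; the tag sequences are supplied by hand here (in practice a POS tagger such as element 264 would produce them), and the group profiles are invented.

```python
# Illustrative sketch: compare POS-tag bigram frequencies against per-group
# sentence-structure profiles.
from collections import Counter

def bigram_profile(tag_sequences):
    """Count POS-tag bigrams over a list of tagged sentences."""
    counts = Counter()
    for tags in tag_sequences:
        counts.update(zip(tags, tags[1:]))
    return counts

# Hypothetical 'training' tag sequences for two groups
group_profiles = {
    "group_a": bigram_profile([["PRP", "VBP", "RB", "JJ"]]),
    "group_b": bigram_profile([["UH", "PRP", "VBD", "JJ"]]),
}

def closest_group(tags):
    seg = bigram_profile([tags])
    return max(group_profiles,
               key=lambda g: sum((seg & group_profiles[g]).values()))

print(closest_group(["PRP", "VBP", "RB", "JJ"]))   # -> group_a
```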
  • A Discussion of FIGS. 10-15
  • FIG. 10 provides a block diagram of an exemplary system 220 for detecting one or more speech delivery features. This includes an accent detector 302, tone detector 306, speech tempo detector 310, and speech volume detector 314 (i.e. for detecting loudness or softness).
  • As with any feature detector or computation engine disclosed herein, speech delivery feature extractor 220 or any component thereof may be pre-trained with ‘training data’ from a training set.
  • FIG. 11 provides a block diagram of an exemplary system 230 for detecting speaker appearance features—i.e. for video media content for the case where the multi-party conversation includes both voice and video. This includes body gesture feature extractor(s) 352 and physical appearance feature extractor(s) 356.
  • FIG. 12 provides a block diagram of exemplary background feature extractor(s) 250. This includes (i) audio background feature extractor 402 for extracting various features of background sounds or noise, including but not limited to specific sounds or noises such as pet sounds, an indication of background talking, an ambient noise level, a stability of an ambient noise level, etc.; and (ii) visual background feature extractor 406 which may, for example, identify certain items or features in the room, for example, certain products or brands present in a room.
  • FIG. 13 provides a block diagram of additional feature extractors 118 for determining one or more features of the electronic media content of the conversations. Certain features may be ‘combined features’ or ‘derived features’ derived from one or more other features.
  • This includes a conversation harmony level classifier 452 (for example, determining if a conversation is friendly or unfriendly and to what extent), a deviation feature calculation engine 456, a feature engine for demographic feature(s) 460, a feature engine for physiological status 464, a feature engine for conversation participants' relation status 468 (for example, family members, business partners, friends, lovers, spouses, etc.), a conversation expected length classifier 472 (i.e. if the end of the conversation is expected within a ‘short’ period of time, the information retrieval and/or presentation may be carried out differently than for the situation where the end of the conversation is not expected within a short period of time), a conversation topic classifier 476, etc.
  • FIG. 14 provides a block diagram of exemplary demographic feature calculators or classifiers. These include gender classifier 502, ethnic group classifier 506, income level classifier 510, age classifier 514, national/regional origin classifier 518, tastes (for example, clothes and goods) classifier 522, educational level classifier 526, marital status classifier 530, job status classifier 534 (i.e. employed vs. unemployed, manager vs. employee, etc.), and religion classifier 538 (i.e. Jewish, Christian, Hindu, Muslim, etc.).
  • In one example related to retrieval and/or presentation of information in accordance with a demographic profile and related to religion classifier 538, a religion of a person is detected, for example, using key words, accent and/or speaker location. One example relates to a speaker who often speaks about Jewish topics, or who may often listen to Klezmer music or Yiddish music in the background. In one particular example, if the speaker is discussing a desire to cook dinner with a friend, certain recipes may be presented to the speaker—if the speaker is Jewish, recipes that include pork may be filtered out.
  • In another example, if the Jewish speaker is speaking with a friend about the need to find a spouse, personal ads (i.e. from a dating site) may be biased towards people who indicate an interest in Judaism.
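  • A short sketch of the recipe-filtering example follows; the constraint table and data structures are assumptions used only to illustrate filtering retrieved items against an inferred demographic attribute.

```python
# Illustrative sketch: drop retrieved recipes whose ingredients conflict with
# a dietary constraint inferred from the speaker's assessed religion.
DIETARY_EXCLUSIONS = {
    "jewish": {"pork", "shellfish"},
    "hindu":  {"beef"},
}

def filter_recipes(recipes, religion):
    """recipes: list of dicts with 'title' and 'ingredients' (set of str)."""
    excluded = DIETARY_EXCLUSIONS.get(religion.lower(), set())
    return [r for r in recipes if not (r["ingredients"] & excluded)]

recipes = [
    {"title": "Pork dumplings", "ingredients": {"pork", "flour"}},
    {"title": "Vegetable soup", "ingredients": {"carrot", "potato"}},
]
print([r["title"] for r in filter_recipes(recipes, "Jewish")])
# -> ['Vegetable soup']
```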
  • In the description and claims of the present application, each of the verbs “comprise,” “include” and “have,” and conjugates thereof, is used to indicate that the object or objects of the verb are not necessarily a complete listing of members, components, elements or parts of the subject or subjects of the verb.
  • All references cited herein are incorporated by reference in their entirety. Citation of a reference does not constitute an admission that the reference is prior art.
  • The articles “a” and “an” are used herein to refer to one or to more than one (i.e., to at least one) of the grammatical object of the article. By way of example, “an element” means one element or more than one element.
  • The term “including” is used herein to mean, and is used interchangeably with, the phrase “including but not limited” to.
  • The term “or” is used herein to mean, and is used interchangeably with, the term “and/or,” unless context clearly indicates otherwise.
  • The term “such as” is used herein to mean, and is used interchangeably, with the phrase “such as but not limited to”.
  • The present invention has been described using detailed descriptions of embodiments thereof that are provided by way of example and are not intended to limit the scope of the invention. The described embodiments comprise different features, not all of which are required in all embodiments of the invention. Some embodiments of the present invention utilize only some of the features or possible combinations of the features. Variations of embodiments of the present invention that are described and embodiments of the present invention comprising different combinations of features noted in the described embodiments will occur to persons of the art.

Claims (15)

1) A method of providing information-retrieval services, the method comprising:
a) monitoring a multi-party voice conversation not directed at the entity doing the monitoring; and
b) in accordance with content of said monitored voice conversation, retrieving and presenting information to at least one party of said multi-party voice conversation.
2) The method of claim 1 wherein said retrieving includes retrieving at least one of:
i) a social-network profile;
ii) a weather forecast;
iii) a traffic forecast;
iv) a Wikipedia entry;
v) a news article;
vi) an online forum entry;
vii) a blog entry;
viii) a social bookmarking web service entry;
ix) a music clip; and
x) a film clip.
3) The method of claim 1 wherein said includes assigning a keyword weight in accordance with a demographic parameter of a given party of said multi-party voice conversation estimated from electronic media of said multi-party conversation, said estimated demographic parameter being selected from the group consisting of:
i) an age parameter;
ii) a gender parameter; and
iii) an ethnicity parameter.
4) The method of claim 1 wherein said retrieving includes selecting or emphasizing an information-source from a plurality of candidate information-sources in accordance with a demographic parameter of a given party of said multi-party voice conversation estimated from electronic media of said multi-party conversation, said estimated demographic parameter being selected from the group consisting of:
i) an age parameter;
ii) a gender parameter; and
iii) an ethnicity parameter.
5) The method of claim 1 wherein said retrieving includes effecting a disambiguation in accordance with a demographic parameter of a given party of said multi-party voice conversation estimated from electronic media of said multi-party conversation, said estimated demographic parameter being selected from the group consisting of:
i) an age parameter;
ii) a gender parameter; and
iii) an ethnicity parameter.
6) The method of claim 1 wherein said includes assigning a keyword weight in accordance with a speech delivery feature of a given party of said multi-party voice conversation estimated from electronic media of said multi-party conversation, said speech delivery feature being selected from the group consisting of:
i) a loudness parameter;
ii) a speech tempo parameter; and
iii) an emotional outburst parameter.
7) The method of claim 1 wherein said retrieving includes selecting or emphasizing an information-source from a plurality of candidate information-sources in accordance with a geographic location of a given party of said multi-party voice conversation estimated from electronic media of said multi-party conversation.
8) The method of claim 1 wherein said retrieving includes selecting or emphasizing an information-source from a plurality of candidate information-sources in accordance with an accent feature of at least one given party of said multi-party voice conversation.
9) The method of claim 1 wherein said retrieving includes assigning a keyword weight in accordance with a demographic parameter of a given party of said multi-party voice conversation estimated from electronic media of said multi-party conversation, said estimated demographic parameter being selected from the group consisting of:
i) an age parameter;
ii) a gender parameter; and
iii) an ethnicity parameter.
10) The method of claim 1 wherein said information-presenting for a first set of words extracted from said multi-party conversation includes displacing earlier-presented retrieved information associated with a second, earlier set of words extracted from said multi-party conversation in accordance with relative speech delivery parameters of said first and second sets of extracted words, in accordance with a speech delivery feature being selected from the group consisting of:
i) a loudness parameter;
ii) a speech tempo parameter; and
iii) an emotional outburst parameter.
11) The method of claim 1 wherein said multi-party voice conversation is carried out between a plurality of client terminal devices communicating via a wide-area network, and for a given client device of said client device plurality:
i) said information retrieval is carried out for incoming content relative to said given client device; and
ii) said information presenting is on a display screen of said given client device.
12) A method of providing information-retrieval services, the method comprising:
a) monitoring a terminal device for incoming media content and outgoing media content of a multi-party conversation; and
b) in accordance with said incoming media content, retrieving information over a remote network and presenting said retrieved information on said monitored-terminal device.
13) The method of claim 1 wherein said retrieving includes sending content of said multi-party conversation to an Internet search engine, and said presenting includes presenting search results from said Internet search engine.
14) The method of claim 12 wherein said retrieving includes retrieving at least one of:
i) a social-network profile;
ii) a weather forecast;
iii) a traffic forecast;
iv) a Wikipedia entry;
v) a news article;
vi) an online forum entry;
vii) a blog entry;
viii) a social bookmarking web service entry;
ix) a music clip; and
x) a film clip.
15) A method of providing information-retrieval services, the method comprising:
a) monitoring a given terminal client device for an incoming or outgoing remote call; and
b) upon detecting said incoming or outgoing remote call, sending content of said detected incoming call or outgoing call over a wide-area network to a search engine; and
c) presenting search results from said search engine on said monitored terminal device.
US11/882,479 2006-08-03 2007-08-02 Automatic retrieval and presentation of information relevant to the context of a user's conversation Abandoned US20080240379A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US11/882,479 US20080240379A1 (en) 2006-08-03 2007-08-02 Automatic retrieval and presentation of information relevant to the context of a user's conversation

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US82127206P 2006-08-03 2006-08-03
US82432306P 2006-09-01 2006-09-01
US86277106P 2006-10-25 2006-10-25
US11/882,479 US20080240379A1 (en) 2006-08-03 2007-08-02 Automatic retrieval and presentation of information relevant to the context of a user's conversation

Publications (1)

Publication Number Publication Date
US20080240379A1 true US20080240379A1 (en) 2008-10-02

Family

ID=39794364

Family Applications (1)

Application Number Title Priority Date Filing Date
US11/882,479 Abandoned US20080240379A1 (en) 2006-08-03 2007-08-02 Automatic retrieval and presentation of information relevant to the context of a user's conversation

Country Status (1)

Country Link
US (1) US20080240379A1 (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030125944A1 (en) * 1999-07-12 2003-07-03 Robert C. Wohlsen Method and system for identifying a user by voice
US20020152078A1 (en) * 1999-10-25 2002-10-17 Matt Yuschik Voiceprint identification system
US20020062481A1 (en) * 2000-02-25 2002-05-23 Malcolm Slaney Method and system for selecting advertisements
US20030195801A1 (en) * 2000-10-12 2003-10-16 Tetsuo Takakura System and method for providing advertisement data with conversation data to users
US20040096050A1 (en) * 2002-11-19 2004-05-20 Das Sharmistha Sarkar Accent-based matching of a communicant with a call-center agent
US20050234779A1 (en) * 2003-11-17 2005-10-20 Leo Chiu System for dynamic AD selection and placement within a voice application accessed through an electronic information pace
US20050286705A1 (en) * 2004-06-16 2005-12-29 Matsushita Electric Industrial Co., Ltd. Intelligent call routing and call supervision method for call centers
US20060067508A1 (en) * 2004-09-30 2006-03-30 International Business Machines Corporation Methods and apparatus for processing foreign accent/language communications
US20060167747A1 (en) * 2005-01-25 2006-07-27 Microsoft Corporation Content-targeted advertising for interactive computer-based applications
US20060188855A1 (en) * 2005-01-26 2006-08-24 Aruze Corporation Gaming system and typing game apparatus
US20060188076A1 (en) * 2005-02-24 2006-08-24 Isenberg Neil E Technique for verifying identities of users of a communications service by voiceprints

US11087756B1 (en) 2018-04-20 2021-08-10 Facebook Technologies, Llc Auto-completion for multi-modal user input in assistant systems
US11086858B1 (en) 2018-04-20 2021-08-10 Facebook, Inc. Context-based utterance prediction for assistant systems
US20230186618A1 (en) 2018-04-20 2023-06-15 Meta Platforms, Inc. Generating Multi-Perspective Responses by Assistant Systems
US11093551B1 (en) 2018-04-20 2021-08-17 Facebook, Inc. Execution engine for compositional entity resolution for assistant systems
US11100179B1 (en) 2018-04-20 2021-08-24 Facebook, Inc. Content suggestions for content digests for assistant systems
US11115410B1 (en) 2018-04-20 2021-09-07 Facebook, Inc. Secure authentication for assistant systems
US10936346B2 (en) 2018-04-20 2021-03-02 Facebook, Inc. Processing multimodal user input for assistant systems
US10782986B2 (en) 2018-04-20 2020-09-22 Facebook, Inc. Assisting users with personalized and contextual communication content
US11715289B2 (en) 2018-04-20 2023-08-01 Meta Platforms, Inc. Generating multi-perspective responses by assistant systems
US11715042B1 (en) 2018-04-20 2023-08-01 Meta Platforms Technologies, Llc Interpretability of deep reinforcement learning models in assistant systems
US10963273B2 (en) 2018-04-20 2021-03-30 Facebook, Inc. Generating personalized content summaries for users
US11245646B1 (en) 2018-04-20 2022-02-08 Facebook, Inc. Predictive injection of conversation fillers for assistant systems
US11301521B1 (en) 2018-04-20 2022-04-12 Meta Platforms, Inc. Suggestions for fallback social contacts for assistant systems
US11727677B2 (en) 2018-04-20 2023-08-15 Meta Platforms Technologies, Llc Personalized gesture recognition for user interaction with assistant systems
US11308169B1 (en) 2018-04-20 2022-04-19 Meta Platforms, Inc. Generating multi-perspective responses by assistant systems
US11694429B2 (en) 2018-04-20 2023-07-04 Meta Platforms Technologies, Llc Auto-completion for gesture-input in assistant systems
US11908179B2 (en) 2018-04-20 2024-02-20 Meta Platforms, Inc. Suggestions for fallback social contacts for assistant systems
US11908181B2 (en) 2018-04-20 2024-02-20 Meta Platforms, Inc. Generating multi-perspective responses by assistant systems
US10855485B1 (en) 2018-04-20 2020-12-01 Facebook, Inc. Message-based device interactions for assistant systems
US10761866B2 (en) 2018-04-20 2020-09-01 Facebook, Inc. Intent identification for agent matching by assistant systems
US11886473B2 (en) 2018-04-20 2024-01-30 Meta Platforms, Inc. Intent identification for agent matching by assistant systems
US11368420B1 (en) 2018-04-20 2022-06-21 Facebook Technologies, Llc. Dialog state tracking for assistant systems
US10803050B1 (en) 2018-04-20 2020-10-13 Facebook, Inc. Resolving entities from multiple data sources for assistant systems
US10854206B1 (en) 2018-04-20 2020-12-01 Facebook, Inc. Identifying users through conversations for assistant systems
US10802848B2 (en) 2018-04-20 2020-10-13 Facebook Technologies, Llc Personalized gesture recognition for user interaction with assistant systems
EP3557501A1 (en) * 2018-04-20 2019-10-23 Facebook, Inc. Assisting users with personalized and contextual communication content
US10853103B2 (en) 2018-04-20 2020-12-01 Facebook, Inc. Contextual auto-completion for assistant systems
US10827024B1 (en) 2018-04-20 2020-11-03 Facebook, Inc. Realtime bandwidth-based communication for assistant systems
US11429649B2 (en) 2018-04-20 2022-08-30 Meta Platforms, Inc. Assisting users with efficient information sharing among social connections
USD931294S1 (en) 2018-06-22 2021-09-21 5 Health Inc. Display screen or portion thereof with a graphical user interface
US20190392926A1 (en) * 2018-06-22 2019-12-26 5 Health Inc. Methods and systems for providing and organizing medical information
US10896295B1 (en) 2018-08-21 2021-01-19 Facebook, Inc. Providing additional information for identified named-entities for assistant systems
US10949616B1 (en) 2018-08-21 2021-03-16 Facebook, Inc. Automatically detecting and storing entity information for assistant systems
US11442992B1 (en) 2019-06-28 2022-09-13 Meta Platforms Technologies, Llc Conversational reasoning with knowledge graph paths for assistant systems
US11657094B2 (en) 2019-06-28 2023-05-23 Meta Platforms Technologies, Llc Memory grounded conversational reasoning and question answering for assistant systems
US11894129B1 (en) 2019-07-03 2024-02-06 State Farm Mutual Automobile Insurance Company Senior living care coordination platforms
US11380439B2 (en) 2019-08-19 2022-07-05 State Farm Mutual Automobile Insurance Company Senior living engagement and care support platforms
US11908578B2 (en) 2019-08-19 2024-02-20 State Farm Mutual Automobile Insurance Company Senior living engagement and care support platforms
US11901071B2 (en) 2019-08-19 2024-02-13 State Farm Mutual Automobile Insurance Company Senior living engagement and care support platforms
US11923087B2 (en) 2019-08-19 2024-03-05 State Farm Mutual Automobile Insurance Company Senior living engagement and care support platforms
US11367527B1 (en) 2019-08-19 2022-06-21 State Farm Mutual Automobile Insurance Company Senior living engagement and care support platforms
US11393585B2 (en) 2019-08-19 2022-07-19 State Farm Mutual Automobile Insurance Company Senior living engagement and care support platforms
US11682489B2 (en) 2019-08-19 2023-06-20 State Farm Mutual Automobile Insurance Company Senior living engagement and care support platforms
US11923086B2 (en) 2019-08-19 2024-03-05 State Farm Mutual Automobile Insurance Company Senior living engagement and care support platforms
US11238239B2 (en) 2019-10-18 2022-02-01 Facebook Technologies, Llc In-call experience enhancement for assistant systems
US11861674B1 (en) 2019-10-18 2024-01-02 Meta Platforms Technologies, Llc Method, one or more computer-readable non-transitory storage media, and a system for generating comprehensive information for products of interest by assistant systems
US11694281B1 (en) 2019-10-18 2023-07-04 Meta Platforms, Inc. Personalized conversational recommendations by assistant systems
US11688021B2 (en) 2019-10-18 2023-06-27 Meta Platforms Technologies, Llc Suppressing reminders for assistant systems
US11699194B2 (en) 2019-10-18 2023-07-11 Meta Platforms Technologies, Llc User controlled task execution with task persistence for assistant systems
US11948563B1 (en) 2019-10-18 2024-04-02 Meta Platforms, Inc. Conversation summarization during user-control task execution for assistant systems
US11704745B2 (en) 2019-10-18 2023-07-18 Meta Platforms, Inc. Multimodal dialog state tracking and action prediction for assistant systems
US11669918B2 (en) 2019-10-18 2023-06-06 Meta Platforms Technologies, Llc Dialog session override policies for assistant systems
US20210117681A1 (en) 2019-10-18 2021-04-22 Facebook, Inc. Multimodal Dialog State Tracking and Action Prediction for Assistant Systems
US11636438B1 (en) 2019-10-18 2023-04-25 Meta Platforms Technologies, Llc Generating smart reminders by assistant systems
US11567788B1 (en) 2019-10-18 2023-01-31 Meta Platforms, Inc. Generating proactive reminders for assistant systems
US11308284B2 (en) 2019-10-18 2022-04-19 Facebook Technologies, Llc. Smart cameras enabled by assistant systems
US11314941B2 (en) 2019-10-18 2022-04-26 Facebook Technologies, Llc. On-device convolutional neural network models for assistant systems
US11688022B2 (en) 2019-10-18 2023-06-27 Meta Platforms, Inc. Semantic representations using structural ontology for assistant systems
US11341335B1 (en) 2019-10-18 2022-05-24 Facebook Technologies, Llc Dialog session override policies for assistant systems
US11403466B2 (en) 2019-10-18 2022-08-02 Facebook Technologies, Llc. Speech recognition accuracy with natural-language understanding based meta-speech systems for assistant systems
US11443120B2 (en) 2019-10-18 2022-09-13 Meta Platforms, Inc. Multimodal entity and coreference resolution for assistant systems
US11562744B1 (en) 2020-02-13 2023-01-24 Meta Platforms Technologies, Llc Stylizing text-to-speech (TTS) voice response for assistant systems
US11159767B1 (en) 2020-04-07 2021-10-26 Facebook Technologies, Llc Proactive in-call content recommendations for assistant systems
US11658835B2 (en) 2020-06-29 2023-05-23 Meta Platforms, Inc. Using a single request for multi-person calling in assistant systems
US11563706B2 (en) 2020-12-29 2023-01-24 Meta Platforms, Inc. Generating context-aware rendering of media contents for assistant systems
US11809480B1 (en) 2020-12-31 2023-11-07 Meta Platforms, Inc. Generating dynamic knowledge graph of media contents for assistant systems
US11935651B2 (en) 2021-01-19 2024-03-19 State Farm Mutual Automobile Insurance Company Alert systems for senior living engagement and care support platforms
US11688516B2 (en) 2021-01-19 2023-06-27 State Farm Mutual Automobile Insurance Company Alert systems for senior living engagement and care support platforms
US11861315B2 (en) 2021-04-21 2024-01-02 Meta Platforms, Inc. Continuous learning for natural-language understanding models for assistant systems
US11334900B1 (en) * 2021-09-07 2022-05-17 Instreamatic, Inc. Voice-based collection of statistical data

Similar Documents

Publication Publication Date Title
US20080240379A1 (en) Automatic retrieval and presentation of information relevant to the context of a user's conversation
US20070186165A1 (en) Method And Apparatus For Electronically Providing Advertisements
US9053096B2 (en) Language translation based on speaker-related information
CN105991847B (en) Call method and electronic equipment
CN104813311B (en) System and method for virtual agent recommendation for multiple persons
US9099087B2 (en) Methods and systems for obtaining language models for transcribing communications
US20080033826A1 (en) Personality-based and mood-based provisioning of advertisements
US20080059198A1 (en) Apparatus and method for detecting and reporting online predators
US20150348538A1 (en) Speech summary and action item generation
US10468052B2 (en) Method and device for providing information
US20130144619A1 (en) Enhanced voice conferencing
US8811638B2 (en) Audible assistance
US20150371663A1 (en) Personality-based intelligent personal assistant system and methods
US20120201362A1 (en) Posting to social networks by voice
US11074916B2 (en) Information processing system, and information processing method
CN111241822A (en) Method and device for emotion detection and relief in an input scenario
JP7207425B2 (en) Dialog device, dialog system and dialog program
US20210125610A1 (en) Ai-driven personal assistant with adaptive response generation
US20220231873A1 (en) System for facilitating comprehensive multilingual virtual or real-time meeting with real-time translation
US20230282207A1 (en) System and method for electronic communication
JP2007334732A (en) Network system and network information transmission/reception method
JP2017010374A (en) Business support information providing system and business support information providing method
CN114566187B (en) Method of operating a system comprising an electronic device, electronic device and system thereof
US20200335079A1 (en) Dialogue system and method for controlling the same
US20220172711A1 (en) System with speaker representation, electronic device and related methods

Legal Events

Date Code Title Description
AS Assignment

Owner name: PUDDING LTD, ISRAEL

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MAISLOS, ARIEL;MAISLOS, RUBEN;ARBEL, ERAN;REEL/FRAME:019708/0056

Effective date: 20070802

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION