EP1016000A1

EP1016000A1 - Method and apparatus for analyzing online user typing to determine or verify facts

Info

Publication number: EP1016000A1
Application number: EP97951483A
Authority: EP
Inventors: John W. Richardson; Robert T. Adams; Vaughn S. Iverson
Original assignee: Intel Corp
Current assignee: Intel Corp
Priority date: 1996-12-31
Filing date: 1997-12-01
Publication date: 2000-07-05
Also published as: WO1998029818A1; EP1016000A4; AU5511498A

Abstract

A method of providing real-time, augmented information to a chat room user about one or more other users in the chat room is provided. Information is received from another chat room user (405). The information received from the other chat room user is then analyzed to determine one or more analyzed guess attributes of the chat room user (445). The analyzed guess attributes may include, for example, the user's age, gender, or level of education. Upon determining the one or more analyzed guess attributes of the other chat room user, they are displayed to the local chat room user (455). According to other aspects of the invention, the identity of a chat room user can be verified and the likelihood that a particular text was written by a given person can be determined.

Description

METHOD AND APPARATUS FOR ANALYZING ONLINE USER TYPING TO DETERMINE OR VERIFY FACTS

RELATED APPLICATIONS

A copending United States patent application, <Attorney Docket No. P4118>, filed <filing date>, by John W. Richardson, et al., and titled "METHOD AND APPARATUS FOR MASQUERADING ONLINE," is hereby incorporated by reference.

FIELD OF THE INVENTION

The invention relates generally to the field of online communications in multi-user environments. More specifically, the invention relates to identifying attributes of a user based upon information available through the online communication session.

BACKGROUND OF THE INVENTION

Several approaches have been developed to keep young children away from objectionable material on the Internet, World Wide Web (WWW) and chat rooms. For example, parents can restrict access to certain newsgroups and chat areas by installing software on their personal computers to filter out objectionable material and sites. Products including SurfWatch and Cyber Patrol examine the content of a requested site and check it against a list of blocked sites before allowing the user to access the requested site. Further information about SurfWatch and Cyber Patrol can be found at http://www.surfwatch.com/ and http: //www.microsys.com/, respectively.

The above online solutions relate to the present application in that they attempt to analyze information that is prepared by humans and scan for keywords to filter information to the online world; however, still lacking is a method and apparatus that addresses the widespread practice of identity deception.

Since there's generally no identification required on the Internet beyond a self-assigned online name (username), people can be as anonymous as they wish. Essentially, a person's name or identity can be changed at will. In the online world, identity deception or masquerading can generally be described as supplying false information and/or employing a pseudonym or misleading icon to disguise yourself as something you are not. The most frequent type of masquerading appears to be males presenting themselves as females by creating female icons, called "avatars" or "props," or by choosing a female username. This type of masquerading is commonly referred to as gender-switching, gender swapping, or gender deception.

Masquerading is one of the controversial aspects of online chat rooms. Due to the anonymity and untraceability masquerading has become rampant in online chat rooms and the like. For example, men present themselves as women and children pretend to be adults.

Commercial internet service providers facilitate the practice of masquerading by allowing participants to identify their name, gender, and age as they wish. This allows people an opportunity to choose a different gender label, or choose no label at all. Without confronting the potential masquerader and directly inquiring of the user his or her gender or age, there is currently no reliable method of determining this information. It is therefore desirable to provide a method of analyzing a chat room user's input to provide feedback about that user. It is also desirable to provide a method of determining whether a chat room user is the same person that had been logged on from a prior session. Further, it is desirable to provide an estimation of the likelihood that a sample text was written by a person claiming to be the author. SUMMARY OF THE INVENTION

A method of providing real-time, augmented information to a chat room user about one or more other users in the chat room is disclosed. Information is received that has been transmitted by a user. The information is analyzed to determine one or more analyzed guess attributes of the user. Upon determining the one or more analyzed guess attributes of the user, they are displayed to the local user.

According to another aspect of the invention, the identity of a chat room user can be verified. One or more text messages are received from a user. Based upon the one or more received text messages, analyzed guess attributes of the user and confidence levels associated with each of the analyzed guess attributes are determined. The analyzed guess attributes are compared with a stored user profile that has been generated from one or more previous chat sessions involving the user to determine whether the user is the same person from a previous chat session.

According to yet another aspect of the invention, the likelihood that a particular text was written by a given person can be determined. A text is received in a computer readable form. A profile of the purported author, in computer readable form, is also received. Next, analyzed guess attributes of the true author are determined by analyzing the text in question. Then, the profile of the purported author is compared to analyzed guess attributes of the true author. Upon completion of the comparison, the results are displayed. BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:

Figure 1 is an example of a typical computer system upon which one embodiment of the present invention can be implemented.

Figure 2 is a high level data flow diagram illustrating the overall software architecture including the processes, data stores, and the flow of data among the processes according to one embodiment of the present invention.

Figure 3 is a data flow diagram illustrating the attribute analyzer of Figure 2 according to one embodiment of the present invention.

Figure 4 is a flow diagram illustrating a method of verifying claimed facts according to one embodiment of the present invention using the processes and data stores shown in Figure 2.

Figure 5 is a flow diagram illustrating a method of performing the metric processing of Figure 4 according to one embodiment of the present invention.

Figure 6 is a flow diagram illustrating a method of performing the background verification of Figure 4 according to one embodiment of the present invention. Figure 7 is a flow diagram illustrating a method of performing the determination of analyzed guess attributes and confidence levels of Figure 4 according to one embodiment of the present invention.

Figure 8 is a flow diagram illustrating a method of determining whether a chat room user is the same person from a previous chat session according to one embodiment of the present invention.

Figure 9 is a flow diagram illustrating a method of determining whether a chat room user is the same person from a previous chat session according to another embodiment of the present invention.

Figure 10 is a flow diagram illustrating a method of determining the likelihood a given text was written by the purported author according to one embodiment of the present invention.

Figure 11 is an exemplary user interface for providing augmented information to a chat room user according to one embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

A method and apparatus for analyzing information sent by users in an online chat room and providing real-time, augmented information to a local user about one or more of the users is described. Importantly, while most chat areas allow real-time communication among users, this is not an essential characteristic of a chat area for the purposes of this application. The terms "chat room" and "chat area" are used throughout this application to refer to any online environment that allows multi-user interaction. For example, Internet Relay Chat (IRC), multi-user dungeons, multi-user environment simulators (MU*s), habitats, GMUKS (graphical multi-user konversation), and even Internet newsgroups would fall within this definition of a chat room.

In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced without some of these specific details. In other instances, well-known structures and devices are shown in block diagram form. HARDWARE OVERVIEW

Referring to Figure 1, a computer system is shown as 100. The computer system 100 represents a computer system upon which the preferred embodiment of the present invention can be implemented. Computer system 100 comprises a bus or other communication means 101 for communicating information, and a processing means 102 coupled with bus 101 for processing information. Computer system 100 further comprises a random access memory (RAM) or other dynamic storage device 104 (referred to as main memory), coupled to bus 101 for storing information and instructions to be executed by processor 102. Main memory 104 also may be used for storing temporary variables or other intermediate information during execution of instructions by processor 102. Computer system 100 also comprises a read only memory (ROM) and/or other static storage device 106 coupled to bus 101 for storing static information and instructions for processor 102. Data storage device 107 is coupled to bus 101 for storing information and instructions.

A data storage device 107 such as a magnetic disk or optical disc and its corresponding drive can be coupled to computer system 100. Computer system 100 can also be coupled via bus 101 to a display device 121, such as a cathode ray tube (CRT), for displaying information to a computer user. An alphanumeric input device 122, including alphanumeric and other keys, is typically coupled to bus 101 for communicating information and command selections to processor 102. Another type of user input device is cursor control 123, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 102 and for controlling cursor movement on display 121. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), which allows the device to specify positions in a plane. Alternatively, other input devices such as a stylus or pen can be used to interact with the display. A hard copy device 124 which may be used for printing instructions, data or other information on a medium such as paper, film, or similar types of media can be coupled to bus 201. A communication device 125 may also be coupled to bus 101 for use in accessing other computer systems. The communication device 125 may include any of a number of commercially available networking peripheral devices such as those used for coupling to an Ethernet, token ring, Internet, or wide area network. Further, an optical character recognition (OCR) device 126 may be coupled to bus 101 for scanning hard copy documents and converting them into computer readable form. Note that any or all of the components of the system illustrated in Figure 1 and associated hardware may be used in various embodiments of the present invention; however, it will be appreciated by those of ordinary skill in the art that any configuration of the system may be used for various purposes according to the particular implementation.

The present invention is related to the use of computer system 100 for determining characteristics of a person based upon a sample of text written by the person. In one embodiment, computer system 100 executes a program that analyzes the information sent from users in an online chat room and provides real-time, augmented information to a local user about one or more of the users.

Figure 2 is a high level data flow diagram illustrating the overall software architecture including the processes, data stores, and the flow of data among the processes according to one embodiment of the present invention. A character stream is received by the system and sent to a display 220 and to a lexical analysis process 205.

In one embodiment, the input bit stream is a stream of ASCII characters. The lexical analysis process 205 converts the character stream into tokens. Both the tokens and the character stream are input into a metric processing block 210.

It is appreciated that online communication is becoming more and more graphically oriented. Therefore, in an alternative embodiment of the present invention, the lexical analysis process 205 would also include a graphic analysis process to analyze graphic information received such as avatars, icons, or other images associated with or transmitted by a user.

The metric processing block produces metrics with reference to a spelling dictionary 255 and a grammar checker 260. The tokens are also supplied into a simple syntax analyzer 225. The simple syntax analyzer 225 identifies claimed facts in the token stream and outputs the claimed facts to the display 220, a background verification process 230, and a comparator against claimed facts 240. In this embodiment, the simple syntax analyzer 225 reads a database containing keywords and phrases to facilitate the identification of claimed facts. For example, the user's name would generally follow these introductory phrases "My name is" or "I am" or "I'm," the number between the phrases "I am" and "years old" would be the user's claimed age, the user's address could be expected to follow the phrase "I live at," etc.

An attribute analyzer 215 analyzes the tokens and metrics to produce analyzed guess attributes and confidence levels. The attribute analyzer 215 can also receive user feedback about the accuracy of its analyzed guess attributes to learn from its mistakes and context information to improve the analysis. The process of determining the analyzed guess attributes and confidence levels will be discussed further with reference to Figure 4.

The analyzed guess attributes and confidence levels produced by the attribute analyzer 215 and the claimed facts determined by the simple syntax analyzer 225 are input into the comparator against claimed facts 240. The analyzed guess attributes are compared to the claimed facts and the results of the comparison are sent to display 220.

The analyzed guess attributes and confidence levels are also used by an expert system to compare against other people 250. The expert system to compare against other people 250 determines if the analyzed guess attributes match a stored user profile corresponding to the user that produced the character steam. A past user profile database 245 maintains user profiles on users encountered in previous online chat sessions. Each user profile in the past user profile database 245 includes typical conversational constructs employed by the user (e.g., conversational openings, topics of discussion, and conversational closings), and other distinguishing characteristics such the grammar, spelling and typing metrics discussed below. Additionally, claimed facts from previous chat session could be maintained in the user profile. Further, as discussed above, since graphic objects are more frequently being employed in online communication, graphics associated with a given user can also stored in the past user profiles database 245.

Given a set of claimed facts, the background verification process 230 produces verified facts. The verified facts are compared against the claimed facts by the comparator against claimed facts 240 and the comparison results are displayed.

GENDER, AGE, EDUCATIONAL LEVEL, AND REGIONAL PREDICTORS

Before moving on to Figure 3, it is instructive at this point to discuss predictors that can be used by the attribute analyzer 215 to make guesses about a user's likely gender, age, educational level, and the region in which they grew up. Many of the speech patterns discussed below have been observed by linguists in the area of sociolinguistics. Sociolinguistics is the study of language and linguistic behavior as influenced by social and cultural factors. Linguists have found that a speaker's usage of language can often describe the age, gender, social class, and other attributes of the speaker.

A sample of text written by a particular person, therefore, can be analyzed to determine attributes of the person. For example, women are generally more polite than men. Politeness is defined by the concern for the feelings of others. Women typically use more polite speech than do men, characterized by a high frequency of honorifics, and softening devices such as hedges and questions.

Studies of "verbal turntaking" have illustrated that men have more frequent turns than women, speak for greater lengths of time during each turn, tend to interrupt another speaker's turn more often, and answer questions that weren't addressed to them.

In Japanese, the language differences between the genders is very apparent. For example, a male speaker would attach "-san" to the end of a person's name (e.g., John-san, Vaungh-san, or Robert-san), whereas a female speaker would attach "-chan" to the end of the name (e.g., John- chan, Vaungh-chan, or Robert-chan). Some linguists have suggested the existence of "genderlect" (i.e., language variations particular to gender) within Standard American English (SAE). Linguists use the term SAE to refer to the English that is taught in American schools, tested for on exams like the SAT, and used in job interviews. While genderlect is more subtle in SAE, according to the theory, indications of a speaker's gender are communicated through their unconscious choices of style and diction. The following are exemplary observations about speech patterns that have found much support in researchers and scholars of today: (1) Women more frequently use hedges. For example, phrases like "sort of," "kind of," "It seems like," "loosely speaking", and other phrases that tend to lessen the assertiveness a statement, soften the impact of a statement or phrases such as " I was sort-of-wondering," "maybe if....," "I think that...." etc. (2) Women tend to use superpolite / polite forms. For example, women more frequently use the following phrases "Would you mind...," "I'd appreciate it if...," "...if you don't mind." (3) Women frequently employ tag questions. For example, phrases like "aren't you?" or "isn't it?" at the end of a statement. (4) Men tend to use "I" more. (5) Men tend to cuss more. (6) Men tend to use more imperatives. (6) Women tend to deflect compliments. (7) Women tend to use more terms of endearment. (8) To establish camaraderie, men tend to use sarcasm and put-downs jokingly, whereas women tend to take these "jokes" as personal insults. In contrast, women tend to establish camaraderie by trading personal details about their life. (9) Women tend to use indirect communication. (10) Women more frequently preface their ideas with conditionals like "I'm not saying we have to do this, but maybe we could..." "I'm just brainstorming here..." "I don't know if this will work, but..." "I'm no expert on this, but maybe..." "This is a really dumb question, but...." (11) Men use more obscenities. (12) Women use empty adjectives more often (e.g., divine, lovely, adorable, etc.) (13) Women tend to use what is considered the better, or prescriptive, form of grammar and pronunciation. Experts have attributed this to the fact that children tend to adopt the language of their primary care-giver, so women, being potential moms and representing the majority of teachers, may feel an unspoken responsibility to know and use so-called "good English." (14) Men paraphrase more often, while women more frequently use direct quotation. (15) Both men and women develop special lexicons. For example, women use more words and details for things like colors (e.g., teal instead of green, bone instead of white), whereas men use more words and details for things like sports.

There is no denying that there are cases in which men and women do not use all or even some of the patterns attributed to them. However, these and other observed patterns are still generally useful for generating variables to be tracked, If-then rules, and confidence levels that can be incorporated into the knowledge base of an expert system, for example.

In the same manner, variables, rules, metrics, and/or confidence levels can be generated for determining where a person grew up. For example, regional phrases and colloquialisms can be stored in a database and compared to the words used by a particular online user.

Similarly, formulas can be used to predict a person's age and educational level. These formulas can be derived from known formulas that measure readability of a text. Several different formulas are used in software packages to analyze text passages and rate the readability (e.g., the Flesch Reading Ease, Gunning Fog Index, and the Flesch-Kincaid Grade Level).

The Flesch Reading Ease formula produces lower scores for writing that is difficult to read and higher scores for text that is easy to read. The Flesch Reading Ease score is determined as follows: 206.835 - (1.015 (Average sentence length) + 846(Number of syllables per 100 words)). A text scoring 90 to 100 is very easy to read and rated at the 4th grade level. A score between 60 to 70 is considered standard and the corresponding text would be readable by those having the reading skills of a 7th to 8th grader. A text generating a score between 0 and 30 is considered very difficult to read and is recommended for college level readers.

The Gunning Fog Index also gives an approximate grade level a reader must have completed to understand the document using the following formula: .04((Average number of words per sentence) + (Number of words of 3 syllables or more)). The last approach is the Flesch-Kincaid Grade Level. The formula is: 0.39 (Average number of words per sentence) + 11.8 (average number of syllables per word)) - 15.59. A score between 6th and 10th grade is considered to be most effective for a general audience.

A correlation can be established between writers' educational levels and the grade level of the writing they produce. In one embodiment, this correlation can \tc used to determine guess educational levels. This method of determining guess education levels can be further refined by adding a database of expected vocabulary for particular age groups and educational levels to the calculation. Once a database of expected vocabulary for particular age groups and educational levels is established, the system can maintain metrics regarding the vocabulary usage of users that are encountered in chat rooms. If a person claims they have graduated from college, but uses words primarily from vocabulary identified with a high school education, the system should accordingly indicate that it is unlikely that the person is college educated. Table 1 illustrates exemplary multisyllabic words that would likely be used by more educated people and corresponding words that would tend to indicate a lower educational level. These and other examples, readily determined with the use of a thesaurus, can be used as a factor in measuring educational level.

Table 1

Figure 3 is a data flow diagram illustrating the attribute analyzer of Figure 2 according to one embodiment of the present invention. In this embodimeni, the attribute analyzer 215 includes a neural network 315, a metric analyzer 310, a linguistics expert system 355, and a weighting process 330. The speech patterns and variables discussed above can be incorporated into rules and metrics usable by the linguistics expert system 355 and the metric analyzer 310, respectively.

A neural network learns directly by interfacing with the domain, therefore, no rule-base needs to be generated. A "neural network" is a system that is constructed to imitate the intelligent human biological processes of learning, self-modification, and learning by making inferences. The neural network 315 analyzes tokens, and considers the context of the conversation, user feedback and metrics to produce analyzed guess attributes and confidence levels. Based upon user feedback regarding the accuracy of the analyzed guess attributes for a given user, the neural network 15 can adjust the weights used for particular neurode synapses in the synapse weights data store 360. Given enough time and experience or training, the neural network 315 can learn everything about the problem domain (attribute determination based on samples of text). It may well be able to learn what is presently not known by any of the experts.

The metric analyzer 310 produces a set of analyzed guess attributes and confidence levels based upon the metrics supplied by the metric processing block 210 and a metric database 325. The metric database 325 includes correlaiions between the metrics and attributes like those discussed above. Importantly, particular metrics may be better predictors for some attributes than other attributes and should be subdivided accordingly. For example, a metric that tracks references to a popular television series of a particular era (e.g., Gilligan's Island) is a better predictor of age than educational level or gender.

The attribute analyzer 215 also includes linguistics expert system 355. Generally, an "expert system" is a program that manifests some combination of concepts, procedures, and techniques derived from recent artificial intelligence (Al) research. These problem- solving systems were initially called "expert systems" to suggest that they functioned as effectively as human experts at their highly specialized tasks. Expert systems have been employed in fields as varied as medical diagnosis (e.g., MYCIN was developed at Stanford University to consult with physicians to determine if a patient had a potentially fatal infection like bacteremia or meningitis.) and computer configuration (e.g., XCON was developed to assist Digital Equipment Corporation with customer requests for custom built computers.). A range of implementation options are available for development of an expert system such as simply adding an interface and supplying rules to an existing expert system engine, using expert system development tools to build the system, or creating the system from scratch with a high level language.

The linguistics expert system 355 includes an inference engine 305 and a knowledge base 365. The inference engine 305 contains the inference and control strategies. It also includes various knowledge acquisition, explanation, and user interface subsystems. The knowledge base 365 consists of facts and heuristics useful for attribute determination. The knowledge can be in the form of examples, facts, rules, or objects. In this example, because the information available for the determination of attributes is primarily in the form of rules, the linguistic expert system's knowledge representation preferably would be a structured rule-base. That is, the knowledge base 365 would be arranged into separate rule-bases (e.g., a rule-base for educational level predictors 350, a rule-base for gender predictors 335, a rule-base for age predictors 340, and a rule-base for regional predictors 345).

In an alternative embodiment, the knowledge base 365 can also be arranged into separate rule-bases based upon the context of the conversation. The context can be manually input to the linguistics expert system 355 or the linguistics expert system 355 can be configured to automatically recognize the context based upon keywords that are be associated with a given context. Subsequent linguistic analysis based on the context can then disregard certain inferences that might otherwise have been made without reference to the context. For example, without knowing that the conversational environment was a chemistry tutorial chat room, a high level of education might be attributed to an uneducated user due to the high number of references to chemistry and related terms.

The linguistics expert system 355 also includes logic to identify rules in the knowledge base 365 that require additional information before they can be evaluated. When these rules are identified, the linguistics expert system 355 provides guidance to the user in the form of suggested questions.

The linguistics expert system 355 can be implemented as a rule- based learning system that is trainable or a traditional expert system. Traditional expert systems are dependent on experts to supply domain knowledge in the form of facts and relationships among the facts. The knowledge of these systems is bounded by what is currently known about the established domain. Therefore, when new expertise can no longer be found, the knowledge of this type of expert system stops growing. In contrast, a rule-based learning system can learn by interacting with the domain, and therefore is not dependent upon experts to supply additional domain knowledge. A rule-based learning system can modify its knowledge base based upon interaction with the domain and user feedback. In the preferred embodiment, the attribute analyzer 215 includes a rule- based learning system. A further advantage of using a rule-based learning system is the ability to query the system about the particular rules used to arrive at its conclusions, whereas this information is not available from a neural network .

Figure 4 is a flow diagram illustrating a method of verifying claimed facts according to one embodiment of the present invention using the processes and data stores shown in Figure 2. At step 405 information is received from another chat room user. In this embodiment, it is assumed information is communicated among users in the form of text messages. The characters of the text messages may be received in real-time as the other user is typing or the text messages may be received after the other user has completed and transmitted each individual message.

In any event, at step 410, lexical analysis is performed on the received text messages to tokenize the character stream. That is, the characters are grouped into tokens (e.g., words, punctuation, white space, numbers, etc.).

Metrics are also used to improve attribute determination. Clues about the other user's gender, age, educational level, and other characteristics can be found by studying the character stream and tokens. For example, a user that falsely claims to have an advanced degree in English mighi Ix; betrayed by poor grammar and spelling. In this embodiment, one or more metrics are tracked based upon the text messages received from the other user in the metric processing step, 415.

At step 420, a simple syntax analysis is performed. The simple syntax analyzer scans the text messages for facts asserted by the other user. In one embodiment, this simple syntax analysis is performed using a database containing keywords and phrases. As discussed above with reference to Figure 2, the database could include the words and phrases that are likely to identify occurrences of claimed facts. Alternatively, the syntax analysis could be performed without reference to a keyword database by using other syntax analysis techniques to determine claimed facts asserted by the user (e.g., parsing methods typically employed by compilers). In any event, once the claimed facts have been determined in step 420, at step 425, the claimed facts are displayed to the local user. The claimed facts can be displayed in the chat window or a separate window on the display 121 so as not to interfere with the chat session.

Certain of the claimed facts will likely be capable of verification with reference to other sources of information (e.g., white pages and public records). At step 430, background verification is performed.

At step 435, the attribute analyzer 215 determines analyzed guess attributes and confidence levels for each of the analyzed guess attributes. In one embodiment, the analyzed guess attributes of the other user include: age, educational level, gender, and region. The inventors of the present invention envision many other attributes will be capable of determination as the field of sociolinguistics continues to mature and more information becomes available with respect to the relationship of speech patterns and other attributes.

Confidence levels can be provided for each of the analyzed guess attributes. The confidence levels indicating the attribute analyzer's confidence in its guess based upon the sample size and weight of the predictor, for example.

After the analyzed guess attributes and confidence levels have been determined, this supplemental information about the other user is displayed to the local user. The analyzed guess attributes can be displayed in a separate window, the chat window, or along side the corresponding claimed attributes to allow easy comparison by the local user.

Importantly, a measure of hysteresis should be included in the attribute deteriui nation to ensure that a brief lapse into poor grammar, for example, does not immediately affect the guessed educational level.

At step 445, the analyzed guess attributes are compared to corresponding claimed attributes. At step 450, the results of the comparison are analyzed to determined if the analyzed guess attributes produced by the attribute analyzer 215 differ from the claimed attributes provided by the other user. In one embodiment, the system can include selectable thresholds for each of the analyzed guess attributes. In this embodiment, step 450 would be configured to perceive no difference between a given claimed facts and the corresponding analyzed guess attribute unless a user supplied threshold is exceeded. For example, the local user might not want to be alerted about a difference between the analyzed guess age the claimed age unless the absolute value of the difference exceeded two years. Similarly, the local user might want to suppress alerts until the confidence level reached a sufficient level.

If a discrepancy is found and the user selected threshold has been exceeded, the local user is alerted at step 455, otherwise processing continues at step 405 so long as text messages are being received. Based upon the local user's preference, the local user can be alerted of the discrepancy between the analyzed guess attributes and the claimed attributes by an alarm including sound, color, or some other mechanism.

Figure 5 is a flow diagram illustrating a method of performing the metric processing of Figure 4 according to one embodiment of the present invention. In one embodiment, the step of performing metric processing 415 includes measuring typing speed 505, grammar analysis 510, and spelling analysis 515. At step 505, if the character stream is received in real-time, metrics can be measured and recorded regarding the other user's typing speed, typing rhythm, and other factors determined to be pertinent to user attribute analysis. Since different individuals have different typing patterns, metrics recorded with respect to a users typing patterns are especially useful for verifying the identity of a particular user. For example, given general typing rhythm information such as inter-key and inter-word time differences, a touch typist would be distinguishable from a "hunt-and-peck" typist.

At step 510, grammar metrics can be tracked by performing grammar analysis on the text messages. For example, metrics can be recorded for grammatical and punctuation errors, weak phrasing, slang, ambiguity, cliches, long or incomplete sentences, redundant phrases, incorrect verbs, and other problems. At step 515, spelling metrics can be tracked by performing spelling analysis on the text messages. Many other variables can be tracked depending upon the complexity and accuracy goals for the system (e.g., word choice, breadth of vocabulary, length of sentences, references to events or popular icons of a particular era, music, complexity of sentence structure, etc.)

Additionally, common spelling mistakes, common grammatical mistakes, use of "emoticons" (e.g., smileys), usage, choice and frequency of "Net abbreviations" such as "IMHO" ("In my humble opinion), "IRL" ("In real life"), "INAL" ("I'm not a lawyer") can be recorded in a user profile to facilitate user identification. This aspect of the present invention will be discussed further with respect to Figure 8.

Figure 6 is a flow diagram illustrating a method of performing the background verification of Figure 4 according to one embodiment of the present invention. In this embodiment, the step of performing background verification 430 further includes steps 605, 610, 615 and 620 for each claimed fact that is identified by step 420.

At step 605, a determination is made as to whether a given claimed fact is verifiable with an online resource. This determination can be made, for example, by comparing the claimed fact to a list of known reliable online resources.

If the claimed fact is verifiable, at step 610, information regarding the claimed fact is retrieved from the appropriate online resource. Preferably, this verification processing is performed in the background to reduce the effect on the ongoing chat session.

At step 615, it is determined whether there is a discrepancy between the claimed fact and the verified fact. If a discrepancy exists, the local user is alerted at step 620. Again, user selectable thresholds can be employed to allow the system to be more flexible. Once the local user has been alerted of the discrepancy, the background verification processing is complete and processing continues with step 435.

If it is determined, at step 605, that the claimed fact is not one that can be verified with reference to a reliable online resource, then the verification processing for this claimed fact is complete. Processing will continue with step 435.

If, at step 615, it is determined that there is no discrepancy between the claimed fact and the verified data (e.g., the verified data is consistent with the claimed fact or falls within the user defined threshold), then the background verification processing is complete with respect to this claimed fact and processing continues with step 435.

Figure 7 is a flow diagram illustrating a method of performing the determination of analyzed guess attributes and confidence levels of Figure 4 according to one embodiment of the present invention.

At step 705 linguistics analysis is performed by the linguistics expert system 355 to provide analyzed guess attributes and corresponding confidence levels.

At step 710, metric analysis is performed on the metrics by the metric analyzer 310. The metric analysis also produces a set of analyzed guess attributes and corresponding confidence levels. In one embodiment, the metric analysis simply involves looking up metrics in the metric database 325.

At step 715, the neural network 315 analyzes the text messages received from the other user to arrive at analyzed guess attributes and confidence levels. Upon determining the analyzed guess attributes and corresponding confidence levels, the neural network analysis outputs analyzed guess attributes and confidence levels.

The analyzed guess attributes and confidence levels output from steps 705, 710, and 715 are evaluated and weighted at step 720. Initially, less weight will be given to the neural network 315 than the metric analyzer 310 and the linguistics expert system 355, but as it becomes trained, the neural network will be given more weight. For example, initially the linguistics experts system's analyzed guess attributes might be weighted 50%, the metric analyzer's analyzed guess attributes 30%, and the neural network's analyzed guess attributes 20%. However, as the neural network's confidence levels increase as a result of sufficient training, its outputs can be relied upon more heavily. It is appreciated that not all of the analysis steps (705, 710, and 715) are required for a functional method of determining analyzed guess attributes. Any one of steps 705, 710, or 715 alone could provide a reasonable attribute analysis.

Figure 8 is a flow diagram illustrating a method of determining whether a chat room user is the same person from a previous chat session according to one embodiment of the present invention. Once the analysis processing described with respect to Figure 4 is in place, determining whether a given chat room user is someone from a previous session is easily achieved by maintaining and utilizing stored user profiles in the past user profiles database 245. In this embodiment, the stored user profiles preferably include at least the claimed facts from previous chat sessions for each user encountered, the analyzed guess attributes from previous encounters with the user, and statistical profiles of the user including the spelling, grammar, and other metrics described with respect to Figure 5. Of course, additional distinguishing characteristics would increase the certainty of the identification process such as avatars, icon, props or other graphics employed by the user.

Steps 405 through 440 are as described with respect to Figure 4. At step 840, the analyzed guess attributes and the claimed facts from the current chat session are compared against a stored user profile in the past user profiles 245. For example, the claimed facts in the current chat session are compared against the claimed facts from previous chat sessions and analyzed guess attributes are compared against claimed facts from the current and previous sessions. Step 845 determines whether a discrepancy exists between the stored user profile and the analyzed guess attributes and the claimed facts from the current session. If a discrepancy is found the local user is alerted, at step 850, otherwise processing continues at step 405 so long as text messages are being received. As above, based upon the local user's preference, the local user can be alerted of a discrepancy by an alarm mechanism of his/her choosing. Figure 9 is a flow diagram illustrating a method of determining whether a chat room user is the same person from a previous chat session according to another embodiment of the present invention. Research in discourse analysis has revealed predictable sequences and routines in human interaction. While speech may seem to be infinitely variable, it is not totally unpredictable. It has been recognized that a significant percentage of conversational language is highly routinized into prefabricated utierances. Some have concluded that an enormous amount of natural language is formulaic, automatic, and rehearsed.

At the most simple, abstract level a complete conversation can be said to have the following three components: an opening of the conversation; topic discussion; and a closing of the conversation. People tend to use a predictable routine in opening a conversation. The process of opening a conversation can typically be broken down further into the following elements: bid for attention; verbal salute; identification; personal inquiry; and Smalltalk. Topic discussion generally has a much less obvious structure than the opening and closing sections of a conversation making the task of identification difficult with reference to the structure of past and present topic discussions alone. However, given there is much nonoriginal material that is included in everyday conversations, a fair conclusion about what can be predicted about topic discussion is that it is often repetitive. As in the case of opening a conversation, people tend to follow set procedures for closing a conversation. As Schegloff and Sacks state it, a conversation "does not simply end, but is brought to a close" (see Schegloff, E. and Sacks, H. Opening up closings. Semiotica, 8 (1973), 289-327.). Basic elements identified in the process of closing a conversation include the following: transition signals; exchange of phatic remarks, and exchange of farewells.

Recognizing the predictability and repetitive nature of conversations allows a user to be identified by comparing a current conversational pattern to patterns maintained in a user profile. The following method exploits the predictability and repetitive nature of conversations to determine whether a particular user is the same person from one or more previous chat sessions. Steps 405 and 410 are as described with respect to Figure 4. At step 915, a simple syntax analysis is performed on the received text messages to identify conversational constructs (e.g., conversational opening elements, the topic of conversation, and conversational closing elements). At step 920, the identified conversational constructs in the current conversation are compared against a stored user profile. Preferably the stored user profile has been derived from several previous encounters with the user to make the determination more reliable. As discussed earlier, the comparison should also include a measure of hysteresis to ensure against a mismatch due to one out-of-character remark.

Of course, it is appreciated that a particular user might be identified with sufficient certainty without every aspect of the user's profile indicating a match. Further, a particular user might be the same person from a previous session even though a topic of conversation is unlikely for the person. The context of the conversation is helpful in this regard. For example, the topic of conversation should be given less weight as a predictor when the users are engaged in a special interest chat room as opposed to a ch;ιt room of a more general nature.

At step 925, it is determined whether the differences between the conversational constructs identified for the current chat session exceed a threshold so as to warrant an alert to the local user. If the differences are found to be substantial in step 925, the local user is alerted of the discrepancy at step 930. If the differences are not worthy of alerting the local user, then steps 405, 410, and 915 through 930 can be repeated as text messages continue to be received from other chat room users.

While lioth Figure 8 and Figure 9, illustrate exemplary methods for determining whether a given chat room user is the same person from a previous session based upon text input. The inventors of the present invention anticipate the bit stream will also include graphical objects such as avatars. If this is the case, the graphical objects in the bit stream can be compared to graphical objects associated with a given user from the past user profiles database 245 to further increase the confidence of the determination process. Figure 1 is a flow diagram illustrating a method of determining the likelihood a given text was written by the purported author according to one embodiment of the present invention.

This method could be used by a university, for example, to determine the likelihood that an essay was indeed written by the student that claimed to be the author.

At step 1 (X)5, a computer readable form of the text to be certified (e.g., an ASCII text file on diskette) is received by the system. If the text to be certified is in hard copy form, the text can be scanned into the system via the optical character recognition device 126 or it may be manually entered via the keyboard 122. In any event, an electronic copy is required before the analysis can be performed.

At step 1 10, a profile of the purported author of the text to be certified is received by the system. Again, the profile needs to be in a computer readable form. The profile may be generated by analyzing the writing of a pers >n over a significant period of time and storing observed characteristics as described with respect to Figure 2. Another option would be to generate the profile based upon several known samples of the individual's writing. Alternatively, the profile can be manually keyed into the system and should include accurate information regarding the purported author's gender, age, educational level, and where he/she grew up.

At step 1015, analyzed guess attributes of the true author and associated confidence levels are determined based upon the text supplied in step 1005. This determination is equivalent to the determination made in step 435.

At step 1 25 the analyzed guess attributes of the true author are compared to the purported author's profile supplied in step 1010. The differences, if any, between the analyzed guess attributes and the purported author's attributes are detected at step 1030.

If a discrepancy is found the operator is alerted as discussed above with respect to step 455, otherwise processing continues at step 1005 where the authorship of another text can be verified. Figure 1 1 is an exemplary user interface for providing augmented information to a chat room user according to one embodiment of the present invention. The user interface includes means for communicating information to the local chat room user and means for receiving feedback from the local chat room user such as graphical or text-based windows presented on display 220. In this example, the user interface includes a chat window 1 1 0, a fact window 1110, an alert window 1130, and a feedback window 1 140.

The chat window 1120 records and displays the chat room conversation. The fact window 1110 displays claimed facts, analyzed guesses, and confidence levels associated with the analyzed guesses for one or more users involved in the chat room conversation. As the conversation is taking place and as new information is discovered about the other chat room users, the simple syntax analyzer 225 and the attribute analyzer 215, update the fact window 1110. When the simple syntax analyzer 225 disc-overs a recognized factual assertion, the claimed fact is displayed and recorded in the fact window 1110. When the attribute analyzer 215 has enough information to produce analyzed guess attributes and confidence levels, this additional information is presented to the local chat room user by way of the fact window 1110.

The local chat room user can be immediately alerted of potential identity deception by way of an alert mechanism such as an audible tone or visual signal. In this embodiment, based upon tolerances selected by the local chat room user, text-based alerts are displayed in alert window 1130 when discrepancies are detected between the facts claimed by a particular chat room user and the analyzed guess attributes determined for that chat room user.

The feedback window 1140 allows the local chat room user to provide feedback to the attribute analyzer 215 regarding known attributes of the other chat room users. The local chat room user's feedback is available for learning systems of the attribute analyzer 215 to make better educated guesses in the future. In the foregoing specification, the invention has been described with reference to specific embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

Claims

CLALMSWhat is claimed is:

1. A method of providing real-time, augmented information to a chat room user about one or more other users in the chat room, the method comprising the computer implemented steps of: receiving information transmitted by a user; analyzing the information to determine one or more analyzed guess attributes of the user; and displaying the one or more analyzed guess attributes to a local user.

2. The met hod of claim 1 , the method further including the computer implemented steps of: recording claimed attributes that appear in the information received from the user, the claimed attributes being facts about the user; and comparing the one or more analyzed guess attributes against the claimed attributes to determine the trustworthiness of the claimed attributes.

3. The method of claim 2, wherein the claimed and analyzed guess attributes include one or more of the following attributes: the age of the user; the gender of the user; the educational level of the user; and the region in which the user grew up.

4. The met od of claim 2, wherein the information is in the form of a character stream, the method further including the computer implemented steps of: performing lexical analysis to group the character stream into tokens; and performing syntax analysis on the tokens to determine the claimed attributes.

5. The method of claim 2, wherein the step of analyzing the information to determine one or more analyzed guess attributes of the user further includes the computer implemented steps of: recording metrics based upon the information transmitted by the user; and determining confidence levels associated with each of the one or more analyzed guess attributes.

6. The method of claim 2, the method further including the computer implemented step of verifying one or more of the claimed attributes with online resources.

7. The method of claim 2, the method further including the computer implemented step of prompting the local user to ask the user for a particul r piece of data that will facilitate the step of analyzing the informal ion to determine one or more analyzed guess attributes of the user.

8. The method of claim 5, wherein the information is in the form of a character stream, and wherein the step of recording metrics about the information transmitted by the user further includes the computer implemented steps of:

measuring the user's typing speed based upon the character stream;performing grammar analysis to maintain grammar metrics for the user; and performing spelling analysis to maintain spelling metrics for the user.

9. The met hod of claim 5, the method further including the computer implemented steps of: perfo╧Çiiing linguistic analysis on the information transmitted by the user; perfo╧Çinng metric analysis based upon the metrics; performing neural network analysis on the information transmitted by the user; and evaluating and weighting the linguistic analysis, the metric analysis, and the neural network analysis to produce the one or more analyzed guess attributes and the confidence levels.

10. The met hod of claim 6, wherein the step of verifying one or more of the claimed attributes with online resources further includes the computer implemented steps of: determining that the one or more claimed attributes are in a predetermined set of verifiable facts; and confirming the one or more claimed attributes with an appropriate online resource.

11. The method of claim 2, the method further including the computer implemented steps of alerting the local user if a discrepancy is found between any of the one or more analyzed guess attributes and a corresponding claimed attribute.

12. A method of determining whether a user is the same person from a previous chat session, the method comprising the computer implemented steps of: receiving one or more text messages from the user; determining analyzed guess attributes of the user and confidence levels associated with each of the analyzed guess attributes based upon the one or more text messages received from the user;

comparing the analyzed guess attributes with a user profile, the user profile having been generated from one or more previous sessions involving the user; and notifying a local user of results obtained from the step of comparing.

13. A method of claim 12 wherein the step of determining analyzed guess attributes of the user and confidence levels associated with each of the analyzed guess attributes based upon the one or more text messages received from the user further includes the computer implemented steps of: recording metrics about the text messages received from the user; and perfo╧Çning metric analysis to determine a first set of analyzed guess attributes, the metric analysis based upon the metrics recorded in the step of recording.

14. A method of claim 12 wherein the step of determining analyzed guess attributes of the user and confidence levels associated with each of the analyzed guess attributes based upon the text messages received from the user further includes the computer implemented step ol performing linguistic analysis to determine a first set of analyzed guess attributes, the linguistic analysis based on the text messages transmitted by the user.

15. A metht >< I of claim 12 wherein the step of determining analyzed guess att ibutes of the user and confidence levels associated with each ol the analyzed guess attributes based upon the text messages received from the user further includes the computer implemented step of performing neural network analysis to determine a first set of analy/.ed guess attributes, the neural network analysis based on the text messages transmitted by the user.

16. The methcxi of claim 12, wherein the user profile and analyzed guess attributes include one or more of the following attributes: the age of the user; the gender of the user; the educational level of the user; and the region in which the user grew up.

17. A metlx x I of claim 14, the method further including the computer implemented steps of: recording metrics about the text messages received from the user; performing metric analysis to determine a second set of analyzed guess attributes, the metric analysis based upon the metrics rec orded in the step of recording; and evaluating and weighting the first set of analyzed guess attributes and the second set of analyzed guess attributes to produce llie one or more analyzed guess attributes and the confidence levels.

18. A metluxl of claim 14, the method further including the computer implemented steps of: perfo╧Çiiing neural network analysis to determine a second set of analyzed guess attributes, the neural network analysis based on the text messages transmitted by the user; and evaluating and weighting the first set of analyzed guess attributes and the second set of analyzed guess attributes to produce the one or more analyzed guess attributes and the confidence levels.

19. The method of claim 14, the method further including the computer implemented step of prompting the local user to ask the user for a particulai piece of data that will facilitate the step of determining analyzed guess attributes of the user.

20. A metlx H I of determining the likelihood a particular text was written by a given person, the method comprising the computer implemented steps of: receiving a text in a computer readable form, the text having a purported author; receiving a profile of the purported author in a computer readable form; based upon the text, determining analyzed guess attributes associated the text's true author; comparing the analyzed guess attributes determined in the determining step with the profile of the purported author; and display iug the results of the step of comparing to the local user.

21. The method of claim 20, the method further including computer implemented step of determining confidence levels associated with each ol the analyzed guess attributes.

22. The met hod of claim 20, wherein the profile of the purported author includes one or more of the following attributes: the purported author's age; the purported author's gender; and the purported author's educational level.

23. A metlu >< I of determining whether a user is the same person from a previous chat session, the method comprising the computer implemented steps of: receiving one or more text messages from the user; compai nig inlormation obtained from the one or more text messages with a user profile, the user profile having been generated from one or more previous sessions involving the user; and notifying a local user of results obtained from the step of comparing.

24. The method of claim 23, wherein the user profile includes informal ion regarding one or more of the following: a set of data indicative of the typical process of opening a conversation for the user; a set of typical topics of conversation for the user; and a set of data indicative of the typical process of closing a conversation for the use i

25. The method of claim 23, wherein the user profile includes one or more gi p ical objects typically associated with the user, the method I in ther including the computer implemented step of: receiving graphical information from the user; and comparing the gi aphical information to the one or more graphical objects.

26. The method of claim 25, wherein the one or more graphical objects include an avatar employed by the user to graphically represent him/herself in a chat room.

27. The method of claim 25, wherein the one or more graphical objects include an icon employed by the user.