WO2006076347A2 - System and method for recording network based voice and video media

Info

Publication number: WO2006076347A2
Application number: PCT/US2006/000798
Authority: WO
Inventors: Roland M. Chemtob; Thomas Soroka, Jr.; Peter B. Tilles; Stuart Barry Hart; Stuart William Leitch
Original assignee: Chemtob Roland M; Soroka Thomas Jr; Tilles Peter B; Stuart Barry Hart; Stuart William Leitch
Priority date: 2005-01-11
Filing date: 2006-01-11
Publication date: 2006-07-20
Also published as: WO2006076347A3

Abstract

Users provide contact information to the system. The system uses the contact information to contact the user and collect a digital sample of the user's speech/voice. The digital sample is stored as the user's registered voice signature. Device identification and/or location information may be obtained from the user's device. This information may be cross-checked against external databases for corroboration. Discrepancies are noted in a report that can be used for investigative purposes. The system stores the device information, voice signature and user provided information in associated fashion. The voice signature is used to verify a subsequently provided voice sample when a user wishes to record a communications session. Access to recording functionality is allowed if the voice signatures match. Recording is initiated provided that consent to recording is provided by one or more parties to the communications session. The confirmation of consent is recorded and stored by the system.

Description

SYSTEM AND METHOD FOR RECORDING NETWORK BASED VOICE AND VIDEO MEDIA

FIELD OF THE INVENTION

The present invention relates generally to a system and method for recording and storing network communications.

BACKGROUND OF THE INVENTION

Voice conversations (e.g. telephone calls) and video streams traverse communications networks in the form of digital, analog or other data streams. There is often a need or desire to record, store, retrieve and archive such data streams for legitimate purposes, and such acts may presently be accomplished using existing technology.

Unfortunately, there do not presently exist sufficient security measures to thwart malicious users who would perform such acts for illegitimate purposes. Such malicious users can also execute a variety of acts that compromise the security, privacy, and service delivery to legitimate end users.

U.S. Patent No. 5,995,824 to Whitfield discloses an exemplary recording system for recording segments of a conversation while talking on a cellular telephone. Various other U.S. patents relate to voice and telephone call recording. However, there is no adequate solution that provides authentication, continued verification of valid users, and multiple party consent, nor is there a solution for denying service to malicious users from a network based recording system.

Further, relevant prior art patents do not address the recording of video media streams that can take place between one or more parties involved in a video conference, video call, video instant messaging ("chat"), video download or video gaming session.

SUMMARY

The present invention provides for Authentication, Verification, Consent and Fraud Reporting for Recording Network Based Voice and Video Media. More specifically, the present invention provides a system of processes implemented with software on computing platforms that are connected into communications networks. The software implemented processes determine an end user's eligibility to record a media session. Eligibility is determined by ensuring that valid account-holding end users are requesting service from an authorized communications device. This is accomplished by collecting, storing and comparing unique device identification and location information from an end user's device through the use of a network based device location discovery protocol. Any suitable device discovery protocol may be used. Accordingly, the system authenticates new account registrants, verifies legitimate account holding end users, provides a "consent to record" mechanism from multiple parties, deters malicious users from compromising the security and privacy of valid users' accounts, and reports upon suspicious activity by binding communications signaling data with collected speech samples associated with the suspicious activity.

Thus, the system authenticates and verifies valid users of a communications network based Voice & Video (Media) Recording System. Various MRSs are well known in the art, and any suitable MRS may be used in conjunction with the present invention. Further, the system executes an automated consent mechanism upon activation from an end user and/or can be configured to execute based on originating and terminating communications consent laws. The system deters fraudulent use of a Media Recording Service through a combined process of end user identification, communication device discovery, speech analysis, automated audible announcements and tones, along with the reporting of suspicious activity. The suspicious activity reporting contains both logged activity data and associated speech samples in order to further aid in any investigation of malicious activity pertaining to a media recording system. Thus, the system is capable of authenticating registrants attempting to establish valid accounts, verifying that valid account holders are allowed to make call/media recordings, obtaining consent from multiple parties before recording is established, and preventing fraudulent abuse of a network based (voice and video) Media Recording Service. As an overview, a system in accordance with the present invention may include one or more of the following conceptual functional components: Authentication, Verification, Consent, and Fraudulent Activity Reporting.

The present invention fully associates the end user's registered personal information, their electronically stored voice sample and the device identifier of the device from which they are attempting to record, in order to establish a valid account on a Media Recording System.

The present invention compares an end user's registered personal information and the device identifier from the device from which they are attempting to record a call/session, against an external database of record to confirm or deny permission to record an ensuing call or media session.

The present invention compares the voice sample of an attempted Media Recording System user with the stored voice samples of the claimed valid account holder in order to confirm or deny that the attempted user is a valid account holder of the Media Recording System and to confirm or deny permission to record an ensuing call or media session. The voice signature may be used by law enforcement or other investigative authorities to identify a perpetrator.

The present invention will prevent a voice and/or video recording system from recording upon unsuccessful authentication as a result of failed and/or incomplete registration information or its entry.

The present invention will allow a voice and/or video recording system to proceed with recording upon successful verification that registered account holding users have appropriately identified themselves every time the recording service is invoked. The present invention will prevent a voice and/or video recording system from proceeding with recording upon verification that attempted users have failed to appropriately identify themselves every time the recording service is invoked.

The present invention will allow a media session to be recorded only upon one or more parties indicating their consent to record the media session. The present invention will prevent a media session from being recorded upon the failure of one or more parties to indicate their consent to record the media session.

The present invention will allow a media session to be recorded only upon one or fewer parties indicating their consent to record the media session, as a function of geographic and governmental consent laws.

The present invention will prevent a media session from being recorded upon one or fewer parties indicating their consent to record the media session, as a function of geographic and governmental consent laws.

Upon detection of malicious activity that includes, but is not limited to, masquerading as valid users, spoofing valid user registration and login information, interception of valid user recording streams, and any other flagrant fraudulent use of a communications network based recording system, the present invention will automatically compile and transmit reports of said malicious events through an interface that can be accessed and used by law enforcement. A report may be generated that will include the time of day, device identifier, person identity, and speech samples associated with suspicious activity.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will now be described by way of example with reference to the following drawings in which:

Figure 1 is a block diagram of a network system for recording network-based communications sessions, including the AVCF system;

Figure 2 is an architecture and information flow diagram that shows the relationships between the AVCF system's functions, their internal and external interfaces, as well as the information flow in and out of the system;

Figures 3 and 4 are flow diagrams illustrating an exemplary Authentication Process in accordance with the present invention;

Figure 5 is a flow diagram illustrating an exemplary Verification Process in accordance with the present invention;

Figure 6 is a flow diagram illustrating an exemplary Consent Process in accordance with the present invention;

Figure 7 is a flow diagram illustrating an exemplary alternative Consent Process in accordance with the present invention; and

Figure 8 is a block diagram showing an exemplary computer system within which various functionalities described herein can be fully or partially implemented.

DETAILED DESCRIPTION

The present invention applies to all communication network based media recording systems that reside on a plurality of network technologies and network types that include but are not limited to TDM, VoIP, IP, ATM, Frame Relay, MPLS, Fixed-Mobile & Broadband Wireless, Ultra Wideband, Optical, Satellite, CATV, cable modem, xDSL, UMTS, OFDM, WiMAX, SMS, Video, Gaming, Analog & Digital Radio and all future local and/or wide area communications networks that are developed to support real-time voice conversations and video streaming sessions.

The present invention provides a set of tightly coupled processes that are implemented with software residing on a computing platform comprising conventional hardware, and referred to herein as the Authentication, Verification, Consent and Fraud Prevention System ("AVCF System") or the system 100, as best shown in Figures 1 and 2. The system 100 includes a processor/CPU, a memory operably connected to the processor, and instructions stored in the memory and executable by the processor to carry out the method and functions described herein.

The system 100 also provides control communications with an existing Voice/Video (Media) recording system referred to as the Media Recording System (MRS) 110, control communications with external database systems 112, 114, 116 via Internet Protocol or other networks, control communications protocols with communications devices such as phones 120a, computers 120b, video cameras, PDAs 120c, etc. via a multitude of signaling and media networks commonly referred to as Public Internet networks 130, Public Switched Telephone Networks, and wireless networks, as well as privately managed communications networks that contain a plurality of technologies, collectively 140, as shown in Figures 1-7.

Hardware and software technologies for implementing such communications are generally known in the art, and are modifiable in a straightforward manner for operation in accordance with the present invention.

The present invention involves a combination of the physical system, the processes of authentication, verification, multi-party consent, reporting of fraud, and communicating with external (prior art) systems via publicly accessible networks. The system connects to a physical communications network that supports a plurality of communications protocols for control signaling, data, and transport of multi-media content. Using communications signaling protocols and database query/response protocols, the system communicates with external databases that are themselves connected to physical networks that can be sub-networks of the public Internet or privately managed networks.

Using Internet communications protocols and computer security protocols, the system communicates with end users' personal computers that are themselves connected to physical networks, which are also sub-networks of the public Internet. These end users include new Media Recording service registrants, valid Media Recording account holders, and any computer/Internet user who accesses the system web interface. Using a command signaling protocol and computer security protocols, the system communicates directly with a Media Recording System that may be connected to the public Internet at a location that is remote from the system, or via a direct local communications network connection if the Media Recording System physically resides at the same location as the system. The system connects to a physical communications network that supports telephony communications protocols for call control signaling, call status, and media content. This physical communications network is part of the public switched wire-line/wireless/VoIP/Internet telephone network.

Using wire-line, wireless, and IP telephony communications protocols and call control protocols, the system communicates with end users' telephony capable devices and/or personal computers that are themselves connected to physical networks that may also be part of the public switched wire-line/wireless/VoIP/Internet telephone network. These end users include new Media Recording service registrants, valid Media Recording account holders, and any telephony/computer/Internet user who accesses the system telephony interface.

Figure 2 is an architecture and information flow diagram that shows the relationships between the AVCF system's functions, their internal and external interfaces, as well as the information flow in and out of the system. Figure 2 also shows the relationship between the individual processes and the storage function of the system.

An end user's interaction with the AVCF system 100 typically begins with the Authentication process during which the user registers with and is enrolled as an authorized user of the system, as described hereinbelow, and as described in Figures 3 and 4.

AUTHENTICATION PROCESS

When a new user wishes to sign up for a Media Recording Service to record telephone calls and other communications sessions, the Authentication process of the AVCF system 100 in accordance with the present invention ensures that the user is who he/she claims to be. There are numerous examples of prior art for electronically authenticating identity. However, the present invention is novel in that it binds, i.e. stores in association, collected textual identification information with voice signature (i.e. a recorded voice sample or an analysis of a voice sample) information from the end user, and with device identification data received through a (prior art) network based device discovery protocol.

Referring now to Figures 3 and 4, an exemplary method of communication is described. As referred to therein, the Authentication process begins with a new user's interaction with the AVCF system 100. In the exemplary embodiment of Figures 3 and 4, the new user/registrant provides personal identification information to the system 100 via a website or other device interface, etc., as shown at step 1 of Figure 3. As part of this process, the system 100 may prompt the user for, collect and store one or more of a new registrant's Name, Address, Phone Number, Social Security Number, Email address, and/or other Internet addresses that may be used for communication (such as URLs, IP addresses, additional email addresses, etc.), credit card number, credit card expiration date, any applicable credit card security code and any other security question/answer/phrase required by the system operator through a textual keypad or graphical interface. The system further prompts for, collects, and stores a registrant's chosen Username and Password through a textual keypad or graphical interface.

This information may be entered by the user and gathered by the system 100 in a conventional manner. In this embodiment, the system 100 also gathers username and password information, as shown at steps 2 and 3. Accordingly, the system 100 receives, via a communications network, a signal communicating personal identification information relating to the user. The personal identification information includes contact information for communicating with the user via a communications network.

Further, in accordance with the present invention, the system 100 may notify the user that it will "call" (i.e. via a telephone or other voice interface) the user, via one or more of the sets of contact information provided, to complete the registration and Authentication Process, as shown at step 2. Accordingly, the system may automatically call one or more of the entered phone number(s), email address(es), network addresses and URL(s) to collect spoken speech samples from the user through each of the registered communications channels/devices, in order to complete the registration process. In other words, the system initiates a communications session with the user via the communications network using the contact information. These communications devices include, but are not limited to, home phone, work phone, cell phone, video conference phone, Internet Protocol (IP) Phone, computer, Personal Digital Assistant (PDA), personal computers, laptop computers, mobile entertainment devices and video game systems. Accordingly, voice signatures are obtained for various communications devices including VoIP phones, PSTN phones, PSTN phones connected to broadband service via an adaptor, cellular phones, cell phones with VoIP capabilities, mobile entertainment devices with voice communications capabilities, computers with VoIP software and the like. An individual may even be reached, and a voice sample obtained, by using an email address to initiate a phone call using an ENUM capability to bridge Internet addressing with PSTN phone numbers. This may include telephone calls and/or Internet Protocol (IP) session initiated calls to all of the communication devices/contact information that are identified by the new user during the registration process.

After the system initiates a call to the phone number, email/network address or URL that was entered by the registrant, it prompts the registrant to speak their name, address, and phone number of that communications device, email address, or network address(es) from which the recording service may be invoked (including Home, Office, Cell, remote etc.) and date of birth, etc. For example, the system 100 then dials the user's phone number and plays an audio script retrieved from a database requesting that the user speak their name, address, phone number, etc., as shown at step 4. Thus, the system requests from the user, via the communications session, a sample of the user's speech. The user answers the phone, etc. and speaks the requested identification information (e.g., name, address, etc.), as shown at step 5. Accordingly, the system 100 receives the user's speech sample via the communications network during the communications session. Optionally, during the communications session in which the user's voice sample is being collected, the system 100 polls or scans the communications device being presently used by the user to provide the voice sample to discover identification and/or location information relating to the device, as shown at step 6. A conventional device discovery protocol may be used for this purpose. The device discovery protocol lets devices notify the communications network and other network connected systems of their existence. Each device on each port stores information defining itself, and sends updates to its directly connected networks and systems as needed. By way of example, the device identification information may include, but is not limited to, Internet (IP) addresses, Medium Access Control (MAC) Addresses, Vendor specific identifiers, hardware manufacturing numbers, software and firmware revision numbers, and dates of manufacture.

By way of example, the device location information may include, but is not limited to, Internet network (IP) addresses, Latitude/Longitude coordinates, street address, network topology coordinates, network naming conventions associated with locations as well as any additional location information derived from the Global Positioning System (GPS). Device discovery protocols and techniques are well known in the art and are beyond the scope of the present invention. Any suitable technique may be used. In response, the user's communications device sends its identification and/or location information to the system 100 using the same device discovery protocol, as shown at step 7. In other words, the communications device transmits its identification and location information outward through its communications network. If the registrant is unable to answer the registered device when it is being called, an electronic message (e.g., voicemail, email) will be left with directions to call back into the claimed system, so the registrant's voice sample can eventually be collected before a valid account can be opened.

The system 100 records the user's spoken information as the user's voice sample and stores it in the system's database 102 (Figure 2) in association with the user's profile, as shown at step 8, for future use in a Verification Process, as discussed below. More specifically, the system digitally samples and stores this registrant's voice sample as their registered voice signature in system memory. This digital voice signature is used for future verification when the user subsequently wishes to record a telephone call, media stream or other communications session. In embodiments in which the user's device's identification and/or location information is also gathered, such information is also stored in the system's database 102 in association with the user's profile. Thus, the system stores in its memory a digital sampling of the user's speech sample in association with the personal identification information provided by the user.
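By way of illustration only, the stored association between personal identification information, voice samples and device information can be sketched as follows. This is a minimal sketch, not the claimed implementation: the class names (DeviceInfo, UserProfile, ProfileStore), the field names and the in-memory dictionary standing in for database 102 are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DeviceInfo:
    """Identification/location data reported by a device (hypothetical fields)."""
    ip_address: Optional[str] = None
    mac_address: Optional[str] = None
    location: Optional[str] = None


@dataclass
class UserProfile:
    """Personal information stored in association with voice and device data."""
    username: str
    contact_addresses: List[str]
    # Both maps are keyed by the contact address (phone number, URL, ...)
    # through which the sample and device data were collected.
    voice_samples: Dict[str, List[float]] = field(default_factory=dict)
    devices: Dict[str, DeviceInfo] = field(default_factory=dict)


class ProfileStore:
    """In-memory stand-in (an assumption) for the system's database 102."""

    def __init__(self) -> None:
        self._profiles: Dict[str, UserProfile] = {}

    def register(self, profile: UserProfile) -> None:
        self._profiles[profile.username] = profile

    def attach_sample(self, username: str, contact: str,
                      samples: List[float],
                      device: Optional[DeviceInfo] = None) -> None:
        """Bind a digitised speech sample (and optional device data) to a profile."""
        profile = self._profiles[username]
        if contact not in profile.contact_addresses:
            raise ValueError("contact address was not provided at registration")
        profile.voice_samples[contact] = list(samples)
        if device is not None:
            profile.devices[contact] = device

    def lookup(self, username: str) -> UserProfile:
        return self._profiles[username]
```

Keying both maps by contact address mirrors the description above, in which a sample may be collected through each registered channel and the device data is gathered during the same session.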

By way of further detail, the system 100 digitally samples the registrant's voice, and converts the time sampled voice data into a voice frequency signature by executing a Fast Fourier Transform (FFT) algorithm on the digital voice sample data, digitally processing the digital sampling of the user's speech to obtain a voice signature. The results of the FFT execution on the new registrant's spoken voice samples will yield a digital voice signature that is stored in an account holder voice signature database. In this manner, the system digitally processes the digital sampling of the user's speech to obtain a voice signature. The Verification Process subsequently queries this voice signature database when an attempt is made to have a call or session recorded from the corresponding legitimate account holder. The stored voice signature data is preferably attached to and associated with the personal identification information as well as device identifier information of the registrant. Preferably, a pre-recorded and stored audio announcement will be played to the user via the telephone, etc. to warn against malicious activity, e.g. by informing the user that it is unlawful to maliciously use the recording service for purposes other than its legal intent, and that violators will be prosecuted to the fullest extent of state, federal and international laws. The announcement preferably also makes it known that the user's voice sample will be part of valid registration, and will be maintained in a report used for investigation into unlawful use of this service if fraud is suspected, as shown at step 9. Another audio announcement may be played to inform the registrant that it will activate the Media Recording Service upon successful authentication that the entered credit card information, address, and phone numbers/addresses belong to the named registrant. The call is then terminated.
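The conversion of time-sampled voice data into a frequency-domain signature described above can be illustrated with a short sketch. The text specifies only that an FFT is executed on the digital voice sample; the reduction of the spectrum to normalised energy per frequency band, the function names and the default of eight bands are assumptions chosen to keep the example small.

```python
import cmath
from typing import List, Sequence


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n <= 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out: List[complex] = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def voice_signature(samples: Sequence[float], bands: int = 8) -> List[float]:
    """Reduce one frame of samples to normalised energy per frequency band.

    The band-energy reduction is an illustrative assumption; the source
    only states that an FFT yields the stored voice signature.
    """
    if len(samples) < 2 * bands:
        raise ValueError("frame too short for the requested number of bands")
    spectrum = fft([complex(s) for s in samples])
    half = spectrum[: len(samples) // 2]      # keep positive frequencies only
    width = len(half) // bands
    energy = [
        sum(abs(c) ** 2 for c in half[b * width:(b + 1) * width])
        for b in range(bands)
    ]
    total = sum(energy) or 1.0                # avoid dividing by zero on silence
    return [e / total for e in energy]
```

Normalising by total energy makes the signature independent of speaking volume, which is one plausible reason to store a derived signature rather than raw samples.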

In the preferred exemplary embodiment of Figures 3 and 4, additional steps are performed during authentication. These additional steps heighten the level of confidence that the new user has properly identified himself. These additional steps include the cross-checking of information provided by the user during registration with information maintained in independent databases gathered and/or compiled independently. For example, the user's identification information and device identifier may be compared to one or more databases of record that contain personal identity data, and network device identity data. This may be performed by querying a database external to the system to obtain information relating to the user. Such databases, and/or access to such databases, are presently commercially available, e.g., on a pay-per-query or subscription basis. Exemplary databases of record include, but are not limited to, Banking and Credit Bureau databases, telephone number/street address verification databases, Municipality record databases, Service Provider subscriber databases, Real Estate records, and Internet Domain registries. In essence, these are independent sources of information that are referenced for corroborative purposes. For example, the system 100 may communicate with external systems and reference respective databases 112, 114, 116 to query the databases for types of information provided by the user during registration process, as shown at step 10. The external systems respond to the queries by transmitting to the system 100 appropriate information retrieved from their respective databases, as shown at step 11. Conventional Internet communications protocols, computer security technologies, and database query/response protocols may be used to facilitate communication between the system and external databases.

The system 100 then compares the information from the external databases with the personal and/or device identification information provided by the user during registration, as shown at step 12. A software implemented, rules-based analysis algorithm may be used to compare the responses from these external database queries and to determine the validity of the registrant's entered information by ensuring that the name, address, phone number, email address, URL and account numbers coincide and are corroborative of each other. In other words, the system determines whether the information obtained from the database corroborates the personal identification information provided by the user. An exact match may not be required for satisfactory corroboration, as will be appreciated by those skilled in the art.

Preferably, the system 100 will compare the account data that was entered into the system by the user against the data that was retrieved from the external databases of record and calculate a numerical authentication score value on a confidence scale of high to low levels. The greater the number of data matches, the higher degree of confidence the system will report. The system may be preprogrammed with a predefined threshold confidence value that must be exceeded to allow valid authentication by a new registrant.
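The scoring just described can be sketched as follows, under stated assumptions. The text says only that more data matches yield higher confidence and that a predefined threshold must be exceeded; the fraction-of-matching-fields formula, the function names, the whitespace and case normalisation, and the threshold value of 0.75 are all illustrative choices.

```python
from typing import Dict, Optional


def _normalise(value: str) -> str:
    """Ignore case and spacing differences (an exact match may not be required)."""
    return " ".join(value.lower().split())


def authentication_score(entered: Dict[str, str],
                         external: Dict[str, Optional[str]]) -> float:
    """Fraction of user-entered fields corroborated by the external record."""
    if not entered:
        return 0.0
    matches = sum(
        1
        for key, value in entered.items()
        if external.get(key) is not None
        and _normalise(external[key]) == _normalise(value)
    )
    return matches / len(entered)


def is_authenticated(score: float, threshold: float = 0.75) -> bool:
    """The threshold must be exceeded, so a score equal to it fails."""
    return score > threshold
```

For example, a registrant whose name, address and phone number are corroborated but whose email address is not would score 0.75 here and would not pass a threshold of 0.75, since the threshold must be exceeded rather than merely met.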

If information from the external databases does not agree with/corroborate the personal identification information that was provided by the user to the system during the registration process and/or the device identification and location information that was collected during the registration process, i.e. the confidence threshold value is not exceeded, then the authentication process fails. Optionally, the system 100 may notify the new user via the provided contact information and prompt the user to try again and/or contact a human customer service representative, as shown at step 13a. Accordingly, a new user account is not created. In other words, the user is not authenticated and is not authorized to use the system to record communications sessions. If suspicious activity is suspected, the system compiles a report that can be used for investigative purposes.

If, however, the information from the external databases is determined to corroborate, using conventional methods and technologies, the personal identification and/or device information, i.e., the confidence threshold value is exceeded, then a new user account is created for the user, as shown at step 13b. This may include the system's issuing of a control command to the MRS 110 to confirm that the user is now a valid account holder of the MRS and that subsequent attempts to use the MRS 110 by the user will be permitted, subject to execution of the system's Verification Process. Optionally, the system 100 may notify the user that the user account has been created and that the user can use the system to record communications sessions, as shown at step 13b. The individual's identity has thereby been authenticated, has been matched with voice samples, and the user is now a valid account holder and can begin using the Media Recording System.

Preferably, the system 100 records in data log form failed registration attempts, information discrepancies, and session parameters, then creates a report that identifies both the personal identification information of the failed registrant and/or the audio record (speech samples) associated with the failed attempts, as shown at step 14. Thus, the report indicates that there is no corroboration of the personal identification information. In any event, the audio record is preferably maintained for possible subsequent investigation of potential fraud, as discussed below.

In the embodiment shown, steps 4 - 12 are repeated to register additional devices, additional contact information, device information, etc. and to provide additional voice samples for each device, as shown at step 15. Preferably, multiple devices and/or contact information may be provided, but the system only collects a single voice sample via any one of the provided contact addresses. In such an embodiment, analysis of the single voice sample provides sufficient information to perform adequate authentication for all possible contact mediums.

VERIFICATION PROCESS

After completion of the Authentication Process, the user is an authorized user of the system, and may use the system from time to time as a Calling Party to record a telephone call, media datastream or other communications session with a Called Party, as desired. To initiate recording of any particular communication session, the communication session is routed through the AVCF system 100. Accordingly, the user must log in and pass a verification process to confirm that the user that wishes to record is in fact an authenticated user properly entitled to use the system.

As shown in Figure 5, the Verification Process is initiated by a Calling Party's initiation of a connection with the AVCF system, e.g. by dialing a specified telephone number or address, using a conventional phone, Internet phone, or a plurality of handheld devices. Upon network connection between the calling device and the AVCF system 100, the system 100 analyzes the Calling Party's telephone number and/or its communications address. The system 100 compares the device telephone number or address with the registered telephone numbers and/or addresses and confirms that this communications device has been registered with the system 100. The user may also be prompted to provide the username and password established during the authentication process. This identifies to the system the user for which the Calling Party is requesting verification. If the device is not registered, the system 100 will play an announcement that requests a Username, Password, and a Personal Identification Number (PIN), etc. from the Calling Party. After connecting to the system, the Calling Party may provide, e.g., via a telephone, a telephone number of a Called Party, as shown at step 16 of Figure 5. When the Calling Party is using a communication device other than a conventional telephone (e.g., a computer, Internet Phone, or one of a variety of handheld devices), the system 100 operates in a manner similar to that for a conventional telephone. The communications device establishes a connection through the system 100 via standard prior art electronic communication protocols, then the Calling Party provides a URL or other address of the Called Party for completion of the call. The system responds by playing a prerecorded audio script prompting the Calling Party to speak the Calling Party's name, originating telephone number, originating URL, email, web address, etc., as shown at step 17.

The Calling Party then speaks the requested information into the telephone or other voice-receiving interface to provide speech samples to the system 100, as shown at step 18. Thus, the system receives from the Calling Party, during a subsequent communications session, a new sample of the Calling Party's/user's speech.

The system 100 then digitally samples and stores the Calling Party's speech in its database, as shown at step 19. Further, the system 100 compares the Calling Party's speech sample with the user's speech sample that was collected for the Calling Party's registered device (phone number, IP address, web address, etc.) and for the user identified by the username and password provided by the Calling Party to the system 100. Thus, the new speech sample is compared to the old speech sample for a particular user. Further, the system 100 analyzes the Calling Party's speech sample to make the comparison. It is then determined whether the Calling Party's speech sample matches the user's speech sample provided during the Authentication Process. Conventional signal processing techniques may be used to analyze the speech samples and determine whether there is a match. For example, the system may digitally sample the attempted user's voice, and convert the time sampled voice data into a voice frequency signature by executing a Fast Fourier Transform (FFT) algorithm on the voice samples. The results of the FFT execution on the attempted user's spoken voice samples will yield a digital voice file. This new voice verification file is compared with the voice signature file that was created upon registration. As the attempted user's speech is being sampled, the system uses a speech recognition analysis algorithm, such as a conventional Hidden Markov Modeling (HMM) algorithm, to analyze the Calling Party's voice file against the stored voice signature of the authorized user.
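By way of illustration only, the frequency-signature comparison described above might be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not the system's actual method: it uses a naive discrete Fourier transform (an FFT computes the same result more quickly), it substitutes a simple cosine similarity for the HMM analysis, and the names `magnitude_spectrum`, `cosine_similarity` and `synthetic_tone` are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive O(n^2) discrete Fourier transform.

    Returns the magnitudes of the first n // 2 frequency bins, used here
    as a stand-in for a voice frequency signature.
    """
    n = len(samples)
    spectrum = []
    for k in range(n // 2):
        total = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        spectrum.append(abs(total))
    return spectrum


def cosine_similarity(a, b):
    """Similarity of two spectra: close to 1.0 for the same shape, near 0.0 for disjoint ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def synthetic_tone(freq_bin, n=64):
    """Hypothetical test signal: a pure sine wave occupying one frequency bin."""
    return [math.sin(2 * math.pi * freq_bin * t / n) for t in range(n)]


# Signature stored at registration versus a new sample from the Calling Party.
registered_signature = magnitude_spectrum(synthetic_tone(5))
new_signature = magnitude_spectrum(synthetic_tone(5))
similarity = cosine_similarity(registered_signature, new_signature)
```

Real speech would call for windowing, framing and a trained statistical model; the sketch only shows the step of reducing time samples to a frequency signature and comparing two such signatures.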

Optionally, a verification score reflecting the relative degree to which the signals are determined to match may be provided as a result of the comparison and analysis. Optionally, the system is configured with a verification score threshold parameter that may be set by system personnel to reflect a threshold above which there is considered to be a match, and below which there is considered to be no match. For example, if the result of the HMM analysis yields a voice signature match, the system will calculate a positive verification score for recognizing the speech of the attempted user. If the HMM voice recognition analysis results in a mismatch, the system will calculate a negative verification score indicating a mismatch. The thresholds for determining voice sample verification matches are set by the personnel operating the system and are based on business rules.
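The threshold parameter described above might be applied as in the following Python sketch. The names `VerificationResult` and `verify`, the 0.80 default and the treatment of a score exactly equal to the threshold are assumptions made for illustration; the description leaves these to the business rules of the operating personnel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerificationResult:
    score: float
    threshold: float

    @property
    def is_match(self) -> bool:
        # Assumption: a score exactly at the threshold counts as a match.
        return self.score >= self.threshold


def verify(score: float, threshold: float = 0.80) -> VerificationResult:
    """Wrap a comparison score with the operator-configured threshold."""
    return VerificationResult(score=score, threshold=threshold)


result = verify(0.91)
recording_may_proceed = result.is_match
```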

Following the voice analysis, the system will convert the spoken name, phone number/Email address to a set of textual data. The system will then compare the converted textual name and phone number/address data with the name and phone/address data that was stored into the system at the time of user registration. If the result of the textual comparison analysis yields a telephone number/address match, the system will calculate a positive score for recognizing the spoken data of the attempted user. If the result of the textual comparison analysis results in a mismatch, the system will calculate a negative score indicating a mismatch.
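The textual comparison described above might be sketched in Python as follows, assuming the speech has already been converted to text by a separate component. The normalization rules and the names `normalize_phone`, `normalize_name` and `textual_match` are hypothetical; the description does not specify how formatting differences are handled.

```python
import re


def normalize_phone(text):
    """Keep digits only, so that differently punctuated numbers compare equal."""
    return re.sub(r"\D", "", text)


def normalize_name(text):
    """Lower-case the name and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def textual_match(transcribed, registered):
    """Return True (positive score) only if both name and phone number agree."""
    return (
        normalize_name(transcribed["name"]) == normalize_name(registered["name"])
        and normalize_phone(transcribed["phone"]) == normalize_phone(registered["phone"])
    )


registered_record = {"name": "Jane Doe", "phone": "212-555-0100"}
spoken_record = {"name": "jane doe", "phone": "(212) 555 0100"}
is_positive = textual_match(spoken_record, registered_record)
```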

If either of the voice signature analysis (involving the Calling Party's spoken speech sample as compared against the user's speech sample provided during the Authentication Process) or the textual comparison analysis yields a negative score, indicating a mismatch, then the claimed system will not authorize recording of a media session using the MRS 110, as shown at step 20b. Optionally, the system 100 may notify the Calling Party that he has not been verified as an authorized user of the system, and will prompt the Calling Party to try again. Optionally, after multiple failed attempts, e.g. 3 or another predetermined number of failed attempts, the system 100 will disallow future attempts for verification originating from the same telephone number, internet address, etc., as shown at step 20b.
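The optional lockout after a predetermined number of failed attempts might be tracked as in the following Python sketch. The class name `AttemptTracker` and its methods are hypothetical, and the sketch deliberately leaves open whether a later successful verification resets the count, since the description does not say.

```python
class AttemptTracker:
    """Counts failed verifications per originating telephone number or address."""

    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self._failures = {}

    def record_failure(self, origin):
        """Register one failed attempt; return True if the origin is now blocked."""
        self._failures[origin] = self._failures.get(origin, 0) + 1
        return self.is_blocked(origin)

    def is_blocked(self, origin):
        """True once the origin has reached the predetermined number of failures."""
        return self._failures.get(origin, 0) >= self.max_failures


tracker = AttemptTracker(max_failures=3)
tracker.record_failure("+1-212-555-0100")
blocked = tracker.is_blocked("+1-212-555-0100")
```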

If both the voice signature analysis and the textual comparison analysis result in positive scores, indicating an attempt to use the Media Recording Service by a valid account holder, i.e., the Calling Party's spoken speech sample does match the user's speech sample provided during the Authentication Process, then the system permits the user to record a media session using the MRS 110, subject to any applicable Consent Process, as shown at step 20a. This involves the system's transmission of a control command signal to the MRS 110 to allow the recording of subsequent telephone calls and/or media datastreams, subject to the Consent Process.

Optionally, in certain embodiments, if the speech samples match, the system will further compare the device identifier for the communications device presently being used by the Calling Party with the device identifier or identifiers registered by the user during the Authentication Process, and will require a match before authorizing recording using the MRS 110. Optionally, if the device identifier does not match the device identification data associated with the user's voice signature, the system will prompt the user to speak a predetermined security phrase. If the voice signatures for both the user's name and security phrase match the registered speech samples, then the user is allowed to initiate a call with the Media Recording Service. This allows an authenticated user to use the system from a remote device and location that was not identified during the Authentication Process.
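The authorization decision described in the preceding paragraphs, including the optional device check and security phrase fallback, might be expressed as in the following Python sketch. The `Decision` values and the `authorize` function are hypothetical names, and combining the checks into a single function is an assumption made for illustration.

```python
from enum import Enum
from typing import Optional


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    PROMPT_SECURITY_PHRASE = "prompt_security_phrase"


def authorize(
    voice_match: bool,
    text_match: bool,
    device_match: bool,
    phrase_match: Optional[bool] = None,
) -> Decision:
    """Decide whether recording via the MRS may be authorized.

    phrase_match is None until the user has been prompted for, and has
    spoken, the predetermined security phrase.
    """
    # A negative voice or textual score always denies access.
    if not (voice_match and text_match):
        return Decision.DENY
    # A registered device needs no further check.
    if device_match:
        return Decision.ALLOW
    # Unregistered device: fall back to the security phrase.
    if phrase_match is None:
        return Decision.PROMPT_SECURITY_PHRASE
    return Decision.ALLOW if phrase_match else Decision.DENY


decision = authorize(voice_match=True, text_match=True, device_match=False)
```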

The system 100 preferably generates a report of any failed attempts as part of a Fraud Reporting Function. The report preferably includes one or more of the analysis results, the attempted user's call detail records, session signaling reports, the start and stop times of the failed attempt, along with any speech samples that were collected during the failed user verifications. The speech sample is stored in association with other information relating to the failed attempt, and can be retrieved and analyzed at a later date, if desired. In this manner, a record is provided for follow up, error checking, and/or fraud investigation.

If the Verification Process results in authorization for the Calling Party to use the MRS 110 to record a communications session, then the system 100 initiates a call through the MRS 110, e.g. by forwarding the Called Party's telephone number provided by the Calling Party, by dialing out, transmitting the address of the Called Party, or performing other outbound signaling, as shown at step 22, to establish the requested telephone call, video/media datastream or other communications session. In certain embodiments, the MRS 110 may then initiate a telephone call or media communications session, such as a two-way video call, and record the communications session in a conventional manner. A conventional, prior art MRS is suitable for this purpose, and any suitable MRS may be used. The call/session is established through the MRS so that the media content can be recorded upon proper consent.

CONSENT PROCESS

In the exemplary embodiment shown in Figures 1-6, the Calling Party and/or Called Party's consent to record the communications session is required before the MRS will record the communications session. In the exemplary embodiment of Figure 6, conventional MRS hardware is specially configured with special purpose software in accordance with the present invention to monitor and begin recording based on the Consent Process described below. The MRS will begin recording upon receiving an appropriate command from the system 100 in step 32 upon successful consent. If this command does not arrive from the system 100, recording is not initiated.

In alternative embodiments, a similar Consent Process is carried out by the system 100, and control is forwarded to a conventional MRS to perform conventional recording without the need for special configuration of the MRS. In other alternative embodiments, no consent is required. In other words, the MRS will permit recording after a valid user of the system is verified, without the need for the Calling Party, the Called Party, or both to first provide consent to recording as described below. This alternative embodiment may be appropriate, for example, in geographical areas where local laws do not require consent for recording, or when the sessions recorded are not of a type for which local laws require consent for recording.

Referring now to the exemplary embodiment of Figure 6, the MRS 110 first acknowledges that a communications session is being initiated on behalf of an authenticated and verified user, as shown at step 23. For example, this is achieved by the system 100 sending a command in signal form to the MRS 110 telling the MRS 110 that the Calling Party is a valid user, but that the MRS 110 should not begin recording until the Consent Process has successfully obtained the Calling and/or Called Party's consent. Accordingly, this step involves the MRS 110 sending a signal to confirm that it received a command from the system 100. The system 100 then signals the MRS 110 to complete the initiation of the communications session, e.g. by dialing the Called Party's telephone number, etc., as shown at step 24.

In response, the Called Party's device and/or network acknowledges that the communications session has been initiated and the communications session is established, as shown at steps 25 and 26. This is performed in an entirely conventional manner. The system can be configured to obtain consent from the Calling Party and/or Called Party in any suitable manner. In some embodiments, consent from both parties is required. In other embodiments, consent from only one party is required. In certain embodiments, either the Calling Party, Called Party or both must enter a code signifying consent. In certain embodiments, the Calling Party verbally asks the Called Party to provide consent and provides instructions to do so. In other embodiments, the system 100 plays a pre-recorded audio script, etc. to prompt the Called Party to provide consent. Other suitable alternatives will be appreciated by those skilled in the art.

In the exemplary embodiment of Figure 6, the Calling Party advises the Called Party that the Called Party's consent will be requested, and, in the example of a telephone call, asks the user to verbally agree to provide such consent, as shown at step 27. If the Called Party verbally agrees to consent, as shown in step 28, then the Calling Party, who is an authorized user of the system 100, enters a known or specified key sequence (or graphical indicator, such as an on screen button, pull down menu option etc.) into their communications device to initiate the system's Consent Function, as shown at step 29. Optionally, the key sequence may be a pre-defined key sequence that is unique to the registered user. In this embodiment, this is taken to be the Calling Party's consent to the recording of the communications session.

Accordingly, the system 100 plays an automated audible announcement or graphical display to both the Calling Party and the Called Party that asks one or both parties to signify their consent that the called party understands that the call will be recorded only upon their consent, and that they are providing permission for the recording to proceed by entering a specific key sequence (e.g., 35779) and/or by making a choice on a graphical indicator using their respective communications devices, as shown at step 30. This announcement will ensure that both parties are aware of the possibility that this call/session can be recorded, thus enabling them to take the appropriate actions to obtain consent.

In response, the Called Party enters the consent indicator using the communications device, as shown at step 31. For example, DTMF tones may be entered, and those tones are preferably recorded by the MRS and/or the system 100 for confirmatory purposes. It should be noted that such tones can be stored in a manner confirming that the tones originated from the respective party providing consent. If the Calling Party and/or the Called Party does not provide consent in the prescribed manner, the call ends, e.g., after a prescribed time.

After receiving all required parties' consent in the prescribed manner, the system 100 transmits signals to the MRS to cause the MRS to begin digitally recording and storing the communications session (voice conversation or media streams) in both directions between the Calling Party and the Called Party, as shown at step 32. If the system does not receive the positive response indicating consent from the Called Party, then the system will not allow the Media Recording System to record, and will maintain the established communications session between the parties over the network.
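Collection of the consent key sequence from each required party might be sketched in Python as follows. The class `ConsentCollector` and its methods are hypothetical; the sequence 35779 is the example given in the description, and the assumption that keyed digits arrive one at a time, tagged with the party that entered them, is made for illustration only.

```python
class ConsentCollector:
    """Tracks keyed digits per party and records which parties have consented."""

    def __init__(self, required_parties, sequence="35779"):
        self.sequence = sequence
        self._buffers = {party: "" for party in required_parties}
        self.consented = set()
        self.log = []  # confirmatory record of (party, key sequence)

    def on_digit(self, party, digit):
        """Process one keyed digit attributed to the given party."""
        if party not in self._buffers:
            return
        # Keep only the most recent len(sequence) digits for this party.
        recent = (self._buffers[party] + digit)[-len(self.sequence):]
        self._buffers[party] = recent
        if recent == self.sequence and party not in self.consented:
            self.consented.add(party)
            self.log.append((party, self.sequence))

    def may_record(self):
        """True once every required party has entered the consent sequence."""
        return self.consented == set(self._buffers)


collector = ConsentCollector(["calling", "called"])
for d in "35779":
    collector.on_digit("called", d)
ready = collector.may_record()  # still waiting on the calling party
```

Only when `may_record()` returns true would the system signal the MRS to begin recording; the `log` list stands in for the stored record confirming which party each sequence originated from.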

The communications session then follows, and may eventually be terminated by either party, as shown at step 33. After the communications session has been terminated, the system 100 recognizes the termination and transmits a signal to the MRS to cause the MRS to stop recording of the communications session. Optionally, the system 100 again confirms that both parties provided consent in the prescribed manner, and then commands the MRS (by transmitting appropriate signals) to store the recorded communications session for later retrieval, as shown at step 34. For example, the system analyzes the call signaling and content data streams that took place in both directions in search of the automated announcement, applicable usage rules, the consent initiation sequence and the consent positive response indication. If the system acknowledges that these sequences occurred satisfactorily and the call/session complied with applicable usage rules, it commands the Media Recording System to store the entire conversation/session in a file directory for retrieval by valid account holders. If the consent initiation, consent positive responses and rules based consent signaling sequences are not found, then any recorded file is then placed into a "Fraud Queue" by the Media Recording System, as discussed in greater detail below. This suspected fraudulent recording will remain in the Fraud Queue until investigative activity is completed.
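The post-session check described above might be sketched in Python as follows. The marker names and the `disposition` function are hypothetical stand-ins for whatever the system extracts from the call signaling and content data streams, and the sketch checks only for presence of each marker, not for ordering.

```python
# Hypothetical labels for the items the system searches for after termination.
REQUIRED_MARKERS = (
    "ANNOUNCEMENT_PLAYED",
    "CONSENT_INITIATED",
    "CONSENT_POSITIVE_RESPONSE",
)


def disposition(session_events, rules_satisfied=True):
    """Return ("STORE" or "FRAUD_QUEUE", list of missing markers).

    session_events: iterable of marker strings found in the recorded session.
    rules_satisfied: whether the session complied with applicable usage rules.
    """
    found = set(session_events)
    missing = [marker for marker in REQUIRED_MARKERS if marker not in found]
    if missing or not rules_satisfied:
        return "FRAUD_QUEUE", missing
    return "STORE", missing


outcome, missing = disposition(
    ["CONSENT_INITIATED", "ANNOUNCEMENT_PLAYED", "CONSENT_POSITIVE_RESPONSE"]
)
```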

Figure 7 depicts an exemplary flow of an alternative Consent Process. Referring now to Figures 6 and 7, steps 35-38 are identical to steps 23-26. However, in this embodiment, the system 100 includes a rules-based software controlled consent computing engine. The consent computing engine has information stored in its database that defines the consent laws applicable to various geographic locations, including those of the individual states in the U.S. For example, local laws may require multi-party consent or only single party consent, and may vary among various locations.

After a two-way communications session is established, the rules based consent computing engine determines the geographic locations of both the Calling Party and the Called Party, as shown at step 39. For example, the location determination is accomplished by evaluating communication parameters such as, but not limited to, phone numbers, network addresses, network device discovery protocol data, country codes, Global Positioning Satellite data, Vertical-Horizontal coordinates, physical addresses and postal codes.

The system then queries its laws database to determine which consent laws apply to the Calling Party and the Called Party, as a function of their respective locations, as shown at step 40. Consent laws are entered into the system's internal database by either manual entry or via an automated interface prior to its execution on calls/sessions. If the recording consent laws state that single or no-party consent is required then the system will allow the established communications session to be recorded without any further interaction. In the event of single party consent, the system 100 considers the Calling Party to have provided the single party consent because they are initiating the call to be recorded, and no further consent is required.

If the consent laws require multi-party consent, then the system's rules engine instructs the system to play a prerecorded announcement or provide a graphical display to both parties, alerting one or both parties of the terms on which the communications session may be recorded, e.g. upon consent from all parties, at least one party, etc., as shown at step 41.

If the rules based engine determines that multi-party consent is not required then recording can be initiated by the Calling Party without the knowledge of the Called Party. If it is determined that multi-party consent is required, both parties will receive an appropriate announcement instructing the Calling and Called Parties to indicate their consent or non-consent by entering a specific key sequence and/or graphical indicator. The parties on the call can choose to consent and have the call/session recorded, or they can choose not to consent and continue with the established communications without recording.

If all or both parties choose to consent and provide such consent to the system 100 in the prescribed manner, e.g. as indicated in the announcement, then the system 100 transmits a signal to the MRS 110 to initiate recording (digitally sampling and storing) each direction of the call/session, as shown at steps 42a, 42b and 43. If the required consent is not received, then the call is permitted to continue without recording, as shown at step 43. Preferably, an announcement is played to both parties by the system to confirm that the communications session is not being recorded.
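The rules based consent determination of Figure 7 might be sketched in Python as follows. The region codes, the table contents and the function names are hypothetical placeholders, not statements of any actual law. Two further assumptions are made for illustration: when the two parties' locations have different rules the stricter rule governs, and a location missing from the table is treated as requiring multi-party consent.

```python
# Hypothetical laws table: region code -> consent requirement.
CONSENT_RULES = {
    "REGION_A": "multi_party",
    "REGION_B": "single_party",
    "REGION_C": "none",
}

# Ordering used to pick the stricter of two requirements.
_STRICTNESS = {"none": 0, "single_party": 1, "multi_party": 2}


def applicable_rule(calling_region, called_region, rules=CONSENT_RULES):
    """Return the consent requirement governing a session between two regions."""
    # Assumption: unknown regions default to the strictest requirement.
    candidates = [
        rules.get(region, "multi_party") for region in (calling_region, called_region)
    ]
    # Assumption: the stricter of the two requirements applies.
    return max(candidates, key=lambda rule: _STRICTNESS[rule])


def next_action(calling_region, called_region, rules=CONSENT_RULES):
    """Map the governing rule to the system's next step."""
    rule = applicable_rule(calling_region, called_region, rules)
    if rule == "multi_party":
        return "PLAY_ANNOUNCEMENT_AND_COLLECT_CONSENT"
    # Single or no-party consent: the Calling Party's request is sufficient.
    return "START_RECORDING"


action = next_action("REGION_B", "REGION_A")
```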

The communications session may then be terminated by either party, and the system 100 causes the MRS to stop recording, store the recording, etc. in a manner similar to that described above with reference to steps 33 and 34, as shown at steps 44 and 45.

FRAUDULENT ACTIVITY REPORTING

Although prior art exists pertaining to the detection of fraudulent activities with respect to web based services, the inventive system is novel in that it generates a report that contains the session signaling characteristics and binds the report with the collected voice samples of a failed attempt. This Fraudulent Activity Reporting process begins upon the detection of any mismatches during the Authentication and Verification Processes, which may indicate unlawful abuse of a valid user's account, or an attempt by a malicious user to set up and use the Media Recording System from an unauthorized account using compromised personal information of an unknowing person or entity. For example, during the Consent Process, the system analyzes call signaling and content data streams that took place in both directions in search of the consent initiation sequence and the consent positive response sequence, as discussed above with respect to steps 34 and 45. If the consent initiation sequences, execution of applicable usage rules, and consent positive responses are not found or are found in suspect form, then any recorded file is placed into a "Fraud Queue" by the Media Recording System. This suspected fraudulent recording will remain in the Fraud Queue until investigative activity is completed and will not be made available for retrieval.

A software implemented, rules based algorithm within the system will monitor call attempts and voice sample information and will determine whether the activity is suspicious. For example, mismatches between names, addresses, phone numbers, email addresses, etc. and/or other irregularities may be monitored and be deemed suspicious when exceeding a predetermined threshold level. If suspicious activity is suspected, the system will compile a report bound to the collected speech samples to be used for investigative purposes.
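The threshold test described above might be sketched in Python as follows. The function `assess_attempt`, the field names, the report layout and the default threshold are hypothetical; the description states only that mismatches are deemed suspicious when exceeding a predetermined threshold level and that the resulting report is bound to the collected speech samples.

```python
def assess_attempt(provided, registered, speech_sample_ids, threshold=1):
    """Return a report dict if mismatches exceed the threshold, otherwise None.

    provided / registered: dicts of identification fields (name, phone, ...).
    speech_sample_ids: identifiers of the speech samples collected in the attempt.
    """
    mismatched = sorted(
        field for field, value in registered.items() if provided.get(field) != value
    )
    if len(mismatched) <= threshold:
        return None
    return {
        "suspicious": True,
        "mismatched_fields": mismatched,
        # The report is bound to the speech samples collected in the attempt.
        "speech_samples": list(speech_sample_ids),
    }


report = assess_attempt(
    provided={"name": "J. Doe", "phone": "555-0199", "email": "jane@example.com"},
    registered={"name": "Jane Doe", "phone": "555-0100", "email": "jane@example.com"},
    speech_sample_ids=["sample-001"],
)
```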

By way of further example, the system stores collected voice samples from a caller during the Authentication and Verification Processes, and failed attempts may be indicative of suspicious activity.

If any voice analysis or identification information comparison yields a discrepancy that warrants denial of the Media Recording Service, the system will notify the user that they do not match the security parameters of a "Valid User". The system will prompt the user to contact the Media Recording Service Provider for assistance and will then terminate the communications session. The system will store the voice samples of the failed attempt to access the Media Recording Service for comparison with future attempts. The system's voice analysis algorithms will account for and consider differences in voice samples based on a person's health, age, dialect, accent, frequencies, and the digital signal characteristics that make up the compared voice samples. The maintenance of a record of communications session parameters, e.g. phone numbers, IP addresses, URLs, emails, etc., may be used to identify wrongdoers, and the collected voice sample may provide voice signature information that may be used to positively identify a particular wrongdoer. Such records are maintained by the system and may be provided upon request to law enforcement or similar authorities.

COMPUTER PLATFORM

Figure 8 is a block diagram showing an exemplary computer 200 of a type that may be used for the system 100, within which various functionalities described herein can be fully or partially implemented. Computer 200 can function as a server, a personal computer, a mainframe, or various other types of computing devices. It is noted that computer 200 is only one example of a computer environment and is not intended to suggest any limitation as to the scope of use or functionality of the computer and network architectures. Neither should the example computer be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in Fig. 8.

Computer 200 may include one or more processors 202 coupled to a bus 204. Bus 204 represents one or more of any variety of bus structures and architectures and may also include one or more point-to-point connections.

Computer 200 may also include or have access to memory 206, which represents a variety of computer readable media. Such media can be any available media that is accessible by processor(s) 202 and includes both volatile and non-volatile media, removable and non-removable media. For instance, memory 206 may include computer readable media in the form of volatile memory, such as random access memory (RAM) and/or non-volatile memory in the form of read only memory (ROM). In terms of removable/non-removable storage media or memory media, memory 206 may include a hard disk, a magnetic disk, a floppy disk, an optical disk drive, CD-ROM, flash memory, etc. Any number of program modules 112 can be stored in memory 206, including by way of example, an operating system 208, off-the-shelf applications 210 (such as e-mail programs, browsers, etc.), program data 212, the software application at least partially implementing the present invention being referred to as reference number 113 in Figure 8, and other modules 214. Memory 206 may also include one or more persistent stores 114 containing data and information enabling functionality associated with program modules 112.

A user can enter commands and information into computer 200 via input devices such as a keyboard 216 and a pointing device 218 (e.g., a "mouse"). Other device(s) 220 (not shown specifically) may include a microphone, joystick, game pad, serial port, etc. These and other input devices are connected to bus 204 via peripheral interfaces 222, such as a parallel port, game port, universal serial bus (USB), etc.

A display device 222 can also be connected to computer 200 via an interface, such as video adapter 224. In addition to display device 222, other output peripheral devices can include components such as speakers (not shown), or a printer 226.

Computer 200 can operate in a networked environment or point-to-point environment, using logical connections to one or more remote computers. The remote computers may be personal computers, servers, routers, or peer devices. A network interface adapter 228 may provide access to network 104, such as when network 104 is implemented as a local area network (LAN), or wide area network (WAN), etc.

In a network environment, some or all of the program modules 112 executed by computer 200 may be retrieved from another computing device coupled to the network. For purposes of illustration, the operating program module 113 and other executable program components, such as the operating system, are illustrated herein as discrete blocks, although it is recognized that such programs and components reside at various times in different storage components remote or local, and are executed by processor(s) 202 of computer 200 or remote computers.

PROGRAM MODULE

Techniques and functionality described herein may be provided in the general context of computer-executable instructions, such as program modules, executed by one or more computers (one or more processors) or other devices. Generally, program modules include routines, programs, objects, components, data structures, logic, etc. that perform particular tasks or implement particular abstract data types. Typically, the functionality of the program modules may be combined or distributed as desired in various embodiments, to carry out one or more of the methods, or combinations of steps of the methods, described herein. It is noted that a portion of a program module may reside on one or more computers operating in a system.

An implementation of these modules and techniques may be stored on or transmitted across some form of computer readable media as a computer program product. Computer readable media can be any available media that can be accessed by a computer. By way of example, and not limitation, computer readable media may comprise volatile and non-volatile media, or technology for storing computer readable instructions, data structures, program modules, or other data.

Having thus described particular embodiments of the invention, various alterations, modifications, and improvements will readily occur to those skilled in the art. Such alterations, modifications and improvements as are made obvious by this disclosure are intended to be part of this description though not expressly stated herein, and are intended to be within the spirit and scope of the invention.

Accordingly, the foregoing description is by way of example only, and not limiting.

Claims

What is claimed is:

1. A method for authenticating a user as an authorized user of a computerized system, the method comprising the system performing the following: receiving, via a communications network, a signal communicating personal identification information relating to the user, the personal identification information comprising contact information for communicating with the user via a communications network; initiating a communications session with the user, the communications session being initiated via the communications network using the contact information; requesting from the user, during the communications session, a sample of the user's speech; receiving the user's speech sample during the communications session; and storing in a memory of the system a digital sampling of the user's speech sample, the digital sampling being stored in the memory of the system in association with the personal identification information provided by the user.

2. The method of claim 1, further comprising: digitally processing the digital sampling of the user's speech to obtain a voice signature.

3. The method of claim 1, further comprising: receiving from the user, during a subsequent communications session, a new sample of the user's speech; and comparing the new sample of the user's speech to the sample of the user's speech; and permitting use of the system if the new sample matches the sample.

4. The method of claim 3, wherein permitting use of the system comprises transmitting a signal to a media recording system to permit the user to record a communications session.

5. The method of claim 4, wherein the communications session comprises a telephone call between a Calling Party and a Called Party.

6. The method of claim 4, wherein the communications session comprises a media datastream between a Calling Party and a Called Party.

7. The method of claim 1, further comprising: querying a database external to the system to obtain information relating to the user; and comparing the information obtained from the database to the personal identification information provided by the user; and determining whether the information obtained from the database corroborates the personal identification information provided by the user.

8. The method of claim 7, further comprising: creating a report identifying both the voice sample and the personal identification information, the report indicating that there is no corroboration of the personal identification information, if it is determined that the information obtained from the database does not corroborate the personal identification information provided by the user.

9. The method of claim 7, wherein the determining comprises calculating a numerical confidence score and comparing the score to a predefined threshold value.

10. The method of claim 1, further comprising: gathering from a first communications device used by the user during the communications session, device identification information or device location information; and storing in the memory of the system the device identification information or device location information, the device identification information or device location information being stored in the memory of the system in association with the personal identification information provided by the user.

11. The method of claim 3, further comprising: gathering from a first communications device used by the user during the communications session, device identification information or device location information; storing in the memory of the system the device identification information or device location information, the device identification information or device location information being stored in the memory of the system in association with the personal identification information provided by the user; gathering from a second communications device used by the user during the subsequent communications session, device identification information or device location information; and permitting use of the system if the device identification information or device location information gathered from the second communication device matches the device identification information or device location information stored in the memory of the system in association with the user.

12. The method of claim 11, further comprising: creating a report identifying the new voice sample and the device identification information or device location information gathered from the second communications device, the report indicating that the device identification information or device location information gathered from the second communication device does not match the device identification information or device location information stored in the memory of the system in association with the user, if it is determined that device identification information or device location information gathered from the second communication device does not match the device identification information or device location information stored in the memory of the system in association with the user.

13. The method of claim 1, wherein the user provides a plurality of sets of contact information, the method further comprising: initiating a plurality of communications sessions with the user, each communications session being initiated via the communications network using a respective one of the plurality of sets of contact information; requesting from the user, via each respective communications session, a respective sample of the user's speech; receiving the respective user's speech samples via the respective communications sessions; and storing in the memory of the system a respective digital sampling of each respective user's speech sample, the digital samplings being stored in the memory of the system in association with the personal identification information provided by the user.

14. A method for obtaining consent to record a communications session between a Calling Party and a Called Party using a computerized system, the method comprising the system performing the following: receiving from a communications device operated by the Calling Party, via a communications network, a signal of a predefined type, said signal being provided by the Calling Party to request recording of the communications session; playing a pre-defined announcement requesting that the Called Party operate a communications device of the Called Party to provide a specified signal signifying the Called Party's consent to recording of the communications session; receiving from the communications device operated by the Called Party, via the communications network, the specified signal signifying the Called Party's consent; recording and storing in a memory of the system the specified signal received from the Called Party; and permitting a media recording system to record the communications session.

15. The method of claim 14, further comprising: recording the communications session; analyzing the recorded communications session to verify receipt of the Called Party's consent; and storing the recorded communications session in the memory of the system for later retrieval if receipt of the Called Party's consent is verified.

16. The method of claim 14, wherein permitting the media recording system to record the communications session comprises transmitting a control signal to the media recording system.

17. The method of claim 14, wherein the signal of the predefined type comprises one of a key sequence, a series of DTMF tones, or a signal transmitted upon selection of a prescribed graphical indicator in a graphical user interface.

18. The method of claim 14, wherein the specified signal comprises one of a key sequence, a series of DTMF tones, or a signal transmitted upon selection of a prescribed graphical indicator in a graphical user interface.

19. The method of claim 14, wherein the pre-defined announcement comprises an audible speech signal.

20. The method of claim 14, wherein the pre-defined announcement comprises a display via a graphical user interface.

21. A method for obtaining consent to record a communications session between communications devices of a Calling Party and a Called Party using a computerized system, the method comprising the system performing the following: determining a geographic location of the respective communications device of at least one of the Calling Party and the Called Party; referencing a database to identify a consent protocol applicable to the at least one of the Calling Party and the Called Party; playing a pre-defined announcement to the at least one of the Calling Party and the Called Party, the announcement requesting operation of a respective communications device to provide a specified signal signifying consent to recording of the communications session; receiving from the Calling Party's communications device, via the communications network, the specified signal, if any; receiving from the Called Party's communications device, via the communications network, the specified signal, if any; recording and storing in a memory of the system the specified signal received from the Calling Party, if any; recording and storing in a memory of the system the specified signal received from the Called Party, if any; and permitting a media recording system to record the communications session only if all required specified signals are received to comply with the applicable consent protocol for the Calling Party and the Called Party.

22. A system for authenticating a user as an authorized user of a computerized system, the system comprising: a processor; a memory operably connected to the processor; and instructions stored in the memory and executable by the processor to cause said system to carry out the method of claim 1.

23. A computer program product embodied on one or more computer-readable media, the computer program product comprising computer readable program code configured to carry out the method of claim 1.

24. A system for authenticating a user as an authorized user of a computerized system, the system comprising: a processor; a memory operably connected to the processor; and instructions stored in the memory and executable by the processor to cause said system to carry out the method of claim 14.

25. A computer program product embodied on one or more computer-readable media, the computer program product comprising computer readable program code configured to carry out the method of claim 14.

26. A system for authenticating a user as an authorized user of a computerized system, the system comprising: a processor; a memory operably connected to the processor; and instructions stored in the memory and executable by the processor to cause said system to carry out the method of claim 21.

27. A computer program product embodied on one or more computer-readable media, the computer program product comprising computer readable program code configured to carry out the method of claim 21.
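
For illustration only, the consent gating recited in claim 21 might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `ConsentProtocol`, `PROTOCOL_BY_REGION`, `Party` and `decide_recording`, the region codes, and the expected signal value "1" are all hypothetical. The sketch also assumes that an unknown location falls back to the strictest protocol and that an all-party protocol applicable to either party governs the whole session; the claims do not specify either rule.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class ConsentProtocol(Enum):
    ONE_PARTY = "one_party"  # consent from any one party suffices
    ALL_PARTY = "all_party"  # every party must consent


# Hypothetical table standing in for the database referenced in claim 21.
# The region codes and protocols are placeholders, not legal data.
PROTOCOL_BY_REGION: Dict[str, ConsentProtocol] = {
    "REGION_A": ConsentProtocol.ONE_PARTY,
    "REGION_B": ConsentProtocol.ALL_PARTY,
}


@dataclass
class Party:
    role: str                              # "calling" or "called"
    region: str                            # determined device location
    consent_signal: Optional[str] = None   # e.g. a DTMF digit, or None


@dataclass
class ConsentDecision:
    recording_permitted: bool
    stored_signals: Dict[str, str] = field(default_factory=dict)


def decide_recording(calling: Party, called: Party,
                     expected_signal: str = "1") -> ConsentDecision:
    """Permit recording only if the applicable consent protocol is satisfied."""
    parties = (calling, called)

    # Assumption: an unknown region is treated as requiring all-party consent.
    protocols = [
        PROTOCOL_BY_REGION.get(p.region, ConsentProtocol.ALL_PARTY)
        for p in parties
    ]

    # Store whatever signal each party actually sent, if any.
    stored = {
        p.role: p.consent_signal
        for p in parties
        if p.consent_signal is not None
    }

    consented = [p.consent_signal == expected_signal for p in parties]

    # Assumption: the strictest protocol among the parties governs.
    if ConsentProtocol.ALL_PARTY in protocols:
        permitted = all(consented)
    else:
        permitted = any(consented)

    return ConsentDecision(permitted, stored)
```

In this sketch, a Calling Party in a one-party region who sends the expected signal is sufficient only when the Called Party is also in a one-party region; if either party is in an all-party region, both signals must be received before the media recording system is permitted to record.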