RECORDING AND TRANSCRIPTION SYSTEM
Field of the Invention
The present invention relates to a system for transcribing the spoken word and, in particular, to a system which enables a multi-channel audio file, such as may result from a court hearing, to be efficiently transcribed by a typist or group of typists without the need for the typist to manually determine which of a multiplicity of speakers relates to the portion of the audio file being transcribed at any given time.
Background of the Invention
For many years, multi-channel analogue and digital recording and transcription systems have used parallel processes whereby multiple microphones record multiple channels of audio information at the same time. Such systems are common, for example, in courtrooms, so that a record of proceedings is available from which a court transcript may be prepared. There may, for example, be a microphone for the judge, a separate microphone for the witness stand and a microphone for each of Defence and Prosecution counsel.
When an audio file compiled as abovementioned reaches a typist, the typist must transcribe the recording to produce a document with a text layout, format and syntax appropriate to the particular court concerned and, of course, each line of text, in chronological order, must be preceded at the very least by the name of the person who originally spoke it. The lastmentioned information is usually obtained by the typist from a log, meticulously prepared by a person in the courtroom, to which she must constantly refer whilst typing. Identifying the person who spoke the text being typed is not always easy, even with a log, as speakers at a particular microphone may change frequently, such as when one witness finishes in the witness box and another commences. Furthermore, difficulties often occur when two persons at different microphones speak at the same time. The typist must manually enable and disable the different channels of the audio file, which correspond to the separate microphones that originally recorded the information, in an attempt to unscramble audio information that may have become garbled because two parties spoke at the same time during the original recording.
Transcription of such audio files is consequently quite labour intensive: even a single-channel audio file may take 3 to 4 times its recorded length to transcribe, whereas multi-channel transcription can often take between 10 and 14 times the length of the original audio file. Not only does this result in a time-consuming and expensive transcript, but it often prevents a reliable textual transcript from being returned to the court in time for persons involved in a trial to refer to it whilst the trial is in progress or immediately thereafter.
The typist is also often required to keep track of the audio file time line in order to insert chronological information into the transcript.
It is also often difficult to remunerate typists for such work on a fair basis, because audio files differ greatly in complexity and the number of words typed does not give a fair indication of the amount of work and time that has gone into producing the transcript.
It is consequently an object of the present invention to ameliorate one or more of the above disadvantages associated with existing recording and transcription systems or at least, to provide the market with an alternative.
Summary of the Invention
According to the present invention there is disclosed a recording system to facilitate transcription of audio input from an event, comprising: two or more audio input devices inputting to one or more recording devices; and an audio packet creation device, associated with the recording device, adapted to separate discrete packets of audio input from each input device and to assign to each packet an identifier unique to the input device through which the packet was created and a code indicating the time of the input contained within that packet relative to the time of other packets created from the same event.
Preferred Mode of Carrying out the Invention
According to one embodiment of the present invention there is disclosed a recording and transcription system comprising digital recording means adapted to simultaneously receive and record input from multiple audio recording devices, each originating from a separate channel. Software is provided to merge information from the multiple audio input channels into a Multi-Mono Intelligent Audio File (MMIAF) stream. The MMIAF stream is playable with each original audio channel voice packet isolated and replayable in chronological order with no overlapping sections, as illustrated in Figure 1.
Microphones are provided, interfaced with appropriate software in the recording device, which are operable at the time of recording to embed and link information and/or markers unique to each speaker into the recorded material from each channel, or into discrete sections of that material, as illustrated in Figure 2.
Software is provided to use the embedded information markers to link text typed from the Intelligent Audio File or database both to the relevant section of the audio file or database and to the information embedded therein, including speaker identification, such that the typist need not know, comprehend or type the speaker identification or other embedded information; instead, that information may automatically appear in the finished text file in a predetermined format.
According to the abovementioned embodiment of the present invention, each person who approaches a microphone or other input device for an intended recording would wear or carry a magnetically encoded card which, when brought into the proximity of a microphone with an appropriate card reader, would transmit the identity of the speaker at that microphone to a device associated with the recording, so that audio signals recorded after the card reader sensed the card would be attributed to the person to whom the card had been assigned. In a similar manner, the system may automatically be advised of the termination of a particular speaker's input when it senses the absence of that speaker's proximity card. This lastmentioned system, in conjunction with a real-time clock associated with the recording, could obviate the need for a person in court to keep a log. Alternatively, speaker changes on any channel can be applied manually by a court monitor.
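The attribution rule described above (audio on a channel belongs to whoever's card was most recently sensed at that microphone, until the card is no longer sensed) can be sketched as follows. This is a minimal illustration only: the event tuple format and the names `build_card_log` and `speaker_at` are hypothetical, and a `None` speaker is assumed to represent the card reader sensing the absence of a card.

```python
import bisect


def build_card_log(events):
    """Group card-reader events by channel.

    events: iterable of (time_seconds, channel_id, speaker_or_None), where
    None is an assumed marker meaning "card no longer sensed".
    Returns {channel_id: (sorted_times, speakers)}.
    """
    log = {}
    for t, channel, who in sorted(events, key=lambda e: e[0]):
        times, names = log.setdefault(channel, ([], []))
        times.append(t)
        names.append(who)
    return log


def speaker_at(log, channel, t):
    """Return the speaker attributed to audio on `channel` at time `t`, or None."""
    if channel not in log:
        return None
    times, names = log[channel]
    # Index of the most recent card event at or before t.
    i = bisect.bisect_right(times, t) - 1
    return names[i] if i >= 0 else None


log = build_card_log([(0, 1, "Judge"), (10, 2, "Witness A"), (50, 2, None), (60, 2, "Witness B")])
print(speaker_at(log, 2, 30))  # Witness A
```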
The microphones, being the input devices for the recording to be transcribed, would also be associated with an audio sensing device such that no audio is sent to the typist during periods of silence, and hence the typist is not required to waste time listening to them. This feature is incorporated in the software of the recording device itself, so that periods of silence are extracted after the initial recording has been made.
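One simple way the recording software might extract periods of silence is a frame-energy threshold, sketched below. The frame length, threshold and minimum gap are assumed values chosen for illustration, and the function name `voiced_segments` is hypothetical; a production system could use any suitable voice activity detector.

```python
import math


def voiced_segments(samples, frame_len=160, threshold=0.01, min_gap_frames=3):
    """Return (start, end) sample indices of non-silent regions.

    A frame counts as speech when its RMS level exceeds `threshold`. A region
    ends only after `min_gap_frames` consecutive silent frames, so short pauses
    inside a sentence do not split a packet. A trailing partial frame is ignored.
    """
    segments = []
    start = None
    silent_run = 0
    n_frames = len(samples) // frame_len
    for f in range(n_frames):
        frame = samples[f * frame_len:(f + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        if rms > threshold:
            if start is None:
                start = f * frame_len
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_gap_frames:
                # The region ends after the last frame that was above threshold.
                segments.append((start, (f - silent_run + 1) * frame_len))
                start = None
                silent_run = 0
    if start is not None:
        segments.append((start, (n_frames - silent_run) * frame_len))
    return segments
```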
As each speaker inputs audio material into a given microphone, the material is recorded in an audio file on a computer. As each audio packet is detected and extracted, the header information listed below is added to allow file management.
Speaker ID (Speaker Name)
Channel ID
Custom Dictionary
Sample Rate & Compression
Speaker Language (English, French etc)
Transcription Priority
Start time / date of audio packet
Duration of packet
End time / date of audio packet
Chronological Key
Court Location & Court Room No
Return IP address for completed transcript
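The header fields listed above could be carried as a simple record, for example as in the following sketch. The field names, types and default values (such as a 16 kHz sample rate) are assumptions for illustration; the packet duration is derived here from the start and end times rather than stored separately.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class PacketHeader:
    speaker_id: str               # speaker name
    channel_id: int               # microphone / input channel
    speaker_language: str         # e.g. "English", "French"
    start: datetime               # start time / date of audio packet
    end: datetime                 # end time / date of audio packet
    chronological_key: int        # position of the packet within the event
    court_location: str
    court_room: str
    return_ip: str                # where the completed transcript is returned
    sample_rate: int = 16000      # assumed default, in Hz
    compression: str = "none"     # assumed label for the codec in use
    transcription_priority: int = 0
    custom_dictionary: list[str] = field(default_factory=list)

    @property
    def duration_seconds(self) -> float:
        """Duration of the packet, derived from its start and end times."""
        return (self.end - self.start).total_seconds()
```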
As audio is recorded on each channel, audio (when detected) is extracted as separate Intelligent Audio File packets, which can be sent individually, sent as a multi-mono stream or accumulated into predefined payloads, with all audio packets in chronological order so that there are no overlapping speakers in the final mono audio file created. In effect, complex audio (two or more channels) is programmatically converted into simple (mono) audio in real time. This audio file may then be used by a typist to generate a final transcript within a dedicated multi-mono text editor, or with MS Word via VBA integration between the multi-mono audio stream and MS Word. The resultant multi-mono audio file contains embedded information concerning the identity of the speaker, the start and stop times of the speaker's input and the matter concerned, none of which need be displayed to the typist. This information is then used to automatically format the document which the typist is generating according to the template and rules required by a particular court. The software via which the typist generates text whilst transcribing the audio may be designed to prevent the typist from applying any form of formatting to the document, so that only plain, unformatted text is entered.
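The conversion of multi-channel packets into a single mono stream with no overlapping speakers can be sketched as below, assuming each packet has already been isolated by silence extraction. Packets are ordered by start time and laid end to end, so that speech which overlapped in the courtroom is simply played one packet after another. The `Packet` type and `merge_to_mono` function are hypothetical names, and ties in start time are assumed to be broken by channel number.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    channel_id: int
    speaker: str
    start: float          # seconds from the start of the event
    duration: float       # seconds
    samples: list[float]  # isolated audio for this packet


def merge_to_mono(packets):
    """Lay packets end to end in chronological order.

    Returns (stream, index) where `stream` is the mono sample list and
    `index` holds (chronological_key, offset_in_stream, packet) so that text
    typed against any offset can be traced back to its speaker and time.
    """
    ordered = sorted(packets, key=lambda p: (p.start, p.channel_id))
    stream = []
    index = []
    offset = 0
    for key, p in enumerate(ordered):
        index.append((key, offset, p))
        stream.extend(p.samples)
        offset += len(p.samples)
    return stream, index
```

Because overlapping speech is serialised rather than mixed, the mono stream may be longer than the courtroom time it covers, but each packet can be heard in isolation.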
All formatting is advantageously applied automatically to the text once the typist has completed the transcript. In doing so, significant transcription time is saved whilst ensuring perfect formatting consistency, regardless of how many typists work on the same transcript. Since the text is fully indexed to the audio, a fully formatted transcript can be created instantly using a software function similar to "mail merge", whereby the text within the database for the indexed audio is automatically formatted according to a particular court template. Speaker names (which are not transcribed by the typist) are also added to the final transcript based upon information embedded in the original Intelligent Audio File.
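The "mail merge" step could be sketched as follows, using Python's `string.Template`. The template string shown is a hypothetical house style rather than any actual court's format, and the segment dictionary keys (`key`, `speaker_id`, `text`) are assumptions; paragraph numbers are assigned by the program in chronological order.

```python
from string import Template

# Hypothetical court style: paragraph number, speaker name in capitals, text.
COURT_TEMPLATE = Template("$para. $speaker_upper: $text")


def format_transcript(segments, speaker_names, template=COURT_TEMPLATE):
    """Merge typed plain text with embedded speaker data into a formatted transcript.

    segments: dicts with 'key' (chronological key), 'speaker_id' and 'text'.
    speaker_names: maps speaker_id to the name the court requires.
    """
    lines = []
    ordered = sorted(segments, key=lambda s: s["key"])
    for para, seg in enumerate(ordered, start=1):
        name = speaker_names[seg["speaker_id"]]
        lines.append(template.substitute(
            para=para, speaker_upper=name.upper(), text=seg["text"].strip()))
    return "\n".join(lines)
```

Swapping `COURT_TEMPLATE` for another court's template reformats the same typed text without the typist's involvement.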
It will be appreciated that using apparatus and a system in accordance with the present invention as described above greatly simplifies the typist's task: the typist no longer has to enable or disable various channels of a recording to make out what was said when two or more speakers speak at once, or to keep track of and determine who is speaking at any particular time, as illustrated in Figure 3.
Furthermore, the typist need not know the format in which the transcript must be presented under the rules of a particular court or tribunal, and need not constantly refer to a log.
Another feature of a Multi-Mono Intelligent Audio File such as can be generated in accordance with the present invention is that software may link the audio file to the text being typed. This assists proofreading of the final text document, as a person reading text in the document may immediately recall the portion of the audio file that was used to generate it. This feature may also be very useful to counsel arguing a case, for example during cross-examination: when the text transcript is returned to the court (which may now occur very quickly owing to the efficiencies of the present invention) and a person queries whether they said what is actually typed in the transcript, the audio can be replayed immediately by searching for the relevant word or words of the transcript.
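A minimal sketch of the search-and-replay lookup, assuming each transcribed segment is stored with the start and end times of the audio from which it was typed. The dictionary keys and function name are hypothetical; a real system would hand the returned time spans to an audio player.

```python
def find_audio_for_phrase(segments, phrase):
    """Return (audio_start, audio_end) spans whose transcribed text contains `phrase`.

    segments: dicts with 'text', 'audio_start' and 'audio_end' (seconds).
    Matching is a simple case-insensitive substring test.
    """
    needle = phrase.lower()
    return [(s["audio_start"], s["audio_end"])
            for s in segments if needle in s["text"].lower()]
```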
As the information embedded in each segment of input from each channel effectively indexes the relationship of that segment to the final document to be generated, it is possible to store each segment separately in a database and then readily reassemble the segments when required. This is the case whether the segment is a mere audio file or an audio file which has been transcribed and consequently has associated text. This feature enables multiple typists to be used, if required, to compile a single transcript quickly.
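Because each segment carries its own chronological key, work can be split among typists and the results recombined regardless of the order in which they come back. The round-robin assignment below is only one possible policy, assumed for illustration, and the function names are hypothetical.

```python
def assign_round_robin(packets, typists):
    """Deal packets (dicts with a 'key') to typists in chronological order."""
    work = {t: [] for t in typists}
    for i, p in enumerate(sorted(packets, key=lambda p: p["key"])):
        work[typists[i % len(typists)]].append(p)
    return work


def reassemble(completed_batches):
    """Combine transcribed batches into one chronologically ordered list of text."""
    merged = [p for batch in completed_batches for p in batch]
    return [p["text"] for p in sorted(merged, key=lambda p: p["key"])]
```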
From the typist's perspective, a typist can readily see how many words they have typed, and knows that no time or effort has been spent formatting the document in accordance with a particular template or keeping track of a log. Typists may therefore be remunerated according to the number of words they have typed, which is more likely to be perceived as fair by both the typist and their employer than an hourly rate in situations where transcription has been complex relative to the number of words in the final typewritten document.
The present invention has the additional advantage that, when the file generated from the audio input is transmitted electronically, the bandwidth required is kept to a minimum, as multiple channels need not be sent in parallel; rather, transmission is preferably by way of a single mono stream of Intelligent Audio Packets separated by speaker identification markers. Furthermore, periods of silence may be deleted prior to transmission.
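The bandwidth saving can be illustrated with a rough calculation. The figures below are assumptions, not measurements: uncompressed 16-bit audio at 16 kHz, a one-hour hearing recorded on four channels, and a total of 30 minutes of detected speech across all channels, with header overhead ignored.

```python
def payload_bytes(seconds, sample_rate=16_000, bytes_per_sample=2):
    """Size of uncompressed mono PCM audio of the given length."""
    return int(seconds * sample_rate * bytes_per_sample)


def compare_bandwidth(channel_count, event_seconds, voiced_seconds_total):
    """Return (parallel_bytes, mono_bytes) for the same event."""
    parallel = channel_count * payload_bytes(event_seconds)  # every channel, silence included
    mono = payload_bytes(voiced_seconds_total)  # detected speech only, played end to end
    return parallel, mono


parallel, mono = compare_bandwidth(channel_count=4, event_seconds=3600, voiced_seconds_total=1800)
print(parallel, mono, parallel / mono)  # 460800000 57600000 8.0
```

Under these assumptions the mono stream is one eighth the size of the parallel recording; real savings depend on the amount of silence and on any compression applied.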
The Intelligent Audio Files generated by the present invention are well adapted for encryption should security be required.
Additionally, the invention is capable of adaptation to situations where re-voicing is utilised so as to enable the integration of voice recognition capabilities rather than just manual transcription.
The automated features of the software used by the typist may also be used to add unique words which are specific to a particular matter, as well as unusual names and addresses specific to a particular court case. These words and information are added prior to transcription and transmitted with the audio file to the typist. If new words are introduced, they can be automatically detected from the resulting transcripts and added, during the court case, to a custom dictionary contained in the audio file. In this respect, the invention can be said to be self-learning.
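Detection of new words in the resulting transcripts could be as simple as the following sketch, which reports words found in neither the custom dictionary nor a general vocabulary. The tokenising regular expression and the separate `base_vocabulary` are assumptions; words found this way would be offered for addition to the custom dictionary.

```python
import re


def new_terms(transcript, custom_dictionary, base_vocabulary):
    """Return words in `transcript` not yet known, in order of first appearance."""
    known = {w.lower() for w in custom_dictionary} | {w.lower() for w in base_vocabulary}
    found = []
    seen = set()
    for word in re.findall(r"[A-Za-z][A-Za-z'-]*", transcript):
        key = word.lower()
        if key not in known and key not in seen:
            seen.add(key)
            found.append(word)
    return found
```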
Rather than referring to text by page and paragraph, which can cause discrepancies because printers differ in the way they operate (resulting in different numbers of lines per page), it is possible, in accordance with the present invention, to reference all text by unique paragraph numbers which are added by the program rather than the typist and are consequently independent of printer drivers and font type or size. This has not previously been practical because typists have been required to enter paragraph numbers manually. If multiple typists were to work on the same transcript using prior art systems, it would be almost impossible, or at the very least very time consuming, to synchronise paragraph numbers in the final document.
Speakers using different or varying languages can easily be transcribed, as the speaker language is individually assignable for each channel; the resulting audio packets sent for transcription therefore carry a language setting embedded within them, ensuring that only typists with the corresponding language skills will type them.
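Routing packets by their embedded language setting might look like the sketch below. The typist registry format and the least-loaded assignment rule are assumptions for illustration, and packets for which no qualified typist is available are simply returned for manual handling.

```python
def route_by_language(packets, typists):
    """Assign packets only to typists who have the packet's language.

    packets: dicts with 'key' and 'language'.
    typists: maps typist name to a set of languages.
    Returns ({typist: [packet keys]}, [keys nobody can type]).
    """
    assignments = {name: [] for name in typists}
    unassigned = []
    for p in packets:
        qualified = [n for n, langs in typists.items() if p["language"] in langs]
        if qualified:
            # Give the packet to the least-loaded qualified typist (first on ties).
            target = min(qualified, key=lambda n: len(assignments[n]))
            assignments[target].append(p["key"])
        else:
            unassigned.append(p["key"])
    return assignments, unassigned
```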
Traditionally, using a parallel method of transcription (tape or digital), if a section of audio is inaudible, all work from that point onwards is delayed. Using the Multi-Mono methodology, if a typist receives a packet of audio that is beyond their skill (for whatever reason), they can simply abort the packet, whereupon it can be automatically forwarded to a supervisor who will, it is hoped, have the greater skill and experience needed to transcribe what was said within it. Transcription past this point can thus continue without delay, while the problematic packet is resolved in parallel by the supervisor.
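The abort-and-forward behaviour amounts to maintaining two queues, as in this minimal sketch. The class and method names are hypothetical, and a real system would also record which typist aborted each packet and why.

```python
from collections import deque


class PacketQueues:
    """Typist work queue plus a supervisor queue for aborted packets."""

    def __init__(self, packets):
        self.typist_queue = deque(packets)
        self.supervisor_queue = deque()

    def next_for_typist(self):
        """Hand the typist the next packet, or None when the queue is empty."""
        return self.typist_queue.popleft() if self.typist_queue else None

    def abort(self, packet):
        """Forward a packet the typist cannot transcribe to the supervisor."""
        self.supervisor_queue.append(packet)
```

The typist simply takes the next packet after aborting one, so a single difficult passage never blocks the rest of the transcript.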
It will be appreciated that variations may be made to the invention as described above without departing from the scope and intendment thereof. For example, the proximity card readers contemplated above (whereby each new witness may be identified to the system by a proximity card worn by the witness in conjunction with a card reader associated with each microphone) may be substituted with other types of card readers, or even with a manual system for generating the speaker information that is transmitted as part of the audio file to the software used by the typist.