US20090024183A1 - Somatic, auditory and cochlear communication system and method - Google Patents

Somatic, auditory and cochlear communication system and method

Info

Publication number
US20090024183A1
US20090024183A1 (application US 11/997,902)
Authority
US
United States
Prior art keywords
sequence
phonemes
sound
sounds
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US11/997,902
Inventor
Mark I. Fitchmun
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Somatek Inc
Original Assignee
Fitchmun Mark I
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fitchmun Mark I
Priority to US 11/997,902
Publication of US20090024183A1
Assigned to Somatek: assignment of assignors' interest (see document for details); assignor: Fitchmun, Mark I.
Legal status: Abandoned

Classifications

    • A - HUMAN NECESSITIES
    • A61 - MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61N - ELECTROTHERAPY; MAGNETOTHERAPY; RADIATION THERAPY; ULTRASOUND THERAPY
    • A61N 1/00 - Electrotherapy; Circuits therefor
    • A61N 1/02 - Details
    • A61N 1/04 - Electrodes
    • A61N 1/05 - Electrodes for implantation or insertion into the body, e.g. heart electrode
    • A61N 1/0526 - Head electrodes
    • A61N 1/0541 - Cochlear electrodes
    • A61N 1/18 - Applying electric currents by contact electrodes
    • A61N 1/32 - Applying electric currents by contact electrodes, alternating or intermittent currents
    • A61N 1/36 - Applying electric currents by contact electrodes, alternating or intermittent currents, for stimulation
    • A61N 1/36036 - Applying electric currents by contact electrodes, alternating or intermittent currents, for stimulation of the outer, middle or inner ear
    • A61N 1/36038 - Cochlear stimulation
    • A61N 1/36039 - Cochlear stimulation fitting procedures
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/02 - Feature extraction for speech recognition; Selection of recognition unit
    • G10L 2015/025 - Phonemes, fenemes or fenones being the recognition units
    • G10L 21/00 - Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/06 - Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G10L 21/16 - Transforming into a non-visible representation
    • G10L 2021/065 - Aids for the handicapped in understanding
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04R - LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R 25/00 - Deaf-aid sets, i.e. electro-acoustic or electro-mechanical hearing aids; Electric tinnitus maskers providing an auditory perception
    • H04R 25/60 - Mounting or interconnection of hearing aid parts, e.g. inside tips, housings or to ossicles
    • H04R 25/604 - Mounting or interconnection of hearing aid parts, e.g. inside tips, housings or to ossicles, of acoustic or vibrational transducers
    • H04R 25/606 - Mounting or interconnection of acoustic or vibrational transducers acting directly on the eardrum, the ossicles or the skull, e.g. mastoid, tooth, maxillary or mandibular bone, or mechanically stimulating the cochlea, e.g. at the oval window
    • H04R 2225/00 - Details of deaf aids covered by H04R 25/00, not provided for in any of its subgroups
    • H04R 2225/43 - Signal processing in hearing aids to enhance the speech intelligibility

Definitions

  • the invention relates to somatic, auditory or cochlear communication to a user, and more particularly, to somatic, auditory or cochlear communication using phonemes.
  • Phonemes are the speech sounds that, alone or in combination, form the words of a language. More precisely, a phoneme is the smallest phonetic unit, or part of speech, that distinguishes one word from another.
  • Various nomenclatures have been developed to describe words in terms of their constituent phonemes. The nomenclature of the International Phonetic Association (IPA) will be used here.
  • examples of speech, speech sounds, phonetic symbols, phonetic spellings, and conventional spellings will be with respect to an American dialect of English, henceforth referred to simply as English. The principles can be extended to other languages.
  • FIG. 1 illustrates several exemplary plots 100 to introduce several spectral and temporal features of human speech through the examination of the English word, “fake”, 105 , and its component phonemes.
  • the phonetic spelling (per the IPA) of the English word, “fake”, 105 is “faik”, 110 .
  • the word comprises three separate phonemes: the consonant, “f”, 142 ; the diphthong vowel, “ai”, 191 ; and the consonant, “k”, 107 .
  • because phonemes are language and dialect dependent, an English speaker will hear “ai” as a single sound, “long A”, 191 , a diphthong (a sound combining two vowel sounds), while speakers of other languages may hear two different vowels, “a”, 113 , and “i”, 114 , each a monophthong (a single vowel sound).
  • the phoneme, “k”, 107 also comprises two parts: a short period of relative silence, 117 ; followed by the abrupt appearance of sound frequencies in a range of about 2500 to 7000 Hz, 118 .
  • Spectral and temporal features of the individual phonemes are partially observable when viewing a plot of the waveform 140 of the spoken word.
  • pressure is shown on the vertical axis and time is shown on the horizontal axis.
  • a spectrogram 120 reveals greater detail and structure.
  • frequency is shown on the vertical axis, time on the horizontal axis, and power is represented as a grey scale, with darker shades corresponding to higher power (sound intensity) levels.
  • the consonants “f”, 142 , and “k”, 107 primarily consist of sound frequencies above approximately 3000 Hz, while the vowel “ai”, 191 , primarily consists of sound frequencies below approximately 3500 Hz.
  • the highlighted areas of the spectrogram 132 , 134 , 138 reveal additional features of human speech.
  • An early portion of the phoneme “f”, 132 , magnified in panel (A), 133 comprises sound frequencies predominantly above 3000 Hz.
  • the distribution of power is irregular over time and frequency giving rise to a sound quality resembling rushing air, and creating the granular pattern on the spectrogram 132 , 133 .
  • the highlighted portion of the phoneme “ai”, 134 , magnified in panel (B), 135 shows a bimodal distribution of relatively low sound frequencies. Characteristic of diphthongs, one or more dominant frequencies, called “formants”, shift in frequency over time. A portion 136 of panel (B), 135 , magnified further in panel (D), 137 , reveals a waxing and waning of power in all frequencies, a characteristic of the human voice. Unvoiced phonemes such as “f”, 142 , 132 , 133 , and “k”, 107 , 118 , 138 , 139 , do not exhibit these cyclical amplitude fluctuations.
  • Some phonemes increase or decrease in power or intensity over their duration. This is evident in the highlighted portion of the phoneme “k”, 138 , magnified in panel (C), 139 . Here, sound energy decreases continually during a period of about 70 milliseconds.
  • the phoneme “k”, 107 comprises approximately 70 milliseconds of quiet 117 followed by the audible portion 118 of the phoneme “k”, 107 . Without this period of relative silence, some phonemes, including “k” would be unintelligible. Also, intervals of relative silence or power shifts are important for syllabification.
  • FIG. 2 is a table 200 of American English phonemes 225 shown in three nomenclatures: the International Phonetic Association (IPA), s{mpA (a phonetic spelling of SAMPA, the abbreviation for Speech Assessment Methods Phonetic Alphabet, a computer readable phonetic alphabet), and the Merriam-Webster Online Dictionary (m-w). Examples 226 of each phoneme (bold underlined letters) as used in an American English word are provided, along with the manner 237 and place 247 of articulation 227 .
  • the manner of articulation 237 refers primarily to the way in which the speech organs, such as the vocal cords, tongue, teeth, lips, nasal cavity, etc. are used.
  • Plosives 201 , 204 , 207 , 211 , 214 , 217 are consonants pronounced by completely closing the breath passage and then releasing air.
  • Fricatives 242 , 243 , 244 , 245 , 250 , 252 , 253 , 254 , 255 are consonants pronounced by forcing the breath through a narrow opening.
  • Between the plosives, and the fricatives are two affricates 224 , 234 composite speech sounds that begin as a plosive and end as a fricative.
  • Nasals 261 , 264 , 267 are consonants pronounced with breath escaping mainly through the nose rather than the mouth.
  • Approximants 274 , 275 , 276 , 271 are sounds produced while the airstream is barely disturbed by the tongue, lips, or other vocal organs.
  • Vowels are speech sounds produced by the passage of air through the vocal tract, with relatively little obstruction, including the monophthong vowels 280 , 281 , 282 , 283 , 284 , 285 , 286 , 287 , 288 , 289 and the diphthong vowels 291 , 292 , 293 , 294 , 295 .
  • the place of articulation 247 refers largely to the position of the tongue, teeth, and lips.
  • Bilabials are pronounced by bringing both lips into contact with each other or by rounding them.
  • Labiodentals are pronounced with the upper teeth resting on the inside of the lower lip.
  • Dentals are formed by placing the tongue against the back of the top front teeth.
  • Alveolars are sounded with the tongue touching or close to the ridge behind the teeth of the upper jaw.
  • Palato-alveolars are produced by raising the tongue to or near the forward-most portion of the hard palate.
  • Palatals are produced by raising the tongue to or near the hard palate.
  • Velars are spoken with the back of the tongue close to, or in contact with, the soft palate (velum).
  • Other speech characteristics 228 include voice, dominant sound frequencies above about 3000 Hz (3 kHz+), and stops.
  • eight phonemes comprise a period of relative silence followed by a period of relatively high sound energy. These phonemes, called stops 228 , are the plosives and the affricates 201 , 204 , 207 , 211 , 214 , 217 , 224 , 234 . Stops are not recognizable from their audible portion alone; recognition of these phonemes requires that they begin with silence. Phonemes may be voiced or unvoiced. For example, “b”, 211 , is the voiced version of “p”, 201 , and “z”, 254 , is the voiced version of “s”, 244 .
  • the plosives, affricates, and fricatives 201 , 204 , 207 , 211 , 214 , 217 , 224 , 234 , 242 , 243 , 244 , 245 , 250 , 252 , 253 , 254 , 255 comprise sound frequencies above 3000 Hz.
  • Unvoiced phonemes 201 , 204 , 207 , 224 , 242 , 243 , 244 , 245 , 250 in particular tend to be dominated by the higher sound frequencies.
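The manner, place, and voicing attributes described in the preceding paragraphs lend themselves to a simple tabular encoding. The sketch below uses an illustrative handful of phonemes with attribute values chosen by the editor, not the patent's full FIG. 2 table, to show one way such a feature table might be represented in software.

```python
# Illustrative sketch: a few American English phonemes encoded with the
# attributes discussed above (manner and place of articulation, voicing,
# dominant energy above ~3 kHz, and whether the phoneme is a stop).
from dataclasses import dataclass

@dataclass(frozen=True)
class PhonemeFeatures:
    ipa: str          # IPA symbol
    manner: str       # plosive, fricative, affricate, nasal, approximant, vowel
    place: str        # bilabial, labiodental, dental, alveolar, palato-alveolar, palatal, velar, ...
    voiced: bool      # True for voiced phonemes
    high_freq: bool   # dominant sound frequencies above about 3000 Hz
    stop: bool        # begins with a period of relative silence

PHONEME_TABLE = {
    "p": PhonemeFeatures("p", "plosive",   "bilabial", voiced=False, high_freq=True,  stop=True),
    "b": PhonemeFeatures("b", "plosive",   "bilabial", voiced=True,  high_freq=True,  stop=True),
    "s": PhonemeFeatures("s", "fricative", "alveolar", voiced=False, high_freq=True,  stop=False),
    "z": PhonemeFeatures("z", "fricative", "alveolar", voiced=True,  high_freq=True,  stop=False),
    "m": PhonemeFeatures("m", "nasal",     "bilabial", voiced=True,  high_freq=False, stop=False),
    "u": PhonemeFeatures("u", "vowel",     "back",     voiced=True,  high_freq=False, stop=False),
}
```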
  • a method of transforming a sequence of symbols representing phonemes into a sequence of arrays of nerve stimuli comprising establishing a correlation between each member of a phoneme symbol set and an assignment of one or more channels of a multi-electrode array, accessing a sequence of phonetic symbols corresponding to a message, and activating a sequence of one or more electrodes corresponding to each phonetic symbol of the message identified by the correlation.
  • the phonetic symbols may belong to one of SAMPA, Kirshenbaum, or IPA Unicode digital character sets.
  • the symbols may belong to the cmudict phoneme set.
  • the correlation may be a one to one correlation.
  • Activating a sequence of one or more electrodes may include an energizing period for each electrode, wherein the energizing period comprises a begin time parameter and an end time parameter.
  • the begin time parameter may be representative of a time from an end of components of a previous energizing period of a particular electrode.
  • the electrodes may be associated with a hearing prosthesis.
  • the hearing prosthesis may comprise a cochlear implant.
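A minimal sketch of the phoneme-symbol-to-channel correlation and timed activation just described, assuming a hypothetical channel assignment and fixed timing values; the patent does not prescribe these particular numbers.

```python
# Sketch: one-to-one correlation from phoneme symbols to channels of a
# hypothetical multi-electrode array, and expansion of a phonetic message
# into timed activation events.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Activation:
    channel: int        # electrode channel to energize
    begin_ms: float     # begin time, measured from the end of the previous energizing period
    end_ms: float       # end time of this energizing period

# Example correlation; real assignments depend on the implant and the user.
PHONEME_TO_CHANNEL: Dict[str, int] = {"f": 1, "ai": 5, "k": 2}

def message_to_activations(phonemes: List[str],
                           gap_ms: float = 10.0,
                           duration_ms: float = 80.0) -> List[Activation]:
    """Expand a sequence of phonetic symbols into timed electrode activations."""
    return [Activation(PHONEME_TO_CHANNEL[p], begin_ms=gap_ms, end_ms=gap_ms + duration_ms)
            for p in phonemes]

# The word "fake" (phonetically f-ai-k) becomes three timed channel activations.
print(message_to_activations(["f", "ai", "k"]))
```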
  • a method of processing a sequence of spoken words into a sequence of sounds comprising converting a sequence of spoken words into electrical signals, digitizing the electrical signals representative of the speech sounds, transforming the speech sounds into digital symbols representing corresponding phonemes, transforming the symbols representing the corresponding phonemes into sound representations, and transforming the sound representations into sounds.
  • Transforming the symbols representing the phonemes into sound representations may comprise accessing a data structure configured to map phonemes to sound representations, locating the symbols representing the corresponding phonemes in the data structure, and mapping the phonemes to sound representations.
  • the method additionally may comprise creating the data structure, comprising identifying phonemes corresponding to a language used by a user of the method, establishing a set of allowed sound frequencies, generating a correspondence mapping the identified phonemes to the set of allowed sound frequencies such that each constituent phoneme of the identified phonemes is assigned a subset of one or more frequencies from the set of allowed sound frequencies, and mapping each constituent phoneme of the identified phonemes to a set of one or more sounds.
  • Establishing a set of allowed sound frequencies may comprise selecting a set of sound frequencies that are in a hearing range of the user.
  • Each sound of the set of one or more sounds may comprise an initial frequency parameter.
  • Each sound of the set of one or more sounds may comprise a begin time parameter. The begin time parameter may be representative of a time from an end of components of a previous sound representation.
  • Each sound of the set of one or more sounds may comprise an end time parameter.
  • Each sound of the set of one or more sounds may comprise a power parameter.
  • Each sound of the set of one or more sounds may comprise a power shift parameter.
  • Each sound of the set of one or more sounds may comprise a frequency shift parameter.
  • Each sound of the set of one or more sounds may comprise a pulse rate parameter.
  • Each sound of the set of one or more sounds may comprise a duty cycle parameter.
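The per-sound parameters enumerated above can be gathered into a single record. The following sketch assumes field names and units for illustration only.

```python
# Sketch of one way to represent the per-sound parameters listed above.
from dataclasses import dataclass

@dataclass
class SoundRepresentation:
    initial_frequency_hz: float      # initial frequency parameter
    begin_time_ms: float             # delay from the end of the previous sound representation
    end_time_ms: float               # end time parameter
    power_db: float                  # power parameter
    power_shift_db: float = 0.0      # power shift over the sound's duration
    frequency_shift_hz: float = 0.0  # frequency shift over the sound's duration
    pulse_rate_hz: float = 0.0       # pulse rate (0 means unpulsed)
    duty_cycle: float = 1.0          # fraction of each pulse period during which the tone is on

# A pulsed low-frequency tone such as might stand in for a voiced phoneme.
example = SoundRepresentation(initial_frequency_hz=400.0, begin_time_ms=10.0,
                              end_time_ms=150.0, power_db=40.0,
                              pulse_rate_hz=12.0, duty_cycle=0.5)
```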
  • a method of processing a sequence of spoken words into a sequence of nerve stimuli comprising converting a sequence of spoken words into electrical signals, digitizing the electrical signals representative of the speech sounds, transforming the speech sounds into digital symbols representing corresponding phonemes, transforming the symbols representing the corresponding phonemes into stimulus definitions and transforming the stimulus definitions into a sequence of nerve stimuli.
  • the nerve stimuli may be associated with a hearing prosthesis.
  • the hearing prosthesis may comprise a cochlear implant.
  • the nerve stimuli may be associated with a skin interface.
  • the skin interface may be located on the wrist and/or hand of the user. Alternatively, the skin interface may be located on the ankle and/or foot of the user.
  • the nerve stimuli may be mechanical and/or electrical.
  • Transforming the symbols representing the phonemes into stimulus definitions may comprise accessing a data structure configured to map phonemes to stimulus definitions, locating the symbols representing the corresponding phonemes in the data structure, and mapping the phonemes to stimulus definitions.
  • the stimulus definitions may comprise sets of one or more stimuli.
  • the sets of one or more stimuli may correspond to one or more locations on the skin or one or more locations in the cochlea.
  • Each stimulus of the sets of one or more stimuli may comprise a begin time parameter.
  • the begin time parameter may be representative of a time from an end of components of a previous stimulus definition.
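For a skin interface, the same idea can be sketched as a lookup from phoneme symbols to sets of stimuli, each with a location and a begin time measured from the end of the previous stimulus definition. The stimulator sites, SAMPA-style symbols, and timings below are assumptions for illustration.

```python
# Hypothetical sketch: a data structure mapping phonemes to stimulus
# definitions for a wrist-worn skin interface.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Stimulus:
    location: str             # a named stimulator site on the wrist or hand
    begin_ms: float           # measured from the end of the previous stimulus definition
    duration_ms: float
    mechanical: bool = True   # False would indicate an electrical stimulus

# Each phoneme maps to a set of one or more stimuli (a tactile symbol).
PHONEME_TO_STIMULI: Dict[str, List[Stimulus]] = {
    "tS": [Stimulus("wrist_top", 10.0, 60.0), Stimulus("thumb_base", 10.0, 60.0)],
    "u":  [Stimulus("wrist_under", 10.0, 120.0)],
}

def phonemes_to_stimuli(phonemes: List[str]) -> List[List[Stimulus]]:
    """Locate each phoneme symbol in the data structure and emit its stimuli."""
    return [PHONEME_TO_STIMULI[p] for p in phonemes]

print(phonemes_to_stimuli(["tS", "u"]))   # the word "chew"
```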
  • a method of transforming a sequence of symbols representing phonemes into a sequence of arrays of nerve stimuli comprising establishing a correlation between each member of a phoneme symbol set and an assignment of one or more channels of a multi-stimulator array, accessing a sequence of phonetic symbols corresponding to a message, and activating a sequence of one or more stimulators corresponding to each phonetic symbol of the message identified by the correlation.
  • the stimulators may be vibrators affixed to the user's skin.
  • the phonetic symbols may belong to one of SAMPA, Kirshenbaum, or IPA Unicode digital character sets.
  • the symbols may belong to the cmudict phoneme set.
  • the correlation may be a one to one correlation.
  • Activating a sequence of one or more stimulators may include an energizing period for each stimulator, wherein the energizing period comprises a begin time parameter and an end time parameter.
  • the begin time parameter may be representative of a time from an end of components of a previous energizing period of a particular stimulator.
  • a method of training a user comprising providing a set of somatic stimulations to a user, wherein the set of somatic stimulations is indicative of a plurality of phonemes, and wherein the phonemes are based at least in part on an audio communication; providing the audio communication concurrently to the user with the plurality of phonemes; and selectively modifying at least portions of the audio communication to the user during the providing of the set of somatic stimulations to the user.
  • Selectively modifying at least portions of the audio communication may comprise reducing an audio property of the audio communication.
  • the audio property may comprise a volume of the audio.
  • the audio property may comprise omitting selected words from the audio.
  • the audio property may comprise attenuating a volume of selected words from the audio.
  • the audio property may comprise omitting selected phonemes from the audio.
  • the audio property may comprise attenuating a volume of selected phonemes from the audio.
  • the audio property may comprise omitting selected sound frequencies from the audio.
  • the audio property may comprise attenuating a volume of selected sound frequencies from the audio.
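One way to picture the selective modification described above is to scale down, or omit entirely, the audio belonging to chosen phonemes while the somatic stimulation continues unchanged. The segment format and attenuation values in this sketch are assumptions, not the patent's method.

```python
# Sketch of selectively attenuating or omitting the audio of chosen phonemes
# during training, while somatic stimulation is provided separately.
from typing import List, Tuple

# Each segment: (phoneme symbol, audio samples for that phoneme)
Segment = Tuple[str, List[float]]

def modify_audio(segments: List[Segment],
                 target_phonemes: set,
                 attenuation: float = 0.2) -> List[float]:
    """Scale down (or, with attenuation=0.0, omit) the audio of selected phonemes."""
    out: List[float] = []
    for phoneme, samples in segments:
        gain = attenuation if phoneme in target_phonemes else 1.0
        out.extend(s * gain for s in samples)
    return out

# Attenuate the high-frequency consonant while leaving the vowel untouched.
training_audio = modify_audio([("tS", [0.5, -0.4]), ("u", [0.3, 0.2])],
                              target_phonemes={"tS"}, attenuation=0.0)
```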
  • a method of training a user comprising providing a set of somatic stimulations to a user, wherein the set of somatic stimulations is indicative of a plurality of phonemes, and wherein the phonemes are based at least in part on an audiovisual communication; providing the audiovisual communication concurrently to the user with the plurality of phonemes; and selectively modifying at least portions of the audiovisual communication to the user during the providing of the set of somatic stimulations to the user.
  • Selectively modifying at least portions of the audiovisual communication may comprise reducing an audio or video property of the audiovisual communication.
  • the audio property may comprise a volume of the audio.
  • the audio property may comprise omitting selected words from the audio.
  • the audio property may comprise attenuating a volume of selected words from the audio.
  • the audio property may comprise omitting selected phonemes from the audio.
  • the audio property may comprise attenuating a volume of selected phonemes from the audio.
  • the audio property may comprise omitting selected sound frequencies from the audio.
  • the audio property may comprise attenuating a volume of selected sound frequencies from the audio.
  • the video property may comprise a presence or brightness of the video.
  • a system for processing a sequence of spoken words into a sequence of sounds comprising a first converter configured to digitize electrical signals representative of a sequence of spoken words, a speech recognizer configured to receive the digitized electrical signals and generate a sequence of phonemes representative of the sequence of spoken words, a mapper configured to assign sound sets to phonemes utilizing an audiogram so as to generate a map, a transformer configured to receive the sequence of phonemes representative of the sequence of spoken words and the map and to generate a sequence of sound representations corresponding to the sequence of phonemes, and a second converter configured to convert the sequence of sound representations into a sequence of audible sounds.
  • the map may be a user-specific map based on a particular user's audiogram.
  • a system for processing a sequence of spoken words into a sequence of sounds comprising a first converter configured to digitize electrical signals representative of a sequence of spoken words, a speech recognizer configured to receive the digitized electrical signals and generate a sequence of phonemes representative of the sequence of spoken words, a data structure comprising sound sets mapped to phonemes, a transformer configured to receive the sequence of phonemes representative of the sequence of spoken words and the data structure and to generate a sequence of sound representations corresponding to the sequence of phonemes, and a second converter configured to convert the sequence of sound representations into a sequence of audible sounds.
  • the data structure may be generated utilizing a user's audiogram.
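The chain of components named in these system paragraphs (first converter, speech recognizer, transformer, second converter) can be sketched as a simple pipeline. The stub bodies, names, and signatures below are assumptions for illustration; a real recognizer and audio back end would replace them.

```python
# Minimal pipeline sketch: digitize -> recognize phonemes -> transform using a
# phoneme-to-sound map -> hand sound representations to the output stage.
from typing import List

def digitize(analog_samples: List[float]) -> List[int]:
    """First converter: quantize the microphone signal (16-bit here, as an example)."""
    return [int(max(-1.0, min(1.0, s)) * 32767) for s in analog_samples]

def recognize_phonemes(digital_samples: List[int]) -> List[str]:
    """Speech recognizer: emit phoneme symbols. A real recognizer would go here."""
    return ["tS", "u"]   # pretend the user said "chew"

def transform(phonemes: List[str], phoneme_map: dict) -> List[dict]:
    """Transformer: swap each phoneme for its assigned sound representation."""
    return [phoneme_map[p] for p in phonemes]

def play(sound_representations: List[dict]) -> None:
    """Second converter: hand sound representations to the audio hardware (stubbed)."""
    for sound in sound_representations:
        print("tone", sound)

phoneme_map = {"tS": {"freq_hz": 600, "ms": 60}, "u": {"freq_hz": 400, "ms": 120}}
play(transform(recognize_phonemes(digitize([0.1, -0.2, 0.3])), phoneme_map))
```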
  • a system for processing a sequence of spoken words into a sequence of nerve stimuli comprising a converter configured to digitize electrical signals representative of a sequence of spoken words, a speech recognizer configured to receive the digitized electrical signals and generate a sequence of phonemes representative of the sequence of spoken words, a mapper configured to assign nerve stimuli arrays to phonemes utilizing an audiogram so as to generate a map, and a transformer configured to receive the sequence of phonemes representative of the sequence of spoken words and the map and to generate a sequence of stimulus definitions corresponding to the sequence of phonemes.
  • the system may additionally comprise a receiver configured to convert the sequence of stimulus definitions into electrical waveforms and an electrode array configured to receive the electrical waveforms.
  • the electrode array may be surgically placed in the user's cochlea.
  • the sequence of stimulus definitions may comprise digital representations of nerve stimulation patterns.
  • a system for processing a sequence of spoken words into a sequence of nerve stimuli comprising a converter configured to digitize electrical signals representative of a sequence of spoken words, a speech recognizer configured to receive the digitized electrical signals and generate a sequence of phonemes representative of the sequence of spoken words, a data structure comprising nerve stimuli arrays mapped to phonemes, and a transformer configured to receive the sequence of phonemes representative of the sequence of spoken words and the data structure and to generate a sequence of stimulus definitions corresponding to the sequence of phonemes.
  • the data structure may be generated utilizing a user's audiogram.
  • the system may additionally comprise a receiver configured to convert the sequence of stimulus definitions into electrical waveforms and an electrode array configured to receive the electrical waveforms.
  • the electrode array may be surgically placed in the user's cochlea.
  • the sequence of stimulus definitions may comprise digital representations of nerve stimulation patterns.
  • a system for processing a sequence of spoken words into a sequence of nerve stimuli comprising a processor configured to generate a sequence of phonemes representative of a sequence of spoken words and to transform the sequence of phonemes using a data structure comprising nerve stimuli arrays mapped to phonemes to produce a sequence of stimulus definitions corresponding to the sequence of phonemes, and an electrode array configured to play the sequence of stimulus definitions.
  • the data structure may be generated utilizing a user's audiogram.
  • the electrode array may comprise a converter configured to convert the sequence of stimulus definitions into electrical waveforms.
  • the electrode array may be surgically placed in the user's cochlea.
  • the electrode array may comprise a plurality of mechanical stimulators or a plurality of electrodes.
  • the sequence of stimulus definitions may comprise digital representations of nerve stimulation patterns.
  • a system for processing a sequence of spoken words into a sequence of sounds comprising a processor configured to generate a sequence of phonemes representative of the sequence of spoken words and to transform the sequence of phonemes using a data structure comprising sound sets mapped to phonemes to produce sound representations corresponding to the sequence of phonemes, and a converter configured to convert the sound representations into audible sounds.
  • the data structure may be generated utilizing a user's audiogram.
  • a system for processing a sequence of text into a sequence of sounds comprising, a first converter configured to receive a sequence of text and generate a sequence of phonemes representative of the sequence of text, a mapper configured to assign sound sets to phonemes utilizing a hearing audiogram so as to generate a map, a transformer configured to receive the sequence of phonemes representative of the sequence of text and the map and to generate sound representations corresponding to the sequence of phonemes, and a second converter configured to convert the sound representations into audible sounds.
  • the hearing audiogram may be representative of a normal human hearing range.
  • the hearing audiogram may be representative of a hearing range for a specific individual.
  • a system for processing a sequence of text into a sequence of sounds comprising a text converter configured to receive a sequence of text and generate a sequence of phonemes representative of the sequence of text, a data structure comprising sound sets mapped to phonemes, a transformer configured to receive the sequence of phonemes representative of the sequence of text and the data structure and to generate sound representations corresponding to the sequence of phonemes, and a second converter configured to convert the sound representations into audible sounds.
  • the data structure may be generated utilizing a user's audiogram.
  • a system for processing a sequence of text into a sequence of nerve stimuli comprising a converter configured to receive a sequence of text and generate a sequence of phonemes representative of the sequence of text, a data structure comprising nerve stimuli arrays mapped to phonemes, and a transformer configured to receive the sequence of phonemes representative of the sequence of text and the data structure and to generate a sequence of stimulus definitions corresponding to the sequence of phonemes.
  • the data structure may be generated utilizing a user's abilities.
  • the user's abilities may comprise useable channels of a cochlear implant of the user.
  • the user's abilities may comprise the ability to distinguish between two or more unique stimuli.
  • a method of processing a sequence of text into a sequence of sounds comprising transforming the sequence of text into digital symbols representing corresponding phonemes, transforming the symbols representing the corresponding phonemes into sound representations, and transforming the sound representations into a sequence of sounds.
  • a method of processing a sequence of text into a sequence of nerve stimuli comprising transforming the sequence of text into digital symbols representing corresponding phonemes, transforming the symbols representing the corresponding phonemes into stimulus definitions, and transforming the stimulus definitions into a sequence of nerve stimuli.
  • the nerve stimuli may be associated with a cochlear implant.
  • the nerve stimuli may be associated with a skin interface, where the skin interface may be located on the wrist and/or hand of the user.
  • Transforming the symbols representing the phonemes into stimulus definitions may comprise accessing a data structure configured to map phonemes to stimulus definitions, locating the symbols representing the corresponding phonemes in the data structure, and mapping the phonemes to stimulus definitions.
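The text path described above can be sketched as two lookups: text to phoneme symbols, then phoneme symbols to stimulus definitions. The tiny dictionaries stand in for a full pronouncing dictionary such as cmudict and for a user-specific stimulus map, and are assumptions for illustration.

```python
# Sketch of transforming text into phoneme symbols and then into stimulus
# definitions via dictionary lookups.
from typing import Dict, List

WORD_TO_PHONEMES: Dict[str, List[str]] = {"chew": ["tS", "u"], "fake": ["f", "eI", "k"]}
PHONEME_TO_STIMULUS: Dict[str, dict] = {
    "tS": {"channel": 3, "duration_ms": 60},
    "u":  {"channel": 7, "duration_ms": 120},
    "f":  {"channel": 1, "duration_ms": 80},
    "eI": {"channel": 5, "duration_ms": 150},
    "k":  {"channel": 2, "duration_ms": 60},
}

def text_to_stimuli(text: str) -> List[dict]:
    stimuli = []
    for word in text.lower().split():
        for phoneme in WORD_TO_PHONEMES[word]:            # text -> phoneme symbols
            stimuli.append(PHONEME_TO_STIMULUS[phoneme])  # phoneme -> stimulus definition
    return stimuli

print(text_to_stimuli("chew fake"))
```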
  • a method of creating a data structure configured to transform symbols representing phonemes into sound representations comprising identifying phonemes corresponding to a language utilized by a user, establishing a set of allowed sound frequencies, generating a correspondence mapping the identified phonemes to the set of allowed sound frequencies such that each constituent phoneme of the identified phonemes is assigned a subset of one or more frequencies from the set of allowed sound frequencies, and mapping each constituent phoneme of the identified phonemes to a set of one or more sounds.
  • Establishing a set of allowed sound frequencies may comprise selecting a set of sound frequencies that are in a hearing range of the user.
  • Each sound of the set of one or more sounds may comprise an initial frequency parameter.
  • Each sound of the set of one or more sounds may comprise a begin time parameter.
  • the begin time parameter may be representative of a time from an end of components of a previous sound representation.
  • Each sound of the set of one or more sounds may comprise an end time parameter.
  • Each sound of the set of one or more sounds may comprise a power parameter.
  • Each sound of the set of one or more sounds may comprise a power shift parameter.
  • Each sound of the set of one or more sounds may comprise a frequency shift parameter.
  • Each sound of the set of one or more sounds may comprise a pulse rate parameter.
  • Each sound of the set of one or more sounds may comprise a duty cycle parameter.
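A brief sketch of the data structure creation just described: restrict candidate frequencies to those in the user's hearing range, give each identified phoneme its own small frequency subset, and record a sound for it. The audiogram values, the 40 dB cutoff, and the subset assignment scheme are assumptions for illustration.

```python
# Sketch of building a phoneme-to-sound data structure from an audiogram.
from itertools import combinations
from typing import Dict, List

def allowed_frequencies(audiogram: Dict[int, float], max_loss_db: float = 40.0) -> List[int]:
    """Keep only audiometric frequencies the user hears well enough (assumed cutoff)."""
    return sorted(f for f, loss in audiogram.items() if loss <= max_loss_db)

def build_phoneme_map(phonemes: List[str], freqs: List[int]) -> Dict[str, dict]:
    """Assign each phoneme a distinct one- or two-frequency subset, then a sound."""
    subsets = [(f,) for f in freqs] + list(combinations(freqs, 2))
    if len(subsets) < len(phonemes):
        raise ValueError("not enough distinguishable frequency subsets for this user")
    return {p: {"frequencies_hz": list(subsets[i]), "duration_ms": 100}
            for i, p in enumerate(phonemes)}

audiogram = {250: 10, 500: 15, 1000: 35, 2000: 60, 4000: 80}   # dB HL per frequency
print(build_phoneme_map(["p", "b", "s", "z", "m", "u"],
                        allowed_frequencies(audiogram)))
```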
  • FIG. 1 is a diagram showing a spectrogram, waveform and phonemes for an English word.
  • FIG. 2 is a table of English phonemes shown in three nomenclatures.
  • FIG. 3A is a plot of sound intensity and sound frequency showing normal human hearing.
  • FIG. 3B is a plot of sound intensity and sound frequency showing hearing loss such as caused by chronic exposure to loud noise.
  • FIG. 3C is a plot of hearing level and sound frequency, as would appear in the form of a clinical audiogram, showing normal human hearing, and is analogous to the plot of FIG. 3A .
  • FIG. 3D is a plot of hearing level and sound frequency, as would appear in the form of a clinical audiogram, showing hearing loss such as caused by chronic exposure to loud noise, and is analogous to the plot of FIG. 3B .
  • FIGS. 4A and 4B are diagrams showing conventional physical configurations of body-worn and in-the-ear hearing aids, respectively.
  • FIGS. 4C and 4D are diagrams showing functional components of low-complexity and medium-complexity hearing aids, respectively.
  • FIG. 4E is a diagram of a phoneme substitution based hearing aid.
  • FIG. 5A is a diagram showing a spectrogram, waveform and phonemes for an English word “chew”
  • FIG. 5B is a diagram similar to that of FIG. 5A but showing use of amplification in the spectrogram and waveform.
  • FIG. 5C is a diagram similar to that of FIG. 5A but showing use of speech processing in the spectrogram and waveform.
  • FIG. 5D is a diagram similar to that of FIG. 5A but showing use of phoneme substitution in the spectrogram and waveform.
  • FIG. 6 is a diagram of an embodiment of the components associated with a hearing aid using phoneme substitution.
  • FIG. 7 is a flowchart of an embodiment of an assignment of sound sets to phonemes process shown in FIG. 6 .
  • FIG. 8 is a diagram of an example of a phoneme substitution data structure such as resulting from the assignment of sound sets to phonemes process shown in FIG. 7 .
  • FIG. 9 is a plot of a spectrogram for the English word “jousting” as a result of phoneme substitution such as performed using the data structures shown in FIG. 8 .
  • FIG. 10A is a diagram of physical components of an example of a cochlear implant hearing device.
  • FIG. 10B is a diagram of a functional configuration of the example cochlear implant hearing device shown in FIG. 10A .
  • FIG. 11A is a diagram showing a spectrogram, waveform and phonemes for an English word “chew”
  • FIG. 11B is a diagram similar to that of FIG. 11A but showing use of conventional sound processing in the spectrogram.
  • FIG. 11C is a diagram similar to that of FIG. 11A but showing use of phoneme substitution in the spectrogram.
  • FIG. 12 is a diagram of an embodiment of the components associated with a hearing implant using phoneme substitution.
  • FIG. 13 is a diagram showing an embodiment of an implanted electrode array and an example structure of potential electrode assignments, such as stored in the database of nerve stimuli arrays to phonemes shown in FIG. 12 .
  • FIG. 14A is a diagram of an embodiment of a skin interface, used with phoneme substitution, having mechanical or electrical stimulators fitted about a person's hand and wrist.
  • FIG. 14B is a diagram of an embodiment of a skin interface, used with phoneme substitution, having mechanical or electrical stimulators fitted about a person's wrist.
  • FIG. 15 is a table providing examples of mapping English phonemes to tactile symbols, such as for the skin interfaces shown in FIGS. 14A and 14B .
  • FIG. 16A is a diagram of various ways of representing the English word “chew”.
  • FIG. 16B is a diagram showing embodiments of transmitters and receivers for implementing phoneme substitution communication, such as shown in FIGS. 6 , 12 and 14 A and 14 B.
  • each of the modules may comprise various sub-routines, procedures, definitional statements and macros.
  • Each of the modules is typically separately compiled and linked into a single executable program. Therefore, the following description of each of the modules is used for convenience to describe the functionality of the preferred system.
  • the processes that are undergone by each of the modules may be arbitrarily redistributed to one of the other modules, combined together in a single module, or made available in, for example, a shareable dynamic link library.
  • the system modules, tools, and applications may be written in any programming language such as, for example, C, C++, BASIC, Visual Basic, Pascal, Ada, Java, HTML, XML, or FORTRAN, and executed on an operating system, such as variants of Windows, Macintosh, UNIX, Linux, VxWorks, or other operating system.
  • C, C++, BASIC, Visual Basic, Pascal, Ada, Java, HTML, XML and FORTRAN are industry standard programming languages for which many commercial compilers can be used to create executable code.
  • a computer or computing device may be any processor controlled device, which may permit access to the Internet, including terminal devices, such as personal computers, workstations, servers, clients, mini-computers, main-frame computers, laptop computers, a network of individual computers, mobile computers, palm-top computers, hand-held computers, set top boxes for a television, other types of web-enabled televisions, interactive kiosks, personal digital assistants, interactive or web-enabled wireless communications devices, mobile web browsers, or a combination thereof.
  • the computers may further possess one or more input devices such as a keyboard, mouse, touch pad, joystick, pen-input-pad, and the like.
  • the computers may also possess an output device, such as a visual display and an audio output.
  • One or more of these computing devices may form a computing environment.
  • These computers may be uni-processor or multi-processor machines. Additionally, these computers may include an addressable storage medium or computer accessible medium, such as random access memory (RAM), an electronically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), hard disks, floppy disks, laser disk players, digital video devices, compact disks, video tapes, audio tapes, magnetic recording tracks, electronic networks, and other techniques to transmit or store electronic content such as, by way of example, programs and data.
  • the computers are equipped with a network communication device such as a network interface card, a modem, or other network connection device suitable for connecting to the communication network.
  • the computers execute an appropriate operating system such as Linux, UNIX, any of the versions of Microsoft Windows, Apple MacOS, IBM OS/2 or other operating system.
  • the appropriate operating system may include a communications protocol implementation that handles all incoming and outgoing message traffic passed over the Internet.
  • although the operating system may differ depending on the type of computer, it will continue to provide the appropriate communications protocols to establish communication links with the Internet.
  • the computers may contain program logic, or other substrate configuration representing data and instructions, which cause the computer to operate in a specific and predefined manner, as described herein.
  • a computer readable medium can store the data and instructions for the processes and methods described hereinbelow.
  • the program logic may be implemented as one or more object frameworks or modules. These modules may be configured to reside on the addressable storage medium and configured to execute on one or more processors. The modules include, but are not limited to, software or hardware components that perform certain tasks.
  • a module may include, by way of example, components, such as, software components, object-oriented software components, class components and task components, processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables.
  • the various components of the system may communicate with each other and other components comprising the respective computers through mechanisms such as, by way of example, interprocess communication, remote procedure call, distributed object interfaces, and other various program interfaces.
  • the functionality provided for in the components, modules, and databases may be combined into fewer components, modules, or databases or further separated into additional components, modules, or databases.
  • the components, modules, and databases may be implemented to execute on one or more computers.
  • some of the components, modules, and databases may be implemented to execute on one or more computers external to a website.
  • the website may include program logic, which enables the website to communicate with the externally implemented components, modules, and databases to perform the functions as disclosed herein.
  • the plots 100 of FIG. 1 illustrate one word in one language.
  • Each language and dialect has its own set or sets of phonemes (different classification systems may define different sets of phonemes for the same language or dialect).
  • the scope of this description encompasses all phonemes, both those currently defined and those not yet defined, for all languages.
  • FIG. 2 is a table 200 of American English phonemes 225 shown in three nomenclatures: the International Phonetic Association (IPA), s{mpA (a phonetic spelling of SAMPA, the abbreviation for Speech Assessment Methods Phonetic Alphabet), and the Merriam-Webster Online Dictionary (m-w). Other nomenclatures, such as the Carnegie Mellon University pronouncing dictionary (cmudict), can be used in certain embodiments. Examples 226 of each phoneme as used in an American English word are provided, along with the manner 237 and place 247 of articulation 227 .
  • Some embodiments relate to recoding phonemes to sets of sound frequencies that can be perceived by the user lacking the ability to hear the full range of human speech sounds.
  • FIG. 3A plot 300 a , shows a range of human hearing that is considered normal region 310 a , on a plot of the sound frequency in Hertz (horizontal axis) versus the sound intensity in watts/m² (vertical axis).
  • the threshold of perception is the bottom (low intensity) boundary 312 a , 314 a , which varies as a function of frequency. Human hearing is most sensitive to sound frequencies around 3000 Hz. At these frequencies, the threshold of perception 314 a can be less than 10⁻¹² watts/m² (0 dB) 341 a for some individuals.
  • the threshold of discomfort is the top (high intensity) boundary, 316 a .
  • the low frequency limit of human hearing is defined as the frequency that is both the threshold of perception and the threshold of discomfort 318 a .
  • the high frequency limit of human hearing 319 a is defined in the same manner.
  • the OSHA limit for safe long term exposure to noise in the work environment, 90 dB, is equivalent to 10⁻³ watts/m² 343 a .
  • Sound frequencies and intensities required for speech perception are generally between about 300 Hz and 9000 Hz and about 10⁻¹⁰ to 10⁻⁷ watts/m² (20 dB to 50 dB) region 320 a .
  • Lower frequencies area 323 a are most important for the recognition of vowel sounds, while higher sound frequencies area 326 a , are more important for the recognition of consonants (also see FIGS. 1 and 2 ).
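The intensity figures used throughout this discussion follow the standard relation between sound intensity level in decibels and intensity in watts/m², with 0 dB defined as 10⁻¹² watts/m². A quick numerical check:

```python
# Conversion between sound intensity level (dB) and intensity (watts/m^2),
# with 0 dB defined as 1e-12 watts/m^2, as in the discussion above.
import math

I0 = 1e-12  # reference intensity, watts/m^2 (0 dB)

def db_to_intensity(db: float) -> float:
    return I0 * 10 ** (db / 10.0)

def intensity_to_db(intensity: float) -> float:
    return 10.0 * math.log10(intensity / I0)

print(db_to_intensity(90))   # 1e-03 watts/m^2, the OSHA limit cited above
print(db_to_intensity(20))   # 1e-10 watts/m^2, lower bound of the speech region
print(db_to_intensity(50))   # 1e-07 watts/m^2, upper bound of the speech region
```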
  • Hearing impairments: five to ten percent of people have a hearing range more limited than that shown in FIG. 3A (see region 310 b of FIG. 3B ). Many different types of hearing impairments exist. For example, one or both ears may be affected in their sensitivities to different sound frequencies. Hearing impairments may be congenital or acquired later in life, and may result from, or be influenced by, genetic factors, disease processes, medical treatments, and/or physical trauma.
  • FIG. 3B plot 300 b , illustrates a reduced range of hearing region 310 b as might result from chronic exposure to noise levels above 90 dB 343 b .
  • while the threshold of perception for low frequency sounds 312 b is only slightly affected, the ability to hear higher frequency sounds 314 b is significantly impaired.
  • a person with a hearing range as shown in FIG. 3B at region 310 b would be able to hear and recognize most low frequency vowel sounds at region 320 b , but would find it difficult or impossible to hear and recognize many high frequency consonant sounds 330 b .
  • the normal threshold of perception 0 dB or 10⁻¹² watts/m² is indicated by the arrow 341 b and the OSHA limit for safe long term exposure to noise in the work environment, 90 dB or 10⁻³ watts/m² , is indicated by the arrow 343 b .
  • the threshold of discomfort 316 b is relatively unaffected by a rise in the threshold of perception.
  • hearing aids can improve speech recognition by amplifying speech sounds above the threshold of perception for hearing impaired persons.
  • One embodiment is a device that recodes speech sounds to frequencies in a range of sensitive hearing rather than amplifying them at the frequencies where hearing is impaired. For example, an individual with a hearing range similar to that shown in FIG. 3B , region 300 b , would not hear most speech sounds at frequencies above around 1500 Hz in region 330 b , but could hear sounds recoded to sound frequencies around 400 Hz in area 350 b.
  • Audiometry provides a practical and clinically useful measurement of hearing by having the subject wear earphones attached to the audiometer. Pure tones of controlled intensity are delivered to one ear at a time. The subject is asked to indicate when he or she hears a sound. The minimum intensity (volume) required to hear each tone is graphed versus frequency. The objective of audiometry is to plot an audiogram, a chart of the weakest intensity of sound that a subject can detect at various frequencies.
  • although an audiogram presents similar information to the graphs in FIGS. 3A and 3B , it differs in several aspects.
  • although the human ear can detect frequencies from 20 to 20,000 Hz, hearing threshold sensitivity is usually measured only for the frequencies needed to hear the sounds of speech, 250 to 8,000 Hz.
  • the sound intensity scale of an audiogram is inverted compared with the graphs in FIGS. 3A and 3B , and is measured in decibels, dB, a log scale where zero has been arbitrarily defined as 10⁻¹² watts/m².
  • the audiogram provides an individual assessment of each ear.
  • FIG. 3C plot 300 c , shows an audiogram for an individual with normal hearing, similar to that shown in FIG. 3A , plot 300 a .
  • a shaded area 320 c represents the decibel levels and frequencies where speech sounds are generally perceived (the so-called “speech banana”, similar to the shaded area of FIG. 3A , region 320 a , but inverted).
  • Hearing in the right ear is represented by circles connected by a line 362 c and in the left ear by crosses connected by a line 364 c .
  • the symbols (circle for the right ear and cross for the left) indicate the person's hearing threshold at particular frequencies, e.g., the loudness (intensity) point where sound is just audible.
  • Thresholds of perception from zero dB, shown by arrow 341 c , to 15 dB (1.0×10⁻¹² to 3.2×10⁻¹¹ watts/m²) are considered to be within the normal hearing range.
  • the OSHA limit for safe long-term exposure to noise, 90 dB, or 10⁻³ watts/m², shown by arrow 343 c , is also provided for reference.
  • An area designated 323 c indicates the range most important for hearing vowel sounds, and an area designated 326 c indicates the range most important for hearing consonants.
  • FIG. 3D plot 300 d , shows an exemplary audiogram of an individual with bilaterally symmetrical hearing loss (similar hearing losses in both ears), similar to that shown in FIG. 3B , plot 300 b .
  • Hearing in the right ear is represented by circles connected by a line 362 d and in the left ear by crosses connected by a line 364 d .
  • At the lower frequencies (250 to 500 Hz), little hearing loss has occurred, area 350 d . However, at the mid-range of frequencies (500 to 1000 Hz) hearing loss is moderate, area 320 d , and at the higher frequencies (>2000 Hz), hearing loss is severe, area 330 d .
  • a person with this degree of hearing loss would be able to hear and recognize most low frequency vowel sounds, area 320 d , but would find it difficult or impossible to hear and recognize many high frequency consonant sounds, area 330 d . As a result, this person would be able to hear when people are speaking, but would be unable to understand what they are saying.
  • the normal threshold of perception, 0 dB or 10⁻¹² watts/m², shown by arrow 341 d , and the OSHA limit for safe long term exposure to noise in the work environment, 90 dB or 10⁻³ watts/m², shown by arrow 343 d , are provided for reference.
  • hearing aids can improve speech recognition by amplifying speech sounds above the threshold of perception for hearing impaired persons.
  • An embodiment is a device that recodes speech sounds to frequencies in a range of sensitive hearing rather than amplifying them at the frequencies where hearing is impaired. For example, an individual with an audiogram similar to that shown in FIG. 3D , plot 300 d , would not hear most speech sounds at frequencies above around 1500 Hz, area 330 d , but could hear sounds recoded to sound frequencies around 400 Hz, area 350 d.
  • many types of hearing aids are available, which vary in physical configuration, power, circuitry, and performance. They all aid sound and speech perception by amplifying sounds that would otherwise be imperceptible to the user; however, their effectiveness is often limited by distortion and the narrow range in which the amplified sound is audible, but not uncomfortable. Certain embodiments described herein overcome these limitations.
  • FIGS. 4A and 4B diagrams 400 a , 400 b , illustrate some of the basic physical configurations found in hearing aid designs.
  • a body worn aid 420 a may comprise a case 412 a containing a power supply and components of amplification; and an ear mold 416 a containing an electronic speaker, connected to the case by a cord 414 a .
  • Behind-the-ear aids 410 b , 420 b may consist of a small case 412 b containing a power supply, components of amplification and an electronic speaker, which fits behind an ear 404 b ; an ear mold 416 b ; and a connector 414 b , which conducts sound to the ear 404 b through the ear mold 416 b .
  • In-the-ear aids 430 b comprise a power supply, components of amplification, and an electronic speaker, fit entirely within an outer ear 406 b.
  • FIGS. 4C and 4D diagrams 400 c and 400 d , illustrate some of the functional components found in hearing aid designs.
  • the least complex device 420 c comprises a microphone 413 c , which converts sounds such as speech from another person 408 c into an electronic signal.
  • the electronic signal is then amplified by an amplifier 415 c and converted back into sound by an electronic speaker 417 c in proximity to the user's ear 404 c.
  • More sophisticated devices 420 d comprise a microphone 413 d and a speaker 417 d , which perform the same functions as their counterparts 413 c , 417 c respectively.
  • sound and speech processing circuitry 415 d can function differently from simple amplification circuitry 415 c .
  • Sound and speech processing circuitry 415 d may be either digital or analog in nature. Unlike the simple amplifier 415 c , sound and speech processing circuitry 415 d can amplify different portions of the sound spectrum to different degrees.
  • These devices might incorporate electronic filters that reduce distracting noise and might be programmed with different settings corresponding to the user's needs in different environments (e.g., noisy office or quiet room).
  • FIG. 4E , diagram 400 e , illustrates a device 420 e that differs in its principle of operation from the hearing aids 420 c and 420 d in that its circuitry 415 e can substitute the phonemes of speech sounds with unique sets of sounds (acoustic symbols).
  • By substituting some or all of the phonemes in a given language with simple acoustic symbols, it is possible to utilize portions of the sound spectrum where a user may have relatively unimpaired hearing.
  • the symbols themselves may represent phonemes, sets of phonemes, portions of phonemes, or types of phonemes.
  • the acoustic symbols could, for example, comprise sound frequencies between 200 Hz and 600 Hz, which would be audible to that person.
  • in FIG. 5 , the English word, “chew”, 505 a , is used to compare and contrast certain embodiments described herein to conventional technologies.
  • FIG. 5A plots 500 a provides a spectrogram 520 a and waveform 540 a for the word, “chew” 505 a .
  • “chew” comprises two phonemes, tʃ, 524 a , and u, 586 a , which are visible as two distinctive regions 542 a and 544 a of the waveform 540 a .
  • the waveform is too complex to expose much informative detail via visual inspection.
  • the spectrogram 520 a reveals a greater level of relevant detail.
  • the phoneme tʃ, 524 a , comprises a complex set of sound frequencies 521 a broadly distributed largely above 3000 Hz.
  • Most of the power for the phoneme, u, 586 a is contained in relatively tight frequency ranges around 500 Hz, 523 a , and 2500 Hz, 522 a .
  • u, 586 a is a voiced phoneme, exhibiting characteristic waxing and waning of power over many frequencies, observable as faint vertical stripes within the bands labeled 522 a and 523 a .
  • the waxing and waning itself has a frequency of approximately 250 Hz (~25 stripes per 100 milliseconds on the time axis).
  • An individual with an audiogram similar to that shown in FIG. 3D , plot 300 d , might be able to hear the phoneme, u, 586 a , reasonably well because its frequencies are in the lower range of speech. However, this individual would not hear tʃ, 524 a , because this person's hearing is impaired at higher frequencies.
  • a hearing aid using simple amplification can help to some extent by increasing the sound pressure (a.k.a. volume, a.k.a. power) at all frequencies as illustrated in FIG. 5B , plots 500 b .
  • FIG. 5C plots 500 c , illustrates a spectrogram 520 c and waveform 540 c obtained when the word, “chew” is spoken into a hearing aid with speech/sound processing capability. Increased amplitude is observed in the waveform area 542 c but less so in the area 544 c relative to corresponding portions of the waveform 540 a , 542 a and 544 a , FIG. 5A .
  • the spectrogram 520 c reveals that most amplification occurs at the higher frequencies 521 c and 522 c but less so at the lower frequencies 523 c . Therefore the low frequency components 523 c of the phoneme u, 586 c are not too loud. Noise problems are also reduced. However, the sound at 521 c and 522 c may be so loud that it is uncomfortable and could damage remaining hearing.
  • FIG. 5D plots 500 d , provides an example of a waveform, 540 d , and spectrogram, 520 d , as might result from recoding the word “chew” using the phoneme substitution method described herein.
  • the waveform 540 d and spectrogram 520 d have been simplified relative to those in FIGS. 5A , 5 B, and 5 C, 540 a , 520 a , 540 b , 520 b , 540 c , 520 c , and all sound energy has been redirected to frequencies easily audible for an individual having an audiogram, plots 300 d , similar to that shown in FIG. 3D .
  • the spectrogram 520 d shows a simple frequency distribution in a narrow range. All frequencies 531 d , 532 d , 533 d , 536 d and 537 d are below 1000 Hz. Power at frequencies 536 d and 537 d representing the phoneme, u, 586 a , is pulsed at a frequency of approximately 12 Hz.
  • FIG. 6 , diagram 600 , provides an overview of how one embodiment transforms speech 609 (exemplified by the waveform illustrated in FIG. 5A , plots 500 a ) from a person speaking 608 into simple acoustic symbols 605 (exemplified in the waveform, 540 d , illustrated in FIG. 5D , plots 500 d ) for a user 604 by use of a hearing aid 620 .
  • the components of the hearing aid 620 are described below.
  • the hearing aid 620 includes a microphone 613 to transform speech sound 609 into electronic analog signals which are then digitized by an analog to digital converter 622 .
  • the embodiment illustrated here provides a user interface 619 that allows the selection of one of two operating modes depending upon whether or not speech recognition is of primary interest to the user, 604 , in any given setting. Other embodiments need not provide this option.
  • a speech recognition process 630 transforms digitized speech sounds into digital symbols representing phonemes of the speech 609 produced by the person speaking 608 . Characters representing phonemes are then exchanged for digital sound representations by a transformation process 650 .
  • the transformation process of transformer 650 can be performed by software, hardware or by combinations of software and hardware.
  • the transformation process 650 comprises a correspondence from a set of phonemes to a set of sound representations held in a database or other data structure 652 and a way 654 of generating sound representations corresponding to phonemes from the speech recognizer 630 .
  • the sound representations held in the database 652 may be wav files, mp3 files, aac files, aiff files, MIDI files, characters representing sounds, characters representing sound qualities, and the like.
  • the sound files are then converted to analog signals by a digital to analog process 626 , amplified by an amplification process 628 , and converted into audible sounds by a speaker 617 .
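  • As a rough illustration only, the following sketch (in Python, which the patent itself does not specify) shows the kind of lookup the transformation process 650 performs: a small data structure standing in for the database 652 maps phoneme symbols to stored sound representations, and a function standing in for the generator 654 emits the corresponding sequence. All names and file names here are hypothetical.
      # Hypothetical stand-in for database 652: phoneme symbol -> sound representation.
      # Real entries could be wav, mp3, aac, aiff, or MIDI files, as noted above.
      PHONEME_TO_SOUND = {
          "tS": "symbol_tS.wav",  # acoustic symbol replacing the phoneme /tS/ ("ch")
          "u": "symbol_u.wav",    # acoustic symbol replacing the phoneme /u/
      }

      def generate_sound_representations(phonemes):
          """Stand-in for process 654: exchange recognized phonemes for sound representations."""
          return [PHONEME_TO_SOUND[p] for p in phonemes if p in PHONEME_TO_SOUND]

      # The word "chew" recognized as the phoneme sequence /tS/, /u/:
      print(generate_sound_representations(["tS", "u"]))  # ['symbol_tS.wav', 'symbol_u.wav']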
  • If the user does not select the speech recognition mode, the value at decision state 624 will be false, and the device will function as a digital hearing aid with conventional speech/sound processing functions 615 , digital to analog signal conversion 626 , amplification 628 , and sound generation 617 .
  • some embodiments utilize speech recognition.
  • a number of strategies and techniques for building devices capable of recognizing and translating human speech into text are known to those skilled in such arts.
  • a generic diagram of the inner workings of the speech recognizer, 630 as might be employed by some embodiments is provided in FIG. 6 .
  • the digitized acoustic signal may be processed by a digital filter 632 in order to reduce the complexity of the data.
  • a segmentation process 634 parses the data into overlapping temporal intervals called frames.
  • Feature extraction 636 involves computing a spectral representation (somewhat like a spectrogram) of the incoming speech data, followed by identification of acoustically relevant parameters such as energy, spectral features, and pitch information.
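  • For illustration, the segmentation 634 and feature extraction 636 steps might look roughly like the Python sketch below, which splits digitized audio into overlapping frames and computes a simple per-frame energy value. The frame length, hop size, and choice of feature are assumptions made for illustration; practical recognizers use richer spectral features.
      import math

      def frames(samples, frame_len=400, hop=160):
          """Parse samples into overlapping temporal intervals (frames), as in segmentation 634."""
          return [samples[i:i + frame_len]
                  for i in range(0, max(len(samples) - frame_len + 1, 1), hop)]

      def frame_energy(frame):
          """One acoustically relevant parameter, as in feature extraction 636: RMS energy."""
          return math.sqrt(sum(x * x for x in frame) / len(frame)) if frame else 0.0

      signal = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]  # 1 s test tone
      print([round(frame_energy(f), 3) for f in frames(signal)[:3]])  # roughly 0.707 per frame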
  • a decoder 638 can be a search algorithm that may use phone models 644 , lexicons 647 , and grammatical rules 648 , for computing a match between a spoken utterance 609 and a corresponding word string.
  • Although phonemes are the smallest phonetic units of speech, more fundamental units, phones, are the basic sounds of speech. Unlike phonemes, phones vary widely from individual to individual, depending on gender, age, accent, etc., and even over time for a single individual depending on sentence structure, word structure, mood, social context, etc. Therefore, phone models 644 may use a database 642 , comprising tens of thousands of samples of speech from different individuals.
  • a lexicon 647 contains the phonetic spellings for the words that are expected to be observed by the speech recognizer 630 .
  • the lexicon 647 serves as a reference for converting the phone sequences determined by the search algorithm into words.
  • the grammar network or rules 648 defines the recognition task in terms of legitimate word combinations at the level of phrases and sentences.
  • Some speech recognizers employ more sophisticated language models (not shown) that predict the most likely continuation of an utterance on the basis of statistical information about the frequency in which word sequences occur on average in the language.
  • the lexicon 647 and grammar network 648 use a task database 646 comprising words and their various pronunciations, common phrases, grammar, and usage.
  • a computer 660 can be used to aid in the creation of user specific phonic symbol databases, which are then downloaded to the database 652 of the hearing aid 620 .
  • the computer 660 comprises software allowing the input of data (e.g., audiogram) 664 from a user's hearing tests, a user interface 662 , and a process or mapper 670 for creating a map (for database 652 ) to transform symbols representing phonemes into sets of sounds.
  • The function of the mapper 670 can also be performed by hardware circuits.
  • each unique phoneme maps to a unique acoustic symbol.
  • Each acoustic symbol comprises a unique set of sounds, each sound being audible to the user, and each acoustic symbol, or sound set, having a distinctive perceived sound.
  • the function of the Assignment Of Sound Sets to Phonemes process 670 in FIG. 6 is to build such a map.
  • Process 670 , further described in conjunction with FIG. 7 , outlines one method for constructing the map. This and other methods can be performed manually or in an automated fashion using a computer or other computational device such as a tablet.
  • Acoustic symbols or sound sets may comprise one or more sounds. Sounds may differ in a number of qualities including but not limited to frequency, intensity, duration, overtones (harmonics and partials), attack, decay, sustain, release, tremolo, and vibrato. Although any or all of these differences can be employed, the example process 670 shown in FIG. 7 places a primary emphasis on variations in frequency. Therefore, the example process 670 provides acoustic symbols (sound sets) that are unique with respect to the sound frequencies they comprise. For simplicity, this example will employ only combinations of pure tones (no overtones). Sounds having harmonic content could be employed in a similar fashion.
  • state 710 calls for a value, i, the input intensity limit.
  • the input intensity limit, i is an intensity or power density level, above which the user should be able to perceive each and every sound present in the set of acoustic symbols. As the value for i is increased, the range of available sounds to construct acoustic symbols will increase.
  • state 715 determines a range of sound frequencies, [f l , f h ], such that each sound frequency in the range [f l , f h ] is perceptible to the user at power densities at or below i.
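  • A minimal sketch of states 710 and 715, assuming the user's audiogram is available as a table of hearing thresholds, is shown below in Python. The audiogram values are hypothetical stand-ins for data like plots 300 d, and the rule of taking the contiguous low-frequency region whose thresholds do not exceed i is likewise an assumption made for illustration.
      # Hypothetical audiogram: test frequency (Hz) -> hearing threshold (dB).
      AUDIOGRAM = {125: 20, 250: 25, 500: 30, 1000: 45, 2000: 70, 4000: 90, 8000: 95}

      def audible_range(audiogram, i):
          """Return (f_l, f_h): the contiguous low-frequency span audible at or below level i."""
          usable = []
          for f in sorted(audiogram):
              if audiogram[f] <= i:
                  usable.append(f)
              else:
                  break  # stop at the first tested frequency not perceptible at level i
          return (usable[0], usable[-1]) if usable else None

      print(audible_range(AUDIOGRAM, 50))  # (125, 1000) for this hypothetical audiogram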
  • the closest any two frequencies could be at the low frequency end of the range would be 79 Hz, and the closest any two frequencies could be at the high frequency end of the range would be 183 Hz. More sophisticated rules can be used to factor in non-logarithmic and other components of the human hearing response to sound frequency.
  • process 670 calls for values of v, x, and y.
  • a database or data structure 731 comprises a list of phonemes that the user is likely to require.
  • a person who uses only the English language might need approximately 39 phonemes as listed in FIG. 2 , table 200 .
  • Someone who uses only the Hawaiian language would require approximately 13 phonemes while a person using two European and two Asian languages might require approximately 200 phonemes.
  • Each symbol comprises a unique set of sound frequencies. The composition of a given symbol therefore either contains a particular sound frequency or it does not, and the maximum number of acoustic symbols that can be constructed from n frequencies is 2^n − 1. For example, three different frequencies could yield up to seven unique symbols, while eleven frequencies could yield up to 2047 unique symbols. Conversely, the minimum number, m, of frequencies needed to create a unique symbol for each phoneme of a set of phonemes, P, is at least log₂|P| (that is, m must satisfy 2^m − 1 ≥ |P|).
  • State 730 determines the value of log₂|P| for the user's phoneme set, P.
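  • The symbol-count arithmetic above can be checked with a short sketch (given in Python purely for illustration; the patent specifies no programming language):
      import math

      def max_symbols(n_frequencies):
          """Non-empty subsets of n frequencies: 2**n - 1 possible acoustic symbols."""
          return 2 ** n_frequencies - 1

      def min_frequencies(n_phonemes):
          """Smallest whole m with 2**m - 1 >= n_phonemes, i.e. ceil(log2(n_phonemes + 1))."""
          return math.ceil(math.log2(n_phonemes + 1))

      print(max_symbols(3), max_symbols(11))  # 7 and 2047, as stated above
      print(min_frequencies(39))              # 6, since log2 of roughly 40 is about 5.3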
  • State 740 is the first of two states 740 and 745 that assigns acoustic symbols, (sets of sounds) to phonemes.
  • process 670 assigns additional qualities to be associated with each frequency element, f, of each frequency set, Q, of each element (p, Q) of the set, M. Seven variables are assigned in this example. In other embodiments, a different number of variables can be assigned.
  • a data structure 752 is constructed mapping each phoneme to a set of sounds, each sound having eight parameters, f, b, e, w, d, h, r, c as described above. The completion of the data structure 752 allows progression to the end state 755 .
  • acoustic symbols were assembled about each phoneme.
  • the order of these steps is not critical to the practice of certain embodiments described herein, and acoustic symbols may be predefined and later assigned to phonemes.
  • the parameters, f, b, e, w, d, h, r, c are given only as examples.
  • state 740 would return the set of allowed frequencies, F, {84, 89, 94, 100, 106, 112, 119, 126, 133, 141, 150, 159, 168, 178, 189, 200, 212, 224, 238, 252, 267, 283, 300, 317, 336, 356, 378, 400, 424, 449, 476, 504, 534, 566, 599, 635, 673, 713, 755, 800}.
  • Because the user's phoneme set, P, comprises a minimal set of phonemes needed for American English, the number of elements, |P|, is 39.
  • State 730 would return the value, log₂ 39, which is 5.3.
  • The number of elements, |F|, in the set, F, is 40. Because 40 ≥ 5.3, the Boolean value at decision state 735 is true, and process 670 would proceed to state 740.
  • the choice of frequencies will be further restricted to just nine of the 40 allowed frequencies, {300, 317, 336, 400, 424, 449, 504, 534, 566}.
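  • The 40 allowed frequencies listed above (84 Hz through 800 Hz) appear to be spaced logarithmically by roughly one musical semitone (a factor of about 1.06, or 6%). Under that assumption, which is inferred from the listed values rather than stated in the text, a set like F could be generated as in the following sketch:
      def allowed_frequencies(f_l, f_h, ratio=2 ** (1 / 12)):
          """Step down from f_h toward f_l in semitone-sized (about 6%) ratios."""
          freqs, f = [], float(f_h)
          while f >= f_l - 0.5:  # small tolerance so the low endpoint is kept
              freqs.append(round(f))
              f /= ratio
          return sorted(freqs)

      F = allowed_frequencies(84, 800)
      print(len(F), F[:6], F[-3:])  # 40 [84, 89, 94, 100, 106, 112] [713, 755, 800]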
  • the symbols are unique combinations of one or more sound frequencies.
  • the symbols are unique frequency intervals.
  • a frequency interval is the absolute value of the difference of the logarithms of two frequencies. Constructing acoustic symbols as frequency intervals has advantages, as most people, including trained musicians, lack the ability to recognize individual sound frequencies but are able to recognize intervals.
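  • As a small illustration of this definition, the interval between two frequencies can be computed as the absolute difference of their logarithms; measuring it in octaves (base-2 logarithms) is an assumption made here for concreteness.
      import math

      def interval_octaves(f1, f2):
          """Frequency interval: absolute difference of the logs of two frequencies."""
          return abs(math.log2(f1) - math.log2(f2))

      print(round(interval_octaves(300, 600), 3))  # 1.0 -> 300 Hz and 600 Hz are one octave apart
      print(round(interval_octaves(400, 424), 3))  # about 0.084, roughly one semitone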
  • the combination of frequencies and their temporal modifications are unique for each symbol.
  • the combination of frequency intervals and the temporal modifications for each frequency are unique for each symbol.
  • the combination of frequencies and their timbre which may comprise overtones (harmonics and partials), tremolo, and vibrato, is unique for each symbol.
  • the combination of frequency intervals and the timbre of each frequency is unique for each symbol.
  • phonemes are placed into groups of like phonemes (e.g., plosive, fricative, diphthong, monophthong, etc.). Such a placement of phonemes into groups of like phonemes is known to linguists and others skilled in such arts. All phonemes are then assigned a sound frequency (the root), all phonemes being given the same root. Each member of each group of like phonemes is given a second frequency unique to that group. Once all phonemes have been assigned a second sound frequency, the most frequently used phoneme of each group is not assigned additional sound frequencies. Therefore, the most frequently used phonemes are represented by single frequency intervals. One or more additional sound frequencies are then assigned to the remaining phonemes to create a unique combination of frequencies for each phoneme.
  • every frequency of every phoneme in one group of like phonemes is shifted up or down by multiplying each such frequency by a constant.
  • Additional groups of like phonemes may or may not be adjusted in a similar fashion using the same constant or a different constant.
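  • The grouping scheme described in the preceding paragraphs can be sketched as follows. The groups, the choice of "most frequently used" members, and the specific frequencies below are hypothetical placeholders used only to show the shape of the assignment; they are not taken from the patent.
      ROOT = 300  # Hz, assigned to every phoneme

      GROUPS = {  # hypothetical groups, group frequencies, and most frequent members
          "plosive": {"members": ["p", "t", "k", "b", "d", "g"], "group_freq": 336, "most_frequent": "t"},
          "fricative": {"members": ["f", "s", "z", "v"], "group_freq": 400, "most_frequent": "s"},
      }

      EXTRA_FREQS = [424, 449, 504, 534, 566]  # pool of additional frequencies

      def build_map(groups, root, extras):
          mapping = {}
          for info in groups.values():
              pool = iter(extras)  # reuse the pool per group; the group frequency keeps groups distinct
              for p in info["members"]:
                  freqs = [root, info["group_freq"]]
                  if p != info["most_frequent"]:
                      freqs.append(next(pool))  # one extra frequency makes each remaining member unique
                  mapping[p] = sorted(freqs)
          return mapping

      print(build_map(GROUPS, ROOT, EXTRA_FREQS))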
  • the acoustic symbol's frequencies, intervals, temporal modifiers, and/or timbre may be selected to resemble features of the phoneme from which it was derived.
  • Frequencies, intervals, temporal modifiers, timbre, and other qualities may be applied methodically, arbitrarily, or randomly.
  • FIG. 8 illustrates an example data structure 752 as might be returned by state 745 , FIG. 7 .
  • the data structure 752 contains examples of the use of sound qualities listed above. Not all of the sound qualities in the example are required to practice certain embodiments described herein, and other qualities not listed here may be employed.
  • the data structure comprises ordered sets, each ordered set matching a phoneme, p, to one or more sounds.
  • Each sound is defined by an ordered set comprising values for the variables f, b, e, w, d, h, r, c.
  • the last two digits of each callout or reference label in FIG. 8 are the same as the last two digits of corresponding phonemes in FIGS. 1 , 2 , 5 , 9 , 11 , 15 , and 16 .
  • the time scale as well as nature of the symbols does however vary from figure to figure.
  • the word “jousting” will be used in the next example.
  • the IPA representation of the word, “jousting”, is dʒaʊstɪŋ and comprises seven phonetic symbols, dʒ, a, ʊ, s, t, ɪ, and ŋ.
  • the monophthong, “a”, 996 ( FIG. 9 ) is not used as a sole vowel sound in American English words or syllables, but exists only as part of the diphthongs, ai and aʊ. Therefore, in English, dʒaʊstɪŋ, 920 , actually comprises just six phonemes, dʒ, aʊ, s, t, ɪ, and ŋ.
  • (These six phonemes correspond to FIG. 8 callouts 834 , 894 , 844 , 804 , 880 , and 867 , respectively.)
  • FIG. 9 provides a schematic representation 900 of a spectrogram 999 of the sound 605 emitted by the speaker 617 ( FIG. 6 ), after transformation 650 of the word “jousting” via the assignment state 654 , drawing upon the data structure 652 or 752 ( FIG. 8 ).
  • the vertical axis spans 300 Hz to 600 Hz rather than 0 Hz to 5000 Hz as in FIGS. 1 and 5 .
  • power is depicted through line thickness rather than color intensity, thicker lines representing greater power.
  • the IPA representation of the English word, “jousting”, 910 , is dʒaʊstɪŋ, 920 , and comprises seven phonetic symbols, dʒ, 934 , a, 996 , ʊ, 997 , s, 944 , t, 904 , ɪ, 980 , and ŋ, 967 .
  • In English, the phonemes are dʒ, 934 , aʊ, 994 , s, 944 , t, 904 , ɪ, 980 , and ŋ, 967 .
  • the first phoneme, dʒ, 934 , is represented by an acoustic symbol defined by an ordered set of two ordered sets of eight elements, each defining a sound component of the acoustic symbol, [(449,20,90,50,0,1,100,100),(504,20,90,50,0,1,90,50)]. This definition calls for two sounds 925 and 923 .
  • the next ordered set of ordered sets [(317,0,150,50,0,1,84,67), (400,0,150,50,0,0.75,84,67)] defines an acoustic symbol comprising two sounds 929 and 928 representing aʊ, 994 .
  • the next phoneme, s, 944 is represented by two un-pulsed sounds, one at 534 Hz, 927 , and the other at 566 Hz, 926 , each having a constant power of 50 dB, lasting 100 ms.
  • the phoneme, t, 904 is represented by two un-pulsed sounds, 933 and 932 , starting 20 ms, 908 and 931 , after the acoustic symbol representing the phoneme, s.
  • the phoneme, i, 980 is represented by two pulsed sounds, 937 and 936 .
  • the final acoustic symbol defined by the ordered set of ordered sets, [(336,0,100,50,0,1,100,100),(449,0,100,50,0,1,100,100),(534,0,100,60,0,1,126,80)], comprises three sounds.
  • One sound 948 is pulsed, and two sounds 947 and 946 are not.
  • the sound at 534 Hz, 948 , is 10 dB louder than the other two sounds 947 and 946 .
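  • The ordered sets quoted above can be collected into a small structure of the kind data structure 752 might hold. In the sketch below, SAMPA-style keys (dZ, aU, N) stand in for the IPA symbols dʒ, aʊ, ŋ, and the eight values per sound are read as (frequency in Hz, begin time in ms, end time in ms, power in dB, power shift, frequency shift, pulse rate, duty cycle), following the order of parameters listed in the summary; both choices are assumptions made for illustration.
      SYMBOL_MAP = {  # phoneme -> list of sounds, each an 8-tuple (f, b, e, w, d, h, r, c)
          "dZ": [(449, 20, 90, 50, 0, 1, 100, 100), (504, 20, 90, 50, 0, 1, 90, 50)],
          "aU": [(317, 0, 150, 50, 0, 1, 84, 67), (400, 0, 150, 50, 0, 0.75, 84, 67)],
          "N": [(336, 0, 100, 50, 0, 1, 100, 100), (449, 0, 100, 50, 0, 1, 100, 100),
                (534, 0, 100, 60, 0, 1, 126, 80)],
      }

      def frequencies_of(phoneme):
          """List the sound frequencies making up one phoneme's acoustic symbol."""
          return [sound[0] for sound in SYMBOL_MAP[phoneme]]

      print(frequencies_of("dZ"))  # [449, 504], the two sounds 925 and 923 in FIG. 9
      print(frequencies_of("N"))   # [336, 449, 534], the three sounds of the final symbol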
  • FIG. 10 a illustrates a configuration 1000 a of a cochlear implant hearing aid device and FIG. 10 b shows a schematic representation 1000 b of this device.
  • a microphone 1013 a , 1013 b transforms speech and other sounds into electrical signals that are conveyed to a sound and speech processor 1020 a , 1020 b via an electrical cable 1023 a , 1023 b .
  • the sound and speech processor unit 1020 a , 1020 b also houses a power supply for external components 1013 a , 1013 b , 1031 a , 1031 b and implanted components 1045 a , 1045 b of the cochlear implant hearing aid device.
  • the sound and speech processor 1020 a , 1020 b can contain bandpass filters to divide the acoustic waveforms into channels and convert the sounds into electrical signals. These signals go back through a cable 1024 a , 1024 b to a transmitter 1031 a , 1031 b attached to the head by a magnet, not shown, within a surgically implanted receiver 1045 a , 1045 b.
  • the transmitter 1031 a , 1031 b sends the signals and power from the sound and speech processing unit 1020 a , 1020 b via a combined signal and power transmission 1033 b (and similarly for 1000 a ) across the skin 1036 a , 1036 b to the implanted receiver 1045 a , 1045 b .
  • the receiver 1045 a , 1045 b uses the power from the combined signal and power transmission 1033 b to decode the signal component of the transmission 1033 b and sends corresponding electrical waveforms through a cable 1049 a , 1049 b to an electrode array 1088 a , 1088 b surgically placed in the user's cochlea 1082 a , 1082 b .
  • the electrical waveforms stimulate local nerve tissue creating the perception of sound.
  • Individual electrodes, not shown, are positioned at different locations along the array 1088 a , 1088 b , allowing the device to deliver different stimuli representing sounds having different pitches and, importantly, producing the sensation of different pitches for the user.
  • the effectiveness of a cochlear prosthesis depends to a large extent on the stimulation algorithm used to generate the waveforms sent to the individual electrodes of the electrode array 1088 a , 1088 b .
  • Stimulation algorithms are generally based on two approaches. The first places an emphasis on temporal aspects of speech and involves transforming the speech signal into different signals that are transmitted directly to the concerned regions of the cochlea. The second places an emphasis on spectral speech qualities and involves extracting features, such as formants, and formatting them according to the cochlea's tonotopy (the spatial arrangement of where sound is perceived).
  • Certain embodiments apply to novel stimulation algorithms for a cochlear prosthesis. These algorithms substitute some or all temporal and spectral features of natural speech for a small number (such as in a range of 10 to 500) of symbols, comprising the waveforms to be sent to the electrode array, 1088 a , 1088 b.
  • plots 1100 a provide a spectrogram 1120 a and waveform 1140 a for the word “chew” 1105 a.
  • For a person with normal hearing, the cochlea provides the brain with detailed information about the speech signal shown by waveform 1140 a . Within the cochlea the original sound waveform 1140 a is lost in the process of being transformed into nerve impulses. These nerve impulses actually contain little information describing the actual waveform 1140 a , but instead, convey detailed information about power as a function of time and frequency. Therefore, a spectrogram such as spectrogram 1120 a , but not a waveform, is a convenient representation of the information conveyed through the auditory nerve to the auditory cortex of the brain.
  • a cochlear prosthesis (see FIG. 10 ) can restore a level of hearing to a person whose cochlea is not functional, but still has a functional auditory cortex and auditory nerve innervating the cochlea.
  • the cochlear prosthesis electrically stimulates nervous tissue in the cochlea, resulting in nerve impulses traveling along the auditory nerve to the auditory cortex of the brain.
  • Although hearing can often be successfully restored to deafened individuals, speech recognition often remains challenging.
  • the cochlea divides the speech signal into several thousand overlapping frequency bands that the auditory cortex uses to extract speech information.
  • Prior cochlear implants are able to provide a speech signal divided into just a dozen or so frequency bands. As a result, much of the fine spectral detail is lost as many frequency bands are blended into a few frequency bands. The auditory cortex is thereby deprived of much of the speech information it normally uses to identify features of spoken language.
  • plots 1100 b schematically illustrate the spectral resolution and detail of a speech signal shown by a spectrogram 1120 b generated by a conventional cochlear prosthesis.
  • Gross temporal and spectral features are similar to those of natural speech shown by the spectrogram 1120 a .
  • However, spectrally important portions 1121 b , 1122 b , 1123 b of the phonemes tʃ, 1124 a , and u, 1186 a , lack the fine detail seen in the natural speech example shown at portions 1121 a , 1122 a , 1123 a.
  • stimulation algorithms are used to help convey speech information through the limited number of frequency bands or channels.
  • Stimulation algorithms are generally based on two approaches. The first places an emphasis on temporal aspects of speech and involves transforming the speech signal into different signals that are transmitted directly to the concerned regions of the cochlea. The second places an emphasis on spectral speech qualities and involves extracting features, such as formants, and formatting them according to the cochlea's tonotopy (the spatial arrangement of where sound is perceived).
  • Current stimulation algorithms do help, but are unable to provide most users with speech recognition comparable to that of those with normal hearing.
  • Certain embodiments apply to novel stimulation algorithms for cochlear prostheses. These algorithms substitute some or all temporal and spectral features of natural speech for a small number (approximately 20 to 100) of symbols, comprising the waveforms to be sent to the electrode array 1088 a , 1088 b as shown in FIG. 10 .
  • the symbols themselves may represent phonemes, sets of phonemes, or types of phonemes.
  • plots 1100 c schematically illustrate a speech signal shown by spectrogram 1120 c as might result from recoding of the word “chew” 1105 a using a phoneme substitution method of certain embodiments described herein.
  • the symbols may, but do not need to, preserve some spectral and temporal features of the natural speech signal shown by the spectrogram 1120 a .
  • the conventional stimulation algorithm shown by plots 1100 b approximates spectral features 1121 a of the phoneme, tʃ, 1124 a , and spectral features 1122 a , 1123 a of the phoneme, u, 1186 a , in corresponding areas 1121 b , 1122 b , 1123 b .
  • In contrast, a speech signal generated using a stimulation algorithm employing phoneme substitution does not approximate spectral features 1121 a of the phoneme, tʃ, 1124 a , or spectral features 1122 a , 1123 a of the phoneme, u, 1186 a , in its corresponding areas 1172 c , 1174 c , 1176 c , 1178 c.
  • An advantage of certain embodiments described herein is that, in principle, the speech signal will not vary from speaker to speaker and location to location. Another advantage is that the speech signal is no longer more complicated than the language based information it contains. Both features result in speech signals that are easier to learn and recognize than those generated using current state-of-the-art stimulation algorithms.
  • FIG. 12 provides an overview diagram 1200 of how one embodiment transforms speech 1209 (exemplified by the waveform illustrated in FIG. 11A , plots 1100 a ) from a person speaking 1208 into simple symbols (exemplified in the speech signal illustrated in FIG. 11C by spectrogram 1120 c ) that are delivered to an electrode array of a user's cochlear implant 1288 .
  • the transformation is performed by external components of a cochlear implant system such as sound and speech processing unit 1220 .
  • the sound and speech processing unit or processor 1220 includes a microphone 1213 to transform speech sounds 1209 into electronic analog signals that are then digitized by an analog to digital converter 1222 .
  • the embodiment illustrated here provides a user interface 1219 that allows the selection of one of at least two operating modes depending upon whether or not speech recognition is of primary interest to the user in any given setting. Other embodiments need not provide this option.
  • a speech recognition process 1230 transforms digitized speech sounds into digital characters representing phonemes of the speech 1209 produced by the person speaking 1208 . Characters representing phonemes are then exchanged for digital representations of stimulation patterns by a transformation process 1250 .
  • the transformation process or transformer 1250 can be performed by software, by hardware, or by combinations of software and hardware.
  • the transformation process 1250 comprises a correspondence from a set of phonemes to stimulation patterns held in a database or other data structure 1252 and a process 1254 for generating a sequence of representations of stimulation patterns corresponding to a sequence of phonemes from the speech recognizer 1230 .
  • the digital representations are sent to a data and power transmitter 1231 and 1232 attached to the user's head by a magnet, not shown, within a surgically implanted receiver 1245 .
  • the transmitter 1231 and 1232 sends the signals and power from the sound and speech processing unit 1220 via a combined signal and power transmission 1233 across the skin 1236 to the implanted receiver 1245 .
  • Using the power from the combined signal and power transmission 1233 , the receiver 1245 decodes the signal component of the transmission 1233 and sends corresponding electrical waveforms through a cable 1249 to the electrode array 1288 surgically placed in the user's cochlea 1282 .
  • the value at decision state 1224 will be false, and the device will function using other stimulation algorithms 1215 .
  • FIG. 6 provides a generic diagram 600 of the inner workings of a speech recognizer 630 as might be employed by some embodiments.
  • the database 1252 of representations of stimulation patterns can be created and customized in consideration of each individual user.
  • a computer 1260 can be used to aid in the creation of user databases, which are then downloaded to the database memory 1252 of the sound and speech processing unit 1220 .
  • the computer 1260 comprises software allowing the input of data 1264 from a user's hearing tests, a user interface 1262 and a process or mapper 1270 for creating a map to be stored in the database 1252 to transform symbols representing phonemes into digital representations of stimulation patterns.
  • the process 1270 for creating the map to transform symbols representing phonemes into digital representations of stimulation patterns is similar to the process 670 shown in FIG. 6 , and defined in FIG. 7 .
  • the process 1270 can be considered a modified version of process 670 in which the interval [f l , f h ] is replaced with a set, G, of functional electrodes, {g n , g n+1 , g n+2 , . . . } of the electrode array 1288 .
  • the set, F, then becomes a subset of G, its elements representing electrodes rather than frequencies.
  • FIG. 13 is a diagram 1300 showing an example structure of potential electrode assignments 1352 , such as stored in database 1252 , for one embodiment in which the user wishes to comprehend American English speech.
  • the upper portion of the figure shows the middle and inner ear 1360 including the cochlea 1365 .
  • Within the cochlea 1365 is an implanted electrode array 1320 of a cochlear prosthesis.
  • the electrode array 1320 comprises 16 electrodes, nine of which, 1303 , 1304 , 1305 , 1306 , 1307 , 1308 , 1309 , 1310 , 1311 , are functional and able to produce unique sound sensations for the user.
  • 39 American English phonemes are mapped using the exemplary data structure 1352 (stored in 1252 , FIG. 12 ) to stimulation patterns (symbols) comprising electrical waveforms being sent to different combinations of one, two, or three electrodes.
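  • A quick check of the arithmetic implied above: nine functional electrodes taken one, two, or three at a time give 9 + 36 + 84 = 129 distinct stimulation patterns, comfortably more than the 39 phonemes to be mapped. The Python sketch below illustrates this count and a placeholder assignment; the pairing order is hypothetical and is not the mapping of data structure 1352.
      from itertools import combinations

      functional_electrodes = list(range(1, 10))  # the nine functional electrodes
      patterns = [c for k in (1, 2, 3) for c in combinations(functional_electrodes, k)]
      print(len(patterns))  # 129 distinct one-, two-, or three-electrode patterns

      phonemes = ["phoneme_%d" % i for i in range(39)]  # placeholders for 39 American English phonemes
      assignment = dict(zip(phonemes, patterns))        # one unique pattern per phoneme
      print(assignment["phoneme_0"], assignment["phoneme_38"])  # (1,) and a two-electrode pattern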
  • the symbols themselves may represent phonemes, sets of phonemes, portions of phonemes, or types of phonemes.
  • the symbols are unique combinations of stimuli at one or more electrodes.
  • the symbols are unique physical spacings of stimuli.
  • the combination of electrodes used and other qualities including, but not limited to, pauses between some phonemes, duration, intensity, low frequency pulsations or higher frequency signals, stimulus rates, and shifts in the values of such parameters as a function of time, are unique for each symbol.
  • phonemes are placed into groups of like phonemes (e.g., plosive, fricative, diphthong, monophthong, etc.). Such a placement of phonemes into groups of like phonemes is known to linguists and others skilled in such arts. All phonemes are then assigned a common electrode or channel (the root), all phonemes being given the same root. Each member of each group of like phonemes is assigned a second channel unique to that group. Once all phonemes have been assigned a second channel, the most frequently used phoneme of each group is not assigned additional channels. Therefore, the most frequently used phonemes are represented by unique combinations of two channels. One or more additional channels are then assigned to the remaining phonemes to create a unique combination of channels for each phoneme.
  • phonemes are placed into groups of like phonemes (e.g., plosive, fricative, diphthong, monophthong, etc.). Such a placement of phonemes into groups of like phonemes is known to linguists and others skilled in such arts. All phonemes are then assigned a common electrode or channel (the root), all phonemes being given the same root. Each member of each group of like phonemes is assigned a second channel unique to that group. Once all phonemes have been assigned a second channel, the most frequently used phoneme of each group is not assigned additional channels. Therefore, the most frequently used phonemes are represented by unique combinations of two channels. One or more additional channels are then assigned to the remaining phonemes to create a unique combination of channels for each phoneme. Next, every channel assignment for every phoneme in one group of like phonemes is shifted up or down along the electrode array. Additional groups of like phonemes may or may not be adjusted in a similar fashion.
  • the concept of phoneme substitution can be applied to sensory tissues other than the cochlea. These can include but are not limited to pressure, pain, stretch, temperature, photo and olfactory receptor tissue as well as innervating nerves tissue and corresponding central nervous system tissue.
  • FIGS. 14A and 14B provide schematic examples 1400 a and 1400 b of skin interfaces 1410 a and 1410 b of some embodiments.
  • FIG. 14A shows an interface 1410 a fitted about the hand and wrist of a person's left arm 1450 a for example.
  • the interface 1410 a comprises six stimulators 1401 a , 1402 a , 1403 a , 1404 a , 1405 a , 1406 a positioned against the person's skin 1440 a .
  • the stimulators have been placed so as to ensure that no two are close to being positioned over the same receptive field, the smallest area of skin capable of allowing the recognition of two different but similar stimuli.
  • the stimulators 1405 a and 1406 a are located under the wrist of the user.
  • FIG. 14B shows an interface 1410 b fitted about the wrist of a person's left arm 1450 b for example.
  • the interface 1410 b comprises six stimulators 1401 b , 1402 b , 1403 b , 1404 b , 1405 b , 1406 b positioned against the person's skin 1440 b , some close enough to each other to be on the outer threshold of occupying same receptive field.
  • the stimulators 1405 b and 1406 b are located under the wrist of the user.
  • FIG. 15 table 1500 , provides three examples for mapping English phonemes to tactile symbols suitable for use with the tactile interfaces 1410 a and 1410 b presented in FIG. 14 .
  • each of the three maps uses the same channel assignments, and each stimulator generates a vibratory motion perpendicular to the skin.
  • the first step for all three examples is to place phonemes into groups of like phonemes (e.g., plosive, fricative, diphthong, monophthong, etc.). These groups are known to linguists and others skilled in such arts.
  • Affricates, being both plosive-like and fricative-like, are assigned both channels 3 and 4.
  • No further channel assignments are made to the most frequently used member of each set (for example, t, n, and s). These assignments can be made by linguists and others skilled in such arts.
  • Additional channels are assigned to other phonemes creating a unique combination of channel assignments corresponding to each.
  • the channel assignments for each phoneme are the same as in example 1. However, for each tactile symbol representing a phoneme, the channel common to all members of its group of related phonemes is vibrated at a different frequency than the other channels comprising that symbol. These stimulators are indicated by boxes in the column for example 2. The advantage of this approach is that phonemes that sound most alike will feel most alike, thereby enhancing the learning process and reducing errors.
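  • The example 2 idea above can be sketched as a list of (channel, vibration frequency) pairs in which the channel shared by a phoneme's group vibrates at a different rate than the symbol's other channels. Channel 3 for plosives and channel 4 for fricatives follows the affricate note above; the vibration rates and the sample calls are hypothetical illustrations, not the assignments of table 1500.
      GROUP_CHANNEL = {"plosive": 3, "fricative": 4}  # channel shared by each group
      GROUP_VIBRATION_HZ = 80    # rate used on the group-identifying channel (hypothetical)
      OTHER_VIBRATION_HZ = 250   # rate used on the remaining channels of a symbol (hypothetical)

      def tactile_symbol(group, extra_channels):
          """Build the stimulator drive list for one phoneme's tactile symbol."""
          symbol = [(GROUP_CHANNEL[group], GROUP_VIBRATION_HZ)]
          symbol += [(ch, OTHER_VIBRATION_HZ) for ch in extra_channels]
          return symbol

      print(tactile_symbol("plosive", [1]))       # e.g., a less frequently used plosive
      print(tactile_symbol("fricative", [2, 5]))  # e.g., a fricative needing two extra channels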
  • the symbols are unique combinations of stimuli at one or more electrodes. In another embodiment, the symbols are unique physical spacings of stimuli. In another embodiment, the combination of electrodes used, and other qualities including, but not limited to, pauses between some phonemes, duration, intensity, low frequency pulsations or higher frequency signals, stimulus rates, and shifts in the values of such parameters as a function of time, are unique for each symbol.
  • phonemes are placed into groups of like phonemes (e.g., plosive, fricative, diphthong, monophthong, etc.). Such a placement of phonemes into groups of like phonemes is known to linguists and others skilled in such arts. All phonemes are then assigned a common electrode or channel (the root), all phonemes being given the same root. Each member of each group of like phonemes is assigned a second channel unique to that group. Once all phonemes have been assigned a second channel, the most frequently used phoneme of each group is not assigned additional channels. Therefore, the most frequently used phonemes are represented by unique combinations of two channels. One or more additional channels are then assigned to the remaining phonemes to create a unique combination of channels for each phoneme.
  • phonemes are placed into groups of like phonemes (e.g., plosive, fricative, diphthong, monophthong, etc.). Such a placement of phonemes into groups of like phonemes is known to linguists and others skilled in such arts. All phonemes are then assigned a common electrode or channel (the root), all phonemes being given the same root. Each member of each group of like phonemes is assigned a second channel unique to that group. Once all phonemes have been assigned a second channel, the most frequently used phoneme of each group is not assigned additional channels. Therefore, the most frequently used phonemes are represented by unique combinations of two channels. One or more additional channels are then assigned to the remaining phonemes to create a unique combination of channels for each phoneme. Next, every channel assignment for every phoneme in one group of like phonemes is shifted up or down along the electrode array. Additional groups of like phonemes may or may not be adjusted in a similar fashion.
  • FIG. 16A , via plots 1600 a , shows the word “chew” 1605 a ; its component phonemes, tʃ, 1624 a , and u, 1686 a ; a waveform 1635 a obtained when “chew” is spoken; “chew” written in machine shorthand 1645 a ; “chew” as it appears in acoustic symbols generated by the phoneme substitution method described herein 1655 a ; “chew” as it might be encoded by phoneme substitution and then transmitted to electrodes in a cochlear implant 1665 a ; “chew” as it might be transmitted to electrodes on a skin interface 1675 a ; and “chew” as it might be perceived in the form of its component phonemes by the user.
  • FIG. 16B , diagram 1600 b , illustrates embodiments as transmitters 1605 b , 1635 b , 1645 b and receivers 1655 b , 1665 b , 1675 b .
  • a computer 1605 b is shown transmitting the typed word “chew” to a hearing aid 1655 b ; cochlear implant 1665 b ; or skin interface 1675 b .
  • the waveform produced by a person speaking 1635 b is shown being transmitted to 1655 b , 1665 b , and 1675 b .
  • the shorthand machine 1645 b is shown transmitting a signal to 1655 b , 1665 b , and 1675 b.

Abstract

Methods and devices (620, 1220, 1410) to deliver a tactile speech analog to a person's skin (404, 604, 1082, 1440) providing a silent, invisible, hands-free, eyes-free, and ears-free way to receive and directly comprehend electronic communications (1600 b). Embodiments include an alternative to hearing aids that will enable people with hearing loss to better understand speech. A device (1410), worn like a watch or bracelet, supplements a person's remaining hearing to help identify and disambiguate those sounds he or she cannot hear properly. Embodiments for hearing aids (620) and hearing prosthetics (1220) are also described.

Description

    REFERENCE TO PRIOR APPLICATION
  • This application claims the benefit of U.S. Provisional Application No. 60/705,219, filed Aug. 3, 2005, which is incorporated by reference in its entirety.
  • BACKGROUND
  • 1. Field of the Invention
  • The invention relates to somatic, auditory or cochlear communication to a user, and more particularly, to somatic, auditory or cochlear communication using phonemes.
  • 2. Description of the Related Art
  • Phonemes are the speech sounds that form the words of a language, when used alone or when combined. More precisely, a phoneme is the smallest phonetic unit, or part of speech that distinguishes one word from another. Various nomenclatures have been developed to describe words in terms of their constituent phonemes. The nomenclature of the International Phonetic Association (IPA) will be used here. Unless otherwise noted, examples of speech, speech sounds, phonetic symbols, phonetic spellings, and conventional spellings will be with respect to an American dialect of English, henceforth referred to simply as English. The principles can be extended to other languages.
  • FIG. 1 illustrates several exemplary plots 100 to introduce several spectral and temporal features of human speech through the examination of the English word, “fake”, 105, and its component phonemes. The phonetic spelling (per the IPA) of the English word, “fake”, 105, is “faik”, 110. In English, the word comprises three separate phonemes: the consonant, “f”, 142; the diphthong vowel, “ai”, 191; and the consonant, “k”, 107. Because phonemes are language and dialect dependent, an English speaker will hear “ai” as a single sound, “long A”, 191, a diphthong (a sound combining two vowel sounds), while speakers of other languages may hear two different vowels, “a”, 113, and “i”, 114, each a monophthong (a single vowel sound). The phoneme, “k”, 107, also comprises two parts: a short period of relative silence, 117; followed by the abrupt appearance of sound frequencies in a range of about 2500 to 7000 Hz, 118.
  • Spectral and temporal features of the individual phonemes are partially observable when viewing a plot of the waveform 140 of the spoken word. Here, pressure is shown on the vertical axis and time is shown on the horizontal axis. A spectrogram 120 reveals greater detail and structure. Here, frequency is shown on the vertical axis, time on the horizontal axis, and power is represented as a grey scale, with darker shades corresponding to higher power (sound intensity) levels. The consonants “f”, 142, and “k”, 107, primarily consist of sound frequencies above approximately 3000 Hz, while the vowel “ai”, 191, primarily consists of sound frequencies below approximately 3500 Hz. The highlighted areas of the spectrogram 132, 134, 138 reveal additional features of human speech.
  • An early portion of the phoneme “f”, 132, magnified in panel (A), 133, comprises sound frequencies predominantly above 3000 Hz. The distribution of power is irregular over time and frequency giving rise to a sound quality resembling rushing air, and creating the granular pattern on the spectrogram 132, 133.
  • The highlighted portion of the phoneme “ai”, 134, magnified in panel (B), 135, shows a bimodal distribution of relatively low sound frequencies. Characteristic of diphthongs, one or more dominant frequencies, called “formants”, shift in frequency over time. A portion 136 of panel (B), 135, magnified further in panel (D), 137, reveals a waxing and waning of power in all frequencies, a characteristic of the human voice. Unvoiced phonemes such as “f”, 142, 132, 133, and “k”, 107, 118, 138, 139, do not exhibit these cyclical amplitude fluctuations.
  • Some phonemes increase or decrease in power or intensity over their duration. This is evident in the highlighted portion of the phoneme “k”, 138, magnified in panel (C), 139. Here, sound energy decreases continually during a period of about 70 milliseconds.
  • Another important feature of human speech is the period of relative silence preceding some consonants. In the current example, the phoneme “k”, 107, comprises approximately 70 milliseconds of quiet 117 followed by the audible portion 118 of the phoneme “k”, 107. Without this period of relative silence, some phonemes, including “k” would be unintelligible. Also, intervals of relative silence or power shifts are important for syllabification.
  • FIG. 2 is a table 200 of American English phonemes 225 shown in three nomenclatures: the International Phonetic Association (IPA), s{mpA (a phonetic spelling of SAMPA, the abbreviation for Speech Assessment Methods Phonetic Alphabet, a computer readable phonetic alphabet), and the Merriam Webster Online Dictionary (m-w). Examples 226 of each phoneme (bold underlined letters) as used in an American English word are provided, along with the manner 237 and place 247 of articulation 227.
  • The manner of articulation 237 refers primarily to the way in which the speech organs, such as the vocal cords, tongue, teeth, lips, nasal cavity, etc. are used. Plosives 201, 204, 207, 211, 214, 217 are consonants pronounced by completely closing the breath passage and then releasing air. Fricatives 242, 243, 244, 245, 250, 252, 253, 254, 255 are consonants pronounced by forcing the breath through a narrow opening. Between the plosives and the fricatives are two affricates 224, 234, composite speech sounds that begin as a plosive and end as a fricative. Nasals 261, 264, 267 are consonants pronounced with breath escaping mainly through the nose rather than the mouth. Approximants 274, 275, 276, 271 are sounds produced while the airstream is barely disturbed by the tongue, lips, or other vocal organs. Vowels are speech sounds produced by the passage of air through the vocal tract, with relatively little obstruction, including the monophthong vowels 280, 281, 282, 283, 284, 285, 286, 287, 288, 289 and the diphthong vowels 291, 292, 293, 294, 295.
  • The place of articulation 247 refers largely to the position of the tongue, teeth, and lips. Bilabials, are pronounced by bringing both lips into contact with each other or by rounding them. Labiodentals are pronounced with the upper teeth resting on the inside of the lower lip. Dentals are formed by placing the tongue against the back of the top front teeth. Alveolars are sounded with the tongue touching or close to the ridge behind the teeth of the upper jaw. Palato-alveolars are produced by raising the tongue to or near the forward-most portion of the hard palate. Palatals are produced by raising the tongue to or near the hard palate. Velars are spoken with the back of the tongue close to, or in contact with, the soft palate (velum).
  • Other speech characteristics 228 include voice, dominant sound frequencies above about 3000 Hz (3 kHz+), and stops. In English, eight phonemes comprise a period of relative silence followed by a period of relatively high sound energy. These phonemes, called stops 228 are the plosives and the affricates 201, 204, 207, 211, 214, 217, 224, 234. Stops are not recognizable from their audible portion alone. Recognition of these phonemes requires that they begin with silence. Phonemes may be voiced or unvoiced. For example, “b”, 211, is the voiced version of “p”, 201, and “z”, 254, is the voiced version of “s”, 244. Most English consonants, the plosives, affricates, and fricatives 201, 204, 207, 211, 214, 217, 224, 234, 242, 243, 244, 245, 250, 252, 253, 254, 255 comprise sound frequencies above 3000 Hz. In order for an individual to be able to discriminate between these phonemes, he/she must be able to hear their higher frequencies. Unvoiced phonemes 201, 204, 207, 224, 242, 243, 244, 245, 250 in particular tend to be dominated by the higher sound frequencies.
  • SUMMARY OF CERTAIN EMBODIMENTS
  • In another embodiment there is a method of transforming a sequence of symbols representing phonemes into a sequence of arrays of nerve stimuli, the method comprising establishing a correlation between each member of a phoneme symbol set and an assignment of one or more channels of a multi-electrode array, accessing a sequence of phonetic symbols corresponding to a message, and activating a sequence of one or more electrodes corresponding to each phonetic symbol of the message identified by the correlation. The phonetic symbols may belong to one of SAMPA, Kirshenbaum, or IPA Unicode digital character sets. The symbols may belong to the cmudict phoneme set. The correlation may be a one to one correlation. Activating a sequence of one or more electrodes may include an energizing period for each electrode, wherein the energizing period comprises a begin time parameter and an end time parameter. The begin time parameter may be representative of a time from an end of components of a previous energizing period of a particular electrode. The electrodes may be associated with a hearing prosthesis. The hearing prosthesis may comprise a cochlear implant.
  • In one embodiment there is a method of processing a sequence of spoken words into a sequence of sounds, the method comprising converting a sequence of spoken words into electrical signals, digitizing the electrical signals representative of the speech sounds, transforming the speech sounds into digital symbols representing corresponding phonemes, transforming the symbols representing the corresponding phonemes into sound representations, and transforming the sound representations into sounds.
  • Transforming the symbols representing the phonemes into sound representations may comprise accessing a data structure configured to map phonemes to sound representations, locating the symbols representing the corresponding phonemes in the data structure, and mapping the phonemes to sound representations. The method additionally may comprise creating the data structure, comprising identifying phonemes corresponding to a language used by a user of the method, establishing a set of allowed sound frequencies, generating a correspondence mapping the identified phonemes to the set of allowed sound frequencies such that each constituent phoneme of the identified phonemes is assigned a subset of one or more frequencies from the set of allowed sound frequencies, and mapping each constituent phoneme of the identified phonemes to a set of one or more sounds. Establishing a set of allowed sound frequencies may comprise selecting a set of sound frequencies that are in a hearing range of the user. Each sound of the set of one or more sounds may comprise an initial frequency parameter. Each sound of the set of one or more sounds may comprise a begin time parameter. The begin time parameter may be representative of a time from an end of components of a previous sound representation. Each sound of the set of one or more sounds may comprise an end time parameter. Each sound of the set of one or more sounds may comprise a power parameter. Each sound of the set of one or more sounds may comprise a power shift parameter. Each sound of the set of one or more sounds may comprise a frequency shift parameter. Each sound of the set of one or more sounds may comprise a pulse rate parameter. Each sound of the set of one or more sounds may comprise a duty cycle parameter.
  • In another embodiment there is a method of processing a sequence of spoken words into a sequence of nerve stimuli, the method comprising converting a sequence of spoken words into electrical signals, digitizing the electrical signals representative of the speech sounds, transforming the speech sounds into digital symbols representing corresponding phonemes, transforming the symbols representing the corresponding phonemes into stimulus definitions and transforming the stimulus definitions into a sequence of nerve stimuli.
  • The nerve stimuli may be associated with a hearing prosthesis. The hearing prosthesis may comprise a cochlear implant. The nerve stimuli may be associated with a skin interface. The skin interface may be located on the wrist and/or hand of the user. Alternatively, the skin interface may be located on the ankle and/or foot of the user. The nerve stimuli may be mechanical and/or electrical. Transforming the symbols representing the phonemes into stimulus definitions may comprise accessing a data structure configured to map phonemes to stimulus definitions, locating the symbols representing the corresponding phonemes in the data structure, and mapping the phonemes to stimulus definitions. The stimulus definitions may comprise sets of one or more stimuli. The sets of one or more stimuli may correspond to one or more locations on the skin or one or more locations in the cochlea. Each stimulus of the sets of one or more stimuli may comprise a begin time parameter. The begin time parameter may be representative of a time from an end of components of a previous stimulus definition. Each stimulus of the sets of one or more stimuli may comprise an end time parameter.
  • In another embodiment there is a method of transforming a sequence of symbols representing phonemes into a sequence of arrays of nerve stimuli, the method comprising establishing a correlation between each member of a phoneme symbol set and an assignment of one or more channels of a multi-stimulator array, accessing a sequence of phonetic symbols corresponding to a message, and activating a sequence of one or more stimulators corresponding to each phonetic symbol of the message identified by the correlation. The stimulators may be vibrators affixed to the user's skin. The phonetic symbols may belong to one of SAMPA, Kirshenbaum, or IPA Unicode digital character sets. The symbols may belong to the cmudict phoneme set. The correlation may be a one to one correlation. Activating a sequence of one or more stimulators may include an energizing period for each stimulator, wherein the energizing period comprises a begin time parameter and an end time parameter. The begin time parameter may be representative of a time from an end of components of a previous energizing period of a particular stimulator.
  • In another embodiment there is a method of training a user, the method comprising providing a set of somatic stimulations to a user, wherein the set of somatic stimulations is indicative of a plurality of phonemes, and wherein the phonemes are based at least in part on an audio communication; providing the audio communication concurrently to the user with the plurality of phonemes; and selectively modifying at least portions of the audio communication to the user during the providing of the set of somatic stimulations to the user.
  • Selectively modifying at least portions of the audio communication may comprise reducing an audio property of the audio communication. The audio property may comprise a volume of the audio. The audio property may comprise omitting selected words from the audio. The audio property may comprise attenuating a volume of selected words from the audio. The audio property may comprise omitting selected phonemes from the audio. The audio property may comprise attenuating a volume of selected phonemes from the audio. The audio property may comprise omitting selected sound frequencies from the audio. The audio property may comprise attenuating a volume of selected sound frequencies from the audio.
  • In another embodiment there is a method of training a user, the method comprising providing a set of somatic stimulations to a user, wherein the set of somatic stimulations is indicative of a plurality of phonemes, and wherein the phonemes are based at least in part on an audiovisual communication; providing the audiovisual communication concurrently to the user with the plurality of phonemes; and selectively modifying at least portions of the audiovisual communication to the user during the providing of the set of somatic stimulations to the user.
  • Selectively modifying at least portions of the audiovisual communication may comprise reducing an audio or video property of the audiovisual communication. The audio property may comprise a volume of the audio. The audio property may comprise omitting selected words from the audio. The audio property may comprise attenuating a volume of selected words from the audio. The audio property may comprise omitting selected phonemes from the audio. The audio property may comprise attenuating a volume of selected phonemes from the audio. The audio property may comprise omitting selected sound frequencies from the audio. The audio property may comprise attenuating a volume of selected sound frequencies from the audio. The video property may comprise a presence or brightness of the video.
  • In another embodiment there is a system for processing a sequence of spoken words into a sequence of sounds, the system comprising a first converter configured to digitize electrical signals representative of a sequence of spoken words, a speech recognizer configured to receive the digitized electrical signals and generate a sequence of phonemes representative of the sequence of spoken words, a mapper configured to assign sound sets to phonemes utilizing an audiogram so as to generate a map, a transformer configured to receive the sequence of phonemes representative of the sequence of spoken words and the map and to generate a sequence of sound representations corresponding to the sequence of phonemes, and a second converter configured to convert the sequence of sound representations into a sequence of audible sounds. The map may be a user-specific map based on a particular user's audiogram.
  • In another embodiment there is a system for processing a sequence of spoken words into a sequence of sounds, the system comprising a first converter configured to digitize electrical signals representative of a sequence of spoken words, a speech recognizer configured to receive the digitized electrical signals and generate a sequence of phonemes representative of the sequence of spoken words, a data structure comprising sound sets mapped to phonemes, a transformer configured to receive the sequence of phonemes representative of the sequence of spoken words and the data structure and to generate a sequence of sound representations corresponding to the sequence of phonemes, and a second converter configured to convert the sequence of sound representations into a sequence of audible sounds. The data structure may be generated utilizing a user's audiogram.
  • In another embodiment there is a system for processing a sequence of spoken words into a sequence of nerve stimuli, the system comprising a converter configured to digitize electrical signals representative of a sequence of spoken words, a speech recognizer configured to receive the digitized electrical signals and generate a sequence of phonemes representative of the sequence of spoken words, a mapper configured to assign nerve stimuli arrays to phonemes utilizing an audiogram so as to generate a map, and a transformer configured to receive the sequence of phonemes representative of the sequence of spoken words and the map and to generate a sequence of stimulus definitions corresponding to the sequence of phonemes. The system may additionally comprise a receiver configured to convert the sequence of stimulus definitions into electrical waveforms and an electrode array configured to receive the electrical waveforms. The electrode array may be surgically placed in the user's cochlea. The sequence of stimulus definitions may comprise digital representations of nerve stimulation patterns.
  • In another embodiment there is a system for processing a sequence of spoken words into a sequence of nerve stimuli, the system comprising a converter configured to digitize electrical signals representative of a sequence of spoken words, a speech recognizer configured to receive the digitized electrical signals and generate a sequence of phonemes representative of the sequence of spoken words, a data structure comprising nerve stimuli arrays mapped to phonemes, and a transformer configured to receive the sequence of phonemes representative of the sequence of spoken words and the data structure and to generate a sequence of stimulus definitions corresponding to the sequence of phonemes. The data structure may be generated utilizing a user's audiogram. The system may additionally comprise a receiver configured to convert the sequence of stimulus definitions into electrical waveforms and an electrode array configured to receive the electrical waveforms. The electrode array may be surgically placed in the user's cochlea. The sequence of stimulus definitions may comprise digital representations of nerve stimulation patterns.
  • In another embodiment there is a system for processing a sequence of spoken words into a sequence of nerve stimuli, the system comprising a processor configured to generate a sequence of phonemes representative of a sequence of spoken words and to transform the sequence of phonemes using a data structure comprising nerve stimuli arrays mapped to phonemes to produce a sequence of stimulus definitions corresponding to the sequence of phonemes, and an electrode array configured to play the sequence of stimulus definitions. The data structure may be generated utilizing a user's audiogram. The electrode array may comprise a converter configured to convert the sequence of stimulus definitions into electrical waveforms. The electrode array may be surgically placed in the user's cochlea. The electrode array may comprise a plurality of mechanical stimulators or a plurality of electrodes. The sequence of stimulus definitions may comprise digital representations of nerve stimulation patterns.
  • In another embodiment there is a system for processing a sequence of spoken words into a sequence of sounds, the system comprising a processor configured to generate a sequence of phonemes representative of the sequence of spoken words and to transform the sequence of phonemes using a data structure comprising sound sets mapped to phonemes to produce sound representations corresponding to the sequence of phonemes, and a converter configured to convert the sound representations into audible sounds. The data structure may be generated utilizing a user's audiogram.
  • In another embodiment there is a system for processing a sequence of text into a sequence of sounds, the system comprising a first converter configured to receive a sequence of text and generate a sequence of phonemes representative of the sequence of text, a mapper configured to assign sound sets to phonemes utilizing a hearing audiogram so as to generate a map, a transformer configured to receive the sequence of phonemes representative of the sequence of text and the map and to generate sound representations corresponding to the sequence of phonemes, and a second converter configured to convert the sound representations into audible sounds. The hearing audiogram may be representative of a normal human hearing range. The hearing audiogram may be representative of a hearing range for a specific individual.
  • In another embodiment there is a system for processing a sequence of text into a sequence of sounds, the system comprising a text converter configured to receive a sequence of text and generate a sequence of phonemes representative of the sequence of text, a data structure comprising sound sets mapped to phonemes, a transformer configured to receive the sequence of phonemes representative of the sequence of text and the data structure and to generate sound representations corresponding to the sequence of phonemes, and a second converter configured to convert the sound representations into audible sounds. The data structure may be generated utilizing a user's audiogram.
  • In another embodiment there is a system for processing a sequence of text into a sequence of nerve stimuli, the system comprising a converter configured to receive a sequence of text and generate a sequence of phonemes representative of the sequence of text, a data structure comprising nerve stimuli arrays mapped to phonemes, and a transformer configured to receive the sequence of phonemes representative of the sequence of text and the data structure and to generate a sequence of stimulus definitions corresponding to the sequence of phonemes. The data structure may be generated utilizing a user's abilities. The user's abilities may comprise useable channels of a cochlear implant of the user. The user's abilities may comprise the ability to distinguish between two or more unique stimuli.
  • In another embodiment there is a method of processing a sequence of text into a sequence of sounds, the method comprising transforming the sequence of text into digital symbols representing corresponding phonemes, transforming the symbols representing the corresponding phonemes into sound representations, and transforming the sound representations into a sequence of sounds.
  • In another embodiment there is a method of processing a sequence of text into a sequence of nerve stimuli, the method comprising transforming the sequence of text into digital symbols representing corresponding phonemes, transforming the symbols representing the corresponding phonemes into stimulus definitions, and transforming the stimulus definitions into a sequence of nerve stimuli. The nerve stimuli may be associated with a cochlear implant. The nerve stimuli may be associated with a skin interface, where the skin interface may be located on the wrist and/or hand of the user. Transforming the symbols representing the phonemes into stimulus definitions may comprise accessing a data structure configured to map phonemes to stimulus definitions, locating the symbols representing the corresponding phonemes in the data structure, and mapping the phonemes to stimulus definitions.
  • In yet another embodiment there is a method of creating a data structure configured to transform symbols representing phonemes into sound representations, the method comprising identifying phonemes corresponding to a language utilized by a user, establishing a set of allowed sound frequencies, generating a correspondence mapping the identified phonemes to the set of allowed sound frequencies such that each constituent phoneme of the identified phonemes is assigned a subset of one or more frequencies from the set of allowed sound frequencies, and mapping each constituent phoneme of the identified phonemes to a set of one or more sounds. Establishing a set of allowed sound frequencies may comprise selecting a set of sound frequencies that are in a hearing range of the user. Each sound of the set of one or more sounds may comprise an initial frequency parameter. Each sound of the set of one or more sounds may comprise a begin time parameter. The begin time parameter may be representative of a time from an end of components of a previous sound representation. Each sound of the set of one or more sounds may comprise an end time parameter. Each sound of the set of one or more sounds may comprise a power parameter. Each sound of the set of one or more sounds may comprise a power shift parameter. Each sound of the set of one or more sounds may comprise a frequency shift parameter. Each sound of the set of one or more sounds may comprise a pulse rate parameter. Each sound of the set of one or more sounds may comprise a duty cycle parameter.
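  • The following Python sketch is offered only as a non-authoritative illustration of the phoneme-to-stimulator correlation and activation scheduling described in the embodiments above; the channel assignments, timing values, and names such as PHONEME_TO_CHANNELS and schedule are hypothetical and are not taken from the disclosure.

    from dataclasses import dataclass
    from typing import Dict, List, Tuple

    @dataclass
    class Activation:
        channel: int    # index of one stimulator in the multi-stimulator array
        begin_ms: int   # begin time, measured from the end of the previous energizing period
        end_ms: int     # end time, measured from the same reference

    # Hypothetical correlation between cmudict phoneme symbols and channels of a
    # wrist-worn stimulator array; a real map would reflect the user's abilities.
    PHONEME_TO_CHANNELS: Dict[str, Tuple[int, ...]] = {
        "CH": (0, 3),   # cmudict symbol for the affricate in "chew"
        "UW": (1,),     # cmudict symbol for the vowel in "chew"
    }

    def schedule(phonemes: List[str], on_ms: int = 90, gap_ms: int = 20) -> List[Activation]:
        """Convert a phoneme sequence into per-channel energizing periods."""
        events: List[Activation] = []
        for p in phonemes:
            for ch in PHONEME_TO_CHANNELS[p]:
                events.append(Activation(channel=ch, begin_ms=gap_ms, end_ms=gap_ms + on_ms))
        return events

    print(schedule(["CH", "UW"]))   # activation sequence for the word "chew"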
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a diagram showing a spectrogram, waveform and phonemes for an English word.
  • FIG. 2 is a table of English phonemes shown in three nomenclatures.
  • FIG. 3A is a plot of sound intensity and sound frequency showing normal human hearing.
  • FIG. 3B is a plot of sound intensity and sound frequency showing hearing loss such as caused by chronic exposure to loud noise.
  • FIG. 3C is a plot of hearing level and sound frequency, as would appear in the form of a clinical audiogram, showing normal human hearing, and is analogous to the plot of FIG. 3A.
  • FIG. 3D is a plot of hearing level and sound frequency, as would appear in the form of a clinical audiogram, showing hearing loss such as caused by chronic exposure to loud noise, and is analogous to the plot of FIG. 3B.
  • FIGS. 4A and 4B are diagrams showing conventional physical configurations of body-worn and in-the-ear hearing aids, respectively.
  • FIGS. 4C and 4D are diagrams showing functional components of low-complexity and medium-complexity hearing aids, respectively.
  • FIG. 4E is a diagram of a phoneme substitution based hearing aid.
  • FIG. 5A is a diagram showing a spectrogram, waveform and phonemes for an English word "chew".
  • FIG. 5B is a diagram similar to that of FIG. 5A but showing use of amplification in the spectrogram and waveform.
  • FIG. 5C is a diagram similar to that of FIG. 5A but showing use of speech processing in the spectrogram and waveform.
  • FIG. 5D is a diagram similar to that of FIG. 5A but showing use of phoneme substitution in the spectrogram and waveform.
  • FIG. 6 is a diagram of an embodiment of the components associated with a hearing aid using phoneme substitution.
  • FIG. 7 is a flowchart of an embodiment of an assignment of sound sets to phonemes process shown in FIG. 6.
  • FIG. 8 is a diagram of an example of a phoneme substitution data structure such as resulting from the assignment of sound sets to phonemes process shown in FIG. 7.
  • FIG. 9 is a plot of a spectrogram for the English word “jousting” as a result of phoneme substitution such as performed using the data structures shown in FIG. 8.
  • FIG. 10A is a diagram of physical components of an example of a cochlear implant hearing device.
  • FIG. 10B is a diagram of a functional configuration of the example cochlear implant hearing device shown in FIG. 10A.
  • FIG. 11A is a diagram showing a spectrogram, waveform and phonemes for an English word "chew".
  • FIG. 11B is a diagram similar to that of FIG. 11A but showing use of conventional sound processing in the spectrogram.
  • FIG. 11C is a diagram similar to that of FIG. 11A but showing use of phoneme substitution in the spectrogram.
  • FIG. 12 is a diagram of an embodiment of the components associated with a hearing implant using phoneme substitution.
  • FIG. 13 is a diagram showing an embodiment of an implanted electrode array and an example structure of potential electrode assignments, such as stored in the database of nerve stimuli arrays to phonemes shown in FIG. 12.
  • FIG. 14A is a diagram of an embodiment of a skin interface, used with phoneme substitution, having mechanical or electrical stimulators fitted about a person's hand and wrist.
  • FIG. 14B is a diagram of an embodiment of a skin interface, used with phoneme substitution, having mechanical or electrical stimulators fitted about a person's wrist.
  • FIG. 15 is a table providing examples of mapping English phonemes to tactile symbols, such as for the skin interfaces shown in FIGS. 14A and 14B.
  • FIG. 16A is a diagram of various ways of representing the English word “chew”.
  • FIG. 16B is a diagram showing embodiments of transmitters and receivers for implementing phoneme substitution communication, such as shown in FIGS. 6, 12 and 14A and 14B.
  • DETAILED DESCRIPTION OF CERTAIN EMBODIMENTS
  • The following detailed description of certain embodiments presents various descriptions of specific embodiments of the invention. However, the invention can be embodied in a multitude of different ways as defined and covered by the claims. In this description, reference is made to the drawings wherein like parts are designated with like numerals throughout.
  • The terminology used in the description presented herein is not intended to be interpreted in any limited or restrictive manner, simply because it is being utilized in conjunction with a detailed description of certain specific embodiments of the invention. Furthermore, embodiments of the invention may include several novel features, no single one of which is solely responsible for its desirable attributes or which is essential to practicing the inventions herein described.
  • The system comprises various modules, tools, and applications as discussed in detail below. As can be appreciated by one of ordinary skill in the art, each of the modules may comprise various sub-routines, procedures, definitional statements and macros. Each of the modules is typically separately compiled and linked into a single executable program. Therefore, the following description of each of the modules is used for convenience to describe the functionality of the preferred system. Thus, the processes that are undergone by each of the modules may be arbitrarily redistributed to one of the other modules, combined together in a single module, or made available in, for example, a shareable dynamic link library.
  • The system modules, tools, and applications may be written in any programming language such as, for example, C, C++, BASIC, Visual Basic, Pascal, Ada, Java, HTML, XML, or FORTRAN, and executed on an operating system, such as variants of Windows, Macintosh, UNIX, Linux, VxWorks, or other operating system. C, C++, BASIC, Visual Basic, Pascal, Ada, Java, HTML, XML and FORTRAN are industry standard programming languages for which many commercial compilers can be used to create executable code.
  • A computer or computing device may be any processor controlled device, which may permit access to the Internet, including terminal devices, such as personal computers, workstations, servers, clients, mini-computers, main-frame computers, laptop computers, a network of individual computers, mobile computers, palm-top computers, hand-held computers, set top boxes for a television, other types of web-enabled televisions, interactive kiosks, personal digital assistants, interactive or web-enabled wireless communications devices, mobile web browsers, or a combination thereof. The computers may further possess one or more input devices such as a keyboard, mouse, touch pad, joystick, pen-input-pad, and the like. The computers may also possess an output device, such as a visual display and an audio output. One or more of these computing devices may form a computing environment.
  • These computers may be uni-processor or multi-processor machines. Additionally, these computers may include an addressable storage medium or computer accessible medium, such as random access memory (RAM), an electronically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), hard disks, floppy disks, laser disk players, digital video devices, compact disks, video tapes, audio tapes, magnetic recording tracks, electronic networks, and other techniques to transmit or store electronic content such as, by way of example, programs and data. In one embodiment, the computers are equipped with a network communication device such as a network interface card, a modem, or other network connection device suitable for connecting to the communication network. Furthermore, the computers execute an appropriate operating system such as Linux, UNIX, any of the versions of Microsoft Windows, Apple MacOS, IBM OS/2 or other operating system. The appropriate operating system may include a communications protocol implementation that handles all incoming and outgoing message traffic passed over the Internet. In other embodiments, while the operating system may differ depending on the type of computer, the operating system will continue to provide the appropriate communications protocols to establish communication links with the Internet.
  • The computers may contain program logic, or other substrate configuration representing data and instructions, which cause the computer to operate in a specific and predefined manner, as described herein. A computer readable medium can store the data and instructions for the processes and methods described hereinbelow. In one embodiment, the program logic may be implemented as one or more object frameworks or modules. These modules may be configured to reside on the addressable storage medium and configured to execute on one or more processors. The modules include, but are not limited to, software or hardware components that perform certain tasks. Thus, a module may include, by way of example, components, such as, software components, object-oriented software components, class components and task components, processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables.
  • The various components of the system may communicate with each other and other components comprising the respective computers through mechanisms such as, by way of example, interprocess communication, remote procedure call, distributed object interfaces, and other various program interfaces. Furthermore, the functionality provided for in the components, modules, and databases may be combined into fewer components, modules, or databases or further separated into additional components, modules, or databases. Additionally, the components, modules, and databases may be implemented to execute on one or more computers. In another embodiment, some of the components, modules, and databases may be implemented to execute on one or more computers external to a website. In this instance, the website may include program logic, which enables the website to communicate with the externally implemented components, modules, and databases to perform the functions as disclosed herein.
  • The plots 100 of FIG. 1 illustrate one word in one language. Each language and dialect has its own set or sets of phonemes (different classification systems may define different sets of phonemes for the same language or dialect). The scope of this description encompasses all phonemes, both those currently defined and those not yet defined, for all languages.
  • As previously described, FIG. 2 is a table 200 of American English phonemes 225 shown in three nomenclatures: the International Phonetic Alphabet (IPA), s{mpA (a phonetic spelling of SAMPA, the abbreviation for Speech Assessment Methods Phonetic Alphabet), and the Merriam Webster Online Dictionary (m-w). Other nomenclatures, such as the Carnegie Mellon University pronouncing dictionary (cmudict), can be used in certain embodiments. Examples 226 of each phoneme as used in an American English word are provided, along with the manner 237 and place 247 of articulation 227.
  • Some embodiments relate to recoding phonemes to sets of sound frequencies that can be perceived by the user lacking the ability to hear the full range of human speech sounds.
  • FIG. 3A, plot 300 a, shows a range of human hearing that is considered normal, region 310 a, on a plot of the sound frequency in Hertz (horizontal axis) versus the sound intensity in watts/m² (vertical axis). The threshold of perception is the bottom (low intensity) boundary 312 a, 314 a, which varies as a function of frequency. Human hearing is most sensitive to sound frequencies around 3000 Hz. At these frequencies, the threshold of perception 314 a can be less than 10⁻¹² watts/m² (0 dB), 341 a, for some individuals. The threshold of discomfort is the top (high intensity) boundary, 316 a. The low frequency limit of human hearing is defined as the frequency that is both the threshold of perception and the threshold of discomfort 318 a. The high frequency limit of human hearing 319 a is defined in the same manner. For reference, the OSHA limit for safe long term exposure to noise in the work environment, 90 dB, is equivalent to 10⁻³ watts/m², 343 a. Sound frequencies and intensities required for speech perception are generally between about 300 Hz and 9000 Hz and about 10⁻¹⁰ to 10⁻⁷ watts/m² (20 dB to 50 dB), region 320 a. Lower frequencies, area 323 a, are most important for the recognition of vowel sounds, while higher sound frequencies, area 326 a, are more important for the recognition of consonants (also see FIGS. 1 and 2).
  • Five to ten percent of people have a more limited hearing range (e.g., region 310 b, FIG. 3B) than that shown in FIG. 3A. Many different types of hearing impairments exist. For example, one or both ears may be affected in their sensitivities to different sound frequencies. Hearing impairments may be congenital or acquired later in life, and may result from, or be influenced by, genetic factors, disease processes, medical treatments, and/or physical trauma.
  • Exposure to loud noise causes irreversible damage to the human hearing apparatus. FIG. 3B, plot 300 b, illustrates a reduced range of hearing, region 310 b, as might result from chronic exposure to noise levels above 90 dB, 343 b. Although the threshold of perception for low frequency sounds 312 b is only slightly affected, the ability to hear higher frequency sounds 314 b is significantly impaired. A person with a hearing range as shown in FIG. 3B at region 310 b would be able to hear and recognize most low frequency vowel sounds at region 320 b, but would find it difficult or impossible to hear and recognize many high frequency consonant sounds 330 b. As a result, this person would be able to hear when people are speaking, but would be unable to understand what they are saying. For reference, the normal threshold of perception, 0 dB or 10⁻¹² watts/m², is indicated by the arrow 341 b and the OSHA limit for safe long term exposure to noise in the work environment, 90 dB or 10⁻³ watts/m², is indicated by the arrow 343 b. Often, the threshold of discomfort 316 b is relatively unaffected by a rise in the threshold of perception.
  • Often, hearing aids can improve speech recognition by amplifying speech sounds above the threshold of perception for hearing impaired persons. One embodiment is a device that recodes speech sounds to frequencies in a range of sensitive hearing rather than amplifying them at the frequencies where hearing is impaired. For example, an individual with a hearing range similar to that shown in FIG. 3B, region 300 b, would not hear most speech sounds at frequencies above around 1500 Hz in region 330 b, but could hear sounds recoded to sound frequencies around 400 Hz in area 350 b.
  • Audiometry provides a practical and clinically useful measurement of hearing by having the subject wear earphones attached to the audiometer. Pure tones of controlled intensity are delivered to one ear at a time. The subject is asked to indicate when he or she hears a sound. The minimum intensity (volume) required to hear each tone is graphed versus frequency. The objective of audiometry is to plot an audiogram, a chart of the weakest intensity of sound that a subject can detect at various frequencies.
  • Although an audiogram presents similar information to the graphs in FIGS. 3A and 3B, it differs in several aspects. Although the human ear can detect frequencies from 20 to 20,000 Hz, hearing threshold sensitivity is usually measured only for the frequencies needed to hear the sounds of speech, 250 to 8,000 Hz. The sound intensity scale of an audiogram is inverted compared with the graphs in FIGS. 3A and 3B, and measured in decibels, dB, a log scale where zero has been arbitrarily defined as 10⁻¹² watts/m². Also, the audiogram provides an individual assessment of each ear.
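  • As a rough illustration of how audiogram data might be held in software (a sketch only; the threshold values and the name audible_frequencies below are hypothetical), one can record the measured threshold, in dB, at each tested frequency and ask which frequencies are perceptible at or below a chosen intensity level:

    # Hypothetical right-ear audiogram: test frequency (Hz) -> hearing threshold (dB).
    AUDIOGRAM_RIGHT = {250: 10, 500: 15, 1000: 30, 2000: 60, 4000: 75, 8000: 80}

    def audible_frequencies(audiogram, level_db):
        """Return the tested frequencies whose threshold does not exceed level_db."""
        return sorted(f for f, threshold in audiogram.items() if threshold <= level_db)

    print(audible_frequencies(AUDIOGRAM_RIGHT, 30))   # [250, 500, 1000]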
  • FIG. 3C, plot 300 c, shows an audiogram for an individual with normal hearing, similar to that shown in FIG. 3A, plot 300 a. A shaded area 320 c represents the decibel levels and frequencies where speech sounds are generally perceived (the so-called "speech banana", similar to the shaded area of FIG. 3A, region 320 a, but inverted). Hearing in the right ear is represented by circles connected by a line 362 c and in the left ear by crosses connected by a line 364 c. The symbols (circle for the right ear and cross for the left) indicate the person's hearing threshold at particular frequencies, e.g., the loudness (intensity) point where sound is just audible. Thresholds of perception from zero dB, shown by arrow 341 c, to 15 dB (1.0×10⁻¹² to about 3.2×10⁻¹¹ watts/m²) are considered to be within the normal hearing range. The OSHA limit for safe long-term exposure to noise, 90 dB, or 10⁻³ watts/m², shown by arrow 343 c, is also provided for reference. An area designated 323 c indicates the range most important for hearing vowel sounds, and an area designated 326 c indicates the range most important for hearing consonants.
  • FIG. 3D, plot 300 d, shows an exemplary audiogram of an individual with bilaterally symmetrical hearing loss (similar hearing losses in both ears), similar to that shown in FIG. 3B, plot 300 b. Hearing in the right ear is represented by circles connected by a line 362 d and in the left ear by crosses connected by a line 364 d. At the lower frequencies (250 to 500 Hz), little hearing loss has occurred, area 350 d. However, at the mid-range of frequencies (500 to 1000 Hz) hearing loss is moderate, area 320 d, and at the higher frequencies (>2000 Hz), hearing loss is severe, area 330 d. A person with this degree of hearing loss would be able to hear and recognize most low frequency vowel sounds, area 320 d, but would find it difficult or impossible to hear and recognize many high frequency consonant sounds, area 330 d. As a result, this person would be able to hear when people are speaking, but would be unable to understand what they are saying. Again, the normal threshold of perception, 0 dB or 10⁻¹² watts/m², shown by arrow 341 d, and the OSHA limit for safe long term exposure to noise in the work environment, 90 dB or 10⁻³ watts/m², shown by arrow 343 d, are provided for reference.
  • Often, hearing aids can improve speech recognition by amplifying speech sounds above the threshold of perception for hearing impaired persons. An embodiment is a device that recodes speech sounds to frequencies in a range of sensitive hearing rather than amplifying them at the frequencies where hearing is impaired. For example, an individual with an audiogram similar to that shown in FIG. 3D, plot 300 d, would not hear most speech sounds at frequencies above around 1500 Hz, area 330 d, but could hear sounds recoded to sound frequencies around 400 Hz, area 350 d.
  • There are many types of hearing aids, which vary in physical configuration, power, circuitry, and performance. They all aid sound and speech perception by amplifying sounds that would otherwise be imperceptible to the user; however, their effectiveness is often limited by distortion and the narrow range in which the amplified sound is audible, but not uncomfortable. Certain embodiments described herein overcome these limitations.
  • FIGS. 4A and 4B, diagrams 400 a, 400 b, illustrate some of the basic physical configurations found in hearing aid designs. A body worn aid 420 a may comprise a case 412 a containing a power supply and components of amplification; and an ear mold 416 a containing an electronic speaker, connected to the case by a cord 414 a. Behind-the-ear aids 410 b, 420 b may consist of a small case 412 b containing a power supply, components of amplification and an electronic speaker, which fits behind an ear 404 b; an ear mold 416 b; and a connector 414 b, which conducts sound to the ear 404 b through the ear mold 416 b. In-the-ear aids 430 b comprise a power supply, components of amplification, and an electronic speaker, fit entirely within an outer ear 406 b.
  • Operational principles of hearing aids may vary among devices, even if they share the same physical configuration. FIGS. 4C and 4D, diagrams 400 c and 400 d, illustrate some of the functional components found in hearing aid designs. The least complex device 420 c comprises a microphone 413 c, which converts sounds such as speech from another person 408 c into an electronic signal. The electronic signal is then amplified by an amplifier 415 c and converted back into sound by an electronic speaker 417 c in proximity to the user's ear 404 c.
  • More sophisticated devices 420 d comprise a microphone 413 d and a speaker 417 d, which perform the same functions as their counterparts 413 c, 417 c respectively. However, sound and speech processing circuitry 415 d can function differently from simple amplification circuitry 415 c. Sound and speech processing circuitry 415 d may be either digital or analog in nature. Unlike the simple amplifier 415 c, sound and speech processing circuitry 415 d can amplify different portions of the sound spectrum to different degrees. These devices might incorporate electronic filters that reduce distracting noise and might be programmed with different settings corresponding to the user's needs in different environments (e.g., noisy office or quiet room).
  • An embodiment is shown in FIG. 4E, diagram 400 e. A device 420 e differs in its principle of operation from the hearing aids 420 c and 420 d in that its circuitry 415 e can substitute the phonemes of speech sounds with unique sets of sounds (acoustic symbols). By substituting some or all of the phonemes in a given language with simple acoustic symbols, it is possible to utilize portions of the sound spectrum where a user may have relatively unimpaired hearing. The symbols themselves may represent phonemes, sets of phonemes, portions of phonemes, or types of phonemes. For an individual with an audiogram similar to that shown in FIG. 3D, the acoustic symbols could, for example, comprise sound frequencies between 200 Hz and 600 Hz, which would be audible to that person.
  • In FIG. 5, the English word, "chew", 505 a, is used to compare and contrast certain embodiments described herein to conventional technologies. FIG. 5A, plots 500 a, provides a spectrogram 520 a and waveform 540 a for the word, "chew", 505 a. When spoken, "chew" comprises two phonemes, tʃ, 524 a, and u, 586 a, which are visible as two distinctive regions 542 a and 544 a of the waveform 540 a. However, as with the example for the English word, "fake", FIG. 1, plots 100, the waveform is too complex to expose much informative detail via visual inspection. The spectrogram 520 a reveals a greater level of relevant detail. Here it is seen that the phoneme, tʃ, 524 a, comprises a complex set of sound frequencies 521 a broadly distributed largely above 3000 Hz. Most of the power for the phoneme, u, 586 a, is contained in relatively tight frequency ranges around 500 Hz, 523 a, and 2500 Hz, 522 a. Additionally, u, 586 a, is a voiced phoneme, exhibiting characteristic waxing and waning of power over many frequencies, observable as faint vertical stripes within the bands labeled 522 a and 523 a. The waxing and waning itself has a frequency of approximately 250 Hz (≈25 stripes per 100 milliseconds on the time axis).
  • An individual with an audiogram similar to that shown in FIG. 3D, plots 300 d, might be able to hear the phoneme, u, 586 a, reasonably well because its frequencies are in the lower range of speech. However, this individual would not hear tʃ, 524 a, because this person's hearing is impaired at higher frequencies. A hearing aid using simple amplification can help to some extent by increasing the sound pressure (a.k.a. volume, a.k.a. power) at all frequencies as illustrated in FIG. 5B, plots 500 b. As seen in the waveform, 540 b, sound pressure has been increased for the phonemes, tʃ, 542 b, and u, 544 b, relative to corresponding portions of the waveform 540 a, 542 a and 544 a, FIG. 5A. The spectrogram reveals that low frequency sounds, 523 b, have been amplified even though there is little or no need for amplification at these frequencies. This can result in distorted perception and discomfort for the user. Extraneous ambient noise is also amplified, as seen in area 528 b, interfering with speech recognition and comfort.
  • FIG. 5C, plots 500 c, illustrates a spectrogram 520 c and waveform 540 c obtained when the word, “chew” is spoken into a hearing aid with speech/sound processing capability. Increased amplitude is observed in the waveform area 542 c but less so in the area 544 c relative to corresponding portions of the waveform 540 a, 542 a and 544 a, FIG. 5A. The spectrogram 520 c reveals that most amplification occurs at the higher frequencies 521 c and 522 c but less so at the lower frequencies 523 c. Therefore the low frequency components 523 c of the phoneme u, 586 c are not too loud. Noise problems are also reduced. However, the sound at 521 c and 522 c may be so loud that it is uncomfortable and could damage remaining hearing.
  • FIG. 5D, plots 500 d, provides an example of a waveform, 540 d, and spectrogram, 520 d, as might result from recoding the word "chew" using the phoneme substitution method described herein. The waveform 540 d and spectrogram 520 d have been simplified relative to those in FIGS. 5A, 5B, and 5C (540 a, 520 a, 540 b, 520 b, 540 c, 520 c), and all sound energy has been redirected to frequencies easily audible for an individual having an audiogram, plots 300 d, similar to that shown in FIG. 3D. The portion of the waveform, 540 d, corresponding to the phoneme, tʃ, 524 a, is shown in waveform portion 542 d, and that of the phoneme, u, 586 a, is shown in waveform portion 544 d. The spectrogram 520 d shows a simple frequency distribution in a narrow range. All frequencies 531 d, 532 d, 533 d, 536 d and 537 d are below 1000 Hz. Power at frequencies 536 d and 537 d representing the phoneme, u, 586 a, is pulsed at a frequency of approximately 12 Hz.
  • FIG. 6, diagram 600, provides an overview of how one embodiment transforms speech 609 (exemplified by the waveform illustrated in FIG. 5A, plots 500 a) from a person speaking 608 into simple acoustic symbols 605 (exemplified in the waveform, 540 d, illustrated in FIG. 5D, plots 500 d) for a user 604 by use of a hearing aid 620. The components of the hearing aid 620 are described below.
  • The hearing aid 620 includes a microphone 613 to transform speech sound 609 into electronic analog signals which are then digitized by an analog to digital converter 622. The embodiment illustrated here provides a user interface 619 that allows the selection of one of two operating modes depending upon whether or not speech recognition is of primary interest to the user, 604, in any given setting. Other embodiments need not provide this option.
  • When speech recognition is of primary interest to the user 604, the value at decision state 624 will be true. A speech recognition process 630 transforms digitized speech sounds into digital symbols representing phonemes of the speech 609 produced by the person speaking 608. Characters representing phonemes are then exchanged for digital sound representations by a transformation process 650. The transformation process of transformer 650 can be performed by software, hardware or by combinations of software and hardware.
  • The transformation process 650 comprises a correspondence from a set of phonemes to a set of sound representations held in a database or other data structure 652 and a way 654 of generating sound representations corresponding to phonemes from the speech recognizer 630. The sound representations held in the database 652 may be wav files, mp3 files, aac files, aiff files, MIDI files, characters representing sounds, characters representing sound qualities, and the like.
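  • Purely as an illustrative sketch of the lookup performed by the transformation process 650 (the file names and the SAMPA-style keys below are placeholders, not the actual contents of database 652), the exchange of phoneme symbols for stored sound representations can be as simple as:

    # Placeholder store: phoneme symbol -> stored sound representation.
    SOUND_STORE = {
        "tS": "symbol_tS.wav",   # SAMPA tS, the affricate in "chew"; hypothetical file name
        "u":  "symbol_u.wav",
    }

    def to_sound_representations(phonemes):
        """Exchange each recognized phoneme symbol for its stored sound representation."""
        return [SOUND_STORE[p] for p in phonemes]

    print(to_sound_representations(["tS", "u"]))   # the word "chew"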
  • The sound files are then converted to analog signals by a digital to analog process 626, amplified by an amplification process 628, and converted into audible sounds by a speaker 617.
  • When speech recognition is not of primary interest to the user 604, the value at decision state 624 will be false. The device will function as a digital hearing aid with conventional speech/sound processing functions 615, digital to analog signal conversion 626, amplification 628, and sound generation 617.
  • Although certain embodiments do not relate to the field of speech recognition technology, some embodiments utilize speech recognition. A number of strategies and techniques for building devices capable of recognizing and translating human speech into text are known to those skilled in such arts. For reference and background, a generic diagram of the inner workings of the speech recognizer, 630, as might be employed by some embodiments is provided in FIG. 6.
  • Within the speech recognizer 630, the digitized acoustic signal may be processed by a digital filter 632 in order to reduce the complexity of the data. Next, a segmentation process 634 parses the data into overlapping temporal intervals called frames. Feature extraction 636 involves computing a spectral representation (somewhat like a spectrogram) of the incoming speech data, followed by identification of acoustically relevant parameters such as energy, spectral features, and pitch information. A decoder 638 can be a search algorithm that may use phone models 644, lexicons 647, and grammatical rules 648, for computing a match between a spoken utterance 609 and a corresponding word string. While phonemes are the smallest phonetic units of speech, more fundamental units, phones, are the basic sounds of speech. Unlike phonemes, phones vary widely from individual to individual, depending on gender, age, accent, etc., and even over time for a single individual depending on sentence structure, word structure, mood, social context, etc. Therefore, phone models 644 may use a database 642, comprising tens of thousands of samples of speech from different individuals. A lexicon 647 contains the phonetic spellings for the words that are expected to be observed by the speech recognizer 630. The lexicon 647 serves as a reference for converting the phone sequences determined by the search algorithm into words. The grammar network or rules 648 defines the recognition task in terms of legitimate word combinations at the level of phrases and sentences. Some speech recognizers employ more sophisticated language models (not shown) that predict the most likely continuation of an utterance on the basis of statistical information about the frequency in which word sequences occur on average in the language. The lexicon 647 and grammar network 648 use a task database 646 comprising words and their various pronunciations, common phrases, grammar, and usage.
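  • The front-end stages mentioned above (segmentation into overlapping frames and extraction of energy and spectral features) can be sketched in a few lines of Python; this is a simplified stand-in, not the recognizer 630 itself, and the decoding against phone models, a lexicon and grammar rules is omitted:

    import numpy as np

    def frames(signal, frame_len=400, hop=160):
        """Split a 1-D signal into overlapping frames (e.g., 25 ms / 10 ms at 16 kHz)."""
        n = 1 + max(0, (len(signal) - frame_len) // hop)
        return np.stack([signal[i * hop : i * hop + frame_len] for i in range(n)])

    def features(signal):
        """Per-frame log energy plus magnitude spectrum, a crude stand-in for the
        acoustically relevant parameters described above."""
        f = frames(signal)
        energy = np.log(np.sum(f ** 2, axis=1) + 1e-12)
        spectrum = np.abs(np.fft.rfft(f * np.hanning(f.shape[1]), axis=1))
        return energy, spectrum

    sig = np.random.randn(16000)   # one second of stand-in audio at 16 kHz
    e, s = features(sig)
    print(e.shape, s.shape)        # (98,) (98, 201)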
  • Referring again to the transformation process 650, because different users 604 may have different hearing requirements and abilities, the phonic symbol database 652 can be created and customized in consideration of each individual user 604. In some embodiments, a computer 660 can be used to aid in the creation of user specific phonic symbol databases, which are then downloaded to the database 652 of the hearing aid 620. The computer 660 comprises software allowing the input of data (e.g., audiogram) 664 from a user's hearing tests, a user interface 662, and a process or mapper 670 for creating a map (for database 652) to transform symbols representing phonemes into sets of sounds. In one embodiment, the mapper 670 can be performed by hardware circuits.
  • For some embodiments, each unique phoneme maps to a unique acoustic symbol. Each acoustic symbol comprises a unique set of sounds, each sound being audible to the user, and each acoustic symbol, or sound set, having a distinctive perceived sound. The function of the Assignment Of Sound Sets to Phonemes process 670 in FIG. 6 is to build such a map. Process 670, further described in conjunction with FIG. 7, outlines one method for constructing the map. This and other methods can be performed manually or in an automated fashion using a computer or other computational device such as a tablet.
  • Acoustic symbols or sound sets may comprise one or more sounds. Sounds may differ in a number of qualities including but not limited to frequency, intensity, duration, overtones (harmonics and partials), attack, decay, sustain, release, tremolo, and vibrato. Although any or all of these differences can be employed, the example process 670 shown in FIG. 7 places a primary emphasis on variations in frequency. Therefore, the example process 670 provides acoustic symbols (sound sets) that are unique with respect to the sound frequencies they comprise. For simplicity, this example will employ only combinations of pure tones (no overtones). Sounds having harmonic content could be employed in a similar fashion.
  • Referring to FIG. 7, following the start state 705 of process 670, state 710 calls for a value, i, the input intensity limit. The input intensity limit, i, is an intensity or power density level, above which the user should be able to perceive each and every sound present in the set of acoustic symbols. As the value for i is increased, the range of available sounds to construct acoustic symbols will increase.
  • Based upon data 716 from the user's hearing tests, state 715 determines a range of sound frequencies, [fl, fh], such that each sound frequency in the range [fl, fh] is perceptible to the user at power densities at or below i.
  • Human hearing is receptive to sound frequency changes in an approximately logarithmic fashion. Therefore, for some embodiments, it may be desirable to establish rules constraining the choices of sound frequencies used to construct phonic symbols. An example of such a rule could be that the set of allowed sound frequencies must not contain any two frequencies f1 and f2 such that |(f2−f1)/(f2+f1)|≦j, where j is a constant between 0.02 and 0.1. To illustrate, if [fl, fh]=[1000 Hz, 2500 Hz] and j=0.038, there would be 13 allowed frequencies. The closest any two frequencies could be at the low frequency end of the range would be 79 Hz, and the closest any two frequencies could be at the high frequency end of the range would be 183 Hz. More sophisticated rules can be used to factor in non-logarithmic and other components of the human hearing response to sound frequency.
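  • The spacing rule above can be checked with a short, purely illustrative calculation (allowed_frequencies is a hypothetical name, and the boundary case where the ratio equals j is treated as the limiting step size). Greedily packing frequencies from the low end reproduces the figures quoted above: 13 allowed frequencies, a 79 Hz gap at the bottom of the range, and about 183 Hz between the closest permissible pair at 2500 Hz:

    def allowed_frequencies(f_low, f_high, j):
        """Greedy packing: each kept frequency steps up by the limiting ratio (1 + j) / (1 - j)."""
        freqs, f = [], f_low
        while f <= f_high:
            freqs.append(f)
            f = f * (1 + j) / (1 - j)   # a strict rule would step just past this boundary
        return freqs

    F = allowed_frequencies(1000.0, 2500.0, 0.038)
    print(len(F))                                           # 13 allowed frequencies
    print(round(F[1] - F[0]))                               # 79 Hz gap at the low end of the range
    print(round(2500 - 2500 * (1 - 0.038) / (1 + 0.038)))   # 183 Hz: closest legal neighbour below 2500 Hz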
  • Mathematical functions can be used to generate lists of allowed frequencies. For example, an equation, f(z), where f(z)/f(z+1) = f(z+1)/f(z+2) for all integers, z (z ∈ Z), would generate a set of values evenly separated on a log scale. An example of such an equation is f(z) = (x·y^(z/v))/sec, where v, x, and y are real numbers greater than one. For illustration purposes, if x=2, y=10, v=2, and z ∈ Z, the equation f(z) = (x·y^(z/v))/sec would generate the set { . . . 63 Hz, 200 Hz, 632 Hz, 2 kHz, . . . }. It may be noted that for f(z) = (x·y^(z/v))/sec, values for y that are powers of 2 such as 2, 4, 8, etc. and values for v such as 3, 4, 6, 12, and 24 would yield frequencies separated by intervals approximating naturally occurring overtones and partials. Such sets of frequencies may give rise to sets of acoustic symbols more pleasing and perhaps more discernable to the human ear.
  • Proceeding to state 720, process 670 calls for values of v, x, and y. Using the values of v, x, and y from state 720 and integer values for z, state 725 finds all sound frequencies that satisfy the equation and are greater than fl but less than fh. Stated symbolically, state 725 returns the set, F = {f(z) ∈ [fl, fh]: f(z) = (x·y^(z/v))/sec, z ∈ Z}. This equation is provided only as an example.
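  • As a minimal sketch of state 725 (assuming an inclusive interval [fl, fh] and a bounded search over z; generate_F is a hypothetical name), the generator equation can be evaluated directly:

    def generate_F(x, y, v, f_low, f_high, z_range=range(-200, 201)):
        """All f(z) = x * y**(z / v) Hz that fall within [f_low, f_high]."""
        return [x * y ** (z / v) for z in z_range if f_low <= x * y ** (z / v) <= f_high]

    # The worked example above: x = 2, y = 10, v = 2 gives values evenly spaced on a
    # log scale; bounds of 50 Hz and 2500 Hz are chosen here just to show four of them.
    print([round(f) for f in generate_F(2, 10, 2, 50, 2500)])   # [63, 200, 632, 2000]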
  • A database or data structure 731 comprises a list of phonemes that the user is likely to require. A person who uses only the English language might need approximately 39 phonemes as listed in FIG. 2, table 200. Someone who uses only the Hawaiian language would require approximately 13 phonemes while a person using two European and two Asian languages might require approximately 200 phonemes.
  • In this example, each symbol comprises a unique set of sound frequencies. Therefore, the composition of a given symbol either contains a particular sound frequency, or it doesn't. Thus the maximum number of acoustic symbols that can be constructed from n frequencies is 2^n − 1. For example, three different frequencies could yield up to seven unique symbols, while eleven frequencies could yield up to 2047 unique symbols. Conversely, the minimum number, m, of frequencies needed to create a unique symbol for each phoneme, p, of a set of phonemes, P, is at least log₂|P|, where |P| is the number of phonemes, p, in the set of phonemes, P.
  • State 730 determines the value of |P| from the user's phoneme database 731, and returns a solution, m, for the above equation. Proceeding to a decision state 735, process 670 determines if the number of solutions, |F|, from state 725 is sufficient to create a unique acoustic symbol, or set of frequencies, for each element, p, in the user's phoneme set, P, from database 731. A value of false at decision state 735 returns the process 670 to the state 710. From there, the value for i may be increased, thereby expanding the interval [fl, fh] determined by state 715. Additionally, or alternatively, values for v, x, and y may be changed at state 720 to increase the number of solutions to the equation f(z) = (x·y^(z/v))/sec that are within the range [fl, fh] determined by state 715. Decreasing the value for y, and/or increasing the value for v, will tend to increase the number of solutions to f(z) = (x·y^(z/v))/sec within [fl, fh]. Adjusting the value for x in either direction may or may not alter the number of solutions to f(z) = (x·y^(z/v))/sec within [fl, fh]. When a change in the value of x does result in a change to the number of solutions to f(z) = (x·y^(z/v))/sec within [fl, fh], that number will increase or decrease by one solution (one allowed frequency).
  • A value of true at the decision state 735 moves process 670 to state 740. State 740 is the first of two states, 740 and 745, that assign acoustic symbols (sets of sounds) to phonemes.
  • In the first state 740, process 670 assigns to each phoneme a set of one or more allowed sound frequencies. More precisely, each phoneme, p, of the set of phonemes, P, is assigned a set, Q, of frequencies, f, each frequency, f, being an element of the set of allowed frequencies, F. Stated symbolically, state 740 returns a set, M = {(p, Q): p ∈ P, Q ⊆ F}.
  • In the second state 745, process 670 assigns additional qualities to be associated with each frequency element, f, of each frequency set, Q, of each element (p, Q) of the set, M. Seven variables are assigned in this example. In other embodiments, a different number of variables can be assigned.
    • b “begin” Sound at frequency, f, will start being produced b milliseconds after the end of the preceding acoustic symbol. If there is no preceding acoustic symbol, zero will be used in place of b. The variable, b, may have a value that is positive, negative, or zero.
    • e "end" Sound at frequency, f, will stop being produced e milliseconds after the end of the preceding acoustic symbol. If there is no preceding acoustic symbol, sound at frequency, f, will stop being produced e milliseconds after it starts being produced.
    • w "power" Power at sound frequency, f, will be w decibels (dB) upon its initiation. 0 dB ≡ 10⁻¹² watts/m².
    • d “Δw” Power at sound frequency, f, will smoothly transition toward d·w decibels (dB) and will be d·w at the end of its duration. The variable, d, may have a value that is positive, negative, or zero.
    • h "Δf" Cycles per second at frequency, f, will smoothly transition from f Hertz (Hz) at its initiation to h·f Hz at the end of its duration. The variable, h, may have any value that is greater than zero; however, values between 0.1 and 10 are most practical.
    • r "pulse rate" Power at sound frequency, f, will be reduced by at least 20 dB and restored to w dB r times each second.
    • c "duty cycle" The duty cycle variable, c, is the time within each pulse cycle that the power is equal to w divided by the total duration of the pulse cycle. A c value of 50% would produce a square wave.
  • At the conclusion of state 745, a data structure 752 is constructed mapping each phoneme to a set of sounds, each sound having eight parameters, f, b, e, w, d, h, r, c as described above. The completion of the data structure 752 allows progression to the end state 755.
  • In the above example, the various elements of the acoustic symbols were assembled about each phoneme. The order of these steps is not critical to the practice of certain embodiments described herein, and acoustic symbols may be predefined and later assigned to phonemes. The parameters, f, b, e, w, d, h, r, c are given only as examples.
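  • One plausible, and purely illustrative, way to hold the eight example parameters per sound and to key sound sets by phoneme as in data structure 752 is sketched below; the Sound class name is hypothetical, the field order follows f, b, e, w, d, h, r, c, and the sample entry for the phoneme s reuses the values listed later for the word "jousting":

    from dataclasses import dataclass
    from typing import Dict, Tuple

    @dataclass
    class Sound:
        f: float   # initial frequency, Hz
        b: float   # begin time, ms after the end of the preceding acoustic symbol
        e: float   # end time, ms after the end of the preceding acoustic symbol
        w: float   # initial power, dB
        d: float   # power shift (power moves toward d*w dB)
        h: float   # frequency shift (frequency moves toward h*f Hz)
        r: float   # pulse rate, pulses per second
        c: float   # duty cycle, percent

    PhonemeMap = Dict[str, Tuple[Sound, ...]]

    # Sample entry: the phoneme s, using the values listed below for the word "jousting".
    example: PhonemeMap = {
        "s": (Sound(534, 0, 100, 50, 0, 1, 100, 100), Sound(566, 0, 100, 50, 0, 1, 100, 100)),
    }
    print(example["s"][0])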
  • To illustrate how the process 670 can operate, providing an intensity limit, i, value of 30 dB (10⁻⁹ watts/m²), and an audiogram 716 similar to that shown in FIG. 3D, plots 300 d, would result in state 715 returning an interval of [80 Hz, 800 Hz]. If the values provided to state 720 are v=12, x=200, and y=2, state 725 would return the set of allowed frequencies, F, {84, 89, 94, 100, 106, 112, 119, 126, 133, 141, 150, 159, 168, 178, 189, 200, 212, 224, 238, 252, 267, 283, 300, 317, 336, 356, 378, 400, 424, 449, 476, 504, 534, 566, 599, 635, 673, 713, 755, 800}. If the user's phoneme set, P, comprises a minimal set of phonemes needed for American English, the number of elements, |P|, in the set, P, will be 39. State 730 would return the value log₂ 39, which is approximately 5.3. The number of elements, |F|, in the set, F, is 40. Because 40 ≥ 5.3, the Boolean value at decision state 735 is true, and process 670 would proceed to state 740. To simplify this example, the choice of frequencies will be further restricted to just nine of the 40 allowed frequencies, {300, 317, 336, 400, 424, 449, 504, 534, 566}.
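  • Re-running that worked example as an independent sanity check (a sketch only; the bounds on z are arbitrary but wide enough) confirms the 40 allowed frequencies between 84 Hz and 800 Hz and the log₂ 39 ≈ 5.3 comparison made at decision state 735:

    import math

    f_low, f_high = 80.0, 800.0
    v, x, y = 12, 200, 2
    F = sorted(round(x * y ** (z / v)) for z in range(-60, 61) if f_low <= x * y ** (z / v) <= f_high)
    print(len(F))                    # 40 allowed frequencies
    print(F[0], F[-1])               # 84 800
    print(round(math.log2(39), 1))   # 5.3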
  • In one embodiment, the symbols are unique combinations of one or more sound frequencies.
  • In another embodiment, the symbols are unique frequency intervals. A frequency interval is the absolute value log difference of two frequencies. Constructing acoustic symbols as frequency intervals has advantages as most people, including trained musicians, lack the ability to recognize individual sound frequencies but are able to recognize intervals.
  • In another embodiment, the combination of frequencies and their temporal modifications are unique for each symbol.
  • In another embodiment, the combination of frequency intervals and the temporal modifications for each frequency are unique for each symbol.
  • In another embodiment, the combination of frequencies and their timbre, which may comprise overtones (harmonics and partials), tremolo, and vibrato, is unique for each symbol.
  • In another embodiment, the combination of frequency intervals and the timbre of each frequency is unique for each symbol.
  • In another embodiment, phonemes are placed into groups of like phonemes (e.g., plosive, fricative, diphthong, monophthong, etc.). Such a placement of phonemes into groups of like phonemes is known to linguists and others skilled in such arts. All phonemes are then assigned a sound frequency (the root), all phonemes being given the same root. Each member of each group of like phonemes is given a second frequency unique to that group. Once all phonemes have been assigned a second sound frequency, the most frequently used phoneme of each group is not assigned additional sound frequencies. Therefore, the most frequently used phonemes are represented by single frequency intervals. One or more additional sound frequencies are then assigned to the remaining phonemes to create a unique combination of frequencies for each phoneme.
  • In another embodiment, phonemes are placed into groups of like phonemes (e.g., plosive, fricative, diphthong, monophthong, etc.). Such a placement of phonemes into groups of like phonemes is known to linguists and others skilled in such arts. All phonemes are then assigned a sound frequency (the root), all phonemes being given the same root. Each member of each group of like phonemes is given a second frequency unique to that group. Once all phonemes have been assigned a second sound frequency, the most frequently used phoneme of each group is not assigned additional sound frequencies. Therefore, the most frequently used phonemes are represented by single frequency intervals. One or more additional sound frequencies are then assigned to the remaining phonemes to create a unique combination of frequencies for each phoneme. Next, every frequency of every phoneme in one group of like phonemes is shifted up or down by multiplying it by a constant. Additional groups of like phonemes may or may not be adjusted in a similar fashion using the same constant or a different constant.
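  • The grouping scheme of the two preceding embodiments can be sketched as follows; the group memberships, usage ranking, root frequency, group frequencies, extra frequency pool, and the 1.12 scaling constant are all hypothetical stand-ins chosen only to make the example run:

    ROOT = 300.0                                           # same root frequency for all phonemes
    GROUP_FREQ = {"plosive": 400.0, "fricative": 449.0}    # one second frequency per group
    EXTRA_POOL = [504.0, 534.0, 566.0]                     # extra frequencies for less frequent phonemes

    GROUPS = {   # most frequently used phoneme listed first in each group (hypothetical ordering)
        "plosive":  ["t", "d", "k"],
        "fricative": ["s", "f", "v"],
    }

    def assign(groups, root, group_freq, extra_pool):
        """Root + group frequency for everyone; extra frequencies only for less frequent members."""
        symbols = {}
        for name, members in groups.items():
            for rank, p in enumerate(members):
                freqs = [root, group_freq[name]]
                if rank > 0:                     # the most frequent member keeps only the group interval
                    freqs.append(extra_pool[rank - 1])
                symbols[p] = tuple(freqs)
        return symbols

    symbols = assign(GROUPS, ROOT, GROUP_FREQ, EXTRA_POOL)
    # Shift every frequency of one whole group (here the plosives) by a constant, per the second embodiment.
    shifted = {p: tuple(f * 1.12 for f in fs) if p in GROUPS["plosive"] else fs for p, fs in symbols.items()}
    print(shifted)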
  • In another embodiment, the acoustic symbol's frequencies, intervals, temporal modifiers, and/or timbre, may be selected to resemble features of the phoneme from which it was derived. For example, the fricative, s, might be assigned a higher frequency or frequencies, than the vowel, 3; plosives might all have the modifier, g=2; voiced phonemes might have the modifier, b=2; and unvoiced phonemes might have the modifier, b=1. Frequencies, intervals, temporal modifiers, timbre, and other qualities may be applied methodically, arbitrarily, or randomly.
  • FIG. 8 illustrates an example data structure 752 as might be returned by state 745, FIG. 7. The data structure 752 contains examples of the use of the sound qualities listed above. Not all of the sound qualities in the example are required to practice certain embodiments described herein, and other qualities not listed here may be employed.
  • In this example, the data structure comprises ordered sets, each ordered set matching a phoneme, p, to one or more sounds. Each sound is defined by an ordered set comprising values for the variables f, b, e, w, d, h, r, c. To facilitate cross-referencing, the last two digits of each callout or reference label in FIG. 8 are the same as the last two digits of corresponding phonemes in FIGS. 1, 2, 5, 9, 11, 15, and 16. The time scale and the nature of the symbols do, however, vary from figure to figure.
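  • A minimal sketch of how such a data structure might be held in memory is shown below. The field meanings follow the reading of f, b, e, w, d, h, r, and c given in the walkthrough of FIG. 9 later in this description; the container and type choices are assumptions, and the two entries are reproduced from the example sets listed below.

```python
from typing import Dict, NamedTuple, Tuple

class Sound(NamedTuple):
    f: float   # initial frequency (Hz)
    b: float   # begin time, ms after the end of the previous acoustic symbol
    e: float   # end time, ms after the end of the previous acoustic symbol
    w: float   # initial power (dB)
    d: float   # power shift (dB)
    h: float   # frequency shift factor (final frequency = f * h)
    r: float   # pulse rate (Hz)
    c: float   # duty cycle (%); 100 means not pulsed

# Two entries reproduced from the example sets given below; a full structure
# would cover every phoneme of the user's language.
DATA_STRUCTURE: Dict[str, Tuple[Sound, ...]] = {
    "dʒ": (Sound(449, 20, 90, 50, 0, 1, 100, 100),
           Sound(504, 20, 90, 50, 0, 1, 90, 50)),
    "s":  (Sound(534, 0, 100, 50, 0, 1, 100, 100),
           Sound(566, 0, 100, 50, 0, 1, 100, 100)),
}
```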
  • Referring to FIGS. 8 and 9, the word “jousting” will be used in the next example. The IPA representation of the word, “jousting”, is dʒaʊstiŋ, and comprises seven phonetic symbols, dʒ, a, ʊ, s, t, i, and ŋ. However, the monophthong, a, 996 (FIG. 9), is not used as a sole vowel sound in American English words or syllables, but exists only as part of the diphthongs, ai and aʊ. Therefore, in English, dʒaʊstiŋ, 920, actually comprises just six phonemes, dʒ, aʊ, s, t, i, and ŋ.
  • When state 654, FIG. 6, searches the data structure 652 or 752, FIG. 8, it finds the ordered sets:
  • (dʒ,(449,20,90,50,0,1,100,100),(504,20,90,50,0,1,90,50))
  • (aʊ,(317,0,150,50,0,1,84,67),(400,0,150,50,0,0.75,84,67))
  • (s,(534,0,100,50,0,1,100,100),(566,0,100,50,0,1,100,100))
  • (t,(317,20,90,50,−30,1,100,100),(566,20,90,50,−30,1,100,100))
  • (i,(336,0,100,50,0,1,100,67),(566,0,100,50,0,1,100,67))
  • (ŋ,(336,0,100,50,0,1,100,100),(449,0,100,50,0,1,100,100),(534,0,100,60,0,1,126,80))
  • (FIG. 8 callouts 834, 894, 844, 804, 880, 867, respectively),
  • and returns the sets of sound definitions:
  • [(449,20,90,50,0,1,100,100),(504,20,90,50,0,1,90,50)]
  • [(317,0,150,50,0,1,84,67),(400,0,150,50,0,0.75,84,67)]
  • [(534,0,100,50,0,1,100,100),(566,0,100,50,0,1,100,100)]
  • [(317,20,90,50,−30,1,100,100),(566,20,90,50,−30,1,100,100)]
  • [(336,0,100,50,0,1,100,67),(566,0,100,50,0,1,100,67)]
  • [(336,0,100,50,0,1,100,100),(449,0,100,50,0,1,100,100),(534,0,100,60,0,1,126,80)]
  • which are converted into 630 milliseconds of analog signal by the digital to analog state 626, amplified by the analog amplifier 628, and converted into sound 605 by the speaker 617.
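  • As a check on the 630 millisecond figure, the sketch below sums the duration of each acoustic symbol, taken as the latest end time, e, among its component sounds; treating that end time as the reference point for the next symbol is an assumption consistent with the total stated above.

```python
# Sound definitions returned for the six phonemes of "jousting"
# (dʒ, aʊ, s, t, i, ŋ), copied from the sets listed above.
SYMBOLS = [
    [(449, 20, 90, 50, 0, 1, 100, 100), (504, 20, 90, 50, 0, 1, 90, 50)],
    [(317, 0, 150, 50, 0, 1, 84, 67), (400, 0, 150, 50, 0, 0.75, 84, 67)],
    [(534, 0, 100, 50, 0, 1, 100, 100), (566, 0, 100, 50, 0, 1, 100, 100)],
    [(317, 20, 90, 50, -30, 1, 100, 100), (566, 20, 90, 50, -30, 1, 100, 100)],
    [(336, 0, 100, 50, 0, 1, 100, 67), (566, 0, 100, 50, 0, 1, 100, 67)],
    [(336, 0, 100, 50, 0, 1, 100, 100), (449, 0, 100, 50, 0, 1, 100, 100),
     (534, 0, 100, 60, 0, 1, 126, 80)],
]

# Each symbol ends when its latest component sound ends (the e parameter,
# third element of each tuple), measured from the end of the previous symbol.
total_ms = sum(max(sound[2] for sound in symbol) for symbol in SYMBOLS)
print(total_ms)   # 630
```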
  • FIG. 9 provides a schematic representation 900 of a spectrogram 999 of the sound 605 emitted by the speaker 617 (FIG. 6), after transformation 650 of the word “jousting” via the assignment state 654, drawing upon the data structure 652 or 752 (FIG. 8). To show detail, the vertical axis spans 300 Hz to 600 Hz rather than 0 Hz to 5000 Hz as in FIGS. 1 and 5. Also, power is depicted through line thickness rather than color intensity, thicker lines representing greater power.
  • As stated above, the IPA representation of the English word, “jousting”, 910, is dʒaʊstiŋ, 920, and comprises seven phonetic symbols, dʒ, 934, a, 996, ʊ, 997, s, 944, t, 904, i, 980, and ŋ, 967. In American English the phonemes are dʒ, 934, aʊ, 994, s, 944, t, 904, i, 980, and ŋ, 967.
  • The first phoneme, dʒ, 934, is represented by an acoustic symbol defined by an ordered set of two ordered sets of eight elements, each defining a sound component of the acoustic symbol, [(449,20,90,50,0,1,100,100),(504,20,90,50,0,1,90,50)]. This definition calls for two sounds 925 and 923. The first sound 925, defined by the ordered set (449,20,90,50,0,1,100,100), has a constant frequency, h=1, of 449 Hz, f=449, a constant power, d=0, of 50 dB, w=50, starting after a 20 ms, b=20, delay 902 and 922 from the end of the previous acoustic symbol, ending 90 ms, e=90, after the end of the previous acoustic symbol, and not pulsed, c=100. The value for r, pulse rate, is 100, but may be any positive value in this instance because a 100% duty cycle, c=100, obviates pulse rate. Read in the same manner, the second ordered set, (504,20,90,50,0,1,90,50), defines a sound 923 having a constant frequency of 504 Hz, a constant power of 50 dB, starting 20 ms after the end of the previous acoustic symbol, ending 90 ms after the end of the previous acoustic symbol, and pulsed at a frequency of 90 Hz, r=90, with a 50% duty cycle, c=50.
  • The next ordered set of ordered sets, [(317,0,150,50,0,1,84,67),(400,0,150,50,0,0.75,84,67)], defines an acoustic symbol comprising two sounds 929 and 928 representing aʊ, 994. The first sound 929, defined by the ordered set, (317,0,150,50,0,1,84,67), has a constant frequency of 317 Hz, a constant power of 50 dB, starting immediately, b=0, after the end of the previous acoustic symbol 923 and 925, ending 150 ms after the end of the previous acoustic symbol 923 and 925, and pulsed at a frequency of 84 Hz, r=84, with a 67% duty cycle, c=67. The second ordered set, (400,0,150,50,0,0.75,84,67), defines a sound 928 having an initial frequency of 400 Hz, f=400, a final frequency of 300 Hz, h=0.75, and 400·0.75=300, a constant power of 50 dB, starting 0 ms after the end of the previous acoustic symbol, ending 150 ms after the end of the previous acoustic symbol, and pulsed at a frequency of 84 Hz, r=84, with a 67% duty cycle, c=67.
  • The next phoneme, s, 944, is represented by two un-pulsed sounds, one at 534 Hz, 927, and the other at 566 Hz, 926, each having a constant power of 50 dB, lasting 100 ms.
  • The phoneme, t, 904, is represented by two un-pulsed sounds, 933 and 932, starting 20 ms, 908 and 931, after the end of the acoustic symbol representing the phoneme, s. Initial power for each is 50 dB, w=50, and final power for each is 20 dB, d=−30, 50−30=20.
  • The phoneme, i, 980, is represented by two pulsed sounds, 937 and 936.
  • The final acoustic symbol, defined by the ordered set of ordered sets, [(336,0,100,50,0,1,100,100),(449,0,100,50,0,1,100,100),(534,0,100,60,0,1,126,80)], comprises three sounds. One sound 948 is pulsed, and two sounds 947 and 946 are not. Also, the sound at 534 Hz, 948, is 10 dB louder than the other two sounds 947 and 946.
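  • The sketch below renders a single component sound from its eight parameters, following the readings given in the walkthrough above. The geometric frequency sweep, the decibel-to-amplitude conversion, and the sample rate are assumptions made only for illustration.

```python
import numpy as np

def render_sound(params, sample_rate=16000):
    """Render one component sound defined by (f, b, e, w, d, h, r, c) into
    floating-point samples spanning from b to e ms after the end of the
    previous acoustic symbol."""
    f, b, e, w, d, h, r, c = params
    n = int((e - b) / 1000.0 * sample_rate)
    t = np.arange(n) / sample_rate
    frac = t / t[-1] if n > 1 else np.zeros(n)

    freq = f * h ** frac                     # sweep from f to f*h (assumed geometric)
    power_db = w + d * frac                  # sweep from w to w+d dB
    amp = 10.0 ** ((power_db - w) / 20.0)    # amplitude relative to the initial power

    phase = 2.0 * np.pi * np.cumsum(freq) / sample_rate
    signal = amp * np.sin(phase)

    if c < 100:                              # pulse at rate r with duty cycle c
        gate = ((t * r) % 1.0) < (c / 100.0)
        signal = signal * gate
    return signal

samples = render_sound((504, 20, 90, 50, 0, 1, 90, 50))   # second sound of dʒ
```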
  • FIG. 10 a illustrates a configuration 1000 a of a cochlear implant hearing aid device and FIG. 10 b shows a schematic representation 1000 b of this device. A microphone 1013 a, 1013 b transforms speech and other sounds into electrical signals that are conveyed to a sound and speech processor 1020 a, 1020 b via an electrical cable 1023 a, 1023 b. The sound and speech processor unit 1020 a, 1020 b also houses a power supply for external components 1013 a, 1013 b, 1031 a, 1031 b and implanted components 1045 a, 1045 b of the cochlear implant hearing aid device. The sound and speech processor 1020 a, 1020 b can contain bandpass filters to divide the acoustic waveforms into channels and convert the sounds into electrical signals. These signals go back through a cable 1024 a, 1024 b to a transmitter 1031 a, 1031 b attached to the head by a magnet, not shown, within a surgically implanted receiver 1045 a, 1045 b.
  • The transmitter 1031 a, 1031 b sends the signals and power from the sound and speech processing unit 1020 a, 1020 b via a combined signal and power transmission 1033 b (and similarly for 1000 a) across the skin 1036 a, 1036 b to the implanted receiver 1045 a, 1045 b. Using the power from the combined signal and power transmission 1033 b, the receiver 1045 a, 1045 b decodes the signal component of the transmission 1033 b and sends corresponding electrical waveforms through a cable 1049 a, 1049 b to an electrode array 1088 a, 1088 b surgically placed in the user's cochlea 1082 a, 1082 b. The electrical waveforms stimulate local nerve tissue creating the perception of sound. Individual electrodes, not shown, are positioned at different locations along the array 1088 a, 1088 b, allowing the device to deliver different stimuli representing sounds having different pitches, and importantly, having the sensation of different pitch to the user.
  • The effectiveness of a cochlear prosthesis depends to a large extent on the stimulation algorithm used to generate the waveforms sent to the individual electrodes of the electrode array 1088 a, 1088 b. Stimulation algorithms are generally based on two approaches. The first places an emphasis on temporal aspects of speech and involves transforming the speech signal into different signals that are transmitted directly to the concerned regions of the cochlea. The second places an emphasis on spectral speech qualities and involves extracting features, such as formants, and formatting them according to the cochlea's tonotopy (the spatial arrangement of where sound is perceived).
  • Certain embodiments apply to novel stimulation algorithms for a cochlear prosthesis. These algorithms replace some or all temporal and spectral features of natural speech with a small number (such as in a range of 10 to 500) of symbols, comprising the waveforms to be sent to the electrode array 1088 a, 1088 b.
  • In FIG. 11, the English word, “chew” 1105 a is used to compare and contrast certain embodiments described herein to conventional stimulation algorithms. In FIG. 11A, plots 1100 a provide a spectrogram 1120 a and waveform 1140 a for the word “chew” 1105 a.
  • For a person with normal hearing, the cochlea provides the brain with detailed information about the speech signal shown by waveform 1140 a. Within the cochlea the original sound waveform 1140 a is lost in the process of being transformed into nerve impulses. These nerve impulses actually contain little information describing the actual waveform 1140 a, but instead, convey detailed information about power as a function of time and frequency. Therefore, a spectrogram such as spectrogram 1120 a, but not a waveform, is a convenient representation of the information conveyed through the auditory nerve to the auditory cortex of the brain.
  • A cochlear prosthesis (see FIG. 10) can restore a level of hearing to a person whose cochlea is not functional, but still has a functional auditory cortex and auditory nerve innervating the cochlea. The cochlear prosthesis electrically stimulates nervous tissue in the cochlea, resulting in nerve impulses traveling along the auditory nerve to the auditory cortex of the brain. Although hearing can often be successfully restored to deafened individuals, speech recognition often remains challenging.
  • Limitations in speech perception arise from limitations of the implanted portion of the prosthesis. Normally, the cochlea divides the speech signal into several thousand overlapping frequency bands that the auditory cortex uses to extract speech information. Prior cochlear implants are able to provide a speech signal divided into just a dozen or so frequency bands. As a result, much of the fine spectral detail is lost as many frequency bands are blended into a few frequency bands. The auditory cortex is thereby deprived of much of the speech information it normally uses to identify features of spoken language.
  • In FIG. 11B, plots 1100 b schematically illustrate the spectral resolution and detail of a speech signal shown by a spectrogram 1120 b generated by a conventional cochlear prosthesis. Gross temporal and spectral features are similar to those of natural speech shown by the spectrogram 1120 a. However, spectrally important portions 1121 b, 1122 b, 1123 b of the phonemes tʃ, 1124 a, and u, 1186 a, lack the fine detail seen in the natural speech example shown at portions 1121 a, 1122 a, 1123 a.
  • To ameliorate this problem, stimulation algorithms are used to help convey speech information through the limited number of frequency bands or channels. Stimulation algorithms are generally based on two approaches. The first places an emphasis on temporal aspects of speech and involves transforming the speech signal into different signals that are transmitted directly to the concerned regions of the cochlea. The second places an emphasis on spectral speech qualities and involves extracting features, such as formants, and formatting them according to the cochlea's tonotopy (the spatial arrangement of where sound is perceived). Current stimulation algorithms do help, but are unable to provide most users with speech recognition comparable to that of those with normal hearing.
  • Certain embodiments apply to novel stimulation algorithms for cochlear prostheses. These algorithms replace some or all temporal and spectral features of natural speech with a small number (approximately 20 to 100) of symbols, comprising the waveforms to be sent to the electrode array 1088 a, 1088 b as shown in FIG. 10. The symbols themselves may represent phonemes, sets of phonemes, or types of phonemes.
  • In FIG. 11C, plots 1100 c schematically illustrate a speech signal shown by spectrogram 1120 c as might result from recoding the word “chew” 1105 a using a phoneme substitution method of certain embodiments described herein. The symbols may, but do not need to, preserve some spectral and temporal features of the natural speech signal shown by the spectrogram 1120 a. The conventional stimulation algorithm shown by plots 1100 b approximates spectral features 1121 a of the phoneme, tʃ, 1124 a, and spectral features 1122 a, 1123 a of the phoneme, u, 1186 a, in corresponding areas 1121 b, 1122 b, 1123 b. In contrast, a speech signal generated using a stimulation algorithm employing phoneme substitution does not approximate spectral features 1121 a of the phoneme, tʃ, 1124 a, or spectral features 1122 a, 1123 a of the phoneme, u, 1186 a, in its corresponding areas 1172 c, 1174 c, 1176 c, 1178 c.
  • An advantage of certain embodiments described herein is that, in principle, the speech signal will not vary from speaker to speaker and location to location. Another advantage is that the speech signal is no longer more complicated than the language-based information it contains. Both features result in speech signals that are easier to learn and recognize than those generated using current state-of-the-art stimulation algorithms.
  • FIG. 12 provides an overview diagram 1200 of how one embodiment transforms speech 1209 (exemplified by the waveform and spectrogram illustrated in FIG. 11A, plots 1100 a) from a person speaking 1208 into simple symbols (exemplified in the speech signal illustrated in FIG. 11C by spectrogram 1120 c) that are delivered to an electrode array of a user's cochlear implant 1288. The transformation is performed by external components of a cochlear implant system such as the sound and speech processing unit 1220.
  • The sound and speech processing unit or processor 1220 includes a microphone 1213 to transform speech sounds 1209 into electronic analog signals that are then digitized by an analog to digital converter 1222. The embodiment illustrated here provides a user interface 1219 that allows the selection of one of at least two operating modes, depending upon whether or not speech recognition is of primary interest to the user in any given setting. Other embodiments need not provide this option.
  • When speech recognition is of primary interest to the user, the value at decision state 1224 will be true. A speech recognition process 1230 transforms digitized speech sounds into digital characters representing phonemes of the speech 1209 produced by the person speaking 1208. Characters representing phonemes are then exchanged for digital representations of stimulation patterns by a transformation process 1250. The transformation process or transformer 1250 can be performed by software, by hardware, or by combinations of software and hardware.
  • The transformation process 1250 comprises a correspondence from a set of phonemes to stimulation patterns held in a database or other data structure 1252 and a process 1254 for generating a sequence of representations of stimulation patterns corresponding to a sequence of phonemes from the speech recognizer 1230.
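  • A minimal sketch of the transformation process 1250 and generating process 1254 is given below. The database contents and the encoding of a stimulation pattern as a tuple of electrode indices are hypothetical, chosen only to illustrate the lookup.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical database 1252: each phoneme maps to the electrodes to activate.
STIMULATION_DB: Dict[str, Tuple[int, ...]] = {
    "tʃ": (3, 7),
    "u":  (5,),
}

def transform(phonemes: Iterable[str],
              db: Dict[str, Tuple[int, ...]]) -> List[Tuple[int, ...]]:
    """Produce a sequence of stimulation-pattern representations corresponding
    to a sequence of recognized phonemes (the role of process 1254)."""
    patterns = []
    for p in phonemes:
        if p in db:
            patterns.append(db[p])
        # Phonemes missing from the user's database are skipped here; a real
        # system might instead fall back to another stimulation algorithm.
    return patterns

print(transform(["tʃ", "u"], STIMULATION_DB))   # [(3, 7), (5,)]
```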
  • The digital representations are sent to a data and power transmitter 1231 and 1232 attached to the user's head by a magnet, not shown, within a surgically implanted receiver 1245.
  • The transmitter 1231 and 1232 sends the signals and power from the sound and speech processing unit 1220 via a combined signal and power transmission 1233 across the skin 1236 to the implanted receiver 1245. Using the power from the combined signal and power transmission 1233, the receiver 1245 decodes the signal component of the transmission 1233 and sends corresponding electrical waveforms through a cable 1249 to the electrode array 1288 surgically placed in the user's cochlea 1282.
  • When speech recognition is not of primary interest to the user, the value at decision state 1224 will be false, and the device will function using other stimulation algorithms 1215.
  • Although certain embodiments do not relate to the field of speech recognition technology, some embodiments utilize speech recognition. A number of strategies and techniques for building devices capable of recognizing and translating human speech into text are known to those skilled in such arts. For reference, FIG. 6 provides a generic diagram 600 of the inner workings of a speech recognizer 630 as might be employed by some embodiments.
  • Because different users may have different requirements and abilities, the database 1252 of representations of stimulation patterns can be created and customized in consideration of each individual user. In some embodiments, a computer 1260 can be used to aid in the creation of user databases, which are then downloaded to the database memory 1252 of the sound and speech processing unit 1220.
  • The computer 1260 comprises software allowing the input of data 1264 from a user's hearing tests, a user interface 1262, and a process or mapper 1270 for creating a map to be stored in the database 1252 to transform symbols representing phonemes into digital representations of stimulation patterns.
  • The process 1270 for creating the map to transform symbols representing phonemes into digital representations of stimulation patterns is similar to the process 670 shown in FIG. 6 and defined in FIG. 7. The process 1270 can be considered a modified version of process 670 in which the interval [f_l, f_h] is replaced with a set, G, of functional electrodes, {g_n, g_n+1, g_n+2, . . . }, of the electrode array 1288. The set, F, then becomes a subset of G, its elements representing electrodes rather than frequencies.
  • FIG. 13 is a diagram 1300 showing an example structure of potential electrode assignments 1352, such as stored in database 1252, for one embodiment in which the user wishes to comprehend American English speech. The upper portion of the figure shows the middle and inner ear 1360 including the cochlea 1365. Within the cochlea 1365 is an implanted electrode array 1320 of a cochlear prosthesis.
  • For illustration purposes, it is assumed that the electrode array 1320 comprises 16 electrodes, nine of which, 1303, 1304, 1305, 1306, 1307, 1308, 1309, 1310, 1311, are functional and able to produce unique sound sensations for the user. In this example, 39 American English phonemes are mapped using the exemplary data structure 1352 (stored in 1252, FIG. 12) to stimulation patterns (symbols) comprising electrical waveforms being sent to different combinations of one, two, or three electrodes.
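  • As a quick check that nine functional electrodes suffice for this example, the sketch below counts the combinations of one, two, or three electrodes; the electrode numbers follow the callouts above, and the counting itself is offered only as an illustration.

```python
from itertools import combinations

FUNCTIONAL_ELECTRODES = [3, 4, 5, 6, 7, 8, 9, 10, 11]   # nine functional electrodes (callouts 1303-1311)
PHONEME_COUNT = 39                                       # American English phonemes in this example

# Candidate stimulation patterns: every combination of one, two, or three
# functional electrodes.
patterns = [combo for k in (1, 2, 3)
            for combo in combinations(FUNCTIONAL_ELECTRODES, k)]

print(len(patterns))                     # 9 + 36 + 84 = 129 unique patterns
assert len(patterns) >= PHONEME_COUNT    # ample room to assign 39 phonemes
```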
  • For simplicity, other qualities used in the preceding examples of hearing aids are not contained in the structure 1352. However, analogs of each are envisioned for embodiments relating to hearing prostheses, including cochlear implants. These analogs and others include, but are not limited to, pauses between some phonemes, duration, intensity, low frequency pulsations or higher frequency signals, stimulus rates, and shifts in the values of such parameters as a function of time or context.
  • The symbols themselves may represent phonemes, sets of phonemes, portions of phonemes, or types of phonemes.
  • In one embodiment, the symbols are unique combinations of stimuli at one or more electrodes. In another embodiment, the symbols are unique physical spacings of stimuli. In another embodiment, the combination of electrodes used and other qualities including, but not limited to, pauses between some phonemes, duration, intensity, low frequency pulsations or higher frequency signals, stimulus rates, and shifts in the values of such parameters as a function of time, are unique for each symbol.
  • In another embodiment, phonemes are placed into groups of like phonemes (e.g., plosive, fricative, diphthong, monophthong, etc.). Such a placement of phonemes into groups of like phonemes is known to linguists and others skilled in such arts. All phonemes are then assigned a common electrode or channel (the root), all phonemes being given the same root. Each member of each group of like phonemes is assigned a second channel unique to that group. Once all phonemes have been assigned a second channel, the most frequently used phoneme of each group is not assigned additional channels. Therefore, the most frequently used phonemes are represented by unique combinations of two channels. One or more additional channels are then assigned to the remaining phonemes to create a unique combination of channels for each phoneme.
  • In another embodiment, phonemes are placed into groups of like phonemes (e.g., plosive, fricative, diphthong, monophthong, etc.). Such a placement of phonemes into groups of like phonemes is known to linguists and others skilled in such arts. All phonemes are then assigned a common electrode or channel (the root), all phonemes being given the same root. Each member of each group of like phonemes is assigned a second channel unique to that group. Once all phonemes have been assigned a second channel, the most frequently used phoneme of each group is not assigned additional channels. Therefore, the most frequently used phonemes are represented by unique combinations of two channels. One or more additional channels are then assigned to the remaining phonemes to create a unique combination of channels for each phoneme. Next, every channel assignment for every phoneme in one group of like phonemes is shifted up or down along the electrode array. Additional groups of like phonemes may or may not be adjusted in a similar fashion.
  • The concept of phoneme substitution can be applied to sensory tissues other than the cochlea. These can include, but are not limited to, pressure, pain, stretch, temperature, photo and olfactory receptor tissue, as well as innervating nerve tissue and corresponding central nervous system tissue.
  • For example, phonic symbols may be delivered to sensory tissue of the skin, by a number of means, including electrical and mechanical means. FIGS. 14A and 14B provide schematic examples 1400 a and 1400 b of skin interfaces 1410 a and 1410 b of some embodiments.
  • FIG. 14A, example 1400 a, shows an interface 1410 a fitted about the hand and wrist of a person's left arm 1450 a, for example. The interface 1410 a comprises six stimulators 1401 a, 1402 a, 1403 a, 1404 a, 1405 a, 1406 a positioned against the person's skin 1440 a. In this example, the stimulators have been placed so as to ensure that no two are close to being positioned over the same receptive field, the smallest area of skin capable of allowing the recognition of two different but similar stimuli. In one embodiment the stimulators 1405 a and 1406 a are located under the wrist of the user.
  • FIG. 14B, example 1400 b, shows an interface 1410 b fitted about the wrist of a person's left arm 1450 b, for example. The interface 1410 b comprises six stimulators 1401 b, 1402 b, 1403 b, 1404 b, 1405 b, 1406 b positioned against the person's skin 1440 b, some close enough to each other to be on the outer threshold of occupying the same receptive field. In one embodiment the stimulators 1405 b and 1406 b are located under the wrist of the user.
  • Creating a correspondence mapping phonemes to sets of tactile stimuli (symbols) is not fundamentally different from mapping phonemes to acoustic symbols of hearing aid embodiments or electrical stimulation patterns of cochlear prosthesis embodiments. FIG. 15, table 1500, provides three examples for mapping English phonemes to tactile symbols suitable for use with the tactile interfaces 1410 a and 1410 b presented in FIG. 14. To better illustrate concepts not yet described, each of the three maps uses the same channel assignments, and each stimulator generates a vibration perpendicular to the skin.
  • These maps were created using methods previously described but not illustrated. The first step for all three examples is to place phonemes into groups of like phonemes (e.g., plosive, fricative, diphthong, monophthong, etc.). These groups are known to linguists and others skilled in such arts.
  • For example 1, each group is then assigned a channel, for example plosive=1, nasal=2, fricative=3, approximant=4, monophthong=5, diphthong=6. Affricates, being both plosive- and fricative-like, are assigned both channels 3 and 4. No further channel assignments are made to the most frequently used member of each set: t, n, s, and the most frequently used approximant, monophthong, and diphthong (shown in FIG. 15). These assignments can be made by linguists and others skilled in such arts. Additional channels are assigned to other phonemes, creating a unique combination of channel assignments corresponding to each. An advantage in this approach is that training can begin with the use of only six symbols, each comprising a vibration at a single location on the skin.
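  • A sketch of the example 1 assignment appears below. The phoneme subset and the additional channel choices are assumptions for illustration; FIG. 15 holds the actual assignments.

```python
# Channel assignments for example 1 (six stimulators, FIG. 14):
GROUP_CHANNEL = {"plosive": 1, "nasal": 2, "fricative": 3,
                 "approximant": 4, "monophthong": 5, "diphthong": 6}

# A tiny, assumed subset of the full phoneme inventory; the most frequently
# used member of each group (listed first) keeps only its group channel.
GROUPS = {
    "plosive":   ["t", "d", "k"],
    "fricative": ["s", "z", "f"],
}
EXTRA_CHANNELS = {"d": 2, "k": 5, "z": 1, "f": 6}   # assumed additional assignments

def tactile_symbol(phoneme, group):
    """Return the set of channels vibrated for one phoneme."""
    channels = {GROUP_CHANNEL[group]}
    if phoneme in EXTRA_CHANNELS:
        channels.add(EXTRA_CHANNELS[phoneme])
    return sorted(channels)

print(tactile_symbol("t", "plosive"))    # [1]     -> a single-stimulator symbol
print(tactile_symbol("d", "plosive"))    # [1, 2]  -> two stimulators vibrate
```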
  • In example 2, the channel assignments for each phoneme are the same as in example 1. However, for each tactile symbol representing a phoneme, the channel common to all members of its group of related phonemes is vibrated at a different frequency than the other channels comprising that symbol. These stimulators are indicated by boxes in the column for example 2. The advantage in this approach is that phonemes that sound most alike will feel most alike, thereby enhancing the learning process and reducing errors.
  • In example 3, even numbered stimulators vibrate at one frequency, and odd numbered stimulators vibrate at a different frequency. Odd numbered channels are highlighted with a box for better visualization of the figure. The advantage in this approach is that adjacent stimulators have a different feel, and therefore may be placed in closer proximity to one another, while maintaining the ability to create a sensation unique to each channel. A logical extension of this approach is to use only three stimulators, each having three states, off, on frequency 1, and on frequency 2.
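  • The sketch below counts the codes available under that extension: three stimulators, each with three states, give 3^3 − 1 = 26 non-silent combinations. The state labels and frequencies are placeholders.

```python
from itertools import product

# Three stimulators, each with three states: off, vibrating at frequency 1,
# or vibrating at frequency 2 (frequencies are illustrative).
STATES = ("off", "f1", "f2")

codes = [combo for combo in product(STATES, repeat=3)
         if combo != ("off", "off", "off")]      # exclude the all-off code
print(len(codes))   # 26 distinguishable symbols from only three stimulators
```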
  • For simplicity, other qualities used in the preceding examples of hearing aids and implants are not contained in the three data structures shown in FIG. 15. However, analogs of each quality are envisioned for embodiments relating to skin interfaces. These analogs and others include, but are not limited to, pauses between some phonemes, duration, intensity, low frequency pulsations or higher frequency signals, stimulus rates, and shifts in the values of such parameters as a function of time or context.
  • In one embodiment, the symbols are unique combinations of stimuli at one or more electrodes. In another embodiment, the symbols are unique physical spacings of stimuli. In another embodiment, the combination of electrodes used, and other qualities including, but not limited to, pauses between some phonemes, duration, intensity, low frequency pulsations or higher frequency signals, stimulus rates, and shifts in the values of such parameters as a function of time, are unique for each symbol.
  • In another embodiment, phonemes are placed into groups of like phonemes (e.g., plosive, fricative, diphthong, monophthong, etc.). Such a placement of phonemes into groups of like phonemes is known to linguists and others skilled in such arts. All phonemes are then assigned a common electrode or channel (the root), all phonemes being given the same root. Each member of each group of like phonemes is assigned a second channel unique to that group. Once all phonemes have been assigned a second channel, the most frequently used phoneme of each group is not assigned additional channels. Therefore, the most frequently used phonemes are represented by unique combinations of two channels. One or more additional channels are then assigned to the remaining phonemes to create a unique combination of channels for each phoneme.
  • In another embodiment, phonemes are placed into groups of like phonemes (e.g., plosive, fricative, diphthong, monophthong, etc.). Such a placement of phonemes into groups of like phonemes is known to linguists and others skilled in such arts. All phonemes are then assigned a common electrode or channel (the root), all phonemes being given the same root. Each member of each group of like phonemes is assigned a second channel unique to that group. Once all phonemes have been assigned a second channel, the most frequently used phoneme of each group is not assigned additional channels. Therefore, the most frequently used phonemes are represented by unique combinations of two channels. One or more additional channels are then assigned to the remaining phonemes to create a unique combination of channels for each phoneme. Next, every channel assignment for every phoneme in one group of like phonemes is shifted up or down along the electrode array. Additional groups of like phonemes may or may not be adjusted in a similar fashion.
  • FIG. 16A, via plots 1600 a, shows the word “chew” 1605 a; its component phonemes, tʃ, 1624 a, and u, 1686 a; a waveform 1635 a obtained when “chew” is spoken; “chew” written in machine shorthand 1645 a; “chew” as it appears in acoustic symbols generated by the phoneme substitution method described herein 1655 a; “chew” as it might be encoded by phoneme substitution and then transmitted to electrodes in a cochlear implant 1665 a; “chew” as it might be transmitted to electrodes on a skin interface 1675 a; and “chew” as it might be perceived in the form of its component phonemes by the user. FIG. 16B, diagram 1600 b, illustrates embodiments as transmitters 1605 b, 1635 b, 1645 b and receivers 1655 b, 1665 b, 1675 b. A computer 1605 b is shown transmitting the typed word “chew” to a hearing aid 1655 b, cochlear implant 1665 b, or skin interface 1675 b. The waveform produced by a person speaking 1635 b is shown being transmitted to 1655 b, 1665 b, and 1675 b. The shorthand machine 1645 b is shown transmitting a signal to 1655 b, 1665 b, and 1675 b.
  • There are embodiments that do not require mapping of phonemes to unique symbols or sets of stimuli. Simply mapping each phoneme to a symbol or set of stimuli unique to it and similar phonemes may be helpful to hearing impaired individuals. For example, many people with hearing impairments have some proficiency in lip reading, or speech reading. Others may be relatively proficient in vowel recognition, but have a difficult time with the recognition of consonants. The phonetic structure of the five words, two, do, sue, zoo, and new, is tu, du, su, zu, and nu, respectively. These five words differ appreciably only in their first phoneme, a consonant. However, all five words appear the same on a speaker's lips. Simply knowing which type of phoneme the initial consonant is would be enough information to disambiguate these words for an individual with relatively good low frequency hearing or proficiency in speech reading. In fact, simply knowing whether the initial consonant is a plosive, fricative, and/or voiced is sufficient to discriminate between each word in the list.
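  • The sketch below restates that observation as a check: classifying only the first phoneme of each word by manner and voicing separates all five words. The class labels are standard phonetic categories, not values taken from this disclosure.

```python
# Manner and voicing of the initial consonants in the example words.
PHONEME_CLASS = {
    "t": ("plosive", "unvoiced"),
    "d": ("plosive", "voiced"),
    "s": ("fricative", "unvoiced"),
    "z": ("fricative", "voiced"),
    "n": ("nasal", "voiced"),
}

WORDS = {"two": "tu", "do": "du", "sue": "su", "zoo": "zu", "new": "nu"}

# Knowing only the class and voicing of the first phoneme separates every word.
signatures = {word: PHONEME_CLASS[phonetic[0]] for word, phonetic in WORDS.items()}
assert len(set(signatures.values())) == len(WORDS)
print(signatures)
```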
  • CONCLUSION
  • While specific blocks, sections, devices, functions and modules may have been set forth above, a skilled technologist will realize that there are many ways to partition the system, and that there are many parts, components, modules or functions that may be substituted for those listed above.
  • While the above detailed description has shown, described, and pointed out the fundamental novel features of the invention as applied to various embodiments, it will be understood that various omissions and substitutions and changes in the form and details of the system illustrated may be made by those skilled in the art, without departing from the intent of the invention.

Claims (110)

1. A method of transforming a sequence of symbols representing phonemes into a sequence of arrays of nerve stimuli, the method comprising:
establishing a correlation between each member of a phoneme symbol set and an assignment of one or more channels of a multi-electrode array;
accessing a sequence of phonetic symbols corresponding to a message; and
activating a sequence of one or more electrodes corresponding to each phonetic symbol of the message identified by the correlation.
2. The method of claim 1, wherein the phonetic symbols belong to one of SAMPA, Kirshenbaum, or IPA Unicode digital character sets.
3. The method of claim 1, wherein the symbols belong to the cmudict phoneme set.
4. The method of claim 1, wherein the correlation is a one to one correlation.
5. The method of claim 1, wherein activating a sequence of one or more electrodes includes an energizing period for each electrode, wherein the energizing period comprises a begin time parameter and an end time parameter.
6. The method of claim 5, wherein the begin time parameter is representative of a time from an end of components of a previous energizing period of a particular electrode.
7. The method of claim 1, wherein the electrodes are associated with a hearing prosthesis.
8. The method of claim 7, wherein the hearing prosthesis comprises a cochlear implant.
9. A method of processing a sequence of spoken words into a sequence of sounds, the method comprising:
converting a sequence of spoken words into electrical signals;
digitizing the electrical signals representative of the speech sounds;
transforming the speech sounds into digital symbols representing corresponding phonemes;
transforming the symbols representing the corresponding phonemes into sound representations; and
transforming the sound representations into sounds.
10. The method of claim 9, wherein transforming the symbols representing the phonemes into sound representations comprises:
accessing a data structure configured to map phonemes to sound representations;
locating the symbols representing the corresponding phonemes in the data structure; and
mapping the phonemes to sound representations.
11. The method of claim 10, additionally comprising creating the data structure, comprising:
identifying phonemes corresponding to a language used by a user of the method;
establishing a set of allowed sound frequencies;
generating a correspondence mapping the identified phonemes to the set of allowed sound frequencies such that each constituent phoneme of the identified phonemes is assigned a subset of one or more frequencies from the set of allowed sound frequencies; and
mapping each constituent phoneme of the identified phonemes to a set of one or more sounds.
12. The method of claim 11, wherein establishing a set of allowed sound frequencies comprises selecting a set of sound frequencies that are in a hearing range of the user.
13. The method of claim 11, wherein each sound of the set of one or more sounds comprises an initial frequency parameter.
14. The method of claim 11, wherein each sound of the set of one or more sounds comprises a begin time parameter.
15. The method of claim 14, wherein the begin time parameter is representative of a time from an end of components of a previous sound representation.
16. The method of claim 11, wherein each sound of the set of one or more sounds comprises an end time parameter.
17. The method of claim 11, wherein each sound of the set of one or more sounds comprises a power parameter.
18. The method of claim 11, wherein each sound of the set of one or more sounds comprises a power shift parameter.
19. The method of claim 11, wherein each sound of the set of one or more sounds comprises a frequency shift parameter.
20. The method of claim 11, wherein each sound of the set of one or more sounds comprises a pulse rate parameter.
21. The method of claim 11, wherein each sound of the set of one or more sounds comprises a duty cycle parameter.
22. A method of processing a sequence of spoken words into a sequence of nerve stimuli, the method comprising:
converting a sequence of spoken words into electrical signals;
digitizing the electrical signals representative of the speech sounds;
transforming the speech sounds into digital symbols representing corresponding phonemes;
transforming the symbols representing the corresponding phonemes into stimulus definitions; and
transforming the stimulus definitions into a sequence of nerve stimuli.
23. The method of claim 22, wherein the nerve stimuli are associated with a hearing prosthesis.
24. The method of claim 23, wherein the hearing prosthesis comprises a cochlear implant.
25. The method of claim 22, wherein the nerve stimuli are associated with a skin interface.
26. The method of claim 25, wherein the skin interface is located on the wrist and/or hand of the user.
27. The method of claim 25, wherein the skin interface is located on the ankle and/or foot of the user.
28. The method of claim 22, wherein the nerve stimuli are mechanical.
29. The method of claim 22, wherein the nerve stimuli are electrical.
30. The method of claim 22, wherein transforming the symbols representing the phonemes into stimulus definitions comprises:
accessing a data structure configured to map phonemes to stimulus definitions;
locating the symbols representing the corresponding phonemes in the data structure; and
mapping the phonemes to stimulus definitions.
31. The method of claim 22, wherein the stimulus definitions comprise sets of one or more stimuli.
32. The method of claim 31, wherein the sets of one or more stimuli correspond to one or more locations on the skin.
33. The method of claim 31, wherein the sets of one or more stimuli correspond to one or more locations in the cochlea.
34. The method of claim 31, wherein each stimulus of the sets of one or more stimuli comprises a begin time parameter.
35. The method of claim 34, wherein the begin time parameter is representative of a time from an end of components of a previous stimulus definition.
36. The method of claim 31, wherein each stimulus of the sets of one or more stimuli comprises an end time parameter.
37. A method of transforming a sequence of symbols representing phonemes into a sequence of arrays of nerve stimuli, the method comprising:
establishing a correlation between each member of a phoneme symbol set and an assignment of one or more channels of a multi-stimulator array;
accessing a sequence of phonetic symbols corresponding to a message; and
activating a sequence of one or more stimulators corresponding to each phonetic symbol of the message identified by the correlation.
38. The method of claim 37, wherein the stimulators are vibrators affixed to the user's skin.
39. The method of claim 37, wherein the phonetic symbols belong to one of SAMPA, Kirshenbaum, or IPA Unicode digital character sets.
40. The method of claim 37, wherein the symbols belong to the cmudict phoneme set.
41. The method of claim 37, wherein the correlation is a one to one correlation.
42. The method of claim 37, wherein activating a sequence of one or more stimulators includes an energizing period for each stimulator, wherein the energizing period comprises a begin time parameter and an end time parameter.
43. The method of claim 42, wherein the begin time parameter is representative of a time from an end of components of a previous energizing period of a particular stimulator.
44. A method of training a user, the method comprising:
providing a set of somatic stimulations to a user, wherein the set of somatic stimulations is indicative of a plurality of phonemes, and wherein the phonemes are based at least in part on an audio communication;
providing the audio communication concurrently to the user with the plurality of phonemes; and
selectively modifying at least portions of the audio communication to the user during the providing of the set of somatic stimulations to the user.
45. The method of claim 44, wherein selectively modifying at least portions of the audio communication comprises reducing an audio property of the audio communication.
46. The method of claim 45, wherein the audio property comprises a volume of the audio.
47. The method of claim 45, wherein the audio property comprises omitting selected words from the audio.
48. The method of claim 45, wherein the audio property comprises attenuating a volume of selected words from the audio.
49. The method of claim 45, wherein the audio property comprises omitting selected phonemes from the audio.
50. The method of claim 45, wherein the audio property comprises attenuating a volume of selected phonemes from the audio.
51. The method of claim 45, wherein the audio property comprises omitting selected sound frequencies from the audio.
52. The method of claim 45, wherein the audio property comprises attenuating a volume of selected sound frequencies from the audio.
53. A method of training a user, the method comprising:
providing a set of somatic stimulations to a user, wherein the set of somatic stimulations is indicative of a plurality of phonemes, and wherein the phonemes are based at least in part on an audiovisual communication;
providing the audiovisual communication concurrently to the user with the plurality of phonemes; and
selectively modifying at least portions of the audiovisual communication to the user during the providing of the set of somatic stimulations to the user.
54. The method of claim 53, wherein selectively modifying at least portions of the audiovisual communication comprises reducing an audio or video property of the audiovisual communication.
55. The method of claim 54, wherein the audio property comprises a volume of the audio.
56. The method of claim 54, wherein the audio property comprises omitting selected words from the audio.
57. The method of claim 54, wherein the audio property comprises attenuating a volume of selected words from the audio.
58. The method of claim 54, wherein the audio property comprises omitting selected phonemes from the audio.
59. The method of claim 54, wherein the audio property comprises attenuating a volume of selected phonemes from the audio.
60. The method of claim 54, wherein the audio property comprises omitting selected sound frequencies from the audio.
61. The method of claim 54, wherein the audio property comprises attenuating a volume of selected sound frequencies from the audio.
62. The method of claim 54, wherein the video property comprises a presence or brightness of the video.
63. A system for processing a sequence of spoken words into a sequence of sounds, the system comprising:
a first converter configured to digitize electrical signals representative of a sequence of spoken words;
a speech recognizer configured to receive the digitized electrical signals and generate a sequence of phonemes representative of the sequence of spoken words;
a mapper configured to assign sound sets to phonemes utilizing an audiogram so as to generate a map;
a transformer configured to receive the sequence of phonemes representative of the sequence of spoken words and the map and to generate a sequence of sound representations corresponding to the sequence of phonemes; and
a second converter configured to convert the sequence of sound representations into a sequence of audible sounds.
64. The system of claim 63, wherein the map is a user-specific map based on a particular user's audiogram.
65. A system for processing a sequence of spoken words into a sequence of sounds, the system comprising:
a first converter configured to digitize electrical signals representative of a sequence of spoken words;
a speech recognizer configured to receive the digitized electrical signals and generate a sequence of phonemes representative of the sequence of spoken words;
a data structure comprising sound sets mapped to phonemes;
a transformer configured to receive the sequence of phonemes representative of the sequence of spoken words and the data structure and to generate a sequence of sound representations corresponding to the sequence of phonemes; and
a second converter configured to convert the sequence of sound representations into a sequence of audible sounds.
66. The system of claim 65, wherein the data structure is generated utilizing a user's audiogram.
67. A system for processing a sequence of spoken words into a sequence of nerve stimuli, the system comprising:
a converter configured to digitize electrical signals representative of a sequence of spoken words;
a speech recognizer configured to receive the digitized electrical signals and generate a sequence of phonemes representative of the sequence of spoken words;
a mapper configured to assign nerve stimuli arrays to phonemes utilizing an audiogram so as to generate a map; and
a transformer configured to receive the sequence of phonemes representative of the sequence of spoken words and the map and to generate a sequence of stimulus definitions corresponding to the sequence of phonemes.
68. The system of claim 67, additionally comprising:
a receiver configured to convert the sequence of stimulus definitions into electrical waveforms; and
an electrode array configured to receive the electrical waveforms.
69. The system of claim 68, wherein the electrode array is surgically placed in the user's cochlea.
70. The system of claim 67, wherein the sequence of stimulus definitions comprises digital representations of nerve stimulation patterns.
71. A system for processing a sequence of spoken words into a sequence of nerve stimuli, the system comprising:
a converter configured to digitize electrical signals representative of a sequence of spoken words;
a speech recognizer configured to receive the digitized electrical signals and generate a sequence of phonemes representative of the sequence of spoken words;
a data structure comprising nerve stimuli arrays mapped to phonemes; and
a transformer configured to receive the sequence of phonemes representative of the sequence of spoken words and the data structure and to generate a sequence of stimulus definitions corresponding to the sequence of phonemes.
72. The system of claim 71, wherein the data structure is generated utilizing a user's audiogram.
73. The system of claim 71, additionally comprising:
a receiver configured to convert the sequence of stimulus definitions into electrical waveforms; and
an electrode array configured to receive the electrical waveforms.
74. The system of claim 73, wherein the electrode array is surgically placed in the user's cochlea.
75. The system of claim 71, wherein the sequence of stimulus definitions comprises digital representations of nerve stimulation patterns.
76. A system for processing a sequence of spoken words into a sequence of nerve stimuli, the system comprising:
a processor configured to generate a sequence of phonemes representative of a sequence of spoken words and to transform the sequence of phonemes using a data structure comprising nerve stimuli arrays mapped to phonemes to produce a sequence of stimulus definitions corresponding to the sequence of phonemes; and
an electrode array configured to play the sequence of stimulus definitions.
77. The system of claim 76, wherein the data structure is generated utilizing a user's audiogram.
78. The system of claim 76, wherein the electrode array comprises a converter configured to convert the sequence of stimulus definitions into electrical waveforms.
79. The system of claim 76, wherein the electrode array is surgically placed in the user's cochlea.
80. The system of claim 76, wherein the electrode array comprises a plurality of mechanical stimulators.
81. The system of claim 76, wherein the electrode array comprises a plurality of electrodes.
82. The system of claim 76, wherein the sequence of stimulus definitions comprises digital representations of nerve stimulation patterns.
83. A system for processing a sequence of spoken words into a sequence of sounds, the system comprising:
a processor configured to generate a sequence of phonemes representative of the sequence of spoken words and to transform the sequence of phonemes using a data structure comprising sound sets mapped to phonemes to produce sound representations corresponding to the sequence of phonemes; and
a converter configured to convert the sound representations into audible sounds.
84. The system of claim 83, wherein the data structure is generated utilizing a user's audiogram.
85. A system for processing a sequence of text into a sequence of sounds, the system comprising:
a first converter configured to receive a sequence of text and generate a sequence of phonemes representative of the sequence of text;
a mapper configured to assign sound sets to phonemes utilizing a hearing audiogram so as to generate a map;
a transformer configured to receive the sequence of phonemes representative of the sequence of text and the map and to generate sound representations corresponding to the sequence of phonemes; and
a second converter configured to convert the sound representations into audible sounds.
86. The system of claim 85, wherein the hearing audiogram is representative of a normal human hearing range.
87. The system of claim 85, wherein the hearing audiogram is representative of a hearing range for a specific individual.
88. A system for processing a sequence of text into a sequence of sounds, the system comprising:
a text converter configured to receive a sequence of text and generate a sequence of phonemes representative of the sequence of text;
a data structure comprising sound sets mapped to phonemes;
a transformer configured to receive the sequence of phonemes representative of the sequence of text and the data structure and to generate sound representations corresponding to the sequence of phonemes; and
a second converter configured to convert the sound representations into audible sounds.
89. The system of claim 88, wherein the data structure is generated utilizing a user's audiogram.
90. A system for processing a sequence of text into a sequence of nerve stimuli, the system comprising:
a converter configured to receive a sequence of text and generate a sequence of phonemes representative of the sequence of text;
a data structure comprising nerve stimuli arrays mapped to phonemes; and
a transformer configured to receive the sequence of phonemes representative of the sequence of text and the data structure and to generate a sequence of stimulus definitions corresponding to the sequence of phonemes.
91. The system of claim 90, wherein the data structure is generated utilizing a user's abilities.
92. The system of claim 91, wherein the user's abilities comprise useable channels of a cochlear implant of the user.
93. The system of claim 91, wherein the user's abilities comprise the ability to distinguish between two or more unique stimuli.
94. A method of processing a sequence of text into a sequence of sounds, the method comprising:
transforming the sequence of text into digital symbols representing corresponding phonemes;
transforming the symbols representing the corresponding phonemes into sound representations; and
transforming the sound representations into a sequence of sounds.
95. A method of processing a sequence of text into a sequence of nerve stimuli, the method comprising:
transforming the sequence of text into digital symbols representing corresponding phonemes;
transforming the symbols representing the corresponding phonemes into stimulus definitions; and
transforming the stimulus definitions into a sequence of nerve stimuli.
96. The method of claim 95, wherein the nerve stimuli are associated with a cochlear implant.
97. The method of claim 95, wherein the nerve stimuli are associated with a skin interface.
98. The method of claim 97, wherein the skin interface is located on the wrist and/or hand of the user.
99. The method of claim 95, wherein transforming the symbols representing the phonemes into stimulus definitions comprises:
accessing a data structure configured to map phonemes to stimulus definitions;
locating the symbols representing the corresponding phonemes in the data structure; and
mapping the phonemes to stimulus definitions.
100. A method of creating a data structure configured to transform symbols representing phonemes into sound representations, the method comprising:
identifying phonemes corresponding to a language utilized by a user;
establishing a set of allowed sound frequencies;
generating a correspondence mapping the identified phonemes to the set of allowed sound frequencies such that each constituent phoneme of the identified phonemes is assigned a subset of one or more frequencies from the set of allowed sound frequencies; and
mapping each constituent phoneme of the identified phonemes to a set of one or more sounds.
101. The method of claim 100, wherein establishing a set of allowed sound frequencies comprises selecting a set of sound frequencies that are in a hearing range of the user.
102. The method of claim 100, wherein each sound of the set of one or more sounds comprises an initial frequency parameter.
103. The method of claim 100, wherein each sound of the set of one or more sounds comprises a begin time parameter.
104. The method of claim 103, wherein the begin time parameter is representative of a time from an end of components of a previous sound representation.
105. The method of claim 100, wherein each sound of the set of one or more sounds comprises an end time parameter.
106. The method of claim 100, wherein each sound of the set of one or more sounds comprises a power parameter.
107. The method of claim 100, wherein each sound of the set of one or more sounds comprises a power shift parameter.
108. The method of claim 100, wherein each sound of the set of one or more sounds comprises a frequency shift parameter.
109. The method of claim 100, wherein each sound of the set of one or more sounds comprises a pulse rate parameter.
110. The method of claim 100, wherein each sound of the set of one or more sounds comprises a duty cycle parameter.
US11/997,902 2005-08-03 2006-08-03 Somatic, auditory and cochlear communication system and method Abandoned US20090024183A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US11/997,902 US20090024183A1 (en) 2005-08-03 2006-08-03 Somatic, auditory and cochlear communication system and method

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US70521905P 2005-08-03 2005-08-03
US11/997,902 US20090024183A1 (en) 2005-08-03 2006-08-03 Somatic, auditory and cochlear communication system and method
PCT/US2006/030437 WO2007019307A2 (en) 2005-08-03 2006-08-03 Somatic, auditory and cochlear communication system and method

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2006/030437 A-371-Of-International WO2007019307A2 (en) 2005-08-03 2006-08-03 Somatic, auditory and cochlear communication system and method

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US14/489,406 Continuation US10540989B2 (en) 2005-08-03 2014-09-17 Somatic, auditory and cochlear communication system and method

Publications (1)

Publication Number Publication Date
US20090024183A1 true US20090024183A1 (en) 2009-01-22

Family

ID=37714372

Family Applications (4)

Application Number Title Priority Date Filing Date
US11/997,902 Abandoned US20090024183A1 (en) 2005-08-03 2006-08-03 Somatic, auditory and cochlear communication system and method
US14/489,406 Active 2028-12-19 US10540989B2 (en) 2005-08-03 2014-09-17 Somatic, auditory and cochlear communication system and method
US16/746,592 Abandoned US20200152223A1 (en) 2005-08-03 2020-01-17 Somatic, auditory and cochlear communication system and method
US17/657,581 Active US11878169B2 (en) 2005-08-03 2022-03-31 Somatic, auditory and cochlear communication system and method

Family Applications After (3)

Application Number Title Priority Date Filing Date
US14/489,406 Active 2028-12-19 US10540989B2 (en) 2005-08-03 2014-09-17 Somatic, auditory and cochlear communication system and method
US16/746,592 Abandoned US20200152223A1 (en) 2005-08-03 2020-01-17 Somatic, auditory and cochlear communication system and method
US17/657,581 Active US11878169B2 (en) 2005-08-03 2022-03-31 Somatic, auditory and cochlear communication system and method

Country Status (2)

Country Link
US (4) US20090024183A1 (en)
WO (1) WO2007019307A2 (en)

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050243996A1 (en) * 2004-05-03 2005-11-03 Fitchmun Mark I System and method for providing particularized audible alerts
US20100056950A1 (en) * 2008-08-29 2010-03-04 University Of Florida Research Foundation, Inc. System and methods for creating reduced test sets used in assessing subject response to stimuli
US20110320481A1 (en) * 2010-06-23 2011-12-29 Business Objects Software Limited Searching and matching of data
WO2013009805A1 (en) 2011-07-11 2013-01-17 Med-El Elektromedizinische Geraete Gmbh Test methods for cochlear implant stimulation strategies
US20130023963A1 (en) * 2011-07-22 2013-01-24 Lockheed Martin Corporation Cochlear implant using optical stimulation with encoded information designed to limit heating effects
US20140207456A1 (en) * 2010-09-23 2014-07-24 Waveform Communications, Llc Waveform analysis of speech
US20150012261A1 (en) * 2012-02-16 2015-01-08 Continental Automotive Gmbh Method for phonetizing a data list and voice-controlled user interface
US20150289786A1 (en) * 2014-04-11 2015-10-15 Reginald G. Garratt Method of Acoustic Screening for Processing Hearing Loss Patients by Executing Computer-Executable Instructions Stored On a Non-Transitory Computer-Readable Medium
US20160148616A1 (en) * 2014-11-26 2016-05-26 Panasonic Intellectual Property Corporation Of America Method and apparatus for recognizing speech by lip reading
US20160155437A1 (en) * 2014-12-02 2016-06-02 Google Inc. Behavior adjustment using speech recognition system
US20160180155A1 (en) * 2014-12-22 2016-06-23 Fu Tai Hua Industry (Shenzhen) Co., Ltd. Electronic device and method for processing voice in video
US20160331965A1 (en) * 2015-05-14 2016-11-17 Kuang-Chao Chen Cochlea hearing aid fixed on eardrum
US20170098350A1 (en) * 2015-05-15 2017-04-06 Mick Ebeling Vibrotactile control software systems and methods
WO2017062701A1 (en) * 2015-10-09 2017-04-13 Med-El Elektromedizinische Geraete Gmbh Estimation of harmonic frequencies for hearing implant sound coding using active contour models
EP3056022A4 (en) * 2013-10-07 2017-05-31 Med-El Elektromedizinische Geraete GmbH Method for extracting temporal features from spike-like signals
US20170294086A1 (en) * 2016-04-12 2017-10-12 Andrew Kerdemelidis Haptic Communication Apparatus and Method
US9800982B2 (en) 2014-06-18 2017-10-24 Cochlear Limited Electromagnetic transducer with expanded magnetic flux functionality
US9913983B2 (en) 2013-10-25 2018-03-13 Cochlear Limited Alternate stimulation strategies for perception of speech
US20180239581A1 (en) * 2013-03-15 2018-08-23 Sonitum Inc. Topological mapping of control parameters
CN108778410 (en) * 2016-03-11 2018-11-09 Mayo Foundation for Medical Education and Research Cochlear stimulation system with surround sound and noise cancellation
CN108883274 (en) * 2016-02-29 2018-11-23 Advanced Bionics Systems and methods for determining a behavioral audiogram value using evoked responses
US10321247B2 (en) 2015-11-27 2019-06-11 Cochlear Limited External component with inductance and mechanical vibratory functionality
US10540989B2 (en) 2005-08-03 2020-01-21 Somatek Somatic, auditory and cochlear communication system and method
US10757516B2 (en) 2013-10-29 2020-08-25 Cochlear Limited Electromagnetic transducer with specific interface geometries
US10854108B2 (en) * 2017-04-17 2020-12-01 Facebook, Inc. Machine communication system using haptic symbol set
US20230351868A1 (en) * 2014-05-16 2023-11-02 Not Impossible, Llc Vibrotactile control systems and methods

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102011006515A1 (en) * 2011-03-31 2012-10-04 Siemens Medical Instruments Pte. Ltd. Method for improving speech intelligibility with a hearing aid device and hearing aid device
US20170154546A1 (en) * 2014-08-21 2017-06-01 Jobu Productions Lexical dialect analysis system
US20220076663A1 (en) * 2019-06-24 2022-03-10 Cochlear Limited Prediction and identification techniques used with a hearing prosthesis
RU192148U1 (ru) * 2019-07-15 2019-09-05 Limited Liability Company "Business Bureau" (OOO "Business Bureau") DEVICE FOR AUDIOVISUAL NAVIGATION OF DEAF-BLIND PEOPLE

Citations (50)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4441202A (en) * 1979-05-28 1984-04-03 The University Of Melbourne Speech processor
US4581491A (en) * 1984-05-04 1986-04-08 Research Corporation Wearable tactile sensory aid providing information on voice pitch and intonation patterns
US4682367A (en) * 1985-11-13 1987-07-21 General Electric Company Mobile radio communications system with join feature
US4720848A (en) * 1983-12-05 1988-01-19 Nippo Communication Industrial Co. Communication system with voice announcement means
US5479489A (en) * 1994-11-28 1995-12-26 At&T Corp. Voice telephone dialing architecture
WO1996012383A1 (en) * 1994-10-17 1996-04-25 The University Of Melbourne Multiple pulse stimulation
US5559860A (en) * 1992-06-11 1996-09-24 Sony Corporation User selectable response to an incoming call at a mobile station
US5661788A (en) * 1995-01-25 1997-08-26 Samsung Electronics Co., Ltd. Method and system for selectively alerting user and answering preferred telephone calls
US5740532A (en) * 1996-05-06 1998-04-14 Motorola, Inc. Method of transmitting emergency messages in a RF communication system
US6002966A (en) * 1995-04-26 1999-12-14 Advanced Bionics Corporation Multichannel cochlear prosthesis with flexible control of stimulus waveforms
US6122347A (en) * 1997-11-13 2000-09-19 Advanced Micro Devices, Inc. System and method for self-announcing a caller of an incoming telephone call
US6160489A (en) * 1994-06-23 2000-12-12 Motorola, Inc. Wireless communication device adapted to generate a plurality of distinctive tactile alert patterns
US6178167B1 (en) * 1996-04-04 2001-01-23 Lucent Technologies, Inc. Customer telecommunication interface device having a unique identifier
US6289085B1 (en) * 1997-07-10 2001-09-11 International Business Machines Corporation Voice mail system, voice synthesizing device and method therefor
US6353671B1 (en) * 1998-02-05 2002-03-05 Bioinstco Corp. Signal processing circuit and method for increasing speech intelligibility
US6373925B1 (en) * 1996-06-28 2002-04-16 Siemens Aktiengesellschaft Telephone calling party announcement system and method
US6385303B1 (en) * 1997-11-13 2002-05-07 Legerity, Inc. System and method for identifying and announcing a caller and a callee of an incoming telephone call
US20020118804A1 (en) * 2000-10-26 2002-08-29 Elisa Carroll Caller-identification phone without ringer
US20020137553A1 (en) * 2001-03-22 2002-09-26 Kraemer Tim D. Distinctive ringing for mobile devices using digitized user recorded audio message
US20020156630A1 (en) * 2001-03-02 2002-10-24 Kazunori Hayashi Reading system and information terminal
US20020196914A1 (en) * 2001-06-25 2002-12-26 Bellsouth Intellectual Property Corporation Audio caller identification
US6501967B1 (en) * 1996-02-23 2002-12-31 Nokia Mobile Phones, Ltd. Defining of a telephone's ringing tone
US20030013432A1 (en) * 2000-02-09 2003-01-16 Kazunari Fukaya Portable telephone and music reproducing method
US20030016813A1 (en) * 2001-07-17 2003-01-23 Comverse Network Systems, Ltd. Personal ring tone message indicator
US20030061041A1 (en) * 2001-09-25 2003-03-27 Stephen Junkins Phoneme-delta based speech compression
US6573825B1 (en) * 1998-12-25 2003-06-03 Nec Corporation Communication apparatus and alerting method
US20030161454A1 (en) * 2002-02-26 2003-08-28 Shary Nassimi Self-contained distinctive ring, voice, facsimile, and internet device
US6618474B1 (en) * 1999-03-08 2003-09-09 Morris Reese Method and apparatus for providing to a customer a promotional message between ringing signals or after a call waiting tone
US6621418B1 (en) * 1999-09-14 2003-09-16 Christophe Cayrol Device warning against the presence of dangerous objects
US6628195B1 (en) * 1999-11-10 2003-09-30 Jean-Max Coudon Tactile stimulation device for use by a deaf person
US6636602B1 (en) * 1999-08-25 2003-10-21 Giovanni Vlacancich Method for communicating
US20040037403A1 (en) * 2002-03-29 2004-02-26 Koch Robert A. Audio delivery of caller identification information
US6714637B1 (en) * 1999-10-19 2004-03-30 Nortel Networks Limited Customer programmable caller ID alerting indicator
US20040082980A1 (en) * 2000-10-19 2004-04-29 Jaouhar Mouine Programmable neurostimulator
US6807259B1 (en) * 2000-06-09 2004-10-19 Nortel Networks, Ltd. Audible calling line identification
US20050243996A1 (en) * 2004-05-03 2005-11-03 Fitchmun Mark I System and method for providing particularized audible alerts
US7062036B2 (en) * 1999-02-06 2006-06-13 Christopher Guy Williams Telephone call information delivery system
US7136811B2 (en) * 2002-04-24 2006-11-14 Motorola, Inc. Low bandwidth speech communication using default and personal phoneme tables
US20060274144A1 (en) * 2005-06-02 2006-12-07 Agere Systems, Inc. Communications device with a visual ring signal and a method of generating a visual signal
US7206572B2 (en) * 1992-01-29 2007-04-17 Classco Inc. Calling party announcement apparatus
US7231019B2 (en) * 2004-02-12 2007-06-12 Microsoft Corporation Automatic identification of telephone callers based on voice characteristics
US20070147601A1 (en) * 2002-01-18 2007-06-28 Tischer Steven N Audio alert system and method
US7257210B1 (en) * 1994-01-05 2007-08-14 Intellect Wireless Inc. Picture phone with caller id
US7315618B1 (en) * 2001-12-27 2008-01-01 At&T Bls Intellectual Property, Inc. Voice caller ID
US7366337B2 (en) * 2004-02-11 2008-04-29 Sbc Knowledge Ventures, L.P. Personal bill denomination reader
US7443967B1 (en) * 2003-09-29 2008-10-28 At&T Intellectual Property I, L.P. Second communication during ring suppression
US7483832B2 (en) * 2001-12-10 2009-01-27 At&T Intellectual Property I, L.P. Method and system for customizing voice translation of text to speech
US20100141467A1 (en) * 2007-02-07 2010-06-10 Gary John Kirkpatrick Apparatus for Providing Visual and/or Audible Alert Signals
US7881449B2 (en) * 2003-09-30 2011-02-01 At&T Intellectual Property Ii, L.P. Enhanced call notification service
US8086245B2 (en) * 2002-09-12 2011-12-27 Broadcom Corporation Advertising and controlling the advertisement of wireless hot spots

Family Cites Families (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5029200A (en) 1989-05-02 1991-07-02 At&T Bell Laboratories Voice message system using synthetic speech
JP4203122B2 (en) 1991-12-31 2008-12-24 Unisys PulsePoint Communications Voice control communication apparatus and processing method
US5907597A (en) 1994-08-05 1999-05-25 Smart Tone Authentication, Inc. Method and system for the secure communication of data
DE69615667T2 (en) 1995-03-07 2002-06-20 British Telecomm VOICE RECOGNITION
EP0804850B1 (en) 1995-11-17 2005-08-03 AT&T Corp. Automatic vocabulary generation for telecommunications network-based voice-dialing
US5912949A (en) 1996-11-05 1999-06-15 Northern Telecom Limited Voice-dialing system using both spoken names and initials in recognition
US5933805A (en) 1996-12-13 1999-08-03 Intel Corporation Retaining prosody during speech analysis for later playback
US6775264B1 (en) 1997-03-03 2004-08-10 Webley Systems, Inc. Computer, internet and telecommunications based network
US5991364A (en) 1997-03-27 1999-11-23 Bell Atlantic Network Services, Inc. Phonetic voice activated dialing
AUPO709197A0 (en) * 1997-05-30 1997-06-26 University Of Melbourne, The Improvements in electrotactile vocoders
US5978689A (en) 1997-07-09 1999-11-02 Tuoriniemi; Veijo M. Personal portable communication and audio system
US6018571A (en) 1997-09-30 2000-01-25 Mitel Corporation System for interactive control of a computer and telephone
JPH11219443A (en) 1998-01-30 1999-08-10 Konami Co Ltd Method and device for controlling display of character image, and recording medium
WO1999049681A1 (en) 1998-03-25 1999-09-30 Qualcomm Incorporated Method and apparatus for performing handsfree operations and voicing text with a cdma telephone
US6073094A (en) 1998-06-02 2000-06-06 Motorola Voice compression by phoneme recognition and communication of phoneme indexes and voice features
US6163691A (en) 1998-06-24 2000-12-19 Uniden America Corporation Caller identification in a radio communication system
US6374217B1 (en) 1999-03-12 2002-04-16 Apple Computer, Inc. Fast update implementation for efficient latent semantic language modeling
US6385584B1 (en) 1999-04-30 2002-05-07 Verizon Services Corp. Providing automated voice responses with variable user prompting
US7260187B1 (en) 1999-05-11 2007-08-21 Verizon Services Corp. Voice response apparatus and method of providing automated voice responses with silent prompting
US6904405B2 (en) 1999-07-17 2005-06-07 Edwin A. Suominen Message recognition using shared language model
US6421672B1 (en) 1999-07-27 2002-07-16 Verizon Services Corp. Apparatus for and method of disambiguation of directory listing searches utilizing multiple selectable secondary search keys
US6975988B1 (en) 2000-11-10 2005-12-13 Adam Roth Electronic mail method and system using associated audio and visual techniques
US7376640B1 (en) 2000-11-14 2008-05-20 At&T Delaware Intellectual Property, Inc. Method and system for searching an information retrieval system according to user-specified location information
WO2002058378A2 (en) 2001-01-12 2002-07-25 Whp Wireless, Inc. Systems and methods for communications
AU2002255568B8 (en) 2001-02-20 2014-01-09 Adidas Ag Modular personal network systems and methods
US20040233892A1 (en) 2001-05-16 2004-11-25 Roberts Linda Ann Priority caller alert
EP2432190A3 (en) 2001-06-27 2014-02-19 SKKY Incorporated Improved media delivery platform
US7277734B1 (en) 2001-09-28 2007-10-02 At&T Bls Intellectual Property, Inc. Device, system and method for augmenting cellular telephone audio signals
US7735011B2 (en) 2001-10-19 2010-06-08 Sony Ericsson Mobile Communications Ab Midi composer
WO2003069874A2 (en) 2002-02-11 2003-08-21 Unified Dispatch, Inc. Automated transportation call-taking system
US7250846B2 (en) 2002-03-05 2007-07-31 International Business Machines Corporation Method and apparatus for providing dynamic user alert
US7353455B2 (en) 2002-05-21 2008-04-01 At&T Delaware Intellectual Property, Inc. Caller initiated distinctive presence alerting and auto-response messaging
US7467087B1 (en) 2002-10-10 2008-12-16 Gillick Laurence S Training and using pronunciation guessers in speech recognition
US20040114747A1 (en) 2002-12-12 2004-06-17 Trandal David S. Systems and methods for call processing
US7672439B2 (en) 2003-04-02 2010-03-02 Aol Inc. Concatenated audio messages
US7769811B2 (en) 2003-03-03 2010-08-03 Aol Llc Instant messaging sound control
US7412383B1 (en) 2003-04-04 2008-08-12 At&T Corp Reducing time for annotating speech data to develop a dialog application
US7957513B2 (en) 2003-11-21 2011-06-07 At&T Intellectual Property I, L.P. Method, system and computer program product for providing a no-ring telephone call service
US20090024183A1 (en) 2005-08-03 2009-01-22 Fitchmun Mark I Somatic, auditory and cochlear communication system and method

Patent Citations (54)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4441202A (en) * 1979-05-28 1984-04-03 The University Of Melbourne Speech processor
US4720848A (en) * 1983-12-05 1988-01-19 Nippo Communication Industrial Co. Communication system with voice announcement means
US4581491A (en) * 1984-05-04 1986-04-08 Research Corporation Wearable tactile sensory aid providing information on voice pitch and intonation patterns
US4682367A (en) * 1985-11-13 1987-07-21 General Electric Company Mobile radio communications system with join feature
US7206572B2 (en) * 1992-01-29 2007-04-17 Classco Inc. Calling party announcement apparatus
US5559860A (en) * 1992-06-11 1996-09-24 Sony Corporation User selectable response to an incoming call at a mobile station
US7257210B1 (en) * 1994-01-05 2007-08-14 Intellect Wireless Inc. Picture phone with caller id
US6160489A (en) * 1994-06-23 2000-12-12 Motorola, Inc. Wireless communication device adapted to generate a plurality of distinctive tactile alert patterns
WO1996012383A1 (en) * 1994-10-17 1996-04-25 The University Of Melbourne Multiple pulse stimulation
US5479489A (en) * 1994-11-28 1995-12-26 At&T Corp. Voice telephone dialing architecture
US5661788A (en) * 1995-01-25 1997-08-26 Samsung Electronics Co., Ltd. Method and system for selectively alerting user and answering preferred telephone calls
US6002966A (en) * 1995-04-26 1999-12-14 Advanced Bionics Corporation Multichannel cochlear prosthesis with flexible control of stimulus waveforms
US6501967B1 (en) * 1996-02-23 2002-12-31 Nokia Mobile Phones, Ltd. Defining of a telephone's ringing tone
US6178167B1 (en) * 1996-04-04 2001-01-23 Lucent Technologies, Inc. Customer telecommunication interface device having a unique identifier
US5740532A (en) * 1996-05-06 1998-04-14 Motorola, Inc. Method of transmitting emergency messages in a RF communication system
US6373925B1 (en) * 1996-06-28 2002-04-16 Siemens Aktiengesellschaft Telephone calling party announcement system and method
US6289085B1 (en) * 1997-07-10 2001-09-11 International Business Machines Corporation Voice mail system, voice synthesizing device and method therefor
US6385303B1 (en) * 1997-11-13 2002-05-07 Legerity, Inc. System and method for identifying and announcing a caller and a callee of an incoming telephone call
US6122347A (en) * 1997-11-13 2000-09-19 Advanced Micro Devices, Inc. System and method for self-announcing a caller of an incoming telephone call
US6353671B1 (en) * 1998-02-05 2002-03-05 Bioinstco Corp. Signal processing circuit and method for increasing speech intelligibility
US6573825B1 (en) * 1998-12-25 2003-06-03 Nec Corporation Communication apparatus and alerting method
US7062036B2 (en) * 1999-02-06 2006-06-13 Christopher Guy Williams Telephone call information delivery system
US6618474B1 (en) * 1999-03-08 2003-09-09 Morris Reese Method and apparatus for providing to a customer a promotional message between ringing signals or after a call waiting tone
US6636602B1 (en) * 1999-08-25 2003-10-21 Giovanni Vlacancich Method for communicating
US6621418B1 (en) * 1999-09-14 2003-09-16 Christophe Cayrol Device warning against the presence of dangerous objects
US6714637B1 (en) * 1999-10-19 2004-03-30 Nortel Networks Limited Customer programmable caller ID alerting indicator
US6628195B1 (en) * 1999-11-10 2003-09-30 Jean-Max Coudon Tactile stimulation device for use by a deaf person
US20030013432A1 (en) * 2000-02-09 2003-01-16 Kazunari Fukaya Portable telephone and music reproducing method
US6807259B1 (en) * 2000-06-09 2004-10-19 Nortel Networks, Ltd. Audible calling line identification
US20040082980A1 (en) * 2000-10-19 2004-04-29 Jaouhar Mouine Programmable neurostimulator
US20020118804A1 (en) * 2000-10-26 2002-08-29 Elisa Carroll Caller-identification phone without ringer
US20020156630A1 (en) * 2001-03-02 2002-10-24 Kazunori Hayashi Reading system and information terminal
US20020137553A1 (en) * 2001-03-22 2002-09-26 Kraemer Tim D. Distinctive ringing for mobile devices using digitized user recorded audio message
US20020196914A1 (en) * 2001-06-25 2002-12-26 Bellsouth Intellectual Property Corporation Audio caller identification
US7295656B2 (en) * 2001-06-25 2007-11-13 At&T Bls Intellectual Property, Inc. Audio caller identification
US20030016813A1 (en) * 2001-07-17 2003-01-23 Comverse Network Systems, Ltd. Personal ring tone message indicator
US20030061041A1 (en) * 2001-09-25 2003-03-27 Stephen Junkins Phoneme-delta based speech compression
US7483832B2 (en) * 2001-12-10 2009-01-27 At&T Intellectual Property I, L.P. Method and system for customizing voice translation of text to speech
US7315618B1 (en) * 2001-12-27 2008-01-01 At&T Bls Intellectual Property, Inc. Voice caller ID
US7418096B2 (en) * 2001-12-27 2008-08-26 At&T Intellectual Property I, L.P. Voice caller ID
US20070147601A1 (en) * 2002-01-18 2007-06-28 Tischer Steven N Audio alert system and method
US20030161454A1 (en) * 2002-02-26 2003-08-28 Shary Nassimi Self-contained distinctive ring, voice, facsimile, and internet device
US20040037403A1 (en) * 2002-03-29 2004-02-26 Koch Robert A. Audio delivery of caller identification information
US7136811B2 (en) * 2002-04-24 2006-11-14 Motorola, Inc. Low bandwidth speech communication using default and personal phoneme tables
US8086245B2 (en) * 2002-09-12 2011-12-27 Broadcom Corporation Advertising and controlling the advertisement of wireless hot spots
US7443967B1 (en) * 2003-09-29 2008-10-28 At&T Intellectual Property I, L.P. Second communication during ring suppression
US7881449B2 (en) * 2003-09-30 2011-02-01 At&T Intellectual Property Ii, L.P. Enhanced call notification service
US7366337B2 (en) * 2004-02-11 2008-04-29 Sbc Knowledge Ventures, L.P. Personal bill denomination reader
US7231019B2 (en) * 2004-02-12 2007-06-12 Microsoft Corporation Automatic identification of telephone callers based on voice characteristics
US20050243996A1 (en) * 2004-05-03 2005-11-03 Fitchmun Mark I System and method for providing particularized audible alerts
US7869588B2 (en) * 2004-05-03 2011-01-11 Somatek System and method for providing particularized audible alerts
US20110123017A1 (en) * 2004-05-03 2011-05-26 Somatek System and method for providing particularized audible alerts
US20060274144A1 (en) * 2005-06-02 2006-12-07 Agere Systems, Inc. Communications device with a visual ring signal and a method of generating a visual signal
US20100141467A1 (en) * 2007-02-07 2010-06-10 Gary John Kirkpatrick Apparatus for Providing Visual and/or Audible Alert Signals

Cited By (60)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10104226B2 (en) 2004-05-03 2018-10-16 Somatek System and method for providing particularized audible alerts
US7869588B2 (en) 2004-05-03 2011-01-11 Somatek System and method for providing particularized audible alerts
US20110123017A1 (en) * 2004-05-03 2011-05-26 Somatek System and method for providing particularized audible alerts
US20050243996A1 (en) * 2004-05-03 2005-11-03 Fitchmun Mark I System and method for providing particularized audible alerts
US8767953B2 (en) * 2004-05-03 2014-07-01 Somatek System and method for providing particularized audible alerts
US10694030B2 (en) 2004-05-03 2020-06-23 Somatek System and method for providing particularized audible alerts
US9544446B2 (en) 2004-05-03 2017-01-10 Somatek Method for providing particularized audible alerts
US11878169B2 (en) 2005-08-03 2024-01-23 Somatek Somatic, auditory and cochlear communication system and method
US10540989B2 (en) 2005-08-03 2020-01-21 Somatek Somatic, auditory and cochlear communication system and method
US20130023960A1 (en) * 2007-11-30 2013-01-24 Lockheed Martin Corporation Broad wavelength profile to homogenize the absorption profile in optical stimulation of nerves
US9011508B2 (en) * 2007-11-30 2015-04-21 Lockheed Martin Corporation Broad wavelength profile to homogenize the absorption profile in optical stimulation of nerves
US20100056950A1 (en) * 2008-08-29 2010-03-04 University Of Florida Research Foundation, Inc. System and methods for creating reduced test sets used in assessing subject response to stimuli
US9844326B2 (en) * 2008-08-29 2017-12-19 University Of Florida Research Foundation, Inc. System and methods for creating reduced test sets used in assessing subject response to stimuli
US20130054225A1 (en) * 2010-06-23 2013-02-28 Business Objects Software Limited Searching and matching of data
US8745077B2 (en) * 2010-06-23 2014-06-03 Business Objects Software Limited Searching and matching of data
US8321442B2 (en) * 2010-06-23 2012-11-27 Business Objects Software Limited Searching and matching of data
US20110320481A1 (en) * 2010-06-23 2011-12-29 Business Objects Software Limited Searching and matching of data
US20140207456A1 (en) * 2010-09-23 2014-07-24 Waveform Communications, Llc Waveform analysis of speech
EP2732641A1 (en) * 2011-07-11 2014-05-21 Med-El Elektromedizinische Geraete GmbH Test methods for cochlear implant stimulation strategies
US9162069B2 (en) 2011-07-11 2015-10-20 Med-El Elektromedizinische Geraete Gmbh Test method for cochlear implant stimulation strategies
EP2732641A4 (en) * 2011-07-11 2014-12-31 Med El Elektromed Geraete Gmbh Test methods for cochlear implant stimulation strategies
WO2013009805A1 (en) 2011-07-11 2013-01-17 Med-El Elektromedizinische Geraete Gmbh Test methods for cochlear implant stimulation strategies
US8840654B2 (en) * 2011-07-22 2014-09-23 Lockheed Martin Corporation Cochlear implant using optical stimulation with encoded information designed to limit heating effects
US20130023963A1 (en) * 2011-07-22 2013-01-24 Lockheed Martin Corporation Cochlear implant using optical stimulation with encoded information designed to limit heating effects
US9405742B2 (en) * 2012-02-16 2016-08-02 Continental Automotive Gmbh Method for phonetizing a data list and voice-controlled user interface
US20150012261A1 (en) * 2012-02-16 2015-01-08 Continental Automotive Gmbh Method for phonetizing a data list and voice-controlled user interface
US20180239581A1 (en) * 2013-03-15 2018-08-23 Sonitum Inc. Topological mapping of control parameters
EP3056022A4 (en) * 2013-10-07 2017-05-31 Med-El Elektromedizinische Geraete GmbH Method for extracting temporal features from spike-like signals
US9913983B2 (en) 2013-10-25 2018-03-13 Cochlear Limited Alternate stimulation strategies for perception of speech
US10757516B2 (en) 2013-10-29 2020-08-25 Cochlear Limited Electromagnetic transducer with specific interface geometries
US20150289786A1 (en) * 2014-04-11 2015-10-15 Reginald G. Garratt Method of Acoustic Screening for Processing Hearing Loss Patients by Executing Computer-Executable Instructions Stored On a Non-Transitory Computer-Readable Medium
US10964179B2 (en) * 2014-05-16 2021-03-30 Not Impossible, Llc Vibrotactile control systems and methods
US20210366250A1 (en) * 2014-05-16 2021-11-25 Not Impossible, Llc Vibrotactile control systems and methods
US11625994B2 (en) * 2014-05-16 2023-04-11 Not Impossible, Llc Vibrotactile control systems and methods
US20230351868A1 (en) * 2014-05-16 2023-11-02 Not Impossible, Llc Vibrotactile control systems and methods
US10856091B2 (en) 2014-06-18 2020-12-01 Cochlear Limited Electromagnetic transducer with expanded magnetic flux functionality
US9800982B2 (en) 2014-06-18 2017-10-24 Cochlear Limited Electromagnetic transducer with expanded magnetic flux functionality
US9741342B2 (en) * 2014-11-26 2017-08-22 Panasonic Intellectual Property Corporation Of America Method and apparatus for recognizing speech by lip reading
US20160148616A1 (en) * 2014-11-26 2016-05-26 Panasonic Intellectual Property Corporation Of America Method and apparatus for recognizing speech by lip reading
US9911420B1 (en) * 2014-12-02 2018-03-06 Google Llc Behavior adjustment using speech recognition system
US20160155437A1 (en) * 2014-12-02 2016-06-02 Google Inc. Behavior adjustment using speech recognition system
US9570074B2 (en) * 2014-12-02 2017-02-14 Google Inc. Behavior adjustment using speech recognition system
US9899024B1 (en) 2014-12-02 2018-02-20 Google Llc Behavior adjustment using speech recognition system
US20160180155A1 (en) * 2014-12-22 2016-06-23 Fu Tai Hua Industry (Shenzhen) Co., Ltd. Electronic device and method for processing voice in video
US9901736B2 (en) * 2015-05-14 2018-02-27 Kuang-Chao Chen Cochlea hearing aid fixed on eardrum
US20160331965A1 (en) * 2015-05-14 2016-11-17 Kuang-Chao Chen Cochlea hearing aid fixed on eardrum
US20170098350A1 (en) * 2015-05-15 2017-04-06 Mick Ebeling Vibrotactile control software systems and methods
US10707836B2 (en) 2015-10-09 2020-07-07 Med-El Elektromedizinische Geraete Gmbh Estimation of harmonic frequencies for hearing implant sound coding using active contour models
CN108141201 (en) * 2015-10-09 2018-06-08 Med-El Elektromedizinische Geraete Gmbh Estimation of harmonic frequencies for hearing implant sound coding using active contour models
WO2017062701A1 (en) * 2015-10-09 2017-04-13 Med-El Elektromedizinische Geraete Gmbh Estimation of harmonic frequencies for hearing implant sound coding using active contour models
AU2016335681B2 (en) * 2015-10-09 2018-11-01 Med-El Elektromedizinische Geraete Gmbh Estimation of harmonic frequencies for hearing implant sound coding using active contour models
US10321247B2 (en) 2015-11-27 2019-06-11 Cochlear Limited External component with inductance and mechanical vibratory functionality
CN108883274 (en) * 2016-02-29 2018-11-23 Advanced Bionics Systems and methods for determining a behavioral audiogram value using evoked responses
CN108778410 (en) * 2016-03-11 2018-11-09 Mayo Foundation for Medical Education and Research Cochlear stimulation system with surround sound and noise cancellation
US20170294086A1 (en) * 2016-04-12 2017-10-12 Andrew Kerdemelidis Haptic Communication Apparatus and Method
US10269223B2 (en) * 2016-04-12 2019-04-23 Andrew Kerdemelidis Haptic communication apparatus and method
US10854108B2 (en) * 2017-04-17 2020-12-01 Facebook, Inc. Machine communication system using haptic symbol set
US10943503B2 (en) 2017-04-17 2021-03-09 Facebook, Inc. Envelope encoding of speech signals for transmission to cutaneous actuators
US11011075B1 (en) 2017-04-17 2021-05-18 Facebook, Inc. Calibration of haptic device using sensor harness
US11355033B2 (en) 2017-04-17 2022-06-07 Meta Platforms, Inc. Neural network model for generation of compressed haptic actuator signal from audio input

Also Published As

Publication number Publication date
US20220370803A1 (en) 2022-11-24
WO2007019307A2 (en) 2007-02-15
US11878169B2 (en) 2024-01-23
US20200152223A1 (en) 2020-05-14
US10540989B2 (en) 2020-01-21
WO2007019307A9 (en) 2007-04-05
WO2007019307A3 (en) 2007-08-02
US20150194166A1 (en) 2015-07-09

Similar Documents

Publication Publication Date Title
US11878169B2 (en) Somatic, auditory and cochlear communication system and method
Denes et al. The speech chain
McDermott Music perception with cochlear implants: a review
Straatman et al. Advantage of bimodal fitting in prosody perception for children using a cochlear implant and a hearing aid
Lansford et al. A cognitive-perceptual approach to conceptualizing speech intelligibility deficits and remediation practice in hypokinetic dysarthria
Xu et al. Emotional expressions as communicative signals
CA2964906A1 (en) Systems, methods, and devices for intelligent speech recognition and processing
US9936308B2 (en) Hearing aid apparatus with fundamental frequency modification
Milczynski et al. Perception of Mandarin Chinese with cochlear implants using enhanced temporal pitch cues
Clarke et al. Pitch and spectral resolution: A systematic comparison of bottom-up cues for top-down repair of degraded speech
Turcott et al. Efficient evaluation of coding strategies for transcutaneous language communication
WO2021099834A1 (en) Scoring speech audiometry
Vojtech et al. The effects of modulating fundamental frequency and speech rate on the intelligibility, communication efficiency, and perceived naturalness of synthetic speech
Ball et al. Methods in clinical phonetics
Smith et al. Integration of partial information within and across modalities: Contributions to spoken and written sentence recognition
Ifukube Sound-based assistive technology
Ming et al. Efficient coding in human auditory perception
Rødvik et al. Consonant and vowel confusions in well-performing adult cochlear implant users, measured with a nonsense syllable repetition test
Alsius et al. Linguistic initiation signals increase auditory feedback error correction
Kuo Frequency importance functions for words and sentences in Mandarin Chinese: Implications for hearing aid prescriptions in tonal languages
Nieman The Effect of Breathy and Strained Vocal Quality on Vowel Perception
Lee Lombard Effect in Speech Production by Cochlear Implant Users: Analysis, Assessment and Implications
Tamai et al. Demonstration of a novel speech-coding method for single-channel cochlear stimulation
Saimai et al. Speech synthesis algorithm for Thai cochlear implants
Butts Enhancing the perception of speech indexical properties of cochlear implants through sensory substitution

Legal Events

Date Code Title Description
AS Assignment

Owner name: SOMATEK, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:FITCHMUN, MARK I.;REEL/FRAME:025612/0589

Effective date: 20101218

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION