Neural Prosthetic Engineering
Today- Oct. 26th
Question on projects?
Review
Telemetry
Formant based Speech Processing
Speech processing Strategies –continued
CA
Lessons learned
CIS
CIS based
Fine Structure
Review
telemetry
Data Telemetry – Inductive Link
Downlink using PWM scheme
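[Figure: inductive-link block diagram. PWM data is modulated and amplified, then picked up as the received signal at the implanted coil. Power path: rectifier and regulator supply recovered power to the load. Data path: envelope detector and comparator produce the recovered data. Waveform panels: voltage at implanted coil, recovered data, generated biphasic current.]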
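The data path above (envelope detector followed by a comparator) can be sketched in a few lines. This is a minimal simulation under stated assumptions, not the circuit from the slide: the carrier is simply switched on for a long or short fraction of each bit period, and every numeric value and function name below is hypothetical.

```python
import math

# All numeric values are illustrative assumptions, not device parameters.
SAMPLES_PER_CARRIER_CYCLE = 8      # simulation resolution
CYCLES_PER_BIT = 20                # carrier cycles in one bit period
DUTY_ONE, DUTY_ZERO = 0.75, 0.25   # fraction of the bit period with the carrier on

def pwm_modulate(bits):
    """Carrier samples: the carrier stays on longer for a 1 than for a 0."""
    n_bit = SAMPLES_PER_CARRIER_CYCLE * CYCLES_PER_BIT
    samples = []
    for bit in bits:
        n_on = int(n_bit * (DUTY_ONE if bit else DUTY_ZERO))
        for k in range(n_bit):
            carrier = math.sin(2 * math.pi * k / SAMPLES_PER_CARRIER_CYCLE)
            samples.append(carrier if k < n_on else 0.0)
    return samples

def envelope(samples):
    """Rectify, then hold the peak over one carrier cycle (a crude envelope detector)."""
    rect = [abs(s) for s in samples]
    w = SAMPLES_PER_CARRIER_CYCLE
    return [max(rect[max(0, i - w + 1): i + 1]) for i in range(len(rect))]

def pwm_demodulate(samples, threshold=0.5):
    """Comparator on the envelope, then a duty-cycle decision per bit period."""
    high = [e > threshold for e in envelope(samples)]
    n_bit = SAMPLES_PER_CARRIER_CYCLE * CYCLES_PER_BIT
    bits = []
    for start in range(0, len(high), n_bit):
        duty = sum(high[start:start + n_bit]) / n_bit
        bits.append(1 if duty > 0.5 else 0)
    return bits

recovered = pwm_demodulate(pwm_modulate([1, 0, 1, 1, 0, 0, 1]))
```

The sketch assumes the receiver already knows where each bit period starts; a real link would also need clock recovery.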
Backward telemetry
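[Figure: backward-telemetry schematic and timing diagram. Schematic labels: amplifier AMP (Gain = 1), comparator COMP with reference Vref, switches s1, s2, s3, nodes Vin, Vs, and out; annotation "AMP ON!". Link blocks: rectifier/regulator, envelope detector, load. Link signals: PWM data, modulated and amplified signal, received signal, recovered data, recovered power. Timing diagram: s1, s2, s3, Vs, Vref, and out versus time.]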
Strategies for Representing Speech Information
with Cochlear Implants
Making of voice: Vocal Tract
[Figure: vocal tract anatomy. Labels: hard palate, velum (soft palate), larynx, glottis, vocal folds, alveolar ridge, nostril, lips, nasal cavity, teeth, tongue, trachea, ventricular fold, aryepiglottic fold.]
• The sound source is in the larynx (vocal folds).
• The vocal tract is the cavity where sound is filtered.
• The vocal tract consists of the laryngeal cavity, the pharynx, the oral cavity, and the nasal cavity.
• The average length of the vocal tract in adult humans is 17 cm (male) and 14 cm (female).
Vocal fold at low and high pitches
http://www.vowelsandconsonants3e.com/chapter_2.html
https://www.youtube.com/watch?v=v9Wdf-RwLcs
120 Hz and 200 Hz
Sound: Voiced or unvoiced
• Voicing means the vocal folds vibrate as air is forced through the larynx into the vocal tract.
• All the vowels are voiced sounds.
• Consonants are voiced or unvoiced sounds.
• Voiced sounds are resonant (vibrant).
• Unvoiced sounds are noisy.
Articulation in Vocal Tract
• Place of articulation
• Where the vocal tract is shut off or narrowed
• Manner of articulation
• How the vocal tract is articulated
• Voicing
• Whether air is forced through the larynx
Articulation for Vowels
Place of articulation (tongue height): high (u), mid (o), low (a)
Shape of the lips: rounded (o) or unrounded (i)
Wikipedia, Wikimedia 2016
Articulation for Consonants
Stop (plosive): A stop is a consonant in which airflow is completely blocked for a short time
[p], [t], [k] / [b], [d], [g]
Nasal: made by lowering the velum and allowing air to pass into the nasal cavity
[m], [n], [ŋ]
Fricative: airflow is constricted but not cut off completely.
[s]/[z]
Affricate: a stop that is followed immediately by a fricative
[ts]/[dj]
Liquid: a consonant in which the tongue produces a partial closure in the mouth, resulting in a resonant, vowel-like consonant
[l], [r]
Glide: a consonant with no stop or friction, consisting of a glide (a quick, smooth movement) towards a following vowel
[w], [y]
Formants in spectrogram
• Distinctive frequency components of the sound
• Peaks in the amplitude/frequency spectrum (spectrogram)
• The formant with the lowest frequency is called F1, the second F2, and the third F3.
• Most often the first two formants, F1 and F2, are enough to disambiguate the vowel.
• An interactive demonstration of this can be found at http://auditoryneuroscience.com/topics/two-formant-artificial-vowels
Formants of consonants
• Nasal and liquid consonants have an added formant (F3) at higher frequencies
• Plosives and fricatives modify the placement of the formants of the neighboring vowels
• Bilabial sounds (b, p) cause lowering of the formants
• Velar sounds (k, g) show F2 and F3 coming together
• Alveolar sounds (t, d) cause less systematic changes in neighboring vowel formants
Formants
• The component sounds that build up the phrase
"A bird in the hand is worth two in the bush".
http://www.vowelsandconsonants3e.com/chapter_7.html#
Frequencies of sounds
• C1 32.70 Hz (lowest C on a standard 88-key piano)
• C4 261.63 Hz (middle C on an 88-key piano)
• C6 1046.50 Hz (highest note reproducible by the average female human voice)
• C8 4186.01 Hz (highest note on an 88-key piano)
https://www.youtube.com/watch?v=qNf9nzvnd1k
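The note frequencies listed here follow from twelve-tone equal temperament. A minimal sketch, assuming the common A4 = 440 Hz reference and MIDI note numbering (middle C = 60); the function name is hypothetical:

```python
def note_frequency(midi_note, a4_hz=440.0):
    """Equal-temperament frequency: each semitone is a factor of 2**(1/12)."""
    return a4_hz * 2.0 ** ((midi_note - 69) / 12)

# MIDI note numbers (assumed convention): C1 = 24, C4 = 60, C6 = 84, C8 = 108
for name, midi in [("C1", 24), ("C4", 60), ("C6", 84), ("C8", 108)]:
    print(name, round(note_frequency(midi), 2), "Hz")
```

Each octave doubles the frequency, so C8 is 2**7 = 128 times C1.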
Sound Waveforms:
Voiced or unvoiced
40 msec view
http://clas.mq.edu.au/speech/acoustics/waveforms/speech_waveforms.html
Vocoder
Vocoder (voice coder)
Invented by Dudley in the 1930s
A means of reproducing an intelligible facsimile of a voice for recorded messages on telephone systems
Consists of an analysis (encoding) stage and a synthesis (decoding) stage
The analysis stage extracts a limited set of parameters from the speech input and transmits them to the receiver
The information rate required to transmit these parameters is much lower than that required to transmit the unprocessed speech signal
Model for Voice Coding
[Figure: voice-coding model. A periodic wave generator (voiced sound, set by the fundamental frequency) or a random noise generator (unvoiced sound) drives the vocal tract.]
Channel Vocoder : analysis part
[Figure: analysis block diagram. Speech feeds n channels, each consisting of a bandpass filter, rectifier, lowpass filter, and A/D converter, together with a pitch detector (fundamental frequency) and a voicing detector. A multiplexer combines all outputs onto the channel.]
• The voicing detector determines whether the sound is voiced or not.
• The pitch detector determines the frequency of the glottal openings for voiced sounds.
• The configuration of the vocal tract is found with a bank of bandpass filters and envelope detectors (rectifiers followed by lowpass filters).
• This analysis provides information about the vocal tract at 5-30 ms intervals.
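The envelope path of the analysis stage (bandpass filter, rectifier, lowpass filter, frame-rate sampling) can be sketched as follows. This is an illustration under assumptions, not Dudley's design: the filters are a generic second-order band-pass and a one-pole low-pass, the band centers, Q, cutoff, and 10 ms frame interval are hypothetical, and the pitch and voicing detectors are left out.

```python
import math

def bandpass(x, fs, f0, q=2.0):
    """Second-order band-pass (standard biquad form, unity gain at f0)."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b0 * xn + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1, y2, y1 = x1, xn, y1, yn
        y.append(yn)
    return y

def envelope(x, fs, cutoff_hz=50.0):
    """Full-wave rectifier followed by a one-pole low-pass filter."""
    a = 1 - math.exp(-2 * math.pi * cutoff_hz / fs)
    env, state = [], 0.0
    for xn in x:
        state += a * (abs(xn) - state)
        env.append(state)
    return env

def analyze(x, fs, centers_hz, frame_ms=10.0):
    """Return one list of channel envelopes per frame."""
    hop = int(fs * frame_ms / 1000)
    envs = [envelope(bandpass(x, fs, f0), fs) for f0 in centers_hz]
    return [[env[i] for env in envs] for i in range(hop - 1, len(x), hop)]

FS = 16000
CENTERS_HZ = [250, 500, 1000, 2000, 4000]   # hypothetical band centers
tone = [math.sin(2 * math.pi * 1000 * n / FS) for n in range(FS // 5)]  # 0.2 s at 1 kHz
frames = analyze(tone, FS, CENTERS_HZ)
strongest = max(range(len(CENTERS_HZ)), key=lambda k: frames[-1][k])
print("strongest band:", CENTERS_HZ[strongest], "Hz")
```

Each frame is the small parameter set that would be multiplexed and transmitted, which is why the data rate is far below that of the raw waveform.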
Channel Vocoder: synthesis part
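[Figure: synthesis block diagram. A demultiplexer splits the channel into n envelope channels (each through a D/A converter), the voicing information, and the pitch (fundamental frequency, F0). A voice source or a noise source excites the bandpass filters.]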
• A synthesized speech signal is formed by summing the outputs of the bandpass filters.
• Voicing information is a binary indication (voiced or unvoiced).
• Each channel signal is a smoothed envelope energy.
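A rough sketch of the synthesis side, under the same kind of assumptions as any toy vocoder: the voiced source is an impulse train at F0, the unvoiced source is uniform random noise, the band filter is a generic second-order band-pass, and the envelopes are held constant over one frame. All names and values are hypothetical.

```python
import math
import random

def bandpass(x, fs, f0, q=2.0):
    """Second-order band-pass (standard biquad form, unity gain at f0)."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b0 * xn + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1, y2, y1 = x1, xn, y1, yn
        y.append(yn)
    return y

def excitation(n, fs, voiced, f0_hz, seed=0):
    """Impulse train at F0 for voiced frames, noise for unvoiced frames."""
    if voiced:
        period = max(1, round(fs / f0_hz))
        return [1.0 if i % period == 0 else 0.0 for i in range(n)]
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

def synthesize_frame(envelopes, centers_hz, fs, n, voiced, f0_hz):
    """Sum the band-filtered excitation, each band scaled by its envelope."""
    src = excitation(n, fs, voiced, f0_hz)
    out = [0.0] * n
    for gain, fc in zip(envelopes, centers_hz):
        band = bandpass(src, fs, fc)
        for i in range(n):
            out[i] += gain * band[i]
    return out

# One 20 ms voiced frame at F0 = 120 Hz with two hypothetical bands.
frame = synthesize_frame([1.0, 0.5], [500, 1500], 16000, 320, True, 120)
```

The binary voicing flag only switches the source; the spectral shape comes entirely from the per-band envelopes.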
Speech Processing Strategies
Formant based speech Processing Strategies
Vocoder theory and models played major roles in the early designs.
Fundamental Frequency (F0) and two formants (F1 and F2) are used
F0 is the fundamental frequency and determines the stimulation rate
F1 gives information about vowels
F2 gives information about consonants
Speech Processing Strategies – F0/F1/F2
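[Figure: F0/F1/F2 block diagram. The microphone (MIC) feeds an automatic gain control (AGC). A 270 Hz low-pass filter with a zero-crossing detector gives F0, which sets the pulse rate. A 300-1000 Hz filter with a zero-crossing detector and an envelope detector gives F1 and A1. A 1000-3000 Hz filter with a zero-crossing detector and an envelope detector gives F2 and A2. Pulse generators drive the electrodes; labels: (Apex), (Base).]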
P.C.Loizou, (IEEE Engineering in Medicine and biology, 1999)
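The zero-crossing detectors in this block diagram estimate a frequency by counting sign changes in a band-limited signal. A minimal sketch of the idea; the function name and test signal are hypothetical, and a real detector would operate on the filtered speech bands:

```python
import math

def zero_crossing_frequency(x, fs):
    """Estimate the dominant frequency from the number of sign changes."""
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a < 0) != (b < 0))
    duration_s = (len(x) - 1) / fs
    return crossings / (2 * duration_s)   # two crossings per cycle

FS = 16000
# 0.1 s of a 700 Hz tone, standing in for the output of the 300-1000 Hz band
band = [math.sin(2 * math.pi * 700 * n / FS + 0.3) for n in range(FS // 10)]
f1_estimate = zero_crossing_frequency(band, FS)
```

The estimate is only meaningful when one component dominates the band, which is one reason such extraction degrades in noise.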
MPEAK Speech Processing Strategy
In addition to formant information, MPEAK extracts higher-frequency information from speech in additional channels.
MPEAK, like the F1/F2 strategies, tends to make errors in formant extraction in noisy environments.
Speech Processing Strategies – MPEAK
[Figure: MPEAK block diagram. The microphone (MIC) feeds an automatic gain control (AGC). Formant paths: a 270 Hz low-pass filter (F0, pulse rate), a 300-1000 Hz filter (F1, A1), and an 800-4000 Hz filter (F2, A2), each with a zero-crossing detector/envelope detector. High-frequency paths: 2-2.8 kHz, 2.8-4 kHz, and 4-6 kHz filters, each with an envelope detector, assigned to electrodes 7, 4, and 1 respectively. A pulse generator drives the electrodes.]
P.C.Loizou, (IEEE Engineering in Medicine and biology, 1999)
Recent Speech Processing Strategies
Compressed Analog (CA)
Continuous Interleaved Sampling (CIS)
ACE and SPEAK (Cochlear)
Harmony HiRes Virtual Channels (Clarion)
Speech Processing Strategies - CA
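[Figure: CA block diagram and filter responses. The microphone (MIC) feeds an automatic gain control (AGC), followed by band-pass filters and current sources for channels 1-4; signal labels: s(t), s'(t), x(t), i(t). Plot: magnitude in dB (-16 to 4) versus frequency in kHz (0.1 to 10) for bands 1-4.]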
B. Wilson et al., (Nature, 1991)
Lessons learned
Lessons learned from the formant-based strategies and the CA strategy:
CI users perceive much less information than listeners with normal hearing.
Perception of electrical stimuli differs from perception of acoustic stimuli.
Pitch saturation limit: typically around 300 pulses/s for electrical pulses, or 300 Hz for electrical sinusoids. Higher rates or frequencies do not produce further increases in pitch.
In normal hearing, different pitches are heard over much wider ranges of rates or frequencies (up to ~5 kHz), probably through combinations of rate and place cues ('volley' theory and place theory).
Theories
Place Code Theory
Time (Rate) Code Theory
Volley Theory
Wikipedia, File:Volley Principle of Hearing.png
CIS (Continuous Interleaved Sampling)
Pulsatile processing
Biphasic pulse trains are delivered to the electrodes in a non-simultaneous (interleaved) pattern.
No patent
Commercial devices use modified versions of CIS
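The interleaving can be illustrated as a timing schedule in which each channel gets its own slot inside one stimulation cycle, so no two electrodes are pulsed at the same time. This is a sketch under assumptions: equal slots, no inter-phase gap, and hypothetical parameter values and names.

```python
def cis_schedule(n_channels, rate_pps, phase_us, n_cycles=1):
    """Return (onset_us, channel) pairs, one biphasic pulse per channel per cycle."""
    cycle_us = 1e6 / rate_pps
    pulse_us = 2 * phase_us            # biphasic: two phases, no inter-phase gap assumed
    if n_channels * pulse_us > cycle_us:
        raise ValueError("pulses do not fit in one stimulation cycle")
    slot_us = cycle_us / n_channels    # equal time slots, an assumption of this sketch
    return [(cycle * cycle_us + ch * slot_us, ch)
            for cycle in range(n_cycles) for ch in range(n_channels)]

# 8 channels at 1000 pulses/s per channel with 25 us phases (illustrative values)
schedule = cis_schedule(8, 1000, 25, n_cycles=2)
```

The check inside the function makes the trade-off explicit: the per-channel rate, the number of channels, and the pulse width cannot all be increased at once.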
Speech Processing Strategies - CIS
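[Figure: CIS block diagram. A pre-amp feeds band-pass filters BPF 1 to BPF n (linear filter band); each band passes through a rectifier/low-pass filter (envelope) and a nonlinear map (compression), and the result multiplies a pulse train (modulation) delivered to electrodes EL-1 to EL-n.]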
B. Wilson et al., (Nature, 1991)
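The nonlinear map stage compresses the wide range of acoustic envelopes into the narrow electrical range between a listener's threshold and comfort levels. A minimal sketch, assuming a logarithmic map (one common choice; the slide does not specify the function); all parameter names and values are hypothetical.

```python
import math

def log_map(envelope, env_min, env_max, threshold, comfort):
    """Map an envelope value logarithmically into [threshold, comfort]."""
    x = min(max(envelope, env_min), env_max)          # clip to the input range
    frac = math.log(x / env_min) / math.log(env_max / env_min)
    return threshold + frac * (comfort - threshold)

# Illustrative numbers: 60 dB of input range mapped onto levels 100..200
quiet = log_map(0.001, 0.001, 1.0, 100, 200)
mid = log_map(0.0316, 0.001, 1.0, 100, 200)
loud = log_map(1.0, 0.001, 1.0, 100, 200)
```

In this sketch equal ratios of envelope amplitude produce equal steps in output level.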
Speech Processing Strategies – n-of-m, SPEAK, ACE
[Figure: n-of-m block diagram. Stages shown: microphone (MIC), pre-amp, band-pass filters, envelope extraction, amplitude compression, multiplication with pulses, V/I current sources, electrodes. An "n-of-m" map selects n peaks from m bands in a frame (m inputs, n outputs).]
F.G.Zeng et al., (IEEE Reviews in Biomedical Engineering, 2008)
Speech Processing Strategies – n of m ,SPEAK, ACE
• The pre-processing is similar to that of the CIS strategy
• The n-of-m strategy has a greater number of bandpass filters
• The SPEAK strategy selects the 6-8 largest peaks and has a fixed rate of 250 Hz per channel
• The ACE strategy has a larger range of peak selection (8-12) and a higher rate (900-1200 Hz) than the SPEAK strategy
F.G.Zeng et al., (IEEE Reviews in Biomedical Engineering, 2008)
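The selection step common to these strategies can be sketched as picking the n largest band envelopes in each frame. The function name and the envelope values are hypothetical; real processors apply this per frame after envelope extraction.

```python
def select_n_of_m(envelopes, n):
    """Return (band_index, envelope) for the n largest bands, in band order."""
    ranked = sorted(range(len(envelopes)), key=lambda i: envelopes[i], reverse=True)
    return [(i, envelopes[i]) for i in sorted(ranked[:n])]

# One frame with m = 6 bands; keep the n = 3 largest
frame = [0.1, 0.9, 0.3, 0.7, 0.2, 0.8]
selected = select_n_of_m(frame, 3)
```

Only the electrodes for the selected bands are stimulated in that frame; the other bands are skipped.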
Summary
Speech Processing Strategies advance with time
Formant based
CA
CIS
Need to implement finer features (more detailed sounds)
• Tonal languages
• Music
Discussion: Fine Structure Representation
Typical frequency range of CI filters: 300-8000 Hz
Normal audible frequency range: 20-20,000 Hz
Low-frequency cues (20-50 Hz) give prosody information (stress, syllabification): "envelope cues"
Mid-frequency cues (50-500 Hz) give segmental information such as consonant manner, voicing, and intonation: "periodicity cues"
High-frequency cues (600-10,000 Hz) give consonant place and vowel quality: "fine structure cues"
Advanced Bionics HiRes is an example of a speech processing strategy intended to provide better fine structure cues
HiRes samples temporal fluctuations up to 2800 Hz across 16 channels
16 independent current sources enable simultaneous analog stimulation (SAS) as well as CIS
"Current steering" provides virtual channel capability (HiRes 120 = 15 channels times 8 spectral bands per channel)
[1] J. B. Firszt, HiResolution Sound Processing, www.advancedbionics.com
[2] HiRes Fidelity 120 Sound Processing, Advanced Bionics Technical Report, www.advancedbionics.com
[3] Rosen, Temporal information in speech and its relevance for cochlear implants, Cochlear Implant: Acquisition and controversies, ed. B. Fraysse, N. Couchard, pp. 3-26 (1989)
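The arithmetic behind HiRes 120 (15 adjacent electrode pairs times 8 steps each) can be illustrated with a toy index mapping. This is only a sketch of the counting: the electrode numbering, the linear current split, and the function name are assumptions, not the manufacturer's algorithm.

```python
def virtual_channel(index, n_electrodes=16, steps_per_pair=8):
    """Return (electrode_a, electrode_b, fraction_on_b) for a virtual channel index."""
    n_pairs = n_electrodes - 1                     # 16 electrodes give 15 adjacent pairs
    if not 0 <= index < n_pairs * steps_per_pair:
        raise ValueError("virtual channel index out of range")
    pair, step = divmod(index, steps_per_pair)
    alpha = step / steps_per_pair                  # share of current on the second electrode
    return pair + 1, pair + 2, alpha               # electrodes numbered from 1 (assumed)

n_virtual = (16 - 1) * 8
first = virtual_channel(0)
last = virtual_channel(n_virtual - 1)
```

With `alpha` as the share on the second electrode, the first electrode would carry `1 - alpha` of the current, which shifts the perceived place of stimulation between the two contacts.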
Related Videos
Hearing CI
https://www.youtube.com/watch?v=00WOao4kpwM
CI simulations
https://www.youtube.com/watch?v=iwbwhfCWs2Q
A day of a CI user
https://www.youtube.com/watch?v=pk_7MVqpnIk