
(1)

Neural Prosthetic Engineering

N

B

Today- Oct. 26th

• Question on projects?
• Review
  • Telemetry
  • Formant based Speech Processing
• Speech Processing Strategies (continued)
  • CA
  • Lessons learned
  • CIS
  • CIS based
  • Fine Structure


(2)


Review

(3)


telemetry

(4)


Data Telemetry – Inductive Link

• Downlink using PWM scheme

[Figure: forward telemetry over the inductive link. Signal chain: PWM data, modulated and amplified signal, received signal (voltage at implanted coil). Power path: rectifier and regulator, recovered power, load. Data path: envelope detector/comparator, recovered data, generated biphasic current.]
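As a rough illustration of the data path, the sketch below encodes bits as pulse widths on an amplitude-modulated carrier and recovers them with a rectifier, a one-pole low-pass filter (the envelope detector) and a threshold comparator. The sample rate, carrier frequency, bit period, duty cycles (25% for 0, 75% for 1), modulation depth, filter constant and threshold are all illustrative assumptions, not values from the lecture.

```python
import math

FS = 1_000_000              # simulation sample rate in Hz (assumed)
F_CARRIER = 100_000         # carrier frequency in Hz (assumed)
BIT_SAMPLES = 200           # samples per bit period (assumed)
DUTY = {0: 0.25, 1: 0.75}   # PWM duty cycle for each bit value (assumed)


def pwm_modulate(bits):
    """Amplitude-modulate a carrier: full amplitude during the PWM 'on' time."""
    signal = []
    for i, bit in enumerate(bits):
        on_samples = int(DUTY[bit] * BIT_SAMPLES)
        for k in range(BIT_SAMPLES):
            n = i * BIT_SAMPLES + k
            amplitude = 1.0 if k < on_samples else 0.2
            signal.append(amplitude * math.sin(2 * math.pi * F_CARRIER * n / FS))
    return signal


def envelope(signal, alpha=0.1):
    """Envelope detector: full-wave rectifier followed by a one-pole low-pass."""
    env, y = [], 0.0
    for x in signal:
        y += alpha * (abs(x) - y)
        env.append(y)
    return env


def pwm_demodulate(signal, threshold=0.38):
    """Comparator plus pulse-width measurement: long pulse -> 1, short pulse -> 0."""
    high = [e > threshold for e in envelope(signal)]
    bits = []
    for start in range(0, len(high), BIT_SAMPLES):
        on_fraction = sum(high[start:start + BIT_SAMPLES]) / BIT_SAMPLES
        bits.append(1 if on_fraction > 0.5 else 0)
    return bits
```

Calling `pwm_demodulate(pwm_modulate([1, 0, 1]))` round-trips the bit pattern. A real implant would also recover the bit clock from the signal, which this sketch sidesteps by assuming a known bit period.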

(5)


Backward telemetry

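[Figure: backward telemetry circuit. Schematic labels: Vin, unity-gain amplifier (AMP, Gain = 1), comparator (COMP) with reference V_ref, switches s1, s2 and s3, V_s, out. Timing diagram (versus time): V_ref, s1, s2, s3, V_s, out. System view: PWM data, modulated and amplified signal, received signal, rectifier/regulator, envelope detector, recovered data, recovered power, load, with the annotation "AMP ON!".]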

(6)

Strategies for Representing Speech Information with Cochlear Implants

(7)


Making of voice: Vocal Tract

[Figure: anatomy of the vocal tract. Labels: nostril, nasal cavity, hard palate, velum (soft palate), alveolar ridge, lips, teeth, tongue, larynx, glottis, vocal folds, ventricular fold, aryepiglottic fold, trachea.]

• The sound source is in the larynx (vocal folds).

• The vocal tract is the cavity where sound is filtered.

• The vocal tract consists of the laryngeal cavity, the pharynx, the oral cavity, and the nasal cavity.

• The average length of the vocal tract in adult humans is 17 cm (male) and 14 cm (female).

(8)

Vocal folds at low and high pitches (120 Hz and 200 Hz)

http://www.vowelsandconsonants3e.com/chapter_2.html
https://www.youtube.com/watch?v=v9Wdf-RwLcs

(9)

Sound: Voiced or unvoiced

• Voicing means the vocal folds vibrate as air is forced through the larynx.

• All the vowels are voiced sounds.

• Consonants can be voiced or unvoiced.

• Voiced sounds are resonant (vibrant).

• Unvoiced sounds are noisy.

(10)

Articulation in Vocal Tract

• Place of articulation
  • Where the vocal tract is shut off or narrowed
• Manner of articulation
  • How the airflow is obstructed (for example stopped, constricted, or routed through the nose)
• Voicing
  • Whether the vocal folds vibrate as air is forced through the larynx

(11)


Articulation for Vowels

• Tongue height (place of articulation): high (u), mid (o), low (a)

• Shape of the lips: rounded (o) or not (i)

Wikipedia, Wikimedia 2016

(12)


Articulation for Consonants

• Stop (plosive): airflow is completely blocked for a short time
  • [p], [t], [k] / [b], [d], [g]
• Nasal: made by lowering the velum and allowing air to pass into the nasal cavity
  • [m], [n], [ŋ]
• Fricative: airflow is constricted but not cut off completely
  • [s] / [z]
• Affricate: a stop followed immediately by a fricative
  • [tʃ] / [dʒ]
• Liquid: the tongue produces a partial closure in the mouth, resulting in a resonant, vowel-like consonant
  • [l], [r]
• Glide: no stop or friction; a quick, smooth movement towards a following vowel
  • [w], [j] (written "y" in English)

(13)


Formants in spectrogram


• Distinctive frequency components of the sound

• Peaks in the amplitude/frequency spectrum (spectrogram)

• The formant with the lowest frequency is called F1, the second F2, and the third F3.

• Most often the first two formants, F1 and F2, are enough to disambiguate the vowel.

• An interactive demonstration can be found at http://auditoryneuroscience.com/topics/two-formant-artificial-vowels

(14)


Formants of consonants


• Nasal and liquid consonants have an added formant (F3) at higher frequencies

• Plosives and fricatives modify the placement of the formants of neighboring vowels

• Bilabial sounds (b, p) lower the formants

• Velar sounds (k, g) show F2 and F3 coming together

• Alveolar sounds (t, d) cause less systematic changes in neighboring vowel formants

(15)


Formants

• The component sounds that build up the phrase "A bird in the hand is worth two in the bush".

http://www.vowelsandconsonants3e.com/chapter_7.html#

(16)


Frequencies of sounds

• C1: 32.70 Hz (lowest C on a standard 88-key piano)

• C4: 261.63 Hz (middle C on an 88-key piano)

• C6: 1046.50 Hz (highest note reproducible by the average female human voice)

• C8: 4186 Hz (highest note on an 88-key piano)

https://www.youtube.com/watch?v=qNf9nzvnd1k
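The note frequencies listed above follow from twelve-tone equal temperament, where each semitone multiplies the frequency by 2^(1/12). The short sketch below recomputes them, assuming the usual A4 = 440 Hz reference and MIDI note numbering (C4 = 60); both conventions are assumptions added here, not stated on the slide.

```python
A4_HZ = 440.0    # concert-pitch reference (assumed standard tuning)
A4_MIDI = 69     # MIDI note number of A4


def note_frequency(midi_note):
    """Equal-temperament frequency: each semitone multiplies by 2**(1/12)."""
    return A4_HZ * 2.0 ** ((midi_note - A4_MIDI) / 12.0)


# MIDI numbers of the C notes listed above (12 semitones per octave, C4 = 60).
for name, midi in [("C1", 24), ("C4", 60), ("C6", 84), ("C8", 108)]:
    print(f"{name}: {note_frequency(midi):.2f} Hz")
```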

(17)

Sound Waveforms: Voiced or Unvoiced

40 msec view

http://clas.mq.edu.au/speech/acoustics/waveforms/speech_waveforms.html

(18)


Vocoder

• Vocoder (voice coder)
  • Invented by Dudley in the 1930s
  • A means of reproducing an intelligible facsimile of a voice for recorded messages on telephone systems
• Analysis (encoding) stage / synthesis (decoding) stage
  • The analysis stage extracts a limited set of parameters from the speech input and transmits them to the receiver
  • The information rate required to transmit the parameters is much lower than that required to transmit the unprocessed speech signal
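To put a number on the saving in information rate, the sketch below compares telephone-style PCM (8 kHz sampling, 8 bits per sample, i.e. 64 kbit/s) with a hypothetical channel vocoder. The channel count, bits per parameter and 20 ms frame interval are illustrative assumptions, not figures from the lecture.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample):
    """Bit rate of the unprocessed (PCM) speech signal."""
    return sample_rate_hz * bits_per_sample


def vocoder_bit_rate(n_channels, bits_per_envelope, pitch_bits, voicing_bits, frame_ms):
    """Bit rate when only per-frame vocoder parameters are transmitted."""
    bits_per_frame = n_channels * bits_per_envelope + pitch_bits + voicing_bits
    frames_per_second = 1000.0 / frame_ms
    return bits_per_frame * frames_per_second


raw = pcm_bit_rate(8000, 8)
# Hypothetical parameter set: 16 envelopes of 5 bits, 6-bit pitch, 1-bit voicing.
coded = vocoder_bit_rate(n_channels=16, bits_per_envelope=5,
                         pitch_bits=6, voicing_bits=1, frame_ms=20)
print(f"PCM: {raw} bit/s, vocoder: {coded:.0f} bit/s, ratio: {raw / coded:.1f}x")
```

With these assumed values the parameter stream needs well under a tenth of the PCM rate.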

(19)


Model for Voice Coding

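[Figure: source-filter model for voice coding. A periodic wave generator, set by the fundamental frequency, supplies the source for voiced sounds; a random noise generator supplies the source for unvoiced sounds. The selected source drives the vocal tract filter.]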

(20)


Channel Vocoder : analysis part

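[Figure: analysis part of the channel vocoder. The speech input feeds n parallel channels, each consisting of a bandpass filter, a rectifier, a lowpass filter and an A/D converter. A pitch detector (fundamental frequency) and a voicing detector run alongside. A multiplexer combines all outputs for transmission over the channel.]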

• The voicing detector determines whether the sound is voiced or not.

• The pitch detector determines the frequency of the glottal openings for a voiced sound.

• The configuration of the vocal tract is found with a bank of bandpass filters and envelope detectors (rectifiers followed by lowpass filters).

• This analysis updates the vocal tract information every 5-30 ms.
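The envelope branch of the analysis stage can be sketched in a few lines: each channel is a bandpass filter, a full-wave rectifier and a lowpass filter, sampled once per frame. The code below is a minimal illustration only. The biquad bandpass (coefficients from the widely used Audio EQ Cookbook), the one-pole lowpass, the Q value, the 50 Hz envelope cutoff and the 20 ms frame are assumptions, and the pitch and voicing detectors are left out.

```python
import math


def bandpass(signal, fs, f_center, q=4.0):
    """Second-order bandpass (Audio EQ Cookbook biquad, 0 dB peak gain)."""
    w0 = 2 * math.pi * f_center / fs
    alpha = math.sin(w0) / (2 * q)
    b0, b2 = alpha, -alpha                      # b1 is zero for this filter type
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    out, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for x in signal:
        y = (b0 * x + b2 * x2 - a1 * y1 - a2 * y2) / a0
        out.append(y)
        x1, x2, y1, y2 = x, x1, y, y1
    return out


def envelope(signal, fs, cutoff_hz=50.0):
    """Full-wave rectifier followed by a one-pole lowpass filter."""
    alpha = 1 - math.exp(-2 * math.pi * cutoff_hz / fs)
    out, y = [], 0.0
    for x in signal:
        y += alpha * (abs(x) - y)
        out.append(y)
    return out


def analyze(signal, fs, centers, frame_ms=20.0):
    """Return one list of channel envelope values per analysis frame."""
    hop = int(fs * frame_ms / 1000)
    envs = [envelope(bandpass(signal, fs, fc), fs) for fc in centers]
    return [[env[t] for env in envs] for t in range(hop - 1, len(signal), hop)]
```

For a 500 Hz test tone and channel centers of 250, 500, 1000 and 2000 Hz, the second channel carries the largest envelope in every settled frame, which is the coarse spectral-shape information the multiplexer would transmit.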

(21)


Channel Vocoder: synthesis part

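[Figure: synthesis part of the channel vocoder. A demultiplexer separates the data received from the channel into n D/A-converted channel signals, the voicing information and the pitch (fundamental frequency, F0). A noise source and a voice source provide the excitation, and the n channel signals drive a bank of bandpass filters.]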

• A synthesized speech signal is formed by summing the outputs of the bandpass filters.

• Voicing information is a binary indication.

• Each channel value is a smoothed envelope energy.
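A minimal sketch of the synthesis side, under stated assumptions: for each frame the binary voicing flag selects either a pulse train at F0 or white noise as the excitation, the excitation is scaled by each channel's envelope value, passed through that channel's bandpass filter, and the filter outputs are summed. The biquad bandpass, the 20 ms frame, the Q value and the function and parameter names are illustrative choices, not the lecture's design.

```python
import math
import random


def bandpass(signal, fs, f_center, q=4.0):
    """Second-order bandpass (Audio EQ Cookbook biquad, 0 dB peak gain)."""
    w0 = 2 * math.pi * f_center / fs
    alpha = math.sin(w0) / (2 * q)
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    out, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for x in signal:
        y = (alpha * x - alpha * x2 - a1 * y1 - a2 * y2) / a0
        out.append(y)
        x1, x2, y1, y2 = x, x1, y, y1
    return out


def synthesize(frames, voiced, f0_hz, fs, centers, frame_ms=20.0, seed=0):
    """Rebuild a waveform from per-frame envelopes, voicing flags and F0."""
    rng = random.Random(seed)
    hop = int(fs * frame_ms / 1000)
    n_samples = hop * len(frames)
    excitation, next_pulse = [], 0.0
    for i in range(n_samples):
        k = i // hop
        if voiced[k]:                       # voice source: pulse train at F0
            if i >= next_pulse:
                excitation.append(1.0)
                next_pulse += fs / f0_hz[k]
            else:
                excitation.append(0.0)
        else:                               # noise source for unvoiced frames
            excitation.append(rng.uniform(-1.0, 1.0))
            next_pulse = i + 1.0
    output = [0.0] * n_samples
    for ch, fc in enumerate(centers):
        driven = [frames[i // hop][ch] * excitation[i] for i in range(n_samples)]
        for i, y in enumerate(bandpass(driven, fs, fc)):
            output[i] += y                  # sum of the bandpass filter outputs
    return output
```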

(22)

Speech Processing Strategies

(23)


Formant based speech Processing Strategies


• Vocoder theory and models played major roles in the early designs.
• Fundamental frequency (F0) and two formants (F1 and F2) are used
  • F0 is the fundamental frequency and determines the stimulation rate
  • F1 gives information about vowels
  • F2 gives information about consonants

(24)


Speech Processing Strategies – F0/F1/F2

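[Figure: block diagram of the F0/F1/F2 processor. The microphone (MIC) signal passes through automatic gain control (AGC) into three paths: a 270 Hz low-pass filter and zero-crossing detector give F0, which sets the pulse rate; a 300-1000 Hz filter with zero-crossing and envelope detectors gives F1 and its amplitude A1; a 1000-3000 Hz filter with zero-crossing and envelope detectors gives F2 and its amplitude A2. Two pulse generators drive the electrodes, with the array labelled from (Apex) to (Base).]

P. C. Loizou (IEEE Engineering in Medicine and Biology, 1999)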
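The zero-crossing detectors in this diagram estimate a frequency by counting sign changes: a sinusoid of frequency f crosses zero 2f times per second. The sketch below shows the idea on an already band-limited signal. The function name and the test tone are illustrative assumptions, and a real processor would apply it to the output of each filter over short frames.

```python
import math


def zero_crossing_frequency(signal, fs):
    """Estimate the dominant frequency from the number of sign changes."""
    crossings = sum(
        1 for a, b in zip(signal, signal[1:]) if (a < 0) != (b < 0)
    )
    duration_s = (len(signal) - 1) / fs
    return crossings / (2.0 * duration_s)   # two crossings per period


fs = 16000
tone = [math.sin(2 * math.pi * 150 * n / fs + 0.3) for n in range(fs)]
print(f"estimated F0: {zero_crossing_frequency(tone, fs):.1f} Hz")
```

Because every sign change counts, added noise creates spurious crossings and biases the estimate upward, which is one way to see why such detectors struggle in noisy conditions.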

(25)


MPEAK Speech Processing Strategy


• In addition to formant information, MPEAK extracts higher-frequency information from the speech in additional channels
• MPEAK, like the F0/F1/F2 strategy, tends to make errors in formant extraction in noisy environments

(26)


Speech Processing Strategies – MPEAK

[Figure: block diagram of the MPEAK processor. MIC and AGC feed: a 270 Hz low-pass filter and zero-crossing detector (F0, pulse rate); a 300-1000 Hz filter and zero-crossing detector (F1, A1); an 800-4000 Hz filter and zero-crossing detector/envelope detector (F2, A2); and three envelope-detector bands at 2-2.8 kHz, 2.8-4 kHz and 4-6 kHz. A pulse generator drives the electrodes; Electrode 7, Electrode 4 and Electrode 1 are labelled at the outputs.]

P. C. Loizou (IEEE Engineering in Medicine and Biology, 1999)

(27)


Recent Speech Processing Strategies


• Compressed Analog (CA)
• Continuous Interleaved Sampling (CIS)
• ACE and SPEAK (Cochlear)
• Harmony HiRes Virtual Channels (Clarion)

(28)


Speech Processing Strategies - CA

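[Figure: CA processor. The microphone (MIC) signal passes through AGC and four band-pass filters (1-4) to current sources; waveforms are labelled s(t), s'(t), x(t) and i(t). Inset: magnitude in dB (about -16 to 4) versus frequency (0.1-10 kHz) for filters 1-4.]

B. Wilson et al. (Nature, 1991)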

(29)

Lessons learned

• Lessons learned from the formant-based strategies and the CA strategy:
  • The amount of information perceived by CI users is much less than in normal hearing.
  • Perception of electrical stimuli is different from that of acoustic stimuli.
  • Pitch saturation limit: typically around 300 pulses/s for electrical pulses or 300 Hz for electrical sinusoids. Higher rates or frequencies do not produce increases in pitch.
  • In normal hearing, different pitches are heard over much wider ranges of rates or frequencies (up to ~5 kHz), probably through combinations of rate and place cues ("volley" theory and place theory).

(30)

Theories

• Place Code Theory
• Time (Rate) Code Theory
• Volley Theory

Wikipedia, File:Volley Principle of Hearing.png

(31)


CIS (Continuous Interleaved Sampling)


• Pulsatile processing
  • Biphasic pulse trains are delivered to the electrodes in a non-simultaneous (interleaved) pattern.
• Not patented
  • Commercial devices use modified versions of CIS
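Interleaving means that each electrode gets its own time slot within the stimulation period, so no two electrodes carry current at the same moment. The sketch below builds such a schedule, assuming equal slots, equal-width phases with no inter-phase gap, and example values (8 channels, 1000 pulses/s per channel, 40 µs per phase) that are illustrative rather than taken from the lecture.

```python
def cis_schedule(n_channels, rate_pps, phase_us, n_cycles=1):
    """Return (channel, start_us, end_us) tuples for interleaved biphasic pulses."""
    period_us = 1e6 / rate_pps          # time between pulses on one channel
    slot_us = period_us / n_channels    # time slice available to each channel
    pulse_us = 2 * phase_us             # biphasic: two phases of equal width
    if pulse_us > slot_us:
        raise ValueError("pulses would overlap; lower the rate or shorten the phase")
    schedule = []
    for cycle in range(n_cycles):
        for ch in range(n_channels):
            start = cycle * period_us + ch * slot_us
            schedule.append((ch, start, start + pulse_us))
    return schedule


for ch, start, end in cis_schedule(n_channels=8, rate_pps=1000, phase_us=40):
    print(f"electrode {ch + 1}: {start:.0f}-{end:.0f} us")
```

The check on `pulse_us` makes the trade-off explicit: for a fixed pulse width, more channels or a higher per-channel rate eventually leaves no room to keep the pulses non-simultaneous.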

(32)


Speech Processing Strategies - CIS

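[Figure: CIS processor. A pre-amplifier feeds n channels (1 to n); each channel has a band-pass filter (BPF), a rectifier and low-pass filter (Rect./LPF) and a nonlinear map, followed by a multiplier (X) whose output goes to electrodes EL-1 to EL-n. Stage labels: linear filter, band envelope, compression, modulation.]

B. Wilson et al. (Nature, 1991)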
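The nonlinear map compresses the wide acoustic range of each band envelope into the narrow electrical range between a listener's threshold and comfortable loudness levels. The sketch below uses a logarithmic map, which is one common choice; the function name, the input range and the current levels are illustrative assumptions, not the mapping from the cited paper.

```python
import math


def compress(envelope, x_min, x_max, t_level, c_level):
    """Logarithmic map from an acoustic envelope value to a stimulation level.

    x_min, x_max: acoustic input range (assumed); values outside are clipped.
    t_level, c_level: electrical threshold and comfort levels (assumed units).
    """
    x = min(max(envelope, x_min), x_max)
    fraction = math.log(x / x_min) / math.log(x_max / x_min)
    return t_level + (c_level - t_level) * fraction


# A 60 dB acoustic range (0.001 to 1.0) mapped onto 100-800 (e.g. microamps).
for value in (0.001, 0.01, 0.1, 1.0):
    print(value, round(compress(value, 0.001, 1.0, 100.0, 800.0), 1))
```

Each tenfold increase in envelope adds the same step in output level, so quiet and loud sounds both land inside the usable electrical range.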

(33)


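Speech Processing Strategies – n-of-m, SPEAK, ACE

[Figure: n-of-m processor. MIC and pre-amplifier feed m band-pass filters with envelope extraction; the "n-of-m" map selects n peaks from the m bands in each frame (m inputs, n outputs); the selected envelopes pass through amplitude compression, modulate pulses (X), and drive V/I current sources connected to the electrodes.]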

F.G.Zeng et al., (IEEE Reviews in Biomedical Engineering, 2008)

(34)


Speech Processing Strategies – n-of-m, SPEAK, ACE

• The pre-processing is similar to that of the CIS strategy

• The n-of-m strategy uses a greater number of bandpass filters

• The SPEAK strategy selects the 6-8 largest peaks and has a fixed rate of 250 Hz per channel

• The ACE strategy has a larger range of peak selection (8-12) and a higher rate (900-1200 Hz) than the SPEAK strategy

F.G.Zeng et al., (IEEE Reviews in Biomedical Engineering, 2008)
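The selection step itself is simple: in each frame, keep the n bands with the largest envelopes and stimulate only the corresponding electrodes. The sketch below illustrates it on a made-up 8-band frame; the function names and envelope values are hypothetical.

```python
def select_n_of_m(envelopes, n):
    """Return the indices of the n largest band envelopes, in band order."""
    ranked = sorted(range(len(envelopes)), key=lambda i: envelopes[i], reverse=True)
    return sorted(ranked[:n])


def total_pulse_rate(n_selected, per_channel_rate_hz):
    """Overall stimulation rate when n channels each run at the given rate."""
    return n_selected * per_channel_rate_hz


frame = [0.10, 0.90, 0.30, 0.70, 0.05, 0.60, 0.20, 0.80]   # made-up envelopes, m = 8
print(select_n_of_m(frame, 3))          # bands chosen for this frame
print(total_pulse_rate(6, 250))         # SPEAK-like: 6 peaks at 250 Hz per channel
```

The second function shows why ACE's higher per-channel rate and larger n multiply into a much higher overall pulse rate than SPEAK's.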

(35)


Summary


• Speech processing strategies advance with time
  • Formant based
  • CA
  • CIS
• Need to implement finer features (more detailed sounds)
  • Tonal languages
  • Music

(36)


Discussion: Fine Structure Representation


• Typical frequency range of CI filters: 300-8000 Hz
• Normal audible frequency range: 20-20,000 Hz
• Low-frequency cues (20-50 Hz) give prosody information (stress, syllabification): "envelope cues"
• Mid-frequency cues (50-500 Hz) give segmental information such as consonant manner, voicing, and intonation: "periodicity cues"
• High-frequency cues (600-10,000 Hz) give consonant place and vowel quality: "fine structure cues"
• Advanced Bionics HiRes is an example of a speech processing strategy intended to provide better fine structure cues
  • HiRes samples temporal fluctuations up to 2800 Hz across 16 channels
  • 16 independent current sources enable simultaneous analog stimulation (SAS) as well as CIS
  • "Current steering" provides virtual channel capability (HiRes 120 = 15 channels times 8 spectral bands per channel)

[1] HiResolution Sound Processing, Jill B. Firszt, www.advancedbionics.com
[2] HiRes Fidelity 120 Sound Processing, Advanced Bionics Technical Report, www.advancedbionics.com
[3] Rosen, Temporal information in speech and its relevance for cochlear implants, in Cochlear Implant: Acquisition and Controversies, ed. B. Fraysse, N. Couchard, pp. 3-26 (1989)
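The count of 120 virtual channels follows from 16 electrodes forming 15 adjacent pairs, each steered in 8 steps. The sketch below enumerates them and splits one pulse between the two electrodes of a pair. The evenly spaced weighting factor `alpha` and the function names are assumptions for illustration; the slide does not give the actual weighting scheme.

```python
def virtual_channels(n_electrodes=16, steps_per_pair=8):
    """List (electrode_a, electrode_b, alpha) for every steered channel."""
    channels = []
    for pair in range(n_electrodes - 1):        # 15 adjacent pairs for 16 electrodes
        for step in range(steps_per_pair):
            alpha = step / steps_per_pair       # share of current on electrode_b (assumed spacing)
            channels.append((pair, pair + 1, alpha))
    return channels


def steer(total_current, alpha):
    """Split one pulse between two adjacent electrodes, delivered simultaneously."""
    return (1 - alpha) * total_current, alpha * total_current


print(len(virtual_channels()))      # 15 pairs x 8 steps
print(steer(100.0, 0.25))           # current on electrode_a, electrode_b
```

Steering needs both electrodes of a pair driven at once, which is why the independent current sources matter for this strategy.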