Academic year: 2022

(1)

Neural Prosthetic Engineering

Today: Oct. 26th

• Question on projects?

• Review

• Telemetry

• Formant-based speech processing

• Speech processing strategies (continued)

• CA

• Lessons learned

• CIS

• CIS-based

• Fine structure

(2)

Review

(3)

telemetry

(4)

Data Telemetry – Inductive Link

• Downlink using PWM scheme

[Figure: block diagram of the inductive downlink. PWM data is modulated and amplified, sent across the link, and appears as the received signal (voltage at the implanted coil). Power path: rectifier and regulator produce the recovered power for the load. Data path: envelope detector/comparator produces the recovered data, which generates the biphasic current.]
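The data path above (envelope detector followed by a comparator) can be illustrated with a small simulation. The slide does not state the bit encoding, so this is only a sketch under assumed parameters: each bit occupies a fixed period, and the hypothetical duty cycles are 75% for a 1 and 25% for a 0. All names and constants are illustrative.

```python
# Sketch of PWM data recovery. Duty cycles and sample counts are assumptions,
# not values taken from the lecture.
SAMPLES_PER_BIT = 20                 # hypothetical samples per bit period
DUTY_ONE, DUTY_ZERO = 0.75, 0.25     # hypothetical duty cycles for 1 and 0


def pwm_encode(bits):
    """Return the on/off envelope (1.0 or 0.0) of the transmitted carrier."""
    envelope = []
    for bit in bits:
        high = int(SAMPLES_PER_BIT * (DUTY_ONE if bit else DUTY_ZERO))
        envelope += [1.0] * high + [0.0] * (SAMPLES_PER_BIT - high)
    return envelope


def pwm_decode(envelope, threshold=0.5):
    """Comparator plus pulse-width measurement: one decision per bit period."""
    bits = []
    for start in range(0, len(envelope), SAMPLES_PER_BIT):
        period = envelope[start:start + SAMPLES_PER_BIT]
        high = sum(1 for v in period if v > threshold)  # comparator output count
        bits.append(1 if high > SAMPLES_PER_BIT / 2 else 0)
    return bits


data = [1, 0, 1, 1, 0]
# Model the link loss as attenuation plus a small offset on the detected envelope.
received = [0.8 * v + 0.05 for v in pwm_encode(data)]
recovered = pwm_decode(received)
```

The comparator threshold sits between the attenuated "on" level and the offset, so the pulse width, and therefore the bit, survives the amplitude change.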

(5)

Backward telemetry

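[Figure: backward telemetry schematic and timing diagram. Schematic: a unity-gain (Gain = 1) stage, an amplifier (AMP), and a comparator (COMP) with reference Vref, switches s1, s2, and s3, input Vin, sampled voltage Vs, and output out. Timing diagram: s1, s2, s3, Vs, Vref, and out versus time. The forward-link blocks (PWM data, modulated and amplified signal, received signal, rectifier/regulator, envelope detector, recovered data, recovered power, load) are shown again with the note "AMP ON!".]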

(6)

Strategies for Representing Speech Information with Cochlear Implants

(7)

Making of voice: Vocal Tract

[Figure: anatomy of the vocal tract, labeling the nostril, nasal cavity, hard palate, velum (soft palate), alveolar ridge, lips, teeth, tongue, larynx, glottis, vocal folds, ventricular fold, aryepiglottic fold, and trachea.]

• The sound source is the vocal folds in the larynx.

• The vocal tract is the cavity where the sound is filtered.

• The vocal tract consists of the laryngeal cavity, the pharynx, the oral cavity, and the nasal cavity.

• The average length of the vocal tract in adult humans is about 17 cm (male) and 14 cm (female).
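As a rough check on what these lengths imply for filtering, the vocal tract can be approximated as a uniform tube closed at the glottis and open at the lips, whose resonances fall at odd multiples of c/(4L). Both the uniform-tube idealization and the speed of sound used below are assumptions, not part of the lecture.

```python
SPEED_OF_SOUND = 350.0  # m/s, assumed value for warm, humid air


def tube_resonances(length_m, count=3):
    """Resonances of a uniform tube closed at one end: (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * SPEED_OF_SOUND / (4.0 * length_m)
            for n in range(1, count + 1)]


male = tube_resonances(0.17)    # 17 cm vocal tract
female = tube_resonances(0.14)  # 14 cm vocal tract
```

Under these assumptions the shorter tract has proportionally higher resonances, since every resonance scales with 1/L.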

(8)

Vocal fold at low and high pitches

http://www.vowelsandconsonants3e.com/chapter_2.html

https://www.youtube.com/watch?v=v9Wdf-RwLcs

120 Hz and 200 Hz

(9)

Sound: Voiced or unvoiced

• Voicing means the vocal folds vibrate as air is forced through the larynx.

• All the vowels are voiced sounds.

• Consonants are voiced or unvoiced sounds.

• Voiced sounds are resonant (vibrant).

• Unvoiced sounds are noisy.

(10)

Articulation in Vocal Tract

• Place of articulation: where the vocal tract is shut off or narrowed

• Manner of articulation: how the vocal tract is articulated

• Voicing: whether air is forced through the larynx

(11)

Articulation for Vowels

• Place of articulation: high (u), mid (o), low (a)

• Shape of the lips: rounded (o) or not (i)

Wikipedia, Wikimedia 2016

(12)

Articulation for Consonants

• Stop (plosive): a consonant in which airflow is completely blocked for a short time: [p], [t], [k] / [b], [d], [g]

• Nasal: made by lowering the velum and allowing air to pass into the nasal cavity: [m], [n], [ŋ]

• Fricative: airflow is constricted but not cut off completely: [s]/[z]

• Affricate: a stop followed immediately by a fricative: [ts]/[dj]

• Liquid: a consonant in which the tongue produces a partial closure in the mouth, resulting in a resonant, vowel-like consonant: [l], [r]

• Glide: a consonant with no stop or friction, consisting of a glide (a quick, smooth movement) towards a following vowel: [w], [y]

(13)

Formants in spectrogram

• Distinctive frequency components of the sound

• Peaks in the amplitude/frequency spectrum (spectrogram)

• The formant with the lowest frequency is called F1, the second F2, and the third F3.

• Most often the first two formants, F1 and F2, are enough to disambiguate the vowel.

• An interactive demonstration of this can be found at http://auditoryneuroscience.com/topics/two-formant-artificial-vowels

(14)

Formants of consonants

• Nasal and liquid consonants have an added formant (F3) at higher frequencies

• Plosives and fricatives modify the placement of the formants of the neighboring vowels

• Bilabial sounds (b, p) cause lowering of the formants

• Velar sounds (k and g) show F2 and F3 coming together

• Alveolar sounds (t and d) cause less systematic changes in neighboring vowel formants

(15)

Formants

• The component sounds that build up the phrase "A bird in the hand is worth two in the bush".

http://www.vowelsandconsonants3e.com/chapter_7.html#

(16)

Frequencies of sounds

• C1 32.70 Hz (lowest C on a standard 88-key piano)

• C4 261.63 Hz (middle C on an 88-key piano)

• C6 1046.50 Hz (highest note reproducible by the average female human voice)

• C8 4186 Hz (highest note on an 88-key piano)

https://www.youtube.com/watch?v=qNf9nzvnd1k
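The note frequencies listed above follow from twelve-tone equal temperament, where each semitone multiplies the frequency by 2^(1/12). The sketch below assumes the common A4 = 440 Hz reference and MIDI note numbering (C4 = 60); neither is stated on the slide.

```python
A4_HZ = 440.0   # assumed concert-pitch reference
A4_MIDI = 69    # MIDI note number of A4


def note_frequency(octave, semitone_in_octave=0):
    """Equal-tempered frequency; semitone 0 is C, and C4 has MIDI number 60."""
    midi = 12 * (octave + 1) + semitone_in_octave
    return A4_HZ * 2.0 ** ((midi - A4_MIDI) / 12.0)


c_notes = {f"C{k}": round(note_frequency(k), 2) for k in (1, 4, 6, 8)}
```

Each octave doubles the frequency, so C8 is 2^7 = 128 times C1.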

(17)

Sound Waveforms: Voiced or unvoiced

40 msec view

http://clas.mq.edu.au/speech/acoustics/waveforms/speech_waveforms.html

(18)

Vocoder

• Vocoder (voice coder)

• Invented by Dudley in the 1930s

• A means of reproducing an intelligible facsimile of a voice for recorded messages on telephone systems

• Analysis (encoding) stage / synthesis (decoding) stage

• A limited set of parameters is extracted from the speech input in the analysis stage and transmitted to the receiver

• The information rate required to transmit the parameters is much lower than that required to transmit the unprocessed speech signal

(19)

Model for Voice Coding

[Figure: model for voice coding. A periodic wave generator (voiced sound, set by the fundamental frequency) or a random noise generator (unvoiced sound) excites the vocal tract filter.]

(20)

Channel Vocoder : analysis part

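[Figure: block diagram of the channel vocoder analysis part. Speech feeds n channels, each consisting of a band-pass filter, rectifier, low-pass filter, and A/D converter, plus a pitch detector (fundamental frequency) and a voicing detector. A multiplexer combines all outputs onto the channel.]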

• The voicing detector determines whether the sound is voiced or not

• The pitch detector determines the frequency of the glottal openings for voiced sounds

• The configuration of the vocal tract is found with a bank of band-pass filters and envelope detectors (rectifiers and low-pass filters).

• This analysis provides information about the vocal tract at intervals of 5-30 ms.
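The band-pass, rectifier, and low-pass chain described above can be sketched in a few lines. This is a minimal illustration, not the lecture's implementation: the sampling rate, filter orders, Q, envelope cutoff, channel centers, and 20 ms frame length are all assumed values, and the band-pass filter uses the widely published RBJ biquad coefficients.

```python
import math

FS = 8000  # Hz, assumed sampling rate


def bandpass(signal, center_hz, q=4.0):
    """Second-order band-pass filter (RBJ biquad with 0 dB peak gain)."""
    w0 = 2.0 * math.pi * center_hz / FS
    alpha = math.sin(w0) / (2.0 * q)
    b0, b2 = alpha, -alpha                       # b1 is zero for this filter
    a0, a1, a2 = 1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha
    out, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for x in signal:
        y = (b0 * x + b2 * x2 - a1 * y1 - a2 * y2) / a0
        out.append(y)
        x1, x2, y1, y2 = x, x1, y, y1
    return out


def envelope(signal, cutoff_hz=50.0):
    """Full-wave rectifier followed by a one-pole low-pass filter."""
    k = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / FS)
    out, y = [], 0.0
    for x in signal:
        y += k * (abs(x) - y)
        out.append(y)
    return out


def analyze(signal, centers_hz, frame_ms=20):
    """Return, for each channel, one envelope sample per frame."""
    step = FS * frame_ms // 1000
    return [envelope(bandpass(signal, c))[step - 1::step] for c in centers_hz]


# 200 ms test tone at 1000 Hz: the 1000 Hz channel should dominate.
tone = [math.sin(2.0 * math.pi * 1000.0 * n / FS) for n in range(FS // 5)]
channels = analyze(tone, centers_hz=[500.0, 1000.0, 2000.0])
```

Sampling each envelope once per frame is what keeps the transmitted information rate low: only a handful of slowly varying numbers per frame describe the spectral shape.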

(21)

Channel Vocoder: synthesis part

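[Figure: block diagram of the channel vocoder synthesis part. A demultiplexer separates the channel into voicing information, pitch (fundamental frequency, F0), and n channel envelopes. The voicing information and pitch control a noise source and a voice source; the envelopes pass through D/A converters to n band-pass filters.]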

• A synthesized speech signal is formed by summing the outputs of the band-pass filters.

• Voicing information is a binary indication.

• Each channel output is a smoothed envelope energy.

(22)

Speech Processing Strategies

(23)

Formant based speech Processing Strategies

Vocoder theory and models played major roles in the early designs.

Fundamental Frequency (F0) and two formants (F1 and F2) are used

F0 is the fundamental frequency and determines the stimulation rate

F1 gives information about vowels

F2 gives information about consonants

(24)

Speech Processing Strategies – F0/F1/F2

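[Figure: block diagram of the F0/F1/F2 strategy. The microphone (MIC) feeds an automatic gain control (AGC), followed by three paths. A 270 Hz low-pass filter and zero-crossing detector give F0, which sets the pulse rate. A 300-1000 Hz filter with a zero-crossing detector and an envelope detector gives F1 and its amplitude A1. A 1000-3000 Hz filter with a zero-crossing detector and an envelope detector gives F2 and its amplitude A2. Pulse generators drive electrodes arranged from apex to base.]

P. C. Loizou (IEEE Engineering in Medicine and Biology, 1999)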
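The zero-crossing detectors in this strategy estimate the dominant frequency in a band by counting sign changes. The sketch below shows one simple way to do that on a band-limited signal; the sampling rate and the 700 Hz test tone are made-up values, and a real processor would run this on the filter outputs frame by frame.

```python
import math


def zero_crossing_frequency(samples, fs):
    """Estimate frequency from the spacing of positive-going zero crossings."""
    crossings = [i for i in range(1, len(samples))
                 if samples[i - 1] < 0.0 <= samples[i]]
    if len(crossings) < 2:
        return 0.0  # not enough crossings to measure a period
    periods = len(crossings) - 1
    return periods * fs / (crossings[-1] - crossings[0])


fs = 16000  # Hz, assumed sampling rate
# 100 ms of a 700 Hz tone standing in for the output of the 300-1000 Hz filter.
tone = [math.sin(2.0 * math.pi * 700.0 * n / fs + 0.3) for n in range(fs // 10)]
f1_estimate = zero_crossing_frequency(tone, fs)
```

Because the estimate depends only on sign changes, added noise can insert spurious crossings and bias the result upward, which is one way formant estimates degrade in noise.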

(25)

MPEAK Speech Processing Strategy

• In addition to formant information, MPEAK extracts channels of higher-frequency information from speech

• MPEAK, like the F0/F1/F2 strategy, tends to make errors in formant extraction in noisy environments

(26)

Speech Processing Strategies – MPEAK

[Figure: block diagram of the MPEAK strategy. The microphone (MIC) feeds an automatic gain control (AGC). A 270 Hz low-pass filter with zero-crossing detector/envelope detector gives F0, which sets the pulse rate. A 300-1000 Hz filter with zero-crossing detector/envelope detector gives F1 and A1. An 800-4000 Hz filter with zero-crossing detector/envelope detector gives F2 and A2. Three further filters with envelope detectors cover 2-2.8 kHz, 2.8-4 kHz, and 4-6 kHz and feed Electrodes 7, 4, and 1. A pulse generator drives the electrodes.]

P. C. Loizou (IEEE Engineering in Medicine and Biology, 1999)

(27)

Recent Speech Processing Strategies

• Compressed Analog (CA)

• Continuous Interleaved Sampling (CIS)

• ACE and SPEAK (Cochlear)

• Harmony HiRes Virtual Channels (Clarion)

(28)

Speech Processing Strategies - CA

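[Figure: block diagram of the compressed analog (CA) strategy. The microphone (MIC) feeds an automatic gain control (AGC), then four band-pass filters (channels 1-4), each followed by a current source. Signals are labeled s(t), s'(t), x(t), and i(t). Inset: magnitude responses of filters 1-4, magnitude in dB (about -16 to 4 dB) versus frequency (0.1-10 kHz).]

B. Wilson et al. (Nature, 1991)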

(29)

Lessons learned

• Lessons learned from the formant-based strategies and the CA strategy.

• CI users perceive much less information than normal-hearing listeners.

• Perception of electrical stimuli is different from perception of acoustic stimuli.

• Pitch saturation limit: typically around 300 pulses/s for electrical pulses or 300 Hz for electrical sinusoids. Higher rates or frequencies do not produce increases in pitch.

• In normal hearing, different pitches are heard over much wider ranges of rates or frequencies (up to ~5 kHz), probably through combinations of rate and place cues (volley theory and place theory).

(30)

Theories

• Place Code Theory

• Time (Rate) Code Theory

• Volley Theory

[Figure: volley principle of hearing. Source: Wikipedia, File:Volley Principle of Hearing.png]

(31)

CIS (Continuous Interleaved Sampling)

• Pulsatile processing

• Biphasic pulse trains are delivered to the electrodes in a non-simultaneous (interleaved) pattern.

• Not patented

• Commercial devices use modified versions of CIS
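The interleaving idea can be made concrete with a simple timing calculation: within one stimulation cycle each electrode gets its own time slot, so no two biphasic pulses overlap. The channel count, per-channel rate, and phase duration below are hypothetical values chosen for illustration, and the function name is not from any device specification.

```python
def cis_schedule(n_channels, rate_per_channel_hz, phase_us):
    """Return (start_us, channel) pairs for one stimulation cycle.

    Each biphasic pulse lasts 2 * phase_us, and channels fire one after
    another in equal time slots so that pulses never overlap.
    """
    cycle_us = 1e6 / rate_per_channel_hz
    slot_us = cycle_us / n_channels
    if 2 * phase_us > slot_us:
        raise ValueError("pulses would overlap: lower the rate or shorten the phase")
    return [(ch * slot_us, ch) for ch in range(n_channels)]


# Hypothetical setup: 8 channels, 1000 pulses/s per channel, 40 us per phase.
schedule = cis_schedule(n_channels=8, rate_per_channel_hz=1000.0, phase_us=40.0)
```

The check inside the function shows the basic trade-off: more channels or higher per-channel rates shrink the slot, which limits how long each pulse can be.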

(32)

Speech Processing Strategies - CIS

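[Figure: block diagram of the CIS strategy. A pre-amp feeds n parallel bands (BPF 1 to BPF n). Each band has a rectifier/low-pass filter (band envelope), a nonlinear map (compression), and a multiplier that modulates a pulse train, driving electrodes EL-1 to EL-n. Stage labels: linear filter, band envelope, compression, modulation.]

B. Wilson et al. (Nature, 1991)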

(33)

Speech Processing Strategies – n of m ,SPEAK, ACE

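[Figure: block diagram of the n-of-m, SPEAK, and ACE strategies. The microphone (MIC) and pre-amp feed m band-pass filters, followed by envelope extraction, the "n-of-m" map (select n peaks from m bands in a frame; m inputs, n outputs), amplitude compression, pulse modulation (multipliers), V/I current sources, and the electrodes.]

F. G. Zeng et al. (IEEE Reviews in Biomedical Engineering, 2008)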

(34)

Speech Processing Strategies – n of m ,SPEAK, ACE

• The pre-processing is similar to the CIS strategy

• The n-of-m strategy has a greater number of band-pass filters

• The SPEAK strategy selects the 6-8 largest peaks and has a fixed per-channel rate of 250 Hz

• The ACE strategy has a larger range of peak selection (8-12) and a higher rate (900-1200 Hz) than the SPEAK strategy

F. G. Zeng et al. (IEEE Reviews in Biomedical Engineering, 2008)
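The peak-selection step of an n-of-m strategy amounts to keeping the n bands with the largest envelopes in each frame. The sketch below shows that selection and the resulting total pulse rates for the SPEAK and ACE figures quoted above; the envelope values are made up for illustration.

```python
def select_n_of_m(envelopes, n):
    """Return the sorted indices of the n bands with the largest envelopes."""
    ranked = sorted(range(len(envelopes)), key=lambda i: envelopes[i], reverse=True)
    return sorted(ranked[:n])


# One frame of m = 8 made-up band envelopes; keep the n = 3 largest.
frame = [0.10, 0.80, 0.30, 0.05, 0.60, 0.20, 0.90, 0.40]
active = select_n_of_m(frame, n=3)

# Total stimulation rate = selected channels per frame * per-channel rate.
speak_total = (6 * 250, 8 * 250)      # pulses/s, from the SPEAK figures
ace_total = (8 * 900, 12 * 1200)      # pulses/s, from the ACE figures
```

Only the selected bands are stimulated in a given frame, so the electrodes in use change from frame to frame as the spectrum changes.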

(35)

Summary

• Speech processing strategies advance with time

• Formant based

• CA

• CIS

• Need to implement finer features (more detailed sounds)

• Tonal languages

• Music

(36)

Discussion: Fine Structure Representation

Typical frequency range of CI filters: 300-8000 Hz

Normal audible frequency range: 20-20,000 Hz

Low-frequency cues (20-50 Hz) give prosody information (stress, syllabification): "envelope cues"

Mid-frequency cues (50-500 Hz) give segmental information such as consonant manner, voicing, and intonation: "periodicity cues"

High-frequency cues (600-10,000 Hz) give consonant place and vowel quality: "fine structure cues"

Advanced Bionics HiRes is an example of a speech processing strategy intended to provide better fine structure cues

HiRes samples temporal fluctuations up to 2800 Hz across 16 channels

16 independent current sources enable simultaneous analog stimulation (SAS) as well as CIS

"Current steering" provides virtual channel capability (HiRes 120 = 15 channels × 8 spectral bands per channel)
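The count of 120 virtual channels follows from 15 adjacent electrode pairs with 8 steering positions each. The sketch below enumerates them under an assumed linear current split between the two electrodes of a pair; the actual weighting used in HiRes 120 is not given on the slide, so the weights here are purely illustrative.

```python
def virtual_channels(n_electrodes=16, steps_per_pair=8):
    """List (electrode_a, electrode_b, weight_a, weight_b) per virtual channel.

    The linear split of current between the two electrodes is an assumption.
    """
    channels = []
    for pair in range(n_electrodes - 1):        # 15 adjacent pairs for 16 electrodes
        for step in range(steps_per_pair):
            alpha = step / steps_per_pair       # fraction of current on electrode_b
            channels.append((pair, pair + 1, 1.0 - alpha, alpha))
    return channels


vc = virtual_channels()
```

Shifting the current fraction between two neighboring electrodes moves the peak of the stimulated region between them, which is what allows more pitch percepts than physical contacts.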

[1] J. B. Firszt, HiResolution Sound Processing, www.advancedbionics.com

[2] HiRes Fidelity 120 Sound Processing, Advanced Bionics Technical Report, www.advancedbionics.com

[3] S. Rosen, Temporal information in speech and its relevance for cochlear implants, in Cochlear Implant: Acquisitions and Controversies, ed. B. Fraysse, N. Cochard, pp. 3-26 (1989)

(37)

Related Videos

Hearing CI

https://www.youtube.com/watch?v=00WOao4kpwM

CI simulations

https://www.youtube.com/watch?v=iwbwhfCWs2Q

A day of a CI user

https://www.youtube.com/watch?v=pk_7MVqpnIk
