Computer Processing of Human Language
-Computational Linguistics : concerned with the interaction of human language and computers
1. discourse analysis - analysis of written texts and spoken discourse
2. translation - of text and speech
3. communication - between computers and people
4. linguistic theories - modeling and testing
Frequency analysis, Concordances, and Collocations
Frequency analysis : analysis of word frequency in a text
Concordance : specifies each location of a word within the text and its surrounding context (frequency + location / context)
Collocations : checks the occurrence of two or more words within a short space of each other,
to find evidence that the presence of one word affects the occurrence of other words
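The three analyses above can be sketched in a few lines of Python. This is a minimal illustration rather than a standard tool: the function names, the sample text, and the window sizes are all assumptions made for the example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(tokens):
    """Frequency analysis: count how often each word occurs."""
    return Counter(tokens)

def concordance(tokens, keyword, width=2):
    """Concordance: every position of `keyword` plus up to `width` words of context per side."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            hits.append((i, " ".join(left), " ".join(right)))
    return hits

def collocations(tokens, window=2):
    """Collocations: count unordered word pairs occurring within `window` tokens of each other."""
    pairs = Counter()
    for i, tok in enumerate(tokens):
        for other in tokens[i + 1:i + 1 + window]:
            if other != tok:
                pairs[tuple(sorted((tok, other)))] += 1
    return pairs

# Made-up sample text, for illustration only.
text = "Strong tea is good. She drinks strong tea daily. Strong coffee is fine."
tokens = tokenize(text)
freq = word_frequencies(tokens)
kwic = concordance(tokens, "tea")
pairs = collocations(tokens)

print(freq["strong"])            # 3
print(kwic[0])                   # (1, 'strong', 'is good')
print(pairs[("strong", "tea")])  # 3
```

Raw pair counts only hint at a collocation. To show that one word really affects the occurrence of another, the counts would have to be compared with how often the pair is expected to co-occur by chance.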
Information Retrieval and Summarization
Information retrieval : search for items on a particular topic
1. matching web sites are returned
2. results may even be ranked according to the frequency of the search terms
--> Data mining : advanced analysis (a highly evolved retrieval system)
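Frequency-based ranking can be sketched as follows. The example assumes that "frequency" means how often the query words occur in each document; the document names and texts are invented, and real search engines use far more elaborate scoring.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def rank_documents(query, documents):
    """Return (doc_id, score) pairs for documents containing a query term, best first.

    The score is the total number of times the query terms occur in the document.
    """
    terms = set(tokenize(query))
    scored = []
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        score = sum(counts[t] for t in terms)
        if score > 0:
            scored.append((doc_id, score))
    # Highest score first; ties are broken alphabetically by document id.
    return sorted(scored, key=lambda item: (-item[1], item[0]))

# Hypothetical mini-collection, for illustration only.
documents = {
    "page_a": "Parsing assigns structure to sentences. Parsing is hard.",
    "page_b": "Speech synthesis turns text into sound.",
    "page_c": "A parser performs parsing with a grammar.",
}
results = rank_documents("parsing", documents)
print(results)  # [('page_a', 2), ('page_c', 1)]
```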
Summarization :
1. eliminate redundancy
2. identify the most salient features of a body of information
--> Concept vectors : lists of meaningful keywords,
used as an indicator of whether the content should be included in the summary
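A rough sketch of keyword-driven summarization follows. It assumes a concept vector is simply a set of keywords and that a sentence is kept when it contains at least `min_hits` of them; exact repeats are dropped to reduce redundancy. The function names, the threshold, and the sample text are illustrative assumptions, not an established algorithm.

```python
import re

def split_sentences(text):
    """Split text into sentences at ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def summarize(text, concept_vector, min_hits=1):
    """Keep non-redundant sentences that contain at least `min_hits` concept keywords."""
    keywords = {k.lower() for k in concept_vector}
    seen = set()
    summary = []
    for sentence in split_sentences(text):
        words = frozenset(re.findall(r"[a-z']+", sentence.lower()))
        if words in seen:
            continue  # same set of words as an earlier sentence: treat as redundant
        seen.add(words)
        if len(words & keywords) >= min_hits:
            summary.append(sentence)
    return summary

# Made-up text and concept vector, for illustration only.
text = ("Parsers assign structure to sentences. The weather was nice. "
        "Parsers assign structure to sentences. Ambiguity makes parsing hard.")
summary = summarize(text, ["parsers", "parsing", "ambiguity"])
print(summary)
# ['Parsers assign structure to sentences.', 'Ambiguity makes parsing hard.']
```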
Spell checkers
-not perfect yet : if a word is spelled correctly, the checker cannot detect that it has the wrong meaning in the context
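The limitation can be seen in a minimal dictionary-lookup checker. The tiny word list and the function name are assumptions for the example; real spell checkers are more sophisticated, but a pure lookup behaves like this.

```python
import re

def misspelled_words(text, dictionary):
    """Return the words in `text` that are not found in `dictionary`."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in dictionary]

# Toy dictionary, for illustration only.
dictionary = {"i", "want", "to", "go", "over", "their", "there"}

typo = misspelled_words("I want to go over thier", dictionary)
wrong_word = misspelled_words("I want to go over their", dictionary)

print(typo)        # ['thier']  -> a non-word is caught
print(wrong_word)  # []         -> "their" is a real word, so the error goes unnoticed
```

The second sentence should read "over there", but every word in it is spelled correctly, so a lookup-based checker reports nothing.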
Machine Translation
-put in a written text in the source language and receive its equivalent in the target language
-difficulties :
1. no equivalent word in the target language
ex) idioms, metaphors, jargon
2. lexical and syntactic ambiguities, structural disparities, morphological complexities, cross-linguistic differences
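The first difficulty shows up in a toy word-for-word translator. The three-word English-to-Spanish lexicon and the one-entry phrase table below are assumptions for the example, and matching only whole sentences against the phrase table is a deliberate simplification.

```python
def translate(sentence, lexicon, phrase_table=None):
    """Translate word by word, unless the whole sentence is a known phrase.

    Words missing from the lexicon are marked as <word?> instead of being guessed.
    """
    phrase_table = phrase_table or {}
    text = sentence.lower()
    if text in phrase_table:
        return phrase_table[text]
    return " ".join(lexicon.get(word, f"<{word}?>") for word in text.split())

# Toy lexicon and phrase table, for illustration only.
lexicon = {"kick": "patear", "the": "el", "bucket": "cubo"}
idioms = {"kick the bucket": "estirar la pata"}

literal = translate("kick the bucket", lexicon)
idiomatic = translate("kick the bucket", lexicon, idioms)
gap = translate("kick the jargon", lexicon)

print(literal)    # patear el cubo   (word for word; the idiomatic meaning is lost)
print(idiomatic)  # estirar la pata  (the idiom is stored as a unit)
print(gap)        # patear el <jargon?>
```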
Computers that talk and listen
-The ideal computer would be multilingual, but it does not yet exist
-Computational phonetics and phonology
speech recognition : the signal is analyzed into phones / phonemes
speech synthesis : an electronic speaker pronounces the word
-Computational morphology
the computer needs to
1. break words correctly into morphemes,
2. understand their meaning,
3. and know where to put the words in a sentence
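Step 1, breaking words into morphemes, can be sketched as simple affix stripping. The affix and stem lists are toy assumptions, and the sketch ignores spelling changes (such as happy / happi-) and structural ambiguity, which is part of what makes the real task hard.

```python
# Toy morpheme inventories, for illustration only.
PREFIXES = ["un", "re"]
SUFFIXES = ["ness", "able", "ing", "ed", "s"]
STEMS = {"happy", "lock", "read", "walk", "think"}

def segment(word):
    """Return the word as a list of morphemes, or None if no analysis is found."""
    word = word.lower()
    if word in STEMS:
        return [word]
    # Try stripping a prefix, then analyze what remains.
    for prefix in PREFIXES:
        if word.startswith(prefix):
            rest = segment(word[len(prefix):])
            if rest:
                return [prefix + "-"] + rest
    # Try stripping a suffix, then analyze what remains.
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            rest = segment(word[:-len(suffix)])
            if rest:
                return rest + ["-" + suffix]
    return None

print(segment("unlockable"))  # ['un-', 'lock', '-able']
print(segment("rereads"))     # ['re-', 'read', '-s']
print(segment("under"))       # None: "un" is not a prefix here
```

The last call shows why the segmentation has to be checked against known stems: a string that looks like an affix is not always one.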
-Computational syntax
ELIZA : one of the earliest attempts at human-machine communication, using syntax only; input was typed in and output was printed
Circuit Fix-It Shop : a later advance
accepts speech input and gives a spoken response
parser : uses a grammar to assign phrase structure (PS) to a string of words
for ambiguity, either follows one parse at a time (garden path) or tries every parse simultaneously
-> but humans still do better
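The "try every parse" strategy can be sketched with a small exhaustive parser. The toy grammar, lexicon, and function name are assumptions for the example; the parser considers every way of splitting the string and returns all phrase structure trees the grammar allows.

```python
from functools import lru_cache

# Toy grammar: (left category, right category) -> possible parent categories.
BINARY_RULES = {
    ("NP", "VP"): ["S"],
    ("V", "NP"): ["VP"],
    ("VP", "PP"): ["VP"],
    ("Det", "N"): ["NP"],
    ("NP", "PP"): ["NP"],
    ("P", "NP"): ["PP"],
}
# Toy lexicon: word -> possible categories.
LEXICON = {
    "i": ["NP"], "saw": ["V"], "the": ["Det"],
    "man": ["N"], "telescope": ["N"], "with": ["P"],
}

def all_parses(sentence, start="S"):
    """Return every tree of category `start` that covers the whole sentence."""
    words = tuple(sentence.lower().split())

    @lru_cache(maxsize=None)
    def parses(i, j):
        """All (category, tree) analyses of words[i:j]."""
        if j - i == 1:
            return tuple((cat, (cat, words[i])) for cat in LEXICON.get(words[i], []))
        found = []
        for k in range(i + 1, j):  # every split point is tried
            for left_cat, left_tree in parses(i, k):
                for right_cat, right_tree in parses(k, j):
                    for parent in BINARY_RULES.get((left_cat, right_cat), []):
                        found.append((parent, (parent, left_tree, right_tree)))
        return tuple(found)

    return [tree for cat, tree in parses(0, len(words)) if cat == start]

ambiguous = all_parses("I saw the man with the telescope")
plain = all_parses("I saw the man")
print(len(ambiguous))  # 2: the PP attaches either to the VP or to "the man"
print(len(plain))      # 1
```

The parser finds both structures but has no way to choose between them, whereas a human listener usually settles on one reading without effort.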
-Computational pragmatics
-interaction of the "real world" with the language system
-situational knowledge is needed to disambiguate