Date: Tue 12 Jul 1988 00:35-EDT
From: AIList Moderator Nick Papadakis
Reply-To: AIList@mc.lcs.mit.edu
Us-Mail: MIT Mail Stop 38-390, Cambridge MA 02139
Phone: (617) 253-2737
Subject: AIList Digest V8 #4
To: AIList@mc.lcs.mit.edu
Status: RO

AIList Digest            Tuesday, 12 Jul 1988       Volume 8 : Issue 4

Today's Topics:

Queries:
  Soundex algorithm (3 responses)
  Syllables of English (3 responses)

----------------------------------------------------------------------

Date: 8 Jul 88 17:18:30 GMT
From: hubcap!shorne@gatech.edu (Scott Horne)
Subject: Soundex algorithm

Does anyone have a reference to info on the design of the Soundex
algorithm? Source code (whatever language) would be helpful, too.

Advance thanks. (BTW, it's probably best to post, as mail is at best
shaky at this site.)

--Scott Horne
BITNET: PHORNE@CLEMSON (not working; please use another address)
uucp: ....!gatech!hubcap!scarle!{hazel,citron,amber}!shorne
(If that doesn't work, send to cchang@hubcap.clemson.edu)
SnailMail: Scott Horne
           812 Eleanor Dr.
           Florence, SC 29501
VoiceNet: 803 667-9848

------------------------------

Date: 9 Jul 88 04:20:40 GMT
From: wesommer@athena.mit.edu (William Sommerfeld)
Subject: Re: Soundex algorithm

Sorry for the length of this posting..

In article <2130@hubcap.UUCP> shorne@citron writes:
>Does anyone have a reference to info on the design of the Soundex algorithm?

This one is a somewhat superficial article; it contains a short
Apple ][+ BASIC program which implements the soundex algorithm.

@article{soundx,
  AUTHOR="Jacob R. Jacobs",
  TITLE="Finding Words That Sound Alike: The Soundex Algorithm",
  YEAR="1982",
  MONTH="March",
  JOURNAL="Byte"
}

Fortunately, it references the following, which talks about many
algorithms other than just Soundex:

@article{acmsoundex,
  AUTHOR="Patrick A. V. Hill and Geoff R. Dowling",
  TITLE="Approximate String Matching",
  JOURNAL="ACM Computing Surveys",
  VOLUME="12",
  MONTH="December",
  YEAR="1980"
}

>Source code (whatever language) would be helpful, too.

You asked for it, you got it. Don't ask me why it's in BCPL; I didn't
write it (but I'm going to have to convert it to C Real Soon Now, before
the DECSYSTEM-20 that it runs on turns into scrap metal).

structure { SoundXCode^1^4 char }

SoundX(Str) := valof
{   let Value := 0
    let S := vec 40
    CopyString(Str, S)
    RaiseString(S)
    Value<>String.C^1
    let N := 2 and PreviousSoundX := -1
    for i := 2 to S>>String.N do
    {   let Ch := S>>String.C^i
        let ThisSoundX := selecton Ch into
        {   default: 0
            case $F: case $V: 1
            case $C: case $G: case $J: case $K:
            case $Q: case $S: case $X: case $Z: 2
            case $B: case $P: case $D: case $T: 3
            case $L: 4
            case $M: case $N: 5
            case $R: 6
        }
        if ThisSoundX=0 \ ThisSoundX=PreviousSoundX loop
        Value<

[...]

From article <125@gollum.UUCP>, by rolandi@gollum.UUCP (Walter Rolandi):
>
> Can anyone provide me with a list of all the constituent syllables of English?

I've read that there are more than 8000 such syllables (DeFrancis, _The
Chinese Language: Fact and Fantasy_, U. of Hawaii). Good luck compiling
a list! (N.B.: Those are phonetically distinct syllables, not
graphically distinct.)

Incidentally, Japanese has just over 100 syllables.

--Scott Horne
BITNET: PHORNE@CLEMSON (not working; please use another address)
uucp: ....!gatech!hubcap!scarle!{hazel,citron,amber}!shorne
(If that doesn't work, send to cchang@hubcap.clemson.edu)
SnailMail: Scott Horne
           812 Eleanor Dr.
           Florence, SC 29501
VoiceNet: 803 667-9848

------------------------------

Date: 7 Jul 88 16:07:17 GMT
From: uhccux!stampe@humu.nosc.mil (David Stampe)
Subject: Re: syllables of English

If it's possible, rather than occurring, English syllables you want,
you might look at diagrams for possible monosyllables, as in Zellig
Harris, Methods in Structural Linguistics, U. Chicago Press, 195?.

Stressed syllables in polysyllables are a subset of those in monosyllables.
Unstressed syllables are a subset of stressed syllables, unless you take
the consonantal nuclei in rubber, rubble, ribbon, rub'm to be distinct
from the nuclei of brr, bull, bun, bum.

Such diagrams are approximations, since the number of phonemes and
especially the number of possible combinations into syllables differs
somewhat among dialects and individuals. They usually admit hundreds of
pronounceable but very peculiar syllables like trart, klilk, kwuw,
smamp, oyj, awb.

David (stampe@uhccux.uhcc.hawaii.edu)

------------------------------

Date: 8 Jul 88 19:22:34 GMT
From: att!ihlpa!krista@bloom-beacon.mit.edu (Anderson)
Subject: Re: syllables of English

To Walter R.: I tried to send mail, but it bounced.

I don't have a list of English syllables, but I do have a list of
consonant clusters and vowels. If you want it, I'll post it; however,
it is about 250 lines.

Actually, I made the list when I was trying to understand why a Navajo
friend was having trouble with some English words. I wrote all the
English consonant clusters I could think of, including those that occur
only in the *final* positions of words. I came up with about 197
consonants and consonant clusters! And the list is probably not
conclusive. Since Navajo has only about 35 consonants and clusters, of
which about 15 intersect the English set, I gained a lot of sympathy for
anybody learning English as a second language.

I've heard that Polish has a lot of clusters; anybody know how many?
Cherokee has only 13 consonants (no clusters), I seem to recall.
Tlingit (related to Navajo) is reputed to have a great many phonemes
(50 compared to English 35); but these figures do not include clusters.

By the way, Cherokee is about the prettiest language I've ever heard.
It was once a tonal language, but the tones lost their meaning in most
words, at least in the western dialect. However, a light, musical
quality remains.

Shut me up, please! If you want the list, let me know.

Krista Anderson, ihnp4!ihlpa!krista, but we may be shutting down email?

------------------------------

End of AIList Digest
********************
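
For readers following the Soundex thread, here is a minimal Python sketch of
the commonly documented American Soundex rules. It is not a translation of
the BCPL routine posted above: that listing is incomplete in this copy, and
its table groups B and P with D and T, whereas the usual table groups them
with F and V. The names SOUNDEX_CODES and soundex are illustrative
assumptions, as is the choice to return an empty string for input that
contains no letters.

```python
# Commonly documented Soundex letter groups (an assumption of this sketch;
# the BCPL table in the digest differs for B and P).
SOUNDEX_CODES = {}
for letters, digit in (
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
):
    for letter in letters:
        SOUNDEX_CODES[letter] = digit


def soundex(name: str) -> str:
    """Return a 4-character Soundex code, or "" if name has no A-Z letters."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""

    result = letters[0]
    # The first letter's own code counts when collapsing adjacent duplicates.
    previous = SOUNDEX_CODES.get(letters[0], "")

    for ch in letters[1:]:
        if ch in "HW":
            # H and W are skipped without breaking a run of equal codes.
            continue
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != previous:
            result += code
            if len(result) == 4:
                break
        # Vowels (empty code) reset the run, so a repeated code after a
        # vowel is emitted again.
        previous = code

    # Pad short codes with zeros to the fixed length of four.
    return result.ljust(4, "0")
```

With these rules soundex("Robert") and soundex("Rupert") both give "R163",
which is the kind of sound-alike match the original query was after.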