Date: Mon 18 Jul 1988 00:28-EDT From: AIList Moderator Nick Papadakis Reply-To: AIList@mc.lcs.mit.edu Us-Mail: MIT Mail Stop 38-390, Cambridge MA 02139 Phone: (617) 253-2737 Subject: AIList Digest V8 #15 To: AIList@mc.lcs.mit.edu Status: R AIList Digest Monday, 18 Jul 1988 Volume 8 : Issue 15 Today's Topics: More on the Soundex algorithm ---------------------------------------------------------------------- Date: 11 Jul 88 18:21:16 GMT From: sundc!netxcom!sdutcher@seismo.css.gov (Sylvia Dutcher) Subject: Re: Soundex algorithm In article <12520@sunybcs.UUCP> stewart@sunybcs.UUCP (Norman R. Stewart) writes: > > The source I've used for Soundex (developed by the >Remington Rand Corp., I believe), is > > Huffman, Edna K. (1972) Medical Record Management. > Berwyn, Illonois: Physicians' Record Company. I've written a soundex program based on the rules in Knuth's _Searching_and_ Sorting. These are also the rules used at the National Archives to sort census data. These rules differ slightly from the ones posted by Mr. Stewart. If you don't need to match anyone else's soundex, the most important rule is to be consistent. I will insert Knuth's rules below. >The algorithm is very simple, 1. Retain the first letter of the name, and drop all occurrances of a, e, h, i, o, u, w, y in other positions. >1: Assign number values to all but the first letter of the >word, using this table > 1 - B P F V 2 - C S K G J Q X Z > 3 - D T 4 - L > 5 - M N 6 - R > 7 - A E I O U W H Y 2. Assign number values as above, except for 7. >2: Apply the following rules to produce a code of one letter and > three numbers. > A: The first letter of the word becomes the initial character > in the code. > B: When two or more letters from the same group occur together > only the first is coded. > C: If two letters from the same group are seperated by an H or > a W, code only the first. 3. If two or more letters with the same code were adjacent in the original name (begore step 1), omit all but the first. > D: Group 7 letters are never coded (this does not include the > first letter in the word, which is always coded). 4. Convert to the form "letter, digit, digit, digit" by adding trailing zeros or dropping rightmost digits. BTW according to the reference in Knuth's book, this algorithm was developed by Margaret Odell and Robert Russell in 1922. >Norman R. Stewart Jr. * How much more suffering is >C.S. Grad - SUNYAB * caused by the thought of death >internet: stewart@cs.buffalo.edu * than by death itself! >bitnet: stewart@sunybcs.bitnet * Will Durant -- Sylvia Dutcher * The likeliness of things NetExpress Communications, Inc. * to go wrong is in direct 1953 Gallows Rd. * proportion to the urgency Vienna, Va. 22180 * with which they shouldn't. ------------------------------ Date: Wed, 13 Jul 88 10:35:06 EDT From: "William J. Joel" Subject: Re: Soundex algorithm /* The following is source code for a Soundex algorithm written in */ /* Waterloo Prolog. */ /* William J. Joel*/ /* Marist College */ /* Poughkeepsie, NY */ /* jzem@marist.bitnet */ key(a,-1). key(b,1). key(c,2). key(d,3). key(e,-1). key(f,1). key(g,2). key(h,-2). key(i,0). key(j,2). key(k,2). key(l,4). key(m,5). key(n,5). key(o,-1). key(p,1). key(q,2). key(r,6). key(s,2). key(t,3). key(u,-1). key(v,1). key(w,-3). key(x,2). key(y,-2). key(z,2). soundex(Name,Code)<- string(Name,Code1) & write(Code1) & soundex1(Code1,A.B.C.D.Rem) & string(Code,A.B.C.D.nil). soundex1(Head.Code1,Head.Code)<- keycode(Head.Code1,Code2) & write(Code2) & reduce(Code2,T.Code3) & write(T.Code3) & eliminate(Code3,Code4) & write(Code4) & append(Code4,0.0.0.nil,Code). reduce(X.(-2).X.Rem,List)<- reduce(X.Rem,List). reduce(X.X.Rem,List)<- reduce(X.Rem,List). reduce(X.Y.Z.Rem,X.List)<- ^X==Z & reduce(Y.Z.Rem,List). reduce(X.Y.Rem,X.List)<- ^X==Y & reduce(Y.Rem,List). reduce(X.nil,X.nil). reduce(nil,nil). eliminate(X.Rem,List)<- lt(X,0) & eliminate(Rem,List). eliminate(X.Rem,X.List)<- gt(X,0) & eliminate(Rem,List). eliminate(nil,nil). keycode(H.T,N.CodeList)<- key(H,N) & keycode(T,CodeList). keycode(nil,nil). append(Head.Tail,List,Head.NewList)<- append(Tail,List,NewList). append(nil,List,List). ------------------------------ Date: 13 Jul 88 17:02:31 GMT From: ihnp4!twitch!anuck!jrl@bloom-beacon.mit.edu (j.r.lupien) Subject: Re: Soundex algorithm >From article <1552@randvax.UUCP>, by leverich@randvax.UUCP (Brian Leverich): > Incidentally, does anyone know if there's been any genealogy applications > built using Prolog or the like? Looks like a logic programming approach > to maintaining relations between individuals might be a big win. -B Not really a C question anymore, but there is such a beast. The United Kingdom imigration and naturalization department has a Prolog implementation for their citizenship status analysis system. Apparently the rules involved are very complicated and perversely interconnected, having to do with date and place of birth, date and place of birth of both parents and their citizenship at the time of your birth as well as now, and probably the phase of the moon etc. I can't give you a reference for this offhand, since I heard about it in a keynote lecture by the MIT AI lab. They should be able to point you in the right direction. John Lupien twitch!mvuxe!anuxh!jrl ------------------------------ End of AIList Digest ********************