Detail 6.1.3: Phoneticization

Soundex

Soundex is a phoneticization algorithm first patented in 1918 by Robert C. Russell. It has since been standardized and is now used extensively as a technique for finding phonetic matches of English-language strings.

Soundex builds an N-character phonetic code for a String, where N is 4 by default. The slightly simplified procedure is:

  • Convert the String to upper case
  • Keep the first letter
  • From the remaining letters, delete any vowels plus a few consonants (A, E, I, O, U, Y, W, H)
  • Map labial consonants (B, F, P, V) to 1
  • Map guttural and sibilant consonants (C, G, J, K, Q, S, X, Z) to 2
  • Map dental-mute consonants (D, T) to 3
  • Map the palatal-fricative consonant (L) to 4
  • Map the labio-nasal (M) and the lingua-nasal (N) consonants to 5
  • Map the dental-fricative consonant (R) to 6
  • Append zero (0) as necessary to pad the encoding to N characters.

Examples:

  • Mauricio Hernandez -> M620 H653
  • Peter Christen -> P360 C623
  • Agus Pudjijono -> A200 P325
  • Andrew Borthwick -> A536 B632
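
The steps above can be sketched in plain Java as follows. This is a minimal illustration, not the code behind SoundexPredicate: the class name SoundexSketch and its encode method are hypothetical. Two details are assumptions inferred from the examples rather than taken from the listed steps: repeated codes that become adjacent once letters are deleted are collapsed into one (which is why Agus yields A200), and the result is truncated to N characters.

```java
import java.util.Locale;

// Hypothetical helper illustrating the simplified Soundex procedure above.
final class SoundexSketch {

    // Digit for each letter A..Z; '0' marks letters that are deleted.
    private static final String CODES = "01230120022455012623010202";

    static String encode(String s, int n) {
        // Upper-case the input and ignore anything that is not A..Z.
        String up = s.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        if (up.isEmpty()) {
            return "";
        }
        // Keep the first letter as-is.
        StringBuilder out = new StringBuilder().append(up.charAt(0));
        // Assumption: the first letter's own code counts when collapsing repeats.
        char last = CODES.charAt(up.charAt(0) - 'A');
        for (int i = 1; i < up.length() && out.length() < n; i++) {
            char code = CODES.charAt(up.charAt(i) - 'A');
            if (code == '0') {
                continue; // deleted letter; it does not separate repeated codes
            }
            if (code != last) {
                out.append(code);
            }
            last = code;
        }
        // Pad with zeros up to N characters.
        while (out.length() < n) {
            out.append('0');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"Mauricio", "Hernandez", "Peter", "Christen"}) {
            System.out.println(name + " -> " + encode(name, 4));
        }
    }
}
```

Under these assumptions the sketch reproduces the codes listed in the examples. Other Soundex variants treat vowels as separators between repeated codes and would, for instance, produce a different code for Hernandez.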

Basic construction:

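  String name = "somePredicateName";
  // Look up the accessor reflectively (java.lang.reflect.Method);
  // getMethod throws NoSuchMethodException if no such public method exists.
  Method m = Cdc1Record.class.getMethod("getLastName", (Class<?>[])null);
  Predicate<Cdc1Record> p = new SoundexPredicate<Cdc1Record>(name, m);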

Nysiis

The New York State Identification and Intelligence System Phonetic Code, commonly known as NYSIIS, is a phonetic algorithm devised in 1970 that works well for English-language strings.

NYSIIS is a considerably more complicated algorithm than Soundex that builds more precise phonetic codes. In brief, the procedure is:

  • Vowels are usually retained and are mapped to the letter A
  • Similar sounding sequences are mapped to the same code; for example:
    • PH is mapped to F
    • MAC and MC are mapped to MC

Examples:

  • Mauricio Hernandez -> MARAC HARNAND
  • Peter Christen -> PATAR CRASTAN
  • Agus Pudjijono -> AG PADJAJAN
  • Andrew Borthwick -> ANDR BARTWAC
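
For readers who want to see the rules in more detail, the following Java sketch follows the commonly published NYSIIS rule set, without the optional truncation to six characters. It is an illustration only, not the code behind NysiisPredicate: the class name NysiisSketch and its encode method are hypothetical, and published NYSIIS variants differ in small details (for example, how W after a vowel is handled), so other implementations may produce slightly different codes.

```java
import java.util.Locale;

// Hypothetical helper illustrating one common formulation of NYSIIS.
final class NysiisSketch {

    private static boolean isVowel(char c) {
        return "AEIOU".indexOf(c) >= 0;
    }

    static String encode(String input) {
        String s = input.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        if (s.isEmpty()) {
            return "";
        }
        // Rewrite well-known leading sequences.
        if (s.startsWith("MAC")) {
            s = "MCC" + s.substring(3);
        } else if (s.startsWith("KN")) {
            s = "NN" + s.substring(2);
        } else if (s.startsWith("K")) {
            s = "C" + s.substring(1);
        } else if (s.startsWith("PH") || s.startsWith("PF")) {
            s = "FF" + s.substring(2);
        } else if (s.startsWith("SCH")) {
            s = "SSS" + s.substring(3);
        }
        // Rewrite well-known trailing sequences.
        if (s.endsWith("EE") || s.endsWith("IE")) {
            s = s.substring(0, s.length() - 2) + "Y";
        } else if (s.matches(".*(DT|RT|RD|NT|ND)")) {
            s = s.substring(0, s.length() - 2) + "D";
        }
        char[] w = s.toCharArray();
        // The first character of the key is the first character of the name.
        StringBuilder key = new StringBuilder().append(w[0]);
        for (int i = 1; i < w.length; i++) {
            char c = w[i];
            char prev = w[i - 1]; // already rewritten by earlier iterations
            char next = (i + 1 < w.length) ? w[i + 1] : '\0';
            if (c == 'E' && next == 'V') {
                w[i] = 'A';
                w[i + 1] = 'F';
            } else if (isVowel(c)) {
                w[i] = 'A'; // vowels are retained but all become A
            } else if (c == 'Q') {
                w[i] = 'G';
            } else if (c == 'Z') {
                w[i] = 'S';
            } else if (c == 'M') {
                w[i] = 'N';
            } else if (c == 'K') {
                w[i] = (next == 'N') ? 'N' : 'C';
            } else if (c == 'S' && next == 'C' && i + 2 < w.length && w[i + 2] == 'H') {
                w[i + 1] = 'S';
                w[i + 2] = 'S';
            } else if (c == 'P' && next == 'H') {
                w[i] = 'F';
                w[i + 1] = 'F';
            } else if (c == 'H' && (!isVowel(prev) || !isVowel(next))) {
                w[i] = prev; // H not surrounded by vowels takes the previous letter
            } else if (c == 'W' && isVowel(prev)) {
                w[i] = prev; // W after a vowel takes the previous letter
            }
            // Append only if it differs from the last character of the key.
            if (w[i] != key.charAt(key.length() - 1)) {
                key.append(w[i]);
            }
        }
        // Clean up the end of the key: drop S, turn AY into Y, drop A.
        if (key.length() > 1 && key.charAt(key.length() - 1) == 'S') {
            key.setLength(key.length() - 1);
        }
        if (key.length() > 2 && key.toString().endsWith("AY")) {
            key.replace(key.length() - 2, key.length(), "Y");
        }
        if (key.length() > 1 && key.charAt(key.length() - 1) == 'A') {
            key.setLength(key.length() - 1);
        }
        return key.toString();
    }
}
```

With these rules, NysiisSketch.encode("MacDonald") and NysiisSketch.encode("McDonald") yield the same key, which is the effect of mapping MAC and MC to the same code.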

Basic construction:

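  String name = "somePredicateName";
  // Look up the accessor reflectively (java.lang.reflect.Method);
  // getMethod throws NoSuchMethodException if no such public method exists.
  Method m = Cdc1Record.class.getMethod("getLastName", (Class<?>[])null);
  Predicate<Cdc1Record> p = new NysiisPredicate<Cdc1Record>(name, m);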

Metaphone

Double metaphone

