org.apache.commons.codec.language
Class Nysiis

java.lang.Object
  extended by org.apache.commons.codec.language.Nysiis
All Implemented Interfaces:
Encoder, StringEncoder

public class Nysiis
extends java.lang.Object
implements StringEncoder

Encodes a string into a NYSIIS value. NYSIIS is an encoding used to relate similar names, but it can also be used as a general-purpose scheme to find words with similar phonemes.

NYSIIS features an accuracy increase of 2.7% over the traditional Soundex algorithm.

Algorithm description:

 1. Transcode first characters of name
   1a. MAC ->   MCC
   1b. KN  ->   NN
   1c. K   ->   C
   1d. PH  ->   FF
   1e. PF  ->   FF
   1f. SCH ->   SSS
 2. Transcode last characters of name
   2a. EE, IE          ->   Y
   2b. DT,RT,RD,NT,ND  ->   D
 3. First character of key = first character of name
 4. Transcode remaining characters by following these rules, incrementing by one character each time
   4a. EV  ->   AF  else A,E,I,O,U -> A
   4b. Q   ->   G
   4c. Z   ->   S
   4d. M   ->   N
   4e. KN  ->   N   else K -> C
   4f. SCH ->   SSS
   4g. PH  ->   FF
   4h. H   ->   If previous or next is nonvowel, previous
   4i. W   ->   If previous is vowel, previous
   4j. Add current to key if current != last key character
 5. If last character is S, remove it
 6. If last characters are AY, replace with Y
 7. If last character is A, remove it
 8. Collapse all strings of repeated characters
 9. Add original first character of name as first character of key
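
Steps 1 and 2 above can be sketched with anchored regular expressions. This is a minimal illustration, not the library's source: the class name NysiisAffixSketch, the method transcodeAffixes, and the regular expressions themselves are assumptions made for this example, and it assumes the input is upper-cased before matching. KN is tried before K so that the more specific prefix wins.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/** Illustrative sketch of steps 1 and 2 only (hypothetical helper, not the library's code). */
final class NysiisAffixSketch {

    // Step 1: prefix rules, anchored at the start of the name.
    private static final Pattern MAC = Pattern.compile("^MAC");
    private static final Pattern KN = Pattern.compile("^KN");
    private static final Pattern K = Pattern.compile("^K");
    private static final Pattern PH_PF = Pattern.compile("^(PH|PF)");
    private static final Pattern SCH = Pattern.compile("^SCH");

    // Step 2: suffix rules, anchored at the end of the name.
    private static final Pattern EE_IE = Pattern.compile("(EE|IE)$");
    private static final Pattern DT_ETC = Pattern.compile("(DT|RT|RD|NT|ND)$");

    /** Applies the first-character and last-character rules to a name. */
    static String transcodeAffixes(String name) {
        // Assumption: matching is done on the upper-cased name.
        String s = name.toUpperCase(Locale.ENGLISH);
        s = MAC.matcher(s).replaceFirst("MCC");   // 1a
        s = KN.matcher(s).replaceFirst("NN");     // 1b (before 1c on purpose)
        s = K.matcher(s).replaceFirst("C");       // 1c
        s = PH_PF.matcher(s).replaceFirst("FF");  // 1d, 1e
        s = SCH.matcher(s).replaceFirst("SSS");   // 1f
        s = EE_IE.matcher(s).replaceFirst("Y");   // 2a
        s = DT_ETC.matcher(s).replaceFirst("D");  // 2b
        return s;
    }
}
```

For example, under these assumptions transcodeAffixes("Schmidt") yields "SSSMID": the SCH prefix becomes SSS and the trailing DT becomes D.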
 

This class is immutable and thread-safe.

Since:
1.7
Version:
$Id: Nysiis.java 1380309 2012-09-03 18:53:34Z tn $
See Also:
NYSIIS on Wikipedia, NYSIIS on dropby.com, Soundex

Field Summary
private static char[] CHARS_A
           
private static char[] CHARS_AF
           
private static char[] CHARS_C
           
private static char[] CHARS_FF
           
private static char[] CHARS_G
           
private static char[] CHARS_N
           
private static char[] CHARS_NN
           
private static char[] CHARS_S
           
private static char[] CHARS_SSS
           
private static java.util.regex.Pattern PAT_DT_ETC
           
private static java.util.regex.Pattern PAT_EE_IE
           
private static java.util.regex.Pattern PAT_K
           
private static java.util.regex.Pattern PAT_KN
           
private static java.util.regex.Pattern PAT_MAC
           
private static java.util.regex.Pattern PAT_PH_PF
           
private static java.util.regex.Pattern PAT_SCH
           
private static char SPACE
           
private  boolean strict
          Indicates the strict mode.
private static int TRUE_LENGTH
           
 
Constructor Summary
Nysiis()
          Creates an instance of the Nysiis encoder with strict mode (original form), i.e. encoded strings have a maximum length of 6.
Nysiis(boolean strict)
          Creates an instance of the Nysiis encoder with the specified strict mode: if true, encoded strings have a maximum length of 6; if false, encoded strings may have arbitrary length.
 
Method Summary
 java.lang.Object encode(java.lang.Object obj)
          Encodes an Object using the NYSIIS algorithm.
 java.lang.String encode(java.lang.String str)
          Encodes a String using the NYSIIS algorithm.
 boolean isStrict()
          Indicates the strict mode for this Nysiis encoder.
private static boolean isVowel(char c)
          Tests if the given character is a vowel.
 java.lang.String nysiis(java.lang.String str)
          Retrieves the NYSIIS code for a given String object.
private static char[] transcodeRemaining(char prev, char curr, char next, char aNext)
          Transcodes the remaining parts of the String.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

CHARS_A

private static final char[] CHARS_A

CHARS_AF

private static final char[] CHARS_AF

CHARS_C

private static final char[] CHARS_C

CHARS_FF

private static final char[] CHARS_FF

CHARS_G

private static final char[] CHARS_G

CHARS_N

private static final char[] CHARS_N

CHARS_NN

private static final char[] CHARS_NN

CHARS_S

private static final char[] CHARS_S

CHARS_SSS

private static final char[] CHARS_SSS

PAT_MAC

private static final java.util.regex.Pattern PAT_MAC

PAT_KN

private static final java.util.regex.Pattern PAT_KN

PAT_K

private static final java.util.regex.Pattern PAT_K

PAT_PH_PF

private static final java.util.regex.Pattern PAT_PH_PF

PAT_SCH

private static final java.util.regex.Pattern PAT_SCH

PAT_EE_IE

private static final java.util.regex.Pattern PAT_EE_IE

PAT_DT_ETC

private static final java.util.regex.Pattern PAT_DT_ETC

SPACE

private static final char SPACE
See Also:
Constant Field Values

TRUE_LENGTH

private static final int TRUE_LENGTH
See Also:
Constant Field Values

strict

private final boolean strict
Indicates the strict mode.

Constructor Detail

Nysiis

public Nysiis()
Creates an instance of the Nysiis encoder with strict mode (original form), i.e. encoded strings have a maximum length of 6.


Nysiis

public Nysiis(boolean strict)
Creates an instance of the Nysiis encoder with the specified strict mode:
  true: encoded strings have a maximum length of 6
  false: encoded strings may have arbitrary length

Parameters:
strict - the strict mode
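
The effect of the strict flag on the final key can be sketched as a simple truncation. This is only an illustration of the documented behavior, assuming the limit is applied after the key has been built; the class StrictModeSketch, the method applyStrictMode, and the constant MAX_STRICT_LENGTH are hypothetical names and not part of the library.

```java
/** Illustrative sketch of the documented strict-mode length limit (hypothetical helper). */
final class StrictModeSketch {

    // Hypothetical constant name; the value 6 comes from the constructor documentation.
    private static final int MAX_STRICT_LENGTH = 6;

    /** Returns the key cut to 6 characters in strict mode, or unchanged otherwise. */
    static String applyStrictMode(String key, boolean strict) {
        if (strict && key.length() > MAX_STRICT_LENGTH) {
            return key.substring(0, MAX_STRICT_LENGTH);
        }
        return key;
    }
}
```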
Method Detail

isVowel

private static boolean isVowel(char c)
Tests if the given character is a vowel.

Parameters:
c - the character to test
Returns:
true if the character is a vowel, false otherwise
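
A vowel test consistent with rule 4a (A, E, I, O, U) might look like the following. It is a sketch that assumes the input has already been upper-cased, so lower-case letters are not treated as vowels; the class name VowelSketch is hypothetical.

```java
/** Illustrative sketch of a vowel test over upper-case letters (hypothetical helper). */
final class VowelSketch {

    /** Returns true for A, E, I, O and U; assumes upper-case input. */
    static boolean isVowel(char c) {
        return c == 'A' || c == 'E' || c == 'I' || c == 'O' || c == 'U';
    }
}
```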

transcodeRemaining

private static char[] transcodeRemaining(char prev,
                                         char curr,
                                         char next,
                                         char aNext)
Transcodes the remaining parts of the String. The method operates on a sliding window, looking at 4 characters at a time: [i-1, i, i+1, i+2].

Parameters:
prev - the previous character
curr - the current character
next - the next character
aNext - the after next character
Returns:
a transcoded array of characters, starting from the current position
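
The sliding-window rules 4a to 4i from the class description can be sketched as a chain of checks on the four characters. This is an illustration written from the documented rules, not the library's source; the class name TranscodeRemainingSketch is hypothetical, and the sketch assumes upper-case input and that the returned characters overwrite the input starting at the current position.

```java
/** Illustrative sketch of rules 4a-4i (hypothetical helper, written from the documented rules). */
final class TranscodeRemainingSketch {

    private static boolean isVowel(char c) {
        return c == 'A' || c == 'E' || c == 'I' || c == 'O' || c == 'U';
    }

    /** Returns the replacement characters for the window [prev, curr, next, aNext]. */
    static char[] transcodeRemaining(char prev, char curr, char next, char aNext) {
        // 4a. EV -> AF, else any vowel -> A
        if (curr == 'E' && next == 'V') {
            return new char[] {'A', 'F'};
        }
        if (isVowel(curr)) {
            return new char[] {'A'};
        }
        // 4b-4d. Q -> G, Z -> S, M -> N
        if (curr == 'Q') {
            return new char[] {'G'};
        }
        if (curr == 'Z') {
            return new char[] {'S'};
        }
        if (curr == 'M') {
            return new char[] {'N'};
        }
        // 4e. KN -> N, else K -> C
        if (curr == 'K') {
            return new char[] {next == 'N' ? 'N' : 'C'};
        }
        // 4f. SCH -> SSS
        if (curr == 'S' && next == 'C' && aNext == 'H') {
            return new char[] {'S', 'S', 'S'};
        }
        // 4g. PH -> FF
        if (curr == 'P' && next == 'H') {
            return new char[] {'F', 'F'};
        }
        // 4h. H -> previous character if previous or next is not a vowel
        if (curr == 'H' && (!isVowel(prev) || !isVowel(next))) {
            return new char[] {prev};
        }
        // 4i. W -> previous character if previous is a vowel
        if (curr == 'W' && isVowel(prev)) {
            return new char[] {prev};
        }
        // No rule applies: keep the current character.
        return new char[] {curr};
    }
}
```

The order of the checks matters: the two-character and three-character rules (EV, KN, SCH, PH) are tested before the single-character fallbacks they overlap with.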

encode

public java.lang.Object encode(java.lang.Object obj)
                        throws EncoderException
Encodes an Object using the NYSIIS algorithm. This method is provided in order to satisfy the requirements of the Encoder interface, and will throw an EncoderException if the supplied object is not of type String.

Specified by:
encode in interface Encoder
Parameters:
obj - Object to encode
Returns:
An object (or a String) containing the NYSIIS code which corresponds to the given String.
Throws:
EncoderException - if the parameter supplied is not of type String
java.lang.IllegalArgumentException - if a character is not mapped
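
The Object overload follows a common delegation pattern: check the runtime type, reject anything that is not a String, and forward to the String overload. The sketch below shows that pattern only. To stay self-contained it uses a hypothetical SketchEncoderException in place of the library's EncoderException, and its String overload is a placeholder that merely upper-cases the input rather than computing a NYSIIS code.

```java
import java.util.Locale;

/** Illustrative sketch of the Object-to-String delegation pattern (hypothetical names throughout). */
final class ObjectEncodeSketch {

    /** Hypothetical stand-in for the library's checked EncoderException. */
    static final class SketchEncoderException extends Exception {
        private static final long serialVersionUID = 1L;

        SketchEncoderException(String message) {
            super(message);
        }
    }

    /** Placeholder for the String overload; it only upper-cases the input. */
    static String encode(String str) {
        return str == null ? null : str.toUpperCase(Locale.ENGLISH);
    }

    /** Rejects non-String arguments, then delegates to the String overload. */
    static Object encode(Object obj) throws SketchEncoderException {
        if (!(obj instanceof String)) {
            throw new SketchEncoderException("Parameter supplied is not of type java.lang.String");
        }
        return encode((String) obj);
    }
}
```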

encode

public java.lang.String encode(java.lang.String str)
Encodes a String using the NYSIIS algorithm.

Specified by:
encode in interface StringEncoder
Parameters:
str - A String object to encode
Returns:
A NYSIIS code corresponding to the String supplied
Throws:
java.lang.IllegalArgumentException - if a character is not mapped

isStrict

public boolean isStrict()
Indicates the strict mode for this Nysiis encoder.

Returns:
true if the encoder is configured for strict mode, false otherwise

nysiis

public java.lang.String nysiis(java.lang.String str)
Retrieves the NYSIIS code for a given String object.

Parameters:
str - String to encode using the NYSIIS algorithm
Returns:
A NYSIIS code for the String supplied


commons-codec version 1.7-SNAPSHOT - Copyright © 2002-2013 - Apache Software Foundation