org.apache.commons.codec.language
Class MatchRatingApproachEncoder

java.lang.Object
  extended by org.apache.commons.codec.language.MatchRatingApproachEncoder
All Implemented Interfaces:
Encoder, StringEncoder

public class MatchRatingApproachEncoder
extends java.lang.Object
implements StringEncoder

Match Rating Approach (MRA) phonetic algorithm, developed by Western Airlines in 1977. This class is immutable and thread-safe.

Since:
1.8
See Also:
Wikipedia - Match Rating Approach

Field Summary
private static java.lang.String[] DOUBLE_CONSONANT
           
private static int EIGHT
          Constants used mainly for the min rating value.
private static int ELEVEN
          Constants used mainly for the min rating value.
private static java.lang.String EMPTY
           
private static int FIVE
          Constants used mainly for the min rating value.
private static int FOUR
          Constants used mainly for the min rating value.
private static int ONE
          Constants used mainly for the min rating value.
private static java.lang.String PLAIN_ASCII
          The plain letter equivalent of the accented letters.
private static int SEVEN
          Constants used mainly for the min rating value.
private static int SIX
          Constants used mainly for the min rating value.
private static java.lang.String SPACE
           
private static int THREE
          Constants used mainly for the min rating value.
private static int TWELVE
          Constants used mainly for the min rating value.
private static int TWO
          Constants used mainly for the min rating value.
private static java.lang.String UNICODE
          Unicode characters corresponding to various accented letters.
 
Constructor Summary
MatchRatingApproachEncoder()
           
 
Method Summary
(package private)  java.lang.String cleanName(java.lang.String name)
          Cleans up a name by upper-casing it and removing common punctuation, accents, and spaces.
 java.lang.Object encode(java.lang.Object pObject)
          Encodes an Object using the Match Rating Approach algorithm.
 java.lang.String encode(java.lang.String name)
          Encodes a String using the Match Rating Approach (MRA) algorithm.
(package private)  java.lang.String getFirst3Last3(java.lang.String name)
          Gets the first and last 3 letters of a name if it is longer than 6 characters; otherwise returns the name unchanged.
(package private)  int getMinRating(int sumLength)
          Obtains the minimum rating for the summed length of the 2 names.
 boolean isEncodeEquals(java.lang.String name1, java.lang.String name2)
          Determines if two names are homophonous via Match Rating Approach (MRA) algorithm.
(package private)  boolean isVowel(java.lang.String letter)
          Determines if a letter is a vowel.
(package private)  int leftToRightThenRightToLeftProcessing(java.lang.String name1, java.lang.String name2)
          Processes the names from left to right, then from right to left, removing identical letters in the same positions.
(package private)  java.lang.String removeAccents(java.lang.String accentedWord)
          Removes accented letters and replaces them with their non-accented ASCII equivalents; case is preserved.
(package private)  java.lang.String removeDoubleConsonants(java.lang.String name)
          Replaces any double consonant pair with the single letter equivalent.
(package private)  java.lang.String removeVowels(java.lang.String name)
          Deletes all vowels unless the vowel begins the word.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

SPACE

private static final java.lang.String SPACE
See Also:
Constant Field Values

EMPTY

private static final java.lang.String EMPTY
See Also:
Constant Field Values

ONE

private static final int ONE
Constants used mainly for the min rating value.

See Also:
Constant Field Values

TWO

private static final int TWO
Constants used mainly for the min rating value.

See Also:
Constant Field Values

THREE

private static final int THREE
Constants used mainly for the min rating value.

See Also:
Constant Field Values

FOUR

private static final int FOUR
Constants used mainly for the min rating value.

See Also:
Constant Field Values

FIVE

private static final int FIVE
Constants used mainly for the min rating value.

See Also:
Constant Field Values

SIX

private static final int SIX
Constants used mainly for the min rating value.

See Also:
Constant Field Values

SEVEN

private static final int SEVEN
Constants used mainly for the min rating value.

See Also:
Constant Field Values

EIGHT

private static final int EIGHT
Constants used mainly for the min rating value.

See Also:
Constant Field Values

ELEVEN

private static final int ELEVEN
Constants used mainly for the min rating value.

See Also:
Constant Field Values

TWELVE

private static final int TWELVE
Constants used mainly for the min rating value.

See Also:
Constant Field Values

PLAIN_ASCII

private static final java.lang.String PLAIN_ASCII
The plain letter equivalent of the accented letters.

See Also:
Constant Field Values

UNICODE

private static final java.lang.String UNICODE
Unicode characters corresponding to various accented letters. For example, Ú is U acute.

See Also:
Constant Field Values

DOUBLE_CONSONANT

private static final java.lang.String[] DOUBLE_CONSONANT
Constructor Detail

MatchRatingApproachEncoder

public MatchRatingApproachEncoder()
Method Detail

cleanName

java.lang.String cleanName(java.lang.String name)
Cleans up a name:
1. Upper-cases everything
2. Removes some common punctuation
3. Removes accents
4. Removes any spaces

API Usage

Treat this method as private; it has package-private visibility only so that unit tests can call it.

Parameters:
name - The name to be cleaned
Returns:
The cleaned name
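
The four cleaning steps can be sketched with the standard library alone. This is a minimal illustration, not the library source: the class name CleanNameSketch is hypothetical, the punctuation set is an assumption (the documentation only says "some common punctuation"), and accents are stripped with java.text.Normalizer as a stand-in for whatever lookup this class uses internally.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical stand-in for the package-private cleanName step; not the library source.
final class CleanNameSketch {

    static String cleanName(String name) {
        // 1. Upper-case everything.
        String cleaned = name.toUpperCase(Locale.ENGLISH);
        // 2. Remove some common punctuation (this particular set is an assumption).
        cleaned = cleaned.replaceAll("[-&'.,]", "");
        // 3. Remove accents: decompose to base letter + combining mark, then drop the marks.
        cleaned = Normalizer.normalize(cleaned, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
        // 4. Remove any whitespace.
        return cleaned.replaceAll("\\s+", "");
    }

    public static void main(String[] args) {
        System.out.println(cleanName("O'Brien-Smith, Jr."));
    }
}
```

The order mirrors the numbered steps above, so later steps never see lower-case letters or punctuation.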

encode

public final java.lang.Object encode(java.lang.Object pObject)
                              throws EncoderException
Encodes an Object using the Match Rating Approach algorithm. This method exists to satisfy the requirements of the Encoder interface. Throws an EncoderException if the input object is not of type java.lang.String.

Specified by:
encode in interface Encoder
Parameters:
pObject - Object to encode
Returns:
An object (of type java.lang.String) containing the Match Rating Approach code that corresponds to the String supplied.
Throws:
EncoderException - if the parameter supplied is not of type java.lang.String

encode

public final java.lang.String encode(java.lang.String name)
Encodes a String using the Match Rating Approach (MRA) algorithm.

Specified by:
encode in interface StringEncoder
Parameters:
name - String object to encode
Returns:
The MRA code corresponding to the String supplied

getFirst3Last3

java.lang.String getFirst3Last3(java.lang.String name)
Gets the first and last 3 letters of a name if it is longer than 6 characters; otherwise returns the name unchanged.

API Usage

Treat this method as private; it has package-private visibility only so that unit tests can call it.

Parameters:
name - The string to get the substrings from
Returns:
The first 3 and last 3 letters of the input word, concatenated.
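
A minimal sketch of the truncation rule described above, assuming a non-null input. The class name FirstLastSketch is hypothetical and the code is an illustration rather than the library implementation.

```java
// Hypothetical illustration of the "first 3 + last 3" rule; not the library source.
final class FirstLastSketch {

    static String getFirst3Last3(String name) {
        int length = name.length();
        if (length > 6) {
            // Keep the first three and the last three letters, dropping the middle.
            return name.substring(0, 3) + name.substring(length - 3);
        }
        // Names of 6 characters or fewer are returned unchanged.
        return name;
    }

    public static void main(String[] args) {
        System.out.println(getFirst3Last3("KTHRYNS"));
        System.out.println(getFirst3Last3("SMTH"));
    }
}
```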

getMinRating

int getMinRating(int sumLength)
Obtains the minimum rating for the summed length of the 2 names. In essence, the larger the summed length, the smaller the minimum rating. The values are taken strictly from the documentation.

API Usage

Treat this method as private; it has package-private visibility only so that unit tests can call it.

Parameters:
sumLength - The combined length of the 2 strings
Returns:
The min rating value
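
The inverse relationship between summed length and minimum rating can be sketched as a simple threshold lookup. The thresholds below follow the commonly published Match Rating Approach table (as summarized on the Wikipedia page referenced in See Also); they are not copied from this class, the class name MinRatingSketch is hypothetical, and the fallback of 1 for sums above 12 is an assumption.

```java
// Hypothetical threshold lookup based on the published MRA table; not the library source.
final class MinRatingSketch {

    static int getMinRating(int sumLength) {
        if (sumLength <= 4) {
            return 5;
        }
        if (sumLength <= 7) {
            return 4;
        }
        if (sumLength <= 11) {
            return 3;
        }
        if (sumLength == 12) {
            return 2;
        }
        // Assumption: anything longer gets the lowest rating.
        return 1;
    }

    public static void main(String[] args) {
        for (int sum = 2; sum <= 12; sum++) {
            System.out.println(sum + " -> " + getMinRating(sum));
        }
    }
}
```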

isEncodeEquals

public boolean isEncodeEquals(java.lang.String name1,
                              java.lang.String name2)
Determines if two names are homophonous via the Match Rating Approach (MRA) algorithm. Note that the strings are cleaned in the same way as in encode(String).

Parameters:
name1 - First of the 2 strings (names) to compare
name2 - Second of the 2 names to compare
Returns:
true if the encodings are identical, false otherwise.

isVowel

boolean isVowel(java.lang.String letter)
Determines if a letter is a vowel.

API Usage

Treat this method as private; it has package-private visibility only so that unit tests can call it.

Parameters:
letter - The letter under investigation
Returns:
True if a vowel, else false

leftToRightThenRightToLeftProcessing

int leftToRightThenRightToLeftProcessing(java.lang.String name1,
                                         java.lang.String name2)
Processes the names from left to right, then from right to left, removing identical letters in the same positions. Then subtracts the length of the longer remaining string from 6 and returns the result.

API Usage

Treat this method as private; it has package-private visibility only so that unit tests can call it.

Parameters:
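name1 - First of the 2 names to process
name2 - Second of the 2 names to process
Returns:
6 minus the length of the longer string that remains after processing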
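
One way to read the two-pass comparison is sketched below, assuming both inputs are already MRA codes of at most 6 characters. The class and method names are hypothetical, and this two-pass formulation is an interpretation of the description above, not the library's own code: the first pass drops letters that match at the same index from the left, the second pass aligns the leftovers at their right ends and drops matching letters there.

```java
// Hypothetical two-pass reading of the similarity step; not the library source.
final class SimilaritySketch {

    static int similarityRating(String name1, String name2) {
        StringBuilder rest1 = new StringBuilder();
        StringBuilder rest2 = new StringBuilder();

        // Pass 1, left to right: keep only letters that differ at the same index.
        int longest = Math.max(name1.length(), name2.length());
        for (int i = 0; i < longest; i++) {
            boolean in1 = i < name1.length();
            boolean in2 = i < name2.length();
            if (in1 && in2 && name1.charAt(i) == name2.charAt(i)) {
                continue; // identical letter in the same position: removed from both
            }
            if (in1) {
                rest1.append(name1.charAt(i));
            }
            if (in2) {
                rest2.append(name2.charAt(i));
            }
        }

        // Pass 2, right to left: align the leftovers at their ends and remove matches.
        int p = rest1.length() - 1;
        int q = rest2.length() - 1;
        while (p >= 0 && q >= 0) {
            if (rest1.charAt(p) == rest2.charAt(q)) {
                rest1.deleteCharAt(p);
                rest2.deleteCharAt(q);
            }
            p--;
            q--;
        }

        // Subtract the length of the longer remaining string from 6.
        return 6 - Math.max(rest1.length(), rest2.length());
    }

    public static void main(String[] args) {
        System.out.println(similarityRating("BYRN", "BRN"));
    }
}
```

For the codes BYRN and BRN, the first pass leaves YRN and RN, the second pass leaves Y and an empty string, and the result is 6 - 1 = 5.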

removeAccents

java.lang.String removeAccents(java.lang.String accentedWord)
Removes accented letters and replaces them with their non-accented ASCII equivalents. Case is preserved.

See: http://www.codecodex.com/wiki/Remove_accent_from_letters_%28ex_.%C3%A9_to_e%29

Parameters:
accentedWord - The word that may have accents in it.
Returns:
De-accented word
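
The UNICODE and PLAIN_ASCII fields suggest a parallel-string lookup, where the character at index i of one string maps to the character at index i of the other. The sketch below illustrates that idea under stated assumptions: the class name and the two short tables are hypothetical and far smaller than whatever the real constants hold.

```java
// Hypothetical parallel-table lookup; the tables here are illustrative, not the library's.
final class AccentTableSketch {

    // Accented letters: E acute, e acute, U acute, u acute, N tilde, n tilde.
    private static final String ACCENTED = "\u00C9\u00E9\u00DA\u00FA\u00D1\u00F1";
    // Plain equivalents at the same indexes, with case preserved.
    private static final String PLAIN = "EeUuNn";

    static String removeAccents(String accentedWord) {
        StringBuilder out = new StringBuilder(accentedWord.length());
        for (int i = 0; i < accentedWord.length(); i++) {
            char c = accentedWord.charAt(i);
            int pos = ACCENTED.indexOf(c);
            // Swap in the plain letter when the character is in the table.
            out.append(pos >= 0 ? PLAIN.charAt(pos) : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeAccents("Jos\u00E9"));
    }
}
```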

removeDoubleConsonants

java.lang.String removeDoubleConsonants(java.lang.String name)
Replaces any double consonant pair with the single letter equivalent.

API Usage

Treat this method as private; it has package-private visibility only so that unit tests can call it.

Parameters:
name - String to have double consonants removed
Returns:
Single consonant word
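
A minimal sketch of the double-consonant collapse, assuming the input is already upper-cased. The class name and the CONSONANTS string are hypothetical; the real DOUBLE_CONSONANT array may list its pairs differently.

```java
// Hypothetical illustration of collapsing doubled consonants; not the library source.
final class DoubleConsonantSketch {

    // Assumption: every upper-case letter except A, E, I, O and U counts as a consonant.
    private static final String CONSONANTS = "BCDFGHJKLMNPQRSTVWXYZ";

    static String removeDoubleConsonants(String name) {
        String result = name;
        for (int i = 0; i < CONSONANTS.length(); i++) {
            String single = String.valueOf(CONSONANTS.charAt(i));
            // Replace each doubled pair such as "RR" with the single letter "R".
            result = result.replace(single + single, single);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(removeDoubleConsonants("HARRISSON"));
    }
}
```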

removeVowels

java.lang.String removeVowels(java.lang.String name)
Deletes all vowels unless the vowel begins the word.

API Usage

Treat this method as private; it has package-private visibility only so that unit tests can call it.

Parameters:
name - The name to have vowels removed
Returns:
De-voweled word
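
The vowel rule can be sketched as a single pass that always keeps the first letter and drops A, E, I, O and U everywhere else. The class name VowelSketch is hypothetical, and the helper mirrors the isVowel signature documented above without claiming to match its implementation.

```java
import java.util.Locale;

// Hypothetical illustration of the vowel-removal rule; not the library source.
final class VowelSketch {

    static boolean isVowel(String letter) {
        // Case-insensitive check against the five plain vowels.
        return letter.length() == 1 && "AEIOU".contains(letter.toUpperCase(Locale.ENGLISH));
    }

    static String removeVowels(String name) {
        StringBuilder out = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            String letter = name.substring(i, i + 1);
            // The first letter is always kept, even when it is a vowel.
            if (i == 0 || !isVowel(letter)) {
                out.append(letter);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeVowels("EVERETT"));
        System.out.println(removeVowels("SMITH"));
    }
}
```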


commons-codec version 1.8 - Copyright © 2002-2013 - Apache Software Foundation