Previous detail | Up | Next detail |
Detail 6.1.1 | Detail 6.1 | Detail 6.1.3 |
Edit distance is the smallest number of character insertions, deletions and substitutions required to convert one string into another. Two variations of this procedure are implemented in the ChoiceMaker libraries:
Insertions, deletions and substitutions are the only edit steps.
Transpositions of two characters count as one edit step.
Edit distance is an approximate string matching technique that works well both for English and non-English strings.
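The two variations can be illustrated with a minimal dynamic-programming sketch. This is not the ChoiceMaker implementation: the class and method names below are hypothetical, and the second variation is assumed to count only swaps of two adjacent characters as a single step (the "optimal string alignment" form of the algorithm).

```java
// Hypothetical sketch; not part of the ChoiceMaker libraries.
final class EditDistanceSketch {

    /** First variation: insertions, deletions and substitutions only. */
    static int plainDistance(String a, String b) {
        return distance(a, b, false);
    }

    /** Second variation: a swap of two adjacent characters also costs one step (assumption). */
    static int transposingDistance(String a, String b) {
        return distance(a, b, true);
    }

    private static int distance(String a, String b, boolean allowTranspositions) {
        int n = a.length();
        int m = b.length();
        // d[i][j] = edits needed to turn the first i chars of a into the first j chars of b
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) d[i][0] = i; // delete everything
        for (int j = 0; j <= m; j++) d[0][j] = j; // insert everything
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(
                        Math.min(d[i - 1][j] + 1,      // deletion
                                 d[i][j - 1] + 1),     // insertion
                        d[i - 1][j - 1] + cost);       // substitution or match
                if (allowTranspositions && i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    best = Math.min(best, d[i - 2][j - 2] + 1); // adjacent swap
                }
                d[i][j] = best;
            }
        }
        return d[n][m];
    }
}
```

For the pair "SMITH" and "SMIHT", plainDistance gives 2 (two substitutions) while transposingDistance gives 1 (one swap), which is the practical difference between the two variations on common typing errors.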
An EditDistancePredicate returns true if the edit distance between two strings is strictly less than a specified maximum (the maximum itself is excluded). A maximum distance of 3 works well in many applications; this value is available as the manifest constant DEFAULT_DISTANCE. By default, the type TWO algorithm is used to calculate distances, although the type ONE variation may be specified through alternative constructors.
Examples:
Basic construction:
final int maxDistance = EditDistancePredicate.DEFAULT_DISTANCE;
String name = "somePredicateName";
Method m = Cdc1Record.class.getMethod("getLastName", (Class<?>[])null);
Predicate<Cdc1Record> p = new EditDistancePredicate<Cdc1Record>(name, m, maxDistance);
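Jaro-Winkler is a similarity metric that was developed at the U.S. Census Bureau. It weights agreement at the start of two strings more heavily than agreement at the end of the strings. In essence, the algorithm first computes the Jaro similarity:
d = (1/3)*(c/s1) + (1/3)*(c/s2) + (1/3)*((c - t)/c)
where
c = number of matching characters
s1 = length of the first string
s2 = length of the second string
t = number of transpositions (half the number of matching characters that appear in a different order in the two strings)
The score is then increased when the two strings share a common prefix, which is what gives agreement at the start of the strings its extra weight.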
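The computation can be sketched in Java as follows. This is an illustration under stated assumptions, not the ChoiceMaker implementation: the class and method names are hypothetical, the third term uses the commonly published form (c - t)/c, the matching window is half the longer length minus one, and the prefix adjustment uses the conventional scaling factor of 0.1 over at most 4 leading characters.

```java
// Hypothetical sketch; not part of the ChoiceMaker libraries.
final class JaroWinklerSketch {

    /** Jaro similarity in [0, 1]; 1.0 means the strings are identical. */
    static double jaro(String s1, String s2) {
        int len1 = s1.length();
        int len2 = s2.length();
        if (len1 == 0 && len2 == 0) return 1.0;
        if (len1 == 0 || len2 == 0) return 0.0;

        // Characters match only if they are equal and at most 'window' positions apart.
        int window = Math.max(0, Math.max(len1, len2) / 2 - 1);
        boolean[] matched1 = new boolean[len1];
        boolean[] matched2 = new boolean[len2];
        int c = 0; // number of matching characters
        for (int i = 0; i < len1; i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(len2 - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!matched2[j] && s1.charAt(i) == s2.charAt(j)) {
                    matched1[i] = true;
                    matched2[j] = true;
                    c++;
                    break;
                }
            }
        }
        if (c == 0) return 0.0;

        // Walk the matched characters of both strings in order and count disagreements.
        int outOfOrder = 0;
        int j = 0;
        for (int i = 0; i < len1; i++) {
            if (!matched1[i]) continue;
            while (!matched2[j]) j++;
            if (s1.charAt(i) != s2.charAt(j)) outOfOrder++;
            j++;
        }
        double t = outOfOrder / 2.0; // transpositions

        return (c / (double) len1 + c / (double) len2 + (c - t) / c) / 3.0;
    }

    /** Jaro similarity boosted for a shared prefix of up to 4 characters (assumed constants). */
    static double jaroWinkler(String s1, String s2) {
        double jaro = jaro(s1, s2);
        int limit = Math.min(4, Math.min(s1.length(), s2.length()));
        int prefix = 0;
        while (prefix < limit && s1.charAt(prefix) == s2.charAt(prefix)) prefix++;
        return jaro + prefix * 0.1 * (1.0 - jaro);
    }
}
```

With these assumptions, "MARTHA" and "MARHTA" have six matching characters and one transposition, so the Jaro term is (1 + 1 + 5/6)/3, roughly 0.944, and the shared prefix "MAR" raises the final score to roughly 0.961.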
A JaroWinklerPredicate returns true if the Jaro-Winkler similarity between two strings is strictly greater than a specified minimum (the minimum itself is excluded). A minimum similarity of 0.90 works well in many applications; this value is available as the manifest constant DEFAULT_SIMILARITY.
Examples:
Basic construction:
String name = "somePredicateName";
Method m = Cdc1Record.class.getMethod("getLastName", (Class<?>[])null);
Predicate<Cdc1Record> p = new JaroWinklerPredicate<Cdc1Record>(name, m);