
Detail 6.1: Available predicates

Java representation

A correlation evaluator is an object that computes a correlation signature for a pair of records. A correlation signature, in turn, is a String that represents the degree of correlation between the two records.

public interface CorrelationEvaluator<R> {

  String computeCorrelationSignature(CMPair<? extends R> pair)
    throws EvaluationException;

}

A systematic way of breaking down the degree of correlation between records is to analyze types of correlation between the fields of two records. A predicate is a simple boolean test on a pair of records that checks some condition on one or a small number of fields of the records. For example, a predicate might check whether two fields agree exactly or approximately or not at all.

It is convenient to be able to assign names to predicates. Names are typically used to identify predicates when they are grouped together into collections such as ordered sets.

/**
 * @param <R>
 *            the record type
 */
public interface Predicate<R> {

  String getName();

  boolean evaluate(CMPair<? extends R> pair) throws EvaluationException;

}
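To make field-level predicates concrete, here is a minimal sketch of two predicates on a hypothetical `Person` record: one tests exact agreement of first names, the other approximate agreement of last names (case-insensitive Levenshtein edit distance of at most 1). `Person`, the predicate names, and the distance threshold are illustrative assumptions. The nested `CMPair` and `EvaluationException` classes are hypothetical stand-ins that only make the sketch compile on its own; in particular, the `getLeft()`/`getRight()` accessors are assumed and may not match the real ChoiceMaker class.

```java
/** Illustrative predicates; every type here is a hypothetical stand-in. */
public class PredicateSketch {

  /** Hypothetical stand-in for the ChoiceMaker exception type. */
  public static class EvaluationException extends Exception {}

  /** Hypothetical stand-in for CMPair: simply holds two records. */
  public static class CMPair<R> {
    private final R left;
    private final R right;
    public CMPair(R left, R right) { this.left = left; this.right = right; }
    public R getLeft() { return left; }   // assumed accessor name
    public R getRight() { return right; } // assumed accessor name
  }

  /** Same shape as the Predicate interface above. */
  public interface Predicate<R> {
    String getName();
    boolean evaluate(CMPair<? extends R> pair) throws EvaluationException;
  }

  /** Hypothetical record type with two name fields. */
  public static class Person {
    final String firstName;
    final String lastName;
    public Person(String firstName, String lastName) {
      this.firstName = firstName;
      this.lastName = lastName;
    }
  }

  /** Exact agreement: both first names present and identical. */
  public static final Predicate<Person> FIRST_NAME_EXACT = new Predicate<Person>() {
    public String getName() { return "firstNameExact"; }
    public boolean evaluate(CMPair<? extends Person> pair) {
      String a = pair.getLeft().firstName;
      String b = pair.getRight().firstName;
      return a != null && a.equals(b);
    }
  };

  /** Approximate agreement: last names differ by at most one edit, ignoring case. */
  public static final Predicate<Person> LAST_NAME_APPROX = new Predicate<Person>() {
    public String getName() { return "lastNameApprox"; }
    public boolean evaluate(CMPair<? extends Person> pair) {
      String a = pair.getLeft().lastName;
      String b = pair.getRight().lastName;
      if (a == null || b == null) return false;
      return editDistance(a.toLowerCase(), b.toLowerCase()) <= 1;
    }
  };

  /** Levenshtein distance via dynamic programming, keeping only two rows. */
  static int editDistance(String s, String t) {
    int[] prev = new int[t.length() + 1];
    int[] curr = new int[t.length() + 1];
    for (int j = 0; j <= t.length(); j++) prev[j] = j;
    for (int i = 1; i <= s.length(); i++) {
      curr[0] = i;
      for (int j = 1; j <= t.length(); j++) {
        int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
        curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
      }
      int[] tmp = prev; prev = curr; curr = tmp; // prev now holds row i
    }
    return prev[t.length()];
  }

  public static void main(String[] args) throws EvaluationException {
    CMPair<Person> pair =
        new CMPair<>(new Person("Ann", "Smith"), new Person("Ann", "Smyth"));
    System.out.println(FIRST_NAME_EXACT.evaluate(pair)); // true
    System.out.println(LAST_NAME_APPROX.evaluate(pair)); // true (one substitution)
  }
}
```

A third kind of predicate, "no agreement at all", could be written the same way; each predicate stays a small, independently testable boolean check on one or two fields.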

A simple way to form a correlation evaluator is to copy a set of predicates into a LinkedHashSet. A LinkedHashSet maintains a doubly-linked list running through all of its entries. This list defines the iteration order (the order in which elements were inserted), so the order is reproducible, and a signature can be computed simply by iterating through the predicates in the set.
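The ordering guarantee matters because position i of every signature must always refer to the same predicate. The small sketch below uses plain strings (hypothetical predicate names) in place of predicates to show that a LinkedHashSet iterates in insertion order and that re-inserting an existing element neither duplicates nor moves it.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class OrderDemo {

  /** Copies names into a LinkedHashSet and joins them in iteration order. */
  static String joinInOrder(List<String> names) {
    // Keeps first-insertion order; later duplicates are ignored.
    Set<String> ordered = new LinkedHashSet<>(names);
    return String.join(",", ordered);
  }

  public static void main(String[] args) {
    List<String> names =
        Arrays.asList("lastNameApprox", "firstNameExact", "dobExact", "firstNameExact");
    // The second "firstNameExact" does not move the first one.
    System.out.println(joinInOrder(names)); // lastNameApprox,firstNameExact,dobExact
  }
}
```

One design consequence: the copy constructor adds elements in the source collection's iteration order. If the caller passes a HashSet, whose order is unspecified, the copy freezes whatever order that iteration happened to produce, so callers who need a particular bit layout should pass an ordered set such as a LinkedHashSet or TreeSet.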

import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * @param <R>
 *            the record type to be evaluated (for example, Person)
 */
public class SimpleCorrelationEvaluator<R> implements CorrelationEvaluator<R> {

  public static final char ZERO = '0';
  public static final char ONE = '1';
  private final Set<Predicate<R>> predicates;

  /**
   * Constructs an evaluator from the specified set of predicates. The
   * iteration order of the predicates will be preserved by this evaluator.
   */
  public SimpleCorrelationEvaluator(Set<Predicate<R>> predicates) {
    this.predicates = new LinkedHashSet<Predicate<R>>(predicates);
  }

  public Set<Predicate<R>> getPredicates() {
    return Collections.unmodifiableSet(getPredicatesInternal());
  }

  protected Set<Predicate<R>> getPredicatesInternal() {
    return this.predicates;
  }

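  /**
   * Evaluates each predicate in iteration order and appends ONE if the
   * predicate holds for the pair, ZERO otherwise.
   */
  @Override
  public String computeCorrelationSignature(CMPair<? extends R> pair)
      throws EvaluationException {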
    StringBuilder sb = new StringBuilder();
    for (Predicate<R> p : getPredicatesInternal()) {
      boolean b = p.evaluate(pair);
      char c = b ? ONE : ZERO;
      sb.append(c);
    }
    return sb.toString();
  }

}
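The sketch below runs a condensed copy of SimpleCorrelationEvaluator end to end and then decodes a signature back into predicate names, which is where the predicate names pay off. It uses plain Strings as "records" and two illustrative predicates, "exact" and "sameInitial". The `named` adapter (wrapping a `java.util.function.BiPredicate`), the `decode` helper, and the nested `CMPair`/`EvaluationException` stand-ins with assumed `getLeft()`/`getRight()` accessors are all hypothetical, not part of ChoiceMaker.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiPredicate;

public class SignatureDemo {

  // Hypothetical stand-ins for the ChoiceMaker types used in the text.
  public static class EvaluationException extends Exception {}

  public static class CMPair<R> {
    private final R left;
    private final R right;
    public CMPair(R left, R right) { this.left = left; this.right = right; }
    public R getLeft() { return left; }   // assumed accessor name
    public R getRight() { return right; } // assumed accessor name
  }

  public interface Predicate<R> {
    String getName();
    boolean evaluate(CMPair<? extends R> pair) throws EvaluationException;
  }

  /** Condensed copy of SimpleCorrelationEvaluator (constants inlined). */
  public static class SimpleCorrelationEvaluator<R> {
    private final Set<Predicate<R>> predicates;
    public SimpleCorrelationEvaluator(Set<Predicate<R>> predicates) {
      this.predicates = new LinkedHashSet<>(predicates);
    }
    public Set<Predicate<R>> getPredicates() {
      return Collections.unmodifiableSet(predicates);
    }
    public String computeCorrelationSignature(CMPair<? extends R> pair)
        throws EvaluationException {
      StringBuilder sb = new StringBuilder();
      for (Predicate<R> p : predicates) {
        sb.append(p.evaluate(pair) ? '1' : '0'); // one character per predicate
      }
      return sb.toString();
    }
  }

  /** Hypothetical adapter: a named Predicate backed by a BiPredicate. */
  static <R> Predicate<R> named(String name, BiPredicate<R, R> test) {
    return new Predicate<R>() {
      public String getName() { return name; }
      public boolean evaluate(CMPair<? extends R> pair) {
        return test.test(pair.getLeft(), pair.getRight());
      }
    };
  }

  /** Hypothetical helper: names of the predicates whose position holds '1'. */
  static <R> List<String> decode(SimpleCorrelationEvaluator<R> e, String signature) {
    Set<Predicate<R>> preds = e.getPredicates();
    if (signature.length() != preds.size()) {
      throw new IllegalArgumentException("signature length does not match predicate count");
    }
    List<String> held = new ArrayList<>();
    int i = 0;
    for (Predicate<R> p : preds) {
      if (signature.charAt(i++) == '1') held.add(p.getName());
    }
    return held;
  }

  /** Two illustrative predicates on String "records", in a fixed order. */
  static SimpleCorrelationEvaluator<String> demoEvaluator() {
    Predicate<String> exact = named("exact", String::equals);
    Predicate<String> sameInitial = named("sameInitial",
        (a, b) -> !a.isEmpty() && !b.isEmpty() && a.charAt(0) == b.charAt(0));
    Set<Predicate<String>> preds = new LinkedHashSet<>();
    preds.add(exact);
    preds.add(sameInitial);
    return new SimpleCorrelationEvaluator<>(preds);
  }

  public static void main(String[] args) throws EvaluationException {
    SimpleCorrelationEvaluator<String> e = demoEvaluator();
    String sig = e.computeCorrelationSignature(new CMPair<>("Smith", "Smyth"));
    System.out.println(sig);            // 01
    System.out.println(decode(e, sig)); // [sameInitial]
  }
}
```

With this ordering, "Smith"/"Smith" yields "11" and "Smith"/"Jones" yields "00". Because the decoder walks the same ordered set that produced the signature, it can map each position back to a predicate name without storing any extra metadata.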

Summary of available predicates

The definition of correlation evaluators and predicates is intentionally very general. This project uses ChoiceMaker libraries to implement correlation evaluators and predicates, but the interfaces are designed to accommodate other implementations as well.

Here is a summary of predicates offered by ChoiceMaker and other libraries.

