
Step 7: Compare the generated pairs to real data

This step describes how to compare generated pairs with real data and, in particular, how to tune the generation process so that it produces pairs similar to a reference set. In our case, the reference set consists of 224 pairs that were identified as valid matches in the CDC IIS Duplication Test Case.

In general, this step is an iterative process: pairs are generated, the generated pairs are compared to the reference set, and the results of that comparison are used to tune the generation process. We demonstrate this approach in a sequence of iterations, and then present the final results of the process.
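
The generate, compare, tune loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tutorial's actual tooling: the generator `generate_pairs`, its `error_rate` parameter, the record fields, and the choice of "number of differing fields" as the comparison statistic are all hypothetical stand-ins. The comparison uses total variation distance between the two distributions, which is one reasonable choice among many.

```python
import random
from collections import Counter


def field_diff_count(pair):
    """Count the fields whose values differ between the two records of a pair."""
    a, b = pair
    return sum(1 for key in a if a[key] != b.get(key))


def distribution(pairs):
    """Relative frequency of each field-difference count over a set of pairs."""
    counts = Counter(field_diff_count(p) for p in pairs)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions (0 = identical)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def generate_pairs(n, error_rate, rng):
    """Hypothetical generator: copy a record and corrupt each field
    independently with probability error_rate."""
    pairs = []
    for i in range(n):
        original = {"first": f"first{i}", "last": f"last{i}", "dob": f"dob{i}"}
        copy = {
            k: (v + "_x" if rng.random() < error_rate else v)
            for k, v in original.items()
        }
        pairs.append((original, copy))
    return pairs


def tune(reference_pairs, candidate_rates, n=224, seed=0):
    """Try each candidate setting and keep the one whose generated pairs
    are closest to the reference set. Returns (best_rate, best_distance)."""
    ref_dist = distribution(reference_pairs)
    best = None
    for rate in candidate_rates:
        rng = random.Random(seed)  # same seed so settings are compared fairly
        generated = generate_pairs(n, rate, rng)
        score = total_variation(distribution(generated), ref_dist)
        if best is None or score < best[1]:
            best = (rate, score)
    return best
```

In practice each iteration would compare several statistics rather than one, and the candidate settings for the next iteration would be chosen after inspecting where the generated and reference distributions disagree.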

Final results

