This paper describes the problem of merging multiple databases of information. It introduces and uses the dbgen duplicate generation program.
This paper presents the Febrl data generator for personal information, which allows for the generation of realistic synthetic data based on frequency tables and attribute generation rules.
This patent describes a method of training a record linkage system from examples of record pairs that match each other, or are different from each other, or are simply too ambiguous or sparse to make a match or differ decision. Although framed in the language of machine learning, the essence of this method is a weighted comparison of field correlations between a pair of records. This patent, and the two patents that are described next, may be licensed from the Open Invention Network under terms that are designed to promote collaborative exchanges of intellectual property.
Blocking is a technique for generating a set (or a block, hence the name blocking) of records that potentially match some input (or query) record. This patent describes an automated blocking technique that is used in online or interactive systems where a fast, comprehensive and limited response is important. The block of potential matches is then passed to a matching process to evaluate which records match the input record. Applications include but are not limited to individual matching such as student identification, householding, business matching, supply chain matching, financial matching, news or text matching, and other applications. Like the patent described above, this patent may be licensed from the Open Invention Network under terms that are designed to promote collaborative exchanges of intellectual property.
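As a rough illustration of the general idea of online blocking (not the automated technique claimed in the patent), the following Python sketch indexes records by a fixed blocking key and returns a size-limited block for a query record. The key definition (first letter of surname plus birth year), the field names, and the function names are all assumptions made for this example.

```python
from collections import defaultdict

def blocking_key(record):
    # Hypothetical key: first letter of the surname plus the birth year.
    return (record["surname"][:1].lower(), record["birth_year"])

def build_index(records):
    # Map each blocking key to the list of records that share it.
    index = defaultdict(list)
    for rec in records:
        index[blocking_key(rec)].append(rec)
    return index

def block_for(query, index, limit=50):
    # Return at most `limit` candidates that share the query's key.
    # `limit` keeps the response bounded, as an interactive system needs.
    return index.get(blocking_key(query), [])[:limit]

records = [
    {"id": 1, "surname": "Smith", "birth_year": 1980},
    {"id": 2, "surname": "Smyth", "birth_year": 1980},
    {"id": 3, "surname": "Jones", "birth_year": 1980},
]
index = build_index(records)

query = {"surname": "Smithe", "birth_year": 1980}
candidates = block_for(query, index)
candidate_ids = [rec["id"] for rec in candidates]
```

Here `candidate_ids` holds the records that a separate matching process would then compare against the query in detail. A fixed key like this one is the simplest possible choice; it misses records whose key fields contain errors, which is one reason more adaptive blocking schemes exist.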
In contrast to online or interactive blocking, batch blocking takes a set of input records and generates sets (plural) of potentially matching records for the entire input set. Batch blocking is used in situations where throughput is important and interactive response is not relevant. Like the patents described above, this patent may be licensed from the Open Invention Network under terms that are designed to promote collaborative exchanges of intellectual property.
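The batch case can be sketched in the same spirit. The Python example below is a generic illustration under stated assumptions, not the patented method: it groups an entire input set by a caller-supplied key function and emits every block that holds more than one record, together with the candidate pairs inside each block. The record fields and the names `batch_blocks` and `candidate_pairs` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def batch_blocks(records, key_fn):
    # Group record ids by blocking key in a single pass over the input.
    groups = defaultdict(list)
    for rec in records:
        groups[key_fn(rec)].append(rec["id"])
    # A block with one record has nothing to be compared against.
    return [ids for ids in groups.values() if len(ids) > 1]

def candidate_pairs(blocks):
    # Yield every unordered pair of ids within each block.
    for ids in blocks:
        yield from combinations(ids, 2)

batch = [
    {"id": "a", "zip": "10001"},
    {"id": "b", "zip": "10001"},
    {"id": "c", "zip": "94105"},
    {"id": "d", "zip": "10001"},
]
blocks = batch_blocks(batch, key_fn=lambda rec: rec["zip"])
pairs = list(candidate_pairs(blocks))
```

Only pairs that share a block are passed on to matching, so the number of detailed comparisons drops from all pairs in the input to the pairs within each block.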
ChoiceMaker record matching software uses the techniques described in the three preceding patents.
Febrl record matching uses the Fellegi-Sunter method and a variety of other record matching techniques.
Last accessed for this reference on 2012-02-19.