
Step 2.1: Create value-frequency files

A value-frequency map is a comma-separated values (CSV) file with one line per distinct value of a field. Each line gives the value, followed by a count of how often that value appears relative to the other values of the same field. A value-frequency map can be built from a list of records with text utilities such as cut, sort, and uniq. See the script create_CDC_value_frequency_files.sh in the src/main/scripts directory of the adg-cdc1-example project for details. The first few lines of each resulting value-frequency file are shown below; a line that starts with a comma gives the count for the empty value.

==> DOB.csv <==
1091996,1
18980504,1
19950611,1
19951102,1
19960109,1

==> FirstName.csv <==
AARON,4
ABIGAIL,2
ADAM,5
ADRIANA,1
AHMED,3

==> LastName.csv <==
ABBOTT,2
ACOSTA,2
ADAMS,7
ALBERT,1
ALEXANDER,2

==> MiddleName.csv <==
,197
ABIGAIL,3
ALBERT,2
ALEJANDRO,1
ALEXANDER,6

==> MomFirst.csv <==
ALEXANDRA,2
ALICE,3
ALYSSA,2
AMANDA,6
AMY,6

==> MomLast.csv <==
ABBOTT,2
ADAMS,7
ALBERT,2
ALEXANDER,2
ALLEN,2

==> MomMaiden.csv <==
,83
ADAMS,2
ALEXANDER,1
ALLEN,7
ANDERSON,4

==> MomMiddle.csv <==
,208
AIKO,3
ALLISON,1
AMANDA,2
AN,2

==> Sex.csv <==
F,310
M,239
U,1

==> Suffix.csv <==
,531
II,2
III,5
IL,1
IV,1

==> VacCode.csv <==
1,13
10,36
2,57
20,86
21,14

==> VacDate.csv <==
19960305,2
19960306,1
19960321,1
19960411,1
19960518,1

==> VacMfr.csv <==
MSD,112
PMC,100
SKB,118
UNK,5
WAL,215

==> VacName.csv <==
DTAP,86
DTP,13
DTP-HIB,14
HEP-B,131
HIB-HbOC,132
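
The cut, sort, and uniq approach described in this step can be sketched roughly as follows. This is a minimal illustration, not the contents of create_CDC_value_frequency_files.sh. The input file records.csv, its two-column layout (Sex, VacName), the vf_demo directory, and the make_freq helper are all hypothetical names chosen for the example, and the sketch assumes the records have no header row and no quoted commas.

```shell
#!/bin/sh
# Sketch only: build one value-frequency file per field from a CSV of records.
# Assumed (hypothetical) input layout: column 1 = Sex, column 2 = VacName.

mkdir -p vf_demo

# Tiny made-up sample so the sketch runs on its own.
cat > vf_demo/records.csv <<'EOF'
F,DTAP
M,DTP
F,DTAP
F,HEP-B
M,DTAP
EOF

# make_freq FIELD_NUMBER OUTPUT_FILE (hypothetical helper)
#   cut      extracts one column,
#   sort     groups identical values together,
#   uniq -c  prefixes each distinct value with its count,
#   sed      rewrites "   3 F" as "F,3" (an empty value becomes ",3").
make_freq() {
    cut -d, -f"$1" vf_demo/records.csv |
        LC_ALL=C sort |
        uniq -c |
        sed -E 's/^ *([0-9]+) (.*)$/\2,\1/' > "$2"
}

make_freq 1 vf_demo/Sex.csv
make_freq 2 vf_demo/VacName.csv

# Print the first lines of each file, with "==> name <==" headers.
head -n 5 vf_demo/Sex.csv vf_demo/VacName.csv
```

Sorting with LC_ALL=C gives a plain byte-order sort, which is why numeric codes in a listing such as VacCode.csv can appear as 1, 10, 2, 20 rather than in numeric order.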

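Because the counts are relative, one way to read a value-frequency file is to divide each count by the field total. The sketch below does this for the three lines of the Sex.csv listing above, assuming the counts are meant as weights proportional to how often each value should occur. The vf_weights_demo directory is a hypothetical name used only for this example.

```shell
#!/bin/sh
# Sketch only: turn the counts in a value-frequency file into relative shares.

mkdir -p vf_weights_demo

# Copy of the Sex.csv listing shown in this step.
printf 'F,310\nM,239\nU,1\n' > vf_weights_demo/Sex.csv

# First pass over the lines: remember each value and count, and sum the counts.
# END block: print each value with its count divided by the total.
awk -F, '
    { value[NR] = $1; count[NR] = $2; total += $2 }
    END {
        for (i = 1; i <= NR; i++)
            printf "%s,%.4f\n", value[i], count[i] / total
    }
' vf_weights_demo/Sex.csv > vf_weights_demo/Sex_share.csv

cat vf_weights_demo/Sex_share.csv
```

With a total of 550 records, F accounts for roughly 56% of the values, M for roughly 43%, and U for well under 1%.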