A value-frequency map is a comma-separated value (CSV) file that lists each value that may occur for a field, together with the relative number of times that value will appear compared to the field's other values. A line with an empty value, such as the first line of MiddleName.csv (,197), counts the records in which the field is blank. A value-frequency map can be constructed from a list of records with text utilities such as cut, sort, and uniq; see the script create_CDC_value_frequency_files.sh in the src/main/scripts directory of the adg-cdc1-example project for details. The first few lines of each resulting value-frequency file are shown below.
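The counting step can be sketched in Python as follows. This is a minimal illustration, not a port of create_CDC_value_frequency_files.sh: it assumes the records are header-less CSV text, the function name value_frequency_map and the sample column layout (FirstName, Sex, DOB) are hypothetical, and the column index is zero-based, whereas cut's -f option is one-based. It mirrors cut | sort | uniq -c and then writes each result as a value,count line sorted by value in code-point order, which matches sort in the C locale for ASCII data. Unlike cut, the csv module also honours quoted fields.

```python
import csv
import io
from collections import Counter


def value_frequency_map(records_csv: str, field_index: int) -> str:
    """Build value,count lines for one column of header-less CSV text.

    Roughly mirrors `cut -d, -f<N> records.csv | sort | uniq -c`, except that
    field_index is zero-based and quoted fields are parsed by the csv module.
    Assumes every non-blank row has at least field_index + 1 columns.
    """
    rows = csv.reader(io.StringIO(records_csv))
    # Blank input lines come back as empty lists; skip them.
    counts = Counter(row[field_index] for row in rows if row)
    # Empty strings are kept, so a blank field produces a line like ",197".
    # Sorting uses code-point order, like `sort` in the C locale.
    return "".join(f"{value},{count}\n" for value, count in sorted(counts.items()))


# Hypothetical records with the columns FirstName, Sex, DOB.
records = "AARON,M,19960109\nADAM,M,19951102\nABIGAIL,F,19960109\n"
print(value_frequency_map(records, 1), end="")
# F,1
# M,2
```

Each line of the result has the same value,count layout as the files listed next.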
==> DOB.csv <==
1091996,1
18980504,1
19950611,1
19951102,1
19960109,1
==> FirstName.csv <==
AARON,4
ABIGAIL,2
ADAM,5
ADRIANA,1
AHMED,3
==> LastName.csv <==
ABBOTT,2
ACOSTA,2
ADAMS,7
ALBERT,1
ALEXANDER,2
==> MiddleName.csv <==
,197
ABIGAIL,3
ALBERT,2
ALEJANDRO,1
ALEXANDER,6
==> MomFirst.csv <==
ALEXANDRA,2
ALICE,3
ALYSSA,2
AMANDA,6
AMY,6
==> MomLast.csv <==
ABBOTT,2
ADAMS,7
ALBERT,2
ALEXANDER,2
ALLEN,2
==> MomMaiden.csv <==
,83
ADAMS,2
ALEXANDER,1
ALLEN,7
ANDERSON,4
==> MomMiddle.csv <==
,208
AIKO,3
ALLISON,1
AMANDA,2
AN,2
==> Sex.csv <==
F,310
M,239
U,1
==> Suffix.csv <==
,531
II,2
III,5
IL,1
IV,1
==> VacCode.csv <==
1,13
10,36
2,57
20,86
21,14
==> VacDate.csv <==
19960305,2
19960306,1
19960321,1
19960411,1
19960518,1
==> VacMfr.csv <==
MSD,112
PMC,100
SKB,118
UNK,5
WAL,215
==> VacName.csv <==
DTAP,86
DTP,13
DTP-HIB,14
HEP-B,131
HIB-HbOC,132
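Because the counts are relative, a value's share is its count divided by the total for the field. For the three Sex.csv values shown above, the total is 310 + 239 + 1 = 550, so F accounts for about 56.4% of the occurrences, M for about 43.5%, and U for about 0.2%. The sketch below assumes that a generator draws values with probability proportional to these counts; the function names are hypothetical, and this is an illustration rather than the project's own generator. It relies on Python's random.choices with its weights argument.

```python
import csv
import io
import random


def load_value_frequency_map(text):
    """Parse value,count lines into parallel lists of values and integer weights."""
    values, weights = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        value, count = row  # assumes exactly two columns per line
        values.append(value)
        weights.append(int(count))
    return values, weights


def proportions(values, weights):
    """Return each value's share of the field's total count."""
    total = sum(weights)
    return {value: weight / total for value, weight in zip(values, weights)}


def sample_values(values, weights, k, rng=random):
    """Draw k values, each with probability proportional to its count."""
    return rng.choices(values, weights=weights, k=k)


values, weights = load_value_frequency_map("F,310\nM,239\nU,1\n")
print({v: round(p, 3) for v, p in proportions(values, weights).items()})
# {'F': 0.564, 'M': 0.435, 'U': 0.002}
print(sample_values(values, weights, k=5, rng=random.Random(42)))
```

Passing a seeded random.Random instance as rng makes the draws reproducible; an empty value such as the one in MiddleName.csv is loaded as the empty string and sampled like any other value.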