Merge – Combine Files
Files with different layouts and formats can be combined into a single new data file. This is called a “merge”.
Merging is useful for several reasons, such as:
- finding duplicates between a number of files
- preparing data to be imported into a new database
- achieving economies of scale by performing the same tasks on multiple files at the same time
During the merge, our software handles many kinds of “dirty data” and performs certain standardizations automatically.
Sample:
File - A:
| Name: | Mr John Smith Jr |
| Addr: | 123 Any Street |
| CityProv: | Van B.C. |
| Postal: | v7v 7v7 |
| Tel: | 604-988-9999 |

File - B:
| Name: | Krieger, Axel |
| Addr: | 456 Any Street |
| City: | N Vancouver |
| Prov: | B C |
| Postal: | V7J7J4 |
Merged Results:
File - A:
| Name Prefix: | Mr |
| First Name: | John |
| Last Name: | Smith |
| Name Suffix: | Jr |
| Street: | 123 Any Street |
| City: | Vancouver |
| Prov: | BC |
| Postal: | V7V 7V7 |
| Tel: | 604-988-9999 |

File - B:
| Name Prefix: | |
| First Name: | Axel |
| Last Name: | Krieger |
| Name Suffix: | |
| Street: | 456 Any Street |
| City: | North Vancouver |
| Prov: | BC |
| Postal: | V7J 7J4 |
| Tel: | |
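To illustrate how two different layouts can end up in one common layout, here is a minimal Python sketch. It is not our actual merge program: the `FIELD_MAPS` table and the `to_merged_layout` function are hypothetical names made up for this example, and only the fields that can be copied directly are mapped. Combined fields such as Name and CityProv need parsing first, so they are left blank here.

```python
# Target layout of the merged file, taken from the sample above.
MERGED_FIELDS = [
    "Name Prefix", "First Name", "Last Name", "Name Suffix",
    "Street", "City", "Prov", "Postal", "Tel",
]

# Hypothetical per-file mappings (source column -> merged column) for the
# fields that can be copied as-is. Name and CityProv are not listed because
# they must be split into parts before they fit the merged layout.
FIELD_MAPS = {
    "File - A": {"Addr": "Street", "Postal": "Postal", "Tel": "Tel"},
    "File - B": {"Addr": "Street", "City": "City", "Prov": "Prov",
                 "Postal": "Postal"},
}


def to_merged_layout(source, record):
    """Copy one source record into the common layout, leaving gaps empty."""
    merged = {field: "" for field in MERGED_FIELDS}
    merged["Source"] = source  # remember which file the record came from
    for src_col, dst_col in FIELD_MAPS[source].items():
        merged[dst_col] = record.get(src_col, "")
    return merged


file_b_record = {
    "Name": "Krieger, Axel",
    "Addr": "456 Any Street",
    "City": "N Vancouver",
    "Prov": "B C",
    "Postal": "V7J7J4",
}
row = to_merged_layout("File - B", file_b_record)
```

Every record gets the same set of fields regardless of which file it came from, which is why File - B ends up with an empty Tel field in the merged results.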
What changed?
File-A:
- Name is broken down into parts.
- “Van” becomes “Vancouver”: we have a list of common city abbreviations and misspellings that is applied to your data.
- “B.C.” is recognized as a Canadian province and is stripped from the CityProv field. It is then standardized to the 2-letter province code “BC” and placed in the Prov field.
- The postal code is converted to upper case.
File-B:
- Name is broken down into parts.
- “N Vancouver” becomes “North Vancouver”.
- “B C” is recognized as a Canadian province and is standardized to the 2-letter province code “BC”.
- The postal code is reformatted with a space.
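The changes listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not our actual merge program: the lookup tables are tiny stand-ins for much longer lists, and all function names (`split_name`, `split_city_prov`, `standardize_postal`, and so on) are hypothetical.

```python
import re

# Assumed, deliberately tiny lookup tables; real lists would be far longer.
CITY_FIXES = {"van": "Vancouver", "n vancouver": "North Vancouver"}
PROVINCES = {"bc": "BC"}
NAME_PREFIXES = {"mr", "mrs", "ms", "dr"}
NAME_SUFFIXES = {"jr", "sr"}


def split_name(name):
    """Split 'Mr John Smith Jr' or 'Krieger, Axel' into name parts."""
    parts = {"Name Prefix": "", "First Name": "",
             "Last Name": "", "Name Suffix": ""}
    if "," in name:  # 'Last, First' layout
        last, first = [p.strip() for p in name.split(",", 1)]
        parts["First Name"], parts["Last Name"] = first, last
        return parts
    words = name.split()
    if words and words[0].rstrip(".").lower() in NAME_PREFIXES:
        parts["Name Prefix"] = words.pop(0)
    if words and words[-1].rstrip(".").lower() in NAME_SUFFIXES:
        parts["Name Suffix"] = words.pop()
    if words:
        parts["Last Name"] = words.pop()
    parts["First Name"] = " ".join(words)
    return parts


def standardize_province(text):
    """Map 'B.C.' or 'B C' to the 2-letter code; unknown values pass through."""
    key = re.sub(r"[^A-Za-z]", "", text).lower()
    return PROVINCES.get(key, text.strip())


def standardize_city(text):
    """Expand known abbreviations; unknown cities pass through."""
    key = " ".join(text.split()).lower()
    return CITY_FIXES.get(key, text.strip())


def split_city_prov(text):
    """Strip a trailing province from a combined CityProv field."""
    words = text.split()
    # Try the last two words first ('B C'), then the last word ('B.C.').
    for n in (2, 1):
        if len(words) > n:
            prov = standardize_province(" ".join(words[-n:]))
            if prov in PROVINCES.values():
                return standardize_city(" ".join(words[:-n])), prov
    return standardize_city(text), ""


def standardize_postal(text):
    """Upper-case a Canadian postal code and format it as 'A1A 1A1'."""
    compact = re.sub(r"\s+", "", text).upper()
    if re.fullmatch(r"[A-Z]\d[A-Z]\d[A-Z]\d", compact):
        return compact[:3] + " " + compact[3:]
    return text.strip()  # leave anything unrecognized untouched
```

Applied to the sample, `split_city_prov("Van B.C.")` gives `("Vancouver", "BC")` and `standardize_postal("V7J7J4")` gives `"V7J 7J4"`. Values that match no rule are passed through unchanged rather than guessed at.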
We usually run every file we receive through our merge program, even if it's just one file. We've got all sorts of tricks up our sleeves to help with problematic data.