Any Master Data Management (MDM) system that masters customer/vendor/partner data requires address standardization to be effective in matching addresses. Standardizing address terms and enriching the address with additional information is key to greater automated intelligent match and merge of customers and reducing data steward effort.
The need of address standardization stems from multiple issues. Address data from different data sources may have different structures. Some may contain only one street address line, while others may contain up to three. Textual address data may contain abbreviations like “Ave for Avenue or Rd for Road”. Sometimes non-standard street names like those named after individuals may be spelled differently in different sources. A landmark may be included in some address sources like “Near Quincy Market”. The specificity of the address also calls for standardization. An address may include the suite, room number and wing in one source whereas another source may only contain the street address. Since some sources have human entered addresses, cities may be flipped for the borough or county like “Queens” instead of “New York City”. More serious data issues include the zip code being specified wrongly or blank or one source specifying a 5 digit zip code vs the other specifying a 9 digit one.
These variations in addresses makes address matching, a process that is already computation heavy due to the fuzzy matching nature of more cumbersome. Many times the address standardization itself will reduce two raw addresses to the same standardized address and an exact match is easily obtained. Without address standardization the number of potential matches increase significantly since we have to relax the match thresholds to account for these variations. With enrichment of addresses the extra components can help develop more effective address matching algorithms to improve match-merge efficiency.
Multiple options are available to standardize addresses. Some are libraries like the python postal-address (https://github.com/scaleway/postal-address) library that uses rules and algorithms to standardize addresses. These libraries however have no idea about the real world and lack detailed information about a city or a state. They’re mostly address cleansing tools. You can read the documentation of postal-address here https://postal-address.readthedocs.io/en/develop/ In contrast there are web services that offer APIs to standardize addresses. These web services actually provide better results because they’re geography aware and usually have some GIS backend source for verifying validity of addresses. SmartyStreets (https://smartystreets.com/) is an official CASS-certified (CASS is a certification system from the USPS for address validation. Read more https://smartystreets.com/articles/cass-processing). There are several benefits of using SmartyStreets for address standardization:- - It parses the address and provides additional address components like “Primary Number”, “Street Name”, “Street Suffix”, etc. which can then be used in more advanced address matching algorithms. - It corrects address errors like zip code mismatches, missing zip codes or other missing components. - It provides various types of APIs which supports single as well as bulk address standardization in python.
Address Parsed by Smarty Street
Additional Address component parse by SmartyStreet
Most MDM systems performing customer match and merge requires standardization of the addresses for effective address matching and reduced data steward effort. Web APIs like SmartyStreets are highly recommended for the address standardization process since it provides error correction, additional address components, easy to consume APIs and even international address validations. Look to using an address standardization service in your next DWH/Data Lake process if you are rolling your own MDM system. Incentius can help you set up a MDM system for your customer data and even a standalone address standardization process for other enterprise processes. Drop us a note at firstname.lastname@example.org