This repository has been archived by the owner on May 9, 2023. It is now read-only.

postal


Using

If my_address_file.csv is a file in the current working directory with an address column named address, then the DeGAUSS command:

docker run --rm -v $PWD:/tmp ghcr.io/degauss-org/postal:0.1.4 my_address_file.csv

will produce my_address_file_postal_0.1.4.csv with added columns:

  • cleaned_address: address with non-alphanumeric characters (except -) and excess whitespace removed (with dht::clean_address())
  • parsed.{address_component}: multiple columns, one for each parsed address component (e.g., parsed.road, parsed.state, parsed.house_number)
  • parsed_address: a "parsed" address created by pasting together available parsed.house_number, parsed.road, parsed.city, parsed.state, and the first five digits of the parsed.postcode address components
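
As a rough illustration of how parsed_address is assembled, the sketch below joins whichever components are present and shortens the postcode. It is not the container's own code: the function name build_parsed_address, the single-space separator, and the example values are assumptions made for this example, and the postcode is shortened by simply keeping its first five characters.

```python
def build_parsed_address(components):
    """Join the available components in a fixed order (illustrative only)."""
    order = ["house_number", "road", "city", "state", "postcode"]
    parts = []
    for name in order:
        value = (components.get(name) or "").strip()
        if not value:
            continue  # skip components that are missing or empty
        if name == "postcode":
            value = value[:5]  # assumption: keep the first five characters
        parts.append(value)
    return " ".join(parts)


# Hypothetical parsed components for one input row
example = {
    "house_number": "3333",
    "road": "burnet ave",
    "city": "cincinnati",
    "state": "oh",
    "postcode": "45229-3026",
}
parsed_address = build_parsed_address(example)
print(parsed_address)  # 3333 burnet ave cincinnati oh 45229
```

Missing components are skipped rather than left as blanks, which matches the description of pasting together the "available" components.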

Optional Argument

After parsing, each parsed address can be expanded into several possible normalized addresses using libpostal. This can be useful for matching these addresses against other messy, real-world addresses.

If any value is provided as an argument (e.g., "expand"), then the DeGAUSS command:

docker run --rm -v $PWD:/tmp ghcr.io/degauss-org/postal:0.1.4 my_address_file.csv expand

will produce my_address_file_postal_0.1.4_expand.csv with the above columns plus:

  • expanded_addresses: the expanded addresses for parsed_address

Because each parsed_address will likely yield more than one expanded address, each input row is duplicated once per expanded address to accommodate them. As a result, the output CSV file usually has more rows than the input CSV file.
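
The row duplication can be pictured with a small sketch. This is only an illustration under stated assumptions: expand_rows is a hypothetical helper, the expansions shown are made-up values rather than real libpostal output, and every parsed address is assumed to have at least one expansion.

```python
def expand_rows(rows, expansions):
    """Repeat each input row once per expanded address (illustrative only)."""
    out = []
    for row in rows:
        for expanded in expansions[row["parsed_address"]]:
            # copy the original columns and add one expanded address per copy
            out.append({**row, "expanded_addresses": expanded})
    return out


# Two hypothetical input rows
rows = [
    {"id": 1, "parsed_address": "224 woolper ave cincinnati oh 45220"},
    {"id": 2, "parsed_address": "3333 burnet ave cincinnati oh 45229"},
]

# Made-up expansions keyed by parsed_address (not real libpostal output)
expansions = {
    "224 woolper ave cincinnati oh 45220": [
        "224 woolper avenue cincinnati oh 45220",
        "224 woolper avenue cincinnati ohio 45220",
    ],
    "3333 burnet ave cincinnati oh 45229": [
        "3333 burnet avenue cincinnati ohio 45229",
    ],
}

expanded = expand_rows(rows, expansions)
print(len(rows), "input rows ->", len(expanded), "output rows")
```

When joining the expanded output back to other data, an identifier column such as the hypothetical id above makes it possible to tell which output rows came from the same input row.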

Geomarker Methods

Input addresses are parsed/normalized using libpostal by:

  1. removing non-alphanumeric characters (except -) and excess whitespace (with dht::clean_address())
  2. parsing addresses into components using libpostal/src/address_parser (a machine learning model trained on OpenStreetMap and OpenAddresses)
  3. (with an optional argument) expanding the parsed address into several possible normalized addresses
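
The cleaning in step 1 can be approximated as follows. This is a minimal sketch of the described behavior, not the implementation of dht::clean_address(): the regular expressions and the choice to replace removed characters with a space (rather than deleting them outright) are assumptions.

```python
import re


def clean_address(address):
    """Approximate the cleaning step described above (illustrative only)."""
    # Replace anything that is not an ASCII letter, digit, hyphen, or
    # whitespace with a space (assumption: replace rather than delete).
    kept = re.sub(r"[^A-Za-z0-9\s-]", " ", address)
    # Collapse runs of whitespace and trim both ends.
    return re.sub(r"\s+", " ", kept).strip()


cleaned = clean_address("  224 Woolper Ave.,  Apt. #3-B ")
print(cleaned)  # 224 Woolper Ave Apt 3-B
```

Replacing with a space keeps neighboring tokens apart when punctuation was the only separator (for example "Ave.,Apt" becomes "Ave Apt").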

DeGAUSS Details

For detailed documentation on DeGAUSS, including general usage and installation, please see the DeGAUSS homepage.