This repository has been archived by the owner on Feb 22, 2020. It is now read-only.

Geocoding with DeGAUSS

Cole Brokamp edited this page Feb 5, 2019 · 5 revisions

Input File

The input file must be a CSV file with a column containing an address string. Other columns may be present and will be returned in the output file, but should be kept to a minimum to reduce file size.

An example input CSV file (called my_address_file.csv) might look like:

id,address
001,3333 Burnet Ave Cincinnati OH 45229
002,660 Lincoln Avenue Cincinnati OH 45229
003,2800 Winslow Avenue Cincinnati OH 45206

Address String Formatting

If your address components are in different columns, you will need to paste them together into a single string. Below are some tips that will help optimize geocoding accuracy and precision:

  • separate the different address components with a space
  • do not include apartment numbers or a "second address line" (but it's okay if you can't remove them)
  • spelling should be as accurate as possible, but the program performs "fuzzy matching", so an exact match is not necessary
  • capitalization does not affect results
  • abbreviations may be used (e.g. St. instead of Street or OH instead of Ohio)
  • use Arabic numerals instead of written numbers (e.g. 13 instead of thirteen)
  • do not try to geocode "P.O. box" addresses; these are not based on a physical location and the geocoder will likely return incorrect matches
  • do not try to geocode addresses without a valid 5-digit ZIP code; the geocoder uses the ZIP code for its initial search, so addresses without one will likely return incorrect matches
  • plus4 ZIP codes are ignored, but if they must be included, make sure to separate them with a dash (e.g. 37209-0000 instead of 372090000)
  • address strings with out-of-order items could return NA (e.g. 3333 Burnet Ave Cincinnati 45229 OH)
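
As a rough illustration of the tips in this list, the sketch below pastes separate address columns into one string and then lists rows that are likely to geocode badly. It is not part of DeGAUSS: the function names are hypothetical, the input is assumed to have the columns id,street,city,state,zip in that order, and no field is assumed to contain a comma or a quote. For messier files, a CSV-aware tool is a safer choice.

```shell
# Sketch only: column layout and function names are assumptions, not part of DeGAUSS.

# Build "id,address" rows from a file with columns id,street,city,state,zip.
# Assumes no field contains a comma or a quote.
paste_address() {
  awk -F',' '
    NR == 1 { print "id,address"; next }      # write the new header
    { print $1 "," $2 " " $3 " " $4 " " $5 }  # components separated by spaces
  ' "$1"
}

# Print rows of an "id,address" file that are likely to geocode badly:
# the address does not end in a 5-digit ZIP code (optionally ZIP+4 with a dash),
# or it mentions a P.O. box.
flag_problem_addresses() {
  awk -F',' '
    NR == 1 { next }                          # skip the header
    {
      addr = tolower($2)
      has_zip = (addr ~ / [0-9][0-9][0-9][0-9][0-9](-[0-9][0-9][0-9][0-9])?$/)
      po_box  = (addr ~ /p\.? ?o\.? ?box/)
      if (!has_zip || po_box) print
    }
  ' "$1"
}

# Usage (file names are examples):
#   paste_address my_components_file.csv > my_address_file.csv
#   flag_problem_addresses my_address_file.csv
```

The second check is deliberately crude: it only looks at how the string ends and whether "PO box" appears, so it will not catch misspellings or other formatting problems.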

Geocoding

After opening a shell, navigate to the directory where the CSV file to be geocoded is located. See here for help on navigating a filesystem using the command line.

For those unfamiliar with the command line, the simplest approach might be to put the file to be geocoded on the desktop, start the Docker Quickstart Terminal, and then navigate to your desktop folder with cd Desktop.

Run:

docker run --rm=TRUE -v "$PWD":/tmp degauss/geocoder <name-of-file> <address-column-name>

replacing <name-of-file> with the name of the CSV file to be geocoded and <address-column-name> with the name of the column in the CSV file that contains the address strings.

Continuing with the example address file above, we can use:

docker run --rm=TRUE -v "$PWD":/tmp degauss/geocoder my_address_file.csv address

To avoid headaches, don't use a file with spaces in the filename or address column name. When issuing the geocoding docker command, make sure to include the .csv filename extension even if it doesn't show up in your system file browser.

If the command runs successfully, the shell should show a progress bar while geocoding, and the geocoded file will be written to the current working directory under the same name as the input file but with _geocoded appended to the file name.
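
The file-name advice can be checked before the container is started. The following is a minimal sketch, assuming a POSIX shell; check_input and geocoded_name are hypothetical helper names, not DeGAUSS commands, and the header check assumes simple column names without embedded commas.

```shell
# Sketch: pre-flight checks before calling the geocoder.
# check_input and geocoded_name are hypothetical helpers, not part of DeGAUSS.

check_input() {
  file=$1
  column=$2
  # Reject spaces in the file name or the column name.
  case "$file$column" in
    *" "*) echo "file or column name contains spaces" >&2; return 1 ;;
  esac
  # Require the .csv extension to be typed out.
  case "$file" in
    *.csv) ;;
    *) echo "file name must include the .csv extension: $file" >&2; return 1 ;;
  esac
  [ -f "$file" ] || { echo "file not found: $file" >&2; return 1; }
  # Look for the column name in the header row.
  head -n 1 "$file" | tr -d '"\r' | tr ',' '\n' | grep -qxF -- "$column" \
    || { echo "column not found in header: $column" >&2; return 1; }
}

# Derive the expected output name: my_address_file.csv -> my_address_file_geocoded.csv
geocoded_name() {
  printf '%s_geocoded.csv\n' "${1%.csv}"
}

# Usage:
#   check_input my_address_file.csv address &&
#     docker run --rm=TRUE -v "$PWD":/tmp degauss/geocoder my_address_file.csv address &&
#     ls -l "$(geocoded_name my_address_file.csv)"
```

Chaining the steps with && means the geocoder is only started when the checks pass, and the final ls confirms that the expected output file was written.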

Don't forget that the first time this image is called, Docker will have to download it before starting the geocoding process. Although it is quite a large download (~6 GB), this only has to happen once.

Output File

The output file is written to the same directory and, in our example, will be called my_address_file_geocoded.csv:

"address","id","street","zip","city","state","lat","lon","score","prenum","number","precision"
"2800 Winslow Avenue Cincinnati OH 45206","003","Winslow Ave","45206","Cincinnati","OH",39.130586,-84.49631,0.941,"","2800","range"
"3333 Burnet Ave Cincinnati OH 45229","001","Burnet Ave","45229","Cincinnati","OH",39.14089,-84.500402,0.949,"","3333","range"
"660 Lincoln Avenue Cincinnati OH 45229","002","Lincoln Ave","45206",NA,NA,39.13282,-84.494724,0.805,"","660","range"

This output file also contains diagnostic information on the precision and method used for geocoding each address. See the Interpreting Geocoding Results page for more details on interpreting the output. These geocodes can be used to create maps of subject locations or can be passed on to other DeGAUSS containers for geomarker assessment.
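
For a quick first look at the diagnostic columns, the rows can be tallied by their precision value from the shell. This is only a sketch: count_by_precision is a hypothetical helper, and it assumes that precision is the last column, as in the example output above.

```shell
# Sketch: count geocoded rows by the value in the "precision" column.
# Assumes "precision" is the last column, as in the example output file.
count_by_precision() {
  awk -F',' '
    NR == 1 { next }                   # skip the header
    { gsub(/"/, "", $NF); n[$NF]++ }   # strip quotes, tally the last field
    END { for (p in n) print p, n[p] } # one line per precision value
  ' "$1" | sort
}

# Usage:
#   count_by_precision my_address_file_geocoded.csv
```

Reading the last field rather than a fixed column number keeps the count correct even when an address string itself contains a comma.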
