This repository has been archived by the owner on Dec 20, 2022. It is now read-only.

This repo is for the Euros for Doctors workshop at DataHarvest/EIJC 2017. It uses Poland as the example country, with data from only two companies.

correctiv/eurosfordoctors-dataharvest2017

Run your own Euros for Doctors

Install

You need Python 3.5+. Install the requirements with:

pip install -r requirements.txt

Usage

  • Download the original disclosure files (likely PDFs)
  • Set up some tables to track all companies and their data
  • Convert the originals to proper tables using this schema
  • Fix the table structure
  • Place the resulting CSVs in data/pl/raw_csv
  • Run 01_load_data.ipynb to do all of the following:
    • Clean and standardise the data:
      • clean names: fix name order, extract titles, split names, etc.
      • clean addresses
      • clean monetary amounts
    • Review samples and repeat the cleaning as needed
    • Geocode addresses
    • Deduplicate records
    • Combine entities
  • Run 02_check_data.ipynb for further checks
  • Run 03_analysis.ipynb to run some analysis on your data
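The loading step expects one CSV per company in data/pl/raw_csv. A minimal sketch of reading them with the standard library, assuming each file has a header row and is UTF-8 encoded (possibly with a BOM, as Excel exports often are). The helper name `load_raw_csvs` and the added `source_file` column are hypothetical, not taken from the notebooks:

```python
import csv
from pathlib import Path


def load_raw_csvs(directory="data/pl/raw_csv"):
    """Read every CSV in `directory` into one list of row dicts.

    Hypothetical helper: assumes UTF-8 files (with or without BOM) that
    all start with a header row.
    """
    rows = []
    # Sort so the row order is stable across runs and platforms.
    for path in sorted(Path(directory).glob("*.csv")):
        # "utf-8-sig" strips a leading BOM if present and is plain UTF-8 otherwise.
        with path.open(newline="", encoding="utf-8-sig") as handle:
            for row in csv.DictReader(handle):
                # Remember which company file each row came from.
                row["source_file"] = path.name
                rows.append(row)
    return rows
```

Keeping the file name on every row makes it easy to trace a suspicious value back to the original company disclosure during review.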
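The name-cleaning step (fix name order, extract titles, split names) might look roughly like the sketch below. It assumes names arrive either as "Last, First" or "First Last", optionally with title abbreviations mixed in. The helper `clean_name` and the `TITLE_TOKENS` set of Polish title abbreviations are hypothetical; the notebook's own rules may differ:

```python
import re

# Hypothetical, incomplete set of Polish title tokens (lower-case).
TITLE_TOKENS = {"prof.", "dr", "dr.", "hab.", "n.", "med.", "lek.", "dent.", "mgr"}


def clean_name(raw):
    """Split a raw recipient name into (title, first_names, last_name)."""
    # Collapse repeated whitespace and trim the ends.
    text = re.sub(r"\s+", " ", raw).strip()
    # Reorder "Last, First" into "First Last".
    if "," in text:
        last, first = (part.strip() for part in text.split(",", 1))
        text = first + " " + last
    tokens = text.split()
    # Separate title tokens from the actual name parts.
    titles = [t.lower() for t in tokens if t.lower() in TITLE_TOKENS]
    names = [t for t in tokens if t.lower() not in TITLE_TOKENS]
    # Treat the final token as the surname; str.title() also capitalises
    # each part of hyphenated surnames such as "NOWAK-KOWALSKA".
    first_names = " ".join(names[:-1]).title()
    last_name = names[-1].title() if names else ""
    return " ".join(titles), first_names, last_name
```

For example, `clean_name("KOWALSKI,  dr n. med. Jan")` returns `("dr n. med.", "Jan", "Kowalski")`. Matching titles token by token is simple but crude: a middle initial written as "N." would be swallowed as a title, so review samples after running it.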
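Cleaning monetary amounts usually means turning locale-formatted strings into numbers. A minimal sketch, assuming Polish-style formatting (spaces or dots as thousands separators, a comma as the decimal mark, optional "zł"/"PLN" labels); the helper `clean_amount` is hypothetical. It returns a `Decimal` rather than a float so that summing many payments does not accumulate rounding errors:

```python
import re
from decimal import Decimal, InvalidOperation


def clean_amount(raw):
    """Convert an amount string such as "1 234,50 zł" to a Decimal, or None."""
    # Drop currency labels and all whitespace (including non-breaking spaces).
    text = re.sub(r"(?i)pln|zł|zl|\s", "", raw)
    if "," in text:
        # Comma is the decimal mark, so any dots are thousands separators.
        text = text.replace(".", "").replace(",", ".")
    elif re.fullmatch(r"\d{1,3}(\.\d{3})+", text):
        # Dots grouping digits in threes ("12.500") are thousands separators.
        text = text.replace(".", "")
    try:
        value = Decimal(text)
    except InvalidOperation:
        return None
    # Reject "NaN" and "Infinity", which Decimal would otherwise accept.
    return value if value.is_finite() else None
```

The dot rule is an assumption: a value like "1.500" is read as 1500, which is right for Polish formatting but wrong if a source uses English decimal points, so check each company's files before relying on it.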
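Deduplicating and combining entities can start with exact matching on a normalised key. The sketch below groups payment records by last name, first name and city after lower-casing and stripping diacritics, then sums the amounts per recipient. This is a deliberately simple stand-in technique; the notebook may use fuzzy matching or other rules. The record keys `first_name`, `last_name`, `city` and `amount` and the helpers `normalise` and `combine_entities` are hypothetical:

```python
import unicodedata
from collections import defaultdict
from decimal import Decimal


def normalise(text):
    """Lower-case, strip diacritics and collapse whitespace for matching."""
    # "ł" has no Unicode decomposition, so map it explicitly.
    text = text.lower().replace("ł", "l")
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.split())


def combine_entities(records):
    """Merge records that share a normalised (last, first, city) key."""
    groups = defaultdict(list)
    for record in records:
        key = (
            normalise(record["last_name"]),
            normalise(record["first_name"]),
            normalise(record["city"]),
        )
        groups[key].append(record)
    combined = []
    for rows in groups.values():
        # Keep the spelling of the first record seen for display.
        first = rows[0]
        combined.append({
            "first_name": first["first_name"],
            "last_name": first["last_name"],
            "city": first["city"],
            "amount": sum((r["amount"] for r in rows), Decimal("0")),
            "rows": len(rows),
        })
    return combined
```

With this key, "Łukasz Wójcik, Kraków" and "LUKASZ Wojcik, Krakow" collapse into one entity. Exact keys miss typos and cannot tell apart two different doctors with the same name in the same city, so merged groups still deserve a manual look.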
