This library finds the institution that matches an affiliation string, that is, a string containing, for example, a name, an address or similar information associated with the institution. Its main features are:
- Search for an institution by its name in a list based on Wikidata and GeoNames.
- Parse affiliation strings using grobid to retrieve the corresponding name and address.
- Geocode the parsed data to enrich it with geographical coordinates based on GeoNames.
To install instmatcher simply clone the git repository and install it using pip:
git clone https://github.com/qtux/instmatcher.git
cd instmatcher
pip install .
The match function may be used to search for a matching institution for a given affiliation string.
Note that this example assumes a grobid server listening on http://0.0.0.0:8080.
import instmatcher
response = instmatcher.match('TU Berlin, Institute of Mathematics, Berlin, Germany')
print(response)
Depending on how well grobid is trained, executing the code above will most likely print:
{'name': 'Technical University of Berlin', 'lat': '52.511944444444', 'lon': '13.326388888889',...
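The output above shows the coordinates as strings, so they usually need converting before any numeric use. The following is a minimal sketch of that conversion. The helper name `coordinates` is hypothetical and not part of instmatcher. It assumes only the 'lat' and 'lon' keys visible in the output above. How match behaves when nothing is found is not covered here, so the helper simply treats an empty or incomplete response as "no coordinates".

```python
def coordinates(response):
    """Return (lat, lon) as floats, or None if they are unavailable.

    Hypothetical helper, not part of instmatcher. It assumes the
    'lat' and 'lon' keys hold numeric strings, as in the sample output.
    """
    if not response:
        # Assumption: an empty or missing response means no match.
        return None
    try:
        return float(response['lat']), float(response['lon'])
    except (KeyError, TypeError, ValueError):
        # A key is missing or a value is not a numeric string.
        return None


# Sample dictionary copied from the output shown above (other keys omitted).
sample = {
    'name': 'Technical University of Berlin',
    'lat': '52.511944444444',
    'lon': '13.326388888889',
}
print(coordinates(sample))
```

In real use, the dictionary returned by `instmatcher.match(...)` would take the place of `sample`.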
In order to run the tests execute:
python setup.py test
In order to build the documentation install the required packages
pip install .[docs]
and use the Makefile in the docs folder to build the documentation.
- The list of institutions is queried from Wikidata (available under CC0).
- The list of institutions is enhanced using the reverse-geocoder library which contains GeoNames data (available under CC BY 3.0).
- The list of cities and the list of countries are taken from GeoNames (available under CC BY 3.0).
This software is licensed under the Apache License, Version 2.0.