Commit
1 parent
30ad2c5
commit 19cfc03
Showing
1 changed file
with
106 additions
and
19 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,45 +1,132 @@ | ||
# PROV-O-Matic | ||
### Python Provenance Tracer | ||
## Python Provenance Tracer | ||
|
||
**Author:** Rinke Hoekstra, VU University Amsterdam, <[email protected]> | ||
**Author:** Rinke Hoekstra, VU University Amsterdam, <mailto:rinke.hoekstra@vu.nl>/<mailto:hoekstra@uva.nl> | ||
|
||
Provenance is key to improving the transparency of scientific data publishing, but most people use several very different systems to manipulate and analyse data. The goal of the [Data2Semantics](http://www.data2semantics.org) [COMMIT/](http://www.commit-nl.nl) project is to use the [W3C PROV Standard](http://www.w3.org/TR/prov-overview/), which we helped develop, to integrate provenance tracking within and across these systems. | ||
|
||
PROV-O-Matic is a library that integrates with the [IPython interpreter](http://ipython.org/), an interpreter that works with all Python programs, and in particular with the [IPython Notebook](http://ipython.org/notebook.html) environment. IPython Notebook is a popular data science environment, similar to R. | ||
|
||
PROV-O-Matic does three things: | ||
PROV-O-Matic does the following: | ||
|
||
* It wraps Python functions and methods using a decorator that builds an RDF PROV-O representation of the inputs and outputs of the respective function. The provenance trace is persistent within a Python session. And, | ||
* it integrates provenance tracing in IPython Notebook, a tool frequently used by scientists for analysing data, and reporting on it. All functions defined in the notebook are automatically decorated, and all executions of steps in the notebook are recorded as well (including changing variable values). And | ||
* it embeds a [PROV-O-Viz](http://provoviz.org) instance in IPython Notebook for interactive visualization of the provenance graph. | ||
* Existing provenance traces can be loaded into the notebook, and PROV entities can be *revived* as Python variables. Use and manipulation of these new variables will build a provenance trace that connects to the previous trace. | ||
|
||
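The decorator idea in the first bullet can be illustrated with a minimal sketch. This is not PROV-O-Matic's implementation: the `traced` decorator and the in-memory `TRACE` list are hypothetical names used only for illustration, and plain tuples stand in for the RDF graph the library builds. Only the `prov:used` and `prov:wasGeneratedBy` property names come from the PROV-O vocabulary.

```python
import functools
import uuid

# Hypothetical in-memory trace; PROV-O-Matic builds an RDF graph instead.
TRACE = []

PROV = "http://www.w3.org/ns/prov#"


def traced(func):
    """Record a PROV-style activity for every call of `func`."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Each call becomes one activity with a fresh identifier.
        activity = "urn:uuid:{}".format(uuid.uuid4())
        TRACE.append((activity, "type", PROV + "Activity"))
        TRACE.append((activity, "label", func.__name__))
        # Every input value is recorded as something the activity used.
        for value in list(args) + list(kwargs.values()):
            TRACE.append((activity, PROV + "used", repr(value)))
        result = func(*args, **kwargs)
        # The return value is recorded as generated by the activity.
        TRACE.append((repr(result), PROV + "wasGeneratedBy", activity))
        return result
    return wrapper


@traced
def add(a, b):
    return a + b


total = add(1, 2)
```

After the call, `TRACE` holds one activity with two `used` statements and one `wasGeneratedBy` statement. Keeping the trace in a module-level list mirrors the point that the trace persists for the duration of a Python session.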
### Requirements | ||
#### Credits | ||
|
||
* RDFLib >= v4.2-dev | ||
* IPython >= 2.0.0-dev | ||
* An internet connection (for connecting to http://provoviz.org/service), or a locally running PROV-O-Viz service. | ||
This work is supported by the Dutch national programme COMMIT/ under the Data2Semantics project. See <http://www.data2semantics.org> and <http://www.commit-nl.nl> | ||
|
||
This is all still quite experimental. You're probably safest off if you set everything up in a separate virtualenv, running PROV-O-Matic directly from the source distribution. | ||
## Download | ||
|
||
### Usage | ||
PROV-O-Matic can be downloaded from GitHub at: <https://github.com/Data2Semantics/prov-o-matic> | ||
|
||
Start an IPython notebook from inside the `src` directory of the PROV-O-Matic source distribution. | ||
#### License | ||
|
||
Load the IPython extension in the usual way (provided that `provomatic.extension` is in your python path), by typing the following in your IPython Notebook: | ||
PROV-O-Matic is released under the MIT License. See LICENCE.txt for details. | ||
|
||
```%load_ext provomatic.extension``` | ||
## Installation | ||
|
||
Provenance tracking is automatic once you load the extension. | ||
To start, you will need `git`, `Python 2.7`, `pip` and `virtualenv` (MacOS users, please use [Homebrew](http://brew.sh/) to install a clean Python environment). | ||
|
||
You can visualize using [PROV-O-Viz](http://provoviz.org) by calling `view_prov()` | ||
Start up your favourite terminal environment (we'll be using forward slashes, sorry Windows users). | ||
|
||
If you want to connect to a locally running PROV-O-Viz service, you can set its URL using `set_provoviz_url()`. | ||
##### Cloning PROV-O-Matic | ||
|
||
### Credits | ||
Do a *recursive* clone of the PROV-O-Matic git repository to a directory of your choice, e.g. `/example/provomatic`: | ||
|
||
This work is supported by the Dutch national programme COMMIT/ under the Data2Semantics project. See <http://www.data2semantics.org> and <http://www.commit-nl.nl> | ||
git clone https://github.com/Data2Semantics/prov-o-matic.git /example/provomatic --recursive | ||
|
||
This creates the `/example/provomatic` directory if needed, and automatically checks out the latest version of PROV-O-Matic together with the git submodule for PROV-O-Viz. | ||
|
||
(If you clone to a different directory, replace every occurrence of `/example/provomatic` with the proper path.) | ||
|
||
Enter the directory | ||
|
||
cd /example/provomatic | ||
|
||
##### Set up the Virtualenv Environment | ||
|
||
Initialize a virtual Python environment | ||
|
||
virtualenv . | ||
|
||
Start your favourite text-editor and open the `activate-replacement` file in the `/example/provomatic` directory. Make the following changes. | ||
|
||
**Step 1**: Set the `VIRTUAL_ENV` variable to point to the root directory of the provomatic installation. In our case, replace the line | ||
|
||
VIRTUAL_ENV="/absolute/path/to/your/provomatic/clone/directory" | ||
|
||
with | ||
|
||
VIRTUAL_ENV="/example/provomatic/" | ||
|
||
**Step 2**: Set the `PYTHONPATH` variable to also include the `lib/provoviz/src` directory inside the provomatic installation directory. In our case, replace the line | ||
|
||
PYTHONPATH="$PYTHONPATH:/absolute/path/to/your/provomatic/clone/directory/lib/provoviz/src" | ||
|
||
with | ||
|
||
PYTHONPATH="$PYTHONPATH:/example/provomatic/lib/provoviz/src" | ||
|
||
Save the file, and overwrite the `bin/activate` file with the edited `activate-replacement` file: | ||
|
||
cp activate-replacement bin/activate | ||
|
||
You can now safely activate the virtual environment: | ||
|
||
source bin/activate | ||
|
||
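Once the environment is active, a quick sanity check is to confirm that the PROV-O-Viz source directory actually ended up on Python's search path. The sketch below is only an illustration: `missing_from_path` is a hypothetical helper, not part of PROV-O-Matic, and the path is the example location used in this guide.

```python
import os
import sys


def missing_from_path(expected, search_path=None):
    """Return the entries of `expected` that are not on the search path."""
    search_path = sys.path if search_path is None else search_path
    # Normalise so that trailing slashes do not cause false negatives.
    normalized = {os.path.normpath(p) for p in search_path if p}
    return [p for p in expected if os.path.normpath(p) not in normalized]


# An empty list means the directory from Step 2 is importable.
print(missing_from_path(["/example/provomatic/lib/provoviz/src"]))
```

If the list is not empty, re-check the `PYTHONPATH` line in `bin/activate` and run `source bin/activate` again.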
##### Install the Necessary Libraries | ||
|
||
The `requirements.txt` file lists all required libraries. Use | ||
|
||
pip install -r requirements.txt | ||
|
||
from your activated virtualenv to install the dependencies. | ||
|
||
The full list of requirements is as follows: | ||
|
||
Jinja2==2.7.3 | ||
MarkupSafe==0.23 | ||
SPARQLWrapper==1.6.4 | ||
backports.ssl-match-hostname==3.4.0.2 | ||
certifi==14.05.14 | ||
chardet==2.3.0 | ||
decorator==3.4.0 | ||
gnureadline==6.3.3 | ||
html5lib==0.999 | ||
ipython==2.3.0 | ||
isodate==0.5.0 | ||
networkx==1.9.1 | ||
numpy==1.9.0 | ||
pandas==0.14.1 | ||
pyparsing==1.5.7 | ||
python-dateutil==2.2 | ||
pytz==2014.7 | ||
pyzmq==14.3.1 | ||
rdfextras==0.4 | ||
rdflib==4.1.2 | ||
requests==2.4.3 | ||
six==1.8.0 | ||
tornado==4.0.2 | ||
wsgiref==0.1.2 | ||
|
||
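Because every requirement is pinned to an exact version, it can be useful to compare the pins against what is installed. The following is a rough sketch under stated assumptions: `parse_pins` and `mismatches` are hypothetical helpers, and the "installed" text stands in for the output of `pip freeze`, which uses the same `name==version` format.

```python
def parse_pins(text):
    """Parse `name==version` lines into a dict, skipping blanks and comments."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, version = line.partition("==")
        # Package names are case-insensitive, so compare them in lower case.
        pins[name.lower()] = version
    return pins


def mismatches(required, installed):
    """Return {name: (required, installed_or_None)} for every unmet pin."""
    return {
        name: (version, installed.get(name))
        for name, version in required.items()
        if installed.get(name) != version
    }


required = parse_pins("rdflib==4.1.2\nipython==2.3.0\n")
# Illustrative stand-in for the output of `pip freeze`.
installed = parse_pins("rdflib==4.1.2\nipython==2.2.0\n")
report = mismatches(required, installed)
```

Here `report` contains a single entry for `ipython`, pairing the required version with the one found. A package that is missing altogether shows up with `None` as its installed version.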
##### Ready to go! | ||
|
||
You can now start the IPython notebook by entering the `src` directory | ||
|
||
cd src | ||
|
||
and running | ||
|
||
ipython notebook | ||
|
||
This should open your browser at the address `http://127.0.0.1:8888/tree`. | ||
|
||
Open the `PROV-O-Matic Examples` notebook and follow the instructions. This should give you enough information to use PROV-O-Matic in your own notebooks. | ||
|
||
Have fun! | ||
|
||
### License | ||
|
||
See LICENCE.txt |