Skip to content

Source code for paper titled "From spreadsheet lab data templates to knowledge graphs: A FAIR data journey in the domain of AMR research"

License

Notifications You must be signed in to change notification settings

IMI-COMBINE/template2graphs

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

49 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

From spreadsheet lab data templates to knowledge graphs: A FAIR data journey in the domain of AMR research

This repository contains the Python scripts for converting Excel-based Lab Data Templates to property-based knowledge graphs to assist lab scientists in systematically navigating the experimental space.

Publication

If you like the work and want to use it, please cite our pre-print:

Gadiya, Y., Abbassi-Daloii, T., Ioannidis, V., Juty, N., Stie Kallesoe, C., Attwood, M., Kohler, M., Gribbon, P. and Witt, G., 2024. From spreadsheet lab data templates to knowledge graphs: A FAIR data journey in the domain of AMR research. bioRxiv, pp.2024-07. https://doi.org/10.1101/2024.07.18.604030

More datasets and templates can be found on Zenodo:

Witt, G., Gadiya, Y., Abbassi-Daloii, T., Ioannidis, V., Juty, N., Kallesøe, C. S., Attwood, M., Kohler, M., & Gribbon, P. (2024). Supplementary data files for the manuscript titled "From spreadsheet lab data templates to knowledge graphs: A FAIR data journey in the domain of AMR research" [Data set]. Zenodo. https://doi.org/10.5281/zenodo.12720580

How to use the repository?

The repository is a workflow to generate a knowledge graph from lab data template allowing for querying and visualizing experimental data in a meaningful and efficient manner.

Directory overview

├── GNA-NOW Graph schema.pdf
├── LICENSE
├── README.md
├── data
│   ├── exps
│   │   ├── dummy
│   │   │   ├── invitro_dummy_data.xlsx
│   │   │   └── invivo_dummy_data.xlsx
│   │   └── dummy2
│   │   │   ├── invitro_dummy2_data.xlsx
│   │   │   └── invivo_dummy2_data.xlsx
│   ├── mapping_files
│   │   ├── bacterial_strain.tsv
│   │   ├── biomaterials.tsv
│   │   ├── experimental_type.tsv
│   │   ├── gna_ontology.tsv
│   │   ├── medium.tsv
│   │   ├── result_unit.tsv
│   │   ├── roa.tsv
│   │   ├── sex.tsv
│   │   ├── species.tsv
│   │   ├── statistical_method.tsv
│   │   └── study_type.tsv
│   ├── node_dict.json
│   ├── processed_invitro_data.tsv
│   └── processed_invivo_data.tsv
├── requirements.txt
└── src
    ├── constants.py
    ├── data_preprocessing.py
    ├── main.py
    ├── nodes.py
    └── relations.py
  • The exps directory consists of a list of experiment directorys with pre-filled template for in-vitro and in-vivo studies. Here we just have "dummy" template, but have also shown "dummy2" as an example for adding future experimental runs.
  • The mapping directory consist of ontology mapped terms of the template. This is catered towards the experiments developed and performed in the project and can be easily adapted for other usecases using the ontology service OLS.
  • The src directory consists of all the python scripts required to transform the lab data template to knowledge graphs.

Prerequisite

The graph is being built on Neo4J. Hence, it is recommended to open a Neo4J instance Desktop version prior to running the scripts.

Step-by-step process

  1. Getting the base python environment ready
git clone https://github.com/IMI-COMBINE/template2graphs.git
cd template2graphs
conda create --name=template_graph python=3.9
conda activate template_graph
pip install -r requirements.txt
  1. Making the Lab data templates ready for graph ingestion
  • Ensure that all your experiments are located in a directory form under exps folder. This way each experiment can be cataloged into specific directory using FAIR data management guidelines
  • Go to the main.py file and either change the credential details to your login detials or add a new user with the credentials listed and provide admin access to the database.
cd src
python main.py

The Neo4J graph is now populated and can be explored by the users.

Funding

This work and the authors were primarily funded by the following projects: FAIRplus (IMI 802750), COMBINE (IMI 853967), and GNA NOW (IMI 853979).

About

Source code for paper titled "From spreadsheet lab data templates to knowledge graphs: A FAIR data journey in the domain of AMR research"

Topics

Resources

License

Stars

Watchers

Forks

Languages