OMAMO is a tool that suggests the best model organism to study a biological process, based on orthologous relationships between a species and human.
The user can propose several species as candidate model organisms, and the algorithm ranks them. The output for a given biological process (searched as a GO term or a GO ID) is produced as a data frame.
The following Python packages are needed: numpy, matplotlib, pickle and pandas (pickle ships with the Python standard library). You also need to install pyOMA.
Firstly, download the OMA dataset:
wget https://omabrowser.org/All/OmaServer.h5 -O data/OmaServer.h5 #caution: 94GB
Secondly, using the file data/oma-species.txt, find the five-letter UniProt code for each species of interest. For example, consider three species: Dictyostelium discoideum, Neurospora crassa and Schizosaccharomyces pombe. Their UniProt codes are DICDI, NEUCR and SCHPO, respectively.
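The lookup in data/oma-species.txt can also be scripted. The sketch below is only an illustration under stated assumptions: it assumes the file is tab-separated, that comment lines start with `#`, and that the five-letter code sits in the first column. The function name `find_species_codes` and the two sample rows are hypothetical and are not taken from the real file.

```python
from typing import Iterable, List, Tuple


def find_species_codes(lines: Iterable[str], name_fragment: str) -> List[Tuple[str, str]]:
    """Return (code, matching field) pairs for lines that mention name_fragment.

    Assumed layout (not confirmed by the OMAMO documentation):
    tab-separated columns, '#' comment lines, code in the first column.
    """
    needle = name_fragment.lower()
    matches = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        code = fields[0]
        # Search every remaining column, because the position of the
        # scientific-name column is an assumption.
        hit = next((f for f in fields[1:] if needle in f.lower()), None)
        if hit is not None:
            matches.append((code, hit))
    return matches


# Hypothetical sample rows, used only to show the call.
sample_lines = [
    "# code\tscientific name\n",
    "DICDI\tDictyostelium discoideum\n",
    "NEUCR\tNeurospora crassa\n",
]
print(find_species_codes(sample_lines, "neurospora"))
```

To search the real file, pass an open file handle instead of the sample list, for example `find_species_codes(open("data/oma-species.txt", encoding="utf-8"), "pombe")`.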
Install omamo. The easiest way is through PyPI:
pip install omamo
Once the package is installed, you should be able to run omamo as a command. Run omamo -h to see the available options:
usage: omamo [-h] --db DB [--query QUERY] [--ic IC] [--h5-out H5_OUT] [--tsv-out TSV_OUT] --models MODELS [MODELS ...]
Run omamo for a set of model organisms
optional arguments:
-h, --help show this help message and exit
--db DB Path to the HDF5 database
--query QUERY Name of the Query species, defaults to HUMAN
--ic IC Path to the information content file (tsv format)
--h5-out H5_OUT Path to the HDF5 output file. If omitted, not stored in this format
--tsv-out TSV_OUT Path to the TSV output file. If omitted, not stored in this format
--models MODELS [MODELS ...]
List of model species, or a path to a txt file with the model species
To create the OMAMO data for Dictyostelium discoideum, Neurospora crassa and Schizosaccharomyces pombe, run omamo with the following parameters:
omamo --db OmaServer.h5 --query HUMAN --tsv-out omamo_output_df.csv --models DICDI NEUCR SCHPO
You might encounter the errors OSError: ``OmaServer.h5.idx`` does not exist and pyoma.browser.db.DBConsistencyError: Suffix index for protein sequences is not available, both of which you can ignore.
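If you launch omamo from a Python script, the command line can be assembled programmatically. The following is a minimal sketch, not part of the omamo package: `build_omamo_command` is a hypothetical helper, and it only uses the flags listed in the help text above.

```python
import shlex
from typing import List, Optional, Sequence


def build_omamo_command(
    db: str,
    models: Sequence[str],
    query: str = "HUMAN",
    tsv_out: Optional[str] = None,
    h5_out: Optional[str] = None,
) -> List[str]:
    """Assemble an omamo argument list from the options shown by `omamo -h`."""
    if not models:
        raise ValueError("at least one model species code is required")
    cmd = ["omamo", "--db", db, "--query", query]
    if tsv_out:
        cmd += ["--tsv-out", tsv_out]
    if h5_out:
        cmd += ["--h5-out", h5_out]
    # --models takes one or more species codes, so it goes last.
    cmd += ["--models", *models]
    return cmd


cmd = build_omamo_command(
    "OmaServer.h5", ["DICDI", "NEUCR", "SCHPO"], tsv_out="omamo_output_df.csv"
)
print(shlex.join(cmd))
# To actually run it (requires omamo installed and the database present):
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems if a path contains spaces.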
Finally, the output data frame is ready as a TSV file, omamo_output_df.csv. For example, for the GO ID GO:0000472, "endonucleolytic cleavage to generate mature 5'-end of SSU-rRNA", OMAMO provides the following ranking of potential model organisms:
head -n 1 omamo_output_df.csv > ranked_organisms.csv
awk '$1 == 472' omamo_output_df.csv >> ranked_organisms.csv
cat ranked_organisms.csv
GOnr Species QuerySpeciesGenes ModelSpeciesGenes NrOrthologs FuncSim_Mean FuncSim_Std Score
472 DICDI NOP9;TBL3;ABT1 Q551Y5;Q7KWS8;esf2 3 0.9095 0.1567 2.7286
472 NEUCR NOP9;TBL3 nop9;pod-5 2 1.0000 0.0000 2.0000
472 SCHPO NOP9;TBL3 nop9;utp13 2 1.0000 0.0000 2.0000
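The same filtering can be done in Python. The sketch below assumes that the output file is tab-separated with the header shown above, and that the GOnr column holds the numeric part of the GO ID (472 for GO:0000472), as the awk command suggests. The helpers `go_id_to_number` and `rank_models` are hypothetical names; the sample rows are copied from the output above.

```python
import csv
import io
import re
from typing import Dict, Iterable, List


def go_id_to_number(go_id: str) -> int:
    """Convert 'GO:0000472' (or 'GO0000472', or '472') to the integer 472."""
    match = re.fullmatch(r"(?:GO:?)?(\d+)", go_id.strip(), re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a GO ID: {go_id!r}")
    return int(match.group(1))


def rank_models(tsv_lines: Iterable[str], go_id: str) -> List[Dict[str, str]]:
    """Return the rows for one GO term, sorted by Score in descending order."""
    target = go_id_to_number(go_id)
    reader = csv.DictReader(tsv_lines, delimiter="\t")
    rows = [row for row in reader if int(row["GOnr"]) == target]
    rows.sort(key=lambda row: float(row["Score"]), reverse=True)
    return rows


SAMPLE_TSV = (
    "GOnr\tSpecies\tQuerySpeciesGenes\tModelSpeciesGenes\tNrOrthologs\t"
    "FuncSim_Mean\tFuncSim_Std\tScore\n"
    "472\tNEUCR\tNOP9;TBL3\tnop9;pod-5\t2\t1.0000\t0.0000\t2.0000\n"
    "472\tDICDI\tNOP9;TBL3;ABT1\tQ551Y5;Q7KWS8;esf2\t3\t0.9095\t0.1567\t2.7286\n"
    "472\tSCHPO\tNOP9;TBL3\tnop9;utp13\t2\t1.0000\t0.0000\t2.0000\n"
)

for row in rank_models(io.StringIO(SAMPLE_TSV), "GO:0000472"):
    print(row["Species"], row["Score"])
```

For the real output, pass an open file instead of the in-memory sample, for example `rank_models(open("omamo_output_df.csv", newline=""), "GO:0000472")`.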
You can also visit the OMAMO website, where you can browse biological processes to study in 50 unicellular species.
- Store IC values in the HDF5 database
- Overhaul and creation of the pip package
- Initial release
Alina Nicheperovich, Adrian M Altenhoff, Christophe Dessimoz, Sina Majidian, "OMAMO: orthology-based model organism selection", submitted to Bioinformatics journal, preprint.
OMAMO is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
OMAMO is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with OMAMO. If not, see http://www.gnu.org/licenses/.