forked from pallucs/PTMGPT2

GPT-based protein language model for PTM site prediction

aiproteins/PTMGPT2

PTMGPT2


Here, we introduce PTMGPT2, a suite of models that generate tokens signifying modified protein sequences, enabling the identification of PTM sites. At the core of this platform is PROTGPT2, an autoregressive transformer model. We adapted PROTGPT2 as a pre-trained base and fine-tuned it for the specific task of generating classification labels for a given PTM type. Uniquely, PTMGPT2 uses a decoder-only architecture, which eliminates the need for a task-specific classification head during training. Instead, the final layer of the decoder projects back to the vocabulary space, generating the next most likely token from the learned relationships among the tokens in the input prompt.
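To illustrate the decoder-only idea described above, here is a minimal sketch (not the authors' code): the classification label is simply the next token the language model generates after the prompt. A tiny, randomly initialized GPT-2 is used so nothing needs to be downloaded; the token ids and prompt layout are placeholders, not PTMGPT2's actual vocabulary.

```python
# Sketch: decoder-only classification = next-token prediction.
# The LM head projects the last hidden state back to the vocabulary,
# so no separate classification head is needed.
import torch
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(vocab_size=64, n_positions=32, n_embd=32, n_layer=2, n_head=2)
model = GPT2LMHeadModel(config).eval()  # random weights, illustration only

# Hypothetical prompt ids standing in for "<sequence tokens> <label marker>";
# after fine-tuning, the token generated here would be the PTM class label.
prompt = torch.tensor([[1, 5, 9, 2]])
with torch.no_grad():
    logits = model(prompt).logits            # shape: (1, seq_len, vocab_size)
next_token = logits[0, -1].argmax().item()   # predicted label token id
print(next_token)
```

During fine-tuning, the loss is applied so that this next token matches the correct class label, which is what lets a plain language model act as a classifier.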

PTMGPT2 model and workflow

Download sample model for inference

Link - (https://nsclbio.jbnu.ac.kr/GPT_model/)

Contact us directly at [email protected] for bulk predictions and trained models

PTMGPT2 Webserver

Link - (https://nsclbio.jbnu.ac.kr/tools/ptmgpt2/)

PTMGPT2 Models

Link - (https://doi.org/10.5281/zenodo.11371883)

Link - (https://zenodo.org/records/11362322)

PTMGPT2 Datasets

Link - (https://doi.org/10.5281/zenodo.11377398)

Requirements

python 3.11.3
transformers 4.29.2
scikit-learn 1.2.2
pytorch 2.0.1
pytorch-cuda 11.7

Basic Usage

• Model: This folder hosts a sample model that predicts PTM sites from given protein sequences, illustrating PTMGPT2's application.
• Tokenizer: This folder contains a sample tokenizer responsible for tokenizing protein sequences, including handcrafted tokens for specific amino acids and motifs.
• Inference.ipynb: This notebook provides executable code for applying the PTMGPT2 model and tokenizer to predict PTM sites, serving as a practical guide for applying the model to your own datasets.
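The inference flow can be sketched as follows. This is a hedged outline, not the contents of Inference.ipynb: the `predict_ptm_label` helper, the prompt format, and the example sequence are assumptions; the "Model" and "Tokenizer" paths come from the repository layout described above.

```python
# Hedged sketch of PTMGPT2-style inference: tokenize a prompt built from a
# protein sequence, run the decoder, and decode the next token as the label.
import torch
from transformers import AutoTokenizer, GPT2LMHeadModel

def predict_ptm_label(model, tokenizer, sequence: str) -> str:
    """Generate the single next token after the prompt and decode it.
    The prompt template below is illustrative, not PTMGPT2's actual format."""
    prompt = f"SEQUENCE: {sequence} LABEL:"
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    label_id = logits[0, -1].argmax().item()  # most likely next token
    return tokenizer.decode([label_id])

# With the sample checkpoint downloaded into ./Model and ./Tokenizer:
# model = GPT2LMHeadModel.from_pretrained("Model").eval()
# tokenizer = AutoTokenizer.from_pretrained("Tokenizer")
# print(predict_ptm_label(model, tokenizer, "MKTAYIAKQR"))
```

Consult Inference.ipynb for the exact prompt construction and label vocabulary used by the released models.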
