Better explanation of the Motif Scaffolding Problem Definition Header #8

Open
alessandro-maverick opened this issue Jul 26, 2024 · 2 comments

@alessandro-maverick

Hello,

First off, great job on this publication. I really enjoyed reading the paper.

I've been trying to run the model myself, but I'm having trouble understanding the structure of the PDB file header needed to run scaffolding on a protein. Based on the table in the README, I would think that this:

REMARK 999 NAME   P40881_TRUNCATED_partial_fix
REMARK 999 PDB    P40881_TRUNCATED
REMARK 999 INPUT   5   5                  # Minimum and maximum length of scaffold segment is 5 aa
REMARK 999 INPUT  A 5  10   A             # The motif segment is on chain A, from 5 to 10
REMARK 999 MINIMUM TOTAL LENGTH      100  # I want the protein to be 100 amino acids, so both min and max
REMARK 999 MAXIMUM TOTAL LENGTH      100  # total length is 100.

(just to be clear, I don't have the comments in the actual file)

is the header needed to set the segment from amino acid 5 to 10 as the motif in a 100-amino-acid PDB file. However, this leads to errors in feature creation.

Would it be possible to get a more in-depth explanation of how the PDB header works?

Thank you.

@yeqinglin
Collaborator

Thank you very much. Our current motif scaffolding specification file assumes that the configuration is valid, and we are working on a more user-friendly specification format. With the current format, to generate a protein with 100 amino acids that contains the motif, you would need this as the header:

REMARK 999 INPUT     5   5                  
REMARK 999 INPUT  A 5  10   A      
REMARK 999 INPUT   89  89           
REMARK 999 MINIMUM TOTAL LENGTH      100 
REMARK 999 MAXIMUM TOTAL LENGTH      100

so that the number of residues adds up to 100. Hope this helps to clarify things a bit, and feel free to let us know if you have further questions.
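
The "adds up to 100" rule can be checked before running the model. Below is a minimal sketch, not part of Genie 2: `parse_header`, `length_bounds`, and `is_feasible` are hypothetical helper names. It assumes that an INPUT line with two numbers gives the minimum and maximum length of a scaffold segment, that an INPUT line with four fields gives chain, start residue, end residue, and motif group, and that motif ranges are inclusive (so 5 to 10 counts as 6 residues, which is what makes 5 + 6 + 89 = 100).

```python
HEADER = """\
REMARK 999 INPUT     5   5
REMARK 999 INPUT  A 5  10   A
REMARK 999 INPUT   89  89
REMARK 999 MINIMUM TOTAL LENGTH      100
REMARK 999 MAXIMUM TOTAL LENGTH      100
"""


def parse_header(text):
    """Return (segments, min_total, max_total) read from REMARK 999 lines."""
    segments, min_total, max_total = [], None, None
    for line in text.splitlines():
        fields = line.split()
        if fields[:2] != ["REMARK", "999"]:
            continue  # skip blank lines and anything that is not REMARK 999
        rest = fields[2:]
        if rest[:1] == ["INPUT"]:
            args = rest[1:]
            if len(args) == 2:
                # Assumed meaning: scaffold segment with min and max length.
                segments.append(("scaffold", int(args[0]), int(args[1])))
            elif len(args) == 4:
                # Assumed meaning: chain, start, end, motif group.
                chain, start, end, group = args
                segments.append(("motif", chain, int(start), int(end), group))
            else:
                raise ValueError(f"unrecognized INPUT line: {line!r}")
        elif rest[:3] == ["MINIMUM", "TOTAL", "LENGTH"]:
            min_total = int(rest[3])
        elif rest[:3] == ["MAXIMUM", "TOTAL", "LENGTH"]:
            max_total = int(rest[3])
    return segments, min_total, max_total


def length_bounds(segments):
    """Smallest and largest total length the segments can add up to."""
    lo = hi = 0
    for seg in segments:
        if seg[0] == "scaffold":
            lo += seg[1]
            hi += seg[2]
        else:
            n = seg[3] - seg[2] + 1  # inclusive residue range (assumption)
            lo += n
            hi += n
    return lo, hi


def is_feasible(segments, min_total, max_total):
    """True if some choice of scaffold lengths satisfies the total length limits."""
    lo, hi = length_bounds(segments)
    return max(lo, min_total) <= min(hi, max_total)


segments, min_total, max_total = parse_header(HEADER)
print(length_bounds(segments), is_feasible(segments, min_total, max_total))
```

Under the same assumptions, the header from the original question (no trailing `89  89` scaffold line) can reach only 5 + 6 = 11 residues, so `is_feasible` returns `False` for it. That mismatch with the requested length of 100 may be what triggers the feature creation errors.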

@jrom99

jrom99 commented Aug 21, 2024

Hello, I'd like to ask some questions regarding the multi-motif scaffolding problem definition:

REMARK 999 NAME   4JHW+5WN9
REMARK 999 PDB    4JHW+5WN9
REMARK 999 INPUT     10  40
REMARK 999 INPUT  A 254 278 A
REMARK 999 INPUT     20  50
REMARK 999 INPUT  A 170 189 B
REMARK 999 INPUT     10  40
REMARK 999 MINIMUM TOTAL LENGTH      85
REMARK 999 MAXIMUM TOTAL LENGTH      175

Of note, the provided configuration states the presence of motif groups A and B and chain A, but the table in the paper states that you used 10-40, 4JHW/F254-278{1}, 20-50, 5WN9/A170-189{2}, 10-40.

I'd like to ask if the only way to generate a multi-motif scaffolding is by first combining both PDB files and their chains into the same chain, or if they can come from different chains in the same file.

If possible, I'd also like to ask what a "motif group" represents and how it impacts the scaffolding process (the motifs' relative rotation and distance, or some other aspect?).

Finally, if possible, I'd like more details about why the single-motif scaffolding task described in the paper as

"We exclude one task, 6VW1, as its motif consists of segments from multiple protein chains, a requirement not supported by Genie 2."

is not supported.

I'd also like to thank the authors for this project; it seems really promising, and the code is a lot easier to parse than that of similar projects.
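
As a side note on the multi-motif header quoted above, its total length limits match the sum of the per-segment ranges. The short check below is only an arithmetic sketch. It assumes motif residue ranges are inclusive, and it does not address the questions about chains or motif groups.

```python
# (min, max) residue counts per segment, in the order they appear in the header.
# Motif lengths assume inclusive ranges: 254-278 is 25 residues, 170-189 is 20.
segments = [
    (10, 40),                        # scaffold
    (278 - 254 + 1, 278 - 254 + 1),  # motif, chain A, group A
    (20, 50),                        # scaffold
    (189 - 170 + 1, 189 - 170 + 1),  # motif, chain A, group B
    (10, 40),                        # scaffold
]

min_total = sum(lo for lo, _ in segments)
max_total = sum(hi for _, hi in segments)
print(min_total, max_total)  # 85 175
```

These are the values given on the MINIMUM TOTAL LENGTH and MAXIMUM TOTAL LENGTH lines of that header.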
