Suggestion: Download along with sequence metadata #25

Open
ChongLC opened this issue Oct 26, 2021 · 6 comments

Comments

@ChongLC

ChongLC commented Oct 26, 2021

Dear developer,

I have a suggestion. Sequence metadata is usually quite useful when analyzing downloaded sequences. Perhaps you could consider adding that as a feature.

Looking forward to the feature.

Best regards,
Chong

@StuntsPT
Owner

Hello @ChongLC,
What kind of metadata do you have in mind? It might be possible, depending on what you mean.

Best,
Francisco

@ChongLC
Author

ChongLC commented Oct 26, 2021

Dear Francisco,

Perhaps the GenPept record (.gp)? For my part, I would like to have the source/organism details of the sequences. When downloading a huge dataset at once, it is hard to mine the source/organism information for each sequence.

Best regards,
Chong

@StuntsPT
Owner

StuntsPT commented Nov 3, 2021

Dear Chong,

Sorry about the delay. I somehow missed the notification.
Are you aware of any API that can be used to get the .gp files?

Best,
Francisco

@ChongLC
Author

ChongLC commented Nov 5, 2021

Dear Developer,

I know that NCBI provides the E-utilities. You may want to refer to their documentation (https://www.ncbi.nlm.nih.gov/books/NBK25501/).

If I understand correctly, you can download the records with the E-utilities Perl script after a slight change:
my $db = "protein";
my $query = "txid10239[Organism]";
my $report = "genpept";
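
For reference, the same three settings can be sketched in Python, the language of NCBI_downloader.py. This only builds the ESearch and EFetch request URLs and does not contact NCBI. The function names are hypothetical and are not taken from NCBI_downloader.py. The `gp` return type is the E-utilities value for GenPept flat files, and the batch size of 500 is an arbitrary assumption.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term):
    """Build an ESearch URL that stores the hits on the history server."""
    params = {"db": db, "term": term, "usehistory": "y"}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def efetch_url(db, query_key, webenv, retstart=0, retmax=500, rettype="gp"):
    """Build an EFetch URL for one batch of records from a stored search.

    query_key and webenv come from the ESearch response; retmax=500 is
    an assumed batch size, not a value from the issue.
    """
    params = {
        "db": db,
        "query_key": query_key,
        "WebEnv": webenv,
        "retstart": retstart,
        "retmax": retmax,
        "rettype": rettype,  # "gp" requests GenPept flat files
        "retmode": "text",
    }
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    print(esearch_url("protein", "txid10239[Organism]"))
```

The `query_key` and `WebEnv` values have to be read from the ESearch response before EFetch can be called.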

For the past three years I have downloaded using the Batch Entrez function. However, I noticed that some batches come back empty. Just for your information, in case you are not aware of it.
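
One possible way to guard against empty batches is to retry a batch when it comes back empty. The sketch below is only an illustration: `fetch_batch_with_retry` and the `fetch` callable are hypothetical names, and the retry count and delay are assumptions. It is not how NCBI_downloader.py or Batch Entrez handles this.

```python
import time


def fetch_batch_with_retry(fetch, retstart, retmax, retries=3, delay=1.0):
    """Call fetch(retstart, retmax) until it returns non-empty text.

    fetch is any callable that downloads one batch and returns it as a
    string (hypothetical; supply your own). Raises RuntimeError if every
    attempt returns an empty batch.
    """
    for attempt in range(1, retries + 1):
        text = fetch(retstart, retmax)
        if text and text.strip():
            return text
        if attempt < retries:
            # Wait a little longer after each empty result.
            time.sleep(delay * attempt)
    raise RuntimeError(
        f"batch starting at {retstart} was empty after {retries} attempts"
    )
```

Raising an error instead of silently skipping the batch makes a gap in the downloaded dataset visible.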

As I also miss notifications sometimes, perhaps we can continue the conversation by email ([email protected]), if you don't mind.

Best regards,
CHONG

@ChongLC
Author

ChongLC commented Jan 14, 2022

Dear Prof. Francisco,

It works well when tested with a small dataset download (txid: 12637).
The command used:
python3 NCBI_downloader.py -q "txid12637[Organism:exp]" -d "protein" -nv -o denv.txt -f gb
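
To get at the source/organism information afterwards, a short script along these lines could be used. This is a minimal sketch that assumes the output file (denv.txt here) is in GenPept flat-file format, with ACCESSION and ORGANISM lines and each record ending in `//`. The function name `iter_organisms` is hypothetical.

```python
def iter_organisms(genpept_text):
    """Yield (accession, organism) for each record in GenPept flat-file text.

    Assumes the standard flat-file layout: an ACCESSION line, an indented
    ORGANISM line, and "//" closing each record. Only the first line of
    the organism name is captured.
    """
    accession = organism = None
    for line in genpept_text.splitlines():
        if line.startswith("ACCESSION"):
            parts = line.split()
            accession = parts[1] if len(parts) > 1 else None
        elif line.startswith("  ORGANISM"):
            organism = line[len("  ORGANISM"):].strip()
        elif line.startswith("//"):
            # End of record: emit what was collected, then reset.
            if accession or organism:
                yield accession, organism
            accession = organism = None


if __name__ == "__main__":
    with open("denv.txt") as handle:
        for acc, org in iter_organisms(handle.read()):
            print(f"{acc}\t{org}")
```

For very large downloads, reading the file line by line instead of all at once would keep memory use down.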

Do close the issue if required. Thank you.

Best regards,
CHONG

@StuntsPT
Owner

Dear @ChongLC,
Great to hear it seems to be working. I will keep the issue open until I turn this makeshift version into a real integrated part of the program.

Best,
Francisco

2 participants