How can I download the data from EpiFlu database? #31

Open
virologist opened this issue Sep 19, 2022 · 8 comments
Comments

@virologist

Hi, @Wytamma

How can I download the data (e.g., H1N1) from the GISAID EpiFlu database using GISAIDR?

Best,
Yang

@Wytamma
Owner

Wytamma commented Sep 20, 2022

Hi @virologist! Unfortunately GISAIDR doesn't support EpiFlu at this time. But I will add this to the version 2.0 milestone.

@Wytamma Wytamma added the enhancement New feature or request label Sep 20, 2022
@Wytamma Wytamma added this to the 2.0 milestone Sep 20, 2022
@abuendia

abuendia commented Oct 6, 2023

Thanks for the great package! Seconding this request.

@dmontecino

dmontecino commented Oct 23, 2023

@Wytamma, if you could explain the main obstacles to supporting EpiFlu, I might be able to help...

@Wytamma
Owner

Wytamma commented Oct 26, 2023

Hey @dmontecino,

Thanks for the offer! Unfortunately EpiFlu uses a completely different interface to the other Epi platforms so you’d need to create all new methods just for EpiFlu.

I think supporting EpiFlu would be a good excuse to refactor GISAIDR. I will create a version 2 branch soon and start working on this update. If you’re still keen to help out I’d appreciate it! Always happy to accept PRs.

-W

@helmingstay

Does EpiFlu have public-facing API docs? I got a question about this today, and without a GISAID login I wasn't able to find any information about the EpiFlu interface (aside from some YouTube videos demonstrating the web platform).

Related question: does implementing EpiFlu require reverse-engineering a public interface, or are there real docs available?
In particular, I was wondering if "vanilla" wget and/or httr2 might provide a complementary avenue to bulk access...

@Wytamma
Owner

Wytamma commented Nov 3, 2023

Hey @helmingstay,

Unfortunately there are no API docs for GISAID :/ nor any public API (this is why I made GISAIDR).

Yes, implementing EpiFlu would require making the same HTTP requests the frontend makes to the backend. Another option would be to use a web driver like Selenium. However, the web interface is restricted based on user access level and limited in the amount of data you can return. For example, you used to be able to change the number of rows returned using query params to get more data, but GISAID removed this option (limiting queries to 50 rows at a time). You are also unable to parallelise downloads, as GISAID uses a stateful API, meaning that you can only have one state per access token at a time.
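To illustrate what the 50-row limit and the single-state session imply for a client, here is a minimal sketch of sequential paging. It is only an illustration under stated assumptions: `fetch_page` is a hypothetical stand-in for whatever call returns one page of results (it is not a GISAIDR function), and the demo runs against a local data frame rather than GISAID.

```r
# Hypothetical helper: fetch_page(start_index, nrows) stands in for a call
# that returns one page of results. It is NOT part of GISAIDR.
fetch_all_pages <- function(fetch_page, total, page_size = 50) {
  pages <- list()
  start <- 0
  while (start < total) {
    n <- min(page_size, total - start)
    # One request at a time: with a single state per access token,
    # pages have to be fetched sequentially rather than in parallel.
    pages[[length(pages) + 1]] <- fetch_page(start, n)
    start <- start + n
  }
  do.call(rbind, pages)
}

# Stand-in page function backed by a local data frame, for illustration only.
fake_db <- data.frame(id = 1:120)
fake_fetch <- function(start_index, nrows) {
  fake_db[(start_index + 1):(start_index + nrows), , drop = FALSE]
}

all_rows <- fetch_all_pages(fake_fetch, total = nrow(fake_db))
```

With 120 rows and a page size of 50, the loop makes three requests (50, 50 and 20 rows), one after another.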

There are many, many issues with GISAID that make it difficult to use and interface with. It would be fantastic if they just rebuilt the site with a modern and open API, but looking at GISAID's track record I doubt that will happen any time soon.

@Wytamma Wytamma closed this as completed Nov 3, 2023
@Wytamma Wytamma reopened this Nov 3, 2023
@Wytamma
Owner

Wytamma commented Dec 15, 2023

Hi All,

I'm working on EpiFlu in this branch -> https://github.com/Wytamma/GISAIDR/tree/EpiFlu.

It's currently limited to querying. Will hopefully add sequence download soon but you can do things like this:

> credentials <- login(username = username, password = password, database="EpiFlu")
> df <- epiflu_query(credentials, type = "A", h = 3, n = 2, from = "2023-12-01")
> df
        id selected                                               edit                    virus_name     accession_id subtype passage_details_history PB2 PB1  PA    HA    NP    NA    MP  NS  HE  P3
1 18629904    FALSE /epi3/app_entities/entities/help/pencil_noedit.png           A/Cardiff/5116/2023 EPI_ISL_18629904    H3N2                         --- --- --- 1,762 1,572 1,453 1,027 895 --- ---
2 18619994    FALSE /epi3/app_entities/entities/help/pencil_noedit.png      A/Kostroma/CRIE/193/2023 EPI_ISL_18619994    H3N2                         --- --- --- 1,718   ---   ---   --- --- --- ---
3 18619990    FALSE /epi3/app_entities/entities/help/pencil_noedit.png        A/Moscow/CRIE/176/2023 EPI_ISL_18619990    H3N2                         --- --- --- 1,718   ---   ---   --- --- --- ---
4 18619987    FALSE /epi3/app_entities/entities/help/pencil_noedit.png       A/Lipetsk/CRIE/172/2023 EPI_ISL_18619987    H3N2                         --- --- --- 1,718   ---   ---   --- --- --- ---
5 18619986    FALSE /epi3/app_entities/entities/help/pencil_noedit.png        A/Ryazan/CRIE/171/2023 EPI_ISL_18619986    H3N2                         --- --- --- 1,718   ---   ---   --- --- --- ---
6 18619985    FALSE /epi3/app_entities/entities/help/pencil_noedit.png A/Moscow oblast/CRIE/170/2023 EPI_ISL_18619985    H3N2                         --- --- --- 1,718   ---   ---   --- --- --- ---

Would be grateful for any feedback or PRs to add additional features / tests / docs.
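As a possible follow-up to the query output above, here is a small sketch of post-processing the segment columns. It assumes the segment lengths come back as character strings exactly as printed ("1,762", with "---" for a missing segment); that is a guess from the printout, not confirmed behaviour of `epiflu_query`. The stand-in data frame copies two rows from the output, and `to_length` is a hypothetical helper.

```r
# Stand-in for two rows of the query result shown above (values copied from
# the printed output; column types are assumed to be character).
df <- data.frame(
  accession_id = c("EPI_ISL_18629904", "EPI_ISL_18619994"),
  HA = c("1,762", "1,718"),
  NP = c("1,572", "---"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn "1,762" into 1762 and "---" into NA.
to_length <- function(x) {
  x[x %in% "---"] <- NA
  as.integer(gsub(",", "", x))
}

df$HA_len <- to_length(df$HA)
df$NP_len <- to_length(df$NP)

# Keep only records where both segments are present.
has_both <- df[!is.na(df$HA_len) & !is.na(df$NP_len), ]
```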

@RahilRyder

Thank you for this great tool! Thirding (?) this request, especially as the cattle outbreak is taking off.
Not only do I have to download each flu sequence individually so it goes into one file, but I also have to figure out how to reorder the segments to get a whole consensus genome.
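For the segment reordering step, a minimal sketch in base R might look like the following. It assumes the downloaded sequences have already been read into a data frame with one row per segment; the column names, isolate names and sequences here are made up for illustration. The segment order is the usual influenza A order, which matches the column order in the query output earlier in the thread.

```r
# Usual influenza A segment order (segments 1 to 8).
segment_order <- c("PB2", "PB1", "PA", "HA", "NP", "NA", "MP", "NS")

# Hypothetical input: one row per downloaded segment sequence.
records <- data.frame(
  isolate = c("A/Example/1/2023", "A/Example/1/2023",
              "A/Example/2/2023", "A/Example/1/2023"),
  segment = c("NA", "PB2", "HA", "HA"),
  sequence = c("ATGAAT", "ATGGAG", "ATGAAG", "ATGAAA"),
  stringsAsFactors = FALSE
)

# Rank each segment by its position in segment_order, then sort by isolate
# first and segment rank second.
segment_rank <- match(records$segment, segment_order)
ordered <- records[order(records$isolate, segment_rank), ]

# Build FASTA-style text, one entry per segment, in genome order per isolate.
fasta <- paste0(">", ordered$isolate, "|", ordered$segment, "\n", ordered$sequence)
```

The `fasta` vector can then be written out with `writeLines()`; whether the segments should be concatenated into a single record per isolate depends on the downstream tool.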
