The number of recovered cases is lower than expected #177
Is it possible to retrieve recovered data from the primary sources?
WHO does not provide this information, so it is not possible. cc: @jim-sheldon @Mougk
WHO data:
For my own edification, are there techniques to estimate the number of recovered cases, or is that a bad idea?
We can estimate the cumulative number of recovered cases if the "recovery period" [days] (the time between case confirmation and recovery) is available.

```python
# df: DataFrame with a daily DatetimeIndex and cumulative "Confirmed" and "Fatal" columns
recovery_period = 10  # For COVID-19, 7 - 21 days in my analysis
# Non-fatal cases confirmed by date t are counted as recovered on date t + recovery_period
df["Recovered"] = (df["Confirmed"] - df["Fatal"]).shift(periods=recovery_period, freq="D")
```

https://github.com/lisphilar/covid19-sir/blob/2ae6e194884475b02fe418334b287d813d7f6550/covsirphy/engineering/_complement.py#L296

Regarding Monkeypox, we may need to find papers later to select a specific value, but
https://www.who.int/news-room/fact-sheets/detail/monkeypox

Is this not a role of a database?
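Because the snippet's own comment gives a 7-21 day range for COVID-19, it helps to see how far the estimate moves within that range. The following is a minimal, self-contained sketch using hypothetical toy data (30 days of linearly growing cumulative counts, not real Monkeypox figures). The helper name `estimate_recovered` is hypothetical, not a covsirphy API. It applies the same `shift(periods=..., freq="D")` idea and then reindexes onto the original dates, which is what the column assignment in the snippet does implicitly.

```python
import pandas as pd


def estimate_recovered(df: pd.DataFrame, recovery_period: int) -> pd.Series:
    """Cumulative recovered estimate: non-fatal cases shifted by recovery_period days.

    Assumes a daily DatetimeIndex and cumulative "Confirmed" and "Fatal" columns.
    """
    shifted = (df["Confirmed"] - df["Fatal"]).shift(periods=recovery_period, freq="D")
    # Align back onto the original dates; the first recovery_period days become NaN
    return shifted.reindex(df.index)


# Hypothetical toy data: 30 days of linearly growing cumulative counts
dates = pd.date_range("2022-08-01", periods=30, freq="D")
toy = pd.DataFrame(
    {
        "Confirmed": [1000 * (day + 1) for day in range(30)],
        "Fatal": [10 * (day + 1) for day in range(30)],
    },
    index=dates,
)

for period in (7, 14, 21):  # spans the 7-21 day range quoted for COVID-19
    latest = estimate_recovered(toy, period).iloc[-1]
    print(f"recovery_period={period:>2} days -> estimated recovered on last day: {latest:,.0f}")
```

On this toy series, moving from 7 to 21 days cuts the last-day estimate from 22,770 to 8,910, which shows how strongly the result depends on the chosen recovery period.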
Ah, cool, thanks for that! Could you clarify "is this not a role of a database"? Happy to add to our db if you want :)
Thank you for your positive comment! One weak point of this method is that it depends heavily on the accuracy of the recovery period estimate. Because only five cases are listed in the (deprecated) line list, it is very difficult to determine the exact value of the recovery period. Users may regard the recovery data as raw data mistakenly. Alternatives:
"Users may regard the recovery data as raw data mistakenly" In general, this is a trap we try to avoid while still providing as much data as possible. We also find it frustrating that this leads to incomplete data sets. For the purpose of estimating the number of recovered cases, we could write a script. Feel free to make a ticket with requirements and assign it to me. Of course, I want to be careful about how we label and share any forecasts and/or estimates. If we added this to our website, we would need to clearly indicate to users what is data and what is a probabilistic estimate.
How about the following changes to the directory tree of this repository? (I may not fully understand the current tree; please correct me where I am wrong.) Currently:
(Where is the time series data regarding the number of cases?) One idea:
This project has many tasks, and I would be happy to make a pull request, including a new Python script for creating
For example,
Unless it is marked as deprecated, everything in the repo is in use:

- s3_ui is for accessing archived data.
- agency_ingestion takes CDC and WHO data and puts it in our database and S3 bucket, and puts ECDC data in our S3 bucket.
- gh_data_update takes CDC and WHO data, turns it into G.h data, and stores it in the database and S3.
- map_timeseries creates a time series data file that our map visualization reads from S3.
- latest.csv is the most up-to-date G.h data, built from the most up-to-date WHO and CDC data.

Regarding forecasting and estimation, unfortunately I don't think documentation will suffice. I am interested in doing this work, but I also want to take every possible precaution to ensure viewers of our outputs do not take it as data or prediction. The right balance might be doing that work in a private repo, sharing it only with trusted people, and carefully labelling any publicly shared results (or limiting them to something like a time series graph with confidence intervals). @aimeehan1 @ksewalk @abhidg What do y'all think?
@lisphilar @jim-sheldon We should not put estimated data in this repository. Furthermore, using a single recovery_period will give incomplete estimates, because it ignores the spread (e.g. the standard deviation) of the underlying distribution of recovery times. That distribution is necessary to produce a time series with confidence intervals. Epidemics are usually modelled with variations of the SIR model https://en.wikipedia.org/wiki/Compartmental_models_in_epidemiology#The_SIR_model_2 which are compartmental models (deterministic or stochastic) for estimating the number of infectious (and recovered) people.
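A single recovery_period is the special case in which every case takes exactly the same time to recover. The sketch below instead draws each case's recovery delay from a distribution and repeats the draw many times, so percentiles of the simulated cumulative counts give an uncertainty interval. This is a simple Monte Carlo illustration, not a fitted model: the gamma distribution, its mean of 21 days and standard deviation of 5 days (chosen only to sit inside the 2-4 week range mentioned in this thread), the constant 100 new non-fatal cases per day, and the function name `simulate_recovered` are all hypothetical.

```python
import numpy as np


def simulate_recovered(new_cases, mean_days, sd_days, n_sims=1000, seed=0):
    """Monte Carlo cumulative recoveries per day from daily new non-fatal cases.

    Each case's recovery delay is drawn from a gamma distribution with the
    given mean and standard deviation (a hypothetical modelling choice).
    Returns an array of shape (n_sims, n_days).
    """
    rng = np.random.default_rng(seed)
    new_cases = np.asarray(new_cases, dtype=int)
    n_days = new_cases.size
    # Gamma parameterisation: mean = shape * scale, variance = shape * scale**2
    shape = (mean_days / sd_days) ** 2
    scale = sd_days**2 / mean_days
    # One entry per case, holding the day on which it was confirmed
    confirm_day = np.repeat(np.arange(n_days), new_cases)
    results = np.zeros((n_sims, n_days))
    for sim in range(n_sims):
        delays = rng.gamma(shape, scale, size=confirm_day.size)
        recovery_day = np.floor(confirm_day + delays).astype(int)
        # Keep recoveries inside the observation window, count per day, accumulate
        daily = np.bincount(recovery_day[recovery_day < n_days], minlength=n_days)
        results[sim] = np.cumsum(daily)
    return results


# Hypothetical input: 100 new non-fatal cases per day for 60 days
new_cases = [100] * 60
sims = simulate_recovered(new_cases, mean_days=21, sd_days=5)
lower, median, upper = np.percentile(sims[:, -1], [2.5, 50, 97.5])
print(f"day 60: median {median:.0f} recovered, 95% interval {lower:.0f}-{upper:.0f}")
```

The interval reflects only the randomness of individual delays; uncertainty in the distribution's own mean and standard deviation would widen it further.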
@jim-sheldon @abhidg (Additionally, in (To simulate the number of cases with SIR-like models, ODE parameter values, including beta and sigma, are necessary. To estimate the ODE parameter values, we require a line list or the cumulative numbers of confirmed/recovered/fatal cases. Regarding COVID-19, the UK does not provide recovery data. The method, with a recovery period estimated from other countries' data, is very helpful for analysing UK data with an SIR-like model and the same workflow used for other countries, though of course with caution.) @jim-sheldon
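To make the role of the ODE parameters concrete, here is a minimal Euler-integration sketch of the basic SIR model. It assumes `beta` is the transmission rate and `sigma` the recovery rate, so the mean recovery period is roughly 1/sigma days; these names, the values beta = 0.25 and sigma = 1/14, and the helper functions `simulate_sir` and `estimate_sigma` are illustrative assumptions, not covsirphy's parameterisation or API. The second function shows why recovery data matter: sigma can be roughly recovered from daily changes in R divided by I, which is impossible when R is not reported.

```python
def simulate_sir(beta, sigma, s0, i0, days, dt=0.1):
    """Euler integration of the basic SIR model on population fractions.

    dS/dt = -beta*S*I,  dI/dt = beta*S*I - sigma*I,  dR/dt = sigma*I
    Returns one (S, I, R) tuple per day, starting at day 0.
    """
    s, i, r = s0, i0, 1.0 - s0 - i0
    steps_per_day = int(round(1 / dt))
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            infections = beta * s * i * dt
            recoveries = sigma * i * dt
            s, i, r = s - infections, i + infections - recoveries, r + recoveries
        history.append((s, i, r))
    return history


def estimate_sigma(history):
    """Rough estimate of sigma as the mean of daily (R[t+1] - R[t]) / I[t].

    This needs the recovered series R; without recovery data it cannot be computed.
    """
    ratios = [
        (r_next - r_now) / i_now
        for (_, i_now, r_now), (_, _, r_next) in zip(history, history[1:])
        if i_now > 0
    ]
    return sum(ratios) / len(ratios)


beta, sigma = 0.25, 1 / 14  # hypothetical values; mean recovery period ~14 days
history = simulate_sir(beta, sigma, s0=0.999, i0=0.001, days=100)
sigma_hat = estimate_sigma(history)
print(f"true sigma={sigma:.4f}, estimated={sigma_hat:.4f}, "
      f"implied recovery period={1 / sigma_hat:.1f} days")
```

The daily ratio is only approximate because I changes within each day (upward while the epidemic grows, downward after the peak), but on this simulated series it lands close to the true sigma.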
@lisphilar all good, no need to apologize; thank you for the background and explanation!
The total number of recovered cases is currently only 5. This is lower than expected, because there were around 30,000 confirmed/suspected cases as of 01Aug2022.
Notebook for calculation:
https://gist.github.com/lisphilar/ae24c369d21cfeb89a673de1f6edb2b9
The recovery period is estimated at 2-4 weeks and the case fatality rate at 3-6%. By a simple calculation, this means that nearly 30,000 cases (all cases minus the 3-6% fatalities) could have recovered by 01Sep2022.
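The simple calculation can be written out as follows. It assumes that all of the roughly 30,000 cases counted on 01Aug2022 have resolved (recovered or died) by 01Sep2022, because the 2-4 week recovery period has elapsed, and that deaths make up 3-6% of them. The function name `expected_recovered` is hypothetical.

```python
def expected_recovered(resolved_cases, cfr_low, cfr_high):
    """Return (min, max) recoveries among resolved cases for a CFR range."""
    # A higher fatality rate means fewer recoveries, so cfr_high gives the lower bound
    return resolved_cases * (1 - cfr_high), resolved_cases * (1 - cfr_low)


low, high = expected_recovered(30_000, cfr_low=0.03, cfr_high=0.06)
print(f"expected recovered by 01Sep2022: {low:,.0f} - {high:,.0f}")  # 28,200 - 29,100
```

Either bound is far above the 5 recovered cases currently in the data.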
@aimeehan1 indicated as follows on #127 (comment).
We may need to revise the data dictionary or the curation system.