
More benchmarks #25

Open
kvelleby opened this issue Oct 13, 2023 · 3 comments

@kvelleby
Contributor

  1. We need to ensure that the benchmarks we have produced are correct and shared (there is a Dropbox shared_competition_data folder). I think simple benchmarks such as all zeroes and last historical value should have the team name "benchmark" to separate them from the ViEWS models. The code for simple benchmarks (i.e., deterministic models that are either pure functions or functions of the dependent variable) should live in this repository as command line tools that can be run on "shared_competition_data/Features/*.parquet" and "shared_competition_data/Actuals" (e.g., you need the structure of the Actuals to produce predictions with all zeroes, and the features to know the last historical value). See the sketch after this list.
  2. Then we need to think about whether there should be more benchmarks. Here are some ideas:
    a) For all units, collect all historical values (or the last n months) as the forecast distribution.
    b) A model that assigns equal probability to each bin (as a rule, I would use the number closest to zero in each bin). With e.g. 11 bins, that is ca. 9% 0s, 9% 1s, 9% 3s, etc. (Please see the discussion on bins first.)
    c) As a), but swap every forecast value with the closest-to-zero possible value given the bin it would fall into under the binning scheme. Motivation: compared to a), does it help to "game" the metrics?
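
A minimal sketch of what such a command line tool could look like, assuming pandas-readable parquet files with illustrative column names (`unit_id`, `month_id`, `ged_sb`); the actual schema of the shared folder may differ:

```python
#!/usr/bin/env python
"""Sketch of the two simplest deterministic benchmarks as a CLI tool.

Assumes the Actuals and Features parquet files carry columns unit_id,
month_id, and ged_sb; these names are illustrative, not the actual
shared_competition_data layout.
"""
import argparse

import pandas as pd


def all_zeroes(actuals: pd.DataFrame) -> pd.DataFrame:
    """Predict zero fatalities for every unit-month in the Actuals structure."""
    preds = actuals[["unit_id", "month_id"]].copy()
    preds["prediction"] = 0
    return preds


def last_historical(actuals: pd.DataFrame, features: pd.DataFrame) -> pd.DataFrame:
    """Carry each unit's last observed value forward over the forecast window."""
    last = features.sort_values("month_id").groupby("unit_id")["ged_sb"].last()
    preds = actuals[["unit_id", "month_id"]].copy()
    preds["prediction"] = preds["unit_id"].map(last).fillna(0)
    return preds


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Simple deterministic benchmarks")
    parser.add_argument("--actuals", required=True, help="path to an Actuals parquet file")
    parser.add_argument("--features", help="path to a Features parquet file")
    parser.add_argument("--model", choices=["all_zeroes", "last_historical"], default="all_zeroes")
    parser.add_argument("--out", required=True, help="where to write the predictions parquet")
    args = parser.parse_args()

    actuals = pd.read_parquet(args.actuals)
    if args.model == "all_zeroes":
        preds = all_zeroes(actuals)
    else:
        preds = last_historical(actuals, pd.read_parquet(args.features))
    preds.to_parquet(args.out)
```
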
@hhegre
Collaborator

hhegre commented Oct 13, 2023

I agree to 2(a) and 2(c). The last 120 months seems like an OK window?
2(b) is also great. But would it be better to use the log mean of each bin rather than the value closest to zero? Or a draw from a Poisson given the log mean of the bin? For the maximum bin, we would have to define something. I realize what I suggest looks more complex than the lowest-value rule Jonas suggests, and simpler is better if we don't have good reasons for complexity.
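
One way to read the log-mean suggestion, sketched under assumed bin edges (the real binning scheme is still under discussion): take the geometric mean of each bin's endpoints as its representative value and, for the stochastic variant, draw from a Poisson with that mean. The open-ended maximum bin still needs a convention; doubling its lower edge is only a placeholder here. Note that a Poisson draw can land outside its bin, which is part of why the closest-to-zero rule is simpler.

```python
import numpy as np

# Hypothetical lower bin edges; the competition's actual binning is undecided.
EDGES = np.array([0, 1, 3, 6, 11, 26, 51, 101, 251, 501, 1001])


def bin_log_mean(b: int, edges: np.ndarray = EDGES) -> float:
    """Geometric mean of bin b's endpoints, one reading of 'log mean'."""
    low = max(edges[b], 1)  # keep log() defined; the zero bin is handled below
    # The top bin is open-ended, so some cap must be defined; doubling the
    # lower edge is an arbitrary placeholder for that convention.
    high = edges[b + 1] - 1 if b + 1 < len(edges) else 2 * edges[b]
    return float(np.exp((np.log(low) + np.log(high)) / 2))


def uniform_bin_forecast(n_draws: int, seed: int = 0) -> np.ndarray:
    """Equal probability per bin; each draw is Poisson around the bin's log mean."""
    rng = np.random.default_rng(seed)
    bins = rng.integers(0, len(EDGES), size=n_draws)  # uniform over bins
    return np.array([0 if b == 0 else rng.poisson(bin_log_mean(b)) for b in bins])


print(uniform_bin_forecast(10))
```
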

@kvelleby
Contributor Author

The closest-to-zero idea was motivated by the fact that finding a suitable value for the edge bins would otherwise be difficult.

@kvelleby
Contributor Author

We should also look into other ways to bin. Bayesian blocks is one quite interesting approach: https://docs.astropy.org/en/stable/api/astropy.stats.bayesian_blocks.html
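
For reference, `astropy.stats.bayesian_blocks` returns adaptive bin edges directly from the data; a quick sketch on synthetic counts (the zero-inflated Poisson mixture below is only a stand-in for real fatality data, and whether this yields sensible bins for such data is exactly what would need testing):

```python
import numpy as np
from astropy.stats import bayesian_blocks

# Synthetic zero-inflated counts standing in for fatality data.
rng = np.random.default_rng(42)
counts = np.where(rng.random(2000) < 0.8, 0, rng.poisson(20, size=2000))

# The 'events' fitness treats the values as point data (ties are aggregated
# internally) and returns change-point edges, i.e. a data-driven binning.
edges = bayesian_blocks(counts, fitness="events")
print(edges)
```
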
