Bootstrap prediction interval #323
Comments
I don't think the second (inside) set of shocks needs to come from the same bootstrap as the original one, so you could use an independent bootstrap here. If e has seasonality in it, then you need to make sure the residuals are always aligned in both the series generation and the forecast errors. Conceptually, you could think of a seasonal time series in seasonal time. Suppose you had 2400 obs with a 24-hour seasonality, so that you have 100 full periods. You can think of this as 100 observations of 24 hours each. You could then CBB the 100 observations, use the result to get the index of the start of each seasonal period, and then sample from the original data. For example, if a CBB gave you 4,5,6,7,1,2,3 then you would use obs
|
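A minimal sketch of the "seasonal time" idea, using plain NumPy rather than the library's sampler. The helper `circular_block_indices` is hand-rolled and hypothetical, and the block size of 3 periods is an arbitrary choice for illustration. It resamples whole periods and then expands each sampled period into its 24 hourly observation indices, so every draw keeps its hour of day.

```python
import numpy as np


def circular_block_indices(n, block_size, rng):
    """Hand-rolled circular block bootstrap indices over n items (illustrative)."""
    n_blocks = -(-n // block_size)  # ceiling division
    starts = rng.integers(0, n, size=n_blocks)
    idx = (starts[:, None] + np.arange(block_size)[None, :]) % n  # wrap around
    return idx.ravel()[:n]


period, n_periods = 24, 100
rng = np.random.default_rng(0)
y = np.arange(period * n_periods)  # fake data: each value equals its own index

# Resample whole periods (days) in blocks of 3 consecutive days.
day_idx = circular_block_indices(n_periods, block_size=3, rng=rng)

# Expand each sampled day into its 24 hourly observation indices.
obs_idx = (day_idx[:, None] * period + np.arange(period)[None, :]).ravel()
y_star = y[obs_idx]
```

Because whole periods are drawn, position `i` of `y_star` always holds an observation from hour `i % 24`.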
If you don't have seasonality, then I think you could do something like
This would turn it into a "standard" problem. Essentially you would create all of the required shocks in a single go from the bootstrap sampler. This would then be run R times to produce the confidence interval(s). |
Thanks, I realised I made an incorrect assumption about how the CBB behaves. My data does have daily seasonality. I incorrectly assumed the CBB essentially did a block random shuffle, a bit like sklearn.model_selection.GroupKFold, so that if each block represents 24 hours, the start of each block would always be hour 0. If I understand you correctly, I think this is equivalent to what you suggest: I convert from a vector of length 24*365 (i.e. 1 year of hourly observations) to a 24x365 matrix, so that the CBB will behave like the GroupKFold, i.e.
[with a block size of one is this basically the same as IIDBootstrap?] |
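A rough sketch of the reshape described above, with fake data and plain NumPy indexing standing in for the bootstrap object. It assumes the matrix is laid out with one row per day (365 rows of 24 hours), since rows are what get resampled here. Drawing whole rows with replacement corresponds to a block size of 1 in seasonal time.

```python
import numpy as np

rng = np.random.default_rng(0)
hours, days = 24, 365
e = rng.normal(size=hours * days)  # fake hourly residuals, e[0] at hour 0

# One row per day, so every row starts at hour 0.
e_by_day = e.reshape(days, hours)

# Block size 1 in seasonal time: draw whole days with replacement.
day_idx = rng.integers(0, days, size=days)
e_star = e_by_day[day_idx].ravel()  # back to a vector of hourly values
```

Each run of 24 values in `e_star` is an intact day from the original data, so the daily pattern is preserved.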
This is correct - vector sampling the data will capture the seasonality. The only other change is that you need to adjust the bandwidth so that it is in seasonal periods, rather than in observation periods. |
Thanks, I'm not sure what you mean by adjusting the bandwidth. Do you mean the first argument to CircularBlockBootstrap (which I have set to 1)? I will have to ensure that I align the series generation and the forecast errors. I believe I can do this by rotating the vector I'm sampling from before reshaping it into matrix form, so that the first element has the same time of day as the first element in the corresponding x* and x+ matrices. |
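A small sketch of the rotate-then-reshape step, assuming `np.roll` is an acceptable way to do the rotation. The starting hours (`hour_of_e0`, `hour_of_xplus0`) are made-up values, and the fake data is built so that each value modulo 24 is its hour of day, which makes the alignment easy to check.

```python
import numpy as np

hours = 24
hour_of_e0 = 5      # hypothetical: e[0] was observed at hour 5
hour_of_xplus0 = 9  # hypothetical: the first forecast row is at hour 9

# Fake residuals whose value modulo 24 encodes the hour of day.
e = np.arange(hours * 10) + hour_of_e0

# Rotate so that the first element has the same hour as the first forecast row.
shift = (hour_of_xplus0 - hour_of_e0) % hours
e_rot = np.roll(e, -shift)

# One row per 24-hour period, each row now starting at hour 9.
e_by_day = e_rot.reshape(-1, hours)
```

The values that wrap from the head to the tail land in the last row at the correct hours, because the series length is a whole number of days.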
Yes, the BW is 1 in your example. A BW of 1 is an IID bootstrap. In your problem, BW should reflect any data dependence across seasonal periods. If days are effectively independent, then you can use 1 (or IIDBootstrap on the array). If there is some heteroskedasticity or other dependence across days, then you should use a larger number. Yes, you will need to align them in your function that outputs this quantity: |
If you have the time, it would be great to have a writeup in a notebook of the key steps. You can use fake data (simulated as part of the notebook) if it makes things easier. |
Thanks, yes, that makes sense; there is a small multi-day dependency. In the end I think that even when there's misalignment between x and x+ I still only need a single bootstrap. I just have to select n aligned samples from it. I've attached a simple example, with the lengths hardcoded to keep things simple. |
Also, for the e* bootstrap, if len(e) isn't a multiple of the seasonality, I think it makes (some) sense to pad the tail with the head (i.e. wrap it around). Then, when sampling e*, you have to select len(x) values from the bootstrap (since it's now longer than the original length).
It seems like there are multiple cases where it would be useful if we could specify the length of the sample? |
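A sketch of the padding idea with fake data, again using plain NumPy draws in place of the bootstrap object. The lengths are arbitrary. Note one limitation, flagged in the comments: the wrapped values come from the start of the series, so in the final padded row they sit at a different hour than the one they were observed at.

```python
import numpy as np

period = 24
rng = np.random.default_rng(0)
e = rng.normal(size=period * 7 + 10)  # length is not a multiple of 24
n_x = e.size                          # number of e* values actually needed

# Pad the tail with the head so the length becomes a multiple of the period.
# Caveat: the wrapped values are out of phase within the last row.
pad = (-e.size) % period
e_padded = np.concatenate([e, e[:pad]])
e_by_period = e_padded.reshape(-1, period)

# Draw whole periods with replacement, then keep only the first n_x values.
n_rows = e_by_period.shape[0]
idx = rng.integers(0, n_rows, size=n_rows)
e_star = e_by_period[idx].ravel()[:n_x]
```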
I'm trying to bootstrap the prediction interval for an sklearn-style non-parametric estimator (specifically catboost regression). I want the prediction interval for the sum of n predictions (the data has time order). I'm using the method from Davison and Hinkley, "Bootstrap Methods and their Application", algorithm 6.4, i.e.
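A rough sketch of that two-loop scheme as I read it, not the listing from the book. It makes several simplifying assumptions: an ordinary least-squares toy model stands in for catboost, residuals are resampled IID (no blocks or seasonality), raw centred residuals are used without any leverage adjustment, and all sizes (`T`, `n`, `R`, `M`) are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n, R, M = 200, 5, 200, 20  # sample size, horizon, outer and inner draws

# Fake training data and future regressors.
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=T)
X_plus = np.column_stack([np.ones(n), rng.normal(size=n)])


def fit(X, y):
    """Least-squares coefficients (stand-in for any fit/predict estimator)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


beta_hat = fit(X, y)
e = y - X @ beta_hat
e = e - e.mean()                   # centred residuals
point = (X_plus @ beta_hat).sum()  # point forecast of the sum

deltas = np.empty(R * M)
for r in range(R):
    # Outer loop: rebuild the series with shocks e*, then refit.
    e_star = rng.choice(e, size=T, replace=True)
    beta_star = fit(X, X @ beta_hat + e_star)
    pred_star = (X_plus @ beta_star).sum()
    for m in range(M):
        # Inner loop: simulate future shocks e+ and record the prediction error.
        e_plus = rng.choice(e, size=n, replace=True)
        deltas[r * M + m] = pred_star - (point + e_plus.sum())

lo_q, hi_q = np.quantile(deltas, [0.025, 0.975])
interval = (point - hi_q, point - lo_q)  # approximate 95% prediction interval
```

The interval is formed by subtracting the error quantiles from the point forecast, because each error is defined as prediction minus simulated outcome.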
As far as I can see, there's no convenience wrapper for a prediction interval similar to bs.conf_int()?
Also, like in #165, I only want the first n samples for e+. I assume I still have to use the workaround from that issue?
My main question, though, is how to implement the two loops. I don't think creating two separate CircularBlockBootstrap() instances is correct, since the random state of the inner bootstrap will be reset to whatever the default seed is? I think what I should do is create one bootstrap instance and use it as follows:
My issue though is when I sample estar and eplus - because the data is blocked I need to ensure the block boundaries are the same - specifically if the block size is one week and X and Xplus don't both start on the same hour of the week then sampling from the same CircularBlockBootstrap() object will result in misalignment of the eplus samples?
Edit: As is usually the case, taking the time to write out the problem gave me enough time to think it out. I compute the offset from the start of e to the same time of week of the start of the X+ and then simply sample as follows:
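A small sketch of the offset calculation, under the assumption that the bootstrap sample keeps the same weekly phase as e (i.e. blocks are whole weeks starting where e starts). The hour-of-week values and lengths are hypothetical, and the fake sample simply encodes the hour of week so the alignment can be checked.

```python
import numpy as np

week = 24 * 7
hour_of_week_e0 = 30       # hypothetical: hour of week at which e starts
hour_of_week_xplus0 = 100  # hypothetical: hour of week at which X+ starts
n = 48                     # number of forecast steps

# Steps from the start of e to the first point sharing X+'s hour of week.
offset = (hour_of_week_xplus0 - hour_of_week_e0) % week

# Fake bootstrap sample in phase with e; each value is its hour of week.
e_star = (np.arange(week * 4) + hour_of_week_e0) % week

# Take n values starting at the offset so that e+ lines up with X+.
e_plus = e_star[offset:offset + n]
```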