-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Scan metadata updates #357
Conversation
Looks good to merge, with one suggestion for a comment above. I think we can now close #61 as the new data-raw script should allow us to keep up with any new updates made in linkages between WCIS and LDM. I have linked that issue to this PR per the comment in script, unlink it if you don't think issue is fully resolved. Not sure whether there will be efforts to characterize sites that lack pedons, or if there is more detective work needed to link more pedons, but if so, from {soilDB} perspective, looks like @jskovlin and @dylanbeaudette have "solved" it and we can now just re-run update script periodically. I was curious how many sites still lack linked pedon data, see below. library(soilDB)
x <- SCAN_site_metadata()
aggregate(
x$upedonid,
by = list(Network = x$Network),
FUN = \(i) c(no_pedon = sum(is.na(i)), n = length(i))
)
#> Network x.no_pedon x.n
#> 1 CSCAN 20 21
#> 2 SCAN 45 210
#> 3 SNTL 689 910
#> 4 SNTLT 42 45 Also probably pretty close to closing #184 with timezone and SNOWLITE updates. Not sure if there is much we want to do with the remaining TODOs in that issue. |
Also, no issue with this that I can see, but just a thought: when combining hourly data for stations in multiple time zones the Dissimilar timezones appear to be correctly converted when the data are combined--but curious if we want to give the user the ability to specify a target time zone explicitly (rather than implicitly through the choice of their first station ID). |
Thanks for looking over the PR and commentary. I agree that we can close #61, with the understanding that there is still work to be done fixing station information in WCIS and in NASIS. Jay is working on a batch of those right now.
Yes, it will get progressively "better".
I'll close it, and open a new ticket for the question about multiple timezones. |
Good points, I hadn't thought about that carefully and there are no tests in place to ensure that data are reasonable. I also hadn't considered what "happens" when combining multiple timezones. I had to convince myself that it was reasonable, and it is, just as you describe: x <- c(
as.POSIXct('2024-01-01 12:00:00', tz = 'etc/gmt+8'),
as.POSIXct('2024-01-01 12:00:00', tz = 'etc/gmt+10')
)
# correctly converted to single TZ
x
# new TZ is the first one
table(format(x, format = '%Z')) I'll open a new issue to implement selection of a timezone. I'll also add a note in the manual + tutorial. |
Multiple TZ is slightly more complicated than I thought, as until now, the ordering of stations in the results was the numerically-sorted |
See above review comment on what you just deleted, I think that you need to check for |
…"` (or user-specified via `tz` for #358)
After discussion on above test added in 7822889#diff-129da79a9437e382a28369436cd7042f113ce828f5a421f7132d59f2cfbded0d which failed on Linux I implemented standardization of the timezone based on The default behavior ( To ensure that a common offset is returned, users need to specify EDIT: UTC may be a better default due to above caveats |
Just pushed a small change in d96cf6f due to an oversight I made, the issue was only evident for |
Thanks for the more nuanced solution to the TZ and multiple above ground sensor problems. Are we good to merge ? |
Yep, I think this is good to go in that it does what is expected. The only remaining "issue" is perhaps the choice of default timezone. I noticed that with the default "US/Central" timezone (or any timezone that observes DST) you will get 2 records for the same hour slot when To get stability in times and DST need to use "UTC" or a specific offset as you had originally set up. The caveat to specific offset being that the exact output will differ based on platform. It seems that on the Linux systems tested (recent Ubuntu and Manjaro) that the offset gets applied to the time (converted to UTC), whereas on Windows and Mac the offset is preserved with the local time. Either is a "correct" representation of time, but final output varies with platform. Choosing a consistent timezone default at least prevents multiple queries across different extents or different groups of stations from returning different results silently. I suppose this is not a huge problem because if combining POSIXct results from multiple queries the first timezone is used. We probably want to make sure the tutorials cover this in some regard. The |
Do we want to add a message that indicates to users requests that span multiple |
Yes, a message when >1 TZ result from the selected stations is a good idea. I can do that this morning if you are busy. |
Did it yesterday, just didnt push. See 309e357 |
This PR addresses several TODO items for
fetchSCAN()
and related functions:fetchSCAN
upgrades #184 -- timezone is now available in the station metadata, and a newdatetime
column initialized as timestamp with timezoneThe latest metadata creation code and intermediate files are at
data-raw/scan-snotel-current
. This code and resulting data replace all previous efforts to combine WCIS (station information) and LDM (user pedon ID and pedlabsampnum) sources.