Description
This is pretty low priority, but I can see it having some obscure effects on, e.g., identifying gaps/availability in a bank.
When a waveform file gets updated (in this specific case, everything about the file was identical; only the modified time changed), a duplicate entry gets added to the WaveBank index.
To Reproduce
I admittedly haven't deliberately reproduced this, but I'm fairly certain this is what happened:
Expected behavior
Duplicate entries shouldn't get created if all else about the files is equal. (This raises a bigger issue about what happens if the file's contents change substantially, but that's a problem for another day.)
I'm not entirely sure if this would work with the structure of update_index, but simply dropping duplicates should resolve the issue.
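For what it's worth, a minimal sketch of what I mean by dropping duplicates, assuming the index can be read as a pandas DataFrame with `path` and `mtime` columns (those column names are guesses, not necessarily the actual WaveBank index schema):

```python
import pandas as pd

def drop_duplicate_index_rows(index_df: pd.DataFrame) -> pd.DataFrame:
    """Drop index rows that point at the same file, keeping the newest one.

    The column names 'path' and 'mtime' are assumptions and may not match
    the real index schema.
    """
    return (
        index_df.sort_values("mtime")
        .drop_duplicates(subset="path", keep="last")
        .reset_index(drop=True)
    )
```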
Versions (please complete the following information):
OS: Ubuntu 22.04
ObsPlus Version: 0.2.3.dev10+g8e231f4
Python Version: 3.8.12
Ya, I have seen this before. I guess it begs the question: why is the mtime changing, and is that easier to avoid than implementing a solution?
One advantage of the bank is that we can update/query the index without having to load all the contents into memory. If we did check for duplicates, we would have to load the entire index into memory before each update (which could be kinda costly if the index is large). I am not sure how much slower it would make file indexing, but at least a bit.
In this case I wanted to make sure the existing files were overwritten: there was a change in how obspy reads/writes Kinemetrics EVT files, and I wanted any data in the bank that originated from an EVT file to be consistent (the number of samples/time window stayed the same; it was a matter of storing the data with or without the instrument response removed).
An alternative could have been to just delete and recreate the index, but the index is huge and it would have taken 24 hours+ (though I will probably end up doing that now, anyways).
I know you don't really like adding additional kwargs, but maybe it could be an optional step for cases where we know it will be necessary?
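Something along these lines, purely as a sketch of the opt-in idea (the kwarg name and the helpers below are hypothetical, not the actual ObsPlus API):

```python
def update_index(self, paths=None, drop_duplicates=False):
    """Hypothetical signature; only the drop_duplicates kwarg is new."""
    # ... existing indexing logic would run here, unchanged ...
    if drop_duplicates:
        # Only pay the cost of loading the full index when explicitly asked.
        df = self.read_index()  # assumed to return a pandas DataFrame
        df = df.sort_values("mtime").drop_duplicates(subset="path", keep="last")
        self._write_index(df)  # hypothetical helper that rewrites the index
    return self
```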