-
Notifications
You must be signed in to change notification settings - Fork 468
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
db: consider reworking rangedel, megingIter and levelIter interactions #2863
Comments
cc @sumeerbhola |
jbowens
added a commit
to jbowens/pebble
that referenced
this issue
Jan 25, 2024
For a couple years now, Pebble has not allowed the creation of untruncated range tombstones outside the context of virtual sstables. Additionally, since v22.2 untruncated range tombstones should all have been compacted and rewritten as truncated range tombstones. Virtual SSTables (which similarly allow a physical range deletion to exceed the bounds of the containing logical sstable) handle this case by truncating the tombstones at iteration time in the iterator returned by keyspan.Truncate. This leaves the merging iterator's delicate handling of untruncated range tombstones obsolete. This commit removes that complexity. In addition, as an extra safety precaution, the table stats collector's invariants-build assertion that range deletions are contained within their files' bounds is lifted into production builds as well. This provides a guarantee during store open that all tombstones are appropriately truncated. Relates to cockroachdb#2863.
jbowens
added a commit
to jbowens/pebble
that referenced
this issue
Jan 25, 2024
For a couple years now, Pebble has not allowed the creation of untruncated range tombstones outside the context of virtual sstables. Additionally, since v22.2 untruncated range tombstones should all have been compacted and rewritten as truncated range tombstones. Virtual SSTables (which similarly allow a physical range deletion to exceed the bounds of the containing logical sstable) handle this case by truncating the tombstones at iteration time in the iterator returned by keyspan.Truncate. This leaves the merging iterator's delicate handling of untruncated range tombstones obsolete. This commit removes that complexity. In addition, as an extra safety precaution, the table stats collector's invariants-build assertion that range deletions are contained within their files' bounds is lifted into production builds as well. This provides a guarantee during store open that all tombstones are appropriately truncated. Relates to cockroachdb#2863.
jbowens
added a commit
to jbowens/pebble
that referenced
this issue
Jan 26, 2024
For a couple years now, Pebble has not allowed the creation of untruncated range tombstones outside the context of virtual sstables. Additionally, since v22.2 untruncated range tombstones should all have been compacted and rewritten as truncated range tombstones. Virtual SSTables (which similarly allow a physical range deletion to exceed the bounds of the containing logical sstable) handle this case by truncating the tombstones at iteration time in the iterator returned by keyspan.Truncate. This leaves the merging iterator's delicate handling of untruncated range tombstones obsolete. This commit removes that complexity. In addition, as an extra safety precaution, the table stats collector's invariants-build assertion that range deletions are contained within their files' bounds is lifted into production builds as well. This provides a guarantee during store open that all tombstones are appropriately truncated. Relates to cockroachdb#2863.
jbowens
added a commit
to jbowens/pebble
that referenced
this issue
Jan 26, 2024
For a couple years now, Pebble has not allowed the creation of untruncated range tombstones outside the context of virtual sstables. Additionally, since v22.2 untruncated range tombstones should all have been compacted and rewritten as truncated range tombstones. Virtual SSTables (which similarly allow a physical range deletion to exceed the bounds of the containing logical sstable) handle this case by truncating the tombstones at iteration time in the iterator returned by keyspan.Truncate. This leaves the merging iterator's delicate handling of untruncated range tombstones obsolete. This commit removes that complexity. In addition, as an extra safety precaution, the table stats collector's invariants-build assertion that range deletions are contained within their files' bounds is lifted into production builds as well. This provides a guarantee during store open that all tombstones are appropriately truncated. Relates to cockroachdb#2863.
jbowens
added a commit
to jbowens/pebble
that referenced
this issue
Jan 26, 2024
This commit refactors the interface of the table cache `newIters` method to support creation of any of a sstable's iterators, including its point iterator, range deletion iterator and range key iterator. This is motivated by the refactor planned in cockroachdb#2863, which in turn requires refactoring some of the newIters call sites that determine whether ingested sstables overlap existing tables. There's also a minor benefit that a compaction setup no longer needs to open a point iterator and then immediately close it when initializing the range deletion iterators. ``` goos: linux goarch: amd64 pkg: github.com/cockroachdb/pebble cpu: Intel(R) Xeon(R) CPU @ 2.80GHz │ old-NewItersAlloc.txt │ new-NewItersAlloc.txt │ │ sec/op │ sec/op vs base │ NewItersAlloc 427.0n ± 1% 412.1n ± 1% -3.47% (p=0.000 n=10) │ old-NewItersAlloc.txt │ new-NewItersAlloc.txt │ │ B/op │ B/op vs base │ NewItersAlloc 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ ¹ all samples are equal │ old-NewItersAlloc.txt │ new-NewItersAlloc.txt │ │ allocs/op │ allocs/op vs base │ NewItersAlloc 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ ¹ all samples are equal ```
jbowens
added a commit
to jbowens/pebble
that referenced
this issue
Jan 26, 2024
This commit refactors the interface of the table cache `newIters` method to support creation of any of a sstable's iterators, including its point iterator, range deletion iterator and range key iterator. This is motivated by the refactor planned in cockroachdb#2863, which in turn requires refactoring some of the newIters call sites that determine whether ingested sstables overlap existing tables. There's also a minor benefit that a compaction setup no longer needs to open a point iterator and then immediately close it when initializing the range deletion iterators. ``` goos: linux goarch: amd64 pkg: github.com/cockroachdb/pebble cpu: Intel(R) Xeon(R) CPU @ 2.80GHz │ old-NewItersAlloc.txt │ new-NewItersAlloc.txt │ │ sec/op │ sec/op vs base │ NewItersAlloc 427.0n ± 1% 412.1n ± 1% -3.47% (p=0.000 n=10) │ old-NewItersAlloc.txt │ new-NewItersAlloc.txt │ │ B/op │ B/op vs base │ NewItersAlloc 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ ¹ all samples are equal │ old-NewItersAlloc.txt │ new-NewItersAlloc.txt │ │ allocs/op │ allocs/op vs base │ NewItersAlloc 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ ¹ all samples are equal ```
jbowens
added a commit
to jbowens/pebble
that referenced
this issue
Jan 26, 2024
This commit refactors the interface of the table cache `newIters` method to support creation of any of a sstable's iterators, including its point iterator, range deletion iterator and range key iterator. This is motivated by the refactor planned in cockroachdb#2863, which in turn requires refactoring some of the newIters call sites that determine whether ingested sstables overlap existing tables. There's also a minor benefit that a compaction setup no longer needs to open a point iterator and then immediately close it when initializing the range deletion iterators. ``` goos: linux goarch: amd64 pkg: github.com/cockroachdb/pebble cpu: Intel(R) Xeon(R) CPU @ 2.80GHz │ old-NewItersAlloc.txt │ new-NewItersAlloc.txt │ │ sec/op │ sec/op vs base │ NewItersAlloc 427.0n ± 1% 412.1n ± 1% -3.47% (p=0.000 n=10) │ old-NewItersAlloc.txt │ new-NewItersAlloc.txt │ │ B/op │ B/op vs base │ NewItersAlloc 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ ¹ all samples are equal │ old-NewItersAlloc.txt │ new-NewItersAlloc.txt │ │ allocs/op │ allocs/op vs base │ NewItersAlloc 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ ¹ all samples are equal ```
jbowens
added a commit
to jbowens/pebble
that referenced
this issue
Jan 26, 2024
This commit refactors the interface of the table cache `newIters` method to support creation of any of a sstable's iterators, including its point iterator, range deletion iterator and range key iterator. This is motivated by the refactor planned in cockroachdb#2863, which in turn requires refactoring some of the newIters call sites that determine whether ingested sstables overlap existing tables. There's also a minor benefit that a compaction setup no longer needs to open a point iterator and then immediately close it when initializing the range deletion iterators. ``` goos: linux goarch: amd64 pkg: github.com/cockroachdb/pebble cpu: Intel(R) Xeon(R) CPU @ 2.80GHz │ old-NewItersAlloc.txt │ new-NewItersAlloc.txt │ │ sec/op │ sec/op vs base │ NewItersAlloc 427.0n ± 1% 412.1n ± 1% -3.47% (p=0.000 n=10) │ old-NewItersAlloc.txt │ new-NewItersAlloc.txt │ │ B/op │ B/op vs base │ NewItersAlloc 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ ¹ all samples are equal │ old-NewItersAlloc.txt │ new-NewItersAlloc.txt │ │ allocs/op │ allocs/op vs base │ NewItersAlloc 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ ¹ all samples are equal ```
jbowens
added a commit
that referenced
this issue
Jan 29, 2024
For a couple years now, Pebble has not allowed the creation of untruncated range tombstones outside the context of virtual sstables. Additionally, since v22.2 untruncated range tombstones should all have been compacted and rewritten as truncated range tombstones. Virtual SSTables (which similarly allow a physical range deletion to exceed the bounds of the containing logical sstable) handle this case by truncating the tombstones at iteration time in the iterator returned by keyspan.Truncate. This leaves the merging iterator's delicate handling of untruncated range tombstones obsolete. This commit removes that complexity. In addition, as an extra safety precaution, the table stats collector's invariants-build assertion that range deletions are contained within their files' bounds is lifted into production builds as well. This provides a guarantee during store open that all tombstones are appropriately truncated. Relates to #2863.
jbowens
added a commit
to jbowens/pebble
that referenced
this issue
Jan 31, 2024
This commit refactors the interface of the table cache `newIters` method to support creation of any of a sstable's iterators, including its point iterator, range deletion iterator and range key iterator. This is motivated by the refactor planned in cockroachdb#2863, which in turn requires refactoring some of the newIters call sites that determine whether ingested sstables overlap existing tables. There's also a minor benefit that a compaction setup no longer needs to open a point iterator and then immediately close it when initializing the range deletion iterators. ``` goos: linux goarch: amd64 pkg: github.com/cockroachdb/pebble cpu: Intel(R) Xeon(R) CPU @ 2.80GHz │ old-NewItersAlloc.txt │ new-NewItersAlloc.txt │ │ sec/op │ sec/op vs base │ NewItersAlloc 427.0n ± 1% 412.1n ± 1% -3.47% (p=0.000 n=10) │ old-NewItersAlloc.txt │ new-NewItersAlloc.txt │ │ B/op │ B/op vs base │ NewItersAlloc 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ ¹ all samples are equal │ old-NewItersAlloc.txt │ new-NewItersAlloc.txt │ │ allocs/op │ allocs/op vs base │ NewItersAlloc 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ ¹ all samples are equal ```
jbowens
added a commit
to jbowens/pebble
that referenced
this issue
Jan 31, 2024
This commit refactors the interface of the table cache `newIters` method to support creation of any of a sstable's iterators, including its point iterator, range deletion iterator and range key iterator. This is motivated by the refactor planned in cockroachdb#2863, which in turn requires refactoring some of the newIters call sites that determine whether ingested sstables overlap existing tables. There's also a minor benefit that a compaction setup no longer needs to open a point iterator and then immediately close it when initializing the range deletion iterators. ``` goos: linux goarch: amd64 pkg: github.com/cockroachdb/pebble cpu: Intel(R) Xeon(R) CPU @ 2.80GHz │ old-NewItersAlloc.txt │ new-NewItersAlloc.txt │ │ sec/op │ sec/op vs base │ NewItersAlloc 427.0n ± 1% 412.1n ± 1% -3.47% (p=0.000 n=10) │ old-NewItersAlloc.txt │ new-NewItersAlloc.txt │ │ B/op │ B/op vs base │ NewItersAlloc 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ ¹ all samples are equal │ old-NewItersAlloc.txt │ new-NewItersAlloc.txt │ │ allocs/op │ allocs/op vs base │ NewItersAlloc 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ ¹ all samples are equal ```
jbowens
added a commit
that referenced
this issue
Jan 31, 2024
This commit refactors the interface of the table cache `newIters` method to support creation of any of a sstable's iterators, including its point iterator, range deletion iterator and range key iterator. This is motivated by the refactor planned in #2863, which in turn requires refactoring some of the newIters call sites that determine whether ingested sstables overlap existing tables. There's also a minor benefit that a compaction setup no longer needs to open a point iterator and then immediately close it when initializing the range deletion iterators. ``` goos: linux goarch: amd64 pkg: github.com/cockroachdb/pebble cpu: Intel(R) Xeon(R) CPU @ 2.80GHz │ old-NewItersAlloc.txt │ new-NewItersAlloc.txt │ │ sec/op │ sec/op vs base │ NewItersAlloc 427.0n ± 1% 412.1n ± 1% -3.47% (p=0.000 n=10) │ old-NewItersAlloc.txt │ new-NewItersAlloc.txt │ │ B/op │ B/op vs base │ NewItersAlloc 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ ¹ all samples are equal │ old-NewItersAlloc.txt │ new-NewItersAlloc.txt │ │ allocs/op │ allocs/op vs base │ NewItersAlloc 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ ¹ all samples are equal ```
jbowens
added a commit
to jbowens/pebble
that referenced
this issue
Mar 29, 2024
The levelIter has a concept of ignorable boundary keys. When a levelIter exhausts a file's point keys, it's possible that the same file still contains range deletions relevant to other levels of the LSM. The levelIter will, under some conditions, interleave the largest boundary key of the sstable into iteration as an 'ignorable boundary key,' so that the file's range deletions remain accessible until all other levels progress beyound the file's boundary. When block-property filters are in use, a file's point key iterator may become exhausted early, before the file's range deletions are irrelevant, even if the file's largest key is not a range deletion. To work around this subtlety, the sstable iterator previously would surface knowledge of whether any point keys may have been skipped through a MaybeFilteredKeys method. The levelIter used this method to determine when to interleave an ignorable largest boundary in case there may be additional relevant range deletions. This commit removes the conditioning on the output of MaybeFilteredKeys, instead unconditionally interleaving a synthetic boundary at a file's largest point key if the levelIter user is using the file's range deletion iterator. This simplifies the logic and removes a fragile reliance on the sstable's iterators accounting of when it may have filtered keys. Future work (cockroachdb#2863) will remove the need to interleave synthetic boundaries at all, instead interleaving the range deletion bounds themselves among point keys. Informs cockroachdb#2863. Close cockroachdb#3334.
jbowens
added a commit
to jbowens/pebble
that referenced
this issue
Mar 29, 2024
The levelIter has a concept of ignorable boundary keys. When a levelIter exhausts a file's point keys, it's possible that the same file still contains range deletions relevant to other levels of the LSM. The levelIter will, under some conditions, interleave the largest boundary key of the sstable into iteration as an 'ignorable boundary key,' so that the file's range deletions remain accessible until all other levels progress beyound the file's boundary. When block-property filters are in use, a file's point key iterator may become exhausted early, before the file's range deletions are irrelevant, even if the file's largest key is not a range deletion. To work around this subtlety, the sstable iterator previously would surface knowledge of whether any point keys may have been skipped through a MaybeFilteredKeys method. The levelIter used this method to determine when to interleave an ignorable largest boundary in case there may be additional relevant range deletions. This commit removes the conditioning on the output of MaybeFilteredKeys, instead unconditionally interleaving a synthetic boundary at a file's largest point key if the levelIter user is using the file's range deletion iterator. This simplifies the logic and removes a fragile reliance on the sstable's iterators accounting of when it may have filtered keys. Future work (cockroachdb#2863) will remove the need to interleave synthetic boundaries at all, instead interleaving the range deletion bounds themselves among point keys. Informs cockroachdb#2863. Close cockroachdb#3334.
jbowens
added a commit
to jbowens/pebble
that referenced
this issue
Mar 29, 2024
The levelIter has a concept of ignorable boundary keys. When a levelIter exhausts a file's point keys, it's possible that the same file still contains range deletions relevant to other levels of the LSM. The levelIter will, under some conditions, interleave the largest boundary key of the sstable into iteration as an 'ignorable boundary key,' so that the file's range deletions remain accessible until all other levels progress beyound the file's boundary. When block-property filters are in use, a file's point key iterator may become exhausted early, before the file's range deletions are irrelevant, even if the file's largest key is not a range deletion. To work around this subtlety, the sstable iterator previously would surface knowledge of whether any point keys may have been skipped through a MaybeFilteredKeys method. The levelIter used this method to determine when to interleave an ignorable largest boundary in case there may be additional relevant range deletions. This commit removes the conditioning on the output of MaybeFilteredKeys, instead unconditionally interleaving a synthetic boundary at a file's largest point key if the levelIter user is using the file's range deletion iterator. This simplifies the logic and removes a fragile reliance on the sstable's iterators accounting of when it may have filtered keys. Future work (cockroachdb#2863) will remove the need to interleave synthetic boundaries at all, instead interleaving the range deletion bounds themselves among point keys. Informs cockroachdb#2863. Close cockroachdb#3334.
jbowens
added a commit
to jbowens/pebble
that referenced
this issue
Apr 1, 2024
The levelIter has a concept of ignorable boundary keys. When a levelIter exhausts a file's point keys, it's possible that the same file still contains range deletions relevant to other levels of the LSM. The levelIter will, under some conditions, interleave the largest boundary key of the sstable into iteration as an 'ignorable boundary key,' so that the file's range deletions remain accessible until all other levels progress beyound the file's boundary. When block-property filters are in use, a file's point key iterator may become exhausted early, before the file's range deletions are irrelevant, even if the file's largest key is not a range deletion. To work around this subtlety, the sstable iterator previously would surface knowledge of whether any point keys may have been skipped through a MaybeFilteredKeys method. The levelIter used this method to determine when to interleave an ignorable largest boundary in case there may be additional relevant range deletions. This commit removes the conditioning on the output of MaybeFilteredKeys, instead unconditionally interleaving a synthetic boundary at a file's largest point key if the levelIter user is using the file's range deletion iterator. This simplifies the logic and removes a fragile reliance on the sstable's iterators accounting of when it may have filtered keys. Future work (cockroachdb#2863) will remove the need to interleave synthetic boundaries at all, instead interleaving the range deletion bounds themselves among point keys. Informs cockroachdb#2863. Close cockroachdb#3334.
jbowens
added a commit
that referenced
this issue
Apr 5, 2024
The levelIter has a concept of ignorable boundary keys. When a levelIter exhausts a file's point keys, it's possible that the same file still contains range deletions relevant to other levels of the LSM. The levelIter will, under some conditions, interleave the largest boundary key of the sstable into iteration as an 'ignorable boundary key,' so that the file's range deletions remain accessible until all other levels progress beyound the file's boundary. When block-property filters are in use, a file's point key iterator may become exhausted early, before the file's range deletions are irrelevant, even if the file's largest key is not a range deletion. To work around this subtlety, the sstable iterator previously would surface knowledge of whether any point keys may have been skipped through a MaybeFilteredKeys method. The levelIter used this method to determine when to interleave an ignorable largest boundary in case there may be additional relevant range deletions. This commit removes the conditioning on the output of MaybeFilteredKeys, instead unconditionally interleaving a synthetic boundary at a file's largest point key if the levelIter user is using the file's range deletion iterator. This simplifies the logic and removes a fragile reliance on the sstable's iterators accounting of when it may have filtered keys. Future work (#2863) will remove the need to interleave synthetic boundaries at all, instead interleaving the range deletion bounds themselves among point keys. Informs #2863. Close #3334.
jbowens
added a commit
to jbowens/pebble
that referenced
this issue
Apr 12, 2024
This commit refactors the implementation of getIter to be a little more understandable and avoid the unnecessary use of levelIter. Current supported format major versions guarantee that a user key is not split across sstables within a level. This ensures that Get (which only retrieves one individual user key) need only consult 1 sstable per level. This is somewhat motivated by cockroachdb#2863. Removing getIter's dependency on levelIter will make that refactor easier.
jbowens
added a commit
to jbowens/pebble
that referenced
this issue
Apr 12, 2024
This commit refactors the implementation of getIter to be a little more understandable and avoid the unnecessary use of levelIter. Current supported format major versions guarantee that a user key is not split across sstables within a level. This ensures that Get (which only retrieves one individual user key) need only consult 1 sstable per level. This is somewhat motivated by cockroachdb#2863. Removing getIter's dependency on levelIter will make that refactor easier.
jbowens
added a commit
to jbowens/pebble
that referenced
this issue
Apr 12, 2024
This commit refactors the implementation of getIter to be a little more understandable and avoid the unnecessary use of levelIter. Current supported format major versions guarantee that a user key is not split across sstables within a level. This ensures that Get (which only retrieves one individual user key) need only consult 1 sstable per level. This is somewhat motivated by cockroachdb#2863. Removing getIter's dependency on levelIter will make that refactor easier.
jbowens
added a commit
to jbowens/pebble
that referenced
this issue
Apr 22, 2024
Adapt the InterleavingIter to support a mode that interleaves span end keys. Informs cockroachdb#2863.
jbowens
added a commit
that referenced
this issue
Apr 22, 2024
Adapt the InterleavingIter to support a mode that interleaves span end keys. Informs #2863.
jbowens
added a commit
to jbowens/pebble
that referenced
this issue
Apr 24, 2024
The merging iterator and the level iterator are more tightly coupled than desired. Most of this coupling is a consequence of range deletions. The merging iterator is responsible for applying range deletions across levels and requires that the level iterator not progress to a new file until the file's range deletions are no longer relevant to other levels. It does this by interleaving "synthetic" keys. When these keys reach the top of the heap, it signals that all other levels have progressed to a point where the current file's range deletions are no longer relevant. Previously, knowledge of which keys were interleaved "synthetic" keys was propagated indirectly—outside the internal iterator interface—through a pointer to a levelIterBoundaryContext struct with boolean fields. This commit instead always interleaves synthetic keys as range deletion exclusive sentinels. The merging iterator can infer which keys are synthetic by calling InternalKey.IsExclusiveSentinel. Propagating this information directly through the internal iterator interface is more direct and understandable. Beyond needing to know which keys are synthetic, the merging iterator also must know when user-imposed iteration bounds have been reached. Ordinarily when bounds have been reached, internal iterators become exhausted. Since the levelIter's file's range deletions may still be relevant, it cannot and it interleaves a "synthetic" key at the iteration bound. When this key reaches the top of the merging iterator's heap, there are no more keys within bounds and the merging iterator is exhausted. To make this determination, we add a simple key comparison. The use of a key comparison is expected to be acceptable, because it's only performed when a synthetic key reaches the top of the heap. This additional key comparison should ultimately be eliminated when cockroachdb#2863 is complete. Informs cockroachdb#2863.
jbowens
added a commit
to jbowens/pebble
that referenced
this issue
Apr 24, 2024
The merging iterator and the level iterator are more tightly coupled than desired. Most of this coupling is a consequence of range deletions. The merging iterator is responsible for applying range deletions across levels and requires that the level iterator not progress to a new file until the file's range deletions are no longer relevant to other levels. It does this by interleaving "synthetic" keys. When these keys reach the top of the heap, it signals that all other levels have progressed to a point where the current file's range deletions are no longer relevant. Previously, knowledge of which keys were interleaved "synthetic" keys was propagated indirectly—outside the internal iterator interface—through a pointer to a levelIterBoundaryContext struct with boolean fields. This commit instead always interleaves synthetic keys as range deletion exclusive sentinels. The merging iterator can infer which keys are synthetic by calling InternalKey.IsExclusiveSentinel. Propagating this information directly through the internal iterator interface is more direct and understandable. Beyond needing to know which keys are synthetic, the merging iterator also must know when user-imposed iteration bounds have been reached. Ordinarily when bounds have been reached, internal iterators become exhausted. Since the levelIter's file's range deletions may still be relevant, it cannot and it interleaves a "synthetic" key at the iteration bound. When this key reaches the top of the merging iterator's heap, there are no more keys within bounds and the merging iterator is exhausted. To make this determination, we add a simple key comparison. The use of a key comparison is expected to be acceptable, because it's only performed when a synthetic key reaches the top of the heap. This additional key comparison should ultimately be eliminated when cockroachdb#2863 is complete. Informs cockroachdb#2863.
jbowens
added a commit
to jbowens/pebble
that referenced
this issue
Apr 25, 2024
The merging iterator and the level iterator are more tightly coupled than desired. Most of this coupling is a consequence of range deletions. The merging iterator is responsible for applying range deletions across levels and requires that the level iterator not progress to a new file until the file's range deletions are no longer relevant to other levels. It does this by interleaving "synthetic" keys. When these keys reach the top of the heap, it signals that all other levels have progressed to a point where the current file's range deletions are no longer relevant. Previously, knowledge of which keys were interleaved "synthetic" keys was propagated indirectly—outside the internal iterator interface—through a pointer to a levelIterBoundaryContext struct with boolean fields. This commit instead always interleaves synthetic keys as range deletion exclusive sentinels. The merging iterator can infer which keys are synthetic by calling InternalKey.IsExclusiveSentinel. Propagating this information directly through the internal iterator interface is more direct and understandable. Beyond needing to know which keys are synthetic, the merging iterator also must know when user-imposed iteration bounds have been reached. Ordinarily when bounds have been reached, internal iterators become exhausted. Since the levelIter's file's range deletions may still be relevant, it cannot and it interleaves a "synthetic" key at the iteration bound. When this key reaches the top of the merging iterator's heap, there are no more keys within bounds and the merging iterator is exhausted. To make this determination, we add a simple key comparison. The use of a key comparison is expected to be acceptable, because it's only performed when a synthetic key reaches the top of the heap. This additional key comparison should ultimately be eliminated when cockroachdb#2863 is complete. Informs cockroachdb#2863.
jbowens
added a commit
that referenced
this issue
Apr 26, 2024
The merging iterator and the level iterator are more tightly coupled than desired. Most of this coupling is a consequence of range deletions. The merging iterator is responsible for applying range deletions across levels and requires that the level iterator not progress to a new file until the file's range deletions are no longer relevant to other levels. It does this by interleaving "synthetic" keys. When these keys reach the top of the heap, it signals that all other levels have progressed to a point where the current file's range deletions are no longer relevant. Previously, knowledge of which keys were interleaved "synthetic" keys was propagated indirectly—outside the internal iterator interface—through a pointer to a levelIterBoundaryContext struct with boolean fields. This commit instead always interleaves synthetic keys as range deletion exclusive sentinels. The merging iterator can infer which keys are synthetic by calling InternalKey.IsExclusiveSentinel. Propagating this information directly through the internal iterator interface is more direct and understandable. Beyond needing to know which keys are synthetic, the merging iterator also must know when user-imposed iteration bounds have been reached. Ordinarily when bounds have been reached, internal iterators become exhausted. Since the levelIter's file's range deletions may still be relevant, it cannot and it interleaves a "synthetic" key at the iteration bound. When this key reaches the top of the merging iterator's heap, there are no more keys within bounds and the merging iterator is exhausted. To make this determination, we add a simple key comparison. The use of a key comparison is expected to be acceptable, because it's only performed when a synthetic key reaches the top of the heap. This additional key comparison should ultimately be eliminated when #2863 is complete. Informs #2863.
jbowens
added a commit
to jbowens/pebble
that referenced
this issue
May 2, 2024
Refactor the overlap checking to avoid using the levelIter to populate the range deletion iterator. This was subtle and confusing. This refactor will also benefit the implementation of cockroachdb#2112. Informs cockroachdb#2112. Informs cockroachdb#2863.
jbowens
added a commit
to jbowens/pebble
that referenced
this issue
May 2, 2024
Refactor the overlap checking to avoid using the levelIter to populate the range deletion iterator. This was subtle and confusing. This refactor will also benefit the implementation of cockroachdb#2112. Informs cockroachdb#2112. Informs cockroachdb#2863.
jbowens
added a commit
to jbowens/pebble
that referenced
this issue
May 3, 2024
Refactor the overlap checking to avoid using the levelIter to populate the range deletion iterator. This was subtle and confusing. This refactor will also benefit the implementation of cockroachdb#2112. Informs cockroachdb#2112. Informs cockroachdb#2863.
jbowens
added a commit
to jbowens/pebble
that referenced
this issue
May 6, 2024
Previously when the interleaving iterator interleaved a span boundary, the semantics of calling NextPrefix were indeterminate. Ordinarily NextPrefix guarantees that it advances to a key with a new prefix. The existing interleaving iterator implementation didn't guarantee this when positioned on a span boundary. Guaranteeing it would require otherwise unnecessary key comparisons. In existing use cases (for range keys, which always begin and end at prefix keys), we never call NextPrefix while positioned at a span boundary. With the planned refactor of cockroachdb#2863, we'll begin to use the interleaving iterator to interleave range deletions which may have bounds within the keys of a prefix. This assertion will ensure we don't unintentionally rely on currently undefined behavior of calling NextPrefix on a span boundary.
jbowens
added a commit
to jbowens/pebble
that referenced
this issue
May 6, 2024
Previously, levelIter would sometimes interleave fake keys at file boundaries or user-provided iteration boundaries when a file contains range deletions. This commit reworks the levelIter to instead interleave the individual bounds of the range deletions themselves, using an interleaving iterator. This simplifies the levelIter. Because range deletions are now read in two places: once for interleaving bounds and once to expose a rangeDelIter to the mergingIter, this commit requires opening a file's range deletion iterator twice. This is an intermediary state until cockroachdb#2863 is completed when the mergingIter will consult the levelIter's interleaving iterator to determine which range deletions overlap the current iterator position. Informs cockroachdb#2863.
jbowens
added a commit
to jbowens/pebble
that referenced
this issue
May 6, 2024
Previously, levelIter would sometimes interleave fake keys at file boundaries or user-provided iteration boundaries when a file contains range deletions. This commit reworks the levelIter to instead interleave the individual bounds of the range deletions themselves, using an interleaving iterator. This simplifies the levelIter. Because range deletions are now read in two places: once for interleaving bounds and once to expose a rangeDelIter to the mergingIter, this commit requires opening a file's range deletion iterator twice. This is an intermediary state until cockroachdb#2863 is completed when the mergingIter will consult the levelIter's interleaving iterator to determine which range deletions overlap the current iterator position. Informs cockroachdb#2863.
jbowens
added a commit
that referenced
this issue
May 6, 2024
Previously when the interleaving iterator interleaved a span boundary, the semantics of calling NextPrefix were indeterminate. Ordinarily NextPrefix guarantees that it advances to a key with a new prefix. The existing interleaving iterator implementation didn't guarantee this when positioned on a span boundary. Guaranteeing it would require otherwise unnecessary key comparisons. In existing use cases (for range keys, which always begin and end at prefix keys), we never call NextPrefix while positioned at a span boundary. With the planned refactor of #2863, we'll begin to use the interleaving iterator to interleave range deletions which may have bounds within the keys of a prefix. This assertion will ensure we don't unintentionally rely on currently undefined behavior of calling NextPrefix on a span boundary.
jbowens
added a commit
to jbowens/pebble
that referenced
this issue
May 7, 2024
Previously, levelIter would sometimes interleave fake keys at file boundaries or user-provided iteration boundaries when a file contains range deletions. This commit reworks the levelIter to instead interleave the individual bounds of the range deletions themselves, using an interleaving iterator. This simplifies the levelIter. Because range deletions are now read in two places: once for interleaving bounds and once to expose a rangeDelIter to the mergingIter, this commit requires opening a file's range deletion iterator twice. This is an intermediary state until cockroachdb#2863 is completed when the mergingIter will consult the levelIter's interleaving iterator to determine which range deletions overlap the current iterator position. Informs cockroachdb#2863.
jbowens
added a commit
to jbowens/pebble
that referenced
this issue
May 8, 2024
Previously, levelIter would sometimes interleave fake keys at file boundaries or user-provided iteration boundaries when a file contains range deletions. This commit reworks the levelIter to instead interleave the individual bounds of the range deletions themselves, using an interleaving iterator. This simplifies the levelIter. Because range deletions are now read in two places: once for interleaving bounds and once to expose a rangeDelIter to the mergingIter, this commit requires opening a file's range deletion iterator twice. This is an intermediary state until cockroachdb#2863 is completed when the mergingIter will consult the levelIter's interleaving iterator to determine which range deletions overlap the current iterator position. Informs cockroachdb#2863.
jbowens
added a commit
to jbowens/pebble
that referenced
this issue
May 9, 2024
Previously, levelIter would sometimes interleave fake keys at file boundaries or user-provided iteration boundaries when a file contains range deletions. This commit reworks the levelIter to instead interleave the individual bounds of the range deletions themselves, using an interleaving iterator. This simplifies the levelIter. Because range deletions are now read in two places: once for interleaving bounds and once to expose a rangeDelIter to the mergingIter, this commit requires opening a file's range deletion iterator twice. This is an intermediary state until cockroachdb#2863 is completed when the mergingIter will consult the levelIter's interleaving iterator to determine which range deletions overlap the current iterator position. Informs cockroachdb#2863.
jbowens
added a commit
to jbowens/pebble
that referenced
this issue
May 10, 2024
Previously, levelIter would sometimes interleave fake keys at file boundaries or user-provided iteration boundaries when a file contains range deletions. This commit reworks the levelIter to instead interleave the individual bounds of the range deletions themselves, using an interleaving iterator. This simplifies the levelIter. Because range deletions are now read in two places: once for interleaving bounds and once to expose a rangeDelIter to the mergingIter, this commit requires opening a file's range deletion iterator twice. This is an intermediary state until cockroachdb#2863 is completed when the mergingIter will consult the levelIter's interleaving iterator to determine which range deletions overlap the current iterator position. Informs cockroachdb#2863.
jbowens
added a commit
that referenced
this issue
May 10, 2024
Previously, levelIter would sometimes interleave fake keys at file boundaries or user-provided iteration boundaries when a file contains range deletions. This commit reworks the levelIter to instead interleave the individual bounds of the range deletions themselves, using an interleaving iterator. This simplifies the levelIter. Because range deletions are now read in two places: once for interleaving bounds and once to expose a rangeDelIter to the mergingIter, this commit requires opening a file's range deletion iterator twice. This is an intermediary state until #2863 is completed when the mergingIter will consult the levelIter's interleaving iterator to determine which range deletions overlap the current iterator position. Informs #2863.
jbowens
added a commit
to jbowens/pebble
that referenced
this issue
May 10, 2024
Rework the merging iterator's handling of range deletions to use the interleaved range deletion boundary keys to determine when the iterator is positioned within a level's range deletion. This removes the direct manipulation of a range deletion keyspan.FragmentIterator from the mergingIter, delegating that to the child iterator's keyspan.InterleavingIter. This factoring is a little cleaner and decouples the mergingIter from the details of range deletion iterators, and in particular, the levelIter's individual sstables. It also should reduce key comparisons, especially during scans, by avoiding unnecessary key comparisons with range deletions that are loaded but outside the keyspace being iterated over. Close cockroachdb#2863.
jbowens
added a commit
to jbowens/pebble
that referenced
this issue
May 10, 2024
Rework the merging iterator's handling of range deletions to use the interleaved range deletion boundary keys to determine when the iterator is positioned within a level's range deletion. This removes the direct manipulation of a range deletion keyspan.FragmentIterator from the mergingIter, delegating that to the child iterator's keyspan.InterleavingIter. This factoring is a little cleaner and decouples the mergingIter from the details of range deletion iterators, and in particular, the levelIter's individual sstables. It also should reduce key comparisons, especially during scans, by avoiding unnecessary key comparisons with range deletions that are loaded but outside the keyspace being iterated over. Close cockroachdb#2863.
jbowens
added a commit
to jbowens/pebble
that referenced
this issue
May 13, 2024
Rework the merging iterator's handling of range deletions to use the interleaved range deletion boundary keys to determine when the iterator is positioned within a level's range deletion. This removes the direct manipulation of a range deletion keyspan.FragmentIterator from the mergingIter, delegating that to the child iterator's keyspan.InterleavingIter. This factoring is a little cleaner and decouples the mergingIter from the details of range deletion iterators, and in particular, the levelIter's individual sstables. It also should reduce key comparisons, especially during scans, by avoiding unnecessary key comparisons with range deletions that are loaded but outside the keyspace being iterated over. Close cockroachdb#2863.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Currently, there's an intricate coordination dance performed between
mergingIter
andlevelIter
in order to make range deletions work. This dance has several complexities that make it fragile and difficult to grok. The introduction of range-key masking and "truthful" block-property filters also forces us into some awkward contortions to deal with the fact that a file's point iterator and range deletion iterator may become exhausted at different times.I suspect we could significantly clean things up by using the
keyspan.InterleavingIter
within thelevelIter
implementation. It would work something like:levelIter
initializes akeyspan.InterleavingIter
with the file's point iterator and rangedel iterator. It configures theInterleavingIter
to interleave both start and end boundaries of range deletions.levelIter
keeps a file open until it exhausts theInterleavingIter
. Since range deletions' start/end boundaries are interleaved, this ensures that we don't prematurely close the range deletion iterator during iteration. We can remove the code that manufactures "ignorable boundary keys." The interleaving iterator handles interleaving the rangedel end boundary if necessary.mergingIterLevel
has agetRangeDel() *keyspan.Span
func that returns the range deletion covering that level's iterator's current position, if any. When alevelIter
is setup, this is set tolevelIter.Span
which in turn callsInterelavingIter.Span
.filteredIter
interface and related tracking within the sstable iterator is removed. It's no longer necessary because range deletions are unconditionally interleaved, which ensures we never miss a range deletion due to block-property filters excluding point keys that exist beyond the smallest/largest range deletion bound.mergingIterLevel
no longer buffers a tombstone. It instead callsgetRangeDel()
which accesses theInterleavingIter
's buffered tombstone instead. We removemergingIter.init[Min,Max]RangeDelIters
, because theInterleavingIter
is transparently ensuring the range deletion iterators are appropriately positioned.The memtable and indexed batch levels will also have to be adjusted to make use of a
keyspan.InterleavingIter
.Jira issue: PEBBLE-82
The text was updated successfully, but these errors were encountered: