Clear object store between rounds #367

Merged
merged 3 commits into from
Oct 23, 2024

Conversation

Collaborator

@yankevn yankevn commented Oct 22, 2024

Summary

These changes clear the object store at the end of each round during multi-round compaction. This replaces the existing behavior, which calls delete_many() on the list of object refs created during that round.
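As a minimal sketch of the difference, assuming a dictionary-backed store (the `InMemoryObjectStore` class and the `run_rounds` helper below are hypothetical illustrations, not DeltaCAT code), per-round cleanup with clear() looks roughly like this:

```python
from typing import Any, Dict, List


class InMemoryObjectStore:
    """Hypothetical stand-in for an object store; not the DeltaCAT implementation."""

    def __init__(self) -> None:
        self._objects: Dict[str, Any] = {}
        self._next_id = 0

    def put(self, obj: Any) -> str:
        ref = f"ref-{self._next_id}"
        self._next_id += 1
        self._objects[ref] = obj
        return ref

    def delete_many(self, refs: List[str]) -> None:
        # Previous behavior: remove each ref created during the round, one by one.
        for ref in refs:
            self._objects.pop(ref, None)

    def clear(self) -> None:
        # New behavior: drop everything in the store in a single call.
        self._objects.clear()

    def __len__(self) -> int:
        return len(self._objects)


def run_rounds(store: InMemoryObjectStore, num_rounds: int, objects_per_round: int) -> None:
    for round_idx in range(num_rounds):
        # Each round creates its own set of object refs.
        refs = [store.put((round_idx, i)) for i in range(objects_per_round)]
        # ... the round's compaction work would read these refs here ...
        store.clear()  # replaces store.delete_many(refs)


store = InMemoryObjectStore()
run_rounds(store, num_rounds=3, objects_per_round=1000)
print(len(store))
```

In this toy version both cleanup paths are cheap; the sketch only shows where the call moves, not the latency difference observed in E2E testing.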

Rationale

In E2E testing, delete_many() took far too long when deleting a large number of object refs. This made compaction latency unacceptable, so clear() is used instead. For this to work, only one partition may run compaction at a time; otherwise, one partition clearing the shared object store could remove objects that another partition's compaction still needs.
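The single-partition constraint can be illustrated with a toy example. This is only a sketch: the shared store is modeled as a plain dictionary, and the key names are made up for illustration.

```python
from typing import Any, Dict

# Hypothetical shared object store, modeled as a plain dictionary.
shared_store: Dict[str, Any] = {}

# Partition B stores an intermediate result it will read later in its round.
shared_store["partition-b/ref-0"] = "intermediate result"

# Partition A finishes its own round and clears the shared store.
shared_store.clear()

# Partition B now tries to read its object and finds it gone.
try:
    value = shared_store["partition-b/ref-0"]
    partition_b_failed = False
except KeyError:
    partition_b_failed = True

print(partition_b_failed)
```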

Changes

  • Switch from delete_many() to clear()
  • Move the pull request template

Impact

Executing clear() rather than delete_many() should lead to better performance. However, any jobs that run multi-round compaction with multiple partitions compacting in parallel will fail.

Testing

Unit tests were written.

Regression Risk

There is a risk that clear(), like delete_many(), will also take an extremely long time to run. To mitigate this, we will have to perform additional E2E testing on a large table.

The multi-round tests also had to be relaxed, as the FileObjectStore class cannot support clear(). As a result, we cannot check whether the files were actually deleted, although we still check that clear() was called.
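A relaxed check of this kind might look like the following sketch. It assumes a mock standing in for the object store; `run_rounds` is a hypothetical helper, not the actual test code in this PR.

```python
from unittest.mock import MagicMock


def run_rounds(object_store, num_rounds: int) -> None:
    """Hypothetical multi-round loop that clears the store at the end of each round."""
    for _ in range(num_rounds):
        object_store.put("intermediate result")
        object_store.clear()


def test_clear_called_once_per_round() -> None:
    # The mock stands in for a store whose deletions cannot be inspected directly.
    store = MagicMock()
    run_rounds(store, num_rounds=3)
    # We cannot verify that files were removed, so we only assert on the call itself.
    assert store.clear.call_count == 3


test_clear_called_once_per_round()
```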

Checklist

  • Unit tests covering the changes have been added
    • If this is a bugfix, regression tests have been added
  • E2E testing has been performed

@github-actions github-actions bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 2.

Benchmark suite: deltacat/tests/compute/test_compact_partition_incremental.py::test_compact_partition_incremental[1-incremental-pkstr-sknone-norcf_V1]
Current (f4af160): 0.47873611444451214 iter/sec (stddev: 0)
Previous (9e01066): 1.1053474023542524 iter/sec (stddev: 0)
Ratio: 2.31

This comment was automatically generated by workflow using github-action-benchmark.
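
The reported ratio can be reproduced from the two throughput figures, assuming (as the numbers suggest) that the ratio is the previous result divided by the current one for an iterations-per-second metric:

```python
previous = 1.1053474023542524  # iter/sec at 9e01066
current = 0.47873611444451214  # iter/sec at f4af160
threshold = 2  # alert threshold quoted in the comment

# Higher iter/sec is better, so a ratio above 1 means the current commit is slower.
ratio = previous / current
alert = ratio > threshold
print(f"{ratio:.2f}", alert)
```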

Collaborator

@raghumdani raghumdani left a comment

LGTM!

@yankevn yankevn marked this pull request as ready for review October 22, 2024 22:41
@yankevn yankevn merged commit 88ccf0c into main Oct 23, 2024
3 checks passed
@raghumdani raghumdani deleted the clear_multiround branch October 23, 2024 20:20
@yankevn yankevn changed the title from "Clear multiround" to "Clear object store between rounds" Oct 23, 2024