Improved performance of rz_bv_copy_nbits and rz_bv_set_range #4740
base: dev
Nice start!
I can imagine we can even optimize the unaligned cases. But let's do this later.
I also changed your PR message, because it closes #4716 only partially (the unaligned cases are missing).
You can add test cases in test/unit/test_bitvector.c.
Once everything is implemented and passes, we can run the Travis CI to test it on big endian machines.
Missing implementation of rz_bv_get_chunk.
first_word &= ~(0xFFFFFFFF >> bit_offset); // Clear the upper bits
second_word &= (0xFFFFFFFF >> (32 - bit_offset)); // Clear the lower bits
Suggested change:
- first_word &= ~(0xFFFFFFFF >> bit_offset); // Clear the upper bits
- second_word &= (0xFFFFFFFF >> (32 - bit_offset)); // Clear the lower bits
+ first_word &= ~(UT32_MAX >> bit_offset); // Clear the upper bits
+ second_word &= (UT32_MAX >> (32 - bit_offset)); // Clear the lower bits
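To make it easier to reason about these two masks, here is a minimal standalone sketch that evaluates them in isolation. The helper names mask_first_word and mask_second_word are hypothetical and not part of rizin, and the standard UINT32_MAX stands in for UT32_MAX. The sketch assumes bit_offset is between 1 and 31: with bit_offset == 0 the second expression shifts a 32-bit value by 32, which is undefined behavior in C, so that case would need separate handling.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers (not rizin API): they only reproduce the two mask
 * expressions from the suggestion so their effect can be inspected. */

/* ~(UINT32_MAX >> bit_offset): keeps the top bit_offset bits of a word
 * and clears the remaining low (32 - bit_offset) bits. */
static uint32_t mask_first_word(uint32_t bit_offset) {
	assert(bit_offset >= 1 && bit_offset <= 31); /* assumed precondition */
	return ~(UINT32_MAX >> bit_offset);
}

/* UINT32_MAX >> (32 - bit_offset): keeps the low bit_offset bits of a word
 * and clears the remaining high bits. A shift by 32 (bit_offset == 0)
 * would be undefined, hence the precondition. */
static uint32_t mask_second_word(uint32_t bit_offset) {
	assert(bit_offset >= 1 && bit_offset <= 31); /* assumed precondition */
	return UINT32_MAX >> (32 - bit_offset);
}
```

For example, with bit_offset = 8 the first mask is 0xFF000000 and the second is 0x000000FF, which may be worth comparing against the inline comments in the patch.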
You are touching something very critical within rizin, so please provide a benchmark and C tests for correctness and edge cases.
Your checklist for this pull request
Detailed description
This pull request optimizes the performance of the rz_bv_set_range function. The original implementation iterated through bits one at a time, leading to inefficiencies for large ranges. The updated implementation:
- Processes aligned chunks of bits using system word size for faster operations.
- Dynamically adjusts chunk size for different architectures (e.g., 32-bit, 64-bit).
- Handles unaligned prefix and suffix bits separately while optimizing the main loop.
- Adds robust boundary validation to ensure correctness.
This change reduces iteration overhead and improves performance while maintaining compatibility and correctness.
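The prefix / aligned middle / suffix split described above can be sketched roughly as follows. This is not the code from the PR: it works on a plain uint32_t array rather than on RzBitVector, it assumes LSB-first bit numbering inside each word and an inclusive [start, end] range, and every name in it is hypothetical. A bit-by-bit reference is included so the two versions can be compared in a test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WORD_BITS 32u

/* Reference version: touches one bit per iteration. */
static void set_range_bitwise(uint32_t *words, size_t start, size_t end, bool b) {
	for (size_t i = start; i <= end; i++) {
		uint32_t m = UINT32_C(1) << (i % WORD_BITS);
		if (b) {
			words[i / WORD_BITS] |= m;
		} else {
			words[i / WORD_BITS] &= ~m;
		}
	}
}

/* Mask with bits lo..hi (inclusive) set, for 0 <= lo <= hi <= 31.
 * hi == 31 is special-cased to avoid an undefined shift by 32. */
static uint32_t word_mask(unsigned lo, unsigned hi) {
	uint32_t upto_hi = (hi == WORD_BITS - 1) ? UINT32_MAX : ((UINT32_C(1) << (hi + 1)) - 1);
	uint32_t below_lo = (UINT32_C(1) << lo) - 1;
	return upto_hi & ~below_lo;
}

static void apply_mask(uint32_t *w, uint32_t m, bool b) {
	if (b) {
		*w |= m;
	} else {
		*w &= ~m;
	}
}

/* Word-wise version: partial first word, whole middle words, partial last word. */
static void set_range_wordwise(uint32_t *words, size_t start, size_t end, bool b) {
	assert(start <= end); /* caller is assumed to have validated the bounds */
	size_t first = start / WORD_BITS, last = end / WORD_BITS;
	unsigned lo = start % WORD_BITS, hi = end % WORD_BITS;
	if (first == last) {
		apply_mask(&words[first], word_mask(lo, hi), b);
		return;
	}
	apply_mask(&words[first], word_mask(lo, WORD_BITS - 1), b); /* unaligned prefix */
	for (size_t i = first + 1; i < last; i++) {
		words[i] = b ? UINT32_MAX : 0; /* full words in one store each */
	}
	apply_mask(&words[last], word_mask(0, hi), b); /* unaligned suffix */
}
```

A unit test could run both functions over the same ranges (single bit, range inside one word, range ending exactly on a word boundary, range spanning several words) and compare the resulting arrays word by word.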
Test plan
Closing issues
Partially addresses #4716
...