chore: Simplify CometShuffleMemoryAllocator to use Spark unified memory allocator #1063
Conversation
test failure:
I think we need to specify ... For this PR we should also fall back to Spark for shuffle if ...
Basically, Spark tests run with the on-heap config, except for the tests that are specifically about off-heap. I'm not sure whether enabling off-heap for all Spark tests would let them all pass. If it works, let's do it. If not, I plan to keep the current CometShuffleMemoryAllocator and rename it to a test-only class, CometTestShuffleMemoryAllocator. When running Spark tests, Comet can use CometTestShuffleMemoryAllocator.
Force-pushed from aab274e to efc32ee
@andygrove All Spark tests pass now.
I tried testing with TPC-H but see a memory issue:
One other issue. I tested with ...
I will test it locally too.
Yes. If the on-heap config is used, CometShuffleMemoryAllocator will throw a runtime error, i.e., you need to use the off-heap config in Spark. The test-only CometTestShuffleMemoryAllocator is only used in Spark tests (since they mostly run on-heap).
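For illustration, a minimal sketch (an assumed shape, not the actual Comet source) of the kind of runtime guard described above, assuming the check is based on the task's Tungsten memory mode:

import org.apache.spark.memory.MemoryMode;
import org.apache.spark.memory.TaskMemoryManager;

final class OffHeapOnlyCheck {
  private OffHeapOnlyCheck() {}

  // Fail fast when the executor is not configured with spark.memory.offHeap.enabled=true,
  // since the production shuffle allocator hands out off-heap pages.
  static void assertOffHeap(TaskMemoryManager taskMemoryManager) {
    if (taskMemoryManager.getTungstenMemoryMode() != MemoryMode.OFF_HEAP) {
      throw new IllegalStateException(
          "CometShuffleMemoryAllocator requires off-heap memory; "
              + "set spark.memory.offHeap.enabled=true or fall back to Spark shuffle");
    }
  }
}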
Hmm, I just ran TPC-H with this PR on Spark 3.4 using the datafusion-comet script without any error.
Right, so if the user is using on-heap, we should not use Comet shuffle and should fall back to Spark. We probably just need to update ...
Oh, I see. That sounds good. I will update it.
These are the settings that I am using. I am running in k8s.
$SPARK_HOME/bin/spark-submit \
--master $SPARK_MASTER \
--conf spark.eventLog.enabled=false \
--conf spark.plugins=org.apache.spark.CometPlugin \
--conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \
--conf spark.driver.memory=8G \
--conf spark.memory.offHeap.enabled=true \
--conf spark.memory.offHeap.size=12g \
--conf spark.executor.instances=4 \
--conf spark.executor.memory=30719m \
--conf spark.executor.cores=6 \
--conf spark.comet.memory.overhead.factor=0.04 \
--conf spark.comet.exec.enabled=true \
--conf spark.comet.exec.shuffle.enabled=true \
--conf spark.comet.exec.shuffle.mode=jvm \
This is what I used to run:
I don't set ...
This is the value from #886, which I think this PR is intended to close. I ran a clean build this morning and did not see the segfault, so it is possible that I picked up an old docker image ... I will continue testing this morning. |
Actually, this PR won't close #886 because this is still using a singleton, so let's ignore that for now. This PR LGTM and I will approve after some more testing. |
It falls back to Spark shuffle now if off-heap is not enabled.
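For context, a rough sketch of that decision (the class and method names are illustrative and the config defaults shown are assumptions, not Comet's actual planner code): Comet shuffle is only used when both the Comet shuffle flag and Spark's off-heap mode are enabled; otherwise the plan keeps Spark's own shuffle.

import org.apache.spark.SparkConf;

final class CometShuffleFallbackSketch {
  private CometShuffleFallbackSketch() {}

  // Returns true only when Comet shuffle may be used; otherwise fall back to
  // Spark's own shuffle. The defaults passed to getBoolean are illustrative.
  static boolean cometShuffleUsable(SparkConf conf) {
    boolean cometShuffleEnabled = conf.getBoolean("spark.comet.exec.shuffle.enabled", false);
    boolean offHeapEnabled = conf.getBoolean("spark.memory.offHeap.enabled", false);
    return cometShuffleEnabled && offHeapEnabled;
  }
}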
Force-pushed from 67fab58 to e7e7847
Thanks @viirya
Codecov Report
Attention: Patch coverage is ...
Additional details and impacted files
@@ Coverage Diff @@
## main #1063 +/- ##
============================================
- Coverage 34.46% 34.19% -0.28%
+ Complexity 888 884 -4
============================================
Files 113 115 +2
Lines 43580 42765 -815
Branches 9658 9346 -312
============================================
- Hits 15021 14622 -399
+ Misses 25507 25279 -228
+ Partials 3052 2864 -188
☔ View full report in Codecov by Sentry.
As the allocator now uses all available memory on the executor (we don't specify a memory size on the allocator), it should not be an issue for #886 anymore. @andygrove Do you want to re-check whether #886 can also be fixed by this PR? Thanks. And similar to TaskMemoryManager, I think it makes more sense to have a singleton memory allocator for the shuffle writers in the same executor.
Can we make ...?
Yes. They should be internal configs now. Let me update them now.
Force-pushed from c6b92ec to 5425868
I will test this again today.
I'm running into SIGSEGV issues again.
I will try running the same benchmark on main. Edit: I cannot reproduce on main because it fails there with ...
I increased the off-heap pool size, and now I can run TPC-H q5 @ sf=1TB on the ...
Let me see if I can reproduce it.
Ah, I figured out what was wrong there. I updated this PR with the change. I ran the benchmarks locally and didn't see the error. Please also run the benchmarks to verify that it fixes the error. Thanks @andygrove.
// CometShuffleMemoryAllocator stores pages in TaskMemoryManager, which is not a singleton
// but one instance per task. So we need to create a new instance for each task.
return new CometShuffleMemoryAllocator(taskMemoryManager, pageSize);
I think this should also address my concerns about #886. I am testing now.
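To illustrate why a fresh allocator is needed per task (a hypothetical sketch built on Spark's public MemoryConsumer API, not the actual CometShuffleMemoryAllocator source; the class and method names here are invented): memory acquired through a MemoryConsumer is accounted against the per-task TaskMemoryManager, so one instance cannot safely be shared across tasks.

import java.io.IOException;

import org.apache.spark.memory.MemoryConsumer;
import org.apache.spark.memory.MemoryMode;
import org.apache.spark.memory.TaskMemoryManager;
import org.apache.spark.unsafe.memory.MemoryBlock;

final class PerTaskShuffleAllocatorSketch extends MemoryConsumer {
  private final long pageSize;

  PerTaskShuffleAllocatorSketch(TaskMemoryManager taskMemoryManager, long pageSize) {
    // Memory acquired by this consumer is charged to the task that owns taskMemoryManager.
    super(taskMemoryManager, pageSize, MemoryMode.OFF_HEAP);
    this.pageSize = pageSize;
  }

  // Ask Spark's unified memory manager for a page on behalf of this task.
  MemoryBlock acquirePage() {
    return allocatePage(pageSize);
  }

  void releasePage(MemoryBlock page) {
    freePage(page);
  }

  @Override
  public long spill(long size, MemoryConsumer trigger) throws IOException {
    // A real shuffle allocator would flush buffered rows here; this sketch does not spill.
    return 0L;
  }
}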
 * created. For Spark tests, this returns `CometTestShuffleMemoryAllocator` which is a test-only
 * allocator that should not be used in production.
 */
public static synchronized CometShuffleMemoryAllocatorTrait getInstance(
Just a nit, but could we stop making the method synchronized and instead add a synchronized block around the INSTANCE creation when in test mode?
Okay, sounds good. I will update this.
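A possible shape for that change (a sketch under assumptions: the factory's parameters, the test-mode flag, and the CometTestShuffleMemoryAllocator constructor arguments are illustrative, not the final committed code): the method itself is no longer synchronized, and only the lazy creation of the shared test allocator takes a lock, so the per-task production path stays lock-free.

import org.apache.spark.memory.TaskMemoryManager;

final class AllocatorFactorySketch {
  private static final Object TEST_LOCK = new Object();
  private static CometShuffleMemoryAllocatorTrait testInstance;

  static CometShuffleMemoryAllocatorTrait getInstance(
      TaskMemoryManager taskMemoryManager, long pageSize, boolean isSparkTesting) {
    if (isSparkTesting) {
      // Only the test-only singleton needs synchronization.
      synchronized (TEST_LOCK) {
        if (testInstance == null) {
          // Constructor arguments are assumed for illustration.
          testInstance = new CometTestShuffleMemoryAllocator(taskMemoryManager, pageSize);
        }
        return testInstance;
      }
    }
    // Production path: one allocator per task, no shared state, no locking.
    return new CometShuffleMemoryAllocator(taskMemoryManager, pageSize);
  }
}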
I no longer see memory errors with the recent changes.
Cool. Thanks @andygrove for verifying it.
Which issue does this PR close?
Closes #1064
Closes #886
Rationale for this change
What changes are included in this PR?
How are these changes tested?