switching version check to spark session #321

Merged: 2 commits, Jul 10, 2024


fdosani (Member) commented Jul 10, 2024

Fix for #320

It seems Databricks alters pyspark.version.__version__, so this PR switches the version check to read the version from the actual Spark session instead.
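A minimal sketch of what a session-based check could look like, for illustration only. It is not the code from this PR: the helper names and the `MIN_SPARK` threshold are hypothetical. The only Spark API it relies on is the `version` attribute of a `SparkSession`.

```python
import re
from typing import Tuple

# Hypothetical minimum version, chosen for illustration; not taken from the PR.
MIN_SPARK = (3, 4)


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string, ignoring any suffix."""
    match = re.match(r"\s*(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised Spark version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def session_meets_minimum(spark, minimum: Tuple[int, int] = MIN_SPARK) -> bool:
    """Compare the version reported by the session itself (spark.version)."""
    return parse_major_minor(spark.version) >= minimum


# With a real session this would be called as:
#   session_meets_minimum(SparkSession.builder.getOrCreate())
```

Reading the version from the running session means the check reflects the cluster actually executing the job, rather than whatever the installed Python package reports.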

@achrusciel FYI: could you try out this specific branch to confirm it works for you?

fdosani added the bug (Something isn't working) label Jul 10, 2024
achrusciel commented

I am testing the change right now. The code now executes, and the Spark UI of the cluster shows jobs running, so it works.

This is the first time I am switching from LegacySparkCompare to the new SparkSQLCompare. The same comparison for about 5 million rows completes in ca. 2 minutes when using LegacySparkCompare and with the new SparkSQLCompare it has completed after 8 minutes successfully.

fdosani (Member, Author) commented Jul 10, 2024

> million rows completes in ca. 2 minutes when using LegacySparkCompare and with the new SparkSQLCompare it has completed after 8 minutes successfully

This is expected. The legacy version dropped duplicates and lacks some functionality that the new one has (the new one aligns with the Pandas version). The main difference is the dropping of duplicates.
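To illustrate the difference in general terms, here is a toy pure-Python sketch contrasting the two ways of handling duplicate keys. It is not datacompy's implementation: the function names and the `_order` field are hypothetical, and it only shows why keeping duplicates means more rows and more bookkeeping than dropping them.

```python
from collections import defaultdict


def drop_duplicates(rows, key):
    """Keep only the first row seen for each key value."""
    seen = set()
    kept = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            kept.append(row)
    return kept


def number_within_key(rows, key):
    """Keep every row, tagging it with its occurrence index for that key."""
    counts = defaultdict(int)
    tagged = []
    for row in rows:
        k = row[key]
        tagged.append({**row, "_order": counts[k]})
        counts[k] += 1
    return tagged


rows = [{"id": 1, "v": "a"}, {"id": 1, "v": "b"}, {"id": 2, "v": "c"}]
deduped = drop_duplicates(rows, "id")      # duplicates of id=1 collapse to one row
numbered = number_within_key(rows, "id")   # every row is kept and numbered per key
```

In the second approach every duplicate still has to be matched and compared, which is one plausible reason a comparison that keeps duplicates runs longer on the same data.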

fdosani merged commit b9a2ae7 into develop Jul 10, 2024
30 checks passed
fdosani deleted the spark-version-fix branch July 10, 2024 19:25
fdosani mentioned this pull request Jul 10, 2024
rhaffar pushed a commit to rhaffar/datacompy that referenced this pull request Sep 11, 2024
* switching version check to spark session

* bumping version
rhaffar pushed a commit to rhaffar/datacompy that referenced this pull request Sep 12, 2024
* switching version check to spark session

* bumping version