-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Slow serialization with large Array data (> 60 Mbytes) #269
Comments
Interesting. I don't see any reason for Fuel to do that. There must be another mechanism at play. I currently don't have to look into this tough, in a week or two maybe. @tinchodias, do you have some time to spare maybe? |
Can you give me a rough timing estimate of what you mean by "slowly"? |
It was around 100K/sec |
This is very interesting. I found the issue to be time spent in |
Still investigating but interesting find: in Pharo 11 the string hashes appear to be badly distributed. |
Probably @guillep will be interested |
I'm not sure the hash distribution is bad per se, it's just that the hashes have only uneven numbers when using modulo 4096. It looks like |
Interesting finding :) |
I checked the usage of identity dictionaries. This kind of tells me that a better identity dictionary will drastically improve the performance of fuel. |
Brilliant, thanks! |
Oh nice, not just Fuel. The system relies a lot on IdentityDictionaries :) |
This issue has been automatically marked as stale because it has not had recent activity. It will remain open but will probably not come into focus. If you still think this should receive some attention, leave a comment. Thank you for your contributions. |
This issue has been automatically marked as stale because it has not had recent activity. It will remain open but will probably not come into focus. If you still think this should receive some attention, leave a comment. Thank you for your contributions. |
This issue has been automatically marked as stale because it has not had recent activity. It will remain open but will probably not come into focus. If you still think this should receive some attention, leave a comment. Thank you for your contributions. |
I recently had a 40MB serialized graph balloon to almost 400MB on disk, which translated into even more memory in the image. It turned out to be caused by a single OrderedCollection that had 99 million strings as items. Sounds potentially related... By the way, are there any tricks to investigate the cause of such a problem? It took me many hours to track down by selectively nil-ing out parts of the model until I found the collection. I never would've guessed it because it was not part of the model that seemed likely to grow that large. I tried the debugger, but memory use grew to about 16GB before the image started to fail e.g. "file does not exist" errors |
Just the standard "tricks":
With v5 I had introduced "graph culling", so you could have configured the depth of the graph, and maybe tried to increase the depth continuously, until you started seeing the issue. Then you could have set a break point at that depth level to inspect candidates. |
@seandenigris Did you use SpaceTally in your investigation? |
Hi i have the same problem with a 10mb fuel file... which takes less than 500ms to read, but almost 59seconds to produce... I tried a workaround published here by Mariano : http://forum.world.st/Fuel-serialize-of-70MB-takes-forever-on-Linux-vs-Mac-td5061002.html | set dict |
set := FLLargeIdentitySet.
dict := FLLargeIdentityDictionary.
Smalltalk at: #FLLargeIdentitySet put: IdentitySet.
Smalltalk at: #FLLargeIdentityDictionary put: IdentityDictionary. my serialization with fuel of the same data is now done in 4,6 seconds... |
I don't know the impact on my Pharo image with this kind of workaround... what should we do ? Eric. |
Well, it looks like it's time to switch to the standard collections :) I quickly checked and all tests pass using the standard collections instead of the Fuel custom ones. You should be good using the workaround you posted. |
i can propose a pull request on Pharo 12 branch |
Well, it looks like it's time to switch to the standard collections :)
What was the purpose of the custom collections vs. standard ones?
|
That would be nice.
Unfortunately, the original research paper doesn't say anything about those. I presume that Fuel exhibited performance issues with large collections due to hash collisions. Meanwhile, the VMs use 64 bit words and longer hashes, which probably has mitigated the issue. Maybe @marianopeck or @tinchodias can share some more details. |
It seems that Fuel takes a long time serializing an Array of Arrays containing multiple Strings each one. When serialization passes around 60 Mbytes, it starts writing data in chunks of 100Kbytes and very slowly:
To reproduce please download this Fuel serialization: https://filesender.renater.fr/?s=download&token=b3809275-6258-4ee4-8d93-3b8871b7a0bd
Materialize it in a fresh Pharo 11 image:
and then from the Inspector window evaluate:
We tested in Pharo 11 under macOS.
The text was updated successfully, but these errors were encountered: