
Optimize Error Handling and Regex Caching in Tensor Loading #221

Open
wants to merge 4 commits into base: main
Conversation

Madhav-MKNC

This PR introduces two key enhancements to the tensor loading process (fixes #220):

  • Improved error handling within ThreadPoolExecutor to provide detailed logs for failures during parallel tensor loading.
  • Implementation of caching for regex operations in get_load_path_str to reduce computational overhead and improve loading efficiency.

These changes aim to enhance the robustness and performance of tensor loading, particularly in distributed computing environments.
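A minimal sketch of the two changes described above, assuming a regex-based `get_load_path_str` and a `ThreadPoolExecutor`-driven loader. The function bodies and names other than `get_load_path_str` (e.g. `load_tensors`, `load_fn`, `load_rename_rules`) are hypothetical stand-ins, not the repository's actual code:

```python
import logging
import re
from concurrent.futures import ThreadPoolExecutor, as_completed
from functools import lru_cache

logger = logging.getLogger(__name__)

@lru_cache(maxsize=1024)  # avoid recomputing regex matches for repeated paths
def get_load_path_str(checkpoint_path: str, load_rename_rules=None):
    # load_rename_rules is a tuple of (pattern, replacement) pairs so the
    # arguments stay hashable, which lru_cache requires.
    if load_rename_rules:
        for pattern, replacement in load_rename_rules:
            new_path = re.sub(pattern, replacement, checkpoint_path)
            if new_path != checkpoint_path:
                return new_path
    return checkpoint_path

def load_tensors(paths, load_fn, num_workers=8):
    """Load tensors in parallel, logging which path failed before re-raising."""
    results = {}
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = {pool.submit(load_fn, p): p for p in paths}
        for fut in as_completed(futures):
            path = futures[fut]
            try:
                results[path] = fut.result()
            except Exception:
                # Without this, a worker failure surfaces as a bare traceback
                # with no indication of which tensor was being loaded.
                logger.exception("Failed to load tensor from %s", path)
                raise
    return results
```

Note that the rename rules are passed as a tuple rather than a list precisely so `lru_cache` can hash the call arguments.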

checkpoint.py (review thread, outdated; resolved)
@Aareon

Aareon commented Mar 22, 2024

Hey! I'm happy to see a limit set for the memory usage in LRU Cache. Would it be possible for you to roll a test for this change? I think a pytest test would suffice.
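A pytest test along these lines could exercise the caching behavior. The stand-in below re-declares a cached helper rather than importing from `checkpoint.py`, since the module's import path and final signature are assumptions:

```python
# test_checkpoint_cache.py -- hypothetical test sketch; the helper below is a
# stand-in for the lru_cache-decorated get_load_path_str in checkpoint.py.
import functools
import re

@functools.lru_cache(maxsize=1024)
def get_load_path_str(path: str, rules: tuple) -> str:
    for pattern, replacement in rules:
        path = re.sub(pattern, replacement, path)
    return path

def test_cache_hits_on_repeated_paths():
    get_load_path_str.cache_clear()
    rules = ((r"layer_(\d+)", r"block_\1"),)
    first = get_load_path_str("model/layer_3/w", rules)
    second = get_load_path_str("model/layer_3/w", rules)
    assert first == second == "model/block_3/w"
    info = get_load_path_str.cache_info()
    # One miss on the first call, one hit on the identical second call.
    assert info.hits == 1 and info.misses == 1
```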

@Aareon

Aareon commented Mar 22, 2024

Perhaps path_tuple_to_string could also utilize the LRU cache. Just a thought.

"""
For path_tuple_to_string(),
introducing a simple caching mechanism to avoid recomputing regex matches for paths that have already been processed.
"""

Is this docstring supposed to be here, or inside the path_tuple_to_string method?

@Aareon

Aareon commented Mar 27, 2024

After delving a bit, I now think 32 MB for the LRU cache may be overkill. I don't think more than 250 KB at most should be necessary.
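One wrinkle with a byte budget like 250 KB: `functools.lru_cache` caps the number of entries, not the bytes they occupy, so the budget has to be translated into an entry count. A rough sketch, where the sample key used to estimate entry size is an assumption:

```python
import sys
from functools import lru_cache

# Translate a ~250 KB budget into an lru_cache entry count.
BUDGET_BYTES = 250 * 1024
# Estimate per-entry cost from a representative path string; the cache holds
# roughly one key string plus one value string per entry, hence the factor 2.
AVG_ENTRY_BYTES = sys.getsizeof("checkpoint/layer_00/tensor_000")
MAXSIZE = max(1, BUDGET_BYTES // (2 * AVG_ENTRY_BYTES))

@lru_cache(maxsize=MAXSIZE)
def cached_rename(path: str) -> str:
    # Placeholder transform standing in for the real regex rewrite.
    return path.replace("layer", "block")
```

This is only an order-of-magnitude estimate; `sys.getsizeof` does not count the dict machinery inside the cache itself, so the true footprint is somewhat larger.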

@RaphaelFakhri

RaphaelFakhri commented Mar 30, 2024

"delving" hahahaha (for reference: https://x.com/JeremyNguyenPhD/status/1774021645709295840)

@Aareon

Aareon commented Apr 5, 2024

> "delving" hahahaha (for reference: https://x.com/JeremyNguyenPhD/status/1774021645709295840)

Hadn't seen this, and the stat doesn't really correspond with anything related to ChatGPT. I most certainly didn't need to use ChatGPT to come to that conclusion.

I recommend trying both ways and running benchmarks to see which provides the best performance improvement.

Successfully merging this pull request may close these issues.

Enhancements for Error Handling and Regex Operation Optimization in Distributed Tensor Loading
3 participants