Serialization/deserialization of optimized tract models #1313
Comments
Hey, thanks for your interest. You should give the NNEF serialization a try. It's significantly faster to load and optimise than an ONNX model.
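A minimal sketch of what loading an NNEF archive looks like. The path is a placeholder, and exact method signatures may differ by tract version; `with_tract_core` enables tract's extensions for operators beyond the vanilla NNEF specification:

```rust
use tract_nnef::prelude::*;

fn main() -> TractResult<()> {
    let model = tract_nnef::nnef()
        .with_tract_core()                 // enable tract's extensions beyond vanilla NNEF
        .model_for_path("model.nnef.tar")? // an NNEF archive loads directly as a TypedModel
        .into_optimized()?                 // optimize for the current host
        .into_runnable()?;                 // build the execution plan
    // `model` can now be fed inputs with `model.run(...)`.
    Ok(())
}
```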
Is there a way to load a model, optimize it with tract, and then save it back?
The NNEF serialization is a step towards this, as you'll save the "decluttered" model. And decluttering accounts for the most expensive part of the load/declutter/optimize workflow (more than the actual optimisation). There is no way to dump and reload a fully optimized tract model at this stage.
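A hedged sketch of that workflow, assuming tract's usual prelude names (the paths are placeholders, and `into_decluttered`/`write_to_tar` should be checked against the tract version in use): declutter once offline, dump to NNEF, then reload paying only for optimization.

```rust
use tract_onnx::prelude::*;
use tract_nnef::prelude::*;

fn convert_and_reload() -> TractResult<()> {
    // Pay the expensive load + declutter once, offline.
    let decluttered = tract_onnx::onnx()
        .model_for_path("model.onnx")?
        .into_typed()?        // InferenceModel -> TypedModel
        .into_decluttered()?; // the form NNEF serializes

    let nnef = tract_nnef::nnef().with_tract_core();
    nnef.write_to_tar(&decluttered, std::fs::File::create("model.nnef.tar")?)?;

    // Later (e.g. at cold start): reload, paying only for optimization.
    let _plan = nnef
        .model_for_path("model.nnef.tar")?
        .into_optimized()?
        .into_runnable()?;
    Ok(())
}
```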
If there is no such thing yet, it would be a good idea to at least provide public access to all the necessary internals of your IR, so that I can build my own utility without a fork. Is the IR public?
Well, I see that …
Yeah, the "IR" is just tract-core with TypedModel with some optimized operators. Most operators will retain their "decluttered" form, because there is not much to gain in optimizing them, but the most important ones (MatMul & co) are heavily modified. There is no commitment on stability of operators (decluttered and optimised). Additionally optimized operators are not portable from one architecture to another. |
Hi, I intend to use `tract` for inference with AWS Lambda. I've observed that the initialization and optimization of ONNX models (from `&[u8]`) can be 2-3 times slower than the actual model execution. Perhaps it's a good idea to introduce a method for storing your graph IR as `&[u8]`?