rfcs: graph api: support swish operation #2156
Conversation
Thanks for the proposal.
For eltwise support, I would actually advocate using a composition of simpler ops (option 1) over introducing a new op for each new eltwise (option 2).
There are a few reasons for that:
- Framework users often write eltwise operations in their scripts as a composition of smaller ops (most likely because they started using those eltwise ops before the framework supported them). As an example, just search for swish in the HuggingFace GitHub.
- There are often "equivalent" formulas used in the wild. Think gelu_erf/gelu_tanh, or hardswish(x) = x * hardsigmoid(x), or bounded_relu(x, alpha) = clip(x, 0, alpha), and so on. It would be more scalable IMHO to match those patterns internally than to expose a new op for each flavor.
- As you mention, API simplicity: exposing a dedicated op would still make it possible for users to pass the composite op as in option 1. We would likely have to support both in the library.
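For illustration, a minimal sketch of option 1 as an end user might express it with the dnnl::graph C++ API (logical tensor ids, shapes, and op names below are made up for illustration, not taken from the RFC):

```cpp
// Sketch of option 1: swish(x) = x * sigmoid(x) expressed as a
// composition of existing Graph API ops (Sigmoid + Multiply).
#include <vector>
#include "oneapi/dnnl/dnnl_graph.hpp"

using namespace dnnl::graph;

graph build_swish_as_composition() {
    using dt = logical_tensor::data_type;
    using lt = logical_tensor::layout_type;
    const std::vector<int64_t> shape = {8, 1024}; // illustrative shape

    logical_tensor src {0, dt::f32, shape, lt::strided};
    logical_tensor sig_out {1, dt::f32, shape, lt::strided};
    logical_tensor dst {2, dt::f32, shape, lt::strided};

    op sigmoid {0, op::kind::Sigmoid, {src}, {sig_out}, "sigmoid"};
    op multiply {1, op::kind::Multiply, {src, sig_out}, {dst}, "multiply"};

    graph g {engine::kind::cpu};
    g.add_op(sigmoid);
    g.add_op(multiply);
    g.finalize();
    // The library can recognize this Sigmoid + Multiply pattern during
    // partitioning and dispatch it to a fused swish kernel internally.
    return g;
}
```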
- Operation Kind: `Swish` (C++), `dnnl_graph_op_swish` (C).
- Input/output: Single input, single output.
- Attribute: `beta` (optional). `beta = 1.f` if not provided.
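For comparison, a hypothetical sketch of option 2, assuming the proposed `Swish` op kind and optional `beta` attribute are added to the API as described above (the op kind does not exist yet; ids and shapes are illustrative):

```cpp
// Hypothetical option 2: a dedicated Swish op with an optional beta attribute.
using dt = logical_tensor::data_type;
using lt = logical_tensor::layout_type;

logical_tensor src {0, dt::f32, {8, 1024}, lt::strided};
logical_tensor dst {1, dt::f32, {8, 1024}, lt::strided};

op swish {0, op::kind::Swish, {src}, {dst}, "swish"}; // proposed op kind
swish.set_attr<float>(op::attr::beta, 1.f);           // optional; 1.f if not set

graph g {engine::kind::cpu};
g.add_op(swish);
g.finalize();
```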
(nit) why beta and not alpha?
Thank you for the review, @mgouicem. I didn't think too much about the naming here; it just followed the naming in OpenVINO and cuDNN. alpha is also fine with me if we think it's better to align with the eltwise primitive.
@mgouicem Thanks for the review. I have incorporated the comments into the RFC and left the conclusion section open for discussion.
Thanks. I should have mentioned that we already have this composition for swish in our code (link). It indeed worked for some requests from frameworks. But we also see some issues as I mentioned in the cons of option 1.
Thanks for the link. I copied the link to the RFC also. :)
Yes, this is really a good point. Using a composition of smaller operations will give us the flexibility to support more variants without breaking or adding API. I also added this to the proposals.
operations in oneDNN Graph is troublesome for some integrations.
- Currently, oneDNN Graph Sigmoid operation does not support a multiplication `factor`. We may need to extend either the proposed Swish graph or the Sigmoid operation to support cases where `factor != 1.f`.
Since PyTorch's SiLU operation only supports factor = 1.f, it can simply be decomposed into Sigmoid + Multiply. When we dispatch this pattern to the oneDNN swish kernel, we just set alpha = 1.f. It works perfectly.
But if we want to support factor != 1.f for other cases, the operation will be decomposed into Multiply + Sigmoid + Multiply, with the first Multiply computing factor * x. To dispatch this pattern to the oneDNN swish kernel, we have more things to consider:
- We need to check that the factor input of the first Multiply is a scalar.
- The value of the scalar should be constant across iterations.
- The value of the scalar should be known at the compilation stage, as it is required to create primitive descriptors. But for now, the compile() API only accepts the logical tensors of input tensors, not their values.
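For clarity, a sketch of the three-op pattern described above, assuming the factor enters the graph as a scalar input tensor (ids, shapes, and op names are illustrative):

```cpp
// Decomposition for factor != 1.f: swish(x, factor) = x * sigmoid(factor * x)
using dt = logical_tensor::data_type;
using lt = logical_tensor::layout_type;

logical_tensor x      {0, dt::f32, {8, 1024}, lt::strided};
logical_tensor factor {1, dt::f32, {1}, lt::strided}; // must be a scalar
logical_tensor scaled {2, dt::f32, {8, 1024}, lt::strided};
logical_tensor sig    {3, dt::f32, {8, 1024}, lt::strided};
logical_tensor out    {4, dt::f32, {8, 1024}, lt::strided};

op mul0 {0, op::kind::Multiply, {x, factor}, {scaled}, "factor_mul"};
op sigm {1, op::kind::Sigmoid,  {scaled},    {sig},    "sigmoid"};
op mul1 {2, op::kind::Multiply, {x, sig},    {out},    "output_mul"};
// Matching this to the oneDNN swish kernel requires knowing the scalar's
// value at compile() time, which the current API does not provide.
```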
That is correct. For that case, you would need an extra alpha attribute on Sigmoid.
Thanks! Or, can we simply rely on binary + eltwise post-op + binary post-op for the cases where we don't know the factor at compilation time?
That is an option as well. Though I would expect that for the specific case of swish, the alpha parameter is constant, and using the swish eltwise would be faster.
The other thing is GPU support: alpha might not be a memory object on GPU.
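For reference, a sketch of the fast path mentioned above, assuming the oneDNN 3.x primitive API: when the factor is a constant known up front, the pattern can be lowered to a single eltwise primitive with `algorithm::eltwise_swish` and `alpha` set to the factor (shapes are illustrative):

```cpp
#include "oneapi/dnnl/dnnl.hpp"
using namespace dnnl;

engine eng {engine::kind::cpu, 0};
memory::desc md {{8, 1024}, memory::data_type::f32, memory::format_tag::ab};

const float factor = 1.f; // constant factor, known when creating the primitive
auto pd = eltwise_forward::primitive_desc(
        eng, prop_kind::forward_inference, algorithm::eltwise_swish,
        md, md, /*alpha=*/factor, /*beta=*/0.f);
auto swish = eltwise_forward(pd); // computes dst = src * sigmoid(alpha * src)
```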
@mgouicem Could you please take another look at this? The conclusion part has been updated in the last commit (I will squash them once ready for merge). Thank you!
LGTM, thanks!
This RFC proposes adding support for a Swish operation in the Graph API.
Rendered version: link.