rfcs: graph api: support swish operation #2156

Merged: 3 commits into oneapi-src:rfcs on Nov 21, 2024

Conversation

TaoLv (Contributor) commented Oct 9, 2024

The RFC proposes adding support for Swish operation in Graph API.

Rendered version: link.

TaoLv added the RFC (A design document) label on Oct 9, 2024
TaoLv requested a review from a team as a code owner on October 9, 2024 03:17
TaoLv requested a review from a team on October 9, 2024 03:18
mgouicem (Contributor) left a comment

Thanks for the proposal.
For eltwise support, I would actually advocate using a composition of simpler ops (option 1) over introducing a new op for each new eltwise (option 2); a sketch of such a composition is shown after the list below.

There are a few reasons for that:

  • Framework users often write eltwise operations in their scripts as compositions of smaller ops (most likely because they started using those eltwise ops before the framework supported them). As an example, just search for swish in the HuggingFace GitHub.
  • There are often "equivalent" formulas used in the wild: think gelu_erf/gelu_tanh, hardswish(x) = x * hardsigmoid(x), or bounded_relu(x, alpha) = clip(x, 0, alpha). IMHO it would be more scalable to match those patterns internally than to expose a new op for each flavor.
  • As you mention, API simplicity: exposing a dedicated op would still leave users free to pass the composite pattern as in option 1, so we would likely have to support both in the library.
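
For concreteness, here is a minimal sketch (not part of the RFC) of what such a composition could look like with the dnnl::graph C++ API on a recent oneDNN 3.x; tensor ids, shapes, and the CPU engine kind are illustrative assumptions:

```cpp
// Option 1 sketch: swish(x) = x * sigmoid(x) expressed as Sigmoid + Multiply.
#include <oneapi/dnnl/dnnl_graph.hpp>

using namespace dnnl::graph;

int main() {
    using dt = logical_tensor::data_type;
    using lt = logical_tensor::layout_type;

    logical_tensor src     {0, dt::f32, {8, 1024}, lt::strided};
    logical_tensor sig_out {1, dt::f32, {8, 1024}, lt::strided};
    logical_tensor dst     {2, dt::f32, {8, 1024}, lt::strided};

    op sigmoid {0, op::kind::Sigmoid, {src}, {sig_out}, "sigmoid"};
    op mul     {1, op::kind::Multiply, {src, sig_out}, {dst}, "mul"};

    graph g {engine::kind::cpu};
    g.add_op(sigmoid);
    g.add_op(mul);
    g.finalize();

    // The library is expected to recognize and fuse this pattern into a
    // single swish partition.
    auto partitions = g.get_partitions();
    return 0;
}
```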


- Operation Kind: `Swish` (C++), `dnnl_graph_op_swish` (C).
- Input/output: Single input, single output.
- Attribute: `beta` (optional). `beta = 1.f` if not provided.
Contributor:

(nit) why beta and not alpha?

TaoLv (Contributor, Author):

Thank you for the review, @mgouicem. I didn't think too much about the naming here; it just follows the naming in OpenVINO and cuDNN. alpha also works for me if we think it's better to align with the eltwise primitive.
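
To make the proposed interface concrete, here is a sketch of how option 2 might be used once the RFC is implemented; op::kind::Swish and op::attr::beta follow the RFC text and are not yet part of a released API, and ids/shapes are illustrative:

```cpp
// Option 2 sketch: a dedicated Swish op with an optional beta attribute.
#include <oneapi/dnnl/dnnl_graph.hpp>

using namespace dnnl::graph;

int main() {
    using dt = logical_tensor::data_type;
    using lt = logical_tensor::layout_type;

    logical_tensor src {0, dt::f32, {8, 1024}, lt::strided};
    logical_tensor dst {1, dt::f32, {8, 1024}, lt::strided};

    op swish {0, op::kind::Swish, {src}, {dst}, "swish"};
    swish.set_attr<float>(op::attr::beta, 1.f); // optional, defaults to 1.f

    graph g {engine::kind::cpu};
    g.add_op(swish);
    g.finalize();
    auto partitions = g.get_partitions();
    return 0;
}
```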

TaoLv (Contributor, Author) commented Oct 10, 2024

@mgouicem Thanks for the review. I have incorporated the comments into the RFC and left the conclusion section open for discussion.

TaoLv (Contributor, Author) commented Oct 10, 2024

I would actually advocate for using a composition of simpler ops (option 1) ...

Thanks. I should have mentioned that we already have this composition for swish in our code (link). It has indeed worked for some requests from frameworks, but we also see the issues I mentioned in the cons of option 1.

As an example, just search swish in HuggingFace github.

Thanks for the link. I have also copied it into the RFC. :)
For these cases, if framework developers want to optimize with oneDNN, they have to detect the pattern, rewrite it with PyTorch's SiLU operation, and then call oneDNN to optimize that SiLU operation (leaving aside further fusion into other operations like matmul or conv). So the question still comes down to whether we use two or more operations to optimize SiLU or a single dedicated swish operation.

There are often "equivalent" formulas that are used in the wild.

Yes, this is a really good point. Using a composition of smaller operations gives us the flexibility to support more variants without breaking or adding API. I have added this to the proposal as well.

operations in oneDNN Graph is troublesome for some integrations.
- Currently, oneDNN Graph Sigmoid operation does not support a multiplication
`factor`. We may need to extend either the proposed Swish graph or the Sigmoid
operation to support cases where `factor != 1.f`.
TaoLv (Contributor, Author):

Since PyTorch's SiLU operation only supports factor = 1.f, it can simply be decomposed into Sigmoid + Multiply. When we dispatch this pattern to the oneDNN swish kernel, we just set alpha = 1.f. It works perfectly.
But if we want to support factor != 1.f for other cases, the operation has to be decomposed into Multiply + Sigmoid + Multiply, with the first Multiply computing factor * x (see the sketch after this list). To dispatch this pattern to the oneDNN swish kernel, we have more things to consider:

  • The factor input of the first Multiply must be a scalar.
  • The value of the scalar must stay constant across iterations.
  • The value of the scalar must be known at compilation stage, as it is required to create primitive descriptors. But for now, the compile() API only accepts the logical tensors of the inputs, not their values.
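
As a concrete illustration (not part of the RFC), the three-op pattern could be built with the dnnl::graph C++ API as below; ids, shapes, and the CPU engine kind are assumptions, and the comments mark where the constraints above come into play:

```cpp
// factor != 1.f sketch: swish(x, factor) = x * sigmoid(factor * x)
// expressed as Multiply + Sigmoid + Multiply.
#include <oneapi/dnnl/dnnl_graph.hpp>

using namespace dnnl::graph;

int main() {
    using dt = logical_tensor::data_type;
    using lt = logical_tensor::layout_type;

    logical_tensor x      {0, dt::f32, {8, 1024}, lt::strided};
    // Must be a scalar whose value stays constant across iterations;
    // compile() sees only this logical tensor, not the runtime value.
    logical_tensor factor {1, dt::f32, {1}, lt::strided};
    logical_tensor scaled {2, dt::f32, {8, 1024}, lt::strided};
    logical_tensor sig    {3, dt::f32, {8, 1024}, lt::strided};
    logical_tensor dst    {4, dt::f32, {8, 1024}, lt::strided};

    op mul0 {0, op::kind::Multiply, {x, factor}, {scaled}, "factor_mul"};
    op sigm {1, op::kind::Sigmoid, {scaled}, {sig}, "sigmoid"};
    op mul1 {2, op::kind::Multiply, {x, sig}, {dst}, "mul"};

    graph g {engine::kind::cpu};
    g.add_op(mul0);
    g.add_op(sigm);
    g.add_op(mul1);
    g.finalize();
    auto partitions = g.get_partitions();
    return 0;
}
```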

Contributor:

That is correct. For that case, you would need an extra alpha attribute on Sigmoid.

TaoLv (Contributor, Author):

Thanks! Or can we simply rely on binary + eltwise post-op + binary post-op for the cases where we don't know the factor at compilation?

Contributor:

That is an option as well, though I would expect that for the specific case of swish the alpha parameter is constant, and using the swish eltwise would be faster.
The other thing is GPU support: alpha might not be a memory object on GPU.
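
For the constant-alpha case mentioned here, a minimal sketch (assuming the oneDNN v3.x primitive API) of fusing swish into a matmul via the eltwise post-op; shapes and the alpha value are illustrative:

```cpp
// Fuse swish(x) = x * sigmoid(alpha * x) into a matmul as an eltwise post-op.
#include <oneapi/dnnl/dnnl.hpp>

using namespace dnnl;

int main() {
    engine eng(engine::kind::cpu, 0);

    memory::desc a_md({8, 64}, memory::data_type::f32, memory::format_tag::ab);
    memory::desc b_md({64, 32}, memory::data_type::f32, memory::format_tag::ab);
    memory::desc c_md({8, 32}, memory::data_type::f32, memory::format_tag::ab);

    const float alpha = 1.f; // known and constant at primitive creation time
    post_ops po;
    po.append_eltwise(algorithm::eltwise_swish, alpha, 0.f);

    primitive_attr attr;
    attr.set_post_ops(po);

    auto pd = matmul::primitive_desc(eng, a_md, b_md, c_md, attr);
    auto fused_matmul = matmul(pd);
    (void)fused_matmul; // execution with runtime memories omitted
    return 0;
}
```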

TaoLv (Contributor, Author) commented Nov 20, 2024

@mgouicem Could you please take another look at this? The conclusion part has been updated in the last commit (I will squash the commits once ready for merge). Thank you!

mgouicem (Contributor) left a comment

LGTM, thanks!

TaoLv merged commit 8fd566e into oneapi-src:rfcs on Nov 21, 2024
1 check passed
Labels: RFC (A design document)
3 participants