Usage equivariant MLP #94
Hi, did you see the MLP example https://github.com/QUVA-Lab/escnn/blob/master/examples/mlp.ipynb? An equivariant MLP doesn't expect a base space (neither 2D nor 3D); it works exactly like a classic MLP and takes only a stack of feature fields.
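Something along these lines, a minimal sketch based on that notebook (the choice of SO(3) and the widths are just for illustration):

```python
import torch
from escnn import group, gspaces, nn

G = group.so3_group()
gspace = gspaces.no_base_space(G)  # an MLP has no spatial base space

# input: 1 copy of [scalar, vector]
in_type = nn.FieldType(gspace, [G.trivial_representation, G.standard_representation()])
# output: 12 copies of [scalar, vector]
out_type = nn.FieldType(gspace, 12 * [G.trivial_representation, G.standard_representation()])

layer = nn.Linear(in_type, out_type)

x = in_type(torch.randn(32, in_type.size))  # a batch of 32 flat feature vectors
y = layer(x)                                # GeometricTensor with out_type.size channels
```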
As a result, you will give your MLP a "flat" stack of features (here, 1 copy of [scalar, vector]) and get back another stack of features, now with 12 copies.
Following your suggestion, say I build an equivariant MLP out of input, hidden, and output equivariant linear layers with some activation function, such that the hidden layer's group representation is a stack of scalar and vector fields. By Schur's lemma, we know there is no non-zero linear map between feature fields of different types; therefore this naive construction of the equivariant MLP results in a network that never mixes the signals from the scalar and vector representations. That is, it decouples into two sub-networks, one processing only scalar fields to scalar fields and the other vector fields to vector fields. This is clearly a bad architectural design. To mix fields of different types we are required to perform the CG (Clebsch-Gordan) tensor product; however, it is not clear how to use this, and it is especially unclear what good design principles are for embedding the CG tensor product in the architecture. Any insights?
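For concreteness, a sketch of the naive construction described above (a hypothetical SO(3) example; widths are illustrative):

```python
from escnn import group, gspaces, nn

G = group.so3_group()
gspace = gspaces.no_base_space(G)

# hidden type: 8 copies of [scalar, vector], i.e. 8 x (irrep_0 + irrep_1)
hidden = nn.FieldType(gspace, 8 * [G.irrep(0), G.irrep(1)])

# By Schur's lemma, the intertwiner basis of this layer contains no block
# mapping irrep_0 fields to irrep_1 fields (or vice versa), so scalars and
# vectors are processed by two decoupled sub-networks, whatever per-field
# activation is placed in between.
lin = nn.Linear(hidden, hidden)
```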
Please note that the interaction between irreps happens in the non-linearity (e.g. QuotientELU), so using nn.TensorProductModule is not the only way.
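To see why a pointwise non-linearity mixes irreps, here is a tiny self-contained check using the cyclic group C4, whose Fourier transform is just the 4-point DFT (plain numpy, for intuition only):

```python
import numpy as np

# A signal over C4 containing only the frequency-1 irrep.
x = np.cos(2 * np.pi * np.arange(4) / 4)   # [1, 0, -1, 0]

# Pointwise ReLU applied in the "spatial" (group-element) domain.
y = np.maximum(x, 0)

print(np.round(np.fft.fft(x), 3))  # [0, 2, 0, 2]: only frequencies +-1 present
print(np.round(np.fft.fft(y), 3))  # [1, 1, 1, 1]: frequencies 0 and 2 appeared
```

The same thing happens inside the Fourier-based activations, with (a quotient of) the group in place of C4.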
Hi @maxxxzdn, can you point out a paper/lecture-note/escnn-documentation page where the action of these quotient activations is clearly explained? I am afraid I am unable to work out from the documentation how signals from different irreps are mixed in this fashion.
Maybe it's simpler to explain it via an example. Say we build a G=SO(3) equivariant MLP using Fourier-based pointwise non-linearities. If we use the FourierELU non-linearity, it assumes we employ a band-limited regular representation of SO(3). The resulting architecture then resembles the internal layers of the original Spherical CNNs paper by Cohen et al. (which is equivalent to a GCNN over SO(3)), which alternates linear layers acting on the Fourier coefficients (i.e. group convolutions, by the convolution theorem) and pointwise non-linearities applied in the "spatial" domain (an inverse FFT, the pointwise function, then an FFT back).
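For reference, the convolution theorem being used here can be written as follows (with one common set of conventions for a compact group $G$; $\rho$ ranges over the irreps):

$$
(f \star \psi)(g) = \int_G f(h)\,\psi(h^{-1}g)\,\mathrm{d}h,
\qquad
\hat{f}(\rho) = \int_G f(g)\,\rho(g)\,\mathrm{d}g,
\qquad
\widehat{f \star \psi}(\rho) = \hat{f}(\rho)\,\hat{\psi}(\rho),
$$

so an equivariant linear map acts on each frequency $\rho$ separately (block-diagonally), while the pointwise non-linearity couples all frequencies at once in the spatial domain.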
Our SO(3) MLP would be essentially identical, with the differences that we don't actually implement the FFT but use the dense FT matrix, and that we arrange the sequence as follows for convenience (s.t. the FT and IFT are merged into the non-linear layer): Linear → (IFT → pointwise ELU → FT) → Linear → (IFT → pointwise ELU → FT) → …
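A minimal sketch of such an SO(3) MLP (the band limit, grid type/size, and widths are illustrative; see escnn's examples/mlp.ipynb for a full version):

```python
from escnn import group, gspaces, nn

G = group.so3_group()
gspace = gspaces.no_base_space(G)

in_type = nn.FieldType(gspace, [G.standard_representation()])
out_type = nn.FieldType(gspace, 4 * [G.standard_representation()])

# FourierELU internally applies: dense IFT matrix -> pointwise ELU on the
# grid samples -> dense FT matrix, so this is where frequencies get mixed.
act1 = nn.FourierELU(gspace, channels=8, irreps=G.bl_irreps(2), type='rand', N=40)
act2 = nn.FourierELU(gspace, channels=8, irreps=G.bl_irreps(2), type='rand', N=40)

mlp = nn.SequentialModule(
    nn.Linear(in_type, act1.in_type),        # acts on the Fourier coefficients
    act1,
    nn.Linear(act1.out_type, act2.in_type),
    act2,
    nn.Linear(act2.out_type, out_type),
)
```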
Note that the convolution theorem used above is just another way of thinking about Schur's Lemma! In the case of quotient representations, we restrict the considerations to signals over a quotient space X = G/H, which are nothing more than signals over G that are constant over the H cosets. This is the case for spherical signals, which can be thought of as SO(3) signals constant over the SO(2) fibers. For example, for spherical signals we only need the spherical harmonics (which correspond to one column of each of the Wigner D-matrices, in the right basis).

Mixing of frequencies happens in the non-linearity, which acts pointwise in the "spatial" domain, while convolution acts pointwise in the "frequency" domain. This is the exact same design idea behind any convolutional network (covering both CNNs and GCNNs). Whether this strategy is better than the tensor-product / Clebsch-Gordan transform is more of a practical question.

I hope this is useful! Best,
Hi,
I am attempting to build an equivariant variational encoder-decoder framework.
For this I am using R2Conv() and R3Conv() layers in the encoder, with trivial-representation input & output and regular representations in between. For the decoder I would like to use equivariant MLPs. However, it is quite unclear to me how the examples map to a generic MLP.
For example, I do not understand how one specifies the input and output dimensions. Instead, it seems to me that the equivariant MLP expects (just like a CNN) a 2D or 3D input, and that the output dimension is determined by the harmonic decomposition of functions on that space. In contrast, an MLP accepts a flat input, and the (flat) output dimension is a hyperparameter specified by the user.
During my learning process, I start with a rectangular input grid of shape [B,1,X,Y,Z] corresponding to a scalar field (trivial representation). I use R3Conv() with one hidden regular representation and a trivial-representation output to get [B,1,X,Y,1], store [B,1,Z_encoding_size] as the encoding of Z, and continue with [B,X,Y,1] and R2Conv() to obtain the encodings of X and Y, of shape [B,1,X_encoding_size,Y_encoding_size].
A final linear layer maps the [B,1,X_encoding_size,Y_encoding_size,Z_encoding_size]-shaped encoding to a latent space that parametrizes the mean and variance of a distribution.
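For concreteness, here is a sketch of the first encoder step I have in mind (using a finite rotation group, the octahedral group, so that a regular representation exists; the group choice and sizes are assumptions for illustration):

```python
import torch
from escnn import gspaces, nn

gspace = gspaces.octaOnR3()  # octahedral rotations acting on R^3 (assumption)

in_type = nn.FieldType(gspace, [gspace.trivial_repr])       # scalar field: [B, 1, X, Y, Z]
hid_type = nn.FieldType(gspace, 4 * [gspace.regular_repr])  # regular fields in between
out_type = nn.FieldType(gspace, [gspace.trivial_repr])      # scalar field again

encoder = nn.SequentialModule(
    nn.R3Conv(in_type, hid_type, kernel_size=3, padding=1),
    nn.ReLU(hid_type),
    nn.R3Conv(hid_type, out_type, kernel_size=3, padding=1),
)

x = in_type(torch.randn(2, 1, 16, 16, 16))
y = encoder(x)  # [2, 1, 16, 16, 16]; strides/pooling would shrink X, Y, Z
```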
This much seems more or less clear to me; the decoder part much less so.
I really hope for some clarification. Equivariant learning is something I only discovered a week ago, and it seems like opening Pandora's box given all the nice but extensive theory behind it.
Sadly, I do not have the time to pick it all up, nor is there anyone in my environment who knows this material.
Is it reasonable to expect to have a working model within a week?