
Usage equivariant MLP #94

vec123 opened this issue Dec 5, 2023 · 6 comments

@vec123

vec123 commented Dec 5, 2023

Hi,

I am attempting to build an equivariant variational encoder-decoder framework.

For this I am using R2Conv and R3Conv layers in the encoder, with trivial-representation input and output and regular representations in between. For the decoder I would like to use equivariant MLPs. However, it is quite unclear to me how the examples map to a generic MLP.

For example, I do not understand how one specifies the input and output dimensions. Instead, it seems to me that the equivariant MLP expects (just like a CNN) a 2D or 3D input, and that the output dimension is determined by the harmonic decomposition of functions on that space. In contrast, an MLP accepts a flat input, and the (flat) output dimension is a hyperparameter specified by the user.

During my learning process, I start with a rectangular input grid of shape [B,1,X,Y,Z] corresponding to a scalar field (trivial representation). I use R3Conv to get [B,1,X,Y,1] with one hidden regular representation and a trivial-representation output, store [B,1,Z_encoding_size] as the encoding of Z, and continue with [B,X,Y,1] and R2Conv to obtain the encodings of X and Y in shape
[B, 1, X_encoding_size, Y_encoding_size].
A final linear layer maps the [B, 1, X_encoding_size, Y_encoding_size, Z_encoding_size]-shaped encoding to a latent space that parametrizes the mean and variance of a distribution.

This seems more or less clear to me. The decoder part is much less clear.

I really hope for some clarification. I only discovered equivariant learning a week ago, and it feels like opening Pandora's box, considering all the nice but extensive theory behind it.
Sadly, I do not have the time to study it properly, nor is there anyone around me who knows this material.
Is it reasonable to expect a working model within a week?

@maxxxzdn

maxxxzdn commented Mar 5, 2024

Hi,

Did you see the MLP example https://github.com/QUVA-Lab/escnn/blob/master/examples/mlp.ipynb?

An equivariant MLP doesn't expect a base space (neither 2D nor 3D); it works exactly like a classic MLP and takes only a stack of feature fields:

from escnn import group, gspaces

G = group.so3_group()

# since we are building an MLP, there is no base-space
gspace = gspaces.no_base_space(G)

# assume you have scalar and vector quantities in your output:
scalar_repr = gspace.trivial_repr
vector_repr = gspace.fibergroup.standard_representation()

# assume your output goes like [[scalar, vector], [scalar, vector], ...., [scalar, vector]]
channel_repr = group.directsum([scalar_repr, vector_repr])

# specify the number of channels in input and output
c_in = 1
c_out = 12
in_repr = c_in * [channel_repr]
out_repr = c_out * [channel_repr]

# define feature field types (1 x 4 = 4 and 12 x 4 = 48 dimensions)
in_type = gspace.type(*in_repr)
out_type = gspace.type(*out_repr)

# define your MLP: `MLP` is a placeholder for your own equivariant model class
# (e.g. one built like the models in the mlp.ipynb example linked above)
mlp = MLP(in_type, out_type)

As a result, you give your MLP a "flat" stack of features (here, 1 copy of [scalar, vector], i.e. 1 + 3 = 4 numbers) and get back another stack of features, now with 12 copies (48 numbers).
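To make the "flat stack" concrete without installing escnn, the NumPy sketch below builds by hand the kind of weight matrix that an equivariant linear layer between these two field types is restricted to (biases omitted). It assumes the SO(3) example above: each [scalar, vector] channel is 4 numbers transforming as blockdiag(1, R). The helper names (`random_rotation`, `stacked_rep`, `equivariant_linear`) are hypothetical and not part of escnn.

```python
import numpy as np

def random_rotation(rng):
    # QR of a Gaussian matrix gives a random orthogonal matrix; flip a column if needed so det = +1 (SO(3))
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

def block_diag(*blocks):
    # place square blocks along the diagonal of one big matrix
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n))
    i = 0
    for b in blocks:
        k = b.shape[0]
        out[i:i + k, i:i + k] = b
        i += k
    return out

def stacked_rep(R, copies):
    # one [scalar, vector] channel = trivial rep (1x1 identity) + standard rep (R), repeated `copies` times
    channel = block_diag(np.eye(1), R)
    return block_diag(*([channel] * copies))

def equivariant_linear(a, b):
    # a[o, i]: scalar -> scalar weight, b[o, i]: vector -> vector weight (times the 3x3 identity).
    # There are no scalar <-> vector entries: for SO(3) these must be zero.
    c_out, c_in = a.shape
    W = np.zeros((4 * c_out, 4 * c_in))
    for o in range(c_out):
        for i in range(c_in):
            W[4 * o, 4 * i] = a[o, i]
            W[4 * o + 1:4 * o + 4, 4 * i + 1:4 * i + 4] = b[o, i] * np.eye(3)
    return W

rng = np.random.default_rng(0)
c_in, c_out = 1, 12
W = equivariant_linear(rng.normal(size=(c_out, c_in)), rng.normal(size=(c_out, c_in)))
x = rng.normal(size=4 * c_in)      # flat input: 1 copy of [scalar, vector] -> 4 numbers
y = W @ x                          # flat output: 12 copies -> 48 numbers
R = random_rotation(rng)
print(x.shape, y.shape)            # (4,) (48,)
# equivariance: rotate-then-map equals map-then-rotate
print(np.allclose(W @ (stacked_rep(R, c_in) @ x), stacked_rep(R, c_out) @ y))  # True
```

The layer has only 2 * c_in * c_out free weights instead of 16 * c_in * c_out, which is where the parameter saving of an equivariant MLP comes from.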

@Danfoa
Contributor

Danfoa commented Mar 18, 2024

Hi @maxxxzdn and @Gabri95,

Following your suggestion, say I build an equivariant MLP from input, hidden, and output equivariant linear layers with some activation function, where the hidden layer's group representation is defined by hidden_repr = c_in * [channel_repr].

By Schur's lemma, we know there is no non-zero equivariant linear map between non-isomorphic irreducible representations (here, the scalar and vector fields). Therefore, this naive construction of the equivariant MLP results in a network that never mixes the signals from the scalar and vector representations. That is, it decouples into a network processing scalar fields to scalar fields and another processing vector fields to vector fields. This is clearly a bad architectural design.
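The Schur's lemma argument can be checked numerically. The NumPy sketch below solves W rho(g) = rho(g) W for a single [scalar, vector] channel of SO(3) and inspects the solution space. It assumes that a few random rotations are enough to stand in for the whole group, and the helper names are hypothetical; it is not escnn's own basis-solving code.

```python
import numpy as np

def random_rotation(rng):
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

def channel_rep(R):
    # trivial (scalar) block followed by standard (vector) block
    rho = np.eye(4)
    rho[1:, 1:] = R
    return rho

def equivariant_maps(rhos, tol=1e-8):
    # Solve W @ rho = rho @ W for every rho. With row-major flattening,
    # vec(W @ rho) = kron(I, rho.T) @ vec(W) and vec(rho @ W) = kron(rho, I) @ vec(W).
    d = rhos[0].shape[0]
    eye = np.eye(d)
    C = np.vstack([np.kron(eye, r.T) - np.kron(r, eye) for r in rhos])
    _, s, vt = np.linalg.svd(C)
    rank = int(np.sum(s > tol))
    return vt[rank:].reshape(-1, d, d)   # orthonormal basis of the null space

rng = np.random.default_rng(1)
basis = equivariant_maps([channel_rep(random_rotation(rng)) for _ in range(3)])
print(len(basis))    # 2: one scalar->scalar weight and one vector->vector weight
cross = max(np.abs(basis[:, 0, 1:]).max(), np.abs(basis[:, 1:, 0]).max())
print(cross < 1e-8)  # True: no scalar <-> vector entries
```

Only two degrees of freedom survive, and neither connects the scalar to the vector, so stacking such linear layers alone cannot exchange information between the two field types.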

To mix fields of different types, we need the Clebsch-Gordan (CG) tensor product. However, it is not clear how to use it, and especially what good design principles are for embedding the CG tensor product in the architecture.
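For reference, here is a hand-rolled sketch of what CG-style products look like for [scalar, vector] channels: scalar times scalar, scalar times vector, dot products and cross products, i.e. the degree-0 and degree-1 parts of the tensor products 0⊗0, 0⊗1 and 1⊗1 (the degree-2 part of 1⊗1 = 0⊕1⊕2 is dropped for brevity). It is plain NumPy, not nn.TensorProductModule, and the function name is hypothetical.

```python
import numpy as np

def tensor_product_features(s, v):
    """s: (c,) scalars, v: (c, 3) vectors -> new scalars and vectors (degree 0 and 1 only)."""
    dots = v @ v.T                                    # 1 x 1 -> 0: pairwise dot products
    scaled = s[:, None, None] * v[None, :, :]         # 0 x 1 -> 1: scalar times vector
    crosses = np.cross(v[:, None, :], v[None, :, :])  # 1 x 1 -> 1: pairwise cross products
    new_s = np.concatenate([np.outer(s, s).ravel(), dots.ravel()])
    new_v = np.concatenate([scaled.reshape(-1, 3), crosses.reshape(-1, 3)])
    return new_s, new_v

rng = np.random.default_rng(2)
s, v = rng.normal(size=2), rng.normal(size=(2, 3))
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R = q if np.linalg.det(q) > 0 else -q              # a proper rotation (det = +1)

s1, v1 = tensor_product_features(s, v @ R.T)       # rotate the input vectors first
s2, v2 = tensor_product_features(s, v)             # or rotate the outputs afterwards
print(np.allclose(s1, s2), np.allclose(v1, v2 @ R.T))  # True True
```

Unlike the linear layers, the outputs s_i * v_j and v_i · v_j depend on both field types at once; a learnable equivariant linear layer would then be applied to these products.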

Any insights?

@maxxxzdn

Please note that the interaction between irreps also happens in the non-linearity (e.g. QuotientFourierELU), so using nn.TensorProductModule is not the only way.
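To see concretely how a quotient Fourier non-linearity can mix irreps, the sketch below implements the idea for signals on the sphere S^2 = SO(3)/SO(2), truncated to degrees 0 and 1. It uses the fact that the real degree-1 spherical harmonics are proportional to the coordinates x, y, z, so a [scalar c, vector v] field can be read as the function f(p) = c + v · p. The steps are: sample f on a grid, apply ELU pointwise, project back onto degrees 0 and 1. The Fibonacci grid, the equal-weight quadrature and all names are assumptions for illustration; this is not escnn's QuotientFourierELU implementation.

```python
import numpy as np

def fibonacci_sphere(n):
    # n roughly uniform points on the unit sphere, used as the sampling grid
    i = np.arange(n) + 0.5
    z = 1 - 2 * i / n
    r = np.sqrt(1 - z ** 2)
    phi = i * np.pi * (3 - np.sqrt(5))                  # golden-angle increments
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)

def elu(x):
    return np.where(x > 0, x, np.expm1(np.minimum(x, 0.0)))

def sphere_elu(c, v, points):
    # [c, v] are the degree-0 and degree-1 coefficients of f(p) = c + v . p
    f = c + points @ v                                  # inverse transform: sample f on the grid
    g = elu(f)                                          # pointwise non-linearity
    c_out = g.mean()                                    # project back onto degree 0
    v_out = 3.0 * (g[:, None] * points).mean(axis=0)    # degree 1 (sphere average of p_i p_j is delta_ij / 3)
    return c_out, v_out

pts = fibonacci_sphere(5000)

# mixing: changing only the vector input changes the output scalar
c_a, _ = sphere_elu(0.0, np.array([0.0, 0.0, 1.0]), pts)
c_b, _ = sphere_elu(0.0, np.array([0.0, 0.0, 2.0]), pts)
print(round(float(c_a), 3), round(float(c_b), 3))      # 0.066 0.216

# approximate equivariance: rotating v rotates the output vector and leaves the scalar unchanged
q, _ = np.linalg.qr(np.random.default_rng(6).normal(size=(3, 3)))
R = q if np.linalg.det(q) > 0 else -q
v0 = np.array([0.4, -0.8, 0.5])
c_ref, v_ref = sphere_elu(0.2, v0, pts)
c_rot, v_rot = sphere_elu(0.2, R @ v0, pts)
print(abs(c_rot - c_ref), np.abs(v_rot - R @ v_ref).max())   # small, but not exactly zero
```

The output scalar depends on |v|, which is exactly the scalar/vector interaction the linear layers cannot produce. Equivariance only holds up to a small grid-dependent error.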

@Danfoa
Contributor

Danfoa commented Apr 12, 2024

Hi @maxxxzdn, can you point out a paper, lecture note, or escnn documentation page where the action of these quotient activations is clearly explained? I am afraid I cannot understand from the documentation how signals from different irreps are mixed this way.

@Gabri95
Collaborator

Gabri95 commented Sep 2, 2024

Maybe it's simpler to explain it via an example. Say we build a G=SO(3) equivariant MLP using Fourier-based pointwise non-linearities.

If we use the FourierELU non-linearity, it assumes we employ a band-limited regular representation of SO(3). Then, the resulting architecture resembles the internal layers of the original Spherical CNNs paper by Cohen et al. (which is equivalent to a GCNN over SO(3)), which alternate between:

  • SO(3) FFT
  • Convolution (a simple multiplication in the Fourier domain, by the convolution theorem)
  • SO(3) IFFT (sampling)
  • Pointwise non-linearity
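SO(3) FFTs do not fit in a short snippet, so the sketch below uses the simplest analogue, the cyclic group C_N, where NumPy's FFT is the group Fourier transform. It checks the convolution theorem (group convolution equals per-frequency multiplication) and that one FFT, multiply, IFFT, pointwise-ELU block commutes with cyclic shifts. The choice of group and all names are assumptions for illustration.

```python
import numpy as np

def circular_conv(x, w):
    # group convolution on the cyclic group C_N: y[g] = sum_h w[g - h] x[h]
    n = len(x)
    return np.array([sum(w[(g - h) % n] * x[h] for h in range(n)) for g in range(n)])

def layer(x, w):
    # FFT -> per-frequency product -> IFFT (sampling) -> pointwise ELU
    y = np.fft.ifft(np.fft.fft(w) * np.fft.fft(x)).real
    return np.where(y > 0, y, np.expm1(np.minimum(y, 0.0)))

rng = np.random.default_rng(3)
x, w = rng.normal(size=8), rng.normal(size=8)
direct = circular_conv(x, w)
via_fft = np.fft.ifft(np.fft.fft(w) * np.fft.fft(x)).real
print(np.allclose(direct, via_fft))                                  # True
print(np.allclose(layer(np.roll(x, 3), w), np.roll(layer(x, w), 3)))  # True: shifts commute with the block
```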

Our SO(3) MLP would be essentially identical, except that we don't actually implement the FFT but use the dense FT matrix, and, for convenience, we arrange the sequence as follows (so that the FT and IFT are merged into the non-linear layer):

  • Convolution (a simple multiplication in the Fourier domain, by the convolution theorem)
  • FourierELU layer
    - SO(3) IFFT (sampling)
    - Pointwise non-linearity
    - SO(3) FFT
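The same MLP arrangement can be sketched for SO(2) with band-limit L: the features are real Fourier coefficients [c0, a1, b1, ..., aL, bL], the "convolution" is a per-frequency scale-and-rotate (Schur's lemma for SO(2)), and the Fourier ELU uses a dense sampling matrix and its pseudo-inverse instead of an FFT. L, N, the coefficient layout and all names are assumptions; this only mirrors the structure described above, not escnn's code.

```python
import numpy as np

L, N = 2, 16                                  # band-limit and number of sample angles (assumptions)
angles = 2 * np.pi * np.arange(N) / N

# dense inverse-FT matrix: row j samples f(phi_j) = c0 + sum_k a_k cos(k phi_j) + b_k sin(k phi_j)
cols = [np.ones(N)]
for k in range(1, L + 1):
    cols += [np.cos(k * angles), np.sin(k * angles)]
IFT = np.stack(cols, axis=1)                  # shape (N, 2L + 1)
FT = np.linalg.pinv(IFT)                      # dense FT matrix: samples -> band-limited coefficients

def rotate(coeffs, theta):
    # rotation by theta: frequency k is rotated by k * theta, frequency 0 is invariant
    out = coeffs.copy()
    for k in range(1, L + 1):
        a, b = coeffs[2 * k - 1], coeffs[2 * k]
        out[2 * k - 1] = a * np.cos(k * theta) - b * np.sin(k * theta)
        out[2 * k] = a * np.sin(k * theta) + b * np.cos(k * theta)
    return out

def fourier_linear(coeffs, w0, alphas, betas):
    # "convolution" in the Fourier domain: each frequency is scaled and rotated on its own
    out = np.empty_like(coeffs)
    out[0] = w0 * coeffs[0]
    for k in range(1, L + 1):
        a, b = coeffs[2 * k - 1], coeffs[2 * k]
        out[2 * k - 1] = alphas[k - 1] * a - betas[k - 1] * b
        out[2 * k] = betas[k - 1] * a + alphas[k - 1] * b
    return out

def fourier_elu(coeffs):
    f = IFT @ coeffs                                        # IFT (sampling)
    f = np.where(f > 0, f, np.expm1(np.minimum(f, 0.0)))    # pointwise non-linearity
    return FT @ f                                           # FT (back to coefficients)

rng = np.random.default_rng(4)
w0, alphas, betas = rng.normal(), rng.normal(size=L), rng.normal(size=L)

def block(coeffs):
    return fourier_elu(fourier_linear(coeffs, w0, alphas, betas))

x = rng.normal(size=2 * L + 1)
theta = 2 * np.pi * 3 / N                     # a rotation that maps the sampling grid onto itself
print(np.allclose(block(rotate(x, theta)), rotate(block(x), theta)))   # True
only_freq1 = np.array([0.0, 1.0, 0.0, 0.0, 0.0])
print(np.round(fourier_elu(only_freq1), 3))   # frequency-0 and frequency-2 entries are non-zero
```

Equivariance is exact here only for rotations by multiples of 2π/N, which permute the sample points; for other angles it is approximate. A pure frequency-1 input produces frequency-0 and frequency-2 outputs, which is the frequency mixing performed by the non-linearity.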

Note that the convolution theorem used above is just another way of thinking about Schur's Lemma!

In the case of quotient representations, we restrict our attention to signals over a quotient space X=G/H, which are nothing more than signals over G that are constant over the H cosets. This is the case for spherical signals, which can be thought of as SO(3) signals constant over the SO(2) fibers.
The formulation above remains identical, but because the signal lives in a smaller space, we require fewer Fourier coefficients and, since the signal is constant along the H fibers, we only need to sample the domain X rather than G.

For example, for spherical signals, we only need the spherical harmonics (which correspond to one column of each of the Wigner D matrices, in the right basis).
One can prove that this architecture is equivalent to a Spherical CNN using only zonal spherical filters (i.e. filters which are invariant to rotations about a certain axis): this is of course less expressive than a full GCNN over SO(3), but it is also less expensive.
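The saving can be made concrete by counting coefficients per channel up to band-limit L: a band-limited SO(3) signal needs a full (2l+1)×(2l+1) Wigner-D block per degree l, while a spherical signal needs only one column of 2l+1 numbers per degree. A small Python sketch of the count (function names are hypothetical):

```python
def so3_coefficients(L):
    # band-limited signal on SO(3): one (2l+1) x (2l+1) block per degree l
    return sum((2 * l + 1) ** 2 for l in range(L + 1))

def sphere_coefficients(L):
    # band-limited signal on S^2 = SO(3)/SO(2): one column (2l+1 numbers) per degree l
    return sum(2 * l + 1 for l in range(L + 1))

for L in range(4):
    print(L, so3_coefficients(L), sphere_coefficients(L))
# 0 1 1
# 1 10 4
# 2 35 9
# 3 84 16
```

The SO(3) count grows cubically, (L+1)(2L+1)(2L+3)/3, while the sphere count grows quadratically, (L+1)^2.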

Mixing of frequencies happens in the non-linearity, which acts pointwise in the "spatial" domain, while the convolution acts pointwise in the "frequency" domain. This is exactly the same design idea behind any convolutional network (covering both CNNs and GCNNs).

Whether this strategy is better than the tensor-product / Clebsch-Gordan transform is more of a practical question.
Here are a couple of useful insights, though.
The Clebsch-Gordan transform is nothing more than a quadratic non-linearity, which means it can at most double the band-limit (the highest frequency) of a signal. This is convenient because the output signal is still band-limited and the operation can be implemented exactly, preserving exact equivariance.
Conversely, general pointwise activations can introduce arbitrarily high frequencies and have more freedom to mix them, but this comes with the drawback that they typically require some approximation and don't guarantee exact equivariance (although this error can be controlled).
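The contrast between a quadratic non-linearity and a general pointwise activation can be seen on the circle (SO(2)) with NumPy: squaring a signal with frequencies up to 2 gives frequencies up to exactly 4, while ELU leaves non-negligible energy at much higher frequencies. The test signal, grid size and tolerance are arbitrary choices for illustration.

```python
import numpy as np

N = 256
phi = 2 * np.pi * np.arange(N) / N
f = 0.3 + np.cos(phi) + 0.5 * np.sin(2 * phi)     # band-limited: frequencies 0..2

def max_frequency(signal, tol=1e-9):
    # highest frequency with a non-negligible Fourier coefficient
    spectrum = np.abs(np.fft.rfft(signal)) / len(signal)
    return int(np.nonzero(spectrum > tol)[0].max())

g = np.where(f > 0, f, np.expm1(np.minimum(f, 0.0)))   # pointwise ELU
print(max_frequency(f))        # 2
print(max_frequency(f ** 2))   # 4: a quadratic product at most doubles the band-limit
print(max_frequency(g) > 4)    # True: ELU spreads energy to higher frequencies
```

On a finite grid those higher ELU frequencies alias back onto lower ones, which is one source of the approximation error mentioned above.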

I hope this is useful!

Best,
Gabriele

@yaosx425


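import torch.nn as nn

# excerpt from the __init__ of an attention block (dim, qkv_bias, attn_drop and proj_drop are its arguments)
self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)
self.attn_drop = nn.Dropout(attn_drop)
self.proj = nn.Linear(dim, dim)
self.proj_drop = nn.Dropout(proj_drop)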
Hello, I have studied the example notebook you published about the equivariant MLP, but it is not very clear to me. The code above uses the Linear layer from torch's nn. If I switch to an equivariant Linear layer, how should I write it? Also, can I pass feature types built on flipRot2dOnR2 to an equivariant linear layer? What should I do?
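In escnn you would not write the weight matrix yourself: you would describe `dim` as a FieldType made of, for example, regular fields and use an equivariant layer (as far as I can tell, nn.Linear on a gspace from gspaces.no_base_space(group.dihedral_group(4)) for token features, or a 1x1 R2Conv if the features live on an R^2 grid with gspaces.flipRot2dOnR2(N=4)). The NumPy sketch below only shows what such a layer computes when every channel is a regular field of D_4: `dim` must be a multiple of |G| = 8, and the weight is a group convolution with c_out * c_in * |G| free parameters instead of a dense matrix. The element encoding, field layout and all names are assumptions for illustration.

```python
import numpy as np

N = 4                                   # D_4, the fiber group of flipRot2dOnR2(N=4): 2N = 8 elements
# element (f, r) is the map x -> (-1)^f * x + r on Z_N; the product below is composition
elements = [(f, r) for f in (0, 1) for r in range(N)]
index = {g: i for i, g in enumerate(elements)}
G = len(elements)

def mul(g, h):
    (f1, r1), (f2, r2) = g, h
    return (f1 ^ f2, (r1 + (-1) ** f1 * r2) % N)

def inv(g):
    f, r = g
    return (0, (-r) % N) if f == 0 else (1, r)   # reflections are their own inverse

def regular_rep(a):
    # permutation matrix sending basis vector e_h to e_{a h}
    P = np.zeros((G, G))
    for h in elements:
        P[index[mul(a, h)], index[h]] = 1.0
    return P

def equivariant_linear_weight(w):
    # w: (c_out, c_in, G) free parameters -> dense (c_out*G, c_in*G) matrix with
    # W[(o, g), (i, h)] = w[o, i, g^-1 h], i.e. a group convolution between regular fields
    c_out, c_in, _ = w.shape
    W = np.zeros((c_out, G, c_in, G))
    for g in elements:
        for h in elements:
            W[:, index[g], :, index[h]] = w[:, :, index[mul(inv(g), h)]]
    return W.reshape(c_out * G, c_in * G)

c = 3                                   # number of regular fields, so "dim" = c * G = 24 (assumption)
rng = np.random.default_rng(5)
W_qkv = equivariant_linear_weight(rng.normal(size=(3 * c, c, G)))   # plays the role of nn.Linear(dim, dim * 3)
x = rng.normal(size=c * G)
a = (1, 3)                              # a reflection combined with a rotation
rho_in = np.kron(np.eye(c), regular_rep(a))
rho_out = np.kron(np.eye(3 * c), regular_rep(a))
print(W_qkv.shape)                                                 # (72, 24)
print(np.allclose(W_qkv @ (rho_in @ x), rho_out @ (W_qkv @ x)))   # True
```

A bias, if used, has to be constant across the G entries of each regular field, since those are the only vectors the regular representation leaves unchanged.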


5 participants