Hi,

I am attempting to implement Conservative LRP rules as part of #184, since I need it for a university project. I am running into some issues with the classifier head and was hoping you could point out what I might be doing wrong.
I have so far implemented a composite in the following manner:
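A minimal sketch of what such a composite could look like, assuming the custom rules are implemented against zennit's hook interface and mapped by module type with `LayerMapComposite`; the `clrp_rules` import and the exact module-to-rule mapping are illustrative assumptions, not the original code:

```python
import torch
from transformers.models.bert.modeling_bert import BertSelfAttention
from zennit.composites import LayerMapComposite
from zennit.rules import Epsilon

# AHConservative and LNConservative are the custom CLRP rules mentioned below;
# this import is a hypothetical placeholder for wherever they are defined.
from clrp_rules import AHConservative, LNConservative

composite = LayerMapComposite(layer_map=[
    (BertSelfAttention, AHConservative()),   # attention-head rule from the CLRP paper
    (torch.nn.LayerNorm, LNConservative()),  # layer-norm rule from the CLRP paper
    (torch.nn.Linear, Epsilon()),            # epsilon rule for the remaining linear layers
])
```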
Here, `AHConservative` and `LNConservative` are the rules described in the CLRP paper.
I have also implemented a custom attributor which calculates relevance scores with respect to the embeddings (since integer inputs are not differentiable). However, the problem I am facing also appears with a basic `Gradient` attributor. Here is a minimal example:
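A sketch of such a minimal example, assuming only `Epsilon` on linear layers and a small wrapper that feeds precomputed word embeddings to the model and returns the logits; the wrapper, checkpoint name, and example text are assumptions rather than the exact contents of `t3.py`:

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer
from zennit.attribution import Gradient
from zennit.composites import LayerMapComposite
from zennit.rules import Epsilon


class LogitModel(torch.nn.Module):
    """Run BERT on precomputed embeddings and return only the logits tensor."""

    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, inputs_embeds):
        return self.model(inputs_embeds=inputs_embeds).logits


model = BertForSequenceClassification.from_pretrained('bert-base-uncased')  # 2 labels
model.eval()
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# token ids are not differentiable, so the word embeddings serve as the input
input_ids = tokenizer('a minimal example', return_tensors='pt')['input_ids']
input = model.bert.embeddings.word_embeddings(input_ids).detach().requires_grad_(True)

composite = LayerMapComposite(layer_map=[(torch.nn.Linear, Epsilon())])

with Gradient(model=LogitModel(model), composite=composite) as attributor:
    out, relevance = attributor(input.float(), attr_output=torch.ones((1, 2)))
```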
Working with `BertForSequenceClassification` from the Hugging Face `transformers` library, the classifier head is an `nn.Linear` module of size (768, 2). However, I am getting an error from the `Epsilon` rule, specifically from its `gradient_mapper`:
```
Exception has occurred: RuntimeError (note: full exception trace is shown but execution is paused at: wrapper)
The size of tensor a (768) must match the size of tensor b (2) at non-singleton dimension 1
  File "/home/chris/zennit/src/zennit/rules.py", line 120, in <lambda>
    gradient_mapper=(lambda out_grad, outputs: out_grad / stabilizer_fn(outputs[0])),
  File "/home/chris/zennit/src/zennit/core.py", line 539, in backward
    grad_outputs = self.gradient_mapper(grad_output[0], outputs)
  File "/home/chris/zennit/src/zennit/core.py", line 388, in wrapper (Current frame)
    return hook.backward(module, grad_input, grad_output)
  File "/home/chris/zennit/.venv/lib/python3.10/site-packages/torch/autograd/__init__.py", line 303, in grad
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/home/chris/zennit/src/zennit/attribution.py", line 257, in grad
    gradient, = torch.autograd.grad(
  File "/home/chris/zennit/src/zennit/attribution.py", line 287, in forward
    return self.grad(input, attr_output_fn)
  File "/home/chris/zennit/src/zennit/attribution.py", line 181, in __call__
    return self.forward(input, attr_output_fn)
  File "/home/chris/zennit/t3.py", line 25, in <module>
    out, relevance = attributor(input.float(), attr_output=torch.ones((1, 2)))
  File "/home/chris/miniconda3/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/chris/miniconda3/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
RuntimeError: The size of tensor a (768) must match the size of tensor b (2) at non-singleton dimension 1
```
I do understand that the tensor sizes must agree for the division in the `gradient_mapper`. I therefore suspect that I'm mishandling the classifier head, but I am not sure how to proceed. Should I upsample the output to the size of the input and just use `Epsilon`? Should I implement a custom rule? Any help would be greatly appreciated! I'd love to get a rough implementation of the changes suggested in #184 working, so we could push the needle on XAI for Transformers a bit.
Thanks a lot for working on this; I am really excited to see this working in Zennit at some point.
From your minimal example, I cannot see an immediate problem (it indeed runs without specifying the attention rules). However, looking at the Hugging Face transformer implementation, I strongly suspect that you are hitting a current limitation of Zennit: only modules with single inputs and outputs are supported. I was not aware of how important this is for Transformers, so I am giving it a high priority.
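To illustrate the limitation concretely (a hedged sketch, assuming the BERT implementation in `transformers` at the time of writing): `BertSelfAttention` takes several keyword inputs and always returns a tuple of outputs, so the single-input/single-output assumption of the rule hooks does not hold for it.

```python
import torch
from transformers import BertConfig
from transformers.models.bert.modeling_bert import BertSelfAttention

config = BertConfig()  # defaults: hidden_size=768, num_attention_heads=12
attention = BertSelfAttention(config)

hidden_states = torch.randn(1, 8, config.hidden_size)
# the module returns a tuple; with output_attentions=True it contains
# both the context layer and the attention probabilities
outputs = attention(hidden_states, output_attentions=True)
print(len(outputs))  # 2
```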
Although I drafted #168 a while back to implement multiple inputs (but not outputs), I did not like the draft very much, on top of it not supporting multiple outputs. I decided to postpone the parameter relevances, which are somewhat less important, to a later PR, and started drafting #196, which also supports multiple outputs and keyword arguments (as required by the Hugging Face transformers). Unfortunately, we will probably not be able to get the Transformers working before this is implemented. I will try to get #196 done as soon as possible.
Sorry for the late response, and thank you for pointing out the single-input/single-output limitation; indeed, this seems to be the issue here. I'm in no massive rush, since we have also had some success applying Shapley value explanations to the language models we're using for our project, but I'd still be willing to devote some time to getting CLRP working within Zennit. I think Zennit is the most mature and well-thought-out XAI framework for PyTorch to date, so it makes sense to pool efforts into Zennit instead of creating custom, less portable implementations of the idea elsewhere.
I will hang around and watch #196, and once it's done I will try my hand at implementing the CLRP rules. Thank you so much for creating and maintaining this project! It is a massive help to the XAI community.