Neuralized K-Means #197

jackmcrider · 2023-08-16T17:22:10Z

Hi chr5tphr!

I started an attempt to implement (deep) neuralized k-means (https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9817459) as more people want to use it and ask for code.

I took the SoftplusCanonizer from the docs as a starting point.

Main changes:

- Add Distance, NeuralizedKMeans and LogMeanExpPool in zennit.layer
- Add KMeansCanonizer in zennit.canonizers

Some things can be optimized:

- add tests (I wrote a few, but then changed something and did not bother to rewrite tests unless I see a chance of this being merged)
- copy.deepcopy in KMeansCanonizer.register is probably not ideal
  - alternative: clone distance_module.centroids and import Distance from layer.py; then, create a new Distance instance in remove()
- KMeansCanonizer.apply scans for Distance layers, but there can be distance layers in some architectures that don't belong to k-means
  - one idea would be to add a "contrastive layer" and identify k-means as a composition of Distance followed by Contrastive, similar to MergeBatchNorm:
    - contrastive computes `(out[:,None,:] - out[None,:,:])[mask].reshape(K,K-1,D)`, cf. lines 379-384 in canonizers.py
  - several advantages:
    - the outputs of k-means and neuralized k-means are identical,
    - the contrastive layer could also be applied to classifiers to get class-contrastive explanations (cf. Fig. 10.5 in https://iphome.hhi.de/samek/pdf/MonXAI19.pdf)
    - it could replace the NeuralizedKMeans layer by something more general (the difference between squared distances and the difference between linear layers are both linear, i.e. a contrastive layer covers both; would still need two separate Canonizers)
- the scanning in KMeansCanonizer.apply does not work if Distance is the first layer (k-means in input space); one can do `Sequential(Distance(centroids))` as a trick, but that is not ideal
- if I understand correctly, KMeansCanonizer.register is executed each time one calls `with Gradient(...) as attributor`; this could be a bottleneck if the number of clusters is large
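
The linearity argument in the list above can be checked numerically. The following is a minimal sketch in plain Python (no PyTorch, so it runs without dependencies). The names `sq_dist`, `contrastive_linear` and `neuralized_logit` are hypothetical and not part of zennit, and the sketch assumes the usual formulation in which the evidence for cluster c is the minimum, over all competitors k, of the difference of squared distances.

```python
def sq_dist(x, c):
    """Squared Euclidean distance between point x and centroid c."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def contrastive_linear(centroids, c):
    """For cluster c, return the K-1 pairs (w_k, b_k), k != c, such that
    w_k . x + b_k == ||x - mu_k||^2 - ||x - mu_c||^2 for every x."""
    mu_c = centroids[c]
    params = []
    for k, mu_k in enumerate(centroids):
        if k == c:
            continue
        w = [2.0 * (a - b) for a, b in zip(mu_c, mu_k)]
        b = sum(v * v for v in mu_k) - sum(v * v for v in mu_c)
        params.append((w, b))
    return params


def neuralized_logit(x, centroids, c):
    """Hard-min over the K-1 linear functions of cluster c."""
    return min(
        sum(wi * xi for wi, xi in zip(w, x)) + b
        for w, b in contrastive_linear(centroids, c)
    )


centroids = [[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]]
x = [1.0, 0.5]

distances = [sq_dist(x, mu) for mu in centroids]
assigned = distances.index(min(distances))
logits = [neuralized_logit(x, centroids, c) for c in range(len(centroids))]
```

Only the assigned cluster gets a positive logit, because every competitor is farther away than the cluster's own centroid; for all other clusters at least one difference is negative.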

Closes #198


jackmcrider · 2023-08-16T21:20:50Z

I just found the contributing guide and converted this to draft for now since I broke every single guideline.

@p16i I clicked somewhere and triggered a review request. Please ignore.

- documentation in numpydoc format
- pylint + flake8 stuff
- KMeansCanonizer
- NeuralizedKMeans layer
- LogMeanExpPool layer
- Distance layer
- Distance type

jackmcrider · 2023-08-17T13:53:50Z

I tried to merge everything into one commit, extended documentation and made sure that all checks pass.
Could be reviewed, but there are no functional changes.

- Explaining Deep Cluster Assignments with Neuralized K-Means on Image Data
- I tried to adhere to guidelines
- That means: random data, random weights
- Code for real data and real weights in comments
- Runs on colab, did not test blender
- also adds the reference to docs/source/tutorial/index.rst

jackmcrider · 2023-08-18T16:21:01Z

Checks pass 👍

It's quite challenging to get reproducible tox results for tutorials (e.g. had to manually fiddle with metadata.kernelspec.name in the raw .ipynb before committing). Probably a limitation of tox, but could be documented for future contributors. I'm not sure where, maybe in Contributing#continuous-integration.
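
The manual edit of `metadata.kernelspec.name` could be scripted. Below is a minimal sketch using only the standard library; the helper name `set_kernelspec_name` and the target kernel name `"python3"` are assumptions, since the thread does not say which name the tox setup expects.

```python
import json


def set_kernelspec_name(notebook_json, name="python3"):
    """Return the notebook JSON text with metadata.kernelspec.name set to `name`.

    The default "python3" is an assumption, not a value taken from the project.
    """
    nb = json.loads(notebook_json)
    kernelspec = nb.setdefault("metadata", {}).setdefault("kernelspec", {})
    kernelspec["name"] = name
    # Notebooks are conventionally stored with one-space indentation.
    return json.dumps(nb, indent=1, ensure_ascii=False) + "\n"


# Stand-in for the raw text of an .ipynb file saved from a local environment.
raw = json.dumps({
    "cells": [],
    "metadata": {
        "kernelspec": {
            "display_name": "Python 3",
            "language": "python",
            "name": "my-local-venv",
        }
    },
    "nbformat": 4,
    "nbformat_minor": 5,
})
fixed = set_kernelspec_name(raw)
```

For a real notebook, the text would be read from and written back to the `.ipynb` file before committing.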

I'm gonna freeze this branch for now, unless something comes up.

chr5tphr

Hey @jackmcrider

thanks a lot for the contribution!
I have looked at your implementation and left a few comments.

I have not yet looked at the tutorial.

src/zennit/layer.py

src/zennit/canonizers.py

Co-authored-by: Christopher <[email protected]>

change `torch.log(torch.tensor(n_dims, dtype=...))` to `math.log(n_dims)` Co-authored-by: Christopher <[email protected]>
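
To illustrate why `math.log(n_dims)` is sufficient here, the following is a minimal sketch of a log-mean-exp pooling in plain Python. The implementation of LogMeanExpPool is not quoted in this thread, so the formula `(1 / beta) * log(mean(exp(beta * x)))` and the function name `log_mean_exp` are assumptions. The number of pooled elements is an ordinary Python integer, so its logarithm needs no tensor.

```python
import math


def log_mean_exp(values, beta):
    """Compute (1 / beta) * log(mean(exp(beta * v))) in a numerically stable way."""
    scaled = [beta * v for v in values]
    shift = max(scaled)  # subtract the maximum before exponentiating
    log_sum = shift + math.log(sum(math.exp(s - shift) for s in scaled))
    # The element count is a plain int, so math.log is enough.
    return (log_sum - math.log(len(values))) / beta


values = [1.0, 2.0, 3.0]
soft_min = log_mean_exp(values, beta=-50.0)  # large negative beta approaches min
soft_max = log_mean_exp(values, beta=50.0)   # large positive beta approaches max
```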

change `setattr(parent_module, ...)` to `parent_module.add_module(...)` Co-authored-by: Christopher <[email protected]>

add spaces around binary operators Co-authored-by: Christopher <[email protected]>

- rename Distance to PairwiseCentroidDistance
- remove LogMeanExpPool (might become relevant again, but not for now)
- add MinPool1d and MinPool2d in layer.py
- add MinTakesMost1d, MaxTakesMost1d, MinTakesMost2d, MaxTakesMost2d rules
  - largely untested, especially kernel_size as int or kernel_size as tuple
  - in principle, MaxTakesMost2d should also work for MaxPool2d layers in standard conv nets
  - but needs some testing
- add abstract TakesMostBase class
- remove type definition for Distance in types.py
- adapt KMeans canonizer:
  - replace LogMeanExpPool with MinPool1d followed by torch.nn.Flatten
  - remove beta parameter; beta now sits in MinTakesMost1d
  - remove deepcopy and simply return the module itself
- update docs/src/tutorials/deep_kmeans.ipynb
- doc strings

- merge changes coming from github web interface

- various non-functional changes

jackmcrider · 2023-09-11T15:52:26Z

I have committed a new version with roughly these changes:

- remove LogMeanExpPool
- add layers MinPool1d and MinPool2d (simple inheritance from the PyTorch MaxPool* classes)
- add rules MinTakesMost1d, MinTakesMost2d, MaxTakesMost1d, MaxTakesMost2d
- change KMeansCanonizer to use MinPool1d instead of LogMeanExpPool
- change tutorial to use MinTakesMost1d at the output layer
- apply changes requested in review (rename Distance to PairwiseCentroidDistance, do self.parent_module.add_module instead of setattr(self.parent_module...))
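
MinPool1d and MinPool2d are described above as simple subclasses of the PyTorch MaxPool* classes. One way such a layer can reuse max pooling is the identity min(x) = -max(-x). Whether the PR does exactly this is an assumption, since the forward pass is not quoted in this thread. The sketch below shows the idea on plain Python lists, with hypothetical helper names:

```python
def max_pool1d(values, kernel_size, stride=None):
    """Max pooling over a 1d list without padding; stride defaults to kernel_size."""
    stride = kernel_size if stride is None else stride
    last_start = len(values) - kernel_size
    return [max(values[i:i + kernel_size]) for i in range(0, last_start + 1, stride)]


def min_pool1d(values, kernel_size, stride=None):
    """Min pooling expressed through max pooling: min(x) = -max(-x)."""
    negated = [-v for v in values]
    return [-v for v in max_pool1d(negated, kernel_size, stride)]


pooled = min_pool1d([3, 1, 4, 1, 5, 9], kernel_size=2)
```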

I'm not sure if we want four rules for the *TakesMost* or if one rule with mode='max'/mode='min' and some autodetection for the 1d/2d is better.
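
As a sketch of the single-rule option, the function below redistributes relevance over the inputs of one pooling window and takes a `mode` argument. It is not the zennit rule API. The softmin/softmax weighting with `beta` is an assumption based on the rule names and the beta parameter mentioned in this thread, and 1d/2d autodetection is left out.

```python
import math


def takes_most(inputs, relevance, beta=1.0, mode="min"):
    """Distribute `relevance` over `inputs` so that the smallest (mode="min")
    or largest (mode="max") input receives the largest share."""
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    sign = -1.0 if mode == "min" else 1.0
    scaled = [sign * beta * v for v in inputs]
    shift = max(scaled)  # stabilise the exponentials
    weights = [math.exp(s - shift) for s in scaled]
    total = sum(weights)
    return [relevance * w / total for w in weights]


window = [1.0, 2.0, 5.0]
r_min = takes_most(window, relevance=1.0, beta=2.0, mode="min")
r_max = takes_most(window, relevance=1.0, beta=2.0, mode="max")
```

Both modes conserve the total relevance; they only differ in the sign applied before the softmax, which is the main argument for one parameterised rule over four separate ones.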

p16i reviewed Aug 16, 2023

src/zennit/layer.py

jackmcrider requested a review from p16i August 16, 2023 21:10

jackmcrider marked this pull request as draft August 16, 2023 21:17

jackmcrider force-pushed the master branch 3 times, most recently from 1c64129 to cbb350e August 17, 2023 11:48

Neuralized K-Means

f3b6ba2

jackmcrider force-pushed the master branch from cbb350e to f3b6ba2 August 17, 2023 11:54

jackmcrider marked this pull request as ready for review August 17, 2023 13:51

jackmcrider force-pushed the master branch 6 times, most recently from 2bf6f52 to 7047178 August 18, 2023 14:43

jackmcrider force-pushed the master branch from 7047178 to 077d583 August 18, 2023 14:48

chr5tphr requested changes Aug 22, 2023

jackmcrider and others added 7 commits August 24, 2023 09:45

Update src/zennit/layer.py

465d8bf

Update src/zennit/layer.py

f38a663

Update src/zennit/canonizers.py

661d280

Update src/zennit/canonizers.py

bf31955

Merge branch 'master' of github.com:jackmcrider/zennit

6a55fde

tox compliance

7fb44cc

jackmcrider force-pushed the master branch from aacf9c8 to 7fb44cc September 8, 2023 14:39

jackmcrider requested a review from chr5tphr September 11, 2023 15:39

max takes most fix

b042aae
