Hi @rymaju Great question! The nice part about malt is that programming in it should be pretty natural for the most part. You can start by defining functions for the smaller blocks (such as scaled dot-product attention) and then combining them by composing the blocks. There are quite a few examples of blocks from chapter 12 onwards. I believe all the pieces required to implement attention are present in the malt distribution. Let us know if you run into any fundamental difficulties or missing pieces as you try to implement it, and we can help augment malt to support it. The biggest issue, perhaps, is that we still lack GPU support in malt, so larger networks will be more time-consuming. Providing GPU support is on our roadmap, but small networks using attention should be easily implementable, at the very least for learning and demonstration purposes.
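For reference, the scaled dot-product attention mentioned above computes `softmax(Q Kᵀ / √d_k) V`. Here is a minimal sketch of that computation in plain Python/NumPy (not malt code; the names and shapes are illustrative only). The malt version would express the same steps with malt's tensor operations and wrap them as a block to be composed with the rest of the network:

```python
# Sketch of scaled dot-product attention, for illustration only.
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row-wise max before exponentiating for numerical stability.
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of each query to each key
    weights = softmax(scores, axis=-1)   # attention weights; each row sums to 1
    return weights @ V                   # weighted sum of values, (n_queries, d_v)

# Tiny usage example with random tensors.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(6, 8))
V = rng.normal(size=(6, 16))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 16)
```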
Attention was addressed briefly at the end of the book, but I'm really curious how one might do this. Hoping someone with a better understanding could chime in!