Confusing matrix operations in Chpt. 16 #170

Open
xiongtx opened this issue Apr 5, 2024 · 1 comment

@xiongtx

xiongtx commented Apr 5, 2024

I find the matrix operations in Chpt. 16 confusing. For example, instead of:

keys = U_key.matmul(embedded_sentence.T).T
values = U_value.matmul(embedded_sentence.T).T
omega_23 = query_2.dot(keys[2])
omega_2 = query_2.matmul(keys.T)
attention_weights_2 = F.softmax(omega_2 / d**0.5, dim=0)

It's clearer to do:

queries = embedded_sentence @ U_query
keys = embedded_sentence @ U_key
values = embedded_sentence @ U_value
omega = queries @ keys.T
# omega is now a full (query, key) matrix, so normalize over the keys, i.e. the last dim
attention_weights = F.softmax(omega / d**0.5, dim=-1)
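
For concreteness, here is a self-contained sketch of that single-head version. The shapes (an 8-token sentence, 16-dimensional embeddings, 24-dimensional queries/keys/values) are only illustrative and not necessarily the exact values used in the chapter:

import torch
import torch.nn.functional as F

torch.manual_seed(123)
seq_len, d_in, d = 8, 16, 24                   # tokens, embedding dim, projection dim (illustrative)
embedded_sentence = torch.rand(seq_len, d_in)  # stand-in for the chapter's token embeddings
U_query = torch.rand(d_in, d)
U_key = torch.rand(d_in, d)
U_value = torch.rand(d_in, d)

queries = embedded_sentence @ U_query          # (8, 24)
keys = embedded_sentence @ U_key               # (8, 24)
values = embedded_sentence @ U_value           # (8, 24)

omega = queries @ keys.T                                # (8, 8) unnormalized attention scores
attention_weights = F.softmax(omega / d**0.5, dim=-1)   # each row sums to 1
context = attention_weights @ values                    # (8, 24) one context vector per token

print(context.shape)  # torch.Size([8, 24])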

With multi-head attention, instead of:

stacked_inputs = embedded_sentence.T.repeat(8, 1, 1)
multihead_keys = torch.bmm(multihead_U_key, stacked_inputs)
multihead_keys = multihead_keys.permute(0, 2, 1)
# Eventually giving up...
multihead_z_2 = torch.rand(8, 16)

we can just do:

multihead_queries = embedded_sentence @ multihead_U_query
multihead_keys = embedded_sentence @ multihead_U_key
multihead_values = embedded_sentence @ multihead_U_value
# the scores have shape (heads, query, key), so again normalize over the last dim
multihead_weights = F.softmax(multihead_queries @ multihead_keys.transpose(1, 2) / d**0.5, dim=-1)
multihead_z = multihead_weights @ multihead_values

which makes it clear that the multihead case is analogous to the single-head case.
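
A matching self-contained sketch of the multi-head version, again with illustrative shapes: the per-head projection matrices are stacked along a leading head dimension so that matmul broadcasts over the heads.

import torch
import torch.nn.functional as F

torch.manual_seed(123)
h, seq_len, d_in, d = 8, 8, 16, 24             # heads, tokens, embedding dim, per-head dim (illustrative)
embedded_sentence = torch.rand(seq_len, d_in)
multihead_U_query = torch.rand(h, d_in, d)
multihead_U_key = torch.rand(h, d_in, d)
multihead_U_value = torch.rand(h, d_in, d)

# (seq_len, d_in) @ (h, d_in, d) broadcasts to (h, seq_len, d)
multihead_queries = embedded_sentence @ multihead_U_query
multihead_keys = embedded_sentence @ multihead_U_key
multihead_values = embedded_sentence @ multihead_U_value

scores = multihead_queries @ multihead_keys.transpose(1, 2)   # (h, seq_len, seq_len)
multihead_weights = F.softmax(scores / d**0.5, dim=-1)        # normalize over keys per head/query
multihead_z = multihead_weights @ multihead_values            # (h, seq_len, d)

print(multihead_z.shape)  # torch.Size([8, 8, 24])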

@rasbt
Owner

rasbt commented Apr 20, 2024

Thanks for the comment, and I 100% agree. Not sure why I made it unnecessarily complicated there. In my other book (Build an LLM from Scratch), I am using a more legible version, similar to what you suggest: https://github.com/rasbt/LLMs-from-scratch/blob/main/ch03/01_main-chapter-code/ch03.ipynb
