bf16 datatype supported? #105

Open
clockfly opened this issue Jun 7, 2023 · 5 comments

@clockfly

clockfly commented Jun 7, 2023

Do I need to set some flags to enable bfloat16?

Currently it does not work. The NCCL test reports the following error:

llm011:2504887:2504887 [7] enqueue.cc:1175 NCCL WARN Error : no algorithm/protocol available
llm011:2504887:2504887 [7] NCCL INFO enqueue.cc:1273 -> 3
llm011:2504887:2504887 [7] NCCL INFO enqueue.cc:567 -> 3
llm011:2504887:2504887 [7] NCCL INFO enqueue.cc:936 -> 3

llm012:1731092:1731092 [7] enqueue.cc:1175 NCCL WARN Error : no algorithm/protocol available
llm012:1731092:1731092 [7] NCCL INFO enqueue.cc:1273 -> 3
llm012:1731092:1731092 [7] NCCL INFO enqueue.cc:567 -> 3
llm012:1731092:1731092 [7] NCCL INFO enqueue.cc:936 -> 3
llm012:1731092:1731092 [7] NCCL INFO group.cc:140 -> 3
llm012:1731092:1731092 [7] NCCL INFO group.cc:341 -> 3
llm012:1731092:1731092 [7] NCCL INFO group.cc:422 -> 3
llm012:1731092:1731092 [7] NCCL INFO group.cc:106 -> 3
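
For anyone triaging similar output, here is a minimal sketch of pulling the `NCCL WARN` lines out of a log like the one above and counting them per host. The regular expression is inferred only from the lines shown in this report, so it is an assumption about the log layout rather than a documented format. The helper name `find_nccl_warnings` and the field names are hypothetical.

```python
import re
from collections import Counter

# Assumed layout, taken from the lines in this report:
#   <host>:<pid>:<tid> [<index>] <file>:<line> NCCL WARN <message>
NCCL_WARN = re.compile(
    r"^(?P<host>[^:\s]+):(?P<pid>\d+):(?P<tid>\d+) \[(?P<idx>\d+)\] "
    r"(?P<src>\S+:\d+) NCCL WARN (?P<msg>.*)$"
)


def find_nccl_warnings(log_text):
    """Return one dict per NCCL WARN line found in log_text (hypothetical helper)."""
    warnings = []
    for line in log_text.splitlines():
        match = NCCL_WARN.match(line.strip())
        if match:
            warnings.append(match.groupdict())
    return warnings


# Sample input copied from the report above.
sample = """\
llm011:2504887:2504887 [7] enqueue.cc:1175 NCCL WARN Error : no algorithm/protocol available
llm011:2504887:2504887 [7] NCCL INFO enqueue.cc:1273 -> 3
llm012:1731092:1731092 [7] enqueue.cc:1175 NCCL WARN Error : no algorithm/protocol available
"""

warnings = find_nccl_warnings(sample)
hosts = Counter(w["host"] for w in warnings)
for host, count in sorted(hosts.items()):
    print(f"{host}: {count} warning(s)")
```

The `NCCL INFO` lines are skipped on purpose, since in this report only the `WARN` line carries the actual error message.
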
@bureddy
Collaborator

bureddy commented Jun 12, 2023

It is supported on the NDR network fabric.

@cjld

cjld commented Sep 7, 2023

I have the same problem: SHARP int8,uint8,bfloat16 Datatypes not supported. I am using a 200G HDR switch. Does HDR support bf16?

@salanki

salanki commented Sep 7, 2023

No, you need NDR / Quantum2.

@QiuBiuBiu

QiuBiuBiu commented Oct 23, 2023

Hi @salanki @bureddy @cjld, where did you get the information that BF16 is not supported on HDR? I could not find this datatype difference between HDR and NDR on the official website.
Quantum2/NDR: https://www.nvidia.com/en-us/networking/quantum2/
Quantum1/HDR: https://network.nvidia.com/files/doc-2020/pb-qm8790.pdf

@salanki

salanki commented Oct 23, 2023

@QiuBiuBiu You can see in the code that CX7 (ConnectX-7) is required for bf16 initialization.
