Open-sourced PipeDLRM #122
base: pipedlrm
Hi @YanzhaoWu! Thank you for your pull request. We require contributors to sign our Contributor License Agreement, and yours needs attention. You currently have a record in our system, but we do not have a signature on file. In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. If you have received this in error or have any questions, please contact us at [email protected]. Thanks!
Hi Yanzhao, I am very curious about your work. Could you please add some more instructions on how to run it to your GitHub repository? It would help me and others a lot. Thanks!
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!
Can we remove the empty "__init__.py" files?
Thank you for your interest in this work. We are actively reviewing this PR so that it can be merged. In the meantime, please feel free to try it out. You can find detailed instructions here: https://github.com/facebookresearch/dlrm/pull/122/files#diff-22b1984e9055744bcb6b52260dfdfb71
Bringing the discussion from the email thread back here: perhaps we can look at including some of the PipeDream components as a linked submodule rather than copying them over?
Thank you very much for your interest in our project. You may also check the script (https://github.com/facebookresearch/dlrm/pull/122/files#diff-bc0c739ba93024f3443445a48fd0319b) for running PipeDLRM on the Kaggle DAC dataset with a 3-stage pipeline. I hope it will be helpful.
Sure. Currently, the empty __init__.py files mark the directories that contain them as Python packages, which PipeDLRM imports. We may remove them as we reorganize the codebase.
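As a minimal illustration of the point above, the sketch below builds a throwaway directory with an empty `__init__.py` and imports a module from it. The names `demo_runtime` and `helpers` are hypothetical and are not taken from this PR. Note that Python 3.3+ can also import a directory without `__init__.py` as a namespace package, so whether the files can be dropped depends on how the code is imported.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    # Hypothetical package layout, created only for this demonstration.
    pkg_dir = Path(root) / "demo_runtime"
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").touch()  # empty file marks a regular package
    (pkg_dir / "helpers.py").write_text("def answer():\n    return 42\n")

    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        helpers = importlib.import_module("demo_runtime.helpers")
        result = helpers.answer()
        # A regular package has __file__ set to its __init__.py;
        # a namespace package would have __file__ set to None.
        is_regular = importlib.import_module("demo_runtime").__file__ is not None
    finally:
        sys.path.remove(root)
```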
Hi Yanzhao, I currently have one node with 4 GPUs, and I'm wondering what values I should set for num_input_rank, nrank and ngpus. From my understanding, nrank represents the number of GPUs on one machine, so I set it to 4. I've tried several values for num_input_rank, and so far all of them gave me errors such as: File "../communication.py", line 42, in __init__ Could you please give me some recommendations on how to set these numbers correctly? Thank you so much!
Thank you very much for your interest in our project. However, we still need to modify the model configuration file (models/dlrm/gpus=3/$conf_file) accordingly.
This looks cool! I agree with @dmudiger that the PipeDream parts of the code can probably be removed from this codebase, especially if you haven't made any changes to them; that will make the diff easier to look at. If you have some changes to PipeDream that you think would be broadly useful, I am happy to upstream them to PipeDream if you send me a PR.
We are running PipeDLRM with nrank=4. In our case, with num_input_rank=3 and nrank=4, I am using the default script. With nrank=6, I observe the following failure.
Thank you very much for your interest in our project. For the second issue, it seems that train_data or test_data is NoneType. The input ranks load the actual training data, while the other ranks generate random data so that the number of iterations stays consistent across ranks. You need to check the configuration to ensure that num_batches is correct. I also suggest that you first try num_input_rank=1.
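The behaviour described above can be sketched roughly as follows. This is not the PipeDLRM runtime code: the function `make_batches`, its parameters, and the assumption that input ranks are the ranks numbered below num_input_rank are all hypothetical, chosen only to show why every rank must end up with the same num_batches.

```python
import random


def make_batches(rank, num_input_rank, num_batches, real_batches=None,
                 batch_size=4, seed=0):
    """Return exactly num_batches batches for the given rank (illustrative only)."""
    if rank < num_input_rank:
        # Assumed convention: low-numbered ranks are input ranks and need real data.
        if real_batches is None:
            raise ValueError("input rank has no data (train_data/test_data is None)")
        if len(real_batches) < num_batches:
            raise ValueError("num_batches exceeds the number of available batches")
        return list(real_batches[:num_batches])
    # Non-input ranks synthesize random batches so iteration counts match.
    rng = random.Random(seed + rank)
    return [[rng.random() for _ in range(batch_size)] for _ in range(num_batches)]


real = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
input_rank_batches = make_batches(0, 1, 3, real_batches=real)
other_rank_batches = make_batches(2, 1, 3)
```

In this sketch, an input rank that receives no data fails immediately with a clear message, which is the kind of check that helps locate a NoneType dataset or a mismatched num_batches.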
This PR contains the open-sourced version of PipeDLRM, which consists of five functioning components: profiler, optimizer, runtime implementation, modeling, and visualizer. PipeDLRM is built on top of DLRM with some components from PipeDream (https://github.com/msr-fiddle/pipedream).