Satyaog/feature/covalent #217
Open
satyaog wants to merge 3 commits into master from satyaog/feature/covalent
satyaog force-pushed the satyaog/feature/covalent branch from 29f573e to 3bfe690 (May 23, 2024 13:53)
satyaog force-pushed the satyaog/feature/covalent branch from 3bfe690 to 65ca09a (May 23, 2024 14:02)
satyaog force-pushed the satyaog/feature/covalent branch from 65ca09a to 978e16d (May 23, 2024 14:28)
satyaog force-pushed the satyaog/feature/covalent branch 3 times, most recently from f9a8c6e to 89898ee (May 23, 2024 15:11)
satyaog force-pushed the satyaog/feature/covalent branch from 89898ee to 14ffdf1 (May 24, 2024 13:03)
satyaog force-pushed the satyaog/feature/covalent branch from 14ffdf1 to 7d15073 (May 24, 2024 15:40)
satyaog force-pushed the satyaog/feature/covalent branch from 7d15073 to 052d2b9 (May 27, 2024 13:49)
satyaog force-pushed the satyaog/feature/covalent branch from 052d2b9 to 11dd515 (May 27, 2024 18:06)
satyaog force-pushed the satyaog/feature/covalent branch from 11dd515 to 3683cb7 (May 27, 2024 23:50)
satyaog force-pushed the satyaog/feature/covalent branch from 3683cb7 to fa32dde (August 8, 2024 14:24)
satyaog force-pushed the satyaog/feature/covalent branch from fa32dde to 172c90f (August 8, 2024 14:26)
satyaog force-pushed the satyaog/feature/covalent branch from 172c90f to 2f5f981 (August 8, 2024 14:53)
satyaog force-pushed the satyaog/feature/covalent branch from e9a129b to 558a31d (August 21, 2024 06:23)
satyaog force-pushed the satyaog/feature/covalent branch from 558a31d to 9e394be (August 22, 2024 04:13)
satyaog force-pushed the satyaog/feature/covalent branch from 9e394be to fdd5270 (September 6, 2024 03:24)
satyaog force-pushed the satyaog/feature/covalent branch from fdd5270 to b03a424 (September 11, 2024 05:24)
covalent is not compatible with milabench as it requires sqlalchemy<2.0.0
Update .github/workflows/cloud-ci.yml
Apply suggestions from code review
Update .github/workflows/cloud-ci.yml
Add azure covalent cloud infra
Add multi-node on cloud
* VM on the cloud might not have enough space on all partitions. Add a workaround which should cover most cases
* Use branch and commit name to versionize reports directories
* Fix parsing error when temperature is not available in nvidia-smi outputs
* export MILABENCH_* env vars to remote
Add docs
Fix cloud instance name conflict
This would prevent the CI or multiple contributors to run tests with the same config
Fix github push in CI
* Copy ssh key to allow connections from master to workers
* Use local ip for manager's ip such that workers can find it and connect to it
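For readers unfamiliar with the "export MILABENCH_* env vars to remote" item, here is a minimal sketch of what such a step could look like. It is not the code from this PR: the helper names `milabench_env` and `export_lines` are hypothetical, and the variable names and values in the usage note are only example data.

```python
import os
import shlex
from typing import Dict, List, Mapping, Optional


def milabench_env(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Select the MILABENCH_* variables from an environment mapping.

    Hypothetical helper: defaults to the current process environment.
    """
    environ = os.environ if environ is None else environ
    return {
        name: value
        for name, value in sorted(environ.items())
        if name.startswith("MILABENCH_")
    }


def export_lines(env: Mapping[str, str]) -> List[str]:
    """Render shell `export` statements, quoting values for a POSIX shell."""
    return [f"export {name}={shlex.quote(value)}" for name, value in env.items()]
```

The resulting lines could be prepended to a command sent over ssh, for example `export_lines(milabench_env({"MILABENCH_BASE": "/tmp/my base"}))`. Quoting with `shlex.quote` keeps values containing spaces intact on the remote shell.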
satyaog force-pushed the satyaog/feature/covalent branch from b03a424 to b591c23 (September 20, 2024 15:48)
Added the slurm covalent plugin to help debug the cloud setups
satyaog force-pushed the satyaog/feature/covalent branch from b591c23 to 3b207f8 (September 23, 2024 21:42)
Tested slurm with:
Large LLMs (llama3 70B) have been excluded, as I don't have the resources to test them yet. It should work on azure as well, which I'll test next week.
satyaog force-pushed the satyaog/feature/covalent branch from 3b207f8 to f75e3a5 (October 3, 2024 19:11)
milabench cloud --setup
It creates a system config file and takes a target cloud platform with --run-on. This starts a local covalent server, which is used to manage Python code that will be executed on the remote. For now this is only somewhat useful, since milabench mostly uses ssh commands anyway, and I think it would take some time to refactor the pipeline to run code through the covalent interface instead. It could be an interesting approach, but it's a nice-to-have for now.
So milabench cloud --setup sets up the remote and installs the basics on it: the correct Python version (necessary to ensure correct serialization/deserialization of Python objects between the local and remote machines), pip and venv. venv is used to separate the covalent env and the milabench env, which have incompatible package version requirements (sqlalchemy caused problems). Once this is done, the covalent server is no longer needed.
The system config file should then be used in the install, prepare and run commands. Those commands create a new standalone config for the tests that will be executed and copy it to the remote before the rest of the pipeline runs.
At the end of the run command, the results are copied to the local machine to allow the generation of a report.
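The environment separation performed during setup can be illustrated with Python's standard venv module. This is only a sketch under assumptions, not what the PR runs on the remote: the function name `create_isolated_envs` and the directory names are hypothetical, and `with_pip=False` is used here only to keep the example fast (a real setup would need pip in each environment).

```python
import venv
from pathlib import Path
from typing import Dict


def create_isolated_envs(root: Path) -> Dict[str, Path]:
    """Create one virtual environment per tool under `root`.

    Keeping covalent and milabench in separate environments lets each one
    pin its own, mutually incompatible, sqlalchemy version.
    """
    envs: Dict[str, Path] = {}
    for name in ("covalent", "milabench"):
        env_dir = root / f"{name}-venv"  # hypothetical directory layout
        venv.create(env_dir, with_pip=False)
        envs[name] = env_dir
    return envs
```

Each environment gets its own site-packages, so installing covalent's requirements in one does not affect the packages milabench sees in the other.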
At the very end, milabench cloud --teardown should be used to release the cloud resources. The --all argument will release all resources of the target cloud platform specified with --run-on.
Check docs/usage.rst for more info
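To summarize the order of operations described above, here is a small sketch that only builds the command lines without running them. The subcommands and the --setup, --teardown, --run-on and --all options come from this description. The --system option used to pass the config file, the file names and the platform name are assumptions made for illustration, so check docs/usage.rst for the actual invocation.

```python
from typing import List


def cloud_workflow(
    system_config: str,
    cloud_system_config: str,
    platform: str,
    release_all: bool = False,
) -> List[List[str]]:
    """Return the milabench commands of a cloud run, in execution order.

    ASSUMPTION: `--system` is how a system config file is passed; the
    description above does not spell this out.
    """
    commands = [
        # 1. Provision the remote and produce the cloud system config.
        ["milabench", "cloud", "--setup", "--system", system_config,
         "--run-on", platform],
    ]
    # 2. Use the generated system config in install, prepare and run.
    for step in ("install", "prepare", "run"):
        commands.append(["milabench", step, "--system", cloud_system_config])
    # 3. Release the cloud resources once the results are copied back.
    teardown = ["milabench", "cloud", "--teardown", "--system",
                cloud_system_config, "--run-on", platform]
    if release_all:
        teardown.append("--all")  # release every resource of the platform
    commands.append(teardown)
    return commands
```

For example, `cloud_workflow("system.yaml", "cloud-system.yaml", "azure")` returns five commands: setup, install, prepare, run, then teardown. Both file names and the "azure" platform value are placeholders.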
milabench with slurm
milabench cloud --setup works with a slurm system configuration as well, but milabench cloud --teardown does not support the --all argument in that case.
Check docs/usage.rst for more info
milabench report --push
Pushes the results to a reports branch, which also stores the status svg and summary.
Example of reports: #210