Can't start the sample. #217

li1553770945 · 2023-03-13T17:56:52Z

I have successfully installed the FedScale framework and downloaded the FEMNIST dataset. I am following the Deployment page to run my first sample. I ran the command fedscale driver start conf.yml, and the contents of my conf.yml are as follows.

# Configuration file of FAR training experiment

# ========== Cluster configuration ========== 
# ip address of the parameter server (need 1 GPU process)
ps_ip: localhost

# ip address of each worker:# of available gpus process on each gpu in this node
# Note that if we collocate ps and worker on same GPU, then we need to decrease this number of available processes on that GPU by 1
# E.g., master node has 4 available processes, then 1 for the ps, and worker should be set to: worker:3
worker_ips:
    - localhost:[4]

exp_path: $FEDSCALE_HOME/fedscale/cloud

# Entry function of executor and aggregator under $exp_path
executor_entry: execution/executor.py

aggregator_entry: aggregation/aggregator.py

auth:
    ssh_user: ""
    ssh_private_key: ~/.ssh/id_rsa

# cmd to run before we can indeed run FAR (in order)
setup_commands:
    - source $HOME/anaconda3/bin/activate fedscale

# ========== Additional job configuration ========== 
# Default parameters are specified in config_parser.py, wherein more description of the parameter can be found

job_conf: 
    - job_name: femnist                   # Generate logs under this folder: log_path/job_name/time_stamp
    - log_path: $FEDSCALE_HOME/benchmark # Path of log files
    - num_participants: 50                  # Number of participants per round, we use K=100 in our paper, large K will be much slower
    - data_set: femnist                     # Dataset: openImg, google_speech, stackoverflow
    - data_dir: $FEDSCALE_HOME/benchmark/dataset/data/femnist    # Path of the dataset
    - data_map_file: $FEDSCALE_HOME/benchmark/dataset/data/femnist/client_data_mapping/train.csv              # Allocation of data to each client, turn to iid setting if not provided
    - device_conf_file: $FEDSCALE_HOME/benchmark/dataset/data/device_info/client_device_capacity     # Path of the client trace
    - device_avail_file: $FEDSCALE_HOME/benchmark/dataset/data/device_info/client_behave_trace
    - model: resnet18             # NOTE: Please refer to our model zoo README and use models for these small image (e.g., 32x32x3) inputs
#    - model_zoo: fedscale-torch-zoo
    - eval_interval: 10                     # How many rounds to run a testing on the testing set
    - rounds: 1000                          # Number of rounds to run this training. We use 1000 in our paper, while it may converge w/ ~400 rounds
    - filter_less: 21                       # Remove clients w/ less than 21 samples
    - num_loaders: 2
    - local_steps: 5
    - learning_rate: 0.05
    - batch_size: 20
    - test_bsz: 20
    - use_cuda: False
    - save_checkpoint: False

After that, the following message is displayed:

starting aggregator on localhost...
Aggregator local PID 2767758. run kill -9 2767758 to kill the job.
Starting workers on localhost ...
Submitted job, please check your logs $FEDSCALE_HOME/benchmark/logs/femnist/0314_015101 for status

I checked the log directory and did not find any log file. I also checked the PID from the output, and no such process exists.

Answered by donggook-me

Oct 11, 2023

fanlai0990 (Maintainer) · 2023-04-18T16:17:10Z

Hi. Sorry for the delay (this discussion channel is not often used). Have you solved the issue? Please consider creating an issue if not.

AmberLJC (Maintainer) · 2023-04-18T17:19:08Z

I did not encounter this problem. By the way, the log file is under the FedScale top-level directory, not under /benchmark.
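
To look for the log in the top-level directory rather than under benchmark, something like the following may help. This is only a sketch: the helper name find_fedscale_logs is hypothetical, and the "*_logging" file name pattern is an assumption based on the femnist_logging example mentioned in this thread, not a documented FedScale convention.

```shell
# List candidate FedScale log files directly under a directory.
# Assumption: logs are named "<job_name>_logging" (e.g. femnist_logging).
# Defaults to $FEDSCALE_HOME, or the current directory if that is unset.
find_fedscale_logs() {
    dir="${1:-${FEDSCALE_HOME:-.}}"
    # -maxdepth 1 restricts the search to the directory itself (no recursion)
    find "$dir" -maxdepth 1 -type f -name '*_logging'
}
```

Calling find_fedscale_logs with no argument searches $FEDSCALE_HOME; passing a path (for example the directory where the repository was cloned) searches that directory instead.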

donggook-me · 2023-10-11T14:44:57Z

I ran into the same problem. In my case, when the job fails, the log file (e.g. femnist_logging) is under the FedScale top-level directory, as mentioned above. My problem occurred because the environment variable $FEDSCALE_HOME was not set correctly. If anyone runs into the same problem, check the variable with "echo $FEDSCALE_HOME".

Even after setting $FEDSCALE_HOME in your shell startup file, don't forget to reload it with ". ~/.bashrc" or ". ~/.bash_profile" (Mac). Thanks :)
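
The check described above can be wrapped in a small shell function to run before starting the driver. This is a minimal sketch: check_fedscale_home is a hypothetical helper, not part of FedScale, and it only verifies that the variable is set and points to an existing directory.

```shell
# Verify that FEDSCALE_HOME is set and points to an existing directory.
# Hypothetical helper for illustration; not shipped with FedScale.
check_fedscale_home() {
    if [ -z "${FEDSCALE_HOME:-}" ]; then
        echo "FEDSCALE_HOME is not set" >&2
        return 1
    fi
    if [ ! -d "$FEDSCALE_HOME" ]; then
        echo "FEDSCALE_HOME points to a missing directory: $FEDSCALE_HOME" >&2
        return 1
    fi
    echo "FEDSCALE_HOME=$FEDSCALE_HOME"
}
```

If the function reports a problem, set the variable in the shell startup file, reload that file, and run the check again before retrying fedscale driver start conf.yml.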
