Can't start the sample. #217
-
I have successfully installed the FedScale framework and downloaded the FEMNIST dataset. I am trying to follow the instructions on the Deployment page to run my first sample. I entered the following configuration:

```yaml
# Configuration file of FAR training experiment

# ========== Cluster configuration ==========
# ip address of the parameter server (need 1 GPU process)
ps_ip: localhost

# ip address of each worker:# of available gpus process on each gpu in this node
# Note that if we collocate ps and worker on same GPU, then we need to decrease this number of available processes on that GPU by 1
# E.g., master node has 4 available processes, then 1 for the ps, and worker should be set to: worker:3
worker_ips:
    - localhost:[4]

exp_path: $FEDSCALE_HOME/fedscale/cloud

# Entry function of executor and aggregator under $exp_path
executor_entry: execution/executor.py

aggregator_entry: aggregation/aggregator.py

auth:
    ssh_user: ""
    ssh_private_key: ~/.ssh/id_rsa

# cmd to run before we can indeed run FAR (in order)
setup_commands:
    - source $HOME/anaconda3/bin/activate fedscale

# ========== Additional job configuration ==========
# Default parameters are specified in config_parser.py, wherein more description of the parameter can be found

job_conf:
    - job_name: femnist # Generate logs under this folder: log_path/job_name/time_stamp
    - log_path: $FEDSCALE_HOME/benchmark # Path of log files
    - num_participants: 50 # Number of participants per round, we use K=100 in our paper, large K will be much slower
    - data_set: femnist # Dataset: openImg, google_speech, stackoverflow
    - data_dir: $FEDSCALE_HOME/benchmark/dataset/data/femnist # Path of the dataset
    - data_map_file: $FEDSCALE_HOME/benchmark/dataset/data/femnist/client_data_mapping/train.csv # Allocation of data to each client, turn to iid setting if not provided
    - device_conf_file: $FEDSCALE_HOME/benchmark/dataset/data/device_info/client_device_capacity # Path of the client trace
    - device_avail_file: $FEDSCALE_HOME/benchmark/dataset/data/device_info/client_behave_trace
    - model: resnet18 # NOTE: Please refer to our model zoo README and use models for these small image (e.g., 32x32x3) inputs
#    - model_zoo: fedscale-torch-zoo
    - eval_interval: 10 # How many rounds to run a testing on the testing set
    - rounds: 1000 # Number of rounds to run this training. We use 1000 in our paper, while it may converge w/ ~400 rounds
    - filter_less: 21 # Remove clients w/ less than 21 samples
    - num_loaders: 2
    - local_steps: 5
    - learning_rate: 0.05
    - batch_size: 20
    - test_bsz: 20
    - use_cuda: False
    - save_checkpoint: False
```
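In the config above, `job_conf` is a YAML list of single-key mappings, and most paths are written relative to `$FEDSCALE_HOME`. The following is a minimal sketch (not FedScale's own config parser) that merges a few of those entries into one dict and reports any path whose `$` variable does not resolve in the current environment. It relies on Python's `os.path.expandvars` leaving references to unset variables unchanged. The `JOB_CONF` list is hand-copied from the config rather than loaded from the file, and the helper names are hypothetical.

```python
import os

# Hypothetical: a few job_conf entries from the config above, written as the
# list of single-key mappings a YAML loader would produce for job_conf.
JOB_CONF = [
    {"job_name": "femnist"},
    {"log_path": "$FEDSCALE_HOME/benchmark"},
    {"data_dir": "$FEDSCALE_HOME/benchmark/dataset/data/femnist"},
    {"data_map_file": "$FEDSCALE_HOME/benchmark/dataset/data/femnist/client_data_mapping/train.csv"},
]


def flatten_job_conf(entries):
    """Merge the list of one-key dicts into a single dict."""
    merged = {}
    for entry in entries:
        merged.update(entry)
    return merged


def unresolved_paths(conf):
    """Return {key: value} for values that still contain '$' after expansion.

    os.path.expandvars leaves references to unset variables unchanged,
    so a leftover '$' means a variable such as FEDSCALE_HOME is not set.
    """
    problems = {}
    for key, value in conf.items():
        if isinstance(value, str) and "$" in value:
            expanded = os.path.expandvars(value)
            if "$" in expanded:
                problems[key] = expanded
    return problems


if __name__ == "__main__":
    missing = unresolved_paths(flatten_job_conf(JOB_CONF))
    for key, value in missing.items():
        print(f"{key}: unresolved -> {value}")
    if not missing:
        print("All $-variables in job_conf resolved.")
```

An empty result only means the variables expand; it does not prove the expanded paths exist on disk.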
After that, the message displayed was:
I checked the log file directory and didn't find the log file. I also checked the PID printed in the output, and there was no such process.
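To confirm whether a PID printed at launch still belongs to a running process, a POSIX-only sketch can send signal 0 with `os.kill`, which performs the existence and permission checks without actually signalling the process. `pid_is_running` is a hypothetical helper name; from the shell, `ps -p <pid>` answers the same question.

```python
import os


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only).

    Signal 0 is never delivered; os.kill only checks that the target
    exists and that we may signal it.
    """
    if pid <= 0:
        # 0 and negative values address process groups, not a single PID.
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```

For example, passing the PID from the launch output (a placeholder here) should return `False` once that process has exited, assuming the PID has not been reused.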
Replies: 3 comments
-
Hi. Sorry for the delay (this discussion channel is not often used). Have you solved the issue? Please consider creating an issue if not.
-
I did not encounter this problem. By the way, the log file is under FedScale, not
-
I ran into the same problem. In my case, when the launch fails, the log file (e.g., femnist_logging) is under FedScale (the top-level directory), as mentioned above. My problem occurred because the environment variable $FEDSCALE_HOME was not set correctly. If anyone runs into the same problem, check the variable with `echo $FEDSCALE_HOME`. After setting $FEDSCALE_HOME, don't forget to run `. ~/.bashrc` (or `. ~/.bash_profile` on macOS) so the current shell picks up the change. Thanks :)