This repository has been archived by the owner on Sep 18, 2024. It is now read-only.
[2024-07-03 16:45:49] INFO (nni.tuner.tpe/MainThread) Using random seed 668056533
[2024-07-03 16:45:49] INFO (nni.runtime.msg_dispatcher_base/MainThread) Dispatcher started
[2024-07-03 16:45:49] INFO (nni.runtime.msg_dispatcher/Thread-1 (command_queue_worker)) Initial search space: {'hidden_sizes': {'_type': 'choice', '_value': [[], [256], [512], [1024], [1024, 512], [1024, 512, 256], [512, 256]]}, 'learning_rate': {'_type': 'loguniform', '_value': [1e-06, 0.1]}, 'batch_size': {'_type': 'choice', '_value': [32, 64, 128]}, 'num_epochs': {'_type': 'randint', '_value': [100, 1000]}, 'dropout_prob': {'_type': 'uniform', '_value': [0, 0.5]}, 'use_batch_norm': {'_type': 'choice', '_value': [True, False]}, 'activation_fn': {'_type': 'choice', '_value': ['relu', 'leaky_relu', 'sigmoid', 'tanh', 'elu', 'selu']}, 'patience': {'_type': 'randint', '_value': [0, 10]}}
[2024-07-03 17:10:21] ERROR (nni.runtime.msg_dispatcher_base/Thread-1 (command_queue_worker)) 45
Traceback (most recent call last):
  File "/home/josep/.pyenv/versions/3.10.14/lib/python3.10/site-packages/nni/runtime/msg_dispatcher_base.py", line 108, in command_queue_worker
    self.process_command(command, data)
  File "/home/josep/.pyenv/versions/3.10.14/lib/python3.10/site-packages/nni/runtime/msg_dispatcher_base.py", line 154, in process_command
    command_handlers[command](data)
  File "/home/josep/.pyenv/versions/3.10.14/lib/python3.10/site-packages/nni/runtime/msg_dispatcher.py", line 148, in handle_report_metric_data
    self._handle_final_metric_data(data)
  File "/home/josep/.pyenv/versions/3.10.14/lib/python3.10/site-packages/nni/runtime/msg_dispatcher.py", line 201, in _handle_final_metric_data
    self.tuner.receive_trial_result(id_, _trial_params[id_], value, customized=customized,
  File "/home/josep/.pyenv/versions/3.10.14/lib/python3.10/site-packages/nni/algorithms/hpo/tpe_tuner.py", line 197, in receive_trial_result
    params = self._running_params.pop(parameter_id)
KeyError: 45
[2024-07-03 17:10:28] INFO (nni.runtime.msg_dispatcher_base/MainThread) Dispatcher exiting...
[2024-07-03 17:10:28] INFO (nni.runtime.msg_dispatcher_base/MainThread) Dispatcher terminiated
How to reproduce it?:
This has happened to me more than once: it occurs intermittently, across different experiments. I tried lowering the trial concurrency to 1 to avoid it, but the crash still occurs.
In this example, trial 45 evidently triggered the crash. In the web UI, I can see that trial 45 succeeded and has a recorded metric value. Yet when the TPE tuner looks up that trial's parameters in _running_params, the entry is missing, which raises the KeyError shown in the log.
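To make the failure mode concrete, here is a minimal sketch in plain Python. It is not NNI's code: ToyTuner and its methods are hypothetical stand-ins, and the only line taken from the traceback is the `self._running_params.pop(parameter_id)` lookup. The scenario shown (a final result arriving twice for the same id) is an assumption used to trigger the error, not a diagnosis of what actually happened in my experiment.

```python
class ToyTuner:
    """Hypothetical stand-in for a tuner; NOT NNI's TPE tuner."""

    def __init__(self):
        # Maps parameter_id -> parameters of trials that are still running.
        self._running_params = {}
        self.results = []

    def generate_parameters(self, parameter_id):
        # Placeholder parameters; a real tuner would sample the search space.
        params = {"learning_rate": 0.01}
        self._running_params[parameter_id] = params
        return params

    def receive_trial_result(self, parameter_id, value):
        # Same lookup as the last frame of the traceback: dict.pop without
        # a default raises KeyError when the id is not (or no longer) present.
        params = self._running_params.pop(parameter_id)
        self.results.append((params, value))

    def receive_trial_result_tolerant(self, parameter_id, value):
        # Defensive variant (an illustration, not a proposed NNI patch):
        # return False instead of raising when the id is unknown.
        params = self._running_params.pop(parameter_id, None)
        if params is None:
            return False
        self.results.append((params, value))
        return True


def demo():
    tuner = ToyTuner()
    tuner.generate_parameters(45)
    tuner.receive_trial_result(45, 0.9)      # first final result: accepted
    try:
        tuner.receive_trial_result(45, 0.9)  # assumed duplicate report
    except KeyError as err:
        return repr(err)
    return None


if __name__ == "__main__":
    print(demo())
```

Any path that removes id 45 from the dictionary before the final metric is handled, or that never adds it, would produce the same KeyError; the sketch only shows that the lookup itself has no fallback.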
Describe the issue:
The dispatcher crashes for a reason I have not been able to identify, and when it does, my experiment stops running.
Environment:
Configuration:
Log message:
(relevant snippet)