We provide PyTorch code and pre-trained models for the baseline seq2seq models described in the paper.
After completing the quickstart installation, start training a seq2seq model with:
$ cd $ALFRED_ROOT
$ python models/train/train_seq2seq.py --data data/json_feat_2.1.0 --model seq2seq_im_mask --dout exp/model:{model},name:pm_and_subgoals_01 --splits data/splits/oct21.json --gpu --batch 8 --pm_aux_loss_wt 0.1 --subgoal_aux_loss_wt 0.1 --preprocess
Run this once with --preprocess to save preprocessed JSONs inside the trajectory folders. Preprocessing could take a few minutes, but subsequent runs can skip the flag entirely. See train_seq2seq.py for hyper-parameters and other settings.
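For example, once the JSONs are cached, the same training command can be relaunched without the flag:
$ python models/train/train_seq2seq.py --data data/json_feat_2.1.0 --model seq2seq_im_mask --dout exp/model:{model},name:pm_and_subgoals_01 --splits data/splits/oct21.json --gpu --batch 8 --pm_aux_loss_wt 0.1 --subgoal_aux_loss_wt 0.1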
Use tensorboard --logdir exp --port 6006 to visualize losses and performance metrics.
To evaluate a trained model through real-time execution on THOR:
$ python models/eval/eval_seq2seq.py --model_path <model_path>/best_seen.pth --eval_split valid_seen --data data/json_feat_2.1.0 --model models.model.seq2seq_im_mask --gpu --num_threads 3
Use eval_split to specify which split to evaluate, and num_threads to set the number of parallel evaluation threads to spawn. The experiments in the paper used max_fails=10 and max_steps=1000. The results will be dumped as a JSON file task_results_<timestamp>.json inside the model_path directory.
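A minimal sketch for inspecting that dump after a run; the internal schema of the results file is not documented here, so the sketch only prints the top-level structure, and the model_path below is the hypothetical output directory produced by the --dout argument in the training command above:

import glob
import json
import os

# Hypothetical output directory from --dout above; adjust to your own run.
model_path = "exp/model:seq2seq_im_mask,name:pm_and_subgoals_01"

# Result dumps embed a timestamp in the filename; pick the most recent one.
dumps = sorted(glob.glob(os.path.join(model_path, "task_results_*.json")))
assert dumps, "no task_results_*.json found in %s" % model_path

with open(dumps[-1]) as f:
    results = json.load(f)

# The field names inside are an assumption -- inspect the keys before digging in.
if isinstance(results, dict):
    print(dumps[-1], "->", list(results.keys()))
else:
    print(dumps[-1], "-> list of", len(results), "entries")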
Note: If you are training and evaluating on different machines, or if you just downloaded a checkpoint, you need to run eval with --preprocess once with the appropriate dataset path. Also, after a fresh install, run with --num_threads 1 to allow the script to download the THOR binary.
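For example, a first evaluation run after a fresh install might combine both flags:
$ python models/eval/eval_seq2seq.py --model_path <model_path>/best_seen.pth --eval_split valid_seen --data data/json_feat_2.1.0 --model models.model.seq2seq_im_mask --gpu --num_threads 1 --preprocess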
To evaluate individual subgoals for each task, run with --subgoals:
$ python models/eval/eval_seq2seq.py --model_path <model_path>/best_seen.pth --eval_split valid_seen --data data/json_feat_2.1.0 --model models.model.seq2seq_im_mask --gpu --num_threads 3 --subgoals all
This will use the expert demonstrations to reach the subgoal to be evaluated. You can specify --subgoals all to evaluate all subgoals, or select specific ones, e.g. --subgoals GotoLocation,HeatObject. Possible subgoals are GotoLocation, PickupObject, PutObject, CleanObject, HeatObject, CoolObject, ToggleObject, and SliceObject. The results will be dumped as a JSON file subgoal_results_<timestamp>.json inside the model_path directory.
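For instance, to evaluate only the navigation and heating subgoals:
$ python models/eval/eval_seq2seq.py --model_path <model_path>/best_seen.pth --eval_split valid_seen --data data/json_feat_2.1.0 --model models.model.seq2seq_im_mask --gpu --num_threads 3 --subgoals GotoLocation,HeatObject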
Run your model on test seen and unseen sets, and create an action-sequence dump of your agent:
$ cd $ALFRED_ROOT
$ python models/eval/leaderboard.py --model_path <model_path>/best_seen.pth --model models.model.seq2seq_im_mask --data data/json_feat_2.1.0 --gpu --num_threads 5
This will create a JSON file, e.g. task_results_20191218_081448_662435.json, inside the <model_path> folder. Submit this JSON here: AI2 ALFRED Leaderboard.
The JSON contains deterministic THOR API commands executed by your trained agent, which will be replayed on the leaderboard server for evaluation. This process is model-agnostic, allowing you to use your local resources for test-time inference. For restrictions and guidelines regarding the submission, see the getting started page.
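Before uploading, it can be worth a quick sanity check that the dump parses and is non-empty; the per-task schema is not documented here, so this sketch only validates the top-level structure:

import json

# Adjust to the dump created by leaderboard.py inside your <model_path> folder.
dump_path = "task_results_20191218_081448_662435.json"
with open(dump_path) as f:
    dump = json.load(f)  # raises JSONDecodeError if the file is truncated or malformed

print("loaded", type(dump).__name__, "with", len(dump), "top-level entries")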
The model code is organized as follows:

/model
    seq2seq.py (base module with train and val loops)
    seq2seq_im_mask.py (full model with batching and losses)
/nn
    vnn.py (encoder, decoder, attention mechanisms)
    resnet.py (pre-trained ResNet feature extractor)
/train
    train_seq2seq.py (main with training args)
/eval
    eval_seq2seq.py (main with eval args)
    eval_subgoals.py (subgoal eval)
    eval_task.py (overall task eval)
/config
    rewards.json (reward values for actions; not used)
Download the Seq2Seq+PM (both) model checkpoint:
$ wget https://ai2-vision-alfred.s3-us-west-2.amazonaws.com/seq2seq_pm_chkpt.zip
$ unzip seq2seq_pm_chkpt.zip
The unzipped folder should contain best_seen.pth and best_unseen.pth checkpoints.
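To evaluate the downloaded checkpoint (the seq2seq_pm_chkpt/ folder name below is inferred from the zip filename, so adjust the path to wherever the archive actually unpacked; note the --preprocess and --num_threads 1 caveats above for downloaded checkpoints and fresh installs):
$ python models/eval/eval_seq2seq.py --model_path seq2seq_pm_chkpt/best_seen.pth --eval_split valid_seen --data data/json_feat_2.1.0 --model models.model.seq2seq_im_mask --gpu --num_threads 1 --preprocess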
Note: Some users have reported that they had to train the model themselves to reproduce the results. If the results from the pre-trained checkpoint are dramatically worse than those reported in the paper, try training the model from scratch. We are looking into the source of this issue.