Add Amsgrad #137

tginart · 2020-10-19T01:48:58Z

Adding AMSGrad improves numerical performance. This naive implementation requires dense gradients, which is inefficient.
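
For reference, here is a minimal, dependency-free sketch of the standard AMSGrad update, to illustrate why a naive version wants dense gradients. It is not the code in this PR; the names `init_state` and `amsgrad_step` and the default hyperparameters are assumptions made for illustration. Every entry carries its own first moment, second moment, and running maximum, and all of them are touched on every step, even where the gradient is zero.

```python
import math


def init_state(n):
    """Per-parameter optimizer state for n scalar parameters (hypothetical helper)."""
    return {"step": 0, "m": [0.0] * n, "v": [0.0] * n, "v_hat": [0.0] * n}


def amsgrad_step(param, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one AMSGrad update to dense lists of floats and return the new parameters."""
    state["step"] += 1
    t = state["step"]
    bias1 = 1 - beta1 ** t  # bias correction for the first moment
    bias2 = 1 - beta2 ** t  # bias correction for the second moment
    new_param = []
    for i, (p, g) in enumerate(zip(param, grad)):
        m = beta1 * state["m"][i] + (1 - beta1) * g
        v = beta2 * state["v"][i] + (1 - beta2) * g * g
        # AMSGrad keeps the running maximum of v, so it never decreases.
        v_hat = max(state["v_hat"][i], v)
        state["m"][i], state["v"][i], state["v_hat"][i] = m, v, v_hat
        denom = math.sqrt(v_hat) / math.sqrt(bias2) + eps
        new_param.append(p - (lr / bias1) * m / denom)
    return new_param


if __name__ == "__main__":
    state = init_state(2)
    params = [1.0, 1.0]
    params = amsgrad_step(params, [0.5, 0.0], state)
    # Second step: entry 0 has zero gradient but still moves because m is nonzero.
    params = amsgrad_step(params, [0.0, 0.0], state)
    print(params)
```

Note that an entry with a zero gradient still moves on later steps because its momentum term is nonzero, which is why sparse updates are not a drop-in replacement.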

This PR also adds a heuristic for distributing the embedding layers across devices more effectively during parallel training.
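
The description does not spell out the heuristic, so the following is only a sketch of one common approach (greedy largest-first placement), not necessarily what `--use-emb-distrib-heuristic` does. The function name `assign_tables` and the use of parameter count as the load measure are assumptions for illustration.

```python
import heapq


def assign_tables(table_sizes, num_devices):
    """Greedy largest-first placement (assumed heuristic): returns a device index per table."""
    # Min-heap of (current load, device index); the lightest device is popped first.
    heap = [(0, d) for d in range(num_devices)]
    heapq.heapify(heap)
    placement = [None] * len(table_sizes)
    # Visit tables from largest to smallest.
    order = sorted(range(len(table_sizes)), key=lambda i: table_sizes[i], reverse=True)
    for i in order:
        load, dev = heapq.heappop(heap)
        placement[i] = dev
        heapq.heappush(heap, (load + table_sizes[i], dev))
    return placement


if __name__ == "__main__":
    sizes = [10, 1, 1, 8, 2]  # made-up table sizes, not taken from the Kaggle dataset
    print(assign_tables(sizes, num_devices=2))
```

Placing the largest tables first tends to keep the per-device totals close, whereas a plain round-robin assignment can leave one device holding several of the biggest tables.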

Merge updates from DLRM

Parallelizing the pre-processing of the dataset. (facebookresearch#117)

DLRM

Fix typo in dynamic axis names (facebookresearch#134)

tginart · 2020-10-19T01:51:47Z

@mnaumovfb

Please see the PR above. It can be tested with:

python dlrm/dlrm_s_pytorch.py --use-gpu --md-flag --md-threshold=1 --md-temp=0.2 --arch-sparse-feature-size=400 --arch-mlp-bot="13-512-256-64-400" --arch-mlp-top="512-256-1" --data-generation=dataset --data-set=kaggle --raw-data-file='dlrm/input/train.txt' --processed-data-file='dlrm/input/kaggleAdDisplayChallenge_processed.npz' --loss-function=bce --round-targets=True --learning-rate=0.001 --mini-batch-size=2048 --print-freq=64 --print-time --test-freq=512 --test-mini-batch-size=2048 --solver=amsgrad --print-num-emb-params --use-emb-distrib-heuristic 2>&1 | tee run_kaggle_pt.log

Requires roughly 24GB of GPU memory or more, either on a single GPU or distributed across several.

The final test line should look something like: Testing at - 19186/19186 of epoch 0, loss 0.445875, accuracy 79.188 %, best 79.188 %

tginart and others added 25 commits June 7, 2020 22:17

bugfixes for mixd

71199fb

remove whitespace

0341ca3

switch mixd to dense adam

9349a4d

achieves 79.165 with alpha 0.3 and base 1024

3539ade

half prec commit

09be5ea

Merge pull request #1 from facebookresearch/master

b8f4a94

Merge updates from DLRM

run model in 16bit but compute loss in 32bit

dd31c7b

amp version

dcd68dd

amp control with argparse

f27fa0c

add amsgrad

bba1109

remove amp

1bea177

Merge branch 'mixd' into master

bdf6de6

add err message for unsupported solvers

9bdb7b8

Merge pull request #3 from facebookresearch/master

5cae11a

Parallelizing the pre-processing of the dataset. (facebookresearch#117)

add print num embs

a1b5067

Merge branch 'master' of https://github.com/tginart/dlrm

d126e81

DLRM

3c9ffbe

Merge pull request #4 from tginart/oops

e0a8e89

DLRM

distrib

b3a8247

running correctly with emb assign heuristic

a88c23a

remove println

1990e22

Merge pull request #5 from facebookresearch/master

1eb4f90

Fix typo in dynamic axis names (facebookresearch#134)

ready for PR

d1304b9

Merge branch 'master' of https://github.com/tginart/dlrm

43773fb

ready for PR

8947703

facebook-github-bot added the CLA Signed label Oct 19, 2020

mnaumovfb mentioned this pull request Oct 19, 2020

Mixed Dimensions Trick #108

Closed
