[Feature] TensorStack #494

Draft: wants to merge 6 commits into base: main

@vmoens vmoens commented Jul 19, 2023

Description

Describe your changes in detail.

Motivation and Context

Why is this change required? What problem does it solve?
If it fixes an open issue, please link to the issue here.
You can use the syntax close #15213 if this solves the issue #15213

  • I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

What types of changes does your code introduce? Remove all that do not apply:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds core functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)
  • Example (update in the folder of examples)

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

  • I have read the CONTRIBUTION guide (required)
  • My change requires a change to the documentation.
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have updated the documentation accordingly.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jul 19, 2023
@vmoens vmoens marked this pull request as draft July 19, 2023 20:33
github-actions bot commented Jul 19, 2023

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 109. Improved: $\large\color{#35bf28}3$. Worsened: $\large\color{#d91a1a}3$.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 51.8010μs | 22.7099μs | 44.0336 KOps/s | 44.0688 KOps/s | $\color{#d91a1a}-0.08\%$ |
| test_plain_set_stack_nested | 0.2543ms | 0.2110ms | 4.7392 KOps/s | 4.6682 KOps/s | $\color{#35bf28}+1.52\%$ |
| test_plain_set_nested_inplace | 63.4010μs | 26.4558μs | 37.7989 KOps/s | 37.8646 KOps/s | $\color{#d91a1a}-0.17\%$ |
| test_plain_set_stack_nested_inplace | 0.3288ms | 0.2481ms | 4.0314 KOps/s | 3.9835 KOps/s | $\color{#35bf28}+1.20\%$ |
| test_items | 31.8010μs | 4.1040μs | 243.6642 KOps/s | 242.2504 KOps/s | $\color{#35bf28}+0.58\%$ |
| test_items_nested | 0.7492ms | 0.4241ms | 2.3582 KOps/s | 2.3735 KOps/s | $\color{#d91a1a}-0.65\%$ |
| test_items_nested_locked | 0.5003ms | 0.4211ms | 2.3746 KOps/s | 2.3465 KOps/s | $\color{#35bf28}+1.20\%$ |
| test_items_nested_leaf | 1.8843ms | 0.2606ms | 3.8373 KOps/s | 3.8842 KOps/s | $\color{#d91a1a}-1.21\%$ |
| test_items_stack_nested | 2.4513ms | 2.3066ms | 433.5399 Ops/s | 429.3371 Ops/s | $\color{#35bf28}+0.98\%$ |
| test_items_stack_nested_leaf | 2.2146ms | 2.0994ms | 476.3327 Ops/s | 472.9603 Ops/s | $\color{#35bf28}+0.71\%$ |
| test_items_stack_nested_locked | 1.2442ms | 1.1203ms | 892.6446 Ops/s | 878.8328 Ops/s | $\color{#35bf28}+1.57\%$ |
| test_keys | 41.4010μs | 6.0216μs | 166.0683 KOps/s | 166.0486 KOps/s | $\color{#35bf28}+0.01\%$ |
| test_keys_nested | 0.8849ms | 0.2137ms | 4.6803 KOps/s | 4.6684 KOps/s | $\color{#35bf28}+0.25\%$ |
| test_keys_nested_locked | 0.2505ms | 0.2108ms | 4.7432 KOps/s | 4.6806 KOps/s | $\color{#35bf28}+1.34\%$ |
| test_keys_nested_leaf | 0.3659ms | 0.2036ms | 4.9108 KOps/s | 4.5548 KOps/s | $\textbf{\color{#35bf28}+7.82\%}$ |
| test_keys_stack_nested | 2.1919ms | 2.0537ms | 486.9149 Ops/s | 484.8499 Ops/s | $\color{#35bf28}+0.43\%$ |
| test_keys_stack_nested_leaf | 2.1268ms | 2.0608ms | 485.2518 Ops/s | 485.9255 Ops/s | $\color{#d91a1a}-0.14\%$ |
| test_keys_stack_nested_locked | 0.9717ms | 0.8648ms | 1.1563 KOps/s | 1.1395 KOps/s | $\color{#35bf28}+1.47\%$ |
| test_values | 51.1000μs | 1.8320μs | 545.8387 KOps/s | 536.9532 KOps/s | $\color{#35bf28}+1.65\%$ |
| test_values_nested | 0.1101ms | 72.8258μs | 13.7314 KOps/s | 13.7166 KOps/s | $\color{#35bf28}+0.11\%$ |
| test_values_nested_locked | 0.1139ms | 77.7172μs | 12.8672 KOps/s | 13.7328 KOps/s | $\textbf{\color{#d91a1a}-6.30\%}$ |
| test_values_nested_leaf | 0.1101ms | 65.2923μs | 15.3157 KOps/s | 15.2454 KOps/s | $\color{#35bf28}+0.46\%$ |
| test_values_stack_nested | 1.9708ms | 1.8596ms | 537.7620 Ops/s | 539.0717 Ops/s | $\color{#d91a1a}-0.24\%$ |
| test_values_stack_nested_leaf | 2.8941ms | 1.8760ms | 533.0573 Ops/s | 542.6587 Ops/s | $\color{#d91a1a}-1.77\%$ |
| test_values_stack_nested_locked | 0.8454ms | 0.7443ms | 1.3436 KOps/s | 1.3336 KOps/s | $\color{#35bf28}+0.75\%$ |
| test_membership | 26.4000μs | 2.0966μs | 476.9668 KOps/s | 466.7463 KOps/s | $\color{#35bf28}+2.19\%$ |
| test_membership_nested | 51.5010μs | 4.0619μs | 246.1919 KOps/s | 241.8280 KOps/s | $\color{#35bf28}+1.80\%$ |
| test_membership_nested_leaf | 49.4000μs | 4.0323μs | 247.9976 KOps/s | 237.6864 KOps/s | $\color{#35bf28}+4.34\%$ |
| test_membership_stacked_nested | 45.7010μs | 16.8873μs | 59.2162 KOps/s | 59.9190 KOps/s | $\color{#d91a1a}-1.17\%$ |
| test_membership_stacked_nested_leaf | 69.1010μs | 16.8407μs | 59.3801 KOps/s | 58.8045 KOps/s | $\color{#35bf28}+0.98\%$ |
| test_membership_nested_last | 46.9010μs | 8.5158μs | 117.4291 KOps/s | 116.0165 KOps/s | $\color{#35bf28}+1.22\%$ |
| test_membership_nested_leaf_last | 35.6000μs | 8.4134μs | 118.8579 KOps/s | 114.7659 KOps/s | $\color{#35bf28}+3.57\%$ |
| test_membership_stacked_nested_last | 0.3044ms | 0.2579ms | 3.8769 KOps/s | 3.8675 KOps/s | $\color{#35bf28}+0.24\%$ |
| test_membership_stacked_nested_leaf_last | 50.2010μs | 19.5686μs | 51.1024 KOps/s | 50.7898 KOps/s | $\color{#35bf28}+0.62\%$ |
| test_nested_getleaf | 77.5010μs | 17.7299μs | 56.4019 KOps/s | 56.5629 KOps/s | $\color{#d91a1a}-0.28\%$ |
| test_nested_get | 60.9010μs | 16.8337μs | 59.4046 KOps/s | 59.4007 KOps/s | $+0.01\%$ |
| test_stacked_getleaf | 1.2028ms | 1.0330ms | 968.0588 Ops/s | 994.3456 Ops/s | $\color{#d91a1a}-2.64\%$ |
| test_stacked_get | 1.0497ms | 0.9869ms | 1.0133 KOps/s | 1.0397 KOps/s | $\color{#d91a1a}-2.54\%$ |
| test_nested_getitemleaf | 91.6010μs | 18.0248μs | 55.4792 KOps/s | 55.8500 KOps/s | $\color{#d91a1a}-0.66\%$ |
| test_nested_getitem | 43.0010μs | 16.9725μs | 58.9187 KOps/s | 58.4758 KOps/s | $\color{#35bf28}+0.76\%$ |
| test_stacked_getitemleaf | 1.1630ms | 1.0323ms | 968.6759 Ops/s | 986.8745 Ops/s | $\color{#d91a1a}-1.84\%$ |
| test_stacked_getitem | 1.0918ms | 0.9844ms | 1.0158 KOps/s | 1.0374 KOps/s | $\color{#d91a1a}-2.08\%$ |
| test_lock_nested | 78.2172ms | 1.7466ms | 572.5432 Ops/s | 600.0534 Ops/s | $\color{#d91a1a}-4.58\%$ |
| test_lock_stack_nested | 99.1759ms | 20.9597ms | 47.7106 Ops/s | 52.0033 Ops/s | $\textbf{\color{#d91a1a}-8.25\%}$ |
| test_unlock_nested | 76.7827ms | 1.7577ms | 568.9346 Ops/s | 566.9980 Ops/s | $\color{#35bf28}+0.34\%$ |
| test_unlock_stack_nested | 0.1016s | 21.5177ms | 46.4735 Ops/s | 50.6751 Ops/s | $\textbf{\color{#d91a1a}-8.29\%}$ |
| test_flatten_speed | 1.2712ms | 1.1784ms | 848.5895 Ops/s | 865.8550 Ops/s | $\color{#d91a1a}-1.99\%$ |
| test_unflatten_speed | 2.1688ms | 2.1273ms | 470.0817 Ops/s | 477.9084 Ops/s | $\color{#d91a1a}-1.64\%$ |
| test_common_ops | 1.5450ms | 1.2740ms | 784.9311 Ops/s | 788.5821 Ops/s | $\color{#d91a1a}-0.46\%$ |
| test_creation | 38.4010μs | 7.1639μs | 139.5884 KOps/s | 141.9476 KOps/s | $\color{#d91a1a}-1.66\%$ |
| test_creation_empty | 49.2000μs | 15.8404μs | 63.1298 KOps/s | 65.2866 KOps/s | $\color{#d91a1a}-3.30\%$ |
| test_creation_nested_1 | 58.6010μs | 28.4662μs | 35.1293 KOps/s | 35.4135 KOps/s | $\color{#d91a1a}-0.80\%$ |
| test_creation_nested_2 | 61.0010μs | 31.2306μs | 32.0199 KOps/s | 32.5927 KOps/s | $\color{#d91a1a}-1.76\%$ |
| test_clone | 0.1460ms | 28.7432μs | 34.7908 KOps/s | 35.0294 KOps/s | $\color{#d91a1a}-0.68\%$ |
| test_getitem[int] | 73.6010μs | 34.6907μs | 28.8262 KOps/s | 28.3399 KOps/s | $\color{#35bf28}+1.72\%$ |
| test_getitem[slice_int] | 0.1637ms | 74.6687μs | 13.3925 KOps/s | 13.4642 KOps/s | $\color{#d91a1a}-0.53\%$ |
| test_getitem[range] | 0.1120ms | 77.8600μs | 12.8436 KOps/s | 12.8677 KOps/s | $\color{#d91a1a}-0.19\%$ |
| test_getitem[tuple] | 0.1423ms | 69.8429μs | 14.3179 KOps/s | 14.4251 KOps/s | $\color{#d91a1a}-0.74\%$ |
| test_getitem[list] | 0.3668ms | 67.9292μs | 14.7212 KOps/s | 14.3994 KOps/s | $\color{#35bf28}+2.24\%$ |
| test_setitem_dim[int] | 66.5010μs | 37.7482μs | 26.4913 KOps/s | 25.2884 KOps/s | $\color{#35bf28}+4.76\%$ |
| test_setitem_dim[slice_int] | 0.1100ms | 78.9748μs | 12.6623 KOps/s | 12.3117 KOps/s | $\color{#35bf28}+2.85\%$ |
| test_setitem_dim[range] | 0.1206ms | 75.9752μs | 13.1622 KOps/s | 12.9414 KOps/s | $\color{#35bf28}+1.71\%$ |
| test_setitem_dim[tuple] | 0.1105ms | 70.7524μs | 14.1338 KOps/s | 13.8807 KOps/s | $\color{#35bf28}+1.82\%$ |
| test_setitem | 0.1757ms | 36.9954μs | 27.0304 KOps/s | 26.2518 KOps/s | $\color{#35bf28}+2.97\%$ |
| test_set | 0.1812ms | 35.7391μs | 27.9806 KOps/s | 27.2201 KOps/s | $\color{#35bf28}+2.79\%$ |
| test_set_shared | 0.3931ms | 0.2053ms | 4.8708 KOps/s | 4.8920 KOps/s | $\color{#d91a1a}-0.43\%$ |
| test_update | 0.1994ms | 40.5768μs | 24.6446 KOps/s | 24.4562 KOps/s | $\color{#35bf28}+0.77\%$ |
| test_update_nested | 0.2284ms | 60.2927μs | 16.5858 KOps/s | 16.3633 KOps/s | $\color{#35bf28}+1.36\%$ |
| test_set_nested | 0.1858ms | 39.4785μs | 25.3302 KOps/s | 24.9139 KOps/s | $\color{#35bf28}+1.67\%$ |
| test_set_nested_new | 0.2381ms | 61.5336μs | 16.2513 KOps/s | 16.2813 KOps/s | $\color{#d91a1a}-0.18\%$ |
| test_select | 2.1698ms | 0.1137ms | 8.7966 KOps/s | 8.8534 KOps/s | $\color{#d91a1a}-0.64\%$ |
| test_unbind_speed | 0.8565ms | 0.7639ms | 1.3091 KOps/s | 1.3151 KOps/s | $\color{#d91a1a}-0.46\%$ |
| test_unbind_speed_stack0 | 3.8733ms | 3.6582ms | 273.3565 Ops/s | 274.1349 Ops/s | $\color{#d91a1a}-0.28\%$ |
| test_unbind_speed_stack1 | 2.8950μs | 0.5565μs | 1.7969 MOps/s | 1.7908 MOps/s | $\color{#35bf28}+0.34\%$ |
| test_creation[device0] | 0.6506ms | 0.5354ms | 1.8678 KOps/s | 1.8914 KOps/s | $\color{#d91a1a}-1.25\%$ |
| test_creation_from_tensor | 0.9414ms | 0.6000ms | 1.6668 KOps/s | 1.7004 KOps/s | $\color{#d91a1a}-1.98\%$ |
| test_add_one[memmap_tensor0] | 1.2448ms | 37.3313μs | 26.7872 KOps/s | 26.2945 KOps/s | $\color{#35bf28}+1.87\%$ |
| test_contiguous[memmap_tensor0] | 82.7010μs | 9.8115μs | 101.9216 KOps/s | 100.2201 KOps/s | $\color{#35bf28}+1.70\%$ |
| test_stack[memmap_tensor0] | 0.1418ms | 30.1131μs | 33.2081 KOps/s | 32.2985 KOps/s | $\color{#35bf28}+2.82\%$ |
| test_memmaptd_index | 0.4113ms | 0.3208ms | 3.1170 KOps/s | 3.1028 KOps/s | $\color{#35bf28}+0.46\%$ |
| test_memmaptd_index_astensor | 1.4376ms | 1.3631ms | 733.6300 Ops/s | 721.2336 Ops/s | $\color{#35bf28}+1.72\%$ |
| test_memmaptd_index_op | 3.0627ms | 2.7536ms | 363.1662 Ops/s | 348.7042 Ops/s | $\color{#35bf28}+4.15\%$ |
| test_reshape_pytree | 0.1152ms | 43.4052μs | 23.0387 KOps/s | 22.9415 KOps/s | $\color{#35bf28}+0.42\%$ |
| test_reshape_td | 90.2020μs | 52.5793μs | 19.0189 KOps/s | 18.8783 KOps/s | $\color{#35bf28}+0.74\%$ |
| test_view_pytree | 0.1120ms | 40.6187μs | 24.6192 KOps/s | 24.6908 KOps/s | $\color{#d91a1a}-0.29\%$ |
| test_view_td | 28.1000μs | 10.3603μs | 96.5222 KOps/s | 99.6513 KOps/s | $\color{#d91a1a}-3.14\%$ |
| test_unbind_pytree | 96.5010μs | 44.1742μs | 22.6377 KOps/s | 22.3047 KOps/s | $\color{#35bf28}+1.49\%$ |
| test_unbind_td | 0.1436ms | 0.1118ms | 8.9436 KOps/s | 8.9501 KOps/s | $\color{#d91a1a}-0.07\%$ |
| test_split_pytree | 0.1479ms | 51.0612μs | 19.5843 KOps/s | 19.2453 KOps/s | $\color{#35bf28}+1.76\%$ |
| test_split_td | 4.6803ms | 0.1355ms | 7.3805 KOps/s | 7.4031 KOps/s | $\color{#d91a1a}-0.31\%$ |
| test_add_pytree | 83.8010μs | 52.7347μs | 18.9628 KOps/s | 17.8503 KOps/s | $\textbf{\color{#35bf28}+6.23\%}$ |
| test_add_td | 0.1445ms | 85.7955μs | 11.6556 KOps/s | 11.2026 KOps/s | $\color{#35bf28}+4.04\%$ |
| test_distributed | 45.7010μs | 10.9242μs | 91.5399 KOps/s | 93.7805 KOps/s | $\color{#d91a1a}-2.39\%$ |
| test_tdmodule | 0.2339ms | 32.5078μs | 30.7618 KOps/s | 30.2385 KOps/s | $\color{#35bf28}+1.73\%$ |
| test_tdmodule_dispatch | 0.2803ms | 62.6701μs | 15.9566 KOps/s | 7.3441 KOps/s | $\textbf{\color{#35bf28}+117.27\%}$ |
| test_tdseq | 0.6150ms | 37.7330μs | 26.5020 KOps/s | 25.4150 KOps/s | $\color{#35bf28}+4.28\%$ |
| test_tdseq_dispatch | 0.2284ms | 75.3302μs | 13.2749 KOps/s | 12.8379 KOps/s | $\color{#35bf28}+3.40\%$ |
| test_instantiation_functorch | 2.3468ms | 1.9018ms | 525.8061 Ops/s | 526.0514 Ops/s | $\color{#d91a1a}-0.05\%$ |
| test_instantiation_td | 2.2714ms | 1.5697ms | 637.0563 Ops/s | 628.4484 Ops/s | $\color{#35bf28}+1.37\%$ |
| test_exec_functorch | 0.3078ms | 0.2194ms | 4.5577 KOps/s | 4.5005 KOps/s | $\color{#35bf28}+1.27\%$ |
| test_exec_td | 0.2512ms | 0.2077ms | 4.8155 KOps/s | 4.7637 KOps/s | $\color{#35bf28}+1.09\%$ |
| test_vmap_mlp_speed[True-True] | 1.7835ms | 1.3394ms | 746.5907 Ops/s | 744.8484 Ops/s | $\color{#35bf28}+0.23\%$ |
| test_vmap_mlp_speed[True-False] | 1.0314ms | 0.6985ms | 1.4317 KOps/s | 1.4410 KOps/s | $\color{#d91a1a}-0.64\%$ |
| test_vmap_mlp_speed[False-True] | 13.6800ms | 1.1500ms | 869.5427 Ops/s | 879.6614 Ops/s | $\color{#d91a1a}-1.15\%$ |
| test_vmap_mlp_speed[False-False] | 1.8753ms | 0.5168ms | 1.9351 KOps/s | 1.9184 KOps/s | $\color{#35bf28}+0.87\%$ |
| test_vmap_transformer_speed[True-True] | 20.8154ms | 16.4208ms | 60.8984 Ops/s | 62.3221 Ops/s | $\color{#d91a1a}-2.28\%$ |
| test_vmap_transformer_speed[True-False] | 11.5970ms | 10.4869ms | 95.3571 Ops/s | 97.4623 Ops/s | $\color{#d91a1a}-2.16\%$ |
| test_vmap_transformer_speed[False-True] | 20.6403ms | 15.8031ms | 63.2786 Ops/s | 64.9563 Ops/s | $\color{#d91a1a}-2.58\%$ |
| test_vmap_transformer_speed[False-False] | 11.7831ms | 10.3764ms | 96.3725 Ops/s | 99.9534 Ops/s | $\color{#d91a1a}-3.58\%$ |
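
For readers checking the numbers: the Ops column looks like the reciprocal of Mean, and the Change column looks like the relative difference between Ops and Ops on Repo HEAD. Both relationships are inferred from the reported values, not taken from the benchmark tooling, and the function names below are hypothetical. A minimal Python sketch under those assumptions:

```python
def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean runtime, assuming Ops = 1 / Mean."""
    return 1.0 / mean_seconds


def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change of the PR throughput relative to repo HEAD (assumed formula)."""
    return (ops_pr / ops_head - 1.0) * 100.0


# Values copied from the benchmark results in this comment.
# test_plain_set_nested: mean 22.7099 us, reported as 44.0336 KOps/s
print(f"{ops_per_second(22.7099e-6) / 1e3:.2f} KOps/s")
# test_tdmodule_dispatch: 15.9566 KOps/s vs 7.3441 KOps/s on HEAD
print(f"{relative_change(15.9566, 7.3441):+.2f}%")
# test_lock_stack_nested: 47.7106 Ops/s vs 52.0033 Ops/s on HEAD
print(f"{relative_change(47.7106, 52.0033):+.2f}%")
```

Because the inputs are already rounded to four decimals, recomputed values can differ from the reported ones in the last digit.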

@vmoens vmoens added the enhancement New feature or request label Jul 31, 2023