[Feature] nested_keys option in named_apply #641

Merged: 3 commits, Jan 27, 2024

@vmoens (Contributor) commented Jan 27, 2024

No description provided.
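
Since the PR carries no description, here is a minimal sketch of what the title suggests, under loud assumptions: the `nested_keys` flag presumably decides whether the function given to `named_apply` receives the full nested path of each leaf or only its last key. The helper `named_apply_dict` is hypothetical and works on plain nested dicts; it is not the library's implementation, and the real method may treat top-level keys differently.

```python
from typing import Any, Callable, Dict, Tuple


def named_apply_dict(
    fn: Callable[[Any, Any], Any],
    tree: Dict[str, Any],
    nested_keys: bool = False,
    _prefix: Tuple[str, ...] = (),
) -> Dict[str, Any]:
    """Apply fn(key, value) to every leaf of a nested dict (hypothetical helper).

    With nested_keys=False, fn sees only the last key of each leaf.
    With nested_keys=True, fn sees the full path as a tuple of strings.
    """
    out: Dict[str, Any] = {}
    for name, value in tree.items():
        path = _prefix + (name,)
        if isinstance(value, dict):
            # Recurse into sub-dicts, carrying the path accumulated so far.
            out[name] = named_apply_dict(fn, value, nested_keys, path)
        else:
            key = path if nested_keys else name
            out[name] = fn(key, value)
    return out


data = {"a": 1, "b": {"c": 2}}
# Return the key itself so the difference between the two modes is visible.
short = named_apply_dict(lambda key, value: key, data)
full = named_apply_dict(lambda key, value: key, data, nested_keys=True)
print(short)
print(full)
```

In this sketch the leaf under `b` is reported as `"c"` in the default mode and as `("b", "c")` when `nested_keys=True`, which is the distinction the option name points to.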

@facebook-github-bot added the CLA Signed label Jan 27, 2024
@vmoens added the enhancement label Jan 27, 2024
@vmoens changed the title from "[Feature] complete_keys option in named_apply" to "[Feature] nested_keys option in named_apply" Jan 27, 2024

github-actions bot commented Jan 27, 2024

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 124. Improved: 4. Worsened: 29.

Name | Max | Mean | Ops | Ops on Repo HEAD | Change
test_plain_set_nested 38.5720μs 17.4105μs 57.4367 KOps/s 60.6672 KOps/s $\textbf{\color{#d91a1a}-5.32\%}$
test_plain_set_stack_nested 0.2259ms 0.1464ms 6.8289 KOps/s 6.9490 KOps/s $\color{#d91a1a}-1.73\%$
test_plain_set_nested_inplace 66.1440μs 20.2785μs 49.3133 KOps/s 51.9036 KOps/s $\color{#d91a1a}-4.99\%$
test_plain_set_stack_nested_inplace 0.3197ms 0.1829ms 5.4672 KOps/s 5.6397 KOps/s $\color{#d91a1a}-3.06\%$
test_items 15.8700μs 2.5826μs 387.2105 KOps/s 402.5350 KOps/s $\color{#d91a1a}-3.81\%$
test_items_nested 0.4255ms 0.2738ms 3.6528 KOps/s 3.5523 KOps/s $\color{#35bf28}+2.83\%$
test_items_nested_locked 1.1438ms 0.2743ms 3.6457 KOps/s 3.5114 KOps/s $\color{#35bf28}+3.83\%$
test_items_nested_leaf 0.6058ms 0.1679ms 5.9554 KOps/s 5.7655 KOps/s $\color{#35bf28}+3.30\%$
test_items_stack_nested 1.5314ms 1.3228ms 755.9522 Ops/s 763.0367 Ops/s $\color{#d91a1a}-0.93\%$
test_items_stack_nested_leaf 2.0834ms 1.1906ms 839.9182 Ops/s 855.1219 Ops/s $\color{#d91a1a}-1.78\%$
test_items_stack_nested_locked 1.1520ms 0.8668ms 1.1536 KOps/s 1.1460 KOps/s $\color{#35bf28}+0.67\%$
test_keys 21.5400μs 3.8159μs 262.0626 KOps/s 254.6717 KOps/s $\color{#35bf28}+2.90\%$
test_keys_nested 48.1484ms 0.1582ms 6.3220 KOps/s 6.8269 KOps/s $\textbf{\color{#d91a1a}-7.40\%}$
test_keys_nested_locked 0.2269ms 0.1520ms 6.5808 KOps/s 6.6515 KOps/s $\color{#d91a1a}-1.06\%$
test_keys_nested_leaf 0.2439ms 0.1297ms 7.7122 KOps/s 7.8066 KOps/s $\color{#d91a1a}-1.21\%$
test_keys_stack_nested 1.4934ms 1.2646ms 790.7468 Ops/s 804.1054 Ops/s $\color{#d91a1a}-1.66\%$
test_keys_stack_nested_leaf 1.8648ms 1.2641ms 791.0535 Ops/s 797.1563 Ops/s $\color{#d91a1a}-0.77\%$
test_keys_stack_nested_locked 1.3117ms 0.8150ms 1.2271 KOps/s 1.2703 KOps/s $\color{#d91a1a}-3.40\%$
test_values 5.4600μs 1.1949μs 836.8564 KOps/s 850.9838 KOps/s $\color{#d91a1a}-1.66\%$
test_values_nested 0.1009ms 52.0755μs 19.2029 KOps/s 19.4400 KOps/s $\color{#d91a1a}-1.22\%$
test_values_nested_locked 0.1328ms 51.8368μs 19.2913 KOps/s 19.3612 KOps/s $\color{#d91a1a}-0.36\%$
test_values_nested_leaf 3.3772ms 46.2392μs 21.6267 KOps/s 21.4807 KOps/s $\color{#35bf28}+0.68\%$
test_values_stack_nested 1.2246ms 1.0267ms 974.0401 Ops/s 988.2701 Ops/s $\color{#d91a1a}-1.44\%$
test_values_stack_nested_leaf 1.1934ms 1.0181ms 982.2565 Ops/s 992.7305 Ops/s $\color{#d91a1a}-1.06\%$
test_values_stack_nested_locked 0.8399ms 0.6024ms 1.6600 KOps/s 1.6762 KOps/s $\color{#d91a1a}-0.97\%$
test_membership 11.5610μs 1.3458μs 743.0685 KOps/s 745.1575 KOps/s $\color{#d91a1a}-0.28\%$
test_membership_nested 22.8130μs 3.4443μs 290.3371 KOps/s 287.9650 KOps/s $\color{#35bf28}+0.82\%$
test_membership_nested_leaf 27.8420μs 3.4882μs 286.6822 KOps/s 286.3193 KOps/s $\color{#35bf28}+0.13\%$
test_membership_stacked_nested 41.8680μs 11.8733μs 84.2228 KOps/s 85.9089 KOps/s $\color{#d91a1a}-1.96\%$
test_membership_stacked_nested_leaf 43.3810μs 11.8783μs 84.1869 KOps/s 84.0713 KOps/s $\color{#35bf28}+0.14\%$
test_membership_nested_last 23.6450μs 6.7472μs 148.2100 KOps/s 151.0595 KOps/s $\color{#d91a1a}-1.89\%$
test_membership_nested_leaf_last 23.2040μs 6.6283μs 150.8676 KOps/s 151.1528 KOps/s $\color{#d91a1a}-0.19\%$
test_membership_stacked_nested_last 0.2976ms 0.1793ms 5.5764 KOps/s 5.6366 KOps/s $\color{#d91a1a}-1.07\%$
test_membership_stacked_nested_leaf_last 35.3360μs 13.9480μs 71.6948 KOps/s 71.9220 KOps/s $\color{#d91a1a}-0.32\%$
test_nested_getleaf 32.3310μs 10.9458μs 91.3590 KOps/s 93.4456 KOps/s $\color{#d91a1a}-2.23\%$
test_nested_get 59.6520μs 10.3540μs 96.5807 KOps/s 97.6981 KOps/s $\color{#d91a1a}-1.14\%$
test_stacked_getleaf 0.6270ms 0.3986ms 2.5088 KOps/s 2.5401 KOps/s $\color{#d91a1a}-1.23\%$
test_stacked_get 0.5594ms 0.3693ms 2.7076 KOps/s 2.7721 KOps/s $\color{#d91a1a}-2.32\%$
test_nested_getitemleaf 64.9110μs 12.3732μs 80.8199 KOps/s 81.9252 KOps/s $\color{#d91a1a}-1.35\%$
test_nested_getitem 33.7730μs 12.0379μs 83.0711 KOps/s 86.2161 KOps/s $\color{#d91a1a}-3.65\%$
test_stacked_getitemleaf 0.6135ms 0.4038ms 2.4766 KOps/s 2.5284 KOps/s $\color{#d91a1a}-2.05\%$
test_stacked_getitem 0.6747ms 0.3815ms 2.6215 KOps/s 2.7335 KOps/s $\color{#d91a1a}-4.10\%$
test_lock_nested 0.6626ms 0.3261ms 3.0667 KOps/s 3.0409 KOps/s $\color{#35bf28}+0.85\%$
test_lock_stack_nested 65.3962ms 5.1313ms 194.8842 Ops/s 194.0771 Ops/s $\color{#35bf28}+0.42\%$
test_unlock_nested 0.6383ms 0.3311ms 3.0204 KOps/s 2.6028 KOps/s $\textbf{\color{#35bf28}+16.04\%}$
test_unlock_stack_nested 62.5523ms 5.2574ms 190.2084 Ops/s 187.8214 Ops/s $\color{#35bf28}+1.27\%$
test_flatten_speed 0.5798ms 0.3729ms 2.6815 KOps/s 2.5908 KOps/s $\color{#35bf28}+3.50\%$
test_unflatten_speed 0.7873ms 0.4742ms 2.1089 KOps/s 2.1877 KOps/s $\color{#d91a1a}-3.60\%$
test_common_ops 3.2147ms 0.7022ms 1.4241 KOps/s 1.4928 KOps/s $\color{#d91a1a}-4.60\%$
test_creation 11.8320μs 1.8930μs 528.2749 KOps/s 529.2299 KOps/s $\color{#d91a1a}-0.18\%$
test_creation_empty 34.5450μs 11.2027μs 89.2641 KOps/s 106.5787 KOps/s $\textbf{\color{#d91a1a}-16.25\%}$
test_creation_nested_1 37.8710μs 13.6818μs 73.0898 KOps/s 84.2349 KOps/s $\textbf{\color{#d91a1a}-13.23\%}$
test_creation_nested_2 38.2110μs 17.1536μs 58.2967 KOps/s 66.0638 KOps/s $\textbf{\color{#d91a1a}-11.76\%}$
test_clone 50.4840μs 13.0721μs 76.4986 KOps/s 77.3288 KOps/s $\color{#d91a1a}-1.07\%$
test_getitem[int] 33.5430μs 11.0808μs 90.2459 KOps/s 89.9981 KOps/s $\color{#35bf28}+0.28\%$
test_getitem[slice_int] 88.6550μs 23.1487μs 43.1989 KOps/s 44.2083 KOps/s $\color{#d91a1a}-2.28\%$
test_getitem[range] 0.1275ms 41.7886μs 23.9300 KOps/s 26.0665 KOps/s $\textbf{\color{#d91a1a}-8.20\%}$
test_getitem[tuple] 59.2000μs 18.0113μs 55.5206 KOps/s 55.4846 KOps/s $\color{#35bf28}+0.07\%$
test_getitem[list] 0.1304ms 36.2714μs 27.5700 KOps/s 29.0409 KOps/s $\textbf{\color{#d91a1a}-5.07\%}$
test_setitem_dim[int] 56.1740μs 33.1650μs 30.1523 KOps/s 33.6258 KOps/s $\textbf{\color{#d91a1a}-10.33\%}$
test_setitem_dim[slice_int] 0.1091ms 58.6037μs 17.0638 KOps/s 18.2918 KOps/s $\textbf{\color{#d91a1a}-6.71\%}$
test_setitem_dim[range] 0.1274ms 78.3678μs 12.7603 KOps/s 14.0666 KOps/s $\textbf{\color{#d91a1a}-9.29\%}$
test_setitem_dim[tuple] 67.0850μs 47.6011μs 21.0079 KOps/s 22.2987 KOps/s $\textbf{\color{#d91a1a}-5.79\%}$
test_setitem 57.6380μs 20.1865μs 49.5379 KOps/s 53.3149 KOps/s $\textbf{\color{#d91a1a}-7.08\%}$
test_set 78.5770μs 19.6057μs 51.0055 KOps/s 54.5500 KOps/s $\textbf{\color{#d91a1a}-6.50\%}$
test_set_shared 3.2763ms 0.1394ms 7.1729 KOps/s 7.2805 KOps/s $\color{#d91a1a}-1.48\%$
test_update 88.2940μs 22.9022μs 43.6639 KOps/s 47.1379 KOps/s $\textbf{\color{#d91a1a}-7.37\%}$
test_update_nested 84.9480μs 30.5436μs 32.7401 KOps/s 35.7710 KOps/s $\textbf{\color{#d91a1a}-8.47\%}$
test_set_nested 62.9070μs 21.3219μs 46.9002 KOps/s 50.1853 KOps/s $\textbf{\color{#d91a1a}-6.55\%}$
test_set_nested_new 65.3820μs 25.4484μs 39.2952 KOps/s 42.6012 KOps/s $\textbf{\color{#d91a1a}-7.76\%}$
test_select 0.9249ms 38.7850μs 25.7831 KOps/s 27.4613 KOps/s $\textbf{\color{#d91a1a}-6.11\%}$
test_select_nested 0.1305ms 58.5390μs 17.0826 KOps/s 17.1620 KOps/s $\color{#d91a1a}-0.46\%$
test_exclude_nested 0.2882ms 0.1173ms 8.5257 KOps/s 8.4110 KOps/s $\color{#35bf28}+1.36\%$
test_empty[True] 0.7263ms 0.4073ms 2.4551 KOps/s 2.4738 KOps/s $\color{#d91a1a}-0.75\%$
test_empty[False] 4.6286μs 1.0364μs 964.8494 KOps/s 965.0603 KOps/s $\color{#d91a1a}-0.02\%$
test_unbind_speed 0.2911ms 0.2426ms 4.1213 KOps/s 4.1588 KOps/s $\color{#d91a1a}-0.90\%$
test_unbind_speed_stack0 87.1575ms 3.5157ms 284.4404 Ops/s 334.9060 Ops/s $\textbf{\color{#d91a1a}-15.07\%}$
test_unbind_speed_stack1 17.5730μs 1.9859μs 503.5558 KOps/s 510.5779 KOps/s $\color{#d91a1a}-1.38\%$
test_split 65.0252ms 1.6603ms 602.2880 Ops/s 613.8830 Ops/s $\color{#d91a1a}-1.89\%$
test_chunk 2.4017ms 1.4930ms 669.7749 Ops/s 631.2333 Ops/s $\textbf{\color{#35bf28}+6.11\%}$
test_creation[device0] 3.2903ms 0.1015ms 9.8526 KOps/s 10.3910 KOps/s $\textbf{\color{#d91a1a}-5.18\%}$
test_creation_from_tensor 0.1771ms 81.8273μs 12.2209 KOps/s 12.8042 KOps/s $\color{#d91a1a}-4.56\%$
test_add_one[memmap_tensor0] 0.2203ms 5.2911μs 188.9973 KOps/s 190.6690 KOps/s $\color{#d91a1a}-0.88\%$
test_contiguous[memmap_tensor0] 13.5560μs 0.6275μs 1.5936 MOps/s 1.5671 MOps/s $\color{#35bf28}+1.69\%$
test_stack[memmap_tensor0] 55.7340μs 3.4149μs 292.8363 KOps/s 280.2066 KOps/s $\color{#35bf28}+4.51\%$
test_memmaptd_index 1.1761ms 0.2255ms 4.4348 KOps/s 4.5585 KOps/s $\color{#d91a1a}-2.71\%$
test_memmaptd_index_astensor 69.1223ms 0.3060ms 3.2675 KOps/s 3.5922 KOps/s $\textbf{\color{#d91a1a}-9.04\%}$
test_memmaptd_index_op 0.8286ms 0.5859ms 1.7067 KOps/s 1.8610 KOps/s $\textbf{\color{#d91a1a}-8.29\%}$
test_serialize_model 0.1658s 0.1067s 9.3708 Ops/s 9.3538 Ops/s $\color{#35bf28}+0.18\%$
test_serialize_model_pickle 0.4558s 0.3758s 2.6610 Ops/s 2.6355 Ops/s $\color{#35bf28}+0.97\%$
test_serialize_weights 0.1607s 0.1023s 9.7748 Ops/s 9.3755 Ops/s $\color{#35bf28}+4.26\%$
test_serialize_weights_returnearly 0.2315s 0.1416s 7.0624 Ops/s 7.4056 Ops/s $\color{#d91a1a}-4.63\%$
test_serialize_weights_pickle 0.8014s 0.5394s 1.8540 Ops/s 2.3318 Ops/s $\textbf{\color{#d91a1a}-20.49\%}$
test_serialize_weights_filesystem 97.1322ms 89.5324ms 11.1691 Ops/s 10.5463 Ops/s $\textbf{\color{#35bf28}+5.91\%}$
test_serialize_model_filesystem 0.1641s 98.2061ms 10.1827 Ops/s 10.0796 Ops/s $\color{#35bf28}+1.02\%$
test_reshape_pytree 64.7510μs 23.2632μs 42.9863 KOps/s 42.5500 KOps/s $\color{#35bf28}+1.03\%$
test_reshape_td 84.7090μs 30.7096μs 32.5631 KOps/s 33.7710 KOps/s $\color{#d91a1a}-3.58\%$
test_view_pytree 59.8820μs 22.9529μs 43.5675 KOps/s 42.9274 KOps/s $\color{#35bf28}+1.49\%$
test_view_td 67.6905ms 10.4182μs 95.9855 KOps/s 93.9083 KOps/s $\color{#35bf28}+2.21\%$
test_unbind_pytree 62.5570μs 25.8590μs 38.6713 KOps/s 37.2387 KOps/s $\color{#35bf28}+3.85\%$
test_unbind_td 90.9190μs 35.2935μs 28.3338 KOps/s 27.9807 KOps/s $\color{#35bf28}+1.26\%$
test_split_pytree 76.5620μs 25.9030μs 38.6056 KOps/s 38.2459 KOps/s $\color{#35bf28}+0.94\%$
test_split_td 0.3824ms 40.4188μs 24.7409 KOps/s 24.9255 KOps/s $\color{#d91a1a}-0.74\%$
test_add_pytree 69.5190μs 31.3684μs 31.8792 KOps/s 31.1234 KOps/s $\color{#35bf28}+2.43\%$
test_add_td 0.1477ms 53.4884μs 18.6956 KOps/s 21.1300 KOps/s $\textbf{\color{#d91a1a}-11.52\%}$
test_distributed 0.1834ms 96.6303μs 10.3487 KOps/s 10.1372 KOps/s $\color{#35bf28}+2.09\%$
test_tdmodule 0.3041ms 23.7162μs 42.1652 KOps/s 45.3483 KOps/s $\textbf{\color{#d91a1a}-7.02\%}$
test_tdmodule_dispatch 0.1814ms 46.2314μs 21.6303 KOps/s 23.8307 KOps/s $\textbf{\color{#d91a1a}-9.23\%}$
test_tdseq 52.7490μs 26.7503μs 37.3827 KOps/s 40.5029 KOps/s $\textbf{\color{#d91a1a}-7.70\%}$
test_tdseq_dispatch 0.1429ms 50.2506μs 19.9003 KOps/s 21.4891 KOps/s $\textbf{\color{#d91a1a}-7.39\%}$
test_instantiation_functorch 2.0104ms 1.3085ms 764.2606 Ops/s 770.6703 Ops/s $\color{#d91a1a}-0.83\%$
test_instantiation_td 1.4549ms 1.0021ms 997.9267 Ops/s 918.1303 Ops/s $\textbf{\color{#35bf28}+8.69\%}$
test_exec_functorch 0.3191ms 0.1598ms 6.2576 KOps/s 6.4917 KOps/s $\color{#d91a1a}-3.60\%$
test_exec_functional_call 0.2538ms 0.1467ms 6.8162 KOps/s 6.9761 KOps/s $\color{#d91a1a}-2.29\%$
test_exec_td 0.2265ms 0.1454ms 6.8766 KOps/s 7.1601 KOps/s $\color{#d91a1a}-3.96\%$
test_exec_td_decorator 0.6309ms 0.1800ms 5.5550 KOps/s 5.6680 KOps/s $\color{#d91a1a}-1.99\%$
test_vmap_mlp_speed[True-True] 1.3949ms 0.9073ms 1.1022 KOps/s 1.1674 KOps/s $\textbf{\color{#d91a1a}-5.58\%}$
test_vmap_mlp_speed[True-False] 0.7782ms 0.4792ms 2.0870 KOps/s 2.1886 KOps/s $\color{#d91a1a}-4.64\%$
test_vmap_mlp_speed[False-True] 0.9095ms 0.7772ms 1.2867 KOps/s 1.3312 KOps/s $\color{#d91a1a}-3.34\%$
test_vmap_mlp_speed[False-False] 0.7384ms 0.3916ms 2.5536 KOps/s 2.6498 KOps/s $\color{#d91a1a}-3.63\%$
test_vmap_mlp_speed_decorator[True-True] 4.0496ms 2.3503ms 425.4768 Ops/s 432.0034 Ops/s $\color{#d91a1a}-1.51\%$
test_vmap_mlp_speed_decorator[True-False] 0.9499ms 0.5310ms 1.8834 KOps/s 1.9559 KOps/s $\color{#d91a1a}-3.71\%$
test_vmap_mlp_speed_decorator[False-True] 2.2447ms 1.9082ms 524.0480 Ops/s 541.7868 Ops/s $\color{#d91a1a}-3.27\%$
test_vmap_mlp_speed_decorator[False-False] 0.6727ms 0.4031ms 2.4811 KOps/s 2.5234 KOps/s $\color{#d91a1a}-1.68\%$
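
The Change column is consistent with the relative difference between Ops and Ops on Repo HEAD, and the bold rows (which match the Improved/Worsened totals) all sit at roughly 5% or more in magnitude. The sketch below reproduces that arithmetic for three rows of the CPU table; the formula and the 5% threshold are inferred from the numbers, not taken from the benchmark bot's code, and `percent_change` and `classify` are hypothetical names.

```python
def percent_change(ops: float, ops_head: float) -> float:
    """Relative throughput difference of this PR versus repo HEAD, in percent."""
    return (ops - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a change; the 5% cutoff is an assumption read off the bold rows."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "unchanged"


# (Ops, Ops on Repo HEAD) in KOps/s, copied from the CPU table.
rows = {
    "test_plain_set_nested": (57.4367, 60.6672),
    "test_unlock_nested": (3.0204, 2.6028),
    "test_items": (387.2105, 402.5350),
}

for name, (ops, ops_head) in rows.items():
    change = percent_change(ops, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Both values in a row must share the same unit (Ops/s, KOps/s or MOps/s) for the ratio to be meaningful, which holds for every row shown here.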

github-actions bot commented Jan 27, 2024

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 132. Improved: 5. Worsened: 25.

Name | Max | Mean | Ops | Ops on Repo HEAD | Change
test_plain_set_nested 60.8731ms 18.4715μs 54.1374 KOps/s 77.9774 KOps/s $\textbf{\color{#d91a1a}-30.57\%}$
test_plain_set_stack_nested 0.1442ms 0.1206ms 8.2943 KOps/s 8.4160 KOps/s $\color{#d91a1a}-1.45\%$
test_plain_set_nested_inplace 40.5320μs 15.3280μs 65.2401 KOps/s 70.6622 KOps/s $\textbf{\color{#d91a1a}-7.67\%}$
test_plain_set_stack_nested_inplace 0.2027ms 0.1487ms 6.7270 KOps/s 6.7560 KOps/s $\color{#d91a1a}-0.43\%$
test_items 20.1810μs 4.7925μs 208.6576 KOps/s 208.3749 KOps/s $\color{#35bf28}+0.14\%$
test_items_nested 0.4013ms 0.3446ms 2.9020 KOps/s 2.9055 KOps/s $\color{#d91a1a}-0.12\%$
test_items_nested_locked 0.4012ms 0.3485ms 2.8696 KOps/s 2.8769 KOps/s $\color{#d91a1a}-0.25\%$
test_items_nested_leaf 0.2453ms 0.2031ms 4.9227 KOps/s 4.8956 KOps/s $\color{#35bf28}+0.55\%$
test_items_stack_nested 1.5367ms 1.3289ms 752.5024 Ops/s 759.3651 Ops/s $\color{#d91a1a}-0.90\%$
test_items_stack_nested_leaf 1.2299ms 1.1600ms 862.0700 Ops/s 867.5459 Ops/s $\color{#d91a1a}-0.63\%$
test_items_stack_nested_locked 1.8267ms 0.9157ms 1.0920 KOps/s 1.1114 KOps/s $\color{#d91a1a}-1.74\%$
test_keys 27.4320μs 4.6063μs 217.0962 KOps/s 216.5920 KOps/s $\color{#35bf28}+0.23\%$
test_keys_nested 0.5776ms 94.9936μs 10.5270 KOps/s 10.4558 KOps/s $\color{#35bf28}+0.68\%$
test_keys_nested_locked 0.1204ms 98.4507μs 10.1574 KOps/s 10.1485 KOps/s $\color{#35bf28}+0.09\%$
test_keys_nested_leaf 0.1925ms 78.5652μs 12.7283 KOps/s 12.6164 KOps/s $\color{#35bf28}+0.89\%$
test_keys_stack_nested 1.2441ms 1.1760ms 850.3402 Ops/s 862.2380 Ops/s $\color{#d91a1a}-1.38\%$
test_keys_stack_nested_leaf 1.6194ms 1.1745ms 851.4069 Ops/s 865.0660 Ops/s $\color{#d91a1a}-1.58\%$
test_keys_stack_nested_locked 0.7894ms 0.7397ms 1.3519 KOps/s 1.3780 KOps/s $\color{#d91a1a}-1.90\%$
test_values 7.7470μs 1.8929μs 528.2995 KOps/s 527.0512 KOps/s $\color{#35bf28}+0.24\%$
test_values_nested 76.0740μs 45.0151μs 22.2147 KOps/s 21.9712 KOps/s $\color{#35bf28}+1.11\%$
test_values_nested_locked 70.4530μs 47.7416μs 20.9461 KOps/s 20.8391 KOps/s $\color{#35bf28}+0.51\%$
test_values_nested_leaf 61.8040μs 39.4228μs 25.3660 KOps/s 25.0250 KOps/s $\color{#35bf28}+1.36\%$
test_values_stack_nested 1.0072ms 0.9648ms 1.0364 KOps/s 1.0352 KOps/s $\color{#35bf28}+0.12\%$
test_values_stack_nested_leaf 1.0665ms 0.9721ms 1.0287 KOps/s 1.0563 KOps/s $\color{#d91a1a}-2.61\%$
test_values_stack_nested_locked 0.6684ms 0.5827ms 1.7161 KOps/s 1.7638 KOps/s $\color{#d91a1a}-2.71\%$
test_membership 4.9702μs 0.9389μs 1.0651 MOps/s 1.0381 MOps/s $\color{#35bf28}+2.60\%$
test_membership_nested 19.4310μs 2.9268μs 341.6740 KOps/s 338.7589 KOps/s $\color{#35bf28}+0.86\%$
test_membership_nested_leaf 21.3610μs 2.9251μs 341.8675 KOps/s 340.8451 KOps/s $\color{#35bf28}+0.30\%$
test_membership_stacked_nested 34.8220μs 11.2840μs 88.6211 KOps/s 87.9461 KOps/s $\color{#35bf28}+0.77\%$
test_membership_stacked_nested_leaf 27.7010μs 11.2549μs 88.8504 KOps/s 87.6629 KOps/s $\color{#35bf28}+1.35\%$
test_membership_nested_last 21.7110μs 5.3293μs 187.6405 KOps/s 185.7365 KOps/s $\color{#35bf28}+1.03\%$
test_membership_nested_leaf_last 28.5620μs 5.3388μs 187.3079 KOps/s 184.2803 KOps/s $\color{#35bf28}+1.64\%$
test_membership_stacked_nested_last 0.2005ms 0.1582ms 6.3198 KOps/s 6.3621 KOps/s $\color{#d91a1a}-0.66\%$
test_membership_stacked_nested_leaf_last 33.5020μs 13.1620μs 75.9766 KOps/s 75.4024 KOps/s $\color{#35bf28}+0.76\%$
test_nested_getleaf 58.9640μs 8.4481μs 118.3694 KOps/s 118.2619 KOps/s $\color{#35bf28}+0.09\%$
test_nested_get 24.6320μs 7.9381μs 125.9751 KOps/s 126.0934 KOps/s $\color{#d91a1a}-0.09\%$
test_stacked_getleaf 0.3617ms 0.3314ms 3.0171 KOps/s 3.0271 KOps/s $\color{#d91a1a}-0.33\%$
test_stacked_get 0.3351ms 0.2970ms 3.3672 KOps/s 3.3678 KOps/s $\color{#d91a1a}-0.02\%$
test_nested_getitemleaf 25.9520μs 9.9076μs 100.9323 KOps/s 101.6669 KOps/s $\color{#d91a1a}-0.72\%$
test_nested_getitem 30.5820μs 9.3946μs 106.4438 KOps/s 106.7480 KOps/s $\color{#d91a1a}-0.28\%$
test_stacked_getitemleaf 0.3888ms 0.3324ms 3.0087 KOps/s 3.0002 KOps/s $\color{#35bf28}+0.28\%$
test_stacked_getitem 0.3352ms 0.2992ms 3.3418 KOps/s 3.3602 KOps/s $\color{#d91a1a}-0.55\%$
test_lock_nested 0.7640ms 0.3537ms 2.8272 KOps/s 2.8249 KOps/s $\color{#35bf28}+0.08\%$
test_lock_stack_nested 84.2726ms 6.2180ms 160.8240 Ops/s 160.2953 Ops/s $\color{#35bf28}+0.33\%$
test_unlock_nested 79.1403ms 0.4296ms 2.3275 KOps/s 2.8271 KOps/s $\textbf{\color{#d91a1a}-17.67\%}$
test_unlock_stack_nested 85.1396ms 6.2959ms 158.8347 Ops/s 154.3434 Ops/s $\color{#35bf28}+2.91\%$
test_flatten_speed 0.6603ms 0.2623ms 3.8130 KOps/s 3.8173 KOps/s $\color{#d91a1a}-0.11\%$
test_unflatten_speed 0.3891ms 0.3637ms 2.7495 KOps/s 2.7419 KOps/s $\color{#35bf28}+0.28\%$
test_common_ops 1.0858ms 0.6277ms 1.5932 KOps/s 1.7678 KOps/s $\textbf{\color{#d91a1a}-9.87\%}$
test_creation 14.1810μs 1.5806μs 632.6831 KOps/s 630.1386 KOps/s $\color{#35bf28}+0.40\%$
test_creation_empty 29.7910μs 9.2220μs 108.4362 KOps/s 145.9787 KOps/s $\textbf{\color{#d91a1a}-25.72\%}$
test_creation_nested_1 26.3010μs 10.9102μs 91.6577 KOps/s 115.7523 KOps/s $\textbf{\color{#d91a1a}-20.82\%}$
test_creation_nested_2 38.9120μs 13.3364μs 74.9827 KOps/s 90.1913 KOps/s $\textbf{\color{#d91a1a}-16.86\%}$
test_clone 40.9330μs 14.3272μs 69.7974 KOps/s 73.6557 KOps/s $\textbf{\color{#d91a1a}-5.24\%}$
test_getitem[int] 53.2820μs 11.0111μs 90.8176 KOps/s 90.3161 KOps/s $\color{#35bf28}+0.56\%$
test_getitem[slice_int] 40.4820μs 21.8214μs 45.8266 KOps/s 46.0556 KOps/s $\color{#d91a1a}-0.50\%$
test_getitem[range] 70.4640μs 37.3265μs 26.7906 KOps/s 27.6948 KOps/s $\color{#d91a1a}-3.26\%$
test_getitem[tuple] 46.0630μs 19.2025μs 52.0766 KOps/s 52.2178 KOps/s $\color{#d91a1a}-0.27\%$
test_getitem[list] 0.1900ms 34.4412μs 29.0350 KOps/s 30.3367 KOps/s $\color{#d91a1a}-4.29\%$
test_setitem_dim[int] 46.9730μs 27.3884μs 36.5118 KOps/s 40.5057 KOps/s $\textbf{\color{#d91a1a}-9.86\%}$
test_setitem_dim[slice_int] 69.1140μs 48.0078μs 20.8299 KOps/s 21.9742 KOps/s $\textbf{\color{#d91a1a}-5.21\%}$
test_setitem_dim[range] 83.7450μs 62.7352μs 15.9400 KOps/s 17.0790 KOps/s $\textbf{\color{#d91a1a}-6.67\%}$
test_setitem_dim[tuple] 63.3640μs 42.1716μs 23.7126 KOps/s 25.7688 KOps/s $\textbf{\color{#d91a1a}-7.98\%}$
test_setitem 67.5540μs 19.6594μs 50.8663 KOps/s 56.9453 KOps/s $\textbf{\color{#d91a1a}-10.68\%}$
test_set 60.3930μs 18.8469μs 53.0593 KOps/s 58.7154 KOps/s $\textbf{\color{#d91a1a}-9.63\%}$
test_set_shared 2.6227ms 0.1027ms 9.7368 KOps/s 9.9469 KOps/s $\color{#d91a1a}-2.11\%$
test_update 0.1045ms 22.0749μs 45.3004 KOps/s 53.4031 KOps/s $\textbf{\color{#d91a1a}-15.17\%}$
test_update_nested 81.2940μs 29.2098μs 34.2350 KOps/s 39.2387 KOps/s $\textbf{\color{#d91a1a}-12.75\%}$
test_set_nested 71.8440μs 20.5147μs 48.7456 KOps/s 54.3129 KOps/s $\textbf{\color{#d91a1a}-10.25\%}$
test_set_nested_new 67.2240μs 23.4656μs 42.6155 KOps/s 47.6624 KOps/s $\textbf{\color{#d91a1a}-10.59\%}$
test_select 95.9050μs 34.8613μs 28.6851 KOps/s 29.2901 KOps/s $\color{#d91a1a}-2.07\%$
test_select_nested 80.7250μs 53.1892μs 18.8008 KOps/s 18.5914 KOps/s $\color{#35bf28}+1.13\%$
test_exclude_nested 0.1582ms 0.1166ms 8.5757 KOps/s 8.5646 KOps/s $\color{#35bf28}+0.13\%$
test_empty[True] 0.4543ms 0.3917ms 2.5529 KOps/s 2.5734 KOps/s $\color{#d91a1a}-0.80\%$
test_empty[False] 2.4541μs 0.8488μs 1.1781 MOps/s 1.1808 MOps/s $\color{#d91a1a}-0.23\%$
test_to 86.7350μs 56.3852μs 17.7352 KOps/s 17.4359 KOps/s $\color{#35bf28}+1.72\%$
test_to_nonblocking 61.6840μs 33.2183μs 30.1038 KOps/s 30.3331 KOps/s $\color{#d91a1a}-0.76\%$
test_unbind_speed 0.3514ms 0.2643ms 3.7835 KOps/s 3.6951 KOps/s $\color{#35bf28}+2.39\%$
test_unbind_speed_stack0 86.9760ms 3.7360ms 267.6633 Ops/s 269.4842 Ops/s $\color{#d91a1a}-0.68\%$
test_unbind_speed_stack1 16.1810μs 1.8113μs 552.1034 KOps/s 536.9437 KOps/s $\color{#35bf28}+2.82\%$
test_split 2.2106ms 1.5627ms 639.8993 Ops/s 644.3375 Ops/s $\color{#d91a1a}-0.69\%$
test_chunk 80.2492ms 1.6873ms 592.6519 Ops/s 594.9482 Ops/s $\color{#d91a1a}-0.39\%$
test_creation[device0] 0.1326ms 73.4366μs 13.6172 KOps/s 13.8280 KOps/s $\color{#d91a1a}-1.52\%$
test_creation_from_tensor 0.1430ms 55.1165μs 18.1434 KOps/s 18.4407 KOps/s $\color{#d91a1a}-1.61\%$
test_add_one[memmap_tensor0] 0.2955ms 6.6867μs 149.5507 KOps/s 156.9194 KOps/s $\color{#d91a1a}-4.70\%$
test_contiguous[memmap_tensor0] 10.2000μs 0.6413μs 1.5593 MOps/s 1.5266 MOps/s $\color{#35bf28}+2.15\%$
test_stack[memmap_tensor0] 40.9830μs 4.7568μs 210.2262 KOps/s 223.0767 KOps/s $\textbf{\color{#d91a1a}-5.76\%}$
test_memmaptd_index 0.4827ms 0.2716ms 3.6816 KOps/s 3.7863 KOps/s $\color{#d91a1a}-2.77\%$
test_memmaptd_index_astensor 0.6015ms 0.3296ms 3.0341 KOps/s 3.1177 KOps/s $\color{#d91a1a}-2.68\%$
test_memmaptd_index_op 1.0597ms 0.6221ms 1.6075 KOps/s 1.7460 KOps/s $\textbf{\color{#d91a1a}-7.93\%}$
test_serialize_model 0.1726s 98.3780ms 10.1649 Ops/s 9.5230 Ops/s $\textbf{\color{#35bf28}+6.74\%}$
test_serialize_model_pickle 1.3523s 1.2354s 0.8094 Ops/s 0.8083 Ops/s $\color{#35bf28}+0.14\%$
test_serialize_weights 0.1710s 96.3328ms 10.3807 Ops/s 9.8065 Ops/s $\textbf{\color{#35bf28}+5.85\%}$
test_serialize_weights_returnearly 0.2574s 73.6361ms 13.5803 Ops/s 12.2528 Ops/s $\textbf{\color{#35bf28}+10.83\%}$
test_serialize_weights_pickle 1.4127s 1.2556s 0.7965 Ops/s 0.8029 Ops/s $\color{#d91a1a}-0.80\%$
test_reshape_pytree 0.2448ms 25.7302μs 38.8648 KOps/s 40.2982 KOps/s $\color{#d91a1a}-3.56\%$
test_reshape_td 0.1660ms 30.2504μs 33.0574 KOps/s 33.7388 KOps/s $\color{#d91a1a}-2.02\%$
test_view_pytree 50.3020μs 24.3412μs 41.0827 KOps/s 41.0111 KOps/s $\color{#35bf28}+0.17\%$
test_view_td 85.4040ms 10.2332μs 97.7215 KOps/s 96.5886 KOps/s $\color{#35bf28}+1.17\%$
test_unbind_pytree 76.5340μs 30.2415μs 33.0671 KOps/s 31.1602 KOps/s $\textbf{\color{#35bf28}+6.12\%}$
test_unbind_td 0.2696ms 40.4097μs 24.7465 KOps/s 24.7271 KOps/s $\color{#35bf28}+0.08\%$
test_split_pytree 63.4340μs 29.0379μs 34.4377 KOps/s 34.0835 KOps/s $\color{#35bf28}+1.04\%$
test_split_td 0.2291ms 39.2306μs 25.4903 KOps/s 25.4674 KOps/s $\color{#35bf28}+0.09\%$
test_add_pytree 62.4330μs 35.6991μs 28.0119 KOps/s 28.3715 KOps/s $\color{#d91a1a}-1.27\%$
test_add_td 0.2591ms 48.4440μs 20.6424 KOps/s 23.1501 KOps/s $\textbf{\color{#d91a1a}-10.83\%}$
test_distributed 0.1698ms 70.2550μs 14.2339 KOps/s 14.1905 KOps/s $\color{#35bf28}+0.31\%$
test_tdmodule 36.0920μs 18.6336μs 53.6665 KOps/s 57.6905 KOps/s $\textbf{\color{#d91a1a}-6.98\%}$
test_tdmodule_dispatch 0.2408ms 38.7372μs 25.8150 KOps/s 28.1432 KOps/s $\textbf{\color{#d91a1a}-8.27\%}$
test_tdseq 43.2620μs 21.2771μs 46.9990 KOps/s 49.9618 KOps/s $\textbf{\color{#d91a1a}-5.93\%}$
test_tdseq_dispatch 61.2240μs 40.5436μs 24.6648 KOps/s 26.0385 KOps/s $\textbf{\color{#d91a1a}-5.28\%}$
test_instantiation_functorch 1.8926ms 1.7017ms 587.6309 Ops/s 585.7720 Ops/s $\color{#35bf28}+0.32\%$
test_instantiation_td 1.7101ms 1.1686ms 855.7238 Ops/s 851.9586 Ops/s $\color{#35bf28}+0.44\%$
test_exec_functorch 0.2114ms 0.1601ms 6.2465 KOps/s 6.3162 KOps/s $\color{#d91a1a}-1.10\%$
test_exec_functional_call 0.3835ms 0.1576ms 6.3442 KOps/s 6.2859 KOps/s $\color{#35bf28}+0.93\%$
test_exec_td 0.1851ms 0.1499ms 6.6694 KOps/s 6.7205 KOps/s $\color{#d91a1a}-0.76\%$
test_exec_td_decorator 0.7095ms 0.1865ms 5.3609 KOps/s 5.3535 KOps/s $\color{#35bf28}+0.14\%$
test_vmap_mlp_speed[True-True] 1.2389ms 1.0331ms 967.9283 Ops/s 938.2196 Ops/s $\color{#35bf28}+3.17\%$
test_vmap_mlp_speed[True-False] 0.7942ms 0.5927ms 1.6871 KOps/s 1.6790 KOps/s $\color{#35bf28}+0.48\%$
test_vmap_mlp_speed[False-True] 1.1621ms 0.9435ms 1.0599 KOps/s 1.0274 KOps/s $\color{#35bf28}+3.16\%$
test_vmap_mlp_speed[False-False] 0.7288ms 0.5206ms 1.9207 KOps/s 1.8490 KOps/s $\color{#35bf28}+3.88\%$
test_vmap_mlp_speed_decorator[True-True] 2.7908ms 2.3316ms 428.8848 Ops/s 430.6745 Ops/s $\color{#d91a1a}-0.42\%$
test_vmap_mlp_speed_decorator[True-False] 0.9303ms 0.6397ms 1.5632 KOps/s 1.5339 KOps/s $\color{#35bf28}+1.90\%$
test_vmap_mlp_speed_decorator[False-True] 2.4798ms 2.0365ms 491.0445 Ops/s 509.3174 Ops/s $\color{#d91a1a}-3.59\%$
test_vmap_mlp_speed_decorator[False-False] 0.9284ms 0.5566ms 1.7968 KOps/s 1.8075 KOps/s $\color{#d91a1a}-0.59\%$
test_vmap_transformer_speed[True-True] 12.7275ms 12.4098ms 80.5813 Ops/s 78.0809 Ops/s $\color{#35bf28}+3.20\%$
test_vmap_transformer_speed[True-False] 8.6347ms 8.1606ms 122.5395 Ops/s 122.8277 Ops/s $\color{#d91a1a}-0.23\%$
test_vmap_transformer_speed[False-True] 12.7887ms 12.2898ms 81.3686 Ops/s 81.3298 Ops/s $\color{#35bf28}+0.05\%$
test_vmap_transformer_speed[False-False] 8.5373ms 8.0562ms 124.1286 Ops/s 123.7357 Ops/s $\color{#35bf28}+0.32\%$
test_vmap_transformer_speed_decorator[True-True] 74.1191ms 73.3315ms 13.6367 Ops/s 13.5567 Ops/s $\color{#35bf28}+0.59\%$
test_vmap_transformer_speed_decorator[True-False] 20.5901ms 19.0270ms 52.5569 Ops/s 45.4604 Ops/s $\textbf{\color{#35bf28}+15.61\%}$
test_vmap_transformer_speed_decorator[False-True] 66.6361ms 65.5729ms 15.2502 Ops/s 15.0882 Ops/s $\color{#35bf28}+1.07\%$
test_vmap_transformer_speed_decorator[False-False] 20.3060ms 18.6686ms 53.5660 Ops/s 52.8300 Ops/s $\color{#35bf28}+1.39\%$

@vmoens merged commit 169b259 into main Jan 27, 2024
33 of 40 checks passed
@vmoens deleted the complete-named-apply branch January 27, 2024 16:09