[BugFix] _PASSTHROUGH_MEMO for passthrough tensorclass #1231

Merged
vmoens merged 2 commits into gh/vmoens/48/base from gh/vmoens/48/head on Feb 24, 2025

vmoens (Contributor) commented Feb 24, 2025

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Feb 24, 2025
ghstack-source-id: 20e6f797afe30a4bd8f45fcd4e3b9a8f5af4bb4d
Pull Request resolved: #1231
facebook-github-bot added the CLA Signed label Feb 24, 2025
[ghstack-poisoned]
vmoens added a commit that referenced this pull request Feb 24, 2025
ghstack-source-id: 0bfbfc9f6700f1165fcfd6b38f65fa4fd806be80
Pull Request resolved: #1231
vmoens merged commit 4674919 into gh/vmoens/48/base Feb 24, 2025
5 checks passed
vmoens added a commit that referenced this pull request Feb 24, 2025
ghstack-source-id: 0bfbfc9f6700f1165fcfd6b38f65fa4fd806be80
Pull Request resolved: #1231
vmoens deleted the gh/vmoens/48/head branch February 24, 2025 14:41

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 217. Improved: $\large\color{#35bf28}17$. Worsened: $\large\color{#d91a1a}12$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 42.2090μs 21.0452μs 47.5169 KOps/s 49.3762 KOps/s $\color{#d91a1a}-3.77\%$
test_plain_set_stack_nested 84.4190μs 20.9759μs 47.6737 KOps/s 47.6987 KOps/s $\color{#d91a1a}-0.05\%$
test_plain_set_nested_inplace 49.4830μs 22.5617μs 44.3228 KOps/s 44.4651 KOps/s $\color{#d91a1a}-0.32\%$
test_plain_set_stack_nested_inplace 85.6910μs 22.9341μs 43.6033 KOps/s 44.9390 KOps/s $\color{#d91a1a}-2.97\%$
test_items 38.6830μs 4.2249μs 236.6896 KOps/s 235.3418 KOps/s $\color{#35bf28}+0.57\%$
test_items_nested 0.5920ms 0.4131ms 2.4209 KOps/s 2.4636 KOps/s $\color{#d91a1a}-1.74\%$
test_items_nested_locked 0.7647ms 0.4087ms 2.4468 KOps/s 2.4765 KOps/s $\color{#d91a1a}-1.20\%$
test_items_nested_leaf 0.2086ms 78.5919μs 12.7240 KOps/s 13.0317 KOps/s $\color{#d91a1a}-2.36\%$
test_items_stack_nested 0.8268ms 0.4104ms 2.4367 KOps/s 2.4657 KOps/s $\color{#d91a1a}-1.17\%$
test_items_stack_nested_leaf 0.1484ms 78.3256μs 12.7672 KOps/s 12.9147 KOps/s $\color{#d91a1a}-1.14\%$
test_items_stack_nested_locked 0.5526ms 0.4096ms 2.4415 KOps/s 2.4346 KOps/s $\color{#35bf28}+0.29\%$
test_keys 28.3340μs 3.5972μs 277.9941 KOps/s 282.8771 KOps/s $\color{#d91a1a}-1.73\%$
test_keys_nested 0.2686ms 0.1666ms 6.0010 KOps/s 6.0553 KOps/s $\color{#d91a1a}-0.90\%$
test_keys_nested_locked 2.8613ms 0.1722ms 5.8085 KOps/s 5.8531 KOps/s $\color{#d91a1a}-0.76\%$
test_keys_nested_leaf 0.2369ms 0.1451ms 6.8922 KOps/s 6.9563 KOps/s $\color{#d91a1a}-0.92\%$
test_keys_stack_nested 0.3152ms 0.1668ms 5.9959 KOps/s 6.0848 KOps/s $\color{#d91a1a}-1.46\%$
test_keys_stack_nested_leaf 0.2319ms 0.1447ms 6.9125 KOps/s 7.0270 KOps/s $\color{#d91a1a}-1.63\%$
test_keys_stack_nested_locked 0.2463ms 0.1745ms 5.7311 KOps/s 5.9025 KOps/s $\color{#d91a1a}-2.90\%$
test_values 11.1450μs 1.0626μs 941.0859 KOps/s 984.9276 KOps/s $\color{#d91a1a}-4.45\%$
test_values_nested 0.1271ms 63.8692μs 15.6570 KOps/s 16.2091 KOps/s $\color{#d91a1a}-3.41\%$
test_values_nested_locked 0.1478ms 64.6662μs 15.4640 KOps/s 16.1170 KOps/s $\color{#d91a1a}-4.05\%$
test_values_nested_leaf 0.1570ms 73.2387μs 13.6540 KOps/s 14.1280 KOps/s $\color{#d91a1a}-3.35\%$
test_values_stack_nested 0.1346ms 64.3068μs 15.5505 KOps/s 15.6120 KOps/s $\color{#d91a1a}-0.39\%$
test_values_stack_nested_leaf 0.1460ms 72.9270μs 13.7123 KOps/s 14.0289 KOps/s $\color{#d91a1a}-2.26\%$
test_values_stack_nested_locked 0.1456ms 64.0185μs 15.6205 KOps/s 16.1190 KOps/s $\color{#d91a1a}-3.09\%$
test_membership 15.8390μs 0.9259μs 1.0800 MOps/s 1.1573 MOps/s $\textbf{\color{#d91a1a}-6.68\%}$
test_membership_nested 25.9590μs 3.0327μs 329.7413 KOps/s 347.8855 KOps/s $\textbf{\color{#d91a1a}-5.22\%}$
test_membership_nested_leaf 67.5060μs 2.9550μs 338.4092 KOps/s 345.8257 KOps/s $\color{#d91a1a}-2.14\%$
test_membership_stacked_nested 35.1160μs 2.9984μs 333.5165 KOps/s 346.2982 KOps/s $\color{#d91a1a}-3.69\%$
test_membership_stacked_nested_leaf 49.0560μs 2.8576μs 349.9475 KOps/s 344.5824 KOps/s $\color{#35bf28}+1.56\%$
test_membership_nested_last 52.4890μs 4.4175μs 226.3718 KOps/s 229.7431 KOps/s $\color{#d91a1a}-1.47\%$
test_membership_nested_leaf_last 29.0250μs 4.3041μs 232.3376 KOps/s 227.7414 KOps/s $\color{#35bf28}+2.02\%$
test_membership_stacked_nested_last 24.8670μs 4.4760μs 223.4147 KOps/s 230.4807 KOps/s $\color{#d91a1a}-3.07\%$
test_membership_stacked_nested_leaf_last 35.0050μs 4.3239μs 231.2743 KOps/s 227.1677 KOps/s $\color{#35bf28}+1.81\%$
test_nested_getleaf 64.0970μs 10.4943μs 95.2897 KOps/s 94.9816 KOps/s $\color{#35bf28}+0.32\%$
test_nested_get 48.9920μs 9.8622μs 101.3976 KOps/s 100.7958 KOps/s $\color{#35bf28}+0.60\%$
test_stacked_getleaf 33.9440μs 10.5966μs 94.3700 KOps/s 95.1953 KOps/s $\color{#d91a1a}-0.87\%$
test_stacked_get 37.5610μs 10.2217μs 97.8313 KOps/s 98.7389 KOps/s $\color{#d91a1a}-0.92\%$
test_nested_getitemleaf 44.7940μs 11.2375μs 88.9876 KOps/s 87.7183 KOps/s $\color{#35bf28}+1.45\%$
test_nested_getitem 60.5940μs 10.8212μs 92.4111 KOps/s 92.6202 KOps/s $\color{#d91a1a}-0.23\%$
test_stacked_getitemleaf 53.6410μs 11.1529μs 89.6631 KOps/s 88.3512 KOps/s $\color{#35bf28}+1.48\%$
test_stacked_getitem 38.0820μs 10.7405μs 93.1059 KOps/s 93.6451 KOps/s $\color{#d91a1a}-0.58\%$
test_lock_nested 0.7321ms 0.4183ms 2.3908 KOps/s 2.4456 KOps/s $\color{#d91a1a}-2.24\%$
test_lock_stack_nested 0.6962ms 0.4291ms 2.3304 KOps/s 2.3646 KOps/s $\color{#d91a1a}-1.45\%$
test_unlock_nested 0.5227ms 0.3402ms 2.9398 KOps/s 2.9697 KOps/s $\color{#d91a1a}-1.01\%$
test_unlock_stack_nested 0.4553ms 0.3440ms 2.9068 KOps/s 2.9053 KOps/s $\color{#35bf28}+0.05\%$
test_flatten_speed 0.1930ms 99.4073μs 10.0596 KOps/s 9.8910 KOps/s $\color{#35bf28}+1.70\%$
test_unflatten_speed 0.5990ms 0.5088ms 1.9654 KOps/s 1.8866 KOps/s $\color{#35bf28}+4.17\%$
test_common_ops 1.3485ms 0.8125ms 1.2308 KOps/s 1.2369 KOps/s $\color{#d91a1a}-0.49\%$
test_creation 49.6540μs 2.4773μs 403.6594 KOps/s 390.5947 KOps/s $\color{#35bf28}+3.34\%$
test_creation_empty 43.5820μs 12.7938μs 78.1629 KOps/s 84.9393 KOps/s $\textbf{\color{#d91a1a}-7.98\%}$
test_creation_nested_1 44.7240μs 15.6198μs 64.0212 KOps/s 69.1661 KOps/s $\textbf{\color{#d91a1a}-7.44\%}$
test_creation_nested_2 52.0280μs 20.2665μs 49.3425 KOps/s 52.2632 KOps/s $\textbf{\color{#d91a1a}-5.59\%}$
test_clone 83.5770μs 13.2491μs 75.4770 KOps/s 72.4944 KOps/s $\color{#35bf28}+4.11\%$
test_getitem[int] 1.2785ms 12.8804μs 77.6376 KOps/s 79.7445 KOps/s $\color{#d91a1a}-2.64\%$
test_getitem[slice_int] 0.1354ms 24.3660μs 41.0408 KOps/s 41.4256 KOps/s $\color{#d91a1a}-0.93\%$
test_getitem[range] 0.1679ms 49.3091μs 20.2803 KOps/s 18.9164 KOps/s $\textbf{\color{#35bf28}+7.21\%}$
test_getitem[tuple] 0.1324ms 19.7607μs 50.6055 KOps/s 49.3418 KOps/s $\color{#35bf28}+2.56\%$
test_getitem[list] 0.1780ms 45.7107μs 21.8767 KOps/s 20.9861 KOps/s $\color{#35bf28}+4.24\%$
test_setitem_dim[int] 46.1870μs 25.0008μs 39.9987 KOps/s 39.4426 KOps/s $\color{#35bf28}+1.41\%$
test_setitem_dim[slice_int] 75.5320μs 51.1797μs 19.5390 KOps/s 19.1939 KOps/s $\color{#35bf28}+1.80\%$
test_setitem_dim[range] 0.1330ms 76.0866μs 13.1429 KOps/s 12.9201 KOps/s $\color{#35bf28}+1.72\%$
test_setitem_dim[tuple] 72.8270μs 40.2747μs 24.8295 KOps/s 24.8485 KOps/s $\color{#d91a1a}-0.08\%$
test_setitem 0.1417ms 20.5686μs 48.6178 KOps/s 48.3601 KOps/s $\color{#35bf28}+0.53\%$
test_set 92.0630μs 20.1596μs 49.6042 KOps/s 49.7350 KOps/s $\color{#d91a1a}-0.26\%$
test_set_shared 5.2948ms 0.1857ms 5.3862 KOps/s 5.2656 KOps/s $\color{#35bf28}+2.29\%$
test_update 0.1535ms 23.5321μs 42.4951 KOps/s 43.4889 KOps/s $\color{#d91a1a}-2.29\%$
test_update_nested 0.1410ms 34.6457μs 28.8636 KOps/s 29.2521 KOps/s $\color{#d91a1a}-1.33\%$
test_update__nested 0.6510ms 32.5463μs 30.7255 KOps/s 28.0114 KOps/s $\textbf{\color{#35bf28}+9.69\%}$
test_set_nested 0.1521ms 23.2432μs 43.0234 KOps/s 44.4850 KOps/s $\color{#d91a1a}-3.29\%$
test_set_nested_new 89.6080μs 26.7771μs 37.3454 KOps/s 37.3104 KOps/s $\color{#35bf28}+0.09\%$
test_select 86.9530μs 43.1368μs 23.1821 KOps/s 23.6335 KOps/s $\color{#d91a1a}-1.91\%$
test_select_nested 0.1148ms 63.5673μs 15.7314 KOps/s 15.9366 KOps/s $\color{#d91a1a}-1.29\%$
test_exclude_nested 0.1559ms 81.4935μs 12.2709 KOps/s 12.3852 KOps/s $\color{#d91a1a}-0.92\%$
test_empty[True] 0.5610ms 0.4071ms 2.4561 KOps/s 2.4540 KOps/s $\color{#35bf28}+0.08\%$
test_empty[False] 13.3750μs 1.4019μs 713.3428 KOps/s 725.5836 KOps/s $\color{#d91a1a}-1.69\%$
test_unbind_speed 1.8861ms 0.2722ms 3.6740 KOps/s 3.6998 KOps/s $\color{#d91a1a}-0.70\%$
test_unbind_speed_stack0 0.3579ms 0.2701ms 3.7026 KOps/s 3.7373 KOps/s $\color{#d91a1a}-0.93\%$
test_unbind_speed_stack1 0.1338s 0.7651ms 1.3070 KOps/s 1.1261 KOps/s $\textbf{\color{#35bf28}+16.06\%}$
test_split 0.1339s 1.8252ms 547.8953 Ops/s 639.5275 Ops/s $\textbf{\color{#d91a1a}-14.33\%}$
test_chunk 0.1457s 1.8283ms 546.9427 Ops/s 547.1949 Ops/s $\color{#d91a1a}-0.05\%$
test_consolidate_njt[False-None] 8.6810ms 8.3276ms 120.0826 Ops/s 101.3909 Ops/s $\textbf{\color{#35bf28}+18.44\%}$
test_creation[device0] 4.6533ms 95.2077μs 10.5034 KOps/s 10.5207 KOps/s $\color{#d91a1a}-0.17\%$
test_creation_from_tensor 0.3263ms 96.3441μs 10.3795 KOps/s 9.9934 KOps/s $\color{#35bf28}+3.86\%$
test_add_one[memmap_tensor0] 84.0880μs 4.8545μs 205.9934 KOps/s 203.9354 KOps/s $\color{#35bf28}+1.01\%$
test_contiguous[memmap_tensor0] 10.8700μs 0.5170μs 1.9341 MOps/s 1.9882 MOps/s $\color{#d91a1a}-2.72\%$
test_stack[memmap_tensor0] 26.7100μs 3.6004μs 277.7431 KOps/s 296.5546 KOps/s $\textbf{\color{#d91a1a}-6.34\%}$
test_memmaptd_index 1.4028ms 0.2299ms 4.3488 KOps/s 4.2704 KOps/s $\color{#35bf28}+1.84\%$
test_memmaptd_index_astensor 0.7257ms 0.3170ms 3.1548 KOps/s 3.1127 KOps/s $\color{#35bf28}+1.35\%$
test_memmaptd_index_op 0.9716ms 0.6033ms 1.6576 KOps/s 1.6801 KOps/s $\color{#d91a1a}-1.34\%$
test_serialize_model 0.2651s 0.1453s 6.8821 Ops/s 7.9163 Ops/s $\textbf{\color{#d91a1a}-13.06\%}$
test_serialize_model_pickle 0.4511s 0.3976s 2.5152 Ops/s 2.3616 Ops/s $\textbf{\color{#35bf28}+6.50\%}$
test_serialize_weights 0.1348s 0.1280s 7.8138 Ops/s 7.6883 Ops/s $\color{#35bf28}+1.63\%$
test_serialize_weights_returnearly 0.1879s 0.1709s 5.8525 Ops/s 5.7411 Ops/s $\color{#35bf28}+1.94\%$
test_serialize_weights_pickle 0.4521s 0.4086s 2.4474 Ops/s 2.4988 Ops/s $\color{#d91a1a}-2.05\%$
test_serialize_weights_filesystem 0.1823s 0.1588s 6.2960 Ops/s 6.3282 Ops/s $\color{#d91a1a}-0.51\%$
test_serialize_model_filesystem 0.1747s 0.1616s 6.1894 Ops/s 5.8527 Ops/s $\textbf{\color{#35bf28}+5.75\%}$
test_reshape_pytree 53.4700μs 25.5650μs 39.1159 KOps/s 35.5397 KOps/s $\textbf{\color{#35bf28}+10.06\%}$
test_reshape_td 77.4460μs 33.0927μs 30.2181 KOps/s 31.1868 KOps/s $\color{#d91a1a}-3.11\%$
test_view_pytree 0.1165ms 26.7742μs 37.3495 KOps/s 38.3530 KOps/s $\color{#d91a1a}-2.62\%$
test_view_td 87.3040μs 40.0598μs 24.9627 KOps/s 25.4435 KOps/s $\color{#d91a1a}-1.89\%$
test_unbind_pytree 63.1690μs 28.7022μs 34.8405 KOps/s 33.5707 KOps/s $\color{#35bf28}+3.78\%$
test_unbind_td 0.3615ms 42.0822μs 23.7630 KOps/s 25.5185 KOps/s $\textbf{\color{#d91a1a}-6.88\%}$
test_split_pytree 81.2520μs 28.6733μs 34.8756 KOps/s 34.7307 KOps/s $\color{#35bf28}+0.42\%$
test_split_td 0.5258ms 45.9478μs 21.7638 KOps/s 22.2104 KOps/s $\color{#d91a1a}-2.01\%$
test_add_pytree 87.4240μs 36.4592μs 27.4280 KOps/s 27.6046 KOps/s $\color{#d91a1a}-0.64\%$
test_add_td 0.1351ms 57.2883μs 17.4556 KOps/s 17.1320 KOps/s $\color{#35bf28}+1.89\%$
test_compile_add_one_nested[tensordict-compile] 0.1619ms 70.4489μs 14.1947 KOps/s 14.1547 KOps/s $\color{#35bf28}+0.28\%$
test_compile_add_one_nested[tensordict-eager] 0.4018ms 0.1763ms 5.6723 KOps/s 5.6395 KOps/s $\color{#35bf28}+0.58\%$
test_compile_add_one_nested[pytree-compile] 0.1194ms 47.1115μs 21.2262 KOps/s 21.2217 KOps/s $\color{#35bf28}+0.02\%$
test_compile_add_one_nested[pytree-eager] 0.2363ms 0.1237ms 8.0830 KOps/s 8.2148 KOps/s $\color{#d91a1a}-1.60\%$
test_compile_copy_nested[tensordict-compile] 95.0990μs 30.2773μs 33.0280 KOps/s 35.0702 KOps/s $\textbf{\color{#d91a1a}-5.82\%}$
test_compile_copy_nested[tensordict-eager] 0.1151ms 61.1140μs 16.3629 KOps/s 16.8100 KOps/s $\color{#d91a1a}-2.66\%$
test_compile_copy_nested[pytree-compile] 0.1784ms 83.4628μs 11.9814 KOps/s 12.4490 KOps/s $\color{#d91a1a}-3.76\%$
test_compile_copy_nested[pytree-eager] 0.1282ms 69.7346μs 14.3401 KOps/s 14.8027 KOps/s $\color{#d91a1a}-3.12\%$
test_compile_add_one_flat[tensordict-compile] 0.2164ms 0.1154ms 8.6671 KOps/s 8.8913 KOps/s $\color{#d91a1a}-2.52\%$
test_compile_add_one_flat[tensordict-eager] 0.3244ms 0.2205ms 4.5350 KOps/s 4.5061 KOps/s $\color{#35bf28}+0.64\%$
test_compile_add_one_flat[tensorclass-compile] 0.1161ms 50.2173μs 19.9135 KOps/s 20.8161 KOps/s $\color{#d91a1a}-4.34\%$
test_compile_add_one_flat[tensorclass-eager] 0.2473ms 69.1762μs 14.4558 KOps/s 14.7806 KOps/s $\color{#d91a1a}-2.20\%$
test_compile_add_one_flat[pytree-compile] 0.2260ms 0.1071ms 9.3401 KOps/s 9.7552 KOps/s $\color{#d91a1a}-4.26\%$
test_compile_add_one_flat[pytree-eager] 0.4173ms 0.2096ms 4.7705 KOps/s 4.6853 KOps/s $\color{#35bf28}+1.82\%$
test_compile_add_self_flat[tensordict-eager] 0.3394ms 0.2378ms 4.2051 KOps/s 4.2132 KOps/s $\color{#d91a1a}-0.19\%$
test_compile_add_self_flat[tensordict-compile] 0.2087ms 0.1149ms 8.7046 KOps/s 8.9404 KOps/s $\color{#d91a1a}-2.64\%$
test_compile_add_self_flat[tensorclass-eager] 0.1460ms 65.1199μs 15.3563 KOps/s 15.9984 KOps/s $\color{#d91a1a}-4.01\%$
test_compile_add_self_flat[tensorclass-compile] 0.1040ms 50.2513μs 19.9000 KOps/s 20.1383 KOps/s $\color{#d91a1a}-1.18\%$
test_compile_add_self_flat[pytree-eager] 0.2590ms 0.1624ms 6.1563 KOps/s 6.0630 KOps/s $\color{#35bf28}+1.54\%$
test_compile_add_self_flat[pytree-compile] 0.1762ms 0.1071ms 9.3377 KOps/s 9.5226 KOps/s $\color{#d91a1a}-1.94\%$
test_compile_copy_flat[tensordict-compile] 0.1019ms 22.6322μs 44.1848 KOps/s 44.5568 KOps/s $\color{#d91a1a}-0.83\%$
test_compile_copy_flat[tensordict-eager] 0.1392ms 69.3440μs 14.4209 KOps/s 14.7914 KOps/s $\color{#d91a1a}-2.51\%$
test_compile_copy_flat[pytree-compile] 0.1640ms 82.8661μs 12.0677 KOps/s 11.9368 KOps/s $\color{#35bf28}+1.10\%$
test_compile_copy_flat[pytree-eager] 0.1324ms 68.6213μs 14.5727 KOps/s 14.3533 KOps/s $\color{#35bf28}+1.53\%$
test_compile_assign_and_add[tensordict-compile] 0.3493ms 0.2205ms 4.5353 KOps/s 4.6536 KOps/s $\color{#d91a1a}-2.54\%$
test_compile_assign_and_add[tensordict-eager] 1.7476ms 1.4123ms 708.0547 Ops/s 728.8034 Ops/s $\color{#d91a1a}-2.85\%$
test_compile_assign_and_add[pytree-compile] 0.3076ms 0.2150ms 4.6521 KOps/s 4.8367 KOps/s $\color{#d91a1a}-3.82\%$
test_compile_assign_and_add[pytree-eager] 1.1077ms 0.8444ms 1.1843 KOps/s 1.2023 KOps/s $\color{#d91a1a}-1.50\%$
test_compile_assign_and_add_stack[compile] 0.6218ms 0.4640ms 2.1553 KOps/s 2.2391 KOps/s $\color{#d91a1a}-3.74\%$
test_compile_assign_and_add_stack[eager] 4.7263ms 2.8039ms 356.6493 Ops/s 372.9809 Ops/s $\color{#d91a1a}-4.38\%$
test_compile_indexing[tensor-tensordict-compile] 0.1488ms 40.0100μs 24.9938 KOps/s 25.9530 KOps/s $\color{#d91a1a}-3.70\%$
test_compile_indexing[tensor-tensordict-eager] 1.0200ms 33.3702μs 29.9669 KOps/s 30.6405 KOps/s $\color{#d91a1a}-2.20\%$
test_compile_indexing[tensor-tensorclass-compile] 97.0720μs 31.6563μs 31.5893 KOps/s 32.2200 KOps/s $\color{#d91a1a}-1.96\%$
test_compile_indexing[tensor-tensorclass-eager] 0.1084ms 23.5255μs 42.5071 KOps/s 43.3916 KOps/s $\color{#d91a1a}-2.04\%$
test_compile_indexing[tensor-pytree-compile] 0.1130ms 33.4641μs 29.8827 KOps/s 30.5832 KOps/s $\color{#d91a1a}-2.29\%$
test_compile_indexing[tensor-pytree-eager] 63.4890μs 22.7677μs 43.9218 KOps/s 43.1932 KOps/s $\color{#35bf28}+1.69\%$
test_compile_indexing[slice-tensordict-compile] 0.1162ms 54.2298μs 18.4401 KOps/s 18.8890 KOps/s $\color{#d91a1a}-2.38\%$
test_compile_indexing[slice-tensordict-eager] 0.4004ms 20.5951μs 48.5553 KOps/s 48.8054 KOps/s $\color{#d91a1a}-0.51\%$
test_compile_indexing[slice-tensorclass-compile] 0.1062ms 47.3615μs 21.1142 KOps/s 21.6377 KOps/s $\color{#d91a1a}-2.42\%$
test_compile_indexing[slice-tensorclass-eager] 81.4840μs 18.5614μs 53.8753 KOps/s 53.6727 KOps/s $\color{#35bf28}+0.38\%$
test_compile_indexing[slice-pytree-compile] 0.1439ms 47.5095μs 21.0484 KOps/s 21.3784 KOps/s $\color{#d91a1a}-1.54\%$
test_compile_indexing[slice-pytree-eager] 87.1930μs 18.9727μs 52.7074 KOps/s 50.6956 KOps/s $\color{#35bf28}+3.97\%$
test_compile_indexing[int-tensordict-compile] 0.1394ms 56.3316μs 17.7520 KOps/s 18.3793 KOps/s $\color{#d91a1a}-3.41\%$
test_compile_indexing[int-tensordict-eager] 1.1436ms 19.4814μs 51.3311 KOps/s 49.7575 KOps/s $\color{#35bf28}+3.16\%$
test_compile_indexing[int-tensorclass-compile] 0.1335ms 48.2112μs 20.7421 KOps/s 21.0795 KOps/s $\color{#d91a1a}-1.60\%$
test_compile_indexing[int-tensorclass-eager] 64.6720μs 18.5929μs 53.7838 KOps/s 53.1513 KOps/s $\color{#35bf28}+1.19\%$
test_compile_indexing[int-pytree-compile] 0.1126ms 46.9638μs 21.2930 KOps/s 21.2631 KOps/s $\color{#35bf28}+0.14\%$
test_compile_indexing[int-pytree-eager] 89.6080μs 19.1920μs 52.1050 KOps/s 53.6402 KOps/s $\color{#d91a1a}-2.86\%$
test_mod_add[eager] 99.6370μs 37.2548μs 26.8421 KOps/s 27.6797 KOps/s $\color{#d91a1a}-3.03\%$
test_mod_add[compile] 0.1486ms 66.8963μs 14.9485 KOps/s 15.6244 KOps/s $\color{#d91a1a}-4.33\%$
test_mod_add[compile-overhead] 0.1537ms 64.2478μs 15.5647 KOps/s 15.5918 KOps/s $\color{#d91a1a}-0.17\%$
test_mod_wrap[eager] 0.3455ms 0.2211ms 4.5222 KOps/s 4.4092 KOps/s $\color{#35bf28}+2.56\%$
test_mod_wrap[compile] 2.1238ms 0.2264ms 4.4177 KOps/s 4.3001 KOps/s $\color{#35bf28}+2.73\%$
test_mod_wrap[compile-overhead] 0.3817ms 0.2303ms 4.3420 KOps/s 4.2971 KOps/s $\color{#35bf28}+1.05\%$
test_mod_wrap_and_backward[eager] 18.9063ms 14.1339ms 70.7521 Ops/s 78.7150 Ops/s $\textbf{\color{#d91a1a}-10.12\%}$
test_mod_wrap_and_backward[compile] 16.4171ms 12.6860ms 78.8271 Ops/s 71.6513 Ops/s $\textbf{\color{#35bf28}+10.01\%}$
test_mod_wrap_and_backward[compile-overhead] 16.2723ms 12.6032ms 79.3452 Ops/s 78.1036 Ops/s $\color{#35bf28}+1.59\%$
test_seq_add[eager] 0.2415ms 0.1179ms 8.4827 KOps/s 8.0832 KOps/s $\color{#35bf28}+4.94\%$
test_seq_add[compile] 0.1346ms 76.9590μs 12.9939 KOps/s 12.4177 KOps/s $\color{#35bf28}+4.64\%$
test_seq_add[compile-overhead] 0.1819ms 75.5170μs 13.2420 KOps/s 12.9095 KOps/s $\color{#35bf28}+2.58\%$
test_seq_wrap[eager] 0.7683ms 0.4591ms 2.1784 KOps/s 2.0808 KOps/s $\color{#35bf28}+4.69\%$
test_seq_wrap[compile] 0.4380ms 0.2395ms 4.1751 KOps/s 3.9246 KOps/s $\textbf{\color{#35bf28}+6.38\%}$
test_seq_wrap[compile-overhead] 0.4598ms 0.2401ms 4.1654 KOps/s 3.7452 KOps/s $\textbf{\color{#35bf28}+11.22\%}$
test_func_call_runtime[False-eager] 0.6861ms 0.5443ms 1.8371 KOps/s 1.7735 KOps/s $\color{#35bf28}+3.58\%$
test_func_call_runtime[False-compile] 0.6100ms 0.4515ms 2.2148 KOps/s 2.1605 KOps/s $\color{#35bf28}+2.51\%$
test_func_call_runtime[False-compile-overhead] 0.6515ms 0.4469ms 2.2378 KOps/s 2.1823 KOps/s $\color{#35bf28}+2.54\%$
test_func_call_runtime[True-eager] 1.0973ms 0.7599ms 1.3159 KOps/s 1.2804 KOps/s $\color{#35bf28}+2.77\%$
test_func_call_runtime[True-compile] 0.6659ms 0.4756ms 2.1025 KOps/s 2.0906 KOps/s $\color{#35bf28}+0.57\%$
test_func_call_runtime[True-compile-overhead] 0.7267ms 0.4780ms 2.0920 KOps/s 2.0125 KOps/s $\color{#35bf28}+3.95\%$
test_func_call_cm_runtime[False-eager] 0.9398ms 0.5504ms 1.8167 KOps/s 1.8068 KOps/s $\color{#35bf28}+0.55\%$
test_func_call_cm_runtime[False-compile] 0.6174ms 0.4493ms 2.2255 KOps/s 2.1720 KOps/s $\color{#35bf28}+2.46\%$
test_func_call_cm_runtime[False-compile-overhead] 0.7333ms 0.4534ms 2.2056 KOps/s 2.1541 KOps/s $\color{#35bf28}+2.39\%$
test_func_call_cm_runtime[True-eager] 1.4935ms 0.9111ms 1.0976 KOps/s 1.0734 KOps/s $\color{#35bf28}+2.26\%$
test_func_call_cm_runtime[True-compile] 1.2639ms 0.7961ms 1.2561 KOps/s 1.2169 KOps/s $\color{#35bf28}+3.22\%$
test_func_call_cm_runtime[True-compile-overhead] 1.4830ms 0.8148ms 1.2273 KOps/s 1.2284 KOps/s $\color{#d91a1a}-0.09\%$
test_vmap_func_call_cm_runtime[eager] 2.6346ms 1.9694ms 507.7778 Ops/s 505.5961 Ops/s $\color{#35bf28}+0.43\%$
test_vmap_func_call_cm_runtime[compile] 1.0489ms 0.5518ms 1.8122 KOps/s 1.7577 KOps/s $\color{#35bf28}+3.10\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.9299ms 0.5535ms 1.8068 KOps/s 1.7677 KOps/s $\color{#35bf28}+2.21\%$
test_distributed 0.8577ms 0.1248ms 8.0100 KOps/s 7.6582 KOps/s $\color{#35bf28}+4.59\%$
test_tdmodule 52.0670μs 28.1916μs 35.4716 KOps/s 33.5914 KOps/s $\textbf{\color{#35bf28}+5.60\%}$
test_tdmodule_dispatch 93.8670μs 51.3967μs 19.4565 KOps/s 19.4296 KOps/s $\color{#35bf28}+0.14\%$
test_tdseq 83.2380μs 30.8061μs 32.4611 KOps/s 33.8466 KOps/s $\color{#d91a1a}-4.09\%$
test_tdseq_dispatch 83.8280μs 57.8281μs 17.2926 KOps/s 18.1625 KOps/s $\color{#d91a1a}-4.79\%$
test_instantiation_functorch 2.4054ms 1.5589ms 641.4612 Ops/s 646.2565 Ops/s $\color{#d91a1a}-0.74\%$
test_exec_functorch 0.3126ms 0.1796ms 5.5677 KOps/s 5.6112 KOps/s $\color{#d91a1a}-0.77\%$
test_exec_functional_call 0.3200ms 0.1721ms 5.8092 KOps/s 5.6824 KOps/s $\color{#35bf28}+2.23\%$
test_exec_td_decorator 0.5982ms 0.2310ms 4.3299 KOps/s 4.1077 KOps/s $\textbf{\color{#35bf28}+5.41\%}$
test_vmap_mlp_speed_decorator[True-True] 0.9894ms 0.6748ms 1.4819 KOps/s 1.4763 KOps/s $\color{#35bf28}+0.38\%$
test_vmap_mlp_speed_decorator[True-False] 0.9048ms 0.6701ms 1.4922 KOps/s 1.4866 KOps/s $\color{#35bf28}+0.38\%$
test_vmap_mlp_speed_decorator[False-True] 0.9416ms 0.5357ms 1.8668 KOps/s 1.8350 KOps/s $\color{#35bf28}+1.73\%$
test_vmap_mlp_speed_decorator[False-False] 0.8870ms 0.5328ms 1.8770 KOps/s 1.8370 KOps/s $\color{#35bf28}+2.18\%$
test_to_module_speed[True] 1.8245ms 1.3735ms 728.0706 Ops/s 739.2391 Ops/s $\color{#d91a1a}-1.51\%$
test_to_module_speed[False] 2.4649ms 1.3763ms 726.5743 Ops/s 746.1946 Ops/s $\color{#d91a1a}-2.63\%$
test_tc_init 81.2630μs 48.5851μs 20.5824 KOps/s 21.3893 KOps/s $\color{#d91a1a}-3.77\%$
test_tc_init_nested 0.1979ms 97.6803μs 10.2375 KOps/s 10.6106 KOps/s $\color{#d91a1a}-3.52\%$
test_tc_first_layer_tensor 15.2390μs 1.5888μs 629.3875 KOps/s 655.1568 KOps/s $\color{#d91a1a}-3.93\%$
test_tc_first_layer_nontensor 27.7720μs 4.8879μs 204.5880 KOps/s 207.0078 KOps/s $\color{#d91a1a}-1.17\%$
test_tc_second_layer_tensor 22.5830μs 2.9323μs 341.0338 KOps/s 340.0552 KOps/s $\color{#35bf28}+0.29\%$
test_tc_second_layer_nontensor 53.1300μs 6.2203μs 160.7627 KOps/s 164.2973 KOps/s $\color{#d91a1a}-2.15\%$
test_unbind 0.2934s 16.0596ms 62.2680 Ops/s 57.6449 Ops/s $\textbf{\color{#35bf28}+8.02\%}$
test_full_like 15.5259ms 11.7458ms 85.1368 Ops/s 71.4591 Ops/s $\textbf{\color{#35bf28}+19.14\%}$
test_zeros_like 6.2247ms 4.5161ms 221.4286 Ops/s 188.0807 Ops/s $\textbf{\color{#35bf28}+17.73\%}$
test_ones_like 5.9427ms 4.3910ms 227.7361 Ops/s 135.6839 Ops/s $\textbf{\color{#35bf28}+67.84\%}$
test_clone 11.5144ms 8.5893ms 116.4241 Ops/s 81.9101 Ops/s $\textbf{\color{#35bf28}+42.14\%}$
test_squeeze 65.6130μs 13.0823μs 76.4392 KOps/s 81.2915 KOps/s $\textbf{\color{#d91a1a}-5.97\%}$
test_unsqueeze 0.3335ms 99.0908μs 10.0918 KOps/s 10.4715 KOps/s $\color{#d91a1a}-3.63\%$
test_split 0.3534ms 0.1971ms 5.0733 KOps/s 5.0658 KOps/s $\color{#35bf28}+0.15\%$
test_permute 0.4394ms 0.2134ms 4.6850 KOps/s 4.9171 KOps/s $\color{#d91a1a}-4.72\%$
test_stack 47.7760ms 36.0412ms 27.7460 Ops/s 27.0646 Ops/s $\color{#35bf28}+2.52\%$
test_cat 42.7895ms 34.6013ms 28.9007 Ops/s 27.6392 Ops/s $\color{#35bf28}+4.56\%$
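
The Change column appears to be the relative difference between the Ops column and the Ops on Repo HEAD column, and the bolded entries (which match the 17 improved and 12 worsened counts in the summary) all lie beyond roughly ±5%. The sketch below reproduces a few rows under those two assumptions. The function names and the 5% threshold are illustrative and inferred from the table, not taken from the benchmark tooling itself.

```python
def percent_change(ops: float, ops_head: float) -> float:
    """Relative throughput change of the PR versus repo HEAD, in percent.

    Both arguments must be expressed in the same unit (e.g. KOps/s).
    """
    return (ops - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a change; the 5% threshold is an assumption inferred from the table."""
    if change_pct > threshold:
        return "improved"
    if change_pct < -threshold:
        return "worsened"
    return "unchanged"


# (Ops, Ops on Repo HEAD) copied from three rows of the table above.
rows = {
    "test_plain_set_nested": (47.5169, 49.3762),  # KOps/s
    "test_membership": (1.0800, 1.1573),          # MOps/s
    "test_ones_like": (227.7361, 135.6839),       # Ops/s
}

for name, (ops, ops_head) in rows.items():
    change = percent_change(ops, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Because throughput is the quantity compared, a positive change means the PR is faster than HEAD. Several of the flagged rows are short-running benchmarks, so run-to-run noise on the CI machine is a plausible contributor to some of the larger swings.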

vmoens added a commit that referenced this pull request Feb 26, 2025
ghstack-source-id: 0bfbfc9f6700f1165fcfd6b38f65fa4fd806be80
Pull Request resolved: #1231

(cherry picked from commit d25bd54)