[BugFix] Ensure that maybe_dense_stack preserves the TC type #1252

Merged: 1 commit, Mar 5, 2025

vmoens (Contributor) commented Mar 5, 2025

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Mar 5, 2025
ghstack-source-id: 8972977b8317ad78d98ad20d6ee7ecf0337d0bd4
Pull Request resolved: #1252
facebook-github-bot added the CLA Signed label Mar 5, 2025

github-actions bot commented Mar 5, 2025

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 217. Improved: $\large\color{#35bf28}16$. Worsened: $\large\color{#d91a1a}4$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 45.8460μs 20.5343μs 48.6989 KOps/s 48.2704 KOps/s $\color{#35bf28}+0.89\%$
test_plain_set_stack_nested 50.2430μs 21.0069μs 47.6034 KOps/s 47.6093 KOps/s $\color{#d91a1a}-0.01\%$
test_plain_set_nested_inplace 78.9780μs 22.4274μs 44.5883 KOps/s 43.8913 KOps/s $\color{#35bf28}+1.59\%$
test_plain_set_stack_nested_inplace 62.5970μs 22.4277μs 44.5878 KOps/s 43.7942 KOps/s $\color{#35bf28}+1.81\%$
test_items 41.3770μs 4.1724μs 239.6709 KOps/s 241.3615 KOps/s $\color{#d91a1a}-0.70\%$
test_items_nested 0.5139ms 0.4048ms 2.4707 KOps/s 2.4616 KOps/s $\color{#35bf28}+0.37\%$
test_items_nested_locked 0.8296ms 0.4052ms 2.4681 KOps/s 2.4690 KOps/s $\color{#d91a1a}-0.04\%$
test_items_nested_leaf 0.1413ms 78.0295μs 12.8157 KOps/s 13.0755 KOps/s $\color{#d91a1a}-1.99\%$
test_items_stack_nested 0.5301ms 0.4071ms 2.4566 KOps/s 2.4517 KOps/s $\color{#35bf28}+0.20\%$
test_items_stack_nested_leaf 0.1349ms 77.7082μs 12.8687 KOps/s 12.8462 KOps/s $\color{#35bf28}+0.17\%$
test_items_stack_nested_locked 0.6700ms 0.4074ms 2.4546 KOps/s 2.4563 KOps/s $\color{#d91a1a}-0.07\%$
test_keys 40.2650μs 3.4793μs 287.4116 KOps/s 289.4778 KOps/s $\color{#d91a1a}-0.71\%$
test_keys_nested 0.2400ms 0.1630ms 6.1332 KOps/s 5.9903 KOps/s $\color{#35bf28}+2.39\%$
test_keys_nested_locked 1.8149ms 0.1698ms 5.8882 KOps/s 5.8352 KOps/s $\color{#35bf28}+0.91\%$
test_keys_nested_leaf 0.2336ms 0.1434ms 6.9759 KOps/s 6.9604 KOps/s $\color{#35bf28}+0.22\%$
test_keys_stack_nested 0.2517ms 0.1633ms 6.1223 KOps/s 6.0858 KOps/s $\color{#35bf28}+0.60\%$
test_keys_stack_nested_leaf 0.2368ms 0.1438ms 6.9565 KOps/s 6.9528 KOps/s $\color{#35bf28}+0.05\%$
test_keys_stack_nested_locked 0.2676ms 0.1702ms 5.8756 KOps/s 5.8315 KOps/s $\color{#35bf28}+0.76\%$
test_values 8.6740μs 1.0298μs 971.0837 KOps/s 959.0679 KOps/s $\color{#35bf28}+1.25\%$
test_values_nested 0.1166ms 62.1209μs 16.0977 KOps/s 15.9915 KOps/s $\color{#35bf28}+0.66\%$
test_values_nested_locked 0.1386ms 61.9189μs 16.1502 KOps/s 16.0599 KOps/s $\color{#35bf28}+0.56\%$
test_values_nested_leaf 0.1318ms 71.4034μs 14.0049 KOps/s 13.9865 KOps/s $\color{#35bf28}+0.13\%$
test_values_stack_nested 0.1128ms 61.8194μs 16.1761 KOps/s 15.9256 KOps/s $\color{#35bf28}+1.57\%$
test_values_stack_nested_leaf 0.1382ms 71.3074μs 14.0238 KOps/s 13.9496 KOps/s $\color{#35bf28}+0.53\%$
test_values_stack_nested_locked 0.1138ms 61.6265μs 16.2268 KOps/s 15.9233 KOps/s $\color{#35bf28}+1.91\%$
test_membership 25.4280μs 0.8841μs 1.1311 MOps/s 1.1296 MOps/s $\color{#35bf28}+0.13\%$
test_membership_nested 32.8810μs 2.8589μs 349.7904 KOps/s 342.3871 KOps/s $\color{#35bf28}+2.16\%$
test_membership_nested_leaf 52.8480μs 2.8693μs 348.5145 KOps/s 308.8320 KOps/s $\textbf{\color{#35bf28}+12.85\%}$
test_membership_stacked_nested 60.8240μs 2.8617μs 349.4443 KOps/s 341.5434 KOps/s $\color{#35bf28}+2.31\%$
test_membership_stacked_nested_leaf 43.5610μs 2.8451μs 351.4862 KOps/s 338.6159 KOps/s $\color{#35bf28}+3.80\%$
test_membership_nested_last 33.5030μs 4.2761μs 233.8567 KOps/s 226.9099 KOps/s $\color{#35bf28}+3.06\%$
test_membership_nested_leaf_last 49.1420μs 4.2883μs 233.1920 KOps/s 224.4860 KOps/s $\color{#35bf28}+3.88\%$
test_membership_stacked_nested_last 36.6980μs 4.2661μs 234.4047 KOps/s 230.9008 KOps/s $\color{#35bf28}+1.52\%$
test_membership_stacked_nested_leaf_last 44.2320μs 4.2473μs 235.4422 KOps/s 227.4071 KOps/s $\color{#35bf28}+3.53\%$
test_nested_getleaf 55.8950μs 10.4623μs 95.5816 KOps/s 93.6017 KOps/s $\color{#35bf28}+2.12\%$
test_nested_get 43.0710μs 9.9210μs 100.7967 KOps/s 99.3900 KOps/s $\color{#35bf28}+1.42\%$
test_stacked_getleaf 49.8130μs 10.3989μs 96.1639 KOps/s 94.9018 KOps/s $\color{#35bf28}+1.33\%$
test_stacked_get 54.6330μs 9.8608μs 101.4115 KOps/s 100.6053 KOps/s $\color{#35bf28}+0.80\%$
test_nested_getitemleaf 38.9420μs 11.2656μs 88.7655 KOps/s 88.5534 KOps/s $\color{#35bf28}+0.24\%$
test_nested_getitem 58.4600μs 10.5788μs 94.5285 KOps/s 94.9406 KOps/s $\color{#d91a1a}-0.43\%$
test_stacked_getitemleaf 64.2640μs 11.1524μs 89.6669 KOps/s 89.3638 KOps/s $\color{#35bf28}+0.34\%$
test_stacked_getitem 43.7020μs 10.5105μs 95.1430 KOps/s 94.5929 KOps/s $\color{#35bf28}+0.58\%$
test_lock_nested 0.8050ms 0.4080ms 2.4510 KOps/s 2.4394 KOps/s $\color{#35bf28}+0.48\%$
test_lock_stack_nested 0.7593ms 0.4186ms 2.3888 KOps/s 2.3519 KOps/s $\color{#35bf28}+1.57\%$
test_unlock_nested 0.6146ms 0.3298ms 3.0324 KOps/s 2.9991 KOps/s $\color{#35bf28}+1.11\%$
test_unlock_stack_nested 0.5278ms 0.3345ms 2.9896 KOps/s 2.9024 KOps/s $\color{#35bf28}+3.00\%$
test_flatten_speed 0.1879ms 99.3385μs 10.0666 KOps/s 9.9256 KOps/s $\color{#35bf28}+1.42\%$
test_unflatten_speed 0.8975ms 0.5253ms 1.9037 KOps/s 1.9016 KOps/s $\color{#35bf28}+0.11\%$
test_common_ops 1.0097ms 0.8011ms 1.2483 KOps/s 1.1920 KOps/s $\color{#35bf28}+4.73\%$
test_creation 31.3080μs 2.4938μs 400.9878 KOps/s 400.5175 KOps/s $\color{#35bf28}+0.12\%$
test_creation_empty 42.9900μs 12.0755μs 82.8120 KOps/s 85.6394 KOps/s $\color{#d91a1a}-3.30\%$
test_creation_nested_1 49.1620μs 15.1332μs 66.0797 KOps/s 68.1188 KOps/s $\color{#d91a1a}-2.99\%$
test_creation_nested_2 56.0350μs 19.5102μs 51.2551 KOps/s 51.8566 KOps/s $\color{#d91a1a}-1.16\%$
test_clone 0.1243ms 13.5803μs 73.6362 KOps/s 73.2067 KOps/s $\color{#35bf28}+0.59\%$
test_getitem[int] 0.8757ms 12.6203μs 79.2377 KOps/s 76.2132 KOps/s $\color{#35bf28}+3.97\%$
test_getitem[slice_int] 0.1394ms 23.8888μs 41.8606 KOps/s 40.8803 KOps/s $\color{#35bf28}+2.40\%$
test_getitem[range] 0.1814ms 50.7030μs 19.7227 KOps/s 19.7650 KOps/s $\color{#d91a1a}-0.21\%$
test_getitem[tuple] 0.1350ms 19.9083μs 50.2303 KOps/s 48.2315 KOps/s $\color{#35bf28}+4.14\%$
test_getitem[list] 0.1595ms 44.4787μs 22.4827 KOps/s 22.3343 KOps/s $\color{#35bf28}+0.66\%$
test_setitem_dim[int] 57.5770μs 25.3505μs 39.4470 KOps/s 38.4305 KOps/s $\color{#35bf28}+2.64\%$
test_setitem_dim[slice_int] 82.0940μs 50.3198μs 19.8729 KOps/s 19.6559 KOps/s $\color{#35bf28}+1.10\%$
test_setitem_dim[range] 0.1912ms 75.7033μs 13.2095 KOps/s 12.9549 KOps/s $\color{#35bf28}+1.97\%$
test_setitem_dim[tuple] 79.9290μs 39.4773μs 25.3310 KOps/s 24.4158 KOps/s $\color{#35bf28}+3.75\%$
test_setitem 89.3280μs 20.6375μs 48.4554 KOps/s 47.8647 KOps/s $\color{#35bf28}+1.23\%$
test_set 90.5990μs 20.1427μs 49.6458 KOps/s 48.8154 KOps/s $\color{#35bf28}+1.70\%$
test_set_shared 4.7103ms 0.1796ms 5.5680 KOps/s 5.4250 KOps/s $\color{#35bf28}+2.64\%$
test_update 0.1322ms 26.2676μs 38.0697 KOps/s 37.7160 KOps/s $\color{#35bf28}+0.94\%$
test_update_nested 0.1113ms 41.9802μs 23.8208 KOps/s 23.9570 KOps/s $\color{#d91a1a}-0.57\%$
test_update__nested 0.4784ms 33.6744μs 29.6962 KOps/s 29.9762 KOps/s $\color{#d91a1a}-0.93\%$
test_set_nested 62.9780μs 22.2795μs 44.8844 KOps/s 44.9100 KOps/s $\color{#d91a1a}-0.06\%$
test_set_nested_new 0.1062ms 26.9145μs 37.1547 KOps/s 37.2439 KOps/s $\color{#d91a1a}-0.24\%$
test_select 0.1315ms 42.8051μs 23.3617 KOps/s 23.2447 KOps/s $\color{#35bf28}+0.50\%$
test_select_nested 0.1356ms 63.2267μs 15.8161 KOps/s 16.0212 KOps/s $\color{#d91a1a}-1.28\%$
test_exclude_nested 0.1683ms 79.7687μs 12.5362 KOps/s 12.4925 KOps/s $\color{#35bf28}+0.35\%$
test_empty[True] 0.6862ms 0.4029ms 2.4820 KOps/s 2.4647 KOps/s $\color{#35bf28}+0.70\%$
test_empty[False] 9.0517μs 1.3800μs 724.6611 KOps/s 721.6739 KOps/s $\color{#35bf28}+0.41\%$
test_unbind_speed 0.6317ms 0.2666ms 3.7503 KOps/s 3.6592 KOps/s $\color{#35bf28}+2.49\%$
test_unbind_speed_stack0 0.4225ms 0.2636ms 3.7934 KOps/s 3.7072 KOps/s $\color{#35bf28}+2.33\%$
test_unbind_speed_stack1 1.1648ms 0.6637ms 1.5067 KOps/s 1.2034 KOps/s $\textbf{\color{#35bf28}+25.21\%}$
test_split 0.1214s 1.7516ms 570.9209 Ops/s 624.8068 Ops/s $\textbf{\color{#d91a1a}-8.62\%}$
test_chunk 0.1265s 1.7738ms 563.7726 Ops/s 506.8700 Ops/s $\textbf{\color{#35bf28}+11.23\%}$
test_consolidate_njt[False-None] 8.3700ms 8.0847ms 123.6899 Ops/s 123.2551 Ops/s $\color{#35bf28}+0.35\%$
test_creation[device0] 0.2684ms 90.7301μs 11.0217 KOps/s 10.7881 KOps/s $\color{#35bf28}+2.17\%$
test_creation_from_tensor 3.7059ms 95.7946μs 10.4390 KOps/s 10.4846 KOps/s $\color{#d91a1a}-0.44\%$
test_add_one[memmap_tensor0] 66.1930μs 4.9026μs 203.9732 KOps/s 187.1354 KOps/s $\textbf{\color{#35bf28}+9.00\%}$
test_contiguous[memmap_tensor0] 19.8680μs 0.4998μs 2.0008 MOps/s 1.9514 MOps/s $\color{#35bf28}+2.53\%$
test_stack[memmap_tensor0] 23.5940μs 3.3907μs 294.9214 KOps/s 275.7944 KOps/s $\textbf{\color{#35bf28}+6.94\%}$
test_memmaptd_index 0.3851ms 0.2318ms 4.3145 KOps/s 4.2966 KOps/s $\color{#35bf28}+0.42\%$
test_memmaptd_index_astensor 1.0557ms 0.3143ms 3.1814 KOps/s 3.0957 KOps/s $\color{#35bf28}+2.77\%$
test_memmaptd_index_op 0.8889ms 0.5764ms 1.7348 KOps/s 1.6700 KOps/s $\color{#35bf28}+3.88\%$
test_serialize_model 0.2316s 0.1312s 7.6208 Ops/s 8.3145 Ops/s $\textbf{\color{#d91a1a}-8.34\%}$
test_serialize_model_pickle 0.4631s 0.4028s 2.4828 Ops/s 2.5061 Ops/s $\color{#d91a1a}-0.93\%$
test_serialize_weights 0.1189s 0.1126s 8.8832 Ops/s 8.4147 Ops/s $\textbf{\color{#35bf28}+5.57\%}$
test_serialize_weights_returnearly 0.1840s 0.1671s 5.9834 Ops/s 5.4287 Ops/s $\textbf{\color{#35bf28}+10.22\%}$
test_serialize_weights_pickle 0.6337s 0.4582s 2.1823 Ops/s 2.4905 Ops/s $\textbf{\color{#d91a1a}-12.38\%}$
test_serialize_weights_filesystem 0.2480s 0.1613s 6.1978 Ops/s 7.0191 Ops/s $\textbf{\color{#d91a1a}-11.70\%}$
test_serialize_model_filesystem 0.1509s 0.1437s 6.9569 Ops/s 6.4591 Ops/s $\textbf{\color{#35bf28}+7.71\%}$
test_reshape_pytree 87.9340μs 26.7314μs 37.4092 KOps/s 38.2511 KOps/s $\color{#d91a1a}-2.20\%$
test_reshape_td 68.9490μs 32.1970μs 31.0588 KOps/s 29.5843 KOps/s $\color{#35bf28}+4.98\%$
test_view_pytree 64.4710μs 26.1442μs 38.2494 KOps/s 38.4444 KOps/s $\color{#d91a1a}-0.51\%$
test_view_td 0.1139ms 39.0612μs 25.6009 KOps/s 24.9443 KOps/s $\color{#35bf28}+2.63\%$
test_unbind_pytree 71.8340μs 29.2620μs 34.1741 KOps/s 33.1153 KOps/s $\color{#35bf28}+3.20\%$
test_unbind_td 0.3621ms 39.4472μs 25.3504 KOps/s 23.7322 KOps/s $\textbf{\color{#35bf28}+6.82\%}$
test_split_pytree 78.2960μs 29.0428μs 34.4319 KOps/s 34.0600 KOps/s $\color{#35bf28}+1.09\%$
test_split_td 0.5744ms 44.6343μs 22.4043 KOps/s 22.0202 KOps/s $\color{#35bf28}+1.74\%$
test_add_pytree 77.6560μs 35.9047μs 27.8515 KOps/s 28.0257 KOps/s $\color{#d91a1a}-0.62\%$
test_add_td 0.1298ms 55.0977μs 18.1496 KOps/s 17.7051 KOps/s $\color{#35bf28}+2.51\%$
test_compile_add_one_nested[tensordict-compile] 0.1461ms 67.1806μs 14.8853 KOps/s 14.8880 KOps/s $\color{#d91a1a}-0.02\%$
test_compile_add_one_nested[tensordict-eager] 0.3883ms 0.1727ms 5.7900 KOps/s 5.7691 KOps/s $\color{#35bf28}+0.36\%$
test_compile_add_one_nested[pytree-compile] 0.1219ms 45.6433μs 21.9090 KOps/s 21.2321 KOps/s $\color{#35bf28}+3.19\%$
test_compile_add_one_nested[pytree-eager] 0.2280ms 0.1183ms 8.4528 KOps/s 8.3357 KOps/s $\color{#35bf28}+1.40\%$
test_compile_copy_nested[tensordict-compile] 76.3330μs 28.0566μs 35.6422 KOps/s 35.1040 KOps/s $\color{#35bf28}+1.53\%$
test_compile_copy_nested[tensordict-eager] 0.1238ms 58.5333μs 17.0843 KOps/s 17.2999 KOps/s $\color{#d91a1a}-1.25\%$
test_compile_copy_nested[pytree-compile] 0.1365ms 79.2949μs 12.6112 KOps/s 12.6864 KOps/s $\color{#d91a1a}-0.59\%$
test_compile_copy_nested[pytree-eager] 0.1619ms 66.3406μs 15.0737 KOps/s 15.0863 KOps/s $\color{#d91a1a}-0.08\%$
test_compile_add_one_flat[tensordict-compile] 0.1835ms 0.1066ms 9.3804 KOps/s 9.3509 KOps/s $\color{#35bf28}+0.32\%$
test_compile_add_one_flat[tensordict-eager] 0.3368ms 0.2132ms 4.6902 KOps/s 4.6132 KOps/s $\color{#35bf28}+1.67\%$
test_compile_add_one_flat[tensorclass-compile] 0.1052ms 48.3753μs 20.6717 KOps/s 21.5101 KOps/s $\color{#d91a1a}-3.90\%$
test_compile_add_one_flat[tensorclass-eager] 0.2835ms 67.0903μs 14.9053 KOps/s 14.6719 KOps/s $\color{#35bf28}+1.59\%$
test_compile_add_one_flat[pytree-compile] 0.2494ms 0.1025ms 9.7585 KOps/s 9.9398 KOps/s $\color{#d91a1a}-1.82\%$
test_compile_add_one_flat[pytree-eager] 0.3058ms 0.1986ms 5.0351 KOps/s 4.9072 KOps/s $\color{#35bf28}+2.61\%$
test_compile_add_self_flat[tensordict-eager] 0.4587ms 0.2301ms 4.3452 KOps/s 4.2983 KOps/s $\color{#35bf28}+1.09\%$
test_compile_add_self_flat[tensordict-compile] 0.2048ms 0.1062ms 9.4191 KOps/s 9.2915 KOps/s $\color{#35bf28}+1.37\%$
test_compile_add_self_flat[tensorclass-eager] 0.2653ms 62.7736μs 15.9303 KOps/s 16.1495 KOps/s $\color{#d91a1a}-1.36\%$
test_compile_add_self_flat[tensorclass-compile] 0.1444ms 48.8634μs 20.4652 KOps/s 20.5793 KOps/s $\color{#d91a1a}-0.55\%$
test_compile_add_self_flat[pytree-eager] 0.3571ms 0.1553ms 6.4389 KOps/s 6.2679 KOps/s $\color{#35bf28}+2.73\%$
test_compile_add_self_flat[pytree-compile] 0.1820ms 0.1007ms 9.9306 KOps/s 9.8392 KOps/s $\color{#35bf28}+0.93\%$
test_compile_copy_flat[tensordict-compile] 62.9970μs 20.9993μs 47.6206 KOps/s 47.5651 KOps/s $\color{#35bf28}+0.12\%$
test_compile_copy_flat[tensordict-eager] 0.1358ms 67.1097μs 14.9010 KOps/s 15.2291 KOps/s $\color{#d91a1a}-2.15\%$
test_compile_copy_flat[pytree-compile] 0.1720ms 80.5850μs 12.4093 KOps/s 12.3257 KOps/s $\color{#35bf28}+0.68\%$
test_compile_copy_flat[pytree-eager] 0.1407ms 66.4216μs 15.0554 KOps/s 14.9912 KOps/s $\color{#35bf28}+0.43\%$
test_compile_assign_and_add[tensordict-compile] 0.4101ms 0.2133ms 4.6884 KOps/s 4.6166 KOps/s $\color{#35bf28}+1.55\%$
test_compile_assign_and_add[tensordict-eager] 2.1546ms 1.3521ms 739.6130 Ops/s 710.4124 Ops/s $\color{#35bf28}+4.11\%$
test_compile_assign_and_add[pytree-compile] 0.2938ms 0.2083ms 4.8001 KOps/s 4.6853 KOps/s $\color{#35bf28}+2.45\%$
test_compile_assign_and_add[pytree-eager] 1.0205ms 0.8150ms 1.2270 KOps/s 1.1682 KOps/s $\textbf{\color{#35bf28}+5.03\%}$
test_compile_assign_and_add_stack[compile] 0.6225ms 0.4508ms 2.2185 KOps/s 2.1953 KOps/s $\color{#35bf28}+1.06\%$
test_compile_assign_and_add_stack[eager] 3.3132ms 2.7089ms 369.1542 Ops/s 365.0904 Ops/s $\color{#35bf28}+1.11\%$
test_compile_indexing[tensor-tensordict-compile] 0.7140ms 38.7120μs 25.8318 KOps/s 25.4421 KOps/s $\color{#35bf28}+1.53\%$
test_compile_indexing[tensor-tensordict-eager] 0.5673ms 32.0296μs 31.2212 KOps/s 29.3230 KOps/s $\textbf{\color{#35bf28}+6.47\%}$
test_compile_indexing[tensor-tensorclass-compile] 78.8170μs 31.5820μs 31.6637 KOps/s 31.5511 KOps/s $\color{#35bf28}+0.36\%$
test_compile_indexing[tensor-tensorclass-eager] 95.7890μs 23.8815μs 41.8735 KOps/s 41.8192 KOps/s $\color{#35bf28}+0.13\%$
test_compile_indexing[tensor-pytree-compile] 76.2620μs 32.1663μs 31.0885 KOps/s 30.5286 KOps/s $\color{#35bf28}+1.83\%$
test_compile_indexing[tensor-pytree-eager] 76.1920μs 23.5803μs 42.4083 KOps/s 42.5242 KOps/s $\color{#d91a1a}-0.27\%$
test_compile_indexing[slice-tensordict-compile] 0.1297ms 53.9009μs 18.5526 KOps/s 18.7526 KOps/s $\color{#d91a1a}-1.07\%$
test_compile_indexing[slice-tensordict-eager] 0.3921ms 19.5189μs 51.2324 KOps/s 48.9368 KOps/s $\color{#35bf28}+4.69\%$
test_compile_indexing[slice-tensorclass-compile] 97.7330μs 46.3502μs 21.5749 KOps/s 21.4628 KOps/s $\color{#35bf28}+0.52\%$
test_compile_indexing[slice-tensorclass-eager] 56.0350μs 18.7943μs 53.2077 KOps/s 54.0684 KOps/s $\color{#d91a1a}-1.59\%$
test_compile_indexing[slice-pytree-compile] 0.1272ms 47.6632μs 20.9805 KOps/s 21.3552 KOps/s $\color{#d91a1a}-1.75\%$
test_compile_indexing[slice-pytree-eager] 54.3820μs 18.6097μs 53.7354 KOps/s 54.6771 KOps/s $\color{#d91a1a}-1.72\%$
test_compile_indexing[int-tensordict-compile] 0.1085ms 55.3724μs 18.0595 KOps/s 18.2925 KOps/s $\color{#d91a1a}-1.27\%$
test_compile_indexing[int-tensordict-eager] 1.1738ms 19.5272μs 51.2107 KOps/s 49.5256 KOps/s $\color{#35bf28}+3.40\%$
test_compile_indexing[int-tensorclass-compile] 0.6796ms 47.3242μs 21.1308 KOps/s 21.2978 KOps/s $\color{#d91a1a}-0.78\%$
test_compile_indexing[int-tensorclass-eager] 61.9060μs 18.6252μs 53.6906 KOps/s 54.2278 KOps/s $\color{#d91a1a}-0.99\%$
test_compile_indexing[int-pytree-compile] 0.1107ms 47.1659μs 21.2018 KOps/s 21.3995 KOps/s $\color{#d91a1a}-0.92\%$
test_compile_indexing[int-pytree-eager] 61.1340μs 18.6113μs 53.7308 KOps/s 54.0015 KOps/s $\color{#d91a1a}-0.50\%$
test_mod_add[eager] 0.1082ms 36.6405μs 27.2922 KOps/s 27.4296 KOps/s $\color{#d91a1a}-0.50\%$
test_mod_add[compile] 0.1392ms 65.0726μs 15.3675 KOps/s 15.1306 KOps/s $\color{#35bf28}+1.57\%$
test_mod_add[compile-overhead] 0.2478ms 64.1282μs 15.5938 KOps/s 15.3321 KOps/s $\color{#35bf28}+1.71\%$
test_mod_wrap[eager] 0.3781ms 0.2217ms 4.5106 KOps/s 4.5036 KOps/s $\color{#35bf28}+0.16\%$
test_mod_wrap[compile] 2.0942ms 0.2216ms 4.5124 KOps/s 4.2910 KOps/s $\textbf{\color{#35bf28}+5.16\%}$
test_mod_wrap[compile-overhead] 0.3151ms 0.2173ms 4.6028 KOps/s 4.4059 KOps/s $\color{#35bf28}+4.47\%$
test_mod_wrap_and_backward[eager] 14.1125ms 11.4406ms 87.4080 Ops/s 88.0247 Ops/s $\color{#d91a1a}-0.70\%$
test_mod_wrap_and_backward[compile] 12.8561ms 11.3402ms 88.1820 Ops/s 86.3021 Ops/s $\color{#35bf28}+2.18\%$
test_mod_wrap_and_backward[compile-overhead] 13.1151ms 11.2698ms 88.7323 Ops/s 84.7003 Ops/s $\color{#35bf28}+4.76\%$
test_seq_add[eager] 0.2485ms 0.1188ms 8.4176 KOps/s 8.4181 KOps/s $-0.01\%$
test_seq_add[compile] 0.2064ms 75.7573μs 13.2000 KOps/s 12.8505 KOps/s $\color{#35bf28}+2.72\%$
test_seq_add[compile-overhead] 0.2267ms 75.7506μs 13.2012 KOps/s 13.2291 KOps/s $\color{#d91a1a}-0.21\%$
test_seq_wrap[eager] 1.3767ms 0.4585ms 2.1811 KOps/s 2.2391 KOps/s $\color{#d91a1a}-2.59\%$
test_seq_wrap[compile] 0.3771ms 0.2351ms 4.2539 KOps/s 4.0845 KOps/s $\color{#35bf28}+4.15\%$
test_seq_wrap[compile-overhead] 0.3877ms 0.2348ms 4.2597 KOps/s 4.0638 KOps/s $\color{#35bf28}+4.82\%$
test_func_call_runtime[False-eager] 0.7474ms 0.5313ms 1.8823 KOps/s 1.8846 KOps/s $\color{#d91a1a}-0.12\%$
test_func_call_runtime[False-compile] 0.5828ms 0.4364ms 2.2916 KOps/s 2.2054 KOps/s $\color{#35bf28}+3.91\%$
test_func_call_runtime[False-compile-overhead] 0.6066ms 0.4353ms 2.2973 KOps/s 2.2220 KOps/s $\color{#35bf28}+3.39\%$
test_func_call_runtime[True-eager] 1.5278ms 0.7509ms 1.3318 KOps/s 1.3456 KOps/s $\color{#d91a1a}-1.03\%$
test_func_call_runtime[True-compile] 0.5734ms 0.4555ms 2.1955 KOps/s 2.0616 KOps/s $\textbf{\color{#35bf28}+6.49\%}$
test_func_call_runtime[True-compile-overhead] 0.6108ms 0.4584ms 2.1815 KOps/s 2.1033 KOps/s $\color{#35bf28}+3.72\%$
test_func_call_cm_runtime[False-eager] 0.7976ms 0.5252ms 1.9041 KOps/s 1.9015 KOps/s $\color{#35bf28}+0.14\%$
test_func_call_cm_runtime[False-compile] 0.8885ms 0.4381ms 2.2825 KOps/s 2.2180 KOps/s $\color{#35bf28}+2.91\%$
test_func_call_cm_runtime[False-compile-overhead] 1.0189ms 0.4408ms 2.2685 KOps/s 2.1959 KOps/s $\color{#35bf28}+3.31\%$
test_func_call_cm_runtime[True-eager] 1.0932ms 0.8791ms 1.1375 KOps/s 1.1240 KOps/s $\color{#35bf28}+1.21\%$
test_func_call_cm_runtime[True-compile] 0.9591ms 0.7864ms 1.2716 KOps/s 1.2646 KOps/s $\color{#35bf28}+0.56\%$
test_func_call_cm_runtime[True-compile-overhead] 1.2960ms 0.7840ms 1.2755 KOps/s 1.2538 KOps/s $\color{#35bf28}+1.74\%$
test_vmap_func_call_cm_runtime[eager] 3.4546ms 1.9220ms 520.2982 Ops/s 525.8850 Ops/s $\color{#d91a1a}-1.06\%$
test_vmap_func_call_cm_runtime[compile] 1.0642ms 0.5284ms 1.8925 KOps/s 1.8626 KOps/s $\color{#35bf28}+1.61\%$
test_vmap_func_call_cm_runtime[compile-overhead] 1.2229ms 0.5317ms 1.8807 KOps/s 1.8161 KOps/s $\color{#35bf28}+3.56\%$
test_distributed 0.2362ms 0.1225ms 8.1658 KOps/s 7.7379 KOps/s $\textbf{\color{#35bf28}+5.53\%}$
test_tdmodule 45.1650μs 27.5173μs 36.3408 KOps/s 35.0652 KOps/s $\color{#35bf28}+3.64\%$
test_tdmodule_dispatch 84.5790μs 50.1125μs 19.9551 KOps/s 19.4484 KOps/s $\color{#35bf28}+2.61\%$
test_tdseq 55.1830μs 29.6158μs 33.7657 KOps/s 33.1559 KOps/s $\color{#35bf28}+1.84\%$
test_tdseq_dispatch 83.9070μs 54.0318μs 18.5076 KOps/s 17.9974 KOps/s $\color{#35bf28}+2.83\%$
test_instantiation_functorch 1.6254ms 1.5100ms 662.2430 Ops/s 641.6381 Ops/s $\color{#35bf28}+3.21\%$
test_exec_functorch 0.3016ms 0.1806ms 5.5372 KOps/s 5.4893 KOps/s $\color{#35bf28}+0.87\%$
test_exec_functional_call 0.2620ms 0.1715ms 5.8302 KOps/s 5.7787 KOps/s $\color{#35bf28}+0.89\%$
test_exec_td_decorator 0.5190ms 0.2317ms 4.3161 KOps/s 4.2737 KOps/s $\color{#35bf28}+0.99\%$
test_vmap_mlp_speed_decorator[True-True] 0.8718ms 0.6524ms 1.5327 KOps/s 1.5044 KOps/s $\color{#35bf28}+1.88\%$
test_vmap_mlp_speed_decorator[True-False] 0.9382ms 0.6527ms 1.5321 KOps/s 1.5241 KOps/s $\color{#35bf28}+0.52\%$
test_vmap_mlp_speed_decorator[False-True] 0.8409ms 0.5285ms 1.8922 KOps/s 1.8934 KOps/s $\color{#d91a1a}-0.06\%$
test_vmap_mlp_speed_decorator[False-False] 0.7540ms 0.5280ms 1.8938 KOps/s 1.8858 KOps/s $\color{#35bf28}+0.42\%$
test_to_module_speed[True] 2.0239ms 1.3158ms 760.0066 Ops/s 757.9523 Ops/s $\color{#35bf28}+0.27\%$
test_to_module_speed[False] 1.4068ms 1.2738ms 785.0615 Ops/s 766.9220 Ops/s $\color{#35bf28}+2.37\%$
test_tc_init 81.2530μs 47.7485μs 20.9431 KOps/s 21.6198 KOps/s $\color{#d91a1a}-3.13\%$
test_tc_init_nested 0.2034ms 95.2511μs 10.4986 KOps/s 10.4635 KOps/s $\color{#35bf28}+0.34\%$
test_tc_first_layer_tensor 28.8200μs 1.5465μs 646.6378 KOps/s 656.4066 KOps/s $\color{#d91a1a}-1.49\%$
test_tc_first_layer_nontensor 27.3010μs 4.7142μs 212.1257 KOps/s 215.6275 KOps/s $\color{#d91a1a}-1.62\%$
test_tc_second_layer_tensor 32.3910μs 2.8737μs 347.9850 KOps/s 356.9225 KOps/s $\color{#d91a1a}-2.50\%$
test_tc_second_layer_nontensor 53.0990μs 6.0495μs 165.3019 KOps/s 166.6428 KOps/s $\color{#d91a1a}-0.80\%$
test_unbind 0.2693s 14.2253ms 70.2974 Ops/s 60.3136 Ops/s $\textbf{\color{#35bf28}+16.55\%}$
test_full_like 11.9154ms 9.2372ms 108.2574 Ops/s 111.6149 Ops/s $\color{#d91a1a}-3.01\%$
test_zeros_like 7.5362ms 3.3587ms 297.7301 Ops/s 310.3354 Ops/s $\color{#d91a1a}-4.06\%$
test_ones_like 5.0787ms 3.5163ms 284.3901 Ops/s 262.7444 Ops/s $\textbf{\color{#35bf28}+8.24\%}$
test_clone 9.9492ms 6.9424ms 144.0415 Ops/s 149.7034 Ops/s $\color{#d91a1a}-3.78\%$
test_squeeze 87.7250μs 12.5236μs 79.8492 KOps/s 78.5591 KOps/s $\color{#35bf28}+1.64\%$
test_unsqueeze 0.1561ms 93.4249μs 10.7038 KOps/s 10.7465 KOps/s $\color{#d91a1a}-0.40\%$
test_split 0.3517ms 0.1969ms 5.0789 KOps/s 5.1950 KOps/s $\color{#d91a1a}-2.23\%$
test_permute 0.2921ms 0.2020ms 4.9515 KOps/s 5.0711 KOps/s $\color{#d91a1a}-2.36\%$
test_stack 32.9651ms 27.6642ms 36.1478 Ops/s 35.0862 Ops/s $\color{#35bf28}+3.03\%$
test_cat 32.9034ms 27.9696ms 35.7531 Ops/s 34.0832 Ops/s $\color{#35bf28}+4.90\%$
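
For readers checking the table above, here is a minimal sketch of how the Change column and the Improved/Worsened totals appear to be derived. The formula (relative difference between Ops and Ops on Repo HEAD) and the ±5% cutoff are inferred from the rows shown, not stated by the bot; the bold rows all sit at or beyond ±5%, and there are 16 bold gains and 4 bold losses. The names `percent_change` and `classify` are hypothetical helpers, not part of the benchmark tooling.

```python
def percent_change(ops_pr: float, ops_head: float) -> float:
    """Relative throughput change of the PR versus repo HEAD, in percent."""
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Bucket a change; the 5% threshold is an assumption inferred from the table."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "unchanged"


# (name, Ops, Ops on Repo HEAD) copied from four rows of the table.
# Units cancel in the ratio, so KOps/s and Ops/s rows can be mixed.
rows = [
    ("test_plain_set_nested", 48.6989, 48.2704),
    ("test_membership_nested_leaf", 348.5145, 308.8320),
    ("test_split", 570.9209, 624.8068),
    ("test_reshape_td", 31.0588, 29.5843),
]

for name, ops_pr, ops_head in rows:
    change = percent_change(ops_pr, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Under that reading, most rows fall within a few percent of HEAD and count as neither improved nor worsened.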

vmoens added the bug label Mar 5, 2025
vmoens merged commit 37a5551 into gh/vmoens/48/base Mar 5, 2025
48 of 49 checks passed
vmoens added a commit that referenced this pull request Mar 5, 2025
ghstack-source-id: 8972977b8317ad78d98ad20d6ee7ecf0337d0bd4
Pull Request resolved: #1252
vmoens deleted the gh/vmoens/48/head branch March 5, 2025 17:12
2 participants