[BugFix] Ensure that maybe_dense_stack preserves the TC type #1252
Merged
vmoens added a commit that referenced this pull request on Mar 5, 2025 (ghstack-source-id: 8972977b8317ad78d98ad20d6ee7ecf0337d0bd4; Pull Request resolved: #1252)
Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 45.8460μs | 20.5343μs | 48.6989 KOps/s | 48.2704 KOps/s | |
test_plain_set_stack_nested | 50.2430μs | 21.0069μs | 47.6034 KOps/s | 47.6093 KOps/s | |
test_plain_set_nested_inplace | 78.9780μs | 22.4274μs | 44.5883 KOps/s | 43.8913 KOps/s | |
test_plain_set_stack_nested_inplace | 62.5970μs | 22.4277μs | 44.5878 KOps/s | 43.7942 KOps/s | |
test_items | 41.3770μs | 4.1724μs | 239.6709 KOps/s | 241.3615 KOps/s | |
test_items_nested | 0.5139ms | 0.4048ms | 2.4707 KOps/s | 2.4616 KOps/s | |
test_items_nested_locked | 0.8296ms | 0.4052ms | 2.4681 KOps/s | 2.4690 KOps/s | |
test_items_nested_leaf | 0.1413ms | 78.0295μs | 12.8157 KOps/s | 13.0755 KOps/s | |
test_items_stack_nested | 0.5301ms | 0.4071ms | 2.4566 KOps/s | 2.4517 KOps/s | |
test_items_stack_nested_leaf | 0.1349ms | 77.7082μs | 12.8687 KOps/s | 12.8462 KOps/s | |
test_items_stack_nested_locked | 0.6700ms | 0.4074ms | 2.4546 KOps/s | 2.4563 KOps/s | |
test_keys | 40.2650μs | 3.4793μs | 287.4116 KOps/s | 289.4778 KOps/s | |
test_keys_nested | 0.2400ms | 0.1630ms | 6.1332 KOps/s | 5.9903 KOps/s | |
test_keys_nested_locked | 1.8149ms | 0.1698ms | 5.8882 KOps/s | 5.8352 KOps/s | |
test_keys_nested_leaf | 0.2336ms | 0.1434ms | 6.9759 KOps/s | 6.9604 KOps/s | |
test_keys_stack_nested | 0.2517ms | 0.1633ms | 6.1223 KOps/s | 6.0858 KOps/s | |
test_keys_stack_nested_leaf | 0.2368ms | 0.1438ms | 6.9565 KOps/s | 6.9528 KOps/s | |
test_keys_stack_nested_locked | 0.2676ms | 0.1702ms | 5.8756 KOps/s | 5.8315 KOps/s | |
test_values | 8.6740μs | 1.0298μs | 971.0837 KOps/s | 959.0679 KOps/s | |
test_values_nested | 0.1166ms | 62.1209μs | 16.0977 KOps/s | 15.9915 KOps/s | |
test_values_nested_locked | 0.1386ms | 61.9189μs | 16.1502 KOps/s | 16.0599 KOps/s | |
test_values_nested_leaf | 0.1318ms | 71.4034μs | 14.0049 KOps/s | 13.9865 KOps/s | |
test_values_stack_nested | 0.1128ms | 61.8194μs | 16.1761 KOps/s | 15.9256 KOps/s | |
test_values_stack_nested_leaf | 0.1382ms | 71.3074μs | 14.0238 KOps/s | 13.9496 KOps/s | |
test_values_stack_nested_locked | 0.1138ms | 61.6265μs | 16.2268 KOps/s | 15.9233 KOps/s | |
test_membership | 25.4280μs | 0.8841μs | 1.1311 MOps/s | 1.1296 MOps/s | |
test_membership_nested | 32.8810μs | 2.8589μs | 349.7904 KOps/s | 342.3871 KOps/s | |
test_membership_nested_leaf | 52.8480μs | 2.8693μs | 348.5145 KOps/s | 308.8320 KOps/s | |
test_membership_stacked_nested | 60.8240μs | 2.8617μs | 349.4443 KOps/s | 341.5434 KOps/s | |
test_membership_stacked_nested_leaf | 43.5610μs | 2.8451μs | 351.4862 KOps/s | 338.6159 KOps/s | |
test_membership_nested_last | 33.5030μs | 4.2761μs | 233.8567 KOps/s | 226.9099 KOps/s | |
test_membership_nested_leaf_last | 49.1420μs | 4.2883μs | 233.1920 KOps/s | 224.4860 KOps/s | |
test_membership_stacked_nested_last | 36.6980μs | 4.2661μs | 234.4047 KOps/s | 230.9008 KOps/s | |
test_membership_stacked_nested_leaf_last | 44.2320μs | 4.2473μs | 235.4422 KOps/s | 227.4071 KOps/s | |
test_nested_getleaf | 55.8950μs | 10.4623μs | 95.5816 KOps/s | 93.6017 KOps/s | |
test_nested_get | 43.0710μs | 9.9210μs | 100.7967 KOps/s | 99.3900 KOps/s | |
test_stacked_getleaf | 49.8130μs | 10.3989μs | 96.1639 KOps/s | 94.9018 KOps/s | |
test_stacked_get | 54.6330μs | 9.8608μs | 101.4115 KOps/s | 100.6053 KOps/s | |
test_nested_getitemleaf | 38.9420μs | 11.2656μs | 88.7655 KOps/s | 88.5534 KOps/s | |
test_nested_getitem | 58.4600μs | 10.5788μs | 94.5285 KOps/s | 94.9406 KOps/s | |
test_stacked_getitemleaf | 64.2640μs | 11.1524μs | 89.6669 KOps/s | 89.3638 KOps/s | |
test_stacked_getitem | 43.7020μs | 10.5105μs | 95.1430 KOps/s | 94.5929 KOps/s | |
test_lock_nested | 0.8050ms | 0.4080ms | 2.4510 KOps/s | 2.4394 KOps/s | |
test_lock_stack_nested | 0.7593ms | 0.4186ms | 2.3888 KOps/s | 2.3519 KOps/s | |
test_unlock_nested | 0.6146ms | 0.3298ms | 3.0324 KOps/s | 2.9991 KOps/s | |
test_unlock_stack_nested | 0.5278ms | 0.3345ms | 2.9896 KOps/s | 2.9024 KOps/s | |
test_flatten_speed | 0.1879ms | 99.3385μs | 10.0666 KOps/s | 9.9256 KOps/s | |
test_unflatten_speed | 0.8975ms | 0.5253ms | 1.9037 KOps/s | 1.9016 KOps/s | |
test_common_ops | 1.0097ms | 0.8011ms | 1.2483 KOps/s | 1.1920 KOps/s | |
test_creation | 31.3080μs | 2.4938μs | 400.9878 KOps/s | 400.5175 KOps/s | |
test_creation_empty | 42.9900μs | 12.0755μs | 82.8120 KOps/s | 85.6394 KOps/s | |
test_creation_nested_1 | 49.1620μs | 15.1332μs | 66.0797 KOps/s | 68.1188 KOps/s | |
test_creation_nested_2 | 56.0350μs | 19.5102μs | 51.2551 KOps/s | 51.8566 KOps/s | |
test_clone | 0.1243ms | 13.5803μs | 73.6362 KOps/s | 73.2067 KOps/s | |
test_getitem[int] | 0.8757ms | 12.6203μs | 79.2377 KOps/s | 76.2132 KOps/s | |
test_getitem[slice_int] | 0.1394ms | 23.8888μs | 41.8606 KOps/s | 40.8803 KOps/s | |
test_getitem[range] | 0.1814ms | 50.7030μs | 19.7227 KOps/s | 19.7650 KOps/s | |
test_getitem[tuple] | 0.1350ms | 19.9083μs | 50.2303 KOps/s | 48.2315 KOps/s | |
test_getitem[list] | 0.1595ms | 44.4787μs | 22.4827 KOps/s | 22.3343 KOps/s | |
test_setitem_dim[int] | 57.5770μs | 25.3505μs | 39.4470 KOps/s | 38.4305 KOps/s | |
test_setitem_dim[slice_int] | 82.0940μs | 50.3198μs | 19.8729 KOps/s | 19.6559 KOps/s | |
test_setitem_dim[range] | 0.1912ms | 75.7033μs | 13.2095 KOps/s | 12.9549 KOps/s | |
test_setitem_dim[tuple] | 79.9290μs | 39.4773μs | 25.3310 KOps/s | 24.4158 KOps/s | |
test_setitem | 89.3280μs | 20.6375μs | 48.4554 KOps/s | 47.8647 KOps/s | |
test_set | 90.5990μs | 20.1427μs | 49.6458 KOps/s | 48.8154 KOps/s | |
test_set_shared | 4.7103ms | 0.1796ms | 5.5680 KOps/s | 5.4250 KOps/s | |
test_update | 0.1322ms | 26.2676μs | 38.0697 KOps/s | 37.7160 KOps/s | |
test_update_nested | 0.1113ms | 41.9802μs | 23.8208 KOps/s | 23.9570 KOps/s | |
test_update__nested | 0.4784ms | 33.6744μs | 29.6962 KOps/s | 29.9762 KOps/s | |
test_set_nested | 62.9780μs | 22.2795μs | 44.8844 KOps/s | 44.9100 KOps/s | |
test_set_nested_new | 0.1062ms | 26.9145μs | 37.1547 KOps/s | 37.2439 KOps/s | |
test_select | 0.1315ms | 42.8051μs | 23.3617 KOps/s | 23.2447 KOps/s | |
test_select_nested | 0.1356ms | 63.2267μs | 15.8161 KOps/s | 16.0212 KOps/s | |
test_exclude_nested | 0.1683ms | 79.7687μs | 12.5362 KOps/s | 12.4925 KOps/s | |
test_empty[True] | 0.6862ms | 0.4029ms | 2.4820 KOps/s | 2.4647 KOps/s | |
test_empty[False] | 9.0517μs | 1.3800μs | 724.6611 KOps/s | 721.6739 KOps/s | |
test_unbind_speed | 0.6317ms | 0.2666ms | 3.7503 KOps/s | 3.6592 KOps/s | |
test_unbind_speed_stack0 | 0.4225ms | 0.2636ms | 3.7934 KOps/s | 3.7072 KOps/s | |
test_unbind_speed_stack1 | 1.1648ms | 0.6637ms | 1.5067 KOps/s | 1.2034 KOps/s | |
test_split | 0.1214s | 1.7516ms | 570.9209 Ops/s | 624.8068 Ops/s | |
test_chunk | 0.1265s | 1.7738ms | 563.7726 Ops/s | 506.8700 Ops/s | |
test_consolidate_njt[False-None] | 8.3700ms | 8.0847ms | 123.6899 Ops/s | 123.2551 Ops/s | |
test_creation[device0] | 0.2684ms | 90.7301μs | 11.0217 KOps/s | 10.7881 KOps/s | |
test_creation_from_tensor | 3.7059ms | 95.7946μs | 10.4390 KOps/s | 10.4846 KOps/s | |
test_add_one[memmap_tensor0] | 66.1930μs | 4.9026μs | 203.9732 KOps/s | 187.1354 KOps/s | |
test_contiguous[memmap_tensor0] | 19.8680μs | 0.4998μs | 2.0008 MOps/s | 1.9514 MOps/s | |
test_stack[memmap_tensor0] | 23.5940μs | 3.3907μs | 294.9214 KOps/s | 275.7944 KOps/s | |
test_memmaptd_index | 0.3851ms | 0.2318ms | 4.3145 KOps/s | 4.2966 KOps/s | |
test_memmaptd_index_astensor | 1.0557ms | 0.3143ms | 3.1814 KOps/s | 3.0957 KOps/s | |
test_memmaptd_index_op | 0.8889ms | 0.5764ms | 1.7348 KOps/s | 1.6700 KOps/s | |
test_serialize_model | 0.2316s | 0.1312s | 7.6208 Ops/s | 8.3145 Ops/s | |
test_serialize_model_pickle | 0.4631s | 0.4028s | 2.4828 Ops/s | 2.5061 Ops/s | |
test_serialize_weights | 0.1189s | 0.1126s | 8.8832 Ops/s | 8.4147 Ops/s | |
test_serialize_weights_returnearly | 0.1840s | 0.1671s | 5.9834 Ops/s | 5.4287 Ops/s | |
test_serialize_weights_pickle | 0.6337s | 0.4582s | 2.1823 Ops/s | 2.4905 Ops/s | |
test_serialize_weights_filesystem | 0.2480s | 0.1613s | 6.1978 Ops/s | 7.0191 Ops/s | |
test_serialize_model_filesystem | 0.1509s | 0.1437s | 6.9569 Ops/s | 6.4591 Ops/s | |
test_reshape_pytree | 87.9340μs | 26.7314μs | 37.4092 KOps/s | 38.2511 KOps/s | |
test_reshape_td | 68.9490μs | 32.1970μs | 31.0588 KOps/s | 29.5843 KOps/s | |
test_view_pytree | 64.4710μs | 26.1442μs | 38.2494 KOps/s | 38.4444 KOps/s | |
test_view_td | 0.1139ms | 39.0612μs | 25.6009 KOps/s | 24.9443 KOps/s | |
test_unbind_pytree | 71.8340μs | 29.2620μs | 34.1741 KOps/s | 33.1153 KOps/s | |
test_unbind_td | 0.3621ms | 39.4472μs | 25.3504 KOps/s | 23.7322 KOps/s | |
test_split_pytree | 78.2960μs | 29.0428μs | 34.4319 KOps/s | 34.0600 KOps/s | |
test_split_td | 0.5744ms | 44.6343μs | 22.4043 KOps/s | 22.0202 KOps/s | |
test_add_pytree | 77.6560μs | 35.9047μs | 27.8515 KOps/s | 28.0257 KOps/s | |
test_add_td | 0.1298ms | 55.0977μs | 18.1496 KOps/s | 17.7051 KOps/s | |
test_compile_add_one_nested[tensordict-compile] | 0.1461ms | 67.1806μs | 14.8853 KOps/s | 14.8880 KOps/s | |
test_compile_add_one_nested[tensordict-eager] | 0.3883ms | 0.1727ms | 5.7900 KOps/s | 5.7691 KOps/s | |
test_compile_add_one_nested[pytree-compile] | 0.1219ms | 45.6433μs | 21.9090 KOps/s | 21.2321 KOps/s | |
test_compile_add_one_nested[pytree-eager] | 0.2280ms | 0.1183ms | 8.4528 KOps/s | 8.3357 KOps/s | |
test_compile_copy_nested[tensordict-compile] | 76.3330μs | 28.0566μs | 35.6422 KOps/s | 35.1040 KOps/s | |
test_compile_copy_nested[tensordict-eager] | 0.1238ms | 58.5333μs | 17.0843 KOps/s | 17.2999 KOps/s | |
test_compile_copy_nested[pytree-compile] | 0.1365ms | 79.2949μs | 12.6112 KOps/s | 12.6864 KOps/s | |
test_compile_copy_nested[pytree-eager] | 0.1619ms | 66.3406μs | 15.0737 KOps/s | 15.0863 KOps/s | |
test_compile_add_one_flat[tensordict-compile] | 0.1835ms | 0.1066ms | 9.3804 KOps/s | 9.3509 KOps/s | |
test_compile_add_one_flat[tensordict-eager] | 0.3368ms | 0.2132ms | 4.6902 KOps/s | 4.6132 KOps/s | |
test_compile_add_one_flat[tensorclass-compile] | 0.1052ms | 48.3753μs | 20.6717 KOps/s | 21.5101 KOps/s | |
test_compile_add_one_flat[tensorclass-eager] | 0.2835ms | 67.0903μs | 14.9053 KOps/s | 14.6719 KOps/s | |
test_compile_add_one_flat[pytree-compile] | 0.2494ms | 0.1025ms | 9.7585 KOps/s | 9.9398 KOps/s | |
test_compile_add_one_flat[pytree-eager] | 0.3058ms | 0.1986ms | 5.0351 KOps/s | 4.9072 KOps/s | |
test_compile_add_self_flat[tensordict-eager] | 0.4587ms | 0.2301ms | 4.3452 KOps/s | 4.2983 KOps/s | |
test_compile_add_self_flat[tensordict-compile] | 0.2048ms | 0.1062ms | 9.4191 KOps/s | 9.2915 KOps/s | |
test_compile_add_self_flat[tensorclass-eager] | 0.2653ms | 62.7736μs | 15.9303 KOps/s | 16.1495 KOps/s | |
test_compile_add_self_flat[tensorclass-compile] | 0.1444ms | 48.8634μs | 20.4652 KOps/s | 20.5793 KOps/s | |
test_compile_add_self_flat[pytree-eager] | 0.3571ms | 0.1553ms | 6.4389 KOps/s | 6.2679 KOps/s | |
test_compile_add_self_flat[pytree-compile] | 0.1820ms | 0.1007ms | 9.9306 KOps/s | 9.8392 KOps/s | |
test_compile_copy_flat[tensordict-compile] | 62.9970μs | 20.9993μs | 47.6206 KOps/s | 47.5651 KOps/s | |
test_compile_copy_flat[tensordict-eager] | 0.1358ms | 67.1097μs | 14.9010 KOps/s | 15.2291 KOps/s | |
test_compile_copy_flat[pytree-compile] | 0.1720ms | 80.5850μs | 12.4093 KOps/s | 12.3257 KOps/s | |
test_compile_copy_flat[pytree-eager] | 0.1407ms | 66.4216μs | 15.0554 KOps/s | 14.9912 KOps/s | |
test_compile_assign_and_add[tensordict-compile] | 0.4101ms | 0.2133ms | 4.6884 KOps/s | 4.6166 KOps/s | |
test_compile_assign_and_add[tensordict-eager] | 2.1546ms | 1.3521ms | 739.6130 Ops/s | 710.4124 Ops/s | |
test_compile_assign_and_add[pytree-compile] | 0.2938ms | 0.2083ms | 4.8001 KOps/s | 4.6853 KOps/s | |
test_compile_assign_and_add[pytree-eager] | 1.0205ms | 0.8150ms | 1.2270 KOps/s | 1.1682 KOps/s | |
test_compile_assign_and_add_stack[compile] | 0.6225ms | 0.4508ms | 2.2185 KOps/s | 2.1953 KOps/s | |
test_compile_assign_and_add_stack[eager] | 3.3132ms | 2.7089ms | 369.1542 Ops/s | 365.0904 Ops/s | |
test_compile_indexing[tensor-tensordict-compile] | 0.7140ms | 38.7120μs | 25.8318 KOps/s | 25.4421 KOps/s | |
test_compile_indexing[tensor-tensordict-eager] | 0.5673ms | 32.0296μs | 31.2212 KOps/s | 29.3230 KOps/s | |
test_compile_indexing[tensor-tensorclass-compile] | 78.8170μs | 31.5820μs | 31.6637 KOps/s | 31.5511 KOps/s | |
test_compile_indexing[tensor-tensorclass-eager] | 95.7890μs | 23.8815μs | 41.8735 KOps/s | 41.8192 KOps/s | |
test_compile_indexing[tensor-pytree-compile] | 76.2620μs | 32.1663μs | 31.0885 KOps/s | 30.5286 KOps/s | |
test_compile_indexing[tensor-pytree-eager] | 76.1920μs | 23.5803μs | 42.4083 KOps/s | 42.5242 KOps/s | |
test_compile_indexing[slice-tensordict-compile] | 0.1297ms | 53.9009μs | 18.5526 KOps/s | 18.7526 KOps/s | |
test_compile_indexing[slice-tensordict-eager] | 0.3921ms | 19.5189μs | 51.2324 KOps/s | 48.9368 KOps/s | |
test_compile_indexing[slice-tensorclass-compile] | 97.7330μs | 46.3502μs | 21.5749 KOps/s | 21.4628 KOps/s | |
test_compile_indexing[slice-tensorclass-eager] | 56.0350μs | 18.7943μs | 53.2077 KOps/s | 54.0684 KOps/s | |
test_compile_indexing[slice-pytree-compile] | 0.1272ms | 47.6632μs | 20.9805 KOps/s | 21.3552 KOps/s | |
test_compile_indexing[slice-pytree-eager] | 54.3820μs | 18.6097μs | 53.7354 KOps/s | 54.6771 KOps/s | |
test_compile_indexing[int-tensordict-compile] | 0.1085ms | 55.3724μs | 18.0595 KOps/s | 18.2925 KOps/s | |
test_compile_indexing[int-tensordict-eager] | 1.1738ms | 19.5272μs | 51.2107 KOps/s | 49.5256 KOps/s | |
test_compile_indexing[int-tensorclass-compile] | 0.6796ms | 47.3242μs | 21.1308 KOps/s | 21.2978 KOps/s | |
test_compile_indexing[int-tensorclass-eager] | 61.9060μs | 18.6252μs | 53.6906 KOps/s | 54.2278 KOps/s | |
test_compile_indexing[int-pytree-compile] | 0.1107ms | 47.1659μs | 21.2018 KOps/s | 21.3995 KOps/s | |
test_compile_indexing[int-pytree-eager] | 61.1340μs | 18.6113μs | 53.7308 KOps/s | 54.0015 KOps/s | |
test_mod_add[eager] | 0.1082ms | 36.6405μs | 27.2922 KOps/s | 27.4296 KOps/s | |
test_mod_add[compile] | 0.1392ms | 65.0726μs | 15.3675 KOps/s | 15.1306 KOps/s | |
test_mod_add[compile-overhead] | 0.2478ms | 64.1282μs | 15.5938 KOps/s | 15.3321 KOps/s | |
test_mod_wrap[eager] | 0.3781ms | 0.2217ms | 4.5106 KOps/s | 4.5036 KOps/s | |
test_mod_wrap[compile] | 2.0942ms | 0.2216ms | 4.5124 KOps/s | 4.2910 KOps/s | |
test_mod_wrap[compile-overhead] | 0.3151ms | 0.2173ms | 4.6028 KOps/s | 4.4059 KOps/s | |
test_mod_wrap_and_backward[eager] | 14.1125ms | 11.4406ms | 87.4080 Ops/s | 88.0247 Ops/s | |
test_mod_wrap_and_backward[compile] | 12.8561ms | 11.3402ms | 88.1820 Ops/s | 86.3021 Ops/s | |
test_mod_wrap_and_backward[compile-overhead] | 13.1151ms | 11.2698ms | 88.7323 Ops/s | 84.7003 Ops/s | |
test_seq_add[eager] | 0.2485ms | 0.1188ms | 8.4176 KOps/s | 8.4181 KOps/s | |
test_seq_add[compile] | 0.2064ms | 75.7573μs | 13.2000 KOps/s | 12.8505 KOps/s | |
test_seq_add[compile-overhead] | 0.2267ms | 75.7506μs | 13.2012 KOps/s | 13.2291 KOps/s | |
test_seq_wrap[eager] | 1.3767ms | 0.4585ms | 2.1811 KOps/s | 2.2391 KOps/s | |
test_seq_wrap[compile] | 0.3771ms | 0.2351ms | 4.2539 KOps/s | 4.0845 KOps/s | |
test_seq_wrap[compile-overhead] | 0.3877ms | 0.2348ms | 4.2597 KOps/s | 4.0638 KOps/s | |
test_func_call_runtime[False-eager] | 0.7474ms | 0.5313ms | 1.8823 KOps/s | 1.8846 KOps/s | |
test_func_call_runtime[False-compile] | 0.5828ms | 0.4364ms | 2.2916 KOps/s | 2.2054 KOps/s | |
test_func_call_runtime[False-compile-overhead] | 0.6066ms | 0.4353ms | 2.2973 KOps/s | 2.2220 KOps/s | |
test_func_call_runtime[True-eager] | 1.5278ms | 0.7509ms | 1.3318 KOps/s | 1.3456 KOps/s | |
test_func_call_runtime[True-compile] | 0.5734ms | 0.4555ms | 2.1955 KOps/s | 2.0616 KOps/s | |
test_func_call_runtime[True-compile-overhead] | 0.6108ms | 0.4584ms | 2.1815 KOps/s | 2.1033 KOps/s | |
test_func_call_cm_runtime[False-eager] | 0.7976ms | 0.5252ms | 1.9041 KOps/s | 1.9015 KOps/s | |
test_func_call_cm_runtime[False-compile] | 0.8885ms | 0.4381ms | 2.2825 KOps/s | 2.2180 KOps/s | |
test_func_call_cm_runtime[False-compile-overhead] | 1.0189ms | 0.4408ms | 2.2685 KOps/s | 2.1959 KOps/s | |
test_func_call_cm_runtime[True-eager] | 1.0932ms | 0.8791ms | 1.1375 KOps/s | 1.1240 KOps/s | |
test_func_call_cm_runtime[True-compile] | 0.9591ms | 0.7864ms | 1.2716 KOps/s | 1.2646 KOps/s | |
test_func_call_cm_runtime[True-compile-overhead] | 1.2960ms | 0.7840ms | 1.2755 KOps/s | 1.2538 KOps/s | |
test_vmap_func_call_cm_runtime[eager] | 3.4546ms | 1.9220ms | 520.2982 Ops/s | 525.8850 Ops/s | |
test_vmap_func_call_cm_runtime[compile] | 1.0642ms | 0.5284ms | 1.8925 KOps/s | 1.8626 KOps/s | |
test_vmap_func_call_cm_runtime[compile-overhead] | 1.2229ms | 0.5317ms | 1.8807 KOps/s | 1.8161 KOps/s | |
test_distributed | 0.2362ms | 0.1225ms | 8.1658 KOps/s | 7.7379 KOps/s | |
test_tdmodule | 45.1650μs | 27.5173μs | 36.3408 KOps/s | 35.0652 KOps/s | |
test_tdmodule_dispatch | 84.5790μs | 50.1125μs | 19.9551 KOps/s | 19.4484 KOps/s | |
test_tdseq | 55.1830μs | 29.6158μs | 33.7657 KOps/s | 33.1559 KOps/s | |
test_tdseq_dispatch | 83.9070μs | 54.0318μs | 18.5076 KOps/s | 17.9974 KOps/s | |
test_instantiation_functorch | 1.6254ms | 1.5100ms | 662.2430 Ops/s | 641.6381 Ops/s | |
test_exec_functorch | 0.3016ms | 0.1806ms | 5.5372 KOps/s | 5.4893 KOps/s | |
test_exec_functional_call | 0.2620ms | 0.1715ms | 5.8302 KOps/s | 5.7787 KOps/s | |
test_exec_td_decorator | 0.5190ms | 0.2317ms | 4.3161 KOps/s | 4.2737 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 0.8718ms | 0.6524ms | 1.5327 KOps/s | 1.5044 KOps/s | |
test_vmap_mlp_speed_decorator[True-False] | 0.9382ms | 0.6527ms | 1.5321 KOps/s | 1.5241 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 0.8409ms | 0.5285ms | 1.8922 KOps/s | 1.8934 KOps/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.7540ms | 0.5280ms | 1.8938 KOps/s | 1.8858 KOps/s | |
test_to_module_speed[True] | 2.0239ms | 1.3158ms | 760.0066 Ops/s | 757.9523 Ops/s | |
test_to_module_speed[False] | 1.4068ms | 1.2738ms | 785.0615 Ops/s | 766.9220 Ops/s | |
test_tc_init | 81.2530μs | 47.7485μs | 20.9431 KOps/s | 21.6198 KOps/s | |
test_tc_init_nested | 0.2034ms | 95.2511μs | 10.4986 KOps/s | 10.4635 KOps/s | |
test_tc_first_layer_tensor | 28.8200μs | 1.5465μs | 646.6378 KOps/s | 656.4066 KOps/s | |
test_tc_first_layer_nontensor | 27.3010μs | 4.7142μs | 212.1257 KOps/s | 215.6275 KOps/s | |
test_tc_second_layer_tensor | 32.3910μs | 2.8737μs | 347.9850 KOps/s | 356.9225 KOps/s | |
test_tc_second_layer_nontensor | 53.0990μs | 6.0495μs | 165.3019 KOps/s | 166.6428 KOps/s | |
test_unbind | 0.2693s | 14.2253ms | 70.2974 Ops/s | 60.3136 Ops/s | |
test_full_like | 11.9154ms | 9.2372ms | 108.2574 Ops/s | 111.6149 Ops/s | |
test_zeros_like | 7.5362ms | 3.3587ms | 297.7301 Ops/s | 310.3354 Ops/s | |
test_ones_like | 5.0787ms | 3.5163ms | 284.3901 Ops/s | 262.7444 Ops/s | |
test_clone | 9.9492ms | 6.9424ms | 144.0415 Ops/s | 149.7034 Ops/s | |
test_squeeze | 87.7250μs | 12.5236μs | 79.8492 KOps/s | 78.5591 KOps/s | |
test_unsqueeze | 0.1561ms | 93.4249μs | 10.7038 KOps/s | 10.7465 KOps/s | |
test_split | 0.3517ms | 0.1969ms | 5.0789 KOps/s | 5.1950 KOps/s | |
test_permute | 0.2921ms | 0.2020ms | 4.9515 KOps/s | 5.0711 KOps/s | |
test_stack | 32.9651ms | 27.6642ms | 36.1478 Ops/s | 35.0862 Ops/s | |
test_cat | 32.9034ms | 27.9696ms | 35.7531 Ops/s | 34.0832 Ops/s | |
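The Change column is empty in this capture of the benchmark table. The following is a minimal Python sketch of how it could be filled in from the two throughput columns. It assumes that Change means the percent difference of Ops relative to Ops on Repo HEAD (the formula the benchmark bot actually uses is not shown here), and the helpers `parse_ops` and `relative_change` are hypothetical names introduced only for illustration.

```python
import re

# Assumed scale factors for the throughput units that appear in the table.
_UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(cell: str) -> float:
    """Convert a cell such as '1.5067 KOps/s' into operations per second."""
    match = re.fullmatch(r"\s*([0-9.]+)\s*([KM]?Ops/s)\s*", cell)
    if match is None:
        raise ValueError(f"unrecognised throughput cell: {cell!r}")
    value, unit = match.groups()
    return float(value) * _UNIT_SCALE[unit]


def relative_change(row: str) -> float:
    """Percent change of 'Ops' versus 'Ops on Repo HEAD' for one table row.

    Assumes the column order Name | Max | Mean | Ops | Ops on Repo HEAD | Change.
    A positive result means the PR branch ran more operations per second.
    """
    cells = [c.strip() for c in row.split("|")]
    ops, ops_head = parse_ops(cells[3]), parse_ops(cells[4])
    return 100.0 * (ops - ops_head) / ops_head


if __name__ == "__main__":
    row = "test_unbind_speed_stack1 | 1.1648ms | 0.6637ms | 1.5067 KOps/s | 1.2034 KOps/s | |"
    print(f"{relative_change(row):+.1f}%")
```

Under this assumed definition, `test_unbind_speed_stack1` comes out at roughly +25.2% and `test_split` at roughly -8.6%. Normalising every cell to plain Ops/s before dividing keeps rows that mix units (for example KOps/s against Ops/s) comparable.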
Labels: bug, CLA Signed
Stack from ghstack (oldest at bottom):