Used hardware PRNG for dropout #26

Merged 1 commit on Aug 1, 2024

Conversation

amahmudTT
Contributor

Implemented dropout.

Used 31 bits from the PRNG for performance.
Used a 32-bit float for the scale factor; BUDA uses 16 bits.

Seed initialization is very slow and requires at least 500 NOPs.
We could pull some code out of the dropout function and put it in the initialization for performance, but I tried to follow the existing code.

Post-commit tests pass (this code was not being used by anything else). I will run them again when I make the PR to tt-metal.
https://github.com/tenstorrent/tt-metal/actions/runs/10087888953
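
For readers unfamiliar with the operation, the per-element behaviour described above can be sketched on the host roughly as follows. This is an illustrative reference only, not the SFPU kernel: the function name `dropout_reference` and its parameters are hypothetical, and it assumes the caller supplies a drop threshold already scaled to the 31-bit range and a 32-bit float scale factor (commonly 1 / (1 - p), which is an assumption here).

```cpp
#include <cstdint>

// Host-side reference for a single element (hypothetical helper, not the LLK code).
//   rand32    : raw 32-bit value from a PRNG
//   threshold : drop probability pre-scaled to the 31-bit integer range (assumed)
//   scale     : 32-bit float scale applied to kept elements (assumed 1 / (1 - p))
inline float dropout_reference(float x, std::uint32_t rand32,
                               std::uint32_t threshold, float scale) {
    // Clear the sign bit so that only 31 random bits take part in the comparison.
    const std::uint32_t sample = rand32 & 0x7FFFFFFFu;
    // Drop the element when the sample falls below the threshold, otherwise scale it.
    return (sample < threshold) ? 0.0f : x * scale;
}
```

With a threshold of 0 nothing is dropped, which matches the trivial `probability=0` case in the test logs below.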

@amahmudTT amahmudTT added the enhancement New feature or request label Jul 25, 2024
@amahmudTT amahmudTT self-assigned this Jul 25, 2024
@amahmudTT
Contributor Author

amahmudTT commented Jul 25, 2024

The PRNG seems very crude (assuming my code is correct).

I created a test that uses a vector of size 131072 and applies a range of dropout probabilities.
The PRNG showed a strong dependence on the seed value; here are some example outputs.

                   Verif | INFO     | Created a constant vector of size 131072 with value 9, bf16 = 1091584272
                   Test | INFO     | dropout probability=0, dropout_rate=0               
                   Test | INFO     | dropout probability=0.1, dropout_rate=0.11258316
                   Test | ERROR    | Dropout rate & probability mismatch probability=0.2, dropout_rate=0.1400299
                   Test | INFO     | dropout probability=0.3, dropout_rate=0.31822968
                   Test | ERROR    | Dropout rate & probability mismatch probability=0.4, dropout_rate=0.4645157
                   Test | INFO     | dropout probability=0.5, dropout_rate=0.5048866
                   Test | ERROR    | Dropout rate & probability mismatch probability=0.6, dropout_rate=0.520134
                   Test | ERROR    | Dropout rate & probability mismatch probability=0.7, dropout_rate=0.573844
                   Test | ERROR    | Dropout rate & probability mismatch probability=0.8, dropout_rate=0.8557396
                   Test | INFO     | dropout probability=0.9, dropout_rate=0.9181709
                   Test | INFO     | dropout probability=1, dropout_rate=1

Roughly half the time the dropout rate deviated considerably from the requested probability. It seems to achieve good results when the rate is 0.25 or 0.5 (0 and 1 are trivial), which suggests the algorithm was tuned to distribute the numbers evenly across halves and quarters but not overall. I tried using a 1-bit shift instead of clearing the sign bit of the generated random number, but that did not help.

@ttmtrajkovic
Contributor

This is great, Anil.
Every SFPU has its own PRNG and, given that there are 32 SFPUs, each PRNG will produce total_number / 32 samples. Each PRNG should produce a uniform distribution, as long as there is a sufficient number of samples.
Also, if you are using 31 bits from the PRNG as your random sample, is your probability also scaled to 31 bits?

Let's talk about seed init on Monday.

rand ^= mask;
}
v_endif;
TTI_SFPENCC(0,0,0,0);
Contributor

You shouldn't need to enable CC explicitly, as CC is always enabled by default.
You only have to worry about reverting it to the correct state if CC is modified by your instruction.

@amahmudTT
Contributor Author

Yes, I did scale it, but I assume it is the responsibility of the op to make sure the input is scaled that way.
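
To make the scaling question concrete, here is a minimal host-side sketch of turning a drop probability into an integer threshold that is comparable with a 31-bit random sample. The helper name `probability_to_threshold31` is hypothetical and is not part of tt-metal; capping at 2^31 - 1 is an assumption made so the value still fits in a signed 32-bit integer.

```cpp
#include <algorithm>
#include <cstdint>

// Convert a drop probability in [0, 1] to a threshold on the 31-bit sample range.
// Hypothetical helper for illustration; the real op may scale differently.
inline std::uint32_t probability_to_threshold31(double p) {
    const double clamped = std::min(1.0, std::max(0.0, p));
    const double scaled = clamped * 2147483648.0;  // p * 2^31
    // Cap at 2^31 - 1 so the threshold is still a non-negative signed 32-bit value.
    return static_cast<std::uint32_t>(std::min(scaled, 2147483647.0));
}
```

A sample with the sign bit cleared is then dropped when it is strictly less than this threshold. If the probability were left scaled to 32 bits while the sample uses only 31, the effective drop rate would not match the requested one.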

@amahmudTT
Contributor Author

I actually had a bug in my test kernel: following other kernels, I initialized the PRNG before every tile. As a result, my measurements effectively reflected the RNG output for only one tile and were inaccurate. After initializing only once, before all the tiles, I get very good results.

                   Test | INFO     | dropout probability=0, dropout_rate=0
                   Test | INFO     | dropout probability=0.1, dropout_rate=0.10077286
                   Test | INFO     | dropout probability=0.2, dropout_rate=0.20359802
                   Test | INFO     | dropout probability=0.3, dropout_rate=0.29244232
                   Test | INFO     | dropout probability=0.4, dropout_rate=0.42268372
                   Test | INFO     | dropout probability=0.5, dropout_rate=0.5007248
                   Test | INFO     | dropout probability=0.6, dropout_rate=0.6022377
                   Test | INFO     | dropout probability=0.7, dropout_rate=0.7030449
                   Test | INFO     | dropout probability=0.8, dropout_rate=0.80612946
                   Test | INFO     | dropout probability=0.9, dropout_rate=0.8924713
                   Test | INFO     | dropout probability=1, dropout_rate=1
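
The effect of the bug can be reproduced on the host with any deterministic generator. The sketch below uses xorshift32 purely as a software stand-in (the hardware PRNG is different), and the names `XorShift32`, `make_masks` and `dropout_rate` are hypothetical. When the generator is re-seeded before every tile, every tile receives the same drop mask, so the measured rate only reflects one tile's worth of samples.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Software stand-in for the hardware PRNG; the real generator differs.
struct XorShift32 {
    std::uint32_t state;
    explicit XorShift32(std::uint32_t seed) : state(seed ? seed : 1u) {}
    std::uint32_t next() {
        state ^= state << 13;
        state ^= state >> 17;
        state ^= state << 5;
        return state;
    }
};

// Build one drop mask per tile. reseed_per_tile = true models the test-kernel bug.
std::vector<std::vector<bool>> make_masks(std::size_t num_tiles, std::size_t tile_elems,
                                          std::uint32_t seed, std::uint32_t threshold,
                                          bool reseed_per_tile) {
    std::vector<std::vector<bool>> masks;
    XorShift32 rng(seed);
    for (std::size_t t = 0; t < num_tiles; ++t) {
        if (reseed_per_tile) {
            rng = XorShift32(seed);  // restart the sequence for this tile
        }
        std::vector<bool> mask(tile_elems);
        for (std::size_t i = 0; i < tile_elems; ++i) {
            // Compare a 31-bit sample against the scaled probability.
            mask[i] = (rng.next() & 0x7FFFFFFFu) < threshold;
        }
        masks.push_back(std::move(mask));
    }
    return masks;
}

// Fraction of dropped elements across all tiles.
double dropout_rate(const std::vector<std::vector<bool>>& masks) {
    std::size_t dropped = 0, total = 0;
    for (const auto& mask : masks) {
        for (bool bit : mask) {
            dropped += bit ? 1 : 0;
            ++total;
        }
    }
    return total ? static_cast<double>(dropped) / static_cast<double>(total) : 0.0;
}
```

With `reseed_per_tile = true`, `dropout_rate` over 128 tiles equals the rate of the first tile alone, which is why the earlier numbers looked so noisy.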

@ttmtrajkovic
Contributor

This looks great. The distribution should be even better for larger data sets

@amahmudTT amahmudTT force-pushed the amahmud/wh_llk_dropout branch from 004ad1d to 7dd490c Compare July 28, 2024 07:57
@amahmudTT amahmudTT force-pushed the amahmud/wh_llk_dropout branch from 7dd490c to 6f9f9eb Compare August 1, 2024 14:28
@amahmudTT
Contributor Author

Post-commit passes with this branch: https://github.com/tenstorrent/tt-metal/actions/runs/10200615586
I think I can merge this one once approved. I separated out the call to PRNG initialization, as it seems multiple functions will need to call it.

@amahmudTT amahmudTT merged commit c92609b into main Aug 1, 2024
1 check passed
@amahmudTT amahmudTT deleted the amahmud/wh_llk_dropout branch August 12, 2024 16:47
@corsix

corsix commented Sep 29, 2024

> This is great Anil. Every SFPU has its own PRNG and given that there are 32 SFPUs, every PRNG will have total_number / 32 samples. every PRNG should be producing uniform distribution

How does the single seed get distributed to the 32 SFPUs? In my testing, I'm seeing all 32 vector lanes generate the same random number, which is not great, and implies I'm missing a step to cause the lanes to diverge.

@amahmudTT
Contributor Author

Are you trying to generate random numbers in a separate kernel? Could you try initializing the seed and see if you get the same result?

init_prng_seed

As for the dropout op in Python, it is implemented in such a way that the seed is reinitialized for every tile. As a result, all tiles will have a similar pattern; this is going to be addressed in the future.

@corsix

corsix commented Sep 30, 2024

My exact setup is somewhat bespoke, so I've tried to reproduce what I'm seeing using a minimally modified tt_metal programming example. Starting from add_2_integers_in_compute.cpp and its kernel add_2_tiles.cpp, I've got the lightly modified code at https://gist.github.com/corsix/3cea499e7bffa1c1551be0d0ab20131f, though it boils down to generating a single 32x32 tile of int32, and then printing it, with the crux of the kernel being:

// Init:
    dropout_tile_init(0xDEADBEEF);
    TTI_SFPMOV(0, p_sfpu::LTILEID, p_sfpu::LREG0, 0);
    TTI_SFPSHFT(-1 & 0xfff, 0, p_sfpu::LREG0, 1);
// Per iteration:
    TTI_SFPMOV(0, 9, p_sfpu::LREG0, 8);
    TTI_SFPSTORE(0, 4, 3, 0);
    TTI_SFPIADD(32, p_sfpu::LREG0, p_sfpu::LREG0, 5);
    dst_reg++;

Running with TTI_SFPMOV(0, 9, p_sfpu::LREG0, 8); commented out, the 32x32 tile ends up with each of the integers 0 through 1023 exactly once each. The integers 0 through 31 are the 32 vector lanes from the first iteration, 32 through 63 are from the 2nd iteration, and so forth. This provides a nice sanity check that we really are writing to the entire 32x32 tile, and that the general setup is capable of writing 1024 distinct values. It prints:

0x00000000, 0x00000020, 0x00000001, 0x00000021, 0x00000002, 0x00000022, 0x00000003, 0x00000023, 0x00000004, 0x00000024, 0x00000005, 0x00000025, 0x00000006, 0x00000026, 0x00000007, 0x00000027, 0x00000008, 0x00000028, 0x00000009, 0x00000029, 0x0000000a, 0x0000002a, 0x0000000b, 0x0000002b, 0x0000000c, 0x0000002c, 0x0000000d, 0x0000002d, 0x0000000e, 0x0000002e, 0x0000000f, 0x0000002f, 
0x00000010, 0x00000030, 0x00000011, 0x00000031, 0x00000012, 0x00000032, 0x00000013, 0x00000033, 0x00000014, 0x00000034, 0x00000015, 0x00000035, 0x00000016, 0x00000036, 0x00000017, 0x00000037, 0x00000018, 0x00000038, 0x00000019, 0x00000039, 0x0000001a, 0x0000003a, 0x0000001b, 0x0000003b, 0x0000001c, 0x0000003c, 0x0000001d, 0x0000003d, 0x0000001e, 0x0000003e, 0x0000001f, 0x0000003f, 
0x00000040, 0x00000060, 0x00000041, 0x00000061, 0x00000042, 0x00000062, 0x00000043, 0x00000063, 0x00000044, 0x00000064, 0x00000045, 0x00000065, 0x00000046, 0x00000066, 0x00000047, 0x00000067, 0x00000048, 0x00000068, 0x00000049, 0x00000069, 0x0000004a, 0x0000006a, 0x0000004b, 0x0000006b, 0x0000004c, 0x0000006c, 0x0000004d, 0x0000006d, 0x0000004e, 0x0000006e, 0x0000004f, 0x0000006f, 
0x00000050, 0x00000070, 0x00000051, 0x00000071, 0x00000052, 0x00000072, 0x00000053, 0x00000073, 0x00000054, 0x00000074, 0x00000055, 0x00000075, 0x00000056, 0x00000076, 0x00000057, 0x00000077, 0x00000058, 0x00000078, 0x00000059, 0x00000079, 0x0000005a, 0x0000007a, 0x0000005b, 0x0000007b, 0x0000005c, 0x0000007c, 0x0000005d, 0x0000007d, 0x0000005e, 0x0000007e, 0x0000005f, 0x0000007f, 
0x00000080, 0x000000a0, 0x00000081, 0x000000a1, 0x00000082, 0x000000a2, 0x00000083, 0x000000a3, 0x00000084, 0x000000a4, 0x00000085, 0x000000a5, 0x00000086, 0x000000a6, 0x00000087, 0x000000a7, 0x00000088, 0x000000a8, 0x00000089, 0x000000a9, 0x0000008a, 0x000000aa, 0x0000008b, 0x000000ab, 0x0000008c, 0x000000ac, 0x0000008d, 0x000000ad, 0x0000008e, 0x000000ae, 0x0000008f, 0x000000af, 
0x00000090, 0x000000b0, 0x00000091, 0x000000b1, 0x00000092, 0x000000b2, 0x00000093, 0x000000b3, 0x00000094, 0x000000b4, 0x00000095, 0x000000b5, 0x00000096, 0x000000b6, 0x00000097, 0x000000b7, 0x00000098, 0x000000b8, 0x00000099, 0x000000b9, 0x0000009a, 0x000000ba, 0x0000009b, 0x000000bb, 0x0000009c, 0x000000bc, 0x0000009d, 0x000000bd, 0x0000009e, 0x000000be, 0x0000009f, 0x000000bf, 
0x000000c0, 0x000000e0, 0x000000c1, 0x000000e1, 0x000000c2, 0x000000e2, 0x000000c3, 0x000000e3, 0x000000c4, 0x000000e4, 0x000000c5, 0x000000e5, 0x000000c6, 0x000000e6, 0x000000c7, 0x000000e7, 0x000000c8, 0x000000e8, 0x000000c9, 0x000000e9, 0x000000ca, 0x000000ea, 0x000000cb, 0x000000eb, 0x000000cc, 0x000000ec, 0x000000cd, 0x000000ed, 0x000000ce, 0x000000ee, 0x000000cf, 0x000000ef, 
0x000000d0, 0x000000f0, 0x000000d1, 0x000000f1, 0x000000d2, 0x000000f2, 0x000000d3, 0x000000f3, 0x000000d4, 0x000000f4, 0x000000d5, 0x000000f5, 0x000000d6, 0x000000f6, 0x000000d7, 0x000000f7, 0x000000d8, 0x000000f8, 0x000000d9, 0x000000f9, 0x000000da, 0x000000fa, 0x000000db, 0x000000fb, 0x000000dc, 0x000000fc, 0x000000dd, 0x000000fd, 0x000000de, 0x000000fe, 0x000000df, 0x000000ff, 
0x00000100, 0x00000120, 0x00000101, 0x00000121, 0x00000102, 0x00000122, 0x00000103, 0x00000123, 0x00000104, 0x00000124, 0x00000105, 0x00000125, 0x00000106, 0x00000126, 0x00000107, 0x00000127, 0x00000108, 0x00000128, 0x00000109, 0x00000129, 0x0000010a, 0x0000012a, 0x0000010b, 0x0000012b, 0x0000010c, 0x0000012c, 0x0000010d, 0x0000012d, 0x0000010e, 0x0000012e, 0x0000010f, 0x0000012f, 
0x00000110, 0x00000130, 0x00000111, 0x00000131, 0x00000112, 0x00000132, 0x00000113, 0x00000133, 0x00000114, 0x00000134, 0x00000115, 0x00000135, 0x00000116, 0x00000136, 0x00000117, 0x00000137, 0x00000118, 0x00000138, 0x00000119, 0x00000139, 0x0000011a, 0x0000013a, 0x0000011b, 0x0000013b, 0x0000011c, 0x0000013c, 0x0000011d, 0x0000013d, 0x0000011e, 0x0000013e, 0x0000011f, 0x0000013f, 
0x00000140, 0x00000160, 0x00000141, 0x00000161, 0x00000142, 0x00000162, 0x00000143, 0x00000163, 0x00000144, 0x00000164, 0x00000145, 0x00000165, 0x00000146, 0x00000166, 0x00000147, 0x00000167, 0x00000148, 0x00000168, 0x00000149, 0x00000169, 0x0000014a, 0x0000016a, 0x0000014b, 0x0000016b, 0x0000014c, 0x0000016c, 0x0000014d, 0x0000016d, 0x0000014e, 0x0000016e, 0x0000014f, 0x0000016f, 
0x00000150, 0x00000170, 0x00000151, 0x00000171, 0x00000152, 0x00000172, 0x00000153, 0x00000173, 0x00000154, 0x00000174, 0x00000155, 0x00000175, 0x00000156, 0x00000176, 0x00000157, 0x00000177, 0x00000158, 0x00000178, 0x00000159, 0x00000179, 0x0000015a, 0x0000017a, 0x0000015b, 0x0000017b, 0x0000015c, 0x0000017c, 0x0000015d, 0x0000017d, 0x0000015e, 0x0000017e, 0x0000015f, 0x0000017f, 
0x00000180, 0x000001a0, 0x00000181, 0x000001a1, 0x00000182, 0x000001a2, 0x00000183, 0x000001a3, 0x00000184, 0x000001a4, 0x00000185, 0x000001a5, 0x00000186, 0x000001a6, 0x00000187, 0x000001a7, 0x00000188, 0x000001a8, 0x00000189, 0x000001a9, 0x0000018a, 0x000001aa, 0x0000018b, 0x000001ab, 0x0000018c, 0x000001ac, 0x0000018d, 0x000001ad, 0x0000018e, 0x000001ae, 0x0000018f, 0x000001af, 
0x00000190, 0x000001b0, 0x00000191, 0x000001b1, 0x00000192, 0x000001b2, 0x00000193, 0x000001b3, 0x00000194, 0x000001b4, 0x00000195, 0x000001b5, 0x00000196, 0x000001b6, 0x00000197, 0x000001b7, 0x00000198, 0x000001b8, 0x00000199, 0x000001b9, 0x0000019a, 0x000001ba, 0x0000019b, 0x000001bb, 0x0000019c, 0x000001bc, 0x0000019d, 0x000001bd, 0x0000019e, 0x000001be, 0x0000019f, 0x000001bf, 
0x000001c0, 0x000001e0, 0x000001c1, 0x000001e1, 0x000001c2, 0x000001e2, 0x000001c3, 0x000001e3, 0x000001c4, 0x000001e4, 0x000001c5, 0x000001e5, 0x000001c6, 0x000001e6, 0x000001c7, 0x000001e7, 0x000001c8, 0x000001e8, 0x000001c9, 0x000001e9, 0x000001ca, 0x000001ea, 0x000001cb, 0x000001eb, 0x000001cc, 0x000001ec, 0x000001cd, 0x000001ed, 0x000001ce, 0x000001ee, 0x000001cf, 0x000001ef, 
0x000001d0, 0x000001f0, 0x000001d1, 0x000001f1, 0x000001d2, 0x000001f2, 0x000001d3, 0x000001f3, 0x000001d4, 0x000001f4, 0x000001d5, 0x000001f5, 0x000001d6, 0x000001f6, 0x000001d7, 0x000001f7, 0x000001d8, 0x000001f8, 0x000001d9, 0x000001f9, 0x000001da, 0x000001fa, 0x000001db, 0x000001fb, 0x000001dc, 0x000001fc, 0x000001dd, 0x000001fd, 0x000001de, 0x000001fe, 0x000001df, 0x000001ff, 
0x00000200, 0x00000220, 0x00000201, 0x00000221, 0x00000202, 0x00000222, 0x00000203, 0x00000223, 0x00000204, 0x00000224, 0x00000205, 0x00000225, 0x00000206, 0x00000226, 0x00000207, 0x00000227, 0x00000208, 0x00000228, 0x00000209, 0x00000229, 0x0000020a, 0x0000022a, 0x0000020b, 0x0000022b, 0x0000020c, 0x0000022c, 0x0000020d, 0x0000022d, 0x0000020e, 0x0000022e, 0x0000020f, 0x0000022f, 
0x00000210, 0x00000230, 0x00000211, 0x00000231, 0x00000212, 0x00000232, 0x00000213, 0x00000233, 0x00000214, 0x00000234, 0x00000215, 0x00000235, 0x00000216, 0x00000236, 0x00000217, 0x00000237, 0x00000218, 0x00000238, 0x00000219, 0x00000239, 0x0000021a, 0x0000023a, 0x0000021b, 0x0000023b, 0x0000021c, 0x0000023c, 0x0000021d, 0x0000023d, 0x0000021e, 0x0000023e, 0x0000021f, 0x0000023f, 
0x00000240, 0x00000260, 0x00000241, 0x00000261, 0x00000242, 0x00000262, 0x00000243, 0x00000263, 0x00000244, 0x00000264, 0x00000245, 0x00000265, 0x00000246, 0x00000266, 0x00000247, 0x00000267, 0x00000248, 0x00000268, 0x00000249, 0x00000269, 0x0000024a, 0x0000026a, 0x0000024b, 0x0000026b, 0x0000024c, 0x0000026c, 0x0000024d, 0x0000026d, 0x0000024e, 0x0000026e, 0x0000024f, 0x0000026f, 
0x00000250, 0x00000270, 0x00000251, 0x00000271, 0x00000252, 0x00000272, 0x00000253, 0x00000273, 0x00000254, 0x00000274, 0x00000255, 0x00000275, 0x00000256, 0x00000276, 0x00000257, 0x00000277, 0x00000258, 0x00000278, 0x00000259, 0x00000279, 0x0000025a, 0x0000027a, 0x0000025b, 0x0000027b, 0x0000025c, 0x0000027c, 0x0000025d, 0x0000027d, 0x0000025e, 0x0000027e, 0x0000025f, 0x0000027f, 
0x00000280, 0x000002a0, 0x00000281, 0x000002a1, 0x00000282, 0x000002a2, 0x00000283, 0x000002a3, 0x00000284, 0x000002a4, 0x00000285, 0x000002a5, 0x00000286, 0x000002a6, 0x00000287, 0x000002a7, 0x00000288, 0x000002a8, 0x00000289, 0x000002a9, 0x0000028a, 0x000002aa, 0x0000028b, 0x000002ab, 0x0000028c, 0x000002ac, 0x0000028d, 0x000002ad, 0x0000028e, 0x000002ae, 0x0000028f, 0x000002af, 
0x00000290, 0x000002b0, 0x00000291, 0x000002b1, 0x00000292, 0x000002b2, 0x00000293, 0x000002b3, 0x00000294, 0x000002b4, 0x00000295, 0x000002b5, 0x00000296, 0x000002b6, 0x00000297, 0x000002b7, 0x00000298, 0x000002b8, 0x00000299, 0x000002b9, 0x0000029a, 0x000002ba, 0x0000029b, 0x000002bb, 0x0000029c, 0x000002bc, 0x0000029d, 0x000002bd, 0x0000029e, 0x000002be, 0x0000029f, 0x000002bf, 
0x000002c0, 0x000002e0, 0x000002c1, 0x000002e1, 0x000002c2, 0x000002e2, 0x000002c3, 0x000002e3, 0x000002c4, 0x000002e4, 0x000002c5, 0x000002e5, 0x000002c6, 0x000002e6, 0x000002c7, 0x000002e7, 0x000002c8, 0x000002e8, 0x000002c9, 0x000002e9, 0x000002ca, 0x000002ea, 0x000002cb, 0x000002eb, 0x000002cc, 0x000002ec, 0x000002cd, 0x000002ed, 0x000002ce, 0x000002ee, 0x000002cf, 0x000002ef, 
0x000002d0, 0x000002f0, 0x000002d1, 0x000002f1, 0x000002d2, 0x000002f2, 0x000002d3, 0x000002f3, 0x000002d4, 0x000002f4, 0x000002d5, 0x000002f5, 0x000002d6, 0x000002f6, 0x000002d7, 0x000002f7, 0x000002d8, 0x000002f8, 0x000002d9, 0x000002f9, 0x000002da, 0x000002fa, 0x000002db, 0x000002fb, 0x000002dc, 0x000002fc, 0x000002dd, 0x000002fd, 0x000002de, 0x000002fe, 0x000002df, 0x000002ff, 
0x00000300, 0x00000320, 0x00000301, 0x00000321, 0x00000302, 0x00000322, 0x00000303, 0x00000323, 0x00000304, 0x00000324, 0x00000305, 0x00000325, 0x00000306, 0x00000326, 0x00000307, 0x00000327, 0x00000308, 0x00000328, 0x00000309, 0x00000329, 0x0000030a, 0x0000032a, 0x0000030b, 0x0000032b, 0x0000030c, 0x0000032c, 0x0000030d, 0x0000032d, 0x0000030e, 0x0000032e, 0x0000030f, 0x0000032f, 
0x00000310, 0x00000330, 0x00000311, 0x00000331, 0x00000312, 0x00000332, 0x00000313, 0x00000333, 0x00000314, 0x00000334, 0x00000315, 0x00000335, 0x00000316, 0x00000336, 0x00000317, 0x00000337, 0x00000318, 0x00000338, 0x00000319, 0x00000339, 0x0000031a, 0x0000033a, 0x0000031b, 0x0000033b, 0x0000031c, 0x0000033c, 0x0000031d, 0x0000033d, 0x0000031e, 0x0000033e, 0x0000031f, 0x0000033f, 
0x00000340, 0x00000360, 0x00000341, 0x00000361, 0x00000342, 0x00000362, 0x00000343, 0x00000363, 0x00000344, 0x00000364, 0x00000345, 0x00000365, 0x00000346, 0x00000366, 0x00000347, 0x00000367, 0x00000348, 0x00000368, 0x00000349, 0x00000369, 0x0000034a, 0x0000036a, 0x0000034b, 0x0000036b, 0x0000034c, 0x0000036c, 0x0000034d, 0x0000036d, 0x0000034e, 0x0000036e, 0x0000034f, 0x0000036f, 
0x00000350, 0x00000370, 0x00000351, 0x00000371, 0x00000352, 0x00000372, 0x00000353, 0x00000373, 0x00000354, 0x00000374, 0x00000355, 0x00000375, 0x00000356, 0x00000376, 0x00000357, 0x00000377, 0x00000358, 0x00000378, 0x00000359, 0x00000379, 0x0000035a, 0x0000037a, 0x0000035b, 0x0000037b, 0x0000035c, 0x0000037c, 0x0000035d, 0x0000037d, 0x0000035e, 0x0000037e, 0x0000035f, 0x0000037f, 
0x00000380, 0x000003a0, 0x00000381, 0x000003a1, 0x00000382, 0x000003a2, 0x00000383, 0x000003a3, 0x00000384, 0x000003a4, 0x00000385, 0x000003a5, 0x00000386, 0x000003a6, 0x00000387, 0x000003a7, 0x00000388, 0x000003a8, 0x00000389, 0x000003a9, 0x0000038a, 0x000003aa, 0x0000038b, 0x000003ab, 0x0000038c, 0x000003ac, 0x0000038d, 0x000003ad, 0x0000038e, 0x000003ae, 0x0000038f, 0x000003af, 
0x00000390, 0x000003b0, 0x00000391, 0x000003b1, 0x00000392, 0x000003b2, 0x00000393, 0x000003b3, 0x00000394, 0x000003b4, 0x00000395, 0x000003b5, 0x00000396, 0x000003b6, 0x00000397, 0x000003b7, 0x00000398, 0x000003b8, 0x00000399, 0x000003b9, 0x0000039a, 0x000003ba, 0x0000039b, 0x000003bb, 0x0000039c, 0x000003bc, 0x0000039d, 0x000003bd, 0x0000039e, 0x000003be, 0x0000039f, 0x000003bf, 
0x000003c0, 0x000003e0, 0x000003c1, 0x000003e1, 0x000003c2, 0x000003e2, 0x000003c3, 0x000003e3, 0x000003c4, 0x000003e4, 0x000003c5, 0x000003e5, 0x000003c6, 0x000003e6, 0x000003c7, 0x000003e7, 0x000003c8, 0x000003e8, 0x000003c9, 0x000003e9, 0x000003ca, 0x000003ea, 0x000003cb, 0x000003eb, 0x000003cc, 0x000003ec, 0x000003cd, 0x000003ed, 0x000003ce, 0x000003ee, 0x000003cf, 0x000003ef, 
0x000003d0, 0x000003f0, 0x000003d1, 0x000003f1, 0x000003d2, 0x000003f2, 0x000003d3, 0x000003f3, 0x000003d4, 0x000003f4, 0x000003d5, 0x000003f5, 0x000003d6, 0x000003f6, 0x000003d7, 0x000003f7, 0x000003d8, 0x000003f8, 0x000003d9, 0x000003f9, 0x000003da, 0x000003fa, 0x000003db, 0x000003fb, 0x000003dc, 0x000003fc, 0x000003dd, 0x000003fd, 0x000003de, 0x000003fe, 0x000003df, 0x000003ff,
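
As a host-side model of the sanity check above (a sketch, not the kernel itself), the counting scheme can be written as plain loops. It assumes, as the description states, that the init sequence leaves the lane index 0 through 31 in LREG0 and that each iteration stores LREG0 and then adds 32. The function name `model_lane_counter_tile` is hypothetical, and the array here is in iteration-major order, which is not the interleaved order of the printout.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 32;       // vector lanes per iteration
constexpr std::size_t kIterations = 32;  // iterations needed to cover a 32x32 tile

// Model of "store LREG0, then add 32" repeated over the whole tile.
std::array<std::uint32_t, kLanes * kIterations> model_lane_counter_tile() {
    std::array<std::uint32_t, kLanes * kIterations> tile{};
    std::array<std::uint32_t, kLanes> lreg0{};
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        lreg0[lane] = static_cast<std::uint32_t>(lane);  // init: lane index 0..31
    }
    for (std::size_t iter = 0; iter < kIterations; ++iter) {
        for (std::size_t lane = 0; lane < kLanes; ++lane) {
            tile[iter * kLanes + lane] = lreg0[lane];        // store the current value
            lreg0[lane] += static_cast<std::uint32_t>(kLanes);  // advance by 32
        }
    }
    return tile;
}
```

Every integer from 0 to 1023 appears exactly once in the result, which is the property the printed tile confirms.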

Running with TTI_SFPMOV(0, 9, p_sfpu::LREG0, 8); present, the 32x32 output tile is instead:

0xe85e456e, 0xf42f22b7, 0xa17915b8, 0xd0bc8adc, 0x85e456e2, 0x42f22b71, 0x17915b89, 0x0bc8adc4, 0x5e456e26, 0x2f22b713, 0x7915b898, 0xbc8adc4c, 0xe456e262, 0xf22b7131, 0x915b8989, 0xc8adc4c4, 0x456e2624, 0x22b71312, 0x15b89891, 0x8adc4c48, 0x56e26247, 0x2b713123, 0x5b89891f, 0xadc4c48f, 0x6e26247e, 0xb713123f, 0xb89891f9, 0xdc4c48fc, 0xe26247e5, 0x713123f2, 0x89891f95, 0xc4c48fca, 
0x26247e57, 0x13123f2b, 0x9891f95c, 0x4c48fcae, 0x6247e572, 0x3123f2b9, 0x891f95c9, 0xc48fcae4, 0x247e5724, 0x123f2b92, 0x91f95c91, 0x48fcae48, 0x47e57244, 0x23f2b922, 0x1f95c910, 0x8fcae488, 0x7e572441, 0x3f2b9220, 0xf95c9105, 0xfcae4882, 0xe5724417, 0xf2b9220b, 0x95c9105e, 0xcae4882f, 0x57244178, 0x2b9220bc, 0x5c9105e0, 0xae4882f0, 0x72441783, 0xb9220bc1, 0xc9105e0e, 0xe4882f07, 
0xfa17915b, 0x7d0bc8ad, 0xe85e456e, 0xf42f22b7, 0xa17915b8, 0xd0bc8adc, 0x85e456e2, 0x42f22b71, 0x17915b89, 0x0bc8adc4, 0x5e456e26, 0x2f22b713, 0x7915b898, 0xbc8adc4c, 0xe456e262, 0xf22b7131, 0x915b8989, 0xc8adc4c4, 0x456e2624, 0x22b71312, 0x15b89891, 0x8adc4c48, 0x56e26247, 0x2b713123, 0x5b89891f, 0xadc4c48f, 0x6e26247e, 0xb713123f, 0xb89891f9, 0xdc4c48fc, 0xe26247e5, 0x713123f2, 
0x89891f95, 0xc4c48fca, 0x26247e57, 0x13123f2b, 0x9891f95c, 0x4c48fcae, 0x6247e572, 0x3123f2b9, 0x891f95c9, 0xc48fcae4, 0x247e5724, 0x123f2b92, 0x91f95c91, 0x48fcae48, 0x47e57244, 0x23f2b922, 0x1f95c910, 0x8fcae488, 0x7e572441, 0x3f2b9220, 0xf95c9105, 0xfcae4882, 0xe5724417, 0xf2b9220b, 0x95c9105e, 0xcae4882f, 0x57244178, 0x2b9220bc, 0x5c9105e0, 0xae4882f0, 0x72441783, 0xb9220bc1, 
0x3e85e456, 0x1f42f22b, 0xfa17915b, 0x7d0bc8ad, 0xe85e456e, 0xf42f22b7, 0xa17915b8, 0xd0bc8adc, 0x85e456e2, 0x42f22b71, 0x17915b89, 0x0bc8adc4, 0x5e456e26, 0x2f22b713, 0x7915b898, 0xbc8adc4c, 0xe456e262, 0xf22b7131, 0x915b8989, 0xc8adc4c4, 0x456e2624, 0x22b71312, 0x15b89891, 0x8adc4c48, 0x56e26247, 0x2b713123, 0x5b89891f, 0xadc4c48f, 0x6e26247e, 0xb713123f, 0xb89891f9, 0xdc4c48fc, 
0xe26247e5, 0x713123f2, 0x89891f95, 0xc4c48fca, 0x26247e57, 0x13123f2b, 0x9891f95c, 0x4c48fcae, 0x6247e572, 0x3123f2b9, 0x891f95c9, 0xc48fcae4, 0x247e5724, 0x123f2b92, 0x91f95c91, 0x48fcae48, 0x47e57244, 0x23f2b922, 0x1f95c910, 0x8fcae488, 0x7e572441, 0x3f2b9220, 0xf95c9105, 0xfcae4882, 0xe5724417, 0xf2b9220b, 0x95c9105e, 0xcae4882f, 0x57244178, 0x2b9220bc, 0x5c9105e0, 0xae4882f0, 
0x8fa17915, 0x47d0bc8a, 0x3e85e456, 0x1f42f22b, 0xfa17915b, 0x7d0bc8ad, 0xe85e456e, 0xf42f22b7, 0xa17915b8, 0xd0bc8adc, 0x85e456e2, 0x42f22b71, 0x17915b89, 0x0bc8adc4, 0x5e456e26, 0x2f22b713, 0x7915b898, 0xbc8adc4c, 0xe456e262, 0xf22b7131, 0x915b8989, 0xc8adc4c4, 0x456e2624, 0x22b71312, 0x15b89891, 0x8adc4c48, 0x56e26247, 0x2b713123, 0x5b89891f, 0xadc4c48f, 0x6e26247e, 0xb713123f, 
0xb89891f9, 0xdc4c48fc, 0xe26247e5, 0x713123f2, 0x89891f95, 0xc4c48fca, 0x26247e57, 0x13123f2b, 0x9891f95c, 0x4c48fcae, 0x6247e572, 0x3123f2b9, 0x891f95c9, 0xc48fcae4, 0x247e5724, 0x123f2b92, 0x91f95c91, 0x48fcae48, 0x47e57244, 0x23f2b922, 0x1f95c910, 0x8fcae488, 0x7e572441, 0x3f2b9220, 0xf95c9105, 0xfcae4882, 0xe5724417, 0xf2b9220b, 0x95c9105e, 0xcae4882f, 0x57244178, 0x2b9220bc, 
0x23e85e45, 0x91f42f22, 0x8fa17915, 0x47d0bc8a, 0x3e85e456, 0x1f42f22b, 0xfa17915b, 0x7d0bc8ad, 0xe85e456e, 0xf42f22b7, 0xa17915b8, 0xd0bc8adc, 0x85e456e2, 0x42f22b71, 0x17915b89, 0x0bc8adc4, 0x5e456e26, 0x2f22b713, 0x7915b898, 0xbc8adc4c, 0xe456e262, 0xf22b7131, 0x915b8989, 0xc8adc4c4, 0x456e2624, 0x22b71312, 0x15b89891, 0x8adc4c48, 0x56e26247, 0x2b713123, 0x5b89891f, 0xadc4c48f, 
0x6e26247e, 0xb713123f, 0xb89891f9, 0xdc4c48fc, 0xe26247e5, 0x713123f2, 0x89891f95, 0xc4c48fca, 0x26247e57, 0x13123f2b, 0x9891f95c, 0x4c48fcae, 0x6247e572, 0x3123f2b9, 0x891f95c9, 0xc48fcae4, 0x247e5724, 0x123f2b92, 0x91f95c91, 0x48fcae48, 0x47e57244, 0x23f2b922, 0x1f95c910, 0x8fcae488, 0x7e572441, 0x3f2b9220, 0xf95c9105, 0xfcae4882, 0xe5724417, 0xf2b9220b, 0x95c9105e, 0xcae4882f, 
0x48fa1791, 0xa47d0bc8, 0x23e85e45, 0x91f42f22, 0x8fa17915, 0x47d0bc8a, 0x3e85e456, 0x1f42f22b, 0xfa17915b, 0x7d0bc8ad, 0xe85e456e, 0xf42f22b7, 0xa17915b8, 0xd0bc8adc, 0x85e456e2, 0x42f22b71, 0x17915b89, 0x0bc8adc4, 0x5e456e26, 0x2f22b713, 0x7915b898, 0xbc8adc4c, 0xe456e262, 0xf22b7131, 0x915b8989, 0xc8adc4c4, 0x456e2624, 0x22b71312, 0x15b89891, 0x8adc4c48, 0x56e26247, 0x2b713123, 
0x5b89891f, 0xadc4c48f, 0x6e26247e, 0xb713123f, 0xb89891f9, 0xdc4c48fc, 0xe26247e5, 0x713123f2, 0x89891f95, 0xc4c48fca, 0x26247e57, 0x13123f2b, 0x9891f95c, 0x4c48fcae, 0x6247e572, 0x3123f2b9, 0x891f95c9, 0xc48fcae4, 0x247e5724, 0x123f2b92, 0x91f95c91, 0x48fcae48, 0x47e57244, 0x23f2b922, 0x1f95c910, 0x8fcae488, 0x7e572441, 0x3f2b9220, 0xf95c9105, 0xfcae4882, 0xe5724417, 0xf2b9220b, 
0xd23e85e4, 0xe91f42f2, 0x48fa1791, 0xa47d0bc8, 0x23e85e45, 0x91f42f22, 0x8fa17915, 0x47d0bc8a, 0x3e85e456, 0x1f42f22b, 0xfa17915b, 0x7d0bc8ad, 0xe85e456e, 0xf42f22b7, 0xa17915b8, 0xd0bc8adc, 0x85e456e2, 0x42f22b71, 0x17915b89, 0x0bc8adc4, 0x5e456e26, 0x2f22b713, 0x7915b898, 0xbc8adc4c, 0xe456e262, 0xf22b7131, 0x915b8989, 0xc8adc4c4, 0x456e2624, 0x22b71312, 0x15b89891, 0x8adc4c48, 
0x56e26247, 0x2b713123, 0x5b89891f, 0xadc4c48f, 0x6e26247e, 0xb713123f, 0xb89891f9, 0xdc4c48fc, 0xe26247e5, 0x713123f2, 0x89891f95, 0xc4c48fca, 0x26247e57, 0x13123f2b, 0x9891f95c, 0x4c48fcae, 0x6247e572, 0x3123f2b9, 0x891f95c9, 0xc48fcae4, 0x247e5724, 0x123f2b92, 0x91f95c91, 0x48fcae48, 0x47e57244, 0x23f2b922, 0x1f95c910, 0x8fcae488, 0x7e572441, 0x3f2b9220, 0xf95c9105, 0xfcae4882, 
0xf48fa179, 0xfa47d0bc, 0xd23e85e4, 0xe91f42f2, 0x48fa1791, 0xa47d0bc8, 0x23e85e45, 0x91f42f22, 0x8fa17915, 0x47d0bc8a, 0x3e85e456, 0x1f42f22b, 0xfa17915b, 0x7d0bc8ad, 0xe85e456e, 0xf42f22b7, 0xa17915b8, 0xd0bc8adc, 0x85e456e2, 0x42f22b71, 0x17915b89, 0x0bc8adc4, 0x5e456e26, 0x2f22b713, 0x7915b898, 0xbc8adc4c, 0xe456e262, 0xf22b7131, 0x915b8989, 0xc8adc4c4, 0x456e2624, 0x22b71312, 
0x15b89891, 0x8adc4c48, 0x56e26247, 0x2b713123, 0x5b89891f, 0xadc4c48f, 0x6e26247e, 0xb713123f, 0xb89891f9, 0xdc4c48fc, 0xe26247e5, 0x713123f2, 0x89891f95, 0xc4c48fca, 0x26247e57, 0x13123f2b, 0x9891f95c, 0x4c48fcae, 0x6247e572, 0x3123f2b9, 0x891f95c9, 0xc48fcae4, 0x247e5724, 0x123f2b92, 0x91f95c91, 0x48fcae48, 0x47e57244, 0x23f2b922, 0x1f95c910, 0x8fcae488, 0x7e572441, 0x3f2b9220, 
0x7d23e85e, 0xbe91f42f, 0xf48fa179, 0xfa47d0bc, 0xd23e85e4, 0xe91f42f2, 0x48fa1791, 0xa47d0bc8, 0x23e85e45, 0x91f42f22, 0x8fa17915, 0x47d0bc8a, 0x3e85e456, 0x1f42f22b, 0xfa17915b, 0x7d0bc8ad, 0xe85e456e, 0xf42f22b7, 0xa17915b8, 0xd0bc8adc, 0x85e456e2, 0x42f22b71, 0x17915b89, 0x0bc8adc4, 0x5e456e26, 0x2f22b713, 0x7915b898, 0xbc8adc4c, 0xe456e262, 0xf22b7131, 0x915b8989, 0xc8adc4c4, 
0x456e2624, 0x22b71312, 0x15b89891, 0x8adc4c48, 0x56e26247, 0x2b713123, 0x5b89891f, 0xadc4c48f, 0x6e26247e, 0xb713123f, 0xb89891f9, 0xdc4c48fc, 0xe26247e5, 0x713123f2, 0x89891f95, 0xc4c48fca, 0x26247e57, 0x13123f2b, 0x9891f95c, 0x4c48fcae, 0x6247e572, 0x3123f2b9, 0x891f95c9, 0xc48fcae4, 0x247e5724, 0x123f2b92, 0x91f95c91, 0x48fcae48, 0x47e57244, 0x23f2b922, 0x1f95c910, 0x8fcae488, 
0x5f48fa17, 0xafa47d0b, 0x7d23e85e, 0xbe91f42f, 0xf48fa179, 0xfa47d0bc, 0xd23e85e4, 0xe91f42f2, 0x48fa1791, 0xa47d0bc8, 0x23e85e45, 0x91f42f22, 0x8fa17915, 0x47d0bc8a, 0x3e85e456, 0x1f42f22b, 0xfa17915b, 0x7d0bc8ad, 0xe85e456e, 0xf42f22b7, 0xa17915b8, 0xd0bc8adc, 0x85e456e2, 0x42f22b71, 0x17915b89, 0x0bc8adc4, 0x5e456e26, 0x2f22b713, 0x7915b898, 0xbc8adc4c, 0xe456e262, 0xf22b7131, 
0x915b8989, 0xc8adc4c4, 0x456e2624, 0x22b71312, 0x15b89891, 0x8adc4c48, 0x56e26247, 0x2b713123, 0x5b89891f, 0xadc4c48f, 0x6e26247e, 0xb713123f, 0xb89891f9, 0xdc4c48fc, 0xe26247e5, 0x713123f2, 0x89891f95, 0xc4c48fca, 0x26247e57, 0x13123f2b, 0x9891f95c, 0x4c48fcae, 0x6247e572, 0x3123f2b9, 0x891f95c9, 0xc48fcae4, 0x247e5724, 0x123f2b92, 0x91f95c91, 0x48fcae48, 0x47e57244, 0x23f2b922, 
0xd7d23e85, 0xebe91f42, 0x5f48fa17, 0xafa47d0b, 0x7d23e85e, 0xbe91f42f, 0xf48fa179, 0xfa47d0bc, 0xd23e85e4, 0xe91f42f2, 0x48fa1791, 0xa47d0bc8, 0x23e85e45, 0x91f42f22, 0x8fa17915, 0x47d0bc8a, 0x3e85e456, 0x1f42f22b, 0xfa17915b, 0x7d0bc8ad, 0xe85e456e, 0xf42f22b7, 0xa17915b8, 0xd0bc8adc, 0x85e456e2, 0x42f22b71, 0x17915b89, 0x0bc8adc4, 0x5e456e26, 0x2f22b713, 0x7915b898, 0xbc8adc4c, 
0xe456e262, 0xf22b7131, 0x915b8989, 0xc8adc4c4, 0x456e2624, 0x22b71312, 0x15b89891, 0x8adc4c48, 0x56e26247, 0x2b713123, 0x5b89891f, 0xadc4c48f, 0x6e26247e, 0xb713123f, 0xb89891f9, 0xdc4c48fc, 0xe26247e5, 0x713123f2, 0x89891f95, 0xc4c48fca, 0x26247e57, 0x13123f2b, 0x9891f95c, 0x4c48fcae, 0x6247e572, 0x3123f2b9, 0x891f95c9, 0xc48fcae4, 0x247e5724, 0x123f2b92, 0x91f95c91, 0x48fcae48, 
0x75f48fa1, 0xbafa47d0, 0xd7d23e85, 0xebe91f42, 0x5f48fa17, 0xafa47d0b, 0x7d23e85e, 0xbe91f42f, 0xf48fa179, 0xfa47d0bc, 0xd23e85e4, 0xe91f42f2, 0x48fa1791, 0xa47d0bc8, 0x23e85e45, 0x91f42f22, 0x8fa17915, 0x47d0bc8a, 0x3e85e456, 0x1f42f22b, 0xfa17915b, 0x7d0bc8ad, 0xe85e456e, 0xf42f22b7, 0xa17915b8, 0xd0bc8adc, 0x85e456e2, 0x42f22b71, 0x17915b89, 0x0bc8adc4, 0x5e456e26, 0x2f22b713, 
0x7915b898, 0xbc8adc4c, 0xe456e262, 0xf22b7131, 0x915b8989, 0xc8adc4c4, 0x456e2624, 0x22b71312, 0x15b89891, 0x8adc4c48, 0x56e26247, 0x2b713123, 0x5b89891f, 0xadc4c48f, 0x6e26247e, 0xb713123f, 0xb89891f9, 0xdc4c48fc, 0xe26247e5, 0x713123f2, 0x89891f95, 0xc4c48fca, 0x26247e57, 0x13123f2b, 0x9891f95c, 0x4c48fcae, 0x6247e572, 0x3123f2b9, 0x891f95c9, 0xc48fcae4, 0x247e5724, 0x123f2b92, 
0xdd7d23e8, 0xeebe91f4, 0x75f48fa1, 0xbafa47d0, 0xd7d23e85, 0xebe91f42, 0x5f48fa17, 0xafa47d0b, 0x7d23e85e, 0xbe91f42f, 0xf48fa179, 0xfa47d0bc, 0xd23e85e4, 0xe91f42f2, 0x48fa1791, 0xa47d0bc8, 0x23e85e45, 0x91f42f22, 0x8fa17915, 0x47d0bc8a, 0x3e85e456, 0x1f42f22b, 0xfa17915b, 0x7d0bc8ad, 0xe85e456e, 0xf42f22b7, 0xa17915b8, 0xd0bc8adc, 0x85e456e2, 0x42f22b71, 0x17915b89, 0x0bc8adc4, 
0x5e456e26, 0x2f22b713, 0x7915b898, 0xbc8adc4c, 0xe456e262, 0xf22b7131, 0x915b8989, 0xc8adc4c4, 0x456e2624, 0x22b71312, 0x15b89891, 0x8adc4c48, 0x56e26247, 0x2b713123, 0x5b89891f, 0xadc4c48f, 0x6e26247e, 0xb713123f, 0xb89891f9, 0xdc4c48fc, 0xe26247e5, 0x713123f2, 0x89891f95, 0xc4c48fca, 0x26247e57, 0x13123f2b, 0x9891f95c, 0x4c48fcae, 0x6247e572, 0x3123f2b9, 0x891f95c9, 0xc48fcae4, 
0xf75f48fa, 0xfbafa47d, 0xdd7d23e8, 0xeebe91f4, 0x75f48fa1, 0xbafa47d0, 0xd7d23e85, 0xebe91f42, 0x5f48fa17, 0xafa47d0b, 0x7d23e85e, 0xbe91f42f, 0xf48fa179, 0xfa47d0bc, 0xd23e85e4, 0xe91f42f2, 0x48fa1791, 0xa47d0bc8, 0x23e85e45, 0x91f42f22, 0x8fa17915, 0x47d0bc8a, 0x3e85e456, 0x1f42f22b, 0xfa17915b, 0x7d0bc8ad, 0xe85e456e, 0xf42f22b7, 0xa17915b8, 0xd0bc8adc, 0x85e456e2, 0x42f22b71, 
0x17915b89, 0x0bc8adc4, 0x5e456e26, 0x2f22b713, 0x7915b898, 0xbc8adc4c, 0xe456e262, 0xf22b7131, 0x915b8989, 0xc8adc4c4, 0x456e2624, 0x22b71312, 0x15b89891, 0x8adc4c48, 0x56e26247, 0x2b713123, 0x5b89891f, 0xadc4c48f, 0x6e26247e, 0xb713123f, 0xb89891f9, 0xdc4c48fc, 0xe26247e5, 0x713123f2, 0x89891f95, 0xc4c48fca, 0x26247e57, 0x13123f2b, 0x9891f95c, 0x4c48fcae, 0x6247e572, 0x3123f2b9, 
0x7dd7d23e, 0x3eebe91f, 0xf75f48fa, 0xfbafa47d, 0xdd7d23e8, 0xeebe91f4, 0x75f48fa1, 0xbafa47d0, 0xd7d23e85, 0xebe91f42, 0x5f48fa17, 0xafa47d0b, 0x7d23e85e, 0xbe91f42f, 0xf48fa179, 0xfa47d0bc, 0xd23e85e4, 0xe91f42f2, 0x48fa1791, 0xa47d0bc8, 0x23e85e45, 0x91f42f22, 0x8fa17915, 0x47d0bc8a, 0x3e85e456, 0x1f42f22b, 0xfa17915b, 0x7d0bc8ad, 0xe85e456e, 0xf42f22b7, 0xa17915b8, 0xd0bc8adc, 
0x85e456e2, 0x42f22b71, 0x17915b89, 0x0bc8adc4, 0x5e456e26, 0x2f22b713, 0x7915b898, 0xbc8adc4c, 0xe456e262, 0xf22b7131, 0x915b8989, 0xc8adc4c4, 0x456e2624, 0x22b71312, 0x15b89891, 0x8adc4c48, 0x56e26247, 0x2b713123, 0x5b89891f, 0xadc4c48f, 0x6e26247e, 0xb713123f, 0xb89891f9, 0xdc4c48fc, 0xe26247e5, 0x713123f2, 0x89891f95, 0xc4c48fca, 0x26247e57, 0x13123f2b, 0x9891f95c, 0x4c48fcae, 
0x1f75f48f, 0x0fbafa47, 0x7dd7d23e, 0x3eebe91f, 0xf75f48fa, 0xfbafa47d, 0xdd7d23e8, 0xeebe91f4, 0x75f48fa1, 0xbafa47d0, 0xd7d23e85, 0xebe91f42, 0x5f48fa17, 0xafa47d0b, 0x7d23e85e, 0xbe91f42f, 0xf48fa179, 0xfa47d0bc, 0xd23e85e4, 0xe91f42f2, 0x48fa1791, 0xa47d0bc8, 0x23e85e45, 0x91f42f22, 0x8fa17915, 0x47d0bc8a, 0x3e85e456, 0x1f42f22b, 0xfa17915b, 0x7d0bc8ad, 0xe85e456e, 0xf42f22b7, 
0xa17915b8, 0xd0bc8adc, 0x85e456e2, 0x42f22b71, 0x17915b89, 0x0bc8adc4, 0x5e456e26, 0x2f22b713, 0x7915b898, 0xbc8adc4c, 0xe456e262, 0xf22b7131, 0x915b8989, 0xc8adc4c4, 0x456e2624, 0x22b71312, 0x15b89891, 0x8adc4c48, 0x56e26247, 0x2b713123, 0x5b89891f, 0xadc4c48f, 0x6e26247e, 0xb713123f, 0xb89891f9, 0xdc4c48fc, 0xe26247e5, 0x713123f2, 0x89891f95, 0xc4c48fca, 0x26247e57, 0x13123f2b,

On the surface, this looks plenty random, but problems appear as we dig in. The value in the top-left of the tile is 0xe85e456e, but this value occurs 16 times throughout the 32x32 tile. The value to its right is 0xf42f22b7, which also occurs 16 times. If we were hoping for 1024 distinct random numbers across the 32x32 tile, then we're out of luck: it looks more like 64 distinct random numbers in a repeating pattern. It gets even worse, though. If we pair up the two 32x32 outputs and render the random values in binary rather than hex, each row's random value is mostly the previous row's value shifted left by two bits:

Counter Random (hex) Random (binary)
0x00000000 0xe85e456e 0b11101000010111100100010101101110
0x00000001 0xa17915b8 0b10100001011110010001010110111000
0x00000002 0x85e456e2 0b10000101111001000101011011100010
0x00000003 0x17915b89 0b00010111100100010101101110001001
0x00000004 0x5e456e26 0b01011110010001010110111000100110
0x00000005 0x7915b898 0b01111001000101011011100010011000
0x00000006 0xe456e262 0b11100100010101101110001001100010
0x00000007 0x915b8989 0b10010001010110111000100110001001
0x00000008 0x456e2624 0b01000101011011100010011000100100
0x00000009 0x15b89891 0b00010101101110001001100010010001
0x0000000a 0x56e26247 0b01010110111000100110001001000111
0x0000000b 0x5b89891f 0b01011011100010011000100100011111
0x0000000c 0x6e26247e 0b01101110001001100010010001111110
0x0000000d 0xb89891f9 0b10111000100110001001000111111001
0x0000000e 0xe26247e5 0b11100010011000100100011111100101
0x0000000f 0x89891f95 0b10001001100010010001111110010101
0x00000010 0x26247e57 0b00100110001001000111111001010111
0x00000011 0x9891f95c 0b10011000100100011111100101011100
0x00000012 0x6247e572 0b01100010010001111110010101110010
0x00000013 0x891f95c9 0b10001001000111111001010111001001
0x00000014 0x247e5724 0b00100100011111100101011100100100
0x00000015 0x91f95c91 0b10010001111110010101110010010001
0x00000016 0x47e57244 0b01000111111001010111001001000100
0x00000017 0x1f95c910 0b00011111100101011100100100010000
0x00000018 0x7e572441 0b01111110010101110010010001000001
0x00000019 0xf95c9105 0b11111001010111001001000100000101
0x0000001a 0xe5724417 0b11100101011100100100010000010111
0x0000001b 0x95c9105e 0b10010101110010010001000001011110
0x0000001c 0x57244178 0b01010111001001000100000101111000
0x0000001d 0x5c9105e0 0b01011100100100010000010111100000
0x0000001e 0x72441783 0b01110010010001000001011110000011
0x0000001f 0xc9105e0e 0b11001001000100000101111000001110

If we instead look at how the random values evolve as the counter increases in steps of 32, we see a similar pattern, this time mostly a shift right by one bit at each step:

Counter Random (hex) Random (binary)
0x00000000 0xe85e456e 0b11101000010111100100010101101110
0x00000020 0xf42f22b7 0b11110100001011110010001010110111
0x00000040 0xfa17915b 0b11111010000101111001000101011011
0x00000060 0x7d0bc8ad 0b01111101000010111100100010101101
0x00000080 0x3e85e456 0b00111110100001011110010001010110
0x000000a0 0x1f42f22b 0b00011111010000101111001000101011
0x000000c0 0x8fa17915 0b10001111101000010111100100010101
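
The two shift relationships in the tables above can be checked mechanically. The following is a minimal Python sketch, not part of the original test setup: the values are copied from the two tables, and the helper names (`follows_shift_left_2`, `follows_shift_right_1`, `count_matches`) are made up for illustration. It compares only the bits that survive the shift and ignores the bits injected at the vacated end.

```python
MASK32 = 0xFFFFFFFF

# Random values for counter 0x00..0x1f, copied from the first table.
BY_ONE = [
    0xe85e456e, 0xa17915b8, 0x85e456e2, 0x17915b89,
    0x5e456e26, 0x7915b898, 0xe456e262, 0x915b8989,
    0x456e2624, 0x15b89891, 0x56e26247, 0x5b89891f,
    0x6e26247e, 0xb89891f9, 0xe26247e5, 0x89891f95,
    0x26247e57, 0x9891f95c, 0x6247e572, 0x891f95c9,
    0x247e5724, 0x91f95c91, 0x47e57244, 0x1f95c910,
    0x7e572441, 0xf95c9105, 0xe5724417, 0x95c9105e,
    0x57244178, 0x5c9105e0, 0x72441783, 0xc9105e0e,
]

# Random values for counter 0x00, 0x20, ..., 0xc0, copied from the second table.
BY_32 = [
    0xe85e456e, 0xf42f22b7, 0xfa17915b, 0x7d0bc8ad,
    0x3e85e456, 0x1f42f22b, 0x8fa17915,
]


def follows_shift_left_2(prev, cur):
    """True if cur's top 30 bits equal prev's low 30 bits."""
    return (cur >> 2) == (prev & (MASK32 >> 2))


def follows_shift_right_1(prev, cur):
    """True if cur's low 31 bits equal prev's top 31 bits."""
    return (cur & (MASK32 >> 1)) == (prev >> 1)


def count_matches(values, predicate):
    """Count consecutive pairs (a, b) for which predicate(a, b) holds."""
    return sum(predicate(a, b) for a, b in zip(values, values[1:]))


left_hits = count_matches(BY_ONE, follows_shift_left_2)
right_hits = count_matches(BY_32, follows_shift_right_1)
print(f"shift-left-by-2 pairs:  {left_hits}/{len(BY_ONE) - 1}")
print(f"shift-right-by-1 pairs: {right_hits}/{len(BY_32) - 1}")
```

For the values listed, consecutive pairs agree on all the surviving bits, so the "mostly" in the prose comes down to the two bits (or one bit) shifted in at the vacated end.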

Based on this, I'd guess that:

  1. The hardware PRNG is an LFSR with 32 bits of state.
  2. Stepping the PRNG shifts it by one bit, and injects a new bit at the top based on the xor or xnor of the taps.
  3. Seeding the PRNG writes the seed to the PRNG state. Something present in the Metalium example (but not my more bespoke setup...) seems to step the PRNG in each lane by 64 - lane_id * 2 as part of seeding.

The result is that instead of 1024 independent random values, we've got 32 cursors into the same random sequence. Those cursors are so close together that they visit the same points in the sequence multiple times, so the obtained values end up heavily correlated.
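
To make the guess above concrete, here is a small Python model of it. It is a sketch under stated assumptions, not the actual hardware behaviour: the tap mask `TAPS` is arbitrary (the real taps are unknown), xor feedback is assumed rather than xnor, the seed value is arbitrary, and `seed_lanes` simply applies the guessed 64 - lane_id * 2 pre-stepping from item 3. All function names are hypothetical. The point is only to show that, under this model, 32 lanes advanced for 32 steps cannot produce anywhere near 1024 distinct values.

```python
MASK32 = 0xFFFFFFFF
TAPS = 0x80200003  # hypothetical tap mask; the real hardware taps are not known
NUM_LANES = 32


def step(state):
    """Shift right by one and inject the xor of the tapped bits at the top."""
    new_bit = bin(state & TAPS).count("1") & 1
    return (state >> 1) | (new_bit << 31)


def seed_lanes(seed):
    """Give every lane the same seed, then pre-step lane i by 64 - 2*i (a guess)."""
    lanes = []
    for lane_id in range(NUM_LANES):
        state = seed & MASK32
        for _ in range(64 - 2 * lane_id):
            state = step(state)
        lanes.append(state)
    return lanes


def draw(seed, steps):
    """Return `steps` rows, each holding one value per lane."""
    lanes = seed_lanes(seed)
    rows = []
    for _ in range(steps):
        rows.append(list(lanes))
        lanes = [step(s) for s in lanes]
    return rows


rows = draw(seed=0x12345678, steps=32)
values = [v for row in rows for v in row]

# Lane i sits two steps ahead of lane i+1 in the shared sequence,
# so lane i+1 reaches lane i's value two rows later.
assert all(rows[r + 2][i + 1] == rows[r][i]
           for r in range(30) for i in range(NUM_LANES - 1))

# Lane i at row r is at position 64 - 2*i + r, which takes only 94
# different values over 32 lanes and 32 rows.
print(f"{len(values)} draws, {len(set(values))} distinct values")
```

Under these assumptions, adjacent lanes in a row differ by a two-bit shift and consecutive rows differ by a one-bit shift, which has the same shape as the two tables. The 1024 draws cover at most 94 positions in the sequence; the exact count for a real tile depends on how many steps it covers, which this sketch does not try to reproduce.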

@amahmudTT
Contributor Author

amahmudTT commented Oct 16, 2024

> My exact setup is somewhat bespoke, so I've tried to reproduce what I'm seeing using a minimally modified tt_metal

Hello Peter,

Sorry for the delayed reply, and thank you very much for pointing this out. We have created tenstorrent/tt-metal#13904 to track the problem and will try to fix it. Thanks again.
