[
{
"Name": "AAA",
"Alias": [],
"Brief": "ASCII Adjust After Addition",
"Description": "\nAdjusts the sum of two unpacked BCD values to create an unpacked BCD result. The AL register is the implied source and destination operand for this instruction. The AAA instruction is only useful when it follows an ADD instruction that adds (binary addition) two unpacked BCD values and stores a byte result in the AL register. The AAA instruction then adjusts the contents of the AL register to contain the correct 1-digit unpacked BCD result.\nIf the addition produces a decimal carry, the AH register increments by 1, and the CF and AF flags are set. If there was no decimal carry, the CF and AF flags are cleared and the AH register is unchanged. In either case, bits 4 through 7 of the AL register are set to 0.\nThis instruction executes as described in compatibility mode and legacy mode. It is not valid in 64-bit mode.\n"
},
{
"Name": "AAD",
"Alias": [],
"Brief": "ASCII Adjust AX Before Division",
"Description": "\nAdjusts two unpacked BCD digits (the least-significant digit in the AL register and the most-significant digit in the AH register) so that a division operation performed on the result will yield a correct unpacked BCD value. The AAD instruction is only useful when it precedes a DIV instruction that divides (binary division) the adjusted value in the AX register by an unpacked BCD value.\nThe AAD instruction sets the value in the AL register to (AL + (10 * AH)), and then clears the AH register to 00H. The value in the AX register is then equal to the binary equivalent of the original unpacked two-digit (base 10) number in registers AH and AL.\nThe generalized version of this instruction allows adjustment of two unpacked digits of any number base (see the “Operation” section below), by setting the imm8 byte to the selected number base (for example, 08H for octal, 0AH for decimal, or 0CH for base 12 numbers). The AAD mnemonic is interpreted by all assemblers to mean adjust ASCII (base 10) values. To adjust values in another number base, the instruction must be hand coded in machine code (D5 imm8).\nThis instruction executes as described in compatibility mode and legacy mode. It is not valid in 64-bit mode.\n"
},
{
"Name": "AAM",
"Alias": [],
"Brief": "ASCII Adjust AX After Multiply",
"Description": "\nAdjusts the result of the multiplication of two unpacked BCD values to create a pair of unpacked (base 10) BCD values. The AX register is the implied source and destination operand for this instruction. The AAM instruction is only useful when it follows an MUL instruction that multiplies (binary multiplication) two unpacked BCD values and stores a word result in the AX register. The AAM instruction then adjusts the contents of the AX register to contain the correct 2-digit unpacked (base 10) BCD result.\nThe generalized version of this instruction allows adjustment of the contents of the AX to create two unpacked digits of any number base (see the “Operation” section below). Here, the imm8 byte is set to the selected number base (for example, 08H for octal, 0AH for decimal, or 0CH for base 12 numbers). The AAM mnemonic is interpreted by all assemblers to mean adjust to ASCII (base 10) values. To adjust to values in another number base, the instruction must be hand coded in machine code (D4 imm8).\nThis instruction executes as described in compatibility mode and legacy mode. It is not valid in 64-bit mode.\n"
},
{
"Name": "AAS",
"Alias": [],
"Brief": "ASCII Adjust AL After Subtraction",
"Description": "\nAdjusts the result of the subtraction of two unpacked BCD values to create a unpacked BCD result. The AL register is the implied source and destination operand for this instruction. The AAS instruction is only useful when it follows a SUB instruction that subtracts (binary subtraction) one unpacked BCD value from another and stores a byte result in the AL register. The AAA instruction then adjusts the contents of the AL register to contain the correct 1-digit unpacked BCD result.\nIf the subtraction produced a decimal carry, the AH register decrements by 1, and the CF and AF flags are set. If no decimal carry occurred, the CF and AF flags are cleared, and the AH register is unchanged. In either case, the AL register is left with its top four bits set to 0.\nThis instruction executes as described in compatibility mode and legacy mode. It is not valid in 64-bit mode.\n"
},
{
"Name": "ADC",
"Alias": [],
"Brief": "Add with Carry",
"Description": "\nAdds the destination operand (first operand), the source operand (second operand), and the carry (CF) flag and stores the result in the destination operand. The destination operand can be a register or a memory location; the source operand can be an immediate, a register, or a memory location. (However, two memory operands cannot be used in one instruction.) The state of the CF flag represents a carry from a previous addition. When an immediate value is used as an operand, it is sign-extended to the length of the destination operand format.\nThe ADC instruction does not distinguish between signed or unsigned operands. Instead, the processor evaluates the result for both data types and sets the OF and CF flags to indicate a carry in the signed or unsigned result, respectively. The SF flag indicates the sign of the signed result.\nThe ADC instruction is usually executed as part of a multibyte or multiword addition in which an ADD instruction is followed by an ADC instruction.\nThis instruction can be used with a LOCK prefix to allow the instruction to be executed atomically.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.\n"
},
{
"Name": "ADCX",
"Alias": [],
"Brief": "Unsigned Integer Addition of Two Operands with Carry Flag",
"Description": "\nPerforms an unsigned addition of the destination operand (first operand), the source operand (second operand) and the carry-flag (CF) and stores the result in the destination operand. The destination operand is a general-purpose register, whereas the source operand can be a general-purpose register or memory location. The state of CF can represent a carry from a previous addition. The instruction sets the CF flag with the carry generated by the unsigned addition of the operands.\nThe ADCX instruction is executed in the context of multi-precision addition, where we add a series of operands with a carry-chain. At the beginning of a chain of additions, we need to make sure the CF is in a desired initial state. Often, this initial state needs to be 0, which can be achieved with an instruction to zero the CF (e.g. XOR).\nThis instruction is supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode.\nIn 64-bit mode, the default operation size is 32 bits. Using a REX Prefix in the form of REX.R permits access to addi-tional registers (R8-15). Using REX Prefix in the form of REX.W promotes operation to 64 bits.\nADCX executes normally either inside or outside a transaction region.\nNote: ADCX defines the OF flag differently than the ADD/ADC instructions as defined in Intel® 64 and IA-32 Archi-tectures Software Developer’s Manual, Volume 2A.\n"
},
{
"Name": "ADD",
"Alias": [],
"Brief": "Add",
"Description": "\nAdds the destination operand (first operand) and the source operand (second operand) and then stores the result in the destination operand. The destination operand can be a register or a memory location; the source operand can be an immediate, a register, or a memory location. (However, two memory operands cannot be used in one instruction.) When an immediate value is used as an operand, it is sign-extended to the length of the destination operand format.\nThe ADD instruction performs integer addition. It evaluates the result for both signed and unsigned integer oper-ands and sets the OF and CF flags to indicate a carry (overflow) in the signed or unsigned result, respectively. The SF flag indicates the sign of the signed result.\nThis instruction can be used with a LOCK prefix to allow the instruction to be executed atomically.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.\n"
},
{
"Name": "ADDPD",
"Alias": [],
"Brief": "Add Packed Double",
"Description": "\nPerforms a SIMD add of the two packed double-precision floating-point values from the source operand (second operand) and the destination operand (first operand), and stores the packed double-precision floating-point results in the destination operand.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified. See Chapter 11 in the Intel® 64 and IA-32 Architectures Software Devel-oper’s Manual, Volume 1, for an overview of SIMD double-precision floating-point operation.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n"
},
{
"Name": "ADDPS",
"Alias": [],
"Brief": "Add Packed Single",
"Description": "\nPerforms a SIMD add of the four packed single-precision floating-point values from the source operand (second operand) and the destination operand (first operand), and stores the packed single-precision floating-point results in the destination operand.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified. See Chapter 10 in the Intel® 64 and IA-32 Architectures Software Devel-oper’s Manual, Volume 1, for an overview of SIMD single-precision floating-point operation.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n"
},
{
"Name": "ADDSD",
"Alias": [],
"Brief": "Add Scalar Double",
"Description": "\nAdds the low double-precision floating-point values from the source operand (second operand) and the destination operand (first operand), and stores the double-precision floating-point result in the destination operand.\nThe source operand can be an XMM register or a 64-bit memory location. The destination operand is an XMM register. See Chapter 11 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for an overview of a scalar double-precision floating-point operation.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: Bits (VLMAX-1:64) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (127:64) of the XMM register destination are copied from corresponding bits in the first source operand. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\n"
},
{
"Name": "ADDSS",
"Alias": [],
"Brief": "Add Scalar Single",
"Description": "\nAdds the low single-precision floating-point values from the source operand (second operand) and the destination operand (first operand), and stores the single-precision floating-point result in the destination operand.\nThe source operand can be an XMM register or a 32-bit memory location. The destination operand is an XMM register. See Chapter 10 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for an overview of a scalar single-precision floating-point operation.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: Bits (VLMAX-1:32) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (127:32) of the XMM register destination are copied from corresponding bits in the first source operand. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\n"
},
{
"Name": "ADDSUBPD",
"Alias": [],
"Brief": "Packed Double",
"Description": "\nAdds odd-numbered double-precision floating-point values of the first source operand (second operand) with the corresponding double-precision floating-point values from the second source operand (third operand); stores the result in the odd-numbered values of the destination operand (first operand). Subtracts the even-numbered double-precision floating-point values from the second source operand from the corresponding double-precision floating values in the first source operand; stores the result into the even-numbered values of the destination operand.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified. See Figure 3-3.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n\nADDSUBPD xmm1, xmm2/m128\nxmm2/m128\n[127:64]\n[63:0]\nRESULT:\nxmm1[127:64] + xmm2/m128[127:64]\nxmm1[63:0] - xmm2/m128[63:0]\nxmm1\n[127:64]\n[63:0]\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nFigure 3-3. ADDSUBPD—Packed Double-FP Add/Subtract\n"
},
{
"Name": "ADDSUBPS",
"Alias": [],
"Brief": "Packed Single",
"Description": "\nAdds odd-numbered single-precision floating-point values of the first source operand (second operand) with the corresponding single-precision floating-point values from the second source operand (third operand); stores the result in the odd-numbered values of the destination operand (first operand). Subtracts the even-numbered single-precision floating-point values from the second source operand from the corresponding single-precision floating values in the first source operand; stores the result into the even-numbered values of the destination operand.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified. See Figure 3-4.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n\nADDSUBPS xmm1, xmm2/m128\nxmm2/\n[127:96]\n[95:64]\n[63:32]\n[31:0]\nm128\nRESULT:\nxmm1[127:96] +\nxmm1[95:64] - xmm2/\nxmm1[63:32] +\nxmm1[31:0] -\nxmm1\nxmm2/m128[127:96]\nm128[95:64]\nxmm2/m128[63:32]\nxmm2/m128[31:0]\n[127:96]\n[95:64]\n[63:32]\n[31:0]\nOM15992\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nFigure 3-4. ADDSUBPS—Packed Single-FP Add/Subtract\n"
},
{
"Name": "ADOX",
"Alias": [],
"Brief": "Unsigned Integer Addition of Two Operands with Overflow Flag",
"Description": "\nPerforms an unsigned addition of the destination operand (first operand), the source operand (second operand) and the overflow-flag (OF) and stores the result in the destination operand. The destination operand is a general-purpose register, whereas the source operand can be a general-purpose register or memory location. The state of OF represents a carry from a previous addition. The instruction sets the OF flag with the carry generated by the unsigned addition of the operands.\nThe ADOX instruction is executed in the context of multi-precision addition, where we add a series of operands with a carry-chain. At the beginning of a chain of additions, we execute an instruction to zero the OF (e.g. XOR).\nThis instruction is supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode.\nIn 64-bit mode, the default operation size is 32 bits. Using a REX Prefix in the form of REX.R permits access to addi-tional registers (R8-15). Using REX Prefix in the form of REX.W promotes operation to 64-bits.\nADOX executes normally either inside or outside a transaction region.\nNote: ADOX defines the CF and OF flags differently than the ADD/ADC instructions as defined in Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A.\n"
},
{
"Name": "AESDEC",
"Alias": [],
"Brief": "Perform One Round of an AES Decryption Flow",
"Description": "\nThis instruction performs a single round of the AES decryption flow using the Equivalent Inverse Cipher, with the round key from the second source operand, operating on a 128-bit data (state) from the first source operand, and store the result in the destination operand.\nUse the AESDEC instruction for all but the last decryption round. For the last decryption round, use the AESDEC-CLAST instruction.\n128-bit Legacy SSE version: The first source operand and the destination operand are the same and must be an XMM register. The second source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: The first source operand and the destination operand are XMM registers. The second source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\n"
},
{
"Name": "AESDECLAST",
"Alias": [],
"Brief": "Perform Last Round of an AES Decryption Flow",
"Description": "\nThis instruction performs the last round of the AES decryption flow using the Equivalent Inverse Cipher, with the round key from the second source operand, operating on a 128-bit data (state) from the first source operand, and store the result in the destination operand.\n128-bit Legacy SSE version: The first source operand and the destination operand are the same and must be an XMM register. The second source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: The first source operand and the destination operand are XMM registers. The second source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\n"
},
{
"Name": "AESENC",
"Alias": [],
"Brief": "Perform One Round of an AES Encryption Flow",
"Description": "\nThis instruction performs a single round of an AES encryption flow using a round key from the second source operand, operating on 128-bit data (state) from the first source operand, and store the result in the destination operand.\nUse the AESENC instruction for all but the last encryption rounds. For the last encryption round, use the AESENC-CLAST instruction.\n128-bit Legacy SSE version: The first source operand and the destination operand are the same and must be an XMM register. The second source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: The first source operand and the destination operand are XMM registers. The second source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\n"
},
{
"Name": "AESENCLAST",
"Alias": [],
"Brief": "Perform Last Round of an AES Encryption Flow",
"Description": "\nThis instruction performs the last round of an AES encryption flow using a round key from the second source operand, operating on 128-bit data (state) from the first source operand, and store the result in the destination operand.\n128-bit Legacy SSE version: The first source operand and the destination operand are the same and must be an XMM register. The second source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: The first source operand and the destination operand are XMM registers. The second source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\n"
},
{
"Name": "AESIMC",
"Alias": [],
"Brief": "Perform the AES InvMixColumn Transformation",
"Description": "\nPerform the InvMixColumns transformation on the source operand and store the result in the destination operand. The destination operand is an XMM register. The source operand can be an XMM register or a 128-bit memory loca-tion.\nNote: the AESIMC instruction should be applied to the expanded AES round keys (except for the first and last round key) in order to prepare them for decryption using the “Equivalent Inverse Cipher” (defined in FIPS 197).\n128-bit Legacy SSE version: Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.\n"
},
{
"Name": "AESKEYGENASSIST",
"Alias": [],
"Brief": "AES Round Key Generation Assist",
"Description": "\nAssist in expanding the AES cipher key, by computing steps towards generating a round key for encryption, using 128-bit data specified in the source operand and an 8-bit round constant specified as an immediate, store the result in the destination operand.\nThe destination operand is an XMM register. The source operand can be an XMM register or a 128-bit memory loca-tion.\n128-bit Legacy SSE version:Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.\n"
},
{
"Name": "AND",
"Alias": [],
"Brief": "Logical AND",
"Description": "\nPerforms a bitwise AND operation on the destination (first) and source (second) operands and stores the result in the destination operand location. The source operand can be an immediate, a register, or a memory location; the destination operand can be a register or a memory location. (However, two memory operands cannot be used in one instruction.) Each bit of the result is set to 1 if both corresponding bits of the first and second operands are 1; otherwise, it is set to 0.\nThis instruction can be used with a LOCK prefix to allow the it to be executed atomically.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.\n"
},
{
"Name": "ANDN",
"Alias": [],
"Brief": "Logical AND NOT",
"Description": "\nPerforms a bitwise logical AND of inverted second operand (the first source operand) with the third operand (the second source operand). The result is stored in the first operand (destination operand).\nThis instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode. In 64-bit mode operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.\n"
},
{
"Name": "ANDNPD",
"Alias": [],
"Brief": "Bitwise Logical AND NOT of Packed Double",
"Description": "\nPerforms a bitwise logical AND NOT of the two or four packed double-precision floating-point values from the first source operand and the second source operand, and stores the result in the destination operand.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n"
},
{
"Name": "ANDNPS",
"Alias": [],
"Brief": "Bitwise Logical AND NOT of Packed Single",
"Description": "\nInverts the bits of the four packed single-precision floating-point values in the destination operand (first operand), performs a bitwise logical AND of the four packed single-precision floating-point values in the source operand (second operand) and the temporary inverted result, and stores the result in the destination operand.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n"
},
{
"Name": "ANDPD",
"Alias": [],
"Brief": "Bitwise Logical AND of Packed Double",
"Description": "\nPerforms a bitwise logical AND of the two packed double-precision floating-point values from the source operand (second operand) and the destination operand (first operand), and stores the result in the destination operand.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n"
},
{
"Name": "ANDPS",
"Alias": [],
"Brief": "Bitwise Logical AND of Packed Single",
"Description": "\nPerforms a bitwise logical AND of the four or eight packed single-precision floating-point values from the first source operand and the second source operand, and stores the result in the destination operand.\nIn 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n"
},
{
"Name": "ARPL",
"Alias": [],
"Brief": "Adjust RPL Field of Segment Selector",
"Description": "\nCompares the RPL fields of two segment selectors. The first operand (the destination operand) contains one segment selector and the second operand (source operand) contains the other. (The RPL field is located in bits 0 and 1 of each operand.) If the RPL field of the destination operand is less than the RPL field of the source operand, the ZF flag is set and the RPL field of the destination operand is increased to match that of the source operand. Otherwise, the ZF flag is cleared and no change is made to the destination operand. (The destination operand can be a word register or a memory location; the source operand must be a word register.)\nThe ARPL instruction is provided for use by operating-system procedures (however, it can also be used by applica-tions). It is generally used to adjust the RPL of a segment selector that has been passed to the operating system by an application program to match the privilege level of the application program. Here the segment selector passed to the operating system is placed in the destination operand and segment selector for the application program’s code segment is placed in the source operand. (The RPL field in the source operand represents the priv-ilege level of the application program.) Execution of the ARPL instruction then ensures that the RPL of the segment selector received by the operating system is no lower (does not have a higher privilege) than the privilege level of the application program (the segment selector for the application program’s code segment can be read from the stack following a procedure call).\nThis instruction executes as described in compatibility mode and legacy mode. It is not encodable in 64-bit mode.\nSee “Checking Caller Access Privileges” in Chapter 3, “Protected-Mode Memory Management,” of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A, for more information about the use of this instruc-tion.\n"
},
{
"Name": "BEXTR",
"Alias": [],
"Brief": "Bit Field Extract",
"Description": "\nExtracts contiguous bits from the first source operand (the second operand) using an index value and length value specified in the second source operand (the third operand). Bit 7:0 of the second source operand specifies the starting bit position of bit extraction. A START value exceeding the operand size will not extract any bits from the second source operand. Bit 15:8 of the second source operand specifies the maximum number of bits (LENGTH) beginning at the START position to extract. Only bit positions up to (OperandSize -1) of the first source operand are extracted. The extracted bits are written to the destination register, starting from the least significant bit. All higher order bits in the destination operand (starting at bit position LENGTH) are zeroed. The destination register is cleared if no bits are extracted.\nThis instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode. In 64-bit mode operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.\n"
},
{
"Name": "BLENDPD",
"Alias": [],
"Brief": "Blend Packed Double Precision Floating",
"Description": "\nDouble-precision floating-point values from the second source operand (third operand) are conditionally merged with values from the first source operand (second operand) and written to the destination operand (first operand). The immediate bits [3:0] determine whether the corresponding double-precision floating-point value in the desti-nation is copied from the second source or first source. If a bit in the mask, corresponding to a word, is “1\", then the double-precision floating-point value in the second source operand is copied, else the value in the first source operand is copied.\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register. The second source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n"
},
{
"Name": "BLENDPS",
"Alias": [],
"Brief": "Blend Packed Single Precision Floating",
"Description": "\nPacked single-precision floating-point values from the second source operand (third operand) are conditionally merged with values from the first source operand (second operand) and written to the destination operand (first operand). The immediate bits [7:0] determine whether the corresponding single precision floating-point value in the destination is copied from the second source or first source. If a bit in the mask, corresponding to a word, is “1\", then the single-precision floating-point value in the second source operand is copied, else the value in the first source operand is copied.\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: The first source operand an XMM register. The second source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n"
},
{
"Name": "BLENDVPD",
"Alias": [],
"Brief": "Variable Blend Packed Double Precision Floating",
"Description": "\nConditionally copy each quadword data element of double-precision floating-point value from the second source operand and the first source operand depending on mask bits defined in the mask register operand. The mask bits are the most significant bit in each quadword element of the mask register.\nEach quadword element of the destination operand is copied from:\nThe register assignment of the implicit mask operand for BLENDVPD is defined to be the architectural register XMM0.\n128-bit Legacy SSE version: The first source operand and the destination operand is the same. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged. The mask register operand is implicitly defined to be the architectural register XMM0. An attempt to execute BLENDVPD with a VEX prefix will cause #UD.\nVEX.128 encoded version: The first source operand and the destination operand are XMM registers. The second source operand is an XMM register or 128-bit memory location. The mask operand is the third source register, and encoded in bits[7:4] of the immediate byte(imm8). The bits[3:0] of imm8 are ignored. In 32-bit mode, imm8[7] is ignored. The upper bits (VLMAX-1:128) of the corresponding YMM register (destination register) are zeroed. VEX.W must be 0, otherwise, the instruction will #UD.\nVEX.256 encoded version: The first source operand and destination operand are YMM registers. The second source operand can be a YMM register or a 256-bit memory location. The mask operand is the third source register, and encoded in bits[7:4] of the immediate byte(imm8). The bits[3:0] of imm8 are ignored. In 32-bit mode, imm8[7] is ignored. VEX.W must be 0, otherwise, the instruction will #UD.\nVBLENDVPD permits the mask to be any XMM or YMM register. In contrast, BLENDVPD treats XMM0 implicitly as the mask and do not support non-destructive destination operation.\n"
},
{
"Name": "BLENDVPS",
"Alias": [],
"Brief": "Variable Blend Packed Single Precision Floating",
"Description": "\nConditionally copy each dword data element of single-precision floating-point value from the second source operand and the first source operand depending on mask bits defined in the mask register operand. The mask bits are the most significant bit in each dword element of the mask register.\nEach quadword element of the destination operand is copied from:\nThe register assignment of the implicit mask operand for BLENDVPS is defined to be the architectural register XMM0.\n128-bit Legacy SSE version: The first source operand and the destination operand is the same. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged. The mask register operand is implicitly defined to be the architectural register XMM0. An attempt to execute BLENDVPS with a VEX prefix will cause #UD.\nVEX.128 encoded version: The first source operand and the destination operand are XMM registers. The second source operand is an XMM register or 128-bit memory location. The mask operand is the third source register, and encoded in bits[7:4] of the immediate byte(imm8). The bits[3:0] of imm8 are ignored. In 32-bit mode, imm8[7] is ignored. The upper bits (VLMAX-1:128) of the corresponding YMM register (destination register) are zeroed. VEX.W must be 0, otherwise, the instruction will #UD.\nVEX.256 encoded version: The first source operand and destination operand are YMM registers. The second source operand can be a YMM register or a 256-bit memory location. The mask operand is the third source register, and encoded in bits[7:4] of the immediate byte(imm8). The bits[3:0] of imm8 are ignored. In 32-bit mode, imm8[7] is ignored. VEX.W must be 0, otherwise, the instruction will #UD.\nVBLENDVPS permits the mask to be any XMM or YMM register. In contrast, BLENDVPS treats XMM0 implicitly as the mask and do not support non-destructive destination operation.\n"
},
{
"Name": "BLSI",
"Alias": [],
"Brief": "Extract Lowest Set Isolated Bit",
"Description": "\nExtracts the lowest set bit from the source operand and set the corresponding bit in the destination register. All other bits in the destination operand are zeroed. If no bits are set in the source operand, BLSI sets all the bits in the destination to 0 and sets ZF and CF.\nThis instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode. In 64-bit mode operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.\n"
},
{
"Name": "BLSMSK",
"Alias": [],
"Brief": "Get Mask Up to Lowest Set Bit",
"Description": "\nSets all the lower bits of the destination operand to “1” up to and including lowest set bit (=1) in the source operand. If source operand is zero, BLSMSK sets all bits of the destination operand to 1 and also sets CF to 1.\nThis instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode. In 64-bit mode operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.\n"
},
{
"Name": "BLSR",
"Alias": [],
"Brief": "Reset Lowest Set Bit",
"Description": "\nCopies all bits from the source operand to the destination operand and resets (=0) the bit position in the destina-tion operand that corresponds to the lowest set bit of the source operand. If the source operand is zero BLSR sets CF.\nThis instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode. In 64-bit mode operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.\n"
},
{
"Name": "BOUND",
"Alias": [],
"Brief": "Check Array Index Against Bounds",
"Description": "\nBOUND determines if the first operand (array index) is within the bounds of an array specified the second operand (bounds operand). The array index is a signed integer located in a register. The bounds operand is a memory loca-tion that contains a pair of signed doubleword-integers (when the operand-size attribute is 32) or a pair of signed word-integers (when the operand-size attribute is 16). The first doubleword (or word) is the lower bound of the array and the second doubleword (or word) is the upper bound of the array. The array index must be greater than or equal to the lower bound and less than or equal to the upper bound plus the operand size in bytes. If the index is not within bounds, a BOUND range exceeded exception (#BR) is signaled. When this exception is generated, the saved return instruction pointer points to the BOUND instruction.\nThe bounds limit data structure (two words or doublewords containing the lower and upper limits of the array) is usually placed just before the array itself, making the limits addressable via a constant offset from the beginning of the array. Because the address of the array already will be present in a register, this practice avoids extra bus cycles to obtain the effective address of the array bounds.\nThis instruction executes as described in compatibility mode and legacy mode. It is not valid in 64-bit mode.\n"
},
{
"Name": "BSF",
"Alias": [],
"Brief": "Bit Scan Forward",
"Description": "\nSearches the source operand (second operand) for the least significant set bit (1 bit). If a least significant 1 bit is found, its bit index is stored in the destination operand (first operand). The source operand can be a register or a memory location; the destination operand is a register. The bit index is an unsigned offset from bit 0 of the source operand. If the content of the source operand is 0, the content of the destination operand is undefined.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.\n"
},
{
"Name": "BSR",
"Alias": [],
"Brief": "Bit Scan Reverse",
"Description": "\nSearches the source operand (second operand) for the most significant set bit (1 bit). If a most significant 1 bit is found, its bit index is stored in the destination operand (first operand). The source operand can be a register or a memory location; the destination operand is a register. The bit index is an unsigned offset from bit 0 of the source operand. If the content source operand is 0, the content of the destination operand is undefined.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.\n"
},
{
"Name": "BSWAP",
"Alias": [],
"Brief": "Byte Swap",
"Description": "\nReverses the byte order of a 32-bit or 64-bit (destination) register. This instruction is provided for converting little-endian values to big-endian format and vice versa. To swap bytes in a word value (16-bit register), use the XCHG instruction. When the BSWAP instruction references a 16-bit register, the result is undefined.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.\n"
},
{
"Name": "BT",
"Alias": [],
"Brief": "Bit Test",
"Description": "\nSelects the bit in a bit string (specified with the first operand, called the bit base) at the bit-position designated by the bit offset (specified by the second operand) and stores the value of the bit in the CF flag. The bit base operand can be a register or a memory location; the bit offset operand can be a register or an immediate value:\nSee also: Bit(BitBase, BitOffset) on page 3-10.\nSome assemblers support immediate bit offsets larger than 31 by using the immediate bit offset field in combina-tion with the displacement field of the memory operand. In this case, the low-order 3 or 5 bits (3 for 16-bit oper-ands, 5 for 32-bit operands) of the immediate bit offset are stored in the immediate bit offset field, and the high-order bits are shifted and combined with the byte displacement in the addressing mode by the assembler. The processor will ignore the high order bits if they are not zero.\nWhen accessing a bit in memory, the processor may access 4 bytes starting from the memory address for a 32-bit operand size, using by the following relationship:\nEffective Address + (4 ∗ (BitOffset DIV 32))\nOr, it may access 2 bytes starting from the memory address for a 16-bit operand, using this relationship:\nEffective Address + (2 ∗ (BitOffset DIV 16))\nIt may do so even when only a single byte needs to be accessed to reach the given bit. When using this bit addressing mechanism, software should avoid referencing areas of memory close to address space holes. In partic-ular, it should avoid references to memory-mapped I/O registers. Instead, software should use the MOV instruc-tions to load from or store to these addresses, and use the register form of these instructions to manipulate the data.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bit oper-ands. 
See the summary chart at the beginning of this section for encoding data and limits.\n"
},
{
"Name": "BTC",
"Alias": [],
"Brief": "Bit Test and Complement",
"Description": "\nSelects the bit in a bit string (specified with the first operand, called the bit base) at the bit-position designated by the bit offset operand (second operand), stores the value of the bit in the CF flag, and complements the selected bit in the bit string. The bit base operand can be a register or a memory location; the bit offset operand can be a register or an immediate value:\nSee also: Bit(BitBase, BitOffset) on page 3-10.\nSome assemblers support immediate bit offsets larger than 31 by using the immediate bit offset field in combina-tion with the displacement field of the memory operand. See “BT—Bit Test” in this chapter for more information on this addressing mechanism.\nThis instruction can be used with a LOCK prefix to allow the instruction to be executed atomically.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.\n"
},
{
"Name": "BTR",
"Alias": [],
"Brief": "Bit Test and Reset",
"Description": "\nSelects the bit in a bit string (specified with the first operand, called the bit base) at the bit-position designated by the bit offset operand (second operand), stores the value of the bit in the CF flag, and clears the selected bit in the bit string to 0. The bit base operand can be a register or a memory location; the bit offset operand can be a register or an immediate value:\nSee also: Bit(BitBase, BitOffset) on page 3-10.\nSome assemblers support immediate bit offsets larger than 31 by using the immediate bit offset field in combina-tion with the displacement field of the memory operand. See “BT—Bit Test” in this chapter for more information on this addressing mechanism.\nThis instruction can be used with a LOCK prefix to allow the instruction to be executed atomically.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.\n"
},
{
"Name": "BTS",
"Alias": [],
"Brief": "Bit Test and Set",
"Description": "\nSelects the bit in a bit string (specified with the first operand, called the bit base) at the bit-position designated by the bit offset operand (second operand), stores the value of the bit in the CF flag, and sets the selected bit in the bit string to 1. The bit base operand can be a register or a memory location; the bit offset operand can be a register or an immediate value:\nSee also: Bit(BitBase, BitOffset) on page 3-10.\nSome assemblers support immediate bit offsets larger than 31 by using the immediate bit offset field in combina-tion with the displacement field of the memory operand. See “BT—Bit Test” in this chapter for more information on this addressing mechanism.\nThis instruction can be used with a LOCK prefix to allow the instruction to be executed atomically.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.\n"
},
{
"Name": "BZHI",
"Alias": [],
"Brief": "Zero High Bits Starting with Specified Bit Position",
"Description": "\nBZHI copies the bits of the first source operand (the second operand) into the destination operand (the first operand) and clears the higher bits in the destination according to the INDEX value specified by the second source operand (the third operand). The INDEX is specified by bits 7:0 of the second source operand. The INDEX value is saturated at the value of OperandSize -1. CF is set, if the number contained in the 8 low bits of the third operand is greater than OperandSize -1.\nThis instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode. In 64-bit mode operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.\n"
},
{
"Name": "CALL",
"Alias": [],
"Brief": "Call Procedure",
"Description": "\nSaves procedure linking information on the stack and branches to the called procedure specified using the target operand. The target operand specifies the address of the first instruction in the called procedure. The operand can be an immediate value, a general-purpose register, or a memory location.\nThis instruction can be used to execute four types of calls:\nThe latter two call types (inter-privilege-level call and task switch) can only be executed in protected mode. See “Calling Procedures Using Call and RET” in Chapter 6 of the Intel® 64 and IA-32 Architectures Software Devel-oper’s Manual, Volume 1, for additional information on near, far, and inter-privilege-level calls. See Chapter 7, “Task Management,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A, for infor-mation on performing task switches with the CALL instruction.\nNear Call. When executing a near call, the processor pushes the value of the EIP register (which contains the offset of the instruction following the CALL instruction) on the stack (for use later as a return-instruction pointer). The processor then branches to the address in the current code segment specified by the target operand. The target operand specifies either an absolute offset in the code segment (an offset from the base of the code segment) or a relative offset (a signed displacement relative to the current value of the instruction pointer in the EIP register; this value points to the instruction following the CALL instruction). The CS register is not changed on near calls.\nFor a near call absolute, an absolute offset is specified indirectly in a general-purpose register or a memory location (r/m16, r/m32, or r/m64). The operand-size attribute determines the size of the target operand (16, 32 or 64 bits). When in 64-bit mode, the operand size for near call (and all near branches) is forced to 64-bits. Absolute offsets are loaded directly into the EIP(RIP) register. 
If the operand-size attribute is 16, the upper two bytes of the EIP register are cleared, resulting in a maximum instruction pointer size of 16 bits. When accessing an absolute offset indirectly using the stack pointer [ESP] as the base register, the base value used is the value of the ESP before the instruction executes.\nA relative offset (rel16 or rel32) is generally specified as a label in assembly code. But at the machine code level, it is encoded as a signed, 16- or 32-bit immediate value. This value is added to the value in the EIP(RIP) register. In 64-bit mode the relative offset is always a 32-bit immediate value which is sign-extended to 64 bits before it is added to the value in the RIP register for the target calculation. As with absolute offsets, the operand-size attribute determines the size of the target operand (16, 32, or 64 bits). In 64-bit mode the target operand will always be 64 bits because the operand size is forced to 64 bits for near branches.\nFar Calls in Real-Address or Virtual-8086 Mode. When executing a far call in real-address or virtual-8086 mode, the processor pushes the current value of both the CS and EIP registers on the stack for use as a return-instruction pointer. The processor then performs a “far branch” to the code segment and offset specified with the target operand for the called procedure. The target operand specifies an absolute far address either directly with a pointer (ptr16:16 or ptr16:32) or indirectly with a memory location (m16:16 or m16:32). With the pointer method, the segment and offset of the called procedure are encoded in the instruction using a 4-byte (16-bit operand size) or 6-byte (32-bit operand size) far address immediate. With the indirect method, the target operand specifies a memory location that contains a 4-byte (16-bit operand size) or 6-byte (32-bit operand size) far address. The operand-size attribute determines the size of the offset (16 or 32 bits) in the far address. 
The far address is loaded directly into the CS and EIP registers. If the operand-size attribute is 16, the upper two bytes of the EIP register are cleared.\nFar Calls in Protected Mode. When the processor is operating in protected mode, the CALL instruction can be used to perform the following types of far calls:\nIn protected mode, the processor always uses the segment selector part of the far address to access the corresponding descriptor in the GDT or LDT. The descriptor type (code segment, call gate, task gate, or TSS) and access rights determine the type of call operation to be performed.\nIf the selected descriptor is for a code segment, a far call to a code segment at the same privilege level is performed. (If the selected code segment is at a different privilege level and the code segment is non-conforming, a general-protection exception is generated.) A far call to the same privilege level in protected mode is very similar to one carried out in real-address or virtual-8086 mode. The target operand specifies an absolute far address either directly with a pointer (ptr16:16 or ptr16:32) or indirectly with a memory location (m16:16 or m16:32). The operand-size attribute determines the size of the offset (16 or 32 bits) in the far address. The new code segment selector and its descriptor are loaded into the CS register; the offset from the instruction is loaded into the EIP register.\nA call gate (described in the next paragraph) can also be used to perform a far call to a code segment at the same privilege level. Using this mechanism provides an extra level of indirection and is the preferred method of making calls between 16-bit and 32-bit code segments.\nWhen executing an inter-privilege-level far call, the code segment for the procedure being called must be accessed through a call gate. The segment selector specified by the target operand identifies the call gate. 
The target operand can specify the call gate segment selector either directly with a pointer (ptr16:16 or ptr16:32) or indirectly with a memory location (m16:16 or m16:32). The processor obtains the segment selector for the new code segment and the new instruction pointer (offset) from the call gate descriptor. (The offset from the target operand is ignored when a call gate is used.)\nOn inter-privilege-level calls, the processor switches to the stack for the privilege level of the called procedure. The segment selector for the new stack segment is specified in the TSS for the currently running task. The branch to the new code segment occurs after the stack switch. (Note that when using a call gate to perform a far call to a segment at the same privilege level, no stack switch occurs.) On the new stack, the processor pushes the segment selector and stack pointer for the calling procedure’s stack, an optional set of parameters from the calling procedure’s stack, and the segment selector and instruction pointer for the calling procedure’s code segment. (A value in the call gate descriptor determines how many parameters to copy to the new stack.) Finally, the processor branches to the address of the procedure being called within the new code segment.\nExecuting a task switch with the CALL instruction is similar to executing a call through a call gate. The target operand specifies the segment selector of the task gate for the new task activated by the switch (the offset in the target operand is ignored). The task gate in turn points to the TSS for the new task, which contains the segment selectors for the task’s code and stack segments. Note that the TSS also contains the EIP value for the next instruction that was to be executed before the calling task was suspended. 
This instruction pointer value is loaded into the EIP register to re-start the calling task.\nThe CALL instruction can also specify the segment selector of the TSS directly, which eliminates the indirection of the task gate. See Chapter 7, “Task Management,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A, for information on the mechanics of a task switch.\nWhen you execute a task switch with a CALL instruction, the nested task flag (NT) is set in the EFLAGS register and the new TSS’s previous task link field is loaded with the old task’s TSS selector. Code is expected to suspend this nested task by executing an IRET instruction which, because the NT flag is set, automatically uses the previous task link to return to the calling task. (See “Task Linking” in Chapter 7 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A, for information on nested tasks.) Switching tasks with the CALL instruction differs in this regard from the JMP instruction. JMP does not set the NT flag and therefore does not expect an IRET instruction to suspend the task.\nMixing 16-Bit and 32-Bit Calls. When making far calls between 16-bit and 32-bit code segments, use a call gate. If the far call is from a 32-bit code segment to a 16-bit code segment, the call should be made from the first 64 KBytes of the 32-bit code segment. This is because the operand-size attribute of the instruction is set to 16, so only a 16-bit return address offset can be saved. Also, the call should be made using a 16-bit call gate so that 16-bit values can be pushed on the stack. See Chapter 21, “Mixing 16-Bit and 32-Bit Code,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3B, for more information.\nFar Calls in Compatibility Mode. 
When the processor is operating in compatibility mode, the CALL instruction can be used to perform the following types of far calls:\nNote that a CALL instruction cannot be used to cause a task switch in compatibility mode since task switches are not supported in IA-32e mode.\nIn compatibility mode, the processor always uses the segment selector part of the far address to access the corresponding descriptor in the GDT or LDT. The descriptor type (code segment, call gate) and access rights determine the type of call operation to be performed.\nIf the selected descriptor is for a code segment, a far call to a code segment at the same privilege level is performed. (If the selected code segment is at a different privilege level and the code segment is non-conforming, a general-protection exception is generated.) A far call to the same privilege level in compatibility mode is very similar to one carried out in protected mode. The target operand specifies an absolute far address either directly with a pointer (ptr16:16 or ptr16:32) or indirectly with a memory location (m16:16 or m16:32). The operand-size attribute determines the size of the offset (16 or 32 bits) in the far address. The new code segment selector and its descriptor are loaded into the CS register and the offset from the instruction is loaded into the EIP register. The difference is that 64-bit mode may be entered. This is specified by the L bit in the new code segment descriptor.\nNote that a 64-bit call gate (described in the next paragraph) can also be used to perform a far call to a code segment at the same privilege level. However, using this mechanism requires that the target code segment descriptor have the L bit set, causing an entry to 64-bit mode.\nWhen executing an inter-privilege-level far call, the code segment for the procedure being called must be accessed through a 64-bit call gate. The segment selector specified by the target operand identifies the call gate. 
The target operand can specify the call gate segment selector either directly with a pointer (ptr16:16 or ptr16:32) or indirectly with a memory location (m16:16 or m16:32). The processor obtains the segment selector for the new code segment and the new instruction pointer (offset) from the 16-byte call gate descriptor. (The offset from the target operand is ignored when a call gate is used.)\nOn inter-privilege-level calls, the processor switches to the stack for the privilege level of the called procedure. The segment selector for the new stack segment is set to NULL. The new stack pointer is specified in the TSS for the currently running task. The branch to the new code segment occurs after the stack switch. (Note that when using a call gate to perform a far call to a segment at the same privilege level, an implicit stack switch occurs as a result of entering 64-bit mode. The SS selector is unchanged, but stack segment accesses use a segment base of 0x0, the limit is ignored, and the default stack size is 64 bits. The full value of RSP is used for the offset, of which the upper 32 bits are undefined.) On the new stack, the processor pushes the segment selector and stack pointer for the calling procedure’s stack and the segment selector and instruction pointer for the calling procedure’s code segment. (Parameter copy is not supported in IA-32e mode.) Finally, the processor branches to the address of the procedure being called within the new code segment.\nNear/(Far) Calls in 64-bit Mode. When the processor is operating in 64-bit mode, the CALL instruction can be used to perform the following types of far calls:\nNote that the CALL instruction cannot be used to cause a task switch in 64-bit mode since task switches are not supported in IA-32e mode.\nIn 64-bit mode, the processor always uses the segment selector part of the far address to access the corresponding descriptor in the GDT or LDT. 
The descriptor type (code segment, call gate) and access rights determine the type of call operation to be performed.\nIf the selected descriptor is for a code segment, a far call to a code segment at the same privilege level is performed. (If the selected code segment is at a different privilege level and the code segment is non-conforming, a general-protection exception is generated.) A far call to the same privilege level in 64-bit mode is very similar to one carried out in compatibility mode. The target operand specifies an absolute far address indirectly with a memory location (m16:16, m16:32 or m16:64). The form of CALL with a direct specification of absolute far address is not defined in 64-bit mode. The operand-size attribute determines the size of the offset (16, 32, or 64 bits) in the far address. The new code segment selector and its descriptor are loaded into the CS register; the offset from the instruction is loaded into the EIP register. The new code segment may specify entry either into compatibility or 64-bit mode, based on the L bit value.\nA 64-bit call gate (described in the next paragraph) can also be used to perform a far call to a code segment at the same privilege level. However, using this mechanism requires that the target code segment descriptor have the L bit set.\nWhen executing an inter-privilege-level far call, the code segment for the procedure being called must be accessed through a 64-bit call gate. The segment selector specified by the target operand identifies the call gate. The target operand can only specify the call gate segment selector indirectly with a memory location (m16:16, m16:32 or m16:64). The processor obtains the segment selector for the new code segment and the new instruction pointer (offset) from the 16-byte call gate descriptor. 
(The offset from the target operand is ignored when a call gate is used.)\nOn inter-privilege-level calls, the processor switches to the stack for the privilege level of the called procedure. The segment selector for the new stack segment is set to NULL. The new stack pointer is specified in the TSS for the currently running task. The branch to the new code segment occurs after the stack switch.\nNote that when using a call gate to perform a far call to a segment at the same privilege level, an implicit stack switch occurs as a result of entering 64-bit mode. The SS selector is unchanged, but stack segment accesses use a segment base of 0x0, the limit is ignored, and the default stack size is 64 bits. (The full value of RSP is used for the offset.) On the new stack, the processor pushes the segment selector and stack pointer for the calling procedure’s stack and the segment selector and instruction pointer for the calling procedure’s code segment. (Parameter copy is not supported in IA-32e mode.) Finally, the processor branches to the address of the procedure being called within the new code segment.\n"
},
{
"Name": "CBW",
"Alias": [
"CWDE",
"CDQE"
],
"Brief": "Convert Byte to Word/Convert Word to Doubleword/Convert Doubleword to Quadword",
"Description": "\nDouble the size of the source operand by means of sign extension. The CBW (convert byte to word) instruction copies the sign (bit 7) in the source operand into every bit in the AH register. The CWDE (convert word to double-word) instruction copies the sign (bit 15) of the word in the AX register into the high 16 bits of the EAX register.\nCBW and CWDE reference the same opcode. The CBW instruction is intended for use when the operand-size attri-bute is 16; CWDE is intended for use when the operand-size attribute is 32. Some assemblers may force the operand size. Others may treat these two mnemonics as synonyms (CBW/CWDE) and use the setting of the operand-size attribute to determine the size of values to be converted.\nIn 64-bit mode, the default operation size is the size of the destination register. Use of the REX.W prefix promotes this instruction (CDQE when promoted) to operate on 64-bit operands. In which case, CDQE copies the sign (bit 31) of the doubleword in the EAX register into the high 32 bits of RAX.\n"
},
{
"Name": "CWD",
"Alias": [
"CDQ",
"CQO"
],
"Brief": "Convert Word to Doubleword/Convert Doubleword to Quadword",
"Description": "\nDoubles the size of the operand in register AX, EAX, or RAX (depending on the operand size) by means of sign extension and stores the result in registers DX:AX, EDX:EAX, or RDX:RAX, respectively. The CWD instruction copies the sign (bit 15) of the value in the AX register into every bit position in the DX register. The CDQ instruction copies the sign (bit 31) of the value in the EAX register into every bit position in the EDX register. The CQO instruc-tion (available in 64-bit mode only) copies the sign (bit 63) of the value in the RAX register into every bit position in the RDX register.\nThe CWD instruction can be used to produce a doubleword dividend from a word before word division. The CDQ instruction can be used to produce a quadword dividend from a doubleword before doubleword division. The CQO instruction can be used to produce a double quadword dividend from a quadword before a quadword division.\nThe CWD and CDQ mnemonics reference the same opcode. The CWD instruction is intended for use when the operand-size attribute is 16 and the CDQ instruction for when the operand-size attribute is 32. Some assemblers may force the operand size to 16 when CWD is used and to 32 when CDQ is used. Others may treat these mnemonics as synonyms (CWD/CDQ) and use the current setting of the operand-size attribute to determine the size of values to be converted, regardless of the mnemonic used.\nIn 64-bit mode, use of the REX.W prefix promotes operation to 64 bits. The CQO mnemonics reference the same opcode as CWD/CDQ. See the summary chart at the beginning of this section for encoding data and limits.\n"
},
{
"Name": "CLAC",
"Alias": [],
"Brief": "Clear AC Flag in EFLAGS Register",
"Description": "\nClears the AC flag bit in EFLAGS register. This disables any alignment checking of user-mode data accesses. If the SMAP bit is set in the CR4 register, this disallows explicit supervisor-mode data accesses to user-mode pages.\nThis instruction's operation is the same in non-64-bit modes and 64-bit mode. Attempts to execute CLAC when CPL > 0 cause #UD.\n"
},
{
"Name": "CLC",
"Alias": [],
"Brief": "Clear Carry Flag",
"Description": "\nClears the CF flag in the EFLAGS register. Operation is the same in all modes.\n"
},
{
"Name": "CLD",
"Alias": [],
"Brief": "Clear Direction Flag",
"Description": "\nClears the DF flag in the EFLAGS register. When the DF flag is set to 0, string operations increment the index regis-ters (ESI and/or EDI). Operation is the same in all modes.\n"
},
{
"Name": "CLFLUSH",
"Alias": [],
"Brief": "Flush Cache Line",
"Description": "\nInvalidates the cache line that contains the linear address specified with the source operand from all levels of the processor cache hierarchy (data and instruction). The invalidation is broadcast throughout the cache coherence domain. If, at any level of the cache hierarchy, the line is inconsistent with memory (dirty) it is written to memory before invalidation. The source operand is a byte memory location.\nThe availability of CLFLUSH is indicated by the presence of the CPUID feature flag CLFSH (bit 19 of the EDX register, see “CPUID—CPU Identification” in this chapter). The aligned cache line size affected is also indicated with the CPUID instruction (bits 8 through 15 of the EBX register when the initial value in the EAX register is 1).\nThe memory attribute of the page containing the affected line has no effect on the behavior of this instruction. It should be noted that processors are free to speculatively fetch and cache data from system memory regions assigned a memory-type allowing for speculative reads (such as, the WB, WC, and WT memory types). PREFETCHh instructions can be used to provide the processor with hints for this speculative behavior. Because this speculative fetching can occur at any time and is not tied to instruction execution, the CLFLUSH instruction is not ordered with respect to PREFETCHh instructions or any of the speculative fetching mechanisms (that is, data can be specula-tively loaded into a cache line just before, during, or after the execution of a CLFLUSH instruction that references the cache line).\nCLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed to be ordered by any other fencing or seri-alizing instructions or by another CLFLUSH instruction. 
For example, software can use an MFENCE instruction to ensure that previous stores are included in the write-back.\nThe CLFLUSH instruction can be used at all privilege levels and is subject to all permission checking and faults asso-ciated with a byte load (and in addition, a CLFLUSH instruction is allowed to flush a linear address in an execute-only segment). Like a load, the CLFLUSH instruction sets the A bit but not the D bit in the page tables.\nThe CLFLUSH instruction was introduced with the SSE2 extensions; however, because it has its own CPUID feature flag, it can be implemented in IA-32 processors that do not include the SSE2 extensions. Also, detecting the pres-ence of the SSE2 extensions with the CPUID instruction does not guarantee that the CLFLUSH instruction is imple-mented in the processor.\nCLFLUSH operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "CLI",
"Alias": [],
"Brief": "Clear Interrupt Flag",
"Description": "\nIf protected-mode virtual interrupts are not enabled, CLI clears the IF flag in the EFLAGS register. No other flags are affected. Clearing the IF flag causes the processor to ignore maskable external interrupts. The IF flag and the CLI and STI instruction have no affect on the generation of exceptions and NMI interrupts.\nWhen protected-mode virtual interrupts are enabled, CPL is 3, and IOPL is less than 3; CLI clears the VIF flag in the EFLAGS register, leaving IF unaffected. Table 3-6 indicates the action of the CLI instruction depending on the processor operating mode and the CPL/IOPL of the running program or procedure.\nOperation is the same in all modes.\nTable 3-6. Decision Table for CLI Results\n\n\nPE\nVM\nIOPL\nCPL\nPVI\nVIP\nVME\nCLI Result\n\n0\nX\nX\nX\nX\nX\nX\nIF = 0\n\n1\n0\n≥ CPL\nX\nX\nX\nX\nIF = 0\n\n1\n0\n< CPL\n3\n1\nX\nX\nVIF = 0\n\n1\n0\n< CPL\n< 3\nX\nX\nX\nGP Fault\n\n1\n0\n< CPL\nX\n0\nX\nX\nGP Fault\n\n1\n1\n3\nX\nX\nX\nX\nIF = 0\n\n1\n1\n< 3\nX\nX\nX\n1\nVIF = 0\n\n1\n1\n< 3\nX\nX\nX\n0\nGP Fault\nNOTES:\n* X = This setting has no impact.\n"
},
{
"Name": "CLTS",
"Alias": [],
"Brief": "Clear Task",
"Description": "\nClears the task-switched (TS) flag in the CR0 register. This instruction is intended for use in operating-system procedures. It is a privileged instruction that can only be executed at a CPL of 0. It is allowed to be executed in real-address mode to allow initialization for protected mode.\nThe processor sets the TS flag every time a task switch occurs. The flag is used to synchronize the saving of FPU context in multitasking applications. See the description of the TS flag in the section titled “Control Registers” in Chapter 2 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A, for more information about this flag.\nCLTS operation is the same in non-64-bit modes and 64-bit mode.\nSee Chapter 25, “VMX Non-Root Operation,” of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3C, for more information about the behavior of this instruction in VMX non-root operation.\n"
},
{
"Name": "CMC",
"Alias": [],
"Brief": "Complement Carry Flag",
"Description": "\nComplements the CF flag in the EFLAGS register. CMC operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "CMOVcc",
"Alias": [
"CMOVA", "CMOVAE", "CMOVB", "CMOVBE", "CMOVC", "CMOVE", "CMOVG", "CMOVGE", "CMOVL", "CMOVLE", "CMOVNA", "CMOVNAE", "CMOVNB", "CMOVNBE", "CMOVNC", "CMOVNE", "CMOVNG", "CMOVNGE", "CMOVNL",
"CMOVNLE", "CMOVNO", "CMOVNP", "CMOVNS", "CMOVNZ", "CMOVO", "CMOVP", "CMOVPE", "CMOVPO", "CMOVS", "CMOVZ"
],
"Brief": "Conditional Move",
"Description": "\nThe CMOVcc instructions check the state of one or more of the status flags in the EFLAGS register (CF, OF, PF, SF, and ZF) and perform a move operation if the flags are in a specified state (or condition). A condition code (cc) is associated with each instruction to indicate the condition being tested for. If the condition is not satisfied, a move is not performed and execution continues with the instruction following the CMOVcc instruction.\nThese instructions can move 16-bit, 32-bit or 64-bit values from memory to a general-purpose register or from one general-purpose register to another. Conditional moves of 8-bit register operands are not supported.\nThe condition for each CMOVcc mnemonic is given in the description column of the above table. The terms “less” and “greater” are used for comparisons of signed integers and the terms “above” and “below” are used for unsigned integers.\nBecause a particular state of the status flags can sometimes be interpreted in two ways, two mnemonics are defined for some opcodes. For example, the CMOVA (conditional move if above) instruction and the CMOVNBE (conditional move if not below or equal) instruction are alternate mnemonics for the opcode 0F 47H.\nThe CMOVcc instructions were introduced in P6 family processors; however, these instructions may not be supported by all IA-32 processors. Software can determine if the CMOVcc instructions are supported by checking the processor’s feature information with the CPUID instruction (see “CPUID—CPU Identification” in this chapter).\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Use of the REX.R prefix permits access to addi-tional registers (R8-R15). Use of the REX.W prefix promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.\n"
},
{
"Name": "CMP",
"Alias": [],
"Brief": "Compare Two Operands",
"Description": "\nCompares the first source operand with the second source operand and sets the status flags in the EFLAGS register according to the results. The comparison is performed by subtracting the second operand from the first operand and then setting the status flags in the same manner as the SUB instruction. When an immediate value is used as an operand, it is sign-extended to the length of the first operand.\nThe condition codes used by the Jcc, CMOVcc, and SETcc instructions are based on the results of a CMP instruction. Appendix B, “EFLAGS Condition Codes,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, shows the relationship of the status flags and the condition codes.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Use of the REX.R prefix permits access to addi-tional registers (R8-R15). Use of the REX.W prefix promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.\n"
},
{
"Name": "CMPPD",
"Alias": [],
"Brief": "Compare Packed Double",
"Description": "\nPerforms a SIMD compare of the packed double-precision floating-point values in the source operand (second operand) and the destination operand (first operand) and returns the results of the comparison to the destination operand. The comparison predicate operand (third operand) specifies the type of comparison performed on each of the pairs of packed values. The result of each comparison is a quadword mask of all 1s (comparison true) or all 0s (comparison false). The sign of zero is ignored for comparisons, so that –0.0 is equal to +0.0.\n128-bit Legacy SSE version: The first source and destination operand (first operand) is an XMM register. The second source operand (second operand) can be an XMM register or 128-bit memory location. The comparison predicate operand is an 8-bit immediate, bits 2:0 of the immediate define the type of comparison to be performed (see Table 3-7). Bits 7:3 of the immediate is reserved. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged. Two comparisons are performed with results written to bits 127:0 of the destination operand.\nTable 3-7. Comparison Predicate for CMPPD and CMPPS Instructions\n\n\nPredi-cate\nimm8 Encoding\nDescription\nRelation where: A Is 1st Operand B Is 2nd Operand\nEmulation\nResult if NaN Operand\nQNaN Oper-and Signals Invalid\n\nEQ\n000B\nEqual\nA = B\n\nFalse\nNo\n\nLT\n001B\nLess-than\nA < B\n\nFalse\nYes\n\nLE\n010B\nLess-than-or-equal\nA ≤ B\n\nFalse\nYes\n\n\n\nGreater than\nA > B\nSwap Operands, Use LT\nFalse\nYes\n\n\n\nGreater-than-or-equal\nA ≥ B\nSwap Operands, Use LE\nFalse\nYes\n\nUNORD\n011B\nUnordered\nA, B = Unordered\n\nTrue\nNo\n\nNEQ\n100B\nNot-equal\nA ≠ B\n\nTrue\nNo\n\nNLT\n101B\nNot-less-than\nNOT(A < B)\n\nTrue\nYes\nTable 3-7. 
Comparison Predicate for CMPPD and CMPPS Instructions (Contd.)\n\n\nPredicate\nimm8 Encoding\nDescription\nRelation where: A Is 1st Operand, B Is 2nd Operand\nEmulation\nResult if NaN Operand\nQNaN Operand Signals Invalid\n\nNLE\n110B\nNot-less-than-or-equal\nNOT(A ≤ B)\n\nTrue\nYes\n\n\n\nNot-greater-than\nNOT(A > B)\nSwap Operands, Use NLT\nTrue\nYes\n\n\n\nNot-greater-than-or-equal\nNOT(A ≥ B)\nSwap Operands, Use NLE\nTrue\nYes\n\nORD\n111B\nOrdered\nA , B = Ordered\n\nFalse\nNo\nThe unordered relationship is true when at least one of the two source operands being compared is a NaN; the ordered relationship is true when neither source operand is a NaN.\nA subsequent computational instruction that uses the mask result in the destination operand as an input operand will not generate an exception, because a mask of all 0s corresponds to a floating-point value of +0.0 and a mask of all 1s corresponds to a QNaN.\nNote that processors with “CPUID.1H:ECX.AVX =0” do not implement the greater-than, greater-than-or-equal, not-greater-than, and not-greater-than-or-equal relations. These comparisons can be made either by using the inverse relationship (that is, use the “not-less-than-or-equal” to make a “greater-than” comparison) or by using software emulation. When using software emulation, the program must swap the operands (copying registers when necessary to protect the data that will now be in the destination), and then perform the compare using a different predicate. The predicate to be used for these emulations is listed in Table 3-7 under the heading Emulation.\nCompilers and assemblers may implement the following two-operand pseudo-ops in addition to the three-operand CMPPD instruction, for processors with “CPUID.1H:ECX.AVX =0”. See Table 3-8. Compilers should treat reserved Imm8 values as illegal syntax.\nTable 3-8. 
Pseudo-Op and CMPPD Implementation\n\n\nPseudo-Op\nCMPPD Implementation\n\nCMPEQPD xmm1, xmm2\nCMPPD xmm1, xmm2, 0\n\nCMPLTPD xmm1, xmm2\nCMPPD xmm1, xmm2, 1\n\nCMPLEPD xmm1, xmm2\nCMPPD xmm1, xmm2, 2\n\nCMPUNORDPD xmm1, xmm2\nCMPPD xmm1, xmm2, 3\n\nCMPNEQPD xmm1, xmm2\nCMPPD xmm1, xmm2, 4\n\nCMPNLTPD xmm1, xmm2\nCMPPD xmm1, xmm2, 5\n\nCMPNLEPD xmm1, xmm2\nCMPPD xmm1, xmm2, 6\n\nCMPORDPD xmm1, xmm2\nCMPPD xmm1, xmm2, 7\nThe greater-than relations that the processor does not implement require more than one instruction to emulate in software and therefore should not be implemented as pseudo-ops. (For these, the programmer should reverse the operands of the corresponding less-than relations and use move instructions to ensure that the mask is moved to the correct destination register and that the source operand is left intact.)\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\nEnhanced Comparison Predicate for VEX-Encoded VCMPPD\nVEX.128 encoded version: The first source operand (second operand) is an XMM register. The second source operand (third operand) can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed. Two comparisons are performed with results written to bits 127:0 of the destination operand.\nVEX.256 encoded version: The first source operand (second operand) is a YMM register. The second source operand (third operand) can be a YMM register or a 256-bit memory location. The destination operand (first operand) is a YMM register. Four comparisons are performed with results written to the destination operand.\nThe comparison predicate operand is an 8-bit immediate:\nTable 3-9. 
Comparison Predicate for VCMPPD and VCMPPS Instructions\n\n\nPredicate\nimm8 Value\nDescription\nResult if A > B (A Is 1st Operand, B Is 2nd Operand)\nResult if A < B\nResult if A = B\nResult if Unordered1\nSignals #IA on QNAN\n\nEQ_OQ (EQ)\n0H\nEqual (ordered, non-signaling)\nFalse\nFalse\nTrue\nFalse\nNo\n\nLT_OS (LT)\n1H\nLess-than (ordered, signaling)\nFalse\nTrue\nFalse\nFalse\nYes\n\nLE_OS (LE)\n2H\nLess-than-or-equal (ordered, signaling)\nFalse\nTrue\nTrue\nFalse\nYes\n\nUNORD_Q (UNORD)\n3H\nUnordered (non-signaling)\nFalse\nFalse\nFalse\nTrue\nNo\n\nNEQ_UQ (NEQ)\n4H\nNot-equal (unordered, non-signaling)\nTrue\nTrue\nFalse\nTrue\nNo\n\nNLT_US (NLT)\n5H\nNot-less-than (unordered, signaling)\nTrue\nFalse\nTrue\nTrue\nYes\n\nNLE_US (NLE)\n6H\nNot-less-than-or-equal (unordered, signaling)\nTrue\nFalse\nFalse\nTrue\nYes\n\nORD_Q (ORD)\n7H\nOrdered (non-signaling)\nTrue\nTrue\nTrue\nFalse\nNo\n\nEQ_UQ\n8H\nEqual (unordered, non-signaling)\nFalse\nFalse\nTrue\nTrue\nNo\n\nNGE_US (NGE)\n9H\nNot-greater-than-or-equal (unordered, signaling)\nFalse\nTrue\nFalse\nTrue\nYes\n\nNGT_US (NGT)\nAH\nNot-greater-than (unordered, signaling)\nFalse\nTrue\nTrue\nTrue\nYes\n\nFALSE_OQ (FALSE)\nBH\nFalse (ordered, non-signaling)\nFalse\nFalse\nFalse\nFalse\nNo\n\nNEQ_OQ\nCH\nNot-equal (ordered, 
non-signaling)\nTrue\nTrue\nFalse\nFalse\nNo\n\nGE_OS (GE)\nDH\nGreater-than-or-equal (ordered, signaling)\nTrue\nFalse\nTrue\nFalse\nYes\n\nGT_OS (GT)\nEH\nGreater-than (ordered, signaling)\nTrue\nFalse\nFalse\nFalse\nYes\n\nTRUE_UQ (TRUE)\nFH\nTrue (unordered, non-signaling)\nTrue\nTrue\nTrue\nTrue\nNo\n\nEQ_OS\n10H\nEqual (ordered, signaling)\nFalse\nFalse\nTrue\nFalse\nYes\n\nLT_OQ\n11H\nLess-than (ordered, nonsignaling)\nFalse\nTrue\nFalse\nFalse\nNo\n\nLE_OQ\n12H\nLess-than-or-equal (ordered, non-signaling)\nFalse\nTrue\nTrue\nFalse\nNo\n\nUNORD_S\n13H\nUnordered (signaling)\nFalse\nFalse\nFalse\nTrue\nYes\n\nNEQ_US\n14H\nNot-equal (unordered, signaling)\nTrue\nTrue\nFalse\nTrue\nYes\n\nNLT_UQ\n15H\nNot-less-than (unordered, nonsignaling)\nTrue\nFalse\nTrue\nTrue\nNo\n\nNLE_UQ\n16H\nNot-less-than-or-equal (unordered, nonsignaling)\nTrue\nFalse\nFalse\nTrue\nNo\n\nORD_S\n17H\nOrdered (signaling)\nTrue\nTrue\nTrue\nFalse\nYes\n\nEQ_US\n18H\nEqual (unordered, signaling)\nFalse\nFalse\nTrue\nTrue\nYes\nTable 3-9. 
Comparison Predicate for VCMPPD and VCMPPS Instructions (Contd.)\n\n\nPredicate\nimm8 Value\nDescription\nResult if A > B (A Is 1st Operand, B Is 2nd Operand)\nResult if A < B\nResult if A = B\nResult if Unordered1\nSignals #IA on QNAN\n\nNGE_UQ\n19H\nNot-greater-than-or-equal (unordered, nonsignaling)\nFalse\nTrue\nFalse\nTrue\nNo\n\nNGT_UQ\n1AH\nNot-greater-than (unordered, non-signaling)\nFalse\nTrue\nTrue\nTrue\nNo\n\nFALSE_OS\n1BH\nFalse (ordered, signaling)\nFalse\nFalse\nFalse\nFalse\nYes\n\nNEQ_OS\n1CH\nNot-equal (ordered, signaling)\nTrue\nTrue\nFalse\nFalse\nYes\n\nGE_OQ\n1DH\nGreater-than-or-equal (ordered, nonsignaling)\nTrue\nFalse\nTrue\nFalse\nNo\n\nGT_OQ\n1EH\nGreater-than (ordered, nonsignaling)\nTrue\nFalse\nFalse\nFalse\nNo\n\nTRUE_US\n1FH\nTrue (unordered, signaling)\nTrue\nTrue\nTrue\nTrue\nYes\nNOTES:\n1. If either operand A or B is a NAN.\nProcessors with “CPUID.1H:ECX.AVX =1” implement the full complement of 32 predicates shown in Table 3-9; software emulation is no longer needed. Compilers and assemblers may implement the following three-operand pseudo-ops in addition to the four-operand VCMPPD instruction. See Table 3-10, where the notations reg1, reg2, and reg3 represent either XMM registers or YMM registers. Compilers should treat reserved Imm8 values as illegal syntax. 
Alternately, intrinsics can map the pseudo-ops to pre-defined constants to support a simpler intrinsic interface.\nTable 3-10. Pseudo-Op and VCMPPD Implementation\n\n\nPseudo-Op\nVCMPPD Implementation\n\nVCMPEQPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 0\n\nVCMPLTPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 1\n\nVCMPLEPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 2\n\nVCMPUNORDPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 3\n\nVCMPNEQPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 4\n\nVCMPNLTPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 5\n\nVCMPNLEPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 6\n\nVCMPORDPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 7\n\nVCMPEQ_UQPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 8\n\nVCMPNGEPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 9\n\nVCMPNGTPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 0AH\n\nVCMPFALSEPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 0BH\n\nVCMPNEQ_OQPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 0CH\n\nVCMPGEPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 0DH\n\nVCMPGTPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 0EH\n\nVCMPTRUEPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 0FH\n\nVCMPEQ_OSPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 10H\n\nVCMPLT_OQPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 11H\n\nVCMPLE_OQPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 12H\nTable 3-10. 
Pseudo-Op and VCMPPD Implementation (Contd.)\n\n\nPseudo-Op\nVCMPPD Implementation\n\nVCMPUNORD_SPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 13H\n\nVCMPNEQ_USPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 14H\n\nVCMPNLT_UQPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 15H\n\nVCMPNLE_UQPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 16H\n\nVCMPORD_SPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 17H\n\nVCMPEQ_USPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 18H\n\nVCMPNGE_UQPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 19H\n\nVCMPNGT_UQPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 1AH\n\nVCMPFALSE_OSPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 1BH\n\nVCMPNEQ_OSPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 1CH\n\nVCMPGE_OQPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 1DH\n\nVCMPGT_OQPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 1EH\n\nVCMPTRUE_USPD reg1, reg2, reg3\nVCMPPD reg1, reg2, reg3, 1FH\n"
},
{
"Name": "CMPPS",
"Alias": [],
"Brief": "Compare Packed Single",
"Description": "\nPerforms a SIMD compare of the packed single-precision floating-point values in the source operand (second operand) and the destination operand (first operand) and returns the results of the comparison to the destination operand. The comparison predicate operand (third operand) specifies the type of comparison performed on each of the pairs of packed values. The result of each comparison is a doubleword mask of all 1s (comparison true) or all 0s (comparison false). The sign of zero is ignored for comparisons, so that –0.0 is equal to +0.0.\n128-bit Legacy SSE version: The first source and destination operand (first operand) is an XMM register. The second source operand (second operand) can be an XMM register or 128-bit memory location. The comparison predicate operand is an 8-bit immediate, bits 2:0 of the immediate define the type of comparison to be performed (see Table 3-7). Bits 7:3 of the immediate is reserved. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged. Four comparisons are performed with results written to bits 127:0 of the destination operand.\nThe unordered relationship is true when at least one of the two source operands being compared is a NaN; the ordered relationship is true when neither source operand is a NaN.\nA subsequent computational instruction that uses the mask result in the destination operand as an input operand will not generate a fault, because a mask of all 0s corresponds to a floating-point value of +0.0 and a mask of all 1s corresponds to a QNaN.\nNote that processors with “CPUID.1H:ECX.AVX =0” do not implement the “greater-than”, “greater-than-or-equal”, “not-greater than”, and “not-greater-than-or-equal relations” predicates. These comparisons can be made either by using the inverse relationship (that is, use the “not-less-than-or-equal” to make a “greater-than” comparison) or by using software emulation. 
When using software emulation, the program must swap the operands (copying registers when necessary to protect the data that will now be in the destination), and then perform the compare using a different predicate. The predicate to be used for these emulations is listed in Table 3-7 under the heading Emulation.\nCompilers and assemblers may implement the following two-operand pseudo-ops in addition to the three-operand CMPPS instruction, for processors with “CPUID.1H:ECX.AVX = 0”. See Table 3-11. Compilers should treat reserved Imm8 values as illegal syntax.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\nTable 3-11. Pseudo-Ops and CMPPS\n\n\nPseudo-Op\nImplementation\n\nCMPEQPS xmm1, xmm2\nCMPPS xmm1, xmm2, 0\n\nCMPLTPS xmm1, xmm2\nCMPPS xmm1, xmm2, 1\n\nCMPLEPS xmm1, xmm2\nCMPPS xmm1, xmm2, 2\n\nCMPUNORDPS xmm1, xmm2\nCMPPS xmm1, xmm2, 3\n\nCMPNEQPS xmm1, xmm2\nCMPPS xmm1, xmm2, 4\n\nCMPNLTPS xmm1, xmm2\nCMPPS xmm1, xmm2, 5\n\nCMPNLEPS xmm1, xmm2\nCMPPS xmm1, xmm2, 6\n\nCMPORDPS xmm1, xmm2\nCMPPS xmm1, xmm2, 7\nThe greater-than relations not implemented by the processor require more than one instruction to emulate in software and therefore should not be implemented as pseudo-ops. (For these, the programmer should reverse the operands of the corresponding less-than relations and use move instructions to ensure that the mask is moved to the correct destination register and that the source operand is left intact.)\nEnhanced Comparison Predicate for VEX-Encoded VCMPPS\nVEX.128 encoded version: The first source operand (second operand) is an XMM register. The second source operand (third operand) can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed. Four comparisons are performed with results written to bits 127:0 of the destination operand.\nVEX.256 encoded version: The first source operand (second operand) is a YMM register. 
The second source operand (third operand) can be a YMM register or a 256-bit memory location. The destination operand (first operand) is a YMM register. Eight comparisons are performed with results written to the destination operand.\nThe comparison predicate operand is an 8-bit immediate:\nProcessors with “CPUID.1H:ECX.AVX = 1” implement the full complement of 32 predicates shown in Table 3-9; software emulation is no longer needed. Compilers and assemblers may implement the following three-operand pseudo-ops in addition to the four-operand VCMPPS instruction. See Table 3-12, where the notations reg1, reg2, and reg3 represent either XMM registers or YMM registers. Compilers should treat reserved Imm8 values as illegal syntax. Alternatively, intrinsics can map the pseudo-ops to pre-defined constants to support a simpler intrinsic interface.\nTable 3-12. Pseudo-Op and VCMPPS Implementation\n\n\nPseudo-Op\nCMPPS Implementation\n\nVCMPEQPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 0\n\nVCMPLTPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 1\n\nVCMPLEPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 2\n\nVCMPUNORDPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 3\n\nVCMPNEQPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 4\n\nVCMPNLTPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 5\n\nVCMPNLEPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 6\n\nVCMPORDPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 7\n\nVCMPEQ_UQPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 8\n\nVCMPNGEPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 9\n\nVCMPNGTPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 0AH\n\nVCMPFALSEPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 0BH\nTable 3-12. 
Pseudo-Op and VCMPPS Implementation\n\n\nPseudo-Op\nCMPPS Implementation\n\nVCMPNEQ_OQPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 0CH\n\nVCMPGEPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 0DH\n\nVCMPGTPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 0EH\n\nVCMPTRUEPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 0FH\n\nVCMPEQ_OSPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 10H\n\nVCMPLT_OQPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 11H\n\nVCMPLE_OQPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 12H\n\nVCMPUNORD_SPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 13H\n\nVCMPNEQ_USPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 14H\n\nVCMPNLT_UQPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 15H\n\nVCMPNLE_UQPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 16H\n\nVCMPORD_SPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 17H\n\nVCMPEQ_USPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 18H\n\nVCMPNGE_UQPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 19H\n\nVCMPNGT_UQPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 1AH\n\nVCMPFALSE_OSPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 1BH\n\nVCMPNEQ_OSPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 1CH\n\nVCMPGE_OQPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 1DH\n\nVCMPGT_OQPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 1EH\n\nVCMPTRUE_USPS reg1, reg2, reg3\nVCMPPS reg1, reg2, reg3, 1FH\n"
},
{
"Name": "CMPS",
"Alias": [
"CMPSB",
"CMPSW",
"CMPSD",
"CMPSQ"
],
"Brief": "Compare String Operands",
"Description": "\nCompares the byte, word, doubleword, or quadword specified with the first source operand with the byte, word, doubleword, or quadword specified with the second source operand and sets the status flags in the EFLAGS register according to the results.\nBoth source operands are located in memory. The address of the first source operand is read from DS:SI, DS:ESI or RSI (depending on whether the address-size attribute of the instruction is 16, 32, or 64, respectively). The address of the second source operand is read from ES:DI, ES:EDI or RDI (again depending on whether the address-size attribute of the instruction is 16, 32, or 64). The DS segment may be overridden with a segment override prefix, but the ES segment cannot be overridden.\nAt the assembly-code level, two forms of this instruction are allowed: the “explicit-operands” form and the “no-operands” form. The explicit-operands form (specified with the CMPS mnemonic) allows the two source operands to be specified explicitly. Here, the source operands should be symbols that indicate the size and location of the source values. This explicit-operands form is provided to allow documentation. However, note that the documentation provided by this form can be misleading. That is, the source operand symbols must specify the correct type (size) of the operands (bytes, words, doublewords, or quadwords), but they do not have to specify the correct location. Locations of the source operands are always specified by the DS:(E)SI (or RSI) and ES:(E)DI (or RDI) registers, which must be loaded correctly before the compare string instruction is executed.\nThe no-operands form provides “short forms” of the byte, word, and doubleword versions of the CMPS instructions. Here also the DS:(E)SI (or RSI) and ES:(E)DI (or RDI) registers are assumed by the processor to specify the location of the source operands. 
The size of the source operands is selected with the mnemonic: CMPSB (byte comparison), CMPSW (word comparison), CMPSD (doubleword comparison), or CMPSQ (quadword comparison using REX.W).\nAfter the comparison, the (E/R)SI and (E/R)DI registers increment or decrement automatically according to the setting of the DF flag in the EFLAGS register. (If the DF flag is 0, the (E/R)SI and (E/R)DI registers increment; if the DF flag is 1, the registers decrement.) The registers increment or decrement by 1 for byte operations, by 2 for word operations, and by 4 for doubleword operations. If the operand size is 64 bits, the RSI and RDI registers increment or decrement by 8 for quadword operations.\nThe CMPS, CMPSB, CMPSW, CMPSD, and CMPSQ instructions can be preceded by the REP prefix for block comparisons. More often, however, these instructions will be used in a LOOP construct that takes some action based on the setting of the status flags before the next comparison is made. See “REP/REPE/REPZ/REPNE/REPNZ—Repeat String Operation Prefix” in Chapter 4 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2B, for a description of the REP prefix.\nIn 64-bit mode, the instruction’s default address size is 64 bits; a 32-bit address size is supported using the prefix 67H. Use of the REX.W prefix promotes doubleword operation to 64 bits (see CMPSQ). See the summary chart at the beginning of this section for encoding data and limits.\n"
},
{
"Name": "CMPSS",
"Alias": [],
"Brief": "Compare Scalar Single",
"Description": "\nCompares the low single-precision floating-point values in the source operand (second operand) and the destination operand (first operand) and returns the results of the comparison to the destination operand. The comparison predicate operand (third operand) specifies the type of comparison performed. The comparison result is a doubleword mask of all 1s (comparison true) or all 0s (comparison false). The sign of zero is ignored for comparisons, so that –0.0 is equal to +0.0.\n128-bit Legacy SSE version: The first source and destination operand (first operand) is an XMM register. The second source operand (second operand) can be an XMM register or 32-bit memory location. The comparison predicate operand is an 8-bit immediate; bits 2:0 of the immediate define the type of comparison to be performed (see Table 3-7). Bits 7:3 of the immediate are reserved. Bits (VLMAX-1:32) of the corresponding YMM destination register remain unchanged.\nThe unordered relationship is true when at least one of the two source operands being compared is a NaN; the ordered relationship is true when neither source operand is a NaN.\nA subsequent computational instruction that uses the mask result in the destination operand as an input operand will not generate a fault, since a mask of all 0s corresponds to a floating-point value of +0.0 and a mask of all 1s corresponds to a QNaN.\nNote that processors with “CPUID.1H:ECX.AVX = 0” do not implement the “greater-than”, “greater-than-or-equal”, “not-greater-than”, and “not-greater-than-or-equal” relation predicates. These comparisons can be made either by using the inverse relationship (that is, use the “not-less-than-or-equal” to make a “greater-than” comparison) or by using software emulation. When using software emulation, the program must swap the operands (copying registers when necessary to protect the data that will now be in the destination operand), and then perform the compare using a different predicate. 
The predicate to be used for these emulations is listed in Table 3-7 under the heading Emulation.\nCompilers and assemblers may implement the following two-operand pseudo-ops in addition to the three-operand CMPSS instruction, for processors with “CPUID.1H:ECX.AVX = 0”. See Table 3-15. Compilers should treat reserved Imm8 values as illegal syntax.\nTable 3-15. Pseudo-Ops and CMPSS\n\n\nPseudo-Op\nCMPSS Implementation\n\nCMPEQSS xmm1, xmm2\nCMPSS xmm1, xmm2, 0\n\nCMPLTSS xmm1, xmm2\nCMPSS xmm1, xmm2, 1\n\nCMPLESS xmm1, xmm2\nCMPSS xmm1, xmm2, 2\n\nCMPUNORDSS xmm1, xmm2\nCMPSS xmm1, xmm2, 3\n\nCMPNEQSS xmm1, xmm2\nCMPSS xmm1, xmm2, 4\n\nCMPNLTSS xmm1, xmm2\nCMPSS xmm1, xmm2, 5\n\nCMPNLESS xmm1, xmm2\nCMPSS xmm1, xmm2, 6\n\nCMPORDSS xmm1, xmm2\nCMPSS xmm1, xmm2, 7\nThe greater-than relations not implemented in the processor require more than one instruction to emulate in software and therefore should not be implemented as pseudo-ops. (For these, the programmer should reverse the operands of the corresponding less-than relations and use move instructions to ensure that the mask is moved to the correct destination register and that the source operand is left intact.)\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\nEnhanced Comparison Predicate for VEX-Encoded VCMPSS\nVEX.128 encoded version: The first source operand (second operand) is an XMM register. The second source operand (third operand) can be an XMM register or a 32-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed. The comparison predicate operand is an 8-bit immediate:\nProcessors with “CPUID.1H:ECX.AVX = 1” implement the full complement of 32 predicates shown in Table 3-9; software emulation is no longer needed. Compilers and assemblers may implement the following three-operand pseudo-ops in addition to the four-operand VCMPSS instruction. 
See Table 3-16, where the notations reg1, reg2, and reg3 represent either XMM registers or YMM registers. Compilers should treat reserved Imm8 values as illegal syntax. Alternatively, intrinsics can map the pseudo-ops to pre-defined constants to support a simpler intrinsic interface.\nTable 3-16. Pseudo-Op and VCMPSS Implementation\n\n\nPseudo-Op\nCMPSS Implementation\n\nVCMPEQSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 0\n\nVCMPLTSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 1\n\nVCMPLESS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 2\n\nVCMPUNORDSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 3\n\nVCMPNEQSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 4\n\nVCMPNLTSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 5\n\nVCMPNLESS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 6\n\nVCMPORDSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 7\n\nVCMPEQ_UQSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 8\n\nVCMPNGESS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 9\n\nVCMPNGTSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 0AH\n\nVCMPFALSESS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 0BH\n\nVCMPNEQ_OQSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 0CH\n\nVCMPGESS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 0DH\n\nVCMPGTSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 0EH\nTable 3-16. 
Pseudo-Op and VCMPSS Implementation (Contd.)\n\n\nPseudo-Op\nCMPSS Implementation\n\nVCMPTRUESS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 0FH\n\nVCMPEQ_OSSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 10H\n\nVCMPLT_OQSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 11H\n\nVCMPLE_OQSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 12H\n\nVCMPUNORD_SSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 13H\n\nVCMPNEQ_USSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 14H\n\nVCMPNLT_UQSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 15H\n\nVCMPNLE_UQSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 16H\n\nVCMPORD_SSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 17H\n\nVCMPEQ_USSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 18H\n\nVCMPNGE_UQSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 19H\n\nVCMPNGT_UQSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 1AH\n\nVCMPFALSE_OSSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 1BH\n\nVCMPNEQ_OSSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 1CH\n\nVCMPGE_OQSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 1DH\n\nVCMPGT_OQSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 1EH\n\nVCMPTRUE_USSS reg1, reg2, reg3\nVCMPSS reg1, reg2, reg3, 1FH\n"
},
{
"Name": "CMPXCHG",
"Alias": [],
"Brief": "Compare and Exchange",
"Description": "\nCompares the value in the AL, AX, EAX, or RAX register with the first operand (destination operand). If the two values are equal, the second operand (source operand) is loaded into the destination operand. Otherwise, the destination operand is loaded into the AL, AX, EAX or RAX register. The RAX register is available only in 64-bit mode.\nThis instruction can be used with a LOCK prefix to allow the instruction to be executed atomically. To simplify the interface to the processor’s bus, the destination operand receives a write cycle without regard to the result of the comparison. The destination operand is written back if the comparison fails; otherwise, the source operand is written into the destination. (The processor never produces a locked read without also producing a locked write.)\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Use of the REX.R prefix permits access to additional registers (R8-R15). Use of the REX.W prefix promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.\n"
},
{
"Name": "CMPXCHG8B",
"Alias": [
"CMPXCHG16B"
],
"Brief": "Compare and Exchange Bytes",
"Description": "\nCompares the 64-bit value in EDX:EAX (or 128-bit value in RDX:RAX if operand size is 128 bits) with the operand (destination operand). If the values are equal, the 64-bit value in ECX:EBX (or 128-bit value in RCX:RBX) is stored in the destination operand. Otherwise, the value in the destination operand is loaded into EDX:EAX (or RDX:RAX). The destination operand is an 8-byte memory location (or 16-byte memory location if operand size is 128 bits). For the EDX:EAX and ECX:EBX register pairs, EDX and ECX contain the high-order 32 bits and EAX and EBX contain the low-order 32 bits of a 64-bit value. For the RDX:RAX and RCX:RBX register pairs, RDX and RCX contain the high-order 64 bits and RAX and RBX contain the low-order 64 bits of a 128-bit value.\nThis instruction can be used with a LOCK prefix to allow the instruction to be executed atomically. To simplify the interface to the processor’s bus, the destination operand receives a write cycle without regard to the result of the comparison. The destination operand is written back if the comparison fails; otherwise, the source operand is written into the destination. (The processor never produces a locked read without also producing a locked write.)\nIn 64-bit mode, the default operation size is 64 bits. Use of the REX.W prefix promotes operation to 128 bits. Note that CMPXCHG16B requires that the destination (memory) operand be 16-byte aligned. See the summary chart at the beginning of this section for encoding data and limits. For information on the CPUID flag that indicates CMPXCHG16B, see page 3-175.\n"
},
{
"Name": "COMISD",
"Alias": [],
"Brief": "Compare Scalar Ordered Double",
"Description": "\nCompares the double-precision floating-point values in the low quadwords of operand 1 (first operand) and operand 2 (second operand), and sets the ZF, PF, and CF flags in the EFLAGS register according to the result (unordered, greater than, less than, or equal). The OF, SF, and AF flags in the EFLAGS register are set to 0. The unordered result is returned if either source operand is a NaN (QNaN or SNaN). The sign of zero is ignored for comparisons, so that –0.0 is equal to +0.0.\nOperand 1 is an XMM register; operand 2 can be an XMM register or a 64-bit memory location.\nThe COMISD instruction differs from the UCOMISD instruction in that it signals a SIMD floating-point invalid operation exception (#I) when a source operand is either a QNaN or SNaN. The UCOMISD instruction signals an invalid numeric exception only if a source operand is an SNaN.\nThe EFLAGS register is not updated if an unmasked SIMD floating-point exception is generated.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.\n"
},
{
"Name": "COMISS",
"Alias": [],
"Brief": "Compare Scalar Ordered Single",
"Description": "\nCompares the single-precision floating-point values in the low doublewords of operand 1 (first operand) and operand 2 (second operand), and sets the ZF, PF, and CF flags in the EFLAGS register according to the result (unordered, greater than, less than, or equal). The OF, SF, and AF flags in the EFLAGS register are set to 0. The unordered result is returned if either source operand is a NaN (QNaN or SNaN). The sign of zero is ignored for comparisons, so that –0.0 is equal to +0.0.\nOperand 1 is an XMM register; operand 2 can be an XMM register or a 32-bit memory location.\nThe COMISS instruction differs from the UCOMISS instruction in that it signals a SIMD floating-point invalid operation exception (#I) when a source operand is either a QNaN or SNaN. The UCOMISS instruction signals an invalid numeric exception only if a source operand is an SNaN.\nThe EFLAGS register is not updated if an unmasked SIMD floating-point exception is generated.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.\n"
},
{
"Name": "CPUID",
"Alias": [],
"Brief": "CPU Identification",
"Description": "\nThe ID flag (bit 21) in the EFLAGS register indicates support for the CPUID instruction. If a software procedure can set and clear this flag, the processor executing the procedure supports the CPUID instruction. This instruction operates the same in non-64-bit modes and 64-bit mode.\nCPUID returns processor identification and feature information in the EAX, EBX, ECX, and EDX registers (see note 1). The instruction’s output is dependent on the contents of the EAX register upon execution (in some cases, ECX as well). For example, the following pseudocode loads EAX with 00H and causes CPUID to return a Maximum Return Value and the Vendor Identification String in the appropriate registers:\nMOV EAX, 00H\nCPUID\nTable 3-17 shows information returned, depending on the initial value loaded into the EAX register. Table 3-18 shows the maximum CPUID input value recognized for each family of IA-32 processors on which CPUID is implemented.\nTwo types of information are returned: basic and extended function information. If a value entered for CPUID.EAX is higher than the maximum input value for basic or extended function for that processor, then the data for the highest basic information leaf is returned. For example, using the Intel Core i7 processor, the following is true:\nCPUID.EAX = 05H (* Returns MONITOR/MWAIT leaf. *)\nCPUID.EAX = 0AH (* Returns Architectural Performance Monitoring leaf. *)\nCPUID.EAX = 0BH (* Returns Extended Topology Enumeration leaf. *)\nCPUID.EAX = 0CH (* INVALID: Returns the same information as CPUID.EAX = 0BH. *)\nCPUID.EAX = 80000008H (* Returns linear/physical address size data. *)\nCPUID.EAX = 8000000AH (* INVALID: Returns the same information as CPUID.EAX = 0BH. *)\nIf a value entered for CPUID.EAX is less than or equal to the maximum input value and the leaf is not supported on that processor, then 0 is returned in all the registers. 
For example, using the Intel Core i7 processor, the following is true:\nCPUID.EAX = 07H (* Returns EAX=EBX=ECX=EDX=0. *)\nWhen CPUID returns the highest basic leaf information as a result of an invalid input EAX value, any dependence on the input ECX value in the basic leaf is honored.\nCPUID can be executed at any privilege level to serialize instruction execution. Serializing instruction execution guarantees that any modifications to flags, registers, and memory for previous instructions are completed before the next instruction is fetched and executed.\nSee also:\n“Serializing Instructions” in Chapter 8, “Multiple-Processor Management,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A.\n“Caching Translation Information” in Chapter 4, “Paging,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A.\nNote 1: On Intel 64 processors, CPUID clears the high 32 bits of the RAX/RBX/RCX/RDX registers in all modes.\nTable 3-17. Information Returned by CPUID Instruction\nInitial EAX\nValue\nInformation Provided about the Processor\nBasic CPUID Information\n0H\nEAX\nMaximum Input Value for Basic CPUID Information (see Table 3-18)\nEBX\n“Genu”\nECX\n“ntel”\nEDX\n“ineI”\n01H\nEAX\nVersion Information: Type, Family, Model, and Stepping ID (see Figure 3-5)\nEBX\nBits 07-00: Brand Index Bits 15-08: CLFLUSH line size (Value ∗ 8 = cache line size in bytes) Bits 23-16: Maximum number of addressable IDs for logical processors in this physical package*. Bits 31-24: Initial APIC ID\nECX\nFeature Information (see Figure 3-6 and Table 3-20)\nEDX\nFeature Information (see Figure 3-7 and Table 3-21)\nNOTES:\n*\nThe nearest power-of-2 integer that is not smaller than EBX[23:16] is the number of unique initial APIC IDs reserved for addressing different logical processors in a physical package. 
This field is only valid if CPUID.1.EDX.HTT[bit 28] = 1.\n02H\nEAX\nCache and TLB Information (see Table 3-22)\nEBX\nCache and TLB Information\nECX\nCache and TLB Information\nEDX\nCache and TLB Information\n03H\nEAX\nReserved.\nEBX\nReserved.\nECX\nBits 00-31 of 96-bit processor serial number. (Available in Pentium III processor only; otherwise, the value in this register is reserved.)\nEDX\nBits 32-63 of 96-bit processor serial number. (Available in Pentium III processor only; otherwise, the value in this register is reserved.)\nNOTES:\nProcessor serial number (PSN) is not supported in the Pentium 4 processor or later. On all models, use the PSN flag (returned using CPUID) to check for PSN support before accessing the feature.\nSee AP-485, Intel Processor Identification and the CPUID Instruction (Order Number 241618) for more information on PSN.\nCPUID leaves > 3 < 80000000 are visible only when IA32_MISC_ENABLE.BOOT_NT4[bit 22] = 0 (default).\nDeterministic Cache Parameters Leaf\n04H\nNOTES:\nLeaf 04H output depends on the initial value in ECX.*\nSee also: “INPUT EAX = 4: Returns Deterministic Cache Parameters for each level” on page 3-182.\nEAX\nBits 04-00: Cache Type Field\n0 = Null - No more caches 1 = Data Cache 2 = Instruction Cache 3 = Unified Cache 4-31 = Reserved\nTable 3-17. 
Information Returned by CPUID Instruction (Contd.)\n\n\n\nInitial EAX\nValue\nInformation Provided about the Processor\n\n\n\nBits 07-05: Cache Level (starts at 1) Bit 08: Self Initializing cache level (does not need SW initialization) Bit 09: Fully Associative cache\nBits 13-10: Reserved Bits 25-14: Maximum number of addressable IDs for logical processors sharing this cache**, ***\nBits 31-26: Maximum number of addressable IDs for processor cores in the physical package**, ****, *****\nEBX\nBits 11-00: L = System Coherency Line Size** Bits 21-12: P = Physical Line partitions** Bits 31-22: W = Ways of associativity**\nECX\nBits 31-00: S = Number of Sets**\nEDX\nBit 0: Write-Back Invalidate/Invalidate\n0 = WBINVD/INVD from threads sharing this cache acts upon lower level caches for threads sharing this cache. 1 = WBINVD/INVD is not guaranteed to act upon lower level caches of non-originating threads sharing this cache.\nBit 1: Cache Inclusiveness\n0 = Cache is not inclusive of lower cache levels. 1 = Cache is inclusive of lower cache levels.\nBit 2: Complex Cache Indexing\n0 = Direct mapped cache. 1 = A complex function is used to index the cache, potentially using all address bits.\nBits 31-03: Reserved = 0\n\nNOTES:\n* If ECX contains an invalid sub-leaf index, EAX/EBX/ECX/EDX return 0. Sub-leaf index n+1 is invalid if sub-leaf n returns EAX[4:0] as 0.\n** Add one to the return value to get the result.\n*** The nearest power-of-2 integer that is not smaller than (1 + EAX[25:14]) is the number of unique initial APIC IDs reserved for addressing different logical processors sharing this cache.\n**** The nearest power-of-2 integer that is not smaller than (1 + EAX[31:26]) is the number of unique Core_IDs reserved for addressing different processor cores in a physical package. Core ID is a subset of bits of the initial APIC ID.\n***** The returned value is constant for valid initial values in ECX. 
Valid ECX values start from 0.\nMONITOR/MWAIT Leaf\n05H\nEAX\nBits 15-00: Smallest monitor-line size in bytes (default is processor's monitor granularity) Bits 31-16: Reserved = 0\nEBX\nBits 15-00: Largest monitor-line size in bytes (default is processor's monitor granularity) Bits 31-16: Reserved = 0\nECX\nBit 00: Enumeration of Monitor-Mwait extensions (beyond EAX and EBX registers) supported\nBit 01: Supports treating interrupts as break-event for MWAIT, even when interrupts are disabled\nBits 31 - 02: Reserved\nEDX\nBits 03 - 00: Number of C0* sub C-states supported using MWAIT Bits 07 - 04: Number of C1* sub C-states supported using MWAIT Bits 11 - 08: Number of C2* sub C-states supported using MWAIT Bits 15 - 12: Number of C3* sub C-states supported using MWAIT Bits 19 - 16: Number of C4* sub C-states supported using MWAIT Bits 23 - 20: Number of C5* sub C-states supported using MWAIT Bits 27 - 24: Number of C6* sub C-states supported using MWAIT Bits 31 - 28: Number of C7* sub C-states supported using MWAIT\nNOTE:\n* The C0 through C7 states defined for the MWAIT extension are processor-specific C-states, not ACPI C-states.\n\n\nThermal and Power Management Leaf\n\n06H\n\nEAX\nBit 00: Digital temperature sensor is supported if set Bit 01: Intel Turbo Boost Technology Available (see description of IA32_MISC_ENABLE[38]). Bit 02: ARAT. APIC-Timer-always-running feature is supported if set. Bit 03: Reserved Bit 04: PLN. Power limit notification controls are supported if set. Bit 05: ECMD. Clock modulation duty cycle extension is supported if set. Bit 06: PTM. Package thermal management is supported if set. Bit 07: HWP. HWP base registers (IA32_PM_ENABLE[bit 0], IA32_HWP_CAPABILITIES, IA32_HWP_REQUEST, IA32_HWP_STATUS) are supported if set. Bit 08: HWP_Notification. IA32_HWP_INTERRUPT MSR is supported if set. 
Bit 09: HWP_Activity_Window. IA32_HWP_REQUEST[bits 41:32] is supported if set. Bit 10: HWP_Energy_Performance_Preference. IA32_HWP_REQUEST[bits 31:24] is supported if set. Bit 11: HWP_Package_Level_Request. IA32_HWP_REQUEST_PKG MSR is supported if set. Bit 12: Reserved. Bit 13: HDC. HDC base registers IA32_PKG_HDC_CTL, IA32_PM_CTL1, IA32_THREAD_STALL MSRs are supported if set. Bits 31 - 15: Reserved\nEBX\nBits 03 - 00: Number of Interrupt Thresholds in Digital Thermal Sensor Bits 31 - 04: Reserved\nECX\nBit 00: Hardware Coordination Feedback Capability (Presence of IA32_MPERF and IA32_APERF). The capability to provide a measure of delivered processor performance (since last reset of the counters), as a percentage of the expected processor performance when running at the TSC frequency. Bits 02 - 01: Reserved = 0 Bit 03: The processor supports performance-energy bias preference if CPUID.06H:ECX.SETBH[bit 3] is set and it also implies the presence of a new architectural MSR called IA32_ENERGY_PERF_BIAS (1B0H). Bits 31 - 04: Reserved = 0\nEDX\nReserved = 0\n\n\nStructured Extended Feature Flags Enumeration Leaf (Output depends on ECX input value)\n\n07H\n\nSub-leaf 0 (Input ECX = 0). *\nEAX\nBits 31-00: Reports the maximum input value for supported leaf 7 sub-leaves.\nEBX\nBit 00: FSGSBASE. Supports RDFSBASE/RDGSBASE/WRFSBASE/WRGSBASE if 1. Bit 01: IA32_TSC_ADJUST MSR is supported if 1. Bit 02: Reserved Bit 03: BMI1 Bit 04: HLE Bit 05: AVX2 Bit 06: Reserved Bit 07: SMEP. Supports Supervisor-Mode Execution Prevention if 1. Bit 08: BMI2 Bit 09: Supports Enhanced REP MOVSB/STOSB if 1. Bit 10: INVPCID. If 1, supports INVPCID instruction for system software that manages process-context identifiers. Bit 11: RTM Bit 12: Supports Platform Quality of Service Monitoring (PQM) capability if 1. Bit 13: Deprecates FPU CS and FPU DS values if 1. 
Bit 14: Reserved. Bit 15: Supports Platform Quality of Service Enforcement (PQE) capability if 1. Bits 17:16: Reserved Bit 18: RDSEED Bit 19: ADX Bit 20: SMAP Bits 31:21: Reserved\nECX\nBit 00: PREFETCHWT1 Bits 31-01: Reserved\nEDX\nReserved\nNOTE:\n* If ECX contains an invalid sub-leaf index, EAX/EBX/ECX/EDX return 0. Sub-leaf index n is invalid if n exceeds the value that sub-leaf 0 returns in EAX.\n\n\nDirect Cache Access Information Leaf\n\n09H\n\nEAX\nValue of bits [31:0] of IA32_PLATFORM_DCA_CAP MSR (address 1F8H)\nEBX\nReserved\nECX\nReserved\nEDX\nReserved\n\n\nArchitectural Performance Monitoring Leaf\n\n0AH\n\nEAX\nBits 07 - 00: Version ID of architectural performance monitoring Bits 15 - 08: Number of general-purpose performance monitoring counters per logical processor Bits 23 - 16: Bit width of general-purpose performance monitoring counters Bits 31 - 24: Length of EBX bit vector to enumerate architectural performance monitoring events\nEBX\nBit 00: Core cycle event not available if 1 Bit 01: Instruction retired event not available if 1 Bit 02: Reference cycles event not available if 1 Bit 03: Last-level cache reference event not available if 1 Bit 04: Last-level cache misses event not available if 1 Bit 05: Branch instruction retired event not available if 1 Bit 06: Branch mispredict retired event not available if 1 Bits 31 - 07: Reserved = 0\nECX\nReserved = 0\nTable 3-17. 
Information Returned by CPUID Instruction (Contd.)\n\n\n\nInitial EAX\nValue\nInformation Provided about the Processor\n\n\n\nEDX\nBits 04 - 00: Number of fixed-function performance counters (if Version ID > 1) Bits 12- 05: Bit width of fixed-function performance counters (if Version ID > 1) Reserved = 0\n\n\nExtended Topology Enumeration Leaf\n\nNOTES:\n0BH\nMost of Leaf 0BH output depends on the initial value in ECX.\nThe EDX output of leaf 0BH is always valid and does not vary with input value in ECX.\nOutput value in ECX[7:0] always equals input value in ECX[7:0].\nFor sub-leaves that return an invalid level-type of 0 in ECX[15:8], EAX and EBX will return 0.\nIf an input value n in ECX returns the invalid level-type of 0 in ECX[15:8], other input values with ECX > n also return 0 in ECX[15:8].\nEAX\nBits 04-00: Number of bits to shift right on x2APIC ID to get a unique topology ID of the next level type*. All logical processors with the same next level ID share current level. Bits 31-05: Reserved.\nEBX\nBits 15 - 00: Number of logical processors at this level type. The number reflects configuration as shipped by Intel**. Bits 31- 16: Reserved.\nECX\nBits 07 - 00: Level number. Same value in ECX input Bits 15 - 08: Level type***. Bits 31 - 16: Reserved.\nEDX\nBits 31- 00: x2APIC ID of the current logical processor.\nNOTES: * Software should use this field (EAX[4:0]) to enumerate processor topology of the system.\n** Software must not use EBX[15:0] to enumerate processor topology of the system. The value in this field (EBX[15:0]) is only intended for display/diagnostic purposes. The actual number of logical processors available to BIOS/OS/Applications may be different from the value of EBX[15:0], depending on software and platform hardware configurations.\n*** The value of the “level type” field is not related to level numbers in any way; higher “level type” values do not mean higher levels. 
Level type field has the following encoding: 0 : invalid 1 : SMT 2 : Core 3-255 : Reserved\nProcessor Extended State Enumeration Main Leaf (EAX = 0DH, ECX = 0)\nNOTES:\n0DH\nLeaf 0DH main leaf (ECX = 0).\nEAX\nBits 31-00: Reports the valid bit fields of the lower 32 bits of XCR0. If a bit is 0, the corresponding bit field in XCR0 is reserved. Bit 00: legacy x87 Bit 01: 128-bit SSE Bit 02: 256-bit AVX Bits 31- 03: Reserved\nEBX\nBits 31-00: Maximum size (bytes, from the beginning of the XSAVE/XRSTOR save area) required by enabled features in XCR0. May be different than ECX if some features at the end of the XSAVE save area are not enabled.\nTable 3-17. Information Returned by CPUID Instruction (Contd.)\n\n\n\nInitial EAX\nValue\nInformation Provided about the Processor\n\n\n\nECX\nBits 31-00: Maximum size (bytes, from the beginning of the XSAVE/XRSTOR save area) of the XSAVE/XRSTOR save area required by all supported features in the processor, i.e., all the valid bit fields in XCR0.\nEDX\nBits 31-00: Reports the valid bit fields of the upper 32 bits of XCR0. If a bit is 0, the corresponding bit field in XCR0 is reserved.\n\n\nProcessor Extended State Enumeration Sub-leaf (EAX = 0DH, ECX = 1)\n\n0DH\n\nEAX\nBits 31-04: Reserved\nBit 00: XSAVEOPT is available\nBit 01: Supports XSAVEC and the compacted form of XRSTOR if set\nBit 02: Supports XGETBV with ECX = 1 if set\nBit 03: Supports XSAVES/XRSTORS and IA32_XSS if set\nEBX\nBits 31-00: The size in bytes of the XSAVE area containing all states enabled by XCR0 | IA32_XSS.\nECX\nBits 31-00: Reports the valid bit fields of the lower 32 bits of IA32_XSS. If a bit is 0, the corresponding bit field in IA32_XSS is reserved.\nBits 07-00: Reserved\nBit 08: IA32_XSS[bit 8] is supported if 1\nBits 31-09: Reserved\nEDX\nBits 31-00: Reports the valid bit fields of the upper 32 bits of IA32_XSS. 
If a bit is 0, the corresponding bit field in IA32_XSS is reserved.\nBits 31-00: Reserved\n\n\nProcessor Extended State Enumeration Sub-leaves (EAX = 0DH, ECX = n, n > 1)\n\nNOTES:\n0DH\nLeaf 0DH output depends on the initial value in ECX.\nEach valid sub-leaf index maps to a valid bit in either the XCR0 register or the IA32_XSS MSR starting at bit position 2.\n* If ECX contains an invalid sub-leaf index, EAX/EBX/ECX/EDX return 0. Sub-leaf n (0 ≤ n ≤ 31) is invalid if sub-leaf 0 returns 0 in EAX[n] and sub-leaf 1 returns 0 in ECX[n]. Sub-leaf n (32 ≤ n ≤ 63) is invalid if sub-leaf 0 returns 0 in EDX[n-32] and sub-leaf 1 returns 0 in EDX[n-32].\nEAX\nBits 31-0: The size in bytes (from the offset specified in EBX) of the save area for an extended state feature associated with a valid sub-leaf index, n.\nEBX\nBits 31-0: The offset in bytes of this extended state component’s save area from the beginning of the XSAVE/XRSTOR area. This field reports 0 if the sub-leaf index, n, does not map to a valid bit in the XCR0 register*.\nECX\nBit 0 is set if the sub-leaf index, n, maps to a valid bit in the IA32_XSS MSR and bit 0 is clear if n maps to a valid bit in XCR0. Bits 31-1 are reserved. This field reports 0 if the sub-leaf index, n, is invalid*.\nEDX\nThis field reports 0 if the sub-leaf index, n, is invalid*; otherwise it is reserved.\nPlatform QoS Monitoring Enumeration Sub-leaf (EAX = 0FH, ECX = 0)\nNOTES:\n0FH\nLeaf 0FH output depends on the initial value in ECX.\nSub-leaf index 0 reports valid resource type starting at bit position 1 of EDX\nEAX\nReserved.\nEBX\nBits 31-0: Maximum range (zero-based) of RMID within this physical processor of all types.\nECX\nReserved.\nTable 3-17. Information Returned by CPUID Instruction (Contd.)\n\n\n\nInitial EAX\nValue\nInformation Provided about the Processor\n\n\n\nEDX\nBit 00: Reserved. Bit 01: Supports L3 Cache QoS Monitoring if 1. 
Bits 31:02: Reserved\n\n\nL3 Cache QoS Monitoring Capability Enumeration Sub-leaf (EAX = 0FH, ECX = 1)\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nNOTES:\n0FH\nLeaf 0FH output depends on the initial value in ECX.\nEAX\nReserved.\nEBX\nBits 31-0: Conversion factor from reported IA32_QM_CTR value to occupancy metric (bytes).\nECX\nMaximum range (zero-based) of RMID of this resource type.\nEDX\nBit 00: Supports L3 occupancy monitoring if 1. Bits 31:01: Reserved\nPlatform QoS Enforcement Enumeration Sub-leaf (EAX = 10H, ECX = 0)\nNOTES:\n10H\nLeaf 10H output depends on the initial value in ECX.\nSub-leaf index 0 reports valid resource identification (ResID) starting at bit position 1 of EDX\nEAX\nReserved.\nEBX\nBit 00: Reserved. Bit 01: Supports L3 Cache QoS Enforcement if 1. Bits 31:02: Reserved\nECX\nReserved.\nEDX\nReserved.\nL3 Cache QoS Enforcement Enumeration Sub-leaf (EAX = 10H, ECX = ResID =1)\nNOTES:\n10H\nLeaf 10H output depends on the initial value in ECX.\nEAX\nBits 4:0: Length of the capacity bit mask for the corresponding ResID. Bits 31:05: Reserved\nEBX\nBits 31-0: Bit-granular map of isolation/contention of allocation units.\nECX\nBit 00: Reserved. Bit 01: Updates of COS should be infrequent if 1. Bits 31:02: Reserved\nEDX\nBits 15:0: Highest COS number supported for this ResID. Bits 31:16: Reserved\nIntel Processor Trace Enumeration Main Leaf (EAX = 14H, ECX = 0)\nNOTES:\n14H\nLeaf 14H main leaf (ECX = 0).\nEAX\nBits 31-0: Reports the maximum number sub-leaves that are supported in leaf 14H.\nEBX\nBit 00: If 1, Indicates that IA32_RTIT_CTL.CR3Filter can be set to 1, and that IA32_RTIT_CR3_MATCH MSR can be accessed. Bits 31- 01: Reserved\nTable 3-17. 
Information Returned by CPUID Instruction (Contd.)\n\n\n\nInitial EAX\nValue\nInformation Provided about the Processor\n\n\n\nECX\nBit 00: If 1, tracing can be enabled with IA32_RTIT_CTL.ToPA = 1, hence utilizing the ToPA output scheme; IA32_RTIT_OUTPUT_BASE and IA32_RTIT_OUTPUT_MASK_PTRS MSRs can be accessed. Bit 01: If 1, ToPA tables can hold any number of output entries, up to the maximum allowed by the MaskOrTableOffset field of IA32_RTIT_OUTPUT_MASK_PTRS. Bits 30:02: Reserved Bit 31: If 1, generated packets which contain IP payloads have LIP values, which include the CS base component.\nEDX\nBits 31- 00: Reserved\n\n\nUnimplemented CPUID Leaf Functions\n\n\n40000000H - 4FFFFFFFH\nInvalid. No existing or future CPU will return processor identification or feature information if the initial EAX value is in the range 40000000H to 4FFFFFFFH.\n\n\nExtended Function CPUID Information\n\n80000000H\n\nEAX\nMaximum Input Value for Extended Function CPUID Information (see Table 3-18).\nEBX\nReserved\nECX\nReserved\nEDX\nReserved\n\n80000001H\n\nEAX\nExtended Processor Signature and Feature Bits.\nEBX\nReserved\nECX\nBit 00: LAHF/SAHF available in 64-bit mode Bits 04-01: Reserved Bit 05: LZCNT Bits 07-06: Reserved Bit 08: PREFETCHW Bits 31-09: Reserved\nEDX\nBits 10-00: Reserved Bit 11: SYSCALL/SYSRET available in 64-bit mode Bits 19-12: Reserved = 0 Bit 20: Execute Disable Bit available Bits 25-21: Reserved = 0 Bit 26: 1-GByte pages are available if 1 Bit 27: RDTSCP and IA32_TSC_AUX are available if 1 Bit 28: Reserved = 0 Bit 29: Intel® 64 Architecture available if 1 Bits 31-30: Reserved = 0\n\n80000002H\n\nEAX\nProcessor Brand String\nEBX\nProcessor Brand String Continued\nECX\nProcessor Brand String Continued\nEDX\nProcessor Brand String Continued\n\n80000003H\n\nEAX\nProcessor Brand String Continued\nEBX\nProcessor Brand String Continued\nECX\nProcessor Brand String Continued\nEDX\nProcessor Brand String Continued\nTable 3-17. 
Information Returned by CPUID Instruction (Contd.)\n\n\n\nInitial EAX\nValue\nInformation Provided about the Processor\n\n80000004H\n\nEAX\nProcessor Brand String Continued\nEBX\nProcessor Brand String Continued\nECX\nProcessor Brand String Continued\nEDX\nProcessor Brand String Continued\n\n80000005H\n\nEAX\nReserved = 0\nEBX\nReserved = 0\nECX\nReserved = 0\nEDX\nReserved = 0\n\n80000006H\n\nEAX\nReserved = 0\nEBX\nReserved = 0\nECX\nBits 07-00: Cache Line size in bytes Bits 11-08: Reserved Bits 15-12: L2 Associativity field * Bits 31-16: Cache size in 1K units\nEDX\nReserved = 0\n\n\n\n\n\n\nNOTES:\n* L2 associativity field encodings:\n00H - Disabled 01H - Direct mapped 02H - 2-way 04H - 4-way 06H - 8-way 08H - 16-way 0FH - Fully associative\n80000007H\nEAX\nReserved = 0\nEBX\nReserved = 0\nECX\nReserved = 0\nEDX\nBits 07-00: Reserved = 0 Bit 08: Invariant TSC available if 1 Bits 31-09: Reserved = 0\n80000008H\nEAX\nLinear/Physical Address size Bits 07-00: #Physical Address Bits* Bits 15-8: #Linear Address Bits Bits 31-16: Reserved = 0\nEBX\nReserved = 0\nECX\nReserved = 0\nEDX\nReserved = 0\nNOTES:\n*\nIf CPUID.80000008H:EAX[7:0] is supported, the maximum physical address number supported should come from this field.\n"
},
{
"Name": "CRC32",
"Alias": [],
"Brief": "Accumulate CRC32 Value",
"Description": "\nStarting with an initial value in the first operand (destination operand), accumulates a CRC32 (polynomial 11EDC6F41H) value for the second operand (source operand) and stores the result in the destination operand. The source operand can be a register or a memory location. The destination operand must be an r32 or r64 register. If the destination is an r64 register, then the 32-bit result is stored in the least significant double word and 00000000H is stored in the most significant double word of the r64 register.\nThe initial value supplied in the destination operand is a double word integer stored in the r32 register or the least significant double word of the r64 register. To incrementally accumulate a CRC32 value, software retains the result of the previous CRC32 operation in the destination operand, then executes the CRC32 instruction again with new input data in the source operand. Data contained in the source operand is processed in reflected bit order. This means that the most significant bit of the source operand is treated as the least significant bit of the quotient, and so on, for all the bits of the source operand. Likewise, the result of the CRC operation is stored in the destination operand in reflected bit order. This means that the most significant bit of the resulting CRC (bit 31) is stored in the least significant bit of the destination operand (bit 0), and so on, for all the bits of the CRC.\n"
},
{
"Name": "CVTDQ2PD",
"Alias": [],
"Brief": "Convert Packed Dword Integers to Packed Double",
"Description": "\nConverts two packed signed doubleword integers in the source operand (second operand) to two packed double-precision floating-point values in the destination operand (first operand).\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The source operand is an XMM register or 64- bit memory location. The destination operation is an XMM register. The upper bits (VLMAX-1:128) of the corresponding XMM register destination are unmodified.\nVEX.128 encoded version: The source operand is an XMM register or 64- bit memory location. The destination operation is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The source operand is a YMM register or 128- bit memory location. The destination operation is a YMM register.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.\nX3\nX2\nX1\nX0\nSRC\nX3\nX2\nX1\nX0\nDEST\nFigure 3-10. CVTDQ2PD (VEX.256 encoded version)\n"
},
{
"Name": "CVTDQ2PS",
"Alias": [],
"Brief": "Convert Packed Dword Integers to Packed Single",
"Description": "\nConverts four packed signed doubleword integers in the source operand (second operand) to four packed single-precision floating-point values in the destination operand (first operand).\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The source operand is an XMM register or 128- bit memory location. The destination operation is an XMM register. The upper bits (VLMAX-1:128) of the corresponding XMM register destination are unmodified.\nVEX.128 encoded version: The source operand is an XMM register or 128- bit memory location. The destination operation is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The source operand is a YMM register or 256- bit memory location. The destination operation is a YMM register.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.\n"
},
{
"Name": "CVTPD2DQ",
"Alias": [],
"Brief": "Convert Packed Double",
"Description": "\nConverts two packed double-precision floating-point values in the source operand (second operand) to two packed signed doubleword integers in the destination operand (first operand).\nThe source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register. The result is stored in the low quadword of the destination operand and the high quadword is cleared to all 0s.\nWhen a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register. If a converted result is larger than the maximum signed doubleword integer, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (80000000H) is returned.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The source operand is an XMM register or 128- bit memory location. The destination operation is an XMM register. Bits[127:64] of the destination XMM register are zeroed. However, the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: The source operand is an XMM register or 128- bit memory location. The destination operation is a YMM register. The upper bits (VLMAX-1:64) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The source operand is a YMM register or 256- bit memory location. The destination operation is an XMM register. The upper bits (255:128) of the corresponding YMM register destination are zeroed.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.\n\n\n\n\n\nX3\nX2\nX1\nX0\nSRC\nDEST\n0\nX3\nX2\nX1\nX0\nFigure 3-11. VCVTPD2DQ (VEX.256 encoded version)\n"
},
{
"Name": "CVTPD2PI",
"Alias": [],
"Brief": "Convert Packed Double",
"Description": "\nConverts two packed double-precision floating-point values in the source operand (second operand) to two packed signed doubleword integers in the destination operand (first operand).\nThe source operand can be an XMM register or a 128-bit memory location. The destination operand is an MMX tech-nology register.\nWhen a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register. If a converted result is larger than the maximum signed doubleword integer, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (80000000H) is returned.\nThis instruction causes a transition from x87 FPU to MMX technology operation (that is, the x87 FPU top-of-stack pointer is set to 0 and the x87 FPU tag word is set to all 0s [valid]). If this instruction is executed while an x87 FPU floating-point exception is pending, the exception is handled before the CVTPD2PI instruction is executed.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n"
},
{
"Name": "CVTPD2PS",
"Alias": [],
"Brief": "Convert Packed Double",
"Description": "\nConverts two packed double-precision floating-point values in the source operand (second operand) to two packed single-precision floating-point values in the destination operand (first operand).\nWhen a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The source operand is an XMM register or 128- bit memory location. The destination operation is an XMM register. Bits[127:64] of the destination XMM register are zeroed. However, the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: The source operand is an XMM register or 128- bit memory location. The destination operation is a YMM register. The upper bits (VLMAX-1:64) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The source operand is a YMM register or 256- bit memory location. The destination operation is an XMM register. The upper bits (255:128) of the corresponding YMM register destination are zeroed.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.\nSRC\nX3\nX2\nX1\nX0\nDEST\n0\nX3\nX2\nX1\nX0\nFigure 3-12. VCVTPD2PS (VEX.256 encoded version)\n"
},
{
"Name": "CVTPI2PD",
"Alias": [],
"Brief": "Convert Packed Dword Integers to Packed Double",
"Description": "\nConverts two packed signed doubleword integers in the source operand (second operand) to two packed double-precision floating-point values in the destination operand (first operand).\nThe source operand can be an MMX technology register or a 64-bit memory location. The destination operand is an XMM register. In addition, depending on the operand configuration:\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n"
},
{
"Name": "CVTPI2PS",
"Alias": [],
"Brief": "Convert Packed Dword Integers to Packed Single",
"Description": "\nConverts two packed signed doubleword integers in the source operand (second operand) to two packed single-precision floating-point values in the destination operand (first operand).\nThe source operand can be an MMX technology register or a 64-bit memory location. The destination operand is an XMM register. The results are stored in the low quadword of the destination operand, and the high quadword remains unchanged. When a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register.\nThis instruction causes a transition from x87 FPU to MMX technology operation (that is, the x87 FPU top-of-stack pointer is set to 0 and the x87 FPU tag word is set to all 0s [valid]). If this instruction is executed while an x87 FPU floating-point exception is pending, the exception is handled before the CVTPI2PS instruction is executed.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n"
},
{
"Name": "CVTPS2DQ",
"Alias": [],
"Brief": "Convert Packed Single",
"Description": "\nConverts four or eight packed single-precision floating-point values in the source operand to four or eight signed doubleword integers in the destination operand.\nWhen a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register. If a converted result is larger than the maximum signed doubleword integer, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (80000000H) is returned.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The source operand is an XMM register or 128- bit memory location. The destination operation is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: The source operand is an XMM register or 128- bit memory location. The destination operation is a YMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The source operand is a YMM register or 256- bit memory location. The destination operation is a YMM register.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.\n"
},
{
"Name": "CVTPS2PD",
"Alias": [],
"Brief": "Convert Packed Single",
"Description": "\nConverts two or four packed single-precision floating-point values in the source operand (second operand) to two or four packed double-precision floating-point values in the destination operand (first operand).\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The source operand is an XMM register or 64- bit memory location. The destination operation is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: The source operand is an XMM register or 64- bit memory location. The destination operation is a YMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The source operand is an XMM register or 128- bit memory location. The destination operation is a YMM register.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.\nX3\nX2\nX1\nX0\nSRC\nX3\nX2\nX1\nX0\nDEST\nFigure 3-13. CVTPS2PD (VEX.256 encoded version)\n"
},
{
"Name": "CVTPS2PI",
"Alias": [],
"Brief": "Convert Packed Single",
"Description": "\nConverts two packed single-precision floating-point values in the source operand (second operand) to two packed signed doubleword integers in the destination operand (first operand).\nThe source operand can be an XMM register or a 128-bit memory location. The destination operand is an MMX tech-nology register. When the source operand is an XMM register, the two single-precision floating-point values are contained in the low quadword of the register. When a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register. If a converted result is larger than the maximum signed doubleword integer, the floating-point invalid exception is raised, and if this exception is masked, the indef-inite integer value (80000000H) is returned.\nCVTPS2PI causes a transition from x87 FPU to MMX technology operation (that is, the x87 FPU top-of-stack pointer is set to 0 and the x87 FPU tag word is set to all 0s [valid]). If this instruction is executed while an x87 FPU floating-point exception is pending, the exception is handled before the CVTPS2PI instruction is executed.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n"
},
{
"Name": "CVTSD2SI",
"Alias": [],
"Brief": "Convert Scalar Double",
"Description": "\nConverts a double-precision floating-point value in the source operand (second operand) to a signed doubleword integer in the destination operand (first operand). The source operand can be an XMM register or a 64-bit memory location. The destination operand is a general-purpose register. When the source operand is an XMM register, the double-precision floating-point value is contained in the low quadword of the register.\nWhen a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register. If a converted result is larger than the maximum signed doubleword integer, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (80000000H) is returned.\nIn 64-bit mode, the instruction can access additional registers (XMM8-XMM15, R8-R15) when used with a REX.R prefix. Use of the REX.W prefix promotes the instruction to 64-bit operation. See the summary chart at the begin-ning of this section for encoding data and limits.\nLegacy SSE instructions: Use of the REX.W prefix promotes the instruction to 64-bit operation. See the summary chart at the beginning of this section for encoding data and limits.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.\n"
},
{
"Name": "CVTSD2SS",
"Alias": [],
"Brief": "Convert Scalar Double",
"Description": "\nConverts a double-precision floating-point value in the source operand (second operand) to a single-precision floating-point value in the destination operand (first operand).\nThe source operand can be an XMM register or a 64-bit memory location. The destination operand is an XMM register. When the source operand is an XMM register, the double-precision floating-point value is contained in the low quadword of the register. The result is stored in the low doubleword of the destination operand, and the upper 3 doublewords are left unchanged. When the conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The destination and first source operand are the same. Bits (VLMAX-1:32) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (127:64) of the XMM register destination are copied from corresponding bits in the first source operand. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\n"
},
{
"Name": "CVTSI2SD",
"Alias": [],
"Brief": "Convert Dword Integer to Scalar Double",
"Description": "\nConverts a signed doubleword integer (or signed quadword integer if operand size is 64 bits) in the second source operand to a double-precision floating-point value in the destination operand. The result is stored in the low quad-word of the destination operand, and the high quadword left unchanged. When conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register.\nLegacy SSE instructions: Use of the REX.W prefix promotes the instruction to 64-bit operands. See the summary chart at the beginning of this section for encoding data and limits.\nThe second source operand can be a general-purpose register or a 32/64-bit memory location. The first source and destination operands are XMM registers.\n128-bit Legacy SSE version: The destination and first source operand are the same. Bits (VLMAX-1:64) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (127:64) of the XMM register destination are copied from corresponding bits in the first source operand. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\n"
},
{
"Name": "CVTSI2SS",
"Alias": [],
"Brief": "Convert Dword Integer to Scalar Single",
"Description": "\nConverts a signed doubleword integer (or signed quadword integer if operand size is 64 bits) in the source operand (second operand) to a single-precision floating-point value in the destination operand (first operand). The source operand can be a general-purpose register or a memory location. The destination operand is an XMM register. The result is stored in the low doubleword of the destination operand, and the upper three doublewords are left unchanged. When a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register.\nLegacy SSE instructions: In 64-bit mode, the instruction can access additional registers (XMM8-XMM15, R8-R15) when used with a REX.R prefix. Use of the REX.W prefix promotes the instruction to 64-bit operands. See the summary chart at the beginning of this section for encoding data and limits.\n128-bit Legacy SSE version: The destination and first source operand are the same. Bits (VLMAX-1:32) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (127:32) of the XMM register destination are copied from corresponding bits in the first source operand. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\n"
},
{
"Name": "CVTSS2SD",
"Alias": [],
"Brief": "Convert Scalar Single",
"Description": "\nConverts a single-precision floating-point value in the source operand (second operand) to a double-precision floating-point value in the destination operand (first operand). The source operand can be an XMM register or a 32-bit memory location. The destination operand is an XMM register. When the source operand is an XMM register, the single-precision floating-point value is contained in the low doubleword of the register. The result is stored in the low quadword of the destination operand, and the high quadword is left unchanged.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The destination and first source operand are the same. Bits (VLMAX-1:64) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (127:64) of the XMM register destination are copied from corresponding bits in the first source operand. Bits (VLMAX-1:128) of the destination YMM register are zeroed.\n"
},
{
"Name": "CVTSS2SI",
"Alias": [],
"Brief": "Convert Scalar Single",
"Description": "\nConverts a single-precision floating-point value in the source operand (second operand) to a signed doubleword integer (or signed quadword integer if operand size is 64 bits) in the destination operand (first operand). The source operand can be an XMM register or a memory location. The destination operand is a general-purpose register. When the source operand is an XMM register, the single-precision floating-point value is contained in the low doubleword of the register.\nWhen a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register. If a converted result is larger than the maximum signed doubleword integer, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (80000000H) is returned.\nIn 64-bit mode, the instruction can access additional registers (XMM8-XMM15, R8-R15) when used with a REX.R prefix. Use of the REX.W prefix promotes the instruction to 64-bit operands. See the summary chart at the begin-ning of this section for encoding data and limits.\nLegacy SSE instructions: In 64-bit mode, Use of the REX.W prefix promotes the instruction to 64-bit operands. See the summary chart at the beginning of this section for encoding data and limits.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.\n"
},
{
"Name": "CVTTPD2DQ",
"Alias": [],
"Brief": "Convert with Truncation Packed Double",
"Description": "\nConverts two or four packed double-precision floating-point values in the source operand (second operand) to two or four packed signed doubleword integers in the destination operand (first operand).\nWhen a conversion is inexact, a truncated (round toward zero) value is returned.If a converted result is larger than the maximum signed doubleword integer, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (80000000H) is returned.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The source operand is an XMM register or 128- bit memory location. The destination operation is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: The source operand is an XMM register or 128- bit memory location. The destination operation is a YMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The source operand is a YMM register or 256- bit memory location. The destination operation is an XMM register. The upper bits (255:128) of the corresponding YMM register destination are zeroed.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.\n\n\n\nX2\nX1\nSRC\nX3\nX0\nDEST\n0\nX3\nX2\nX1\nX0\nFigure 3-14. VCVTTPD2DQ (VEX.256 encoded version)\n"
},
{
"Name": "CVTTPD2PI",
"Alias": [],
"Brief": "Convert with Truncation Packed Double",
"Description": "\nConverts two packed double-precision floating-point values in the source operand (second operand) to two packed signed doubleword integers in the destination operand (first operand). The source operand can be an XMM register or a 128-bit memory location. The destination operand is an MMX technology register.\nWhen a conversion is inexact, a truncated (round toward zero) result is returned. If a converted result is larger than the maximum signed doubleword integer, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (80000000H) is returned.\nThis instruction causes a transition from x87 FPU to MMX technology operation (that is, the x87 FPU top-of-stack pointer is set to 0 and the x87 FPU tag word is set to all 0s [valid]). If this instruction is executed while an x87 FPU floating-point exception is pending, the exception is handled before the CVTTPD2PI instruction is executed.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n"
},
{
"Name": "CVTTPS2DQ",
"Alias": [],
"Brief": "Convert with Truncation Packed Single",
"Description": "\nConverts four or eight packed single-precision floating-point values in the source operand to four or eight signed doubleword integers in the destination operand.\nWhen a conversion is inexact, a truncated (round toward zero) value is returned.If a converted result is larger than the maximum signed doubleword integer, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (80000000H) is returned.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The source operand is an XMM register or 128- bit memory location. The destination operation is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: The source operand is an XMM register or 128- bit memory location. The destination operation is a YMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The source operand is a YMM register or 256- bit memory location. The destination operation is a YMM register.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.\n"
},
{
"Name": "CVTTPS2PI",
"Alias": [],
"Brief": "Convert with Truncation Packed Single",
"Description": "\nConverts two packed single-precision floating-point values in the source operand (second operand) to two packed signed doubleword integers in the destination operand (first operand). The source operand can be an XMM register or a 64-bit memory location. The destination operand is an MMX technology register. When the source operand is an XMM register, the two single-precision floating-point values are contained in the low quadword of the register.\nWhen a conversion is inexact, a truncated (round toward zero) result is returned. If a converted result is larger than the maximum signed doubleword integer, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (80000000H) is returned.\nThis instruction causes a transition from x87 FPU to MMX technology operation (that is, the x87 FPU top-of-stack pointer is set to 0 and the x87 FPU tag word is set to all 0s [valid]). If this instruction is executed while an x87 FPU floating-point exception is pending, the exception is handled before the CVTTPS2PI instruction is executed.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n"
},
{
"Name": "CVTTSD2SI",
"Alias": [],
"Brief": "Convert with Truncation Scalar Double",
"Description": "\nConverts a double-precision floating-point value in the source operand (second operand) to a signed doubleword integer (or signed quadword integer if operand size is 64 bits) in the destination operand (first operand). The source operand can be an XMM register or a 64-bit memory location. The destination operand is a general purpose register. When the source operand is an XMM register, the double-precision floating-point value is contained in the low quadword of the register.\nWhen a conversion is inexact, a truncated (round toward zero) result is returned. If a converted result is larger than the maximum signed doubleword integer, the floating point invalid exception is raised. If this exception is masked, the indefinite integer value (80000000H) is returned.\nLegacy SSE instructions: In 64-bit mode, Use of the REX.W prefix promotes the instruction to 64-bit operation. See the summary chart at the beginning of this section for encoding data and limits.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.\n"
},
{
"Name": "CVTTSS2SI",
"Alias": [],
"Brief": "Convert with Truncation Scalar Single",
"Description": "\nConverts a single-precision floating-point value in the source operand (second operand) to a signed doubleword integer (or signed quadword integer if operand size is 64 bits) in the destination operand (first operand). The source operand can be an XMM register or a 32-bit memory location. The destination operand is a general-purpose register. When the source operand is an XMM register, the single-precision floating-point value is contained in the low doubleword of the register.\nWhen a conversion is inexact, a truncated (round toward zero) result is returned. If a converted result is larger than the maximum signed doubleword integer, the floating-point invalid exception is raised. If this exception is masked, the indefinite integer value (80000000H) is returned.\nLegacy SSE instructions: In 64-bit mode, the instruction can access additional registers (XMM8-XMM15, R8-R15) when used with a REX.R prefix. Use of the REX.W prefix promotes the instruction to 64-bit operation. See the summary chart at the beginning of this section for encoding data and limits.\nNote: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.\n"
},
{
"Name": "DAA",
"Alias": [],
"Brief": "Decimal Adjust AL after Addition",
"Description": "\nAdjusts the sum of two packed BCD values to create a packed BCD result. The AL register is the implied source and destination operand. The DAA instruction is only useful when it follows an ADD instruction that adds (binary addi-tion) two 2-digit, packed BCD values and stores a byte result in the AL register. The DAA instruction then adjusts the contents of the AL register to contain the correct 2-digit, packed BCD result. If a decimal carry is detected, the CF and AF flags are set accordingly.\nThis instruction executes as described above in compatibility mode and legacy mode. It is not valid in 64-bit mode.\n"
},
{
"Name": "DAS",
"Alias": [],
"Brief": "Decimal Adjust AL after Subtraction",
"Description": "\nAdjusts the result of the subtraction of two packed BCD values to create a packed BCD result. The AL register is the implied source and destination operand. The DAS instruction is only useful when it follows a SUB instruction that subtracts (binary subtraction) one 2-digit, packed BCD value from another and stores a byte result in the AL register. The DAS instruction then adjusts the contents of the AL register to contain the correct 2-digit, packed BCD result. If a decimal borrow is detected, the CF and AF flags are set accordingly.\nThis instruction executes as described above in compatibility mode and legacy mode. It is not valid in 64-bit mode.\n"
},
{
"Name": "DEC",
"Alias": [],
"Brief": "Decrement by 1",
"Description": "\nSubtracts 1 from the destination operand, while preserving the state of the CF flag. The destination operand can be a register or a memory location. This instruction allows a loop counter to be updated without disturbing the CF flag. (To perform a decrement operation that updates the CF flag, use a SUB instruction with an immediate operand of 1.)\nThis instruction can be used with a LOCK prefix to allow the instruction to be executed atomically.\nIn 64-bit mode, DEC r16 and DEC r32 are not encodable (because opcodes 48H through 4FH are REX prefixes). Otherwise, the instruction’s 64-bit mode default operation size is 32 bits. Use of the REX.R prefix permits access to additional registers (R8-R15). Use of the REX.W prefix promotes operation to 64 bits.\nSee the summary chart at the beginning of this section for encoding data and limits.\n"
},
{
"Name": "DIV",
"Alias": [],
"Brief": "Unsigned Divide",
"Description": "\nDivides unsigned the value in the AX, DX:AX, EDX:EAX, or RDX:RAX registers (dividend) by the source operand (divisor) and stores the result in the AX (AH:AL), DX:AX, EDX:EAX, or RDX:RAX registers. The source operand can be a general-purpose register or a memory location. The action of this instruction depends on the operand size (dividend/divisor). Division using 64-bit operand is available only in 64-bit mode.\nNon-integral results are truncated (chopped) towards 0. The remainder is always less than the divisor in magni-tude. Overflow is indicated with the #DE (divide error) exception rather than with the CF flag.\nIn 64-bit mode, the instruction’s default operation size is 32 bits. Use of the REX.R prefix permits access to addi-tional registers (R8-R15). Use of the REX.W prefix promotes operation to 64 bits. In 64-bit mode when REX.W is applied, the instruction divides the unsigned value in RDX:RAX by the source operand and stores the quotient in RAX, the remainder in RDX.\nSee the summary chart at the beginning of this section for encoding data and limits. See Table 3-25.\nTable 3-25. DIV Action\n\n\n\nMaximum\nOperand Size\nDividend\nDivisor\nQuotient\nRemainder\nQuotient\nWord/byte\nAX\nr/m8\nAL\nAH\n255\nDoubleword/word\nDX:AX\nr/m16\nAX\nDX\n65,535\nQuadword/doubleword\nEDX:EAX\nr/m32\nEAX\nEDX\n232 − 1\nDoublequadword/\nRDX:RAX\nr/m64\nRAX\nRDX\n264 − 1\nquadword\n"
},
{
"Name": "DIVPD",
"Alias": [],
"Brief": "Divide Packed Double",
"Description": "\nPerforms an SIMD divide of the two or four packed double-precision floating-point values in the first source operand by the two or four packed double-precision floating-point values in the second source operand. See Chapter 11 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for an overview of a SIMD double-precision floating-point operation.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n"
},
{
"Name": "DIVPS",
"Alias": [],
"Brief": "Divide Packed Single",
"Description": "\nPerforms an SIMD divide of the four or eight packed single-precision floating-point values in the first source operand by the four or eight packed single-precision floating-point values in the second source operand. See Chapter 10 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for an overview of a SIMD single-precision floating-point operation.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n"
},
{
"Name": "DIVSD",
"Alias": [],
"Brief": "Divide Scalar Double",
"Description": "\nDivides the low double-precision floating-point value in the first source operand by the low double-precision floating-point value in the second source operand, and stores the double-precision floating-point result in the destination operand. The second source operand can be an XMM register or a 64-bit memory location. The first source and destination hyperons are XMM registers. The high quadword of the destination operand is copied from the high quadword of the first source operand. See Chapter 11 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for an overview of a scalar double-precision floating-point operation.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The first source operand and the destination operand are the same. Bits (VLMAX-1:64) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed.\n"
},
{
"Name": "DIVSS",
"Alias": [],
"Brief": "Divide Scalar Single",
"Description": "\nDivides the low single-precision floating-point value in the first source operand by the low single-precision floating-point value in the second source operand, and stores the single-precision floating-point result in the destination operand. The second source operand can be an XMM register or a 32-bit memory location. The first source and destination operands are XMM registers. The three high-order doublewords of the destination are copied from the same dwords of the first source operand. See Chapter 10 in the Intel® 64 and IA-32 Architectures Software Devel-oper’s Manual, Volume 1, for an overview of a scalar single-precision floating-point operation.\nIn 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).\n128-bit Legacy SSE version: The first source operand and the destination operand are the same. Bits (VLMAX-1:32) of the corresponding YMM destination register remain unchanged.\nVEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed.\n"
},
{
"Name": "DPPD",
"Alias": [],
"Brief": "Dot Product of Packed Double Precision Floating",
"Description": "\nConditionally multiplies the packed double-precision floating-point values in the destination operand (first operand) with the packed double-precision floating-point values in the source (second operand) depending on a mask extracted from bits [5:4] of the immediate operand (third operand). If a condition mask bit is zero, the corre-sponding multiplication is replaced by a value of 0.0 in the manner described by Section 12.8.4 of Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1.\nThe two resulting double-precision values are summed into an intermediate result. The intermediate result is conditionally broadcasted to the destination using a broadcast mask specified by bits [1:0] of the immediate byte.\nIf a broadcast mask bit is \"1\", the intermediate result is copied to the corresponding qword element in the destina-tion operand. If a broadcast mask bit is zero, the corresponding element in the destination is set to zero.\nDPPD follows the NaN forwarding rules stated in the Software Developer’s Manual, vol. 1, table 4.7. These rules do not cover horizontal prioritization of NaNs. Horizontal propagation of NaNs to the destination and the positioning of those NaNs in the destination is implementation dependent. NaNs on the input sources or computationally gener-ated NaNs will have at least one NaN propagated to the destination.\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. 
The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nIf VDPPD is encoded with VEX.L= 1, an attempt to execute the instruction encoded with VEX.L= 1 will cause an #UD exception.\n"
},
{
"Name": "DPPS",
"Alias": [],
"Brief": "Dot Product of Packed Single Precision Floating",
"Description": "\nConditionally multiplies the packed single precision floating-point values in the destination operand (first operand) with the packed single-precision floats in the source (second operand) depending on a mask extracted from the high 4 bits of the immediate byte (third operand). If a condition mask bit in Imm8[7:4] is zero, the corresponding multiplication is replaced by a value of 0.0 in the manner described by Section 12.8.4 of Intel® 64 and IA-32 Archi-tectures Software Developer’s Manual, Volume 1.\nThe four resulting single-precision values are summed into an intermediate result. The intermediate result is condi-tionally broadcasted to the destination using a broadcast mask specified by bits [3:0] of the immediate byte.\nIf a broadcast mask bit is \"1\", the intermediate result is copied to the corresponding dword element in the destina-tion operand. If a broadcast mask bit is zero, the corresponding element in the destination is set to zero.\nDPPS follows the NaN forwarding rules stated in the Software Developer’s Manual, vol. 1, table 4.7. These rules do not cover horizontal prioritization of NaNs. Horizontal propagation of NaNs to the destination and the positioning of those NaNs in the destination is implementation dependent. NaNs on the input sources or computationally gener-ated NaNs will have at least one NaN propagated to the destination.\n128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit memory location. The desti-nation is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.\nVEX.128 encoded version: the first source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.\nVEX.256 encoded version: The first source operand is a YMM register. 
The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.\n"
},
{
"Name": "EMMS",
"Alias": [],
"Brief": "Empty MMX Technology State",
"Description": "\nSets the values of all the tags in the x87 FPU tag word to empty (all 1s). This operation marks the x87 FPU data registers (which are aliased to the MMX technology registers) as available for use by x87 FPU floating-point instruc-tions. (See Figure 8-7 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for the format of the x87 FPU tag word.) All other MMX instructions (other than the EMMS instruction) set all the tags in x87 FPU tag word to valid (all 0s).\nThe EMMS instruction must be used to clear the MMX technology state at the end of all MMX technology procedures or subroutines and before calling other procedures or subroutines that may execute x87 floating-point instructions. If a floating-point instruction loads one of the registers in the x87 FPU data register stack before the x87 FPU tag word has been reset by the EMMS instruction, an x87 floating-point register stack overflow can occur that will result in an x87 floating-point exception or incorrect result.\nEMMS operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "ENTER",
"Alias": [],
"Brief": "Make Stack Frame for Procedure Parameters",
"Description": "\nCreates a stack frame for a procedure. The first operand (size operand) specifies the size of the stack frame (that is, the number of bytes of dynamic storage allocated on the stack for the procedure). The second operand (nesting level operand) gives the lexical nesting level (0 to 31) of the procedure. The nesting level determines the number of stack frame pointers that are copied into the “display area” of the new stack frame from the preceding frame. Both of these operands are immediate values.\nThe stack-size attribute determines whether the BP (16 bits), EBP (32 bits), or RBP (64 bits) register specifies the current frame pointer and whether SP (16 bits), ESP (32 bits), or RSP (64 bits) specifies the stack pointer. In 64-bit mode, stack-size attribute is always 64-bits.\nThe ENTER and companion LEAVE instructions are provided to support block structured languages. The ENTER instruction (when used) is typically the first instruction in a procedure and is used to set up a new stack frame for a procedure. The LEAVE instruction is then used at the end of the procedure (just before the RET instruction) to release the stack frame.\nIf the nesting level is 0, the processor pushes the frame pointer from the BP/EBP/RBP register onto the stack, copies the current stack pointer from the SP/ESP/RSP register into the BP/EBP/RBP register, and loads the SP/ESP/RSP register with the current stack-pointer value minus the value in the size operand. For nesting levels of 1 or greater, the processor pushes additional frame pointers on the stack before adjusting the stack pointer. These additional frame pointers provide the called procedure with access points to other nested frames on the stack. 
See “Procedure Calls for Block-Structured Languages” in Chapter 6 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for more information about the actions of the ENTER instruction.\nThe ENTER instruction causes a page fault whenever a write using the final value of the stack pointer (within the current stack segment) would do so.\nIn 64-bit mode, default operation size is 64 bits; 32-bit operation size cannot be encoded.\n"
},
{
"Name": "EXTRACTPS",
"Alias": [],
"Brief": "Extract Packed Single Precision Floating",
"Description": "\nExtracts a single-precision floating-point value from the source operand (second operand) at the 32-bit offset spec-ified from imm8. Immediate bits higher than the most significant offset for the vector length are ignored.\nThe extracted single-precision floating-point value is stored in the low 32-bits of the destination operand\nIn 64-bit mode, destination register operand has default operand size of 64 bits. The upper 32-bits of the register are filled with zero. REX.W is ignored.\n128-bit Legacy SSE version: When a REX.W prefix is used in 64-bit mode with a general purpose register (GPR) as a destination operand, the packed single quantity is zero extended to 64 bits.\nVEX.128 encoded version: When VEX.128.66.0F3A.W1 17 form is used in 64-bit mode with a general purpose register (GPR) as a destination operand, the packed single quantity is zero extended to 64 bits. VEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.\nThe source register is an XMM register. Imm8[1:0] determine the starting DWORD offset from which to extract the 32-bit floating-point value.\nIf VEXTRACTPS is encoded with VEX.L= 1, an attempt to execute the instruction encoded with VEX.L= 1 will cause an #UD exception.\n"
},
{
"Name": "F2XM1",
"Alias": [],
"Brief": "Compute 2x–1",
"Description": "\nComputes the exponential value of 2 to the power of the source operand minus 1. The source operand is located in register ST(0) and the result is also stored in ST(0). The value of the source operand must lie in the range –1.0 to +1.0. If the source value is outside this range, the result is undefined.\nThe following table shows the results obtained when computing the exponential value of various classes of numbers, assuming that neither overflow nor underflow occurs.\nTable 3-26. Results Obtained from F2XM1\n\n\nST(0) SRC\nST(0) DEST\n\n− 1.0 to −0\n− 0.5 to − 0\n\n− 0\n− 0\n\n+ 0\n+ 0\n\n+ 0 to +1.0\n+ 0 to 1.0\nValues other than 2 can be exponentiated using the following formula:\nxy ← 2(y ∗ log2x)\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FABS",
"Alias": [],
"Brief": "Absolute Value",
"Description": "\nClears the sign bit of ST(0) to create the absolute value of the operand. The following table shows the results obtained when creating the absolute value of various classes of numbers.\nTable 3-27. Results Obtained from FABS\n\n\nST(0) SRC\nST(0) DEST\n\n− ∞\n+ ∞\n\n− F\n+ F\n\n− 0\n+ 0\n\n+ 0\n+ 0\n\n+ F\n+ F\n\n+ ∞\n+ ∞\n\nNaN\nNaN\nNOTES:\nF Means finite floating-point value.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FADD",
"Alias": [
"FADDP",
"FIADD"
],
"Brief": "Add",
"Description": "\nAdds the destination and source operands and stores the sum in the destination location. The destination operand is always an FPU register; the source operand can be a register or a memory location. Source operands in memory can be in single-precision or double-precision floating-point format or in word or doubleword integer format.\nThe no-operand version of the instruction adds the contents of the ST(0) register to the ST(1) register. The one-operand version adds the contents of a memory location (either a floating-point or an integer value) to the contents of the ST(0) register. The two-operand version, adds the contents of the ST(0) register to the ST(i) register or vice versa. The value in ST(0) can be doubled by coding:\nFADD ST(0), ST(0);\nThe FADDP instructions perform the additional operation of popping the FPU register stack after storing the result. To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1. (The no-operand version of the floating-point add instructions always results in the register stack being popped. In some assemblers, the mnemonic for this instruction is FADD rather than FADDP.)\nThe FIADD instructions convert an integer source operand to double extended-precision floating-point format before performing the addition.\nThe table on the following page shows the results obtained when adding various classes of numbers, assuming that neither overflow nor underflow occurs.\nWhen the sum of two operands with opposite signs is 0, the result is +0, except for the round toward −∞ mode, in which case the result is −0. When the source operand is an integer 0, it is treated as a +0.\nWhen both operand are infinities of the same sign, the result is ∞ of the expected sign. If both operands are infini-ties of opposite signs, an invalid-operation exception is generated. See Table 3-28.\nTable 3-28. 
FADD/FADDP/FIADD Results\nSRC (rows) / DEST (columns) | − ∞ | − F | − 0 | + 0 | + F | + ∞ | NaN\n− ∞ | − ∞ | − ∞ | − ∞ | − ∞ | − ∞ | * | NaN\n− F or − I | − ∞ | − F | SRC | SRC | ± F or ± 0 | + ∞ | NaN\n− 0 | − ∞ | DEST | − 0 | ± 0 | DEST | + ∞ | NaN\n+ 0 | − ∞ | DEST | ± 0 | + 0 | DEST | + ∞ | NaN\n+ F or + I | − ∞ | ± F or ± 0 | SRC | SRC | + F | + ∞ | NaN\n+ ∞ | * | + ∞ | + ∞ | + ∞ | + ∞ | + ∞ | NaN\nNaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN\nNOTES:\nF Means finite floating-point value.\nI Means integer.\n* Indicates floating-point invalid-arithmetic-operand (#IA) exception.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FBLD",
"Alias": [],
"Brief": "Load Binary Coded Decimal",
"Description": "\nConverts the BCD source operand into double extended-precision floating-point format and pushes the value onto the FPU stack. The source operand is loaded without rounding errors. The sign of the source operand is preserved, including that of −0.\nThe packed BCD digits are assumed to be in the range 0 through 9; the instruction does not check for invalid digits (AH through FH). Attempting to load an invalid encoding produces an undefined result.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FBSTP",
"Alias": [],
"Brief": "Store BCD Integer and Pop",
"Description": "\nConverts the value in the ST(0) register to an 18-digit packed BCD integer, stores the result in the destination operand, and pops the register stack. If the source value is a non-integral value, it is rounded to an integer value, according to rounding mode specified by the RC field of the FPU control word. To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1.\nThe destination operand specifies the address where the first byte destination value is to be stored. The BCD value (including its sign bit) requires 10 bytes of space in memory.\nThe following table shows the results obtained when storing various classes of numbers in packed BCD format.\nTable 3-29. FBSTP Results\n\n\nST(0)\nDEST\n\n− ∞ or Value Too Large for DEST Format\n*\n\nF ≤ − 1\n− D\n\n−1 < F < -0\n**\n\n− 0\n− 0\n\n+ 0\n+ 0\n\n+ 0 < F < +1\n**\n\nF ≥ +1\n+ D\n\n+ ∞ or Value Too Large for DEST Format\n*\n\nNaN\n*\nNOTES:\nF Means finite floating-point value.\nD Means packed-BCD number.\n*\nIndicates floating-point invalid-operation (#IA) exception.\n** ±0 or ±1, depending on the rounding mode.\nIf the converted value is too large for the destination format, or if the source operand is an ∞, SNaN, QNAN, or is in an unsupported format, an invalid-arithmetic-operand condition is signaled. If the invalid-operation exception is not masked, an invalid-arithmetic-operand exception (#IA) is generated and no value is stored in the destination operand. If the invalid-operation exception is masked, the packed BCD indefinite value is stored in memory.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FCHS",
"Alias": [],
"Brief": "Change Sign",
"Description": "\nComplements the sign bit of ST(0). This operation changes a positive value into a negative value of equal magni-tude or vice versa. The following table shows the results obtained when changing the sign of various classes of numbers.\nTable 3-30. FCHS Results\n\n\nST(0) SRC\nST(0) DEST\n\n− ∞\n+ ∞\n\n− F\n+ F\n\n− 0\n+ 0\n\n+ 0\n− 0\n\n+ F\n− F\n\n+ ∞\n− ∞\n\nNaN\nNaN\nNOTES:\n*\nF means finite floating-point value.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FCLEX",
"Alias": [
"FNCLEX"
],
"Brief": "Clear Exceptions",
"Description": "\nClears the floating-point exception flags (PE, UE, OE, ZE, DE, and IE), the exception summary status flag (ES), the stack fault flag (SF), and the busy flag (B) in the FPU status word. The FCLEX instruction checks for and handles any pending unmasked floating-point exceptions before clearing the exception flags; the FNCLEX instruction does not.\nThe assembler issues two instructions for the FCLEX instruction (an FWAIT instruction followed by an FNCLEX instruction), and the processor executes each of these instructions separately. If an exception is generated for either of these instructions, the save EIP points to the instruction that caused the exception.\n"
},
{
"Name": "FCMOVcc",
"Alias": [
"FCMOVA", "FCMOVAE", "FCMOVB", "FCMOVBE", "FCMOVC", "FCMOVE", "FCMOVG", "FCMOVGE", "FCMOVL", "FCMOVLE", "FCMOVNA", "FCMOVNAE", "FCMOVNB", "FCMOVNBE", "FCMOVNC", "FCMOVNE", "FCMOVNG", "FCMOVNGE", "FCMOVNL",
"FCMOVNLE", "FCMOVNO", "FCMOVNP", "FCMOVNS", "FCMOVNZ", "FCMOVO", "FCMOVP", "FCMOVPE", "FCMOVPO", "FCMOVS", "FCMOVZ"
],
"Brief": "Floating",
"Description": "\nTests the status flags in the EFLAGS register and moves the source operand (second operand) to the destination operand (first operand) if the given test condition is true. The condition for each mnemonic os given in the Descrip-tion column above and in Chapter 8 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1. The source operand is always in the ST(i) register and the destination operand is always ST(0).\nThe FCMOVcc instructions are useful for optimizing small IF constructions. They also help eliminate branching over-head for IF operations and the possibility of branch mispredictions by the processor.\nA processor may not support the FCMOVcc instructions. Software can check if the FCMOVcc instructions are supported by checking the processor’s feature information with the CPUID instruction (see “COMISS—Compare Scalar Ordered Single-Precision Floating-Point Values and Set EFLAGS” in this chapter). If both the CMOV and FPU feature bits are set, the FCMOVcc instructions are supported.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FCOM",
"Alias": [
"FCOMP",
"FCOMPP"
],
"Brief": "Compare Floating Point Values",
"Description": "\nCompares the contents of register ST(0) and source value and sets condition code flags C0, C2, and C3 in the FPU status word according to the results (see the table below). The source operand can be a data register or a memory location. If no source operand is given, the value in ST(0) is compared with the value in ST(1). The sign of zero is ignored, so that –0.0 is equal to +0.0.\nTable 3-31. FCOM/FCOMP/FCOMPP Results\n\n\nCondition\nC3\nC2\nC0\n\nST(0) > SRC\n0\n0\n0\n\nST(0) < SRC\n0\n0\n1\n\nST(0) = SRC\n1\n0\n0\n\nUnordered*\n1\n1\n1\nNOTES:\n*\nFlags not set if unmasked invalid-arithmetic-operand (#IA) exception is generated.\nThis instruction checks the class of the numbers being compared (see “FXAM—Examine ModR/M” in this chapter). If either operand is a NaN or is in an unsupported format, an invalid-arithmetic-operand exception (#IA) is raised and, if the exception is masked, the condition flags are set to “unordered.” If the invalid-arithmetic-operand excep-tion is unmasked, the condition code flags are not set.\nThe FCOMP instruction pops the register stack following the comparison operation and the FCOMPP instruction pops the register stack twice following the comparison operation. To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1.\nThe FCOM instructions perform the same operation as the FUCOM instructions. The only difference is how they handle QNaN operands. The FCOM instructions raise an invalid-arithmetic-operand exception (#IA) when either or both of the operands is a NaN value or is in an unsupported format. The FUCOM instructions perform the same operation as the FCOM instructions, except that they do not generate an invalid-arithmetic-operand exception for QNaNs.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FCOMI",
"Alias": [
"FCOMIP",
" FUCOMI",
"FUCOMIP"
],
"Brief": "Compare Floating Point Values and Set EFLAGS",
"Description": "\nPerforms an unordered comparison of the contents of registers ST(0) and ST(i) and sets the status flags ZF, PF, and CF in the EFLAGS register according to the results (see the table below). The sign of zero is ignored for compari-sons, so that –0.0 is equal to +0.0.\nTable 3-32. FCOMI/FCOMIP/ FUCOMI/FUCOMIP Results\n\n\nComparison Results*\nZF\nPF\nCF\n\nST0 > ST(i)\n0\n0\n0\n\nST0 < ST(i)\n0\n0\n1\n\nST0 = ST(i)\n1\n0\n0\n\nUnordered**\n1\n1\n1\nNOTES:\n*\nSee the IA-32 Architecture Compatibility section below.\n** Flags not set if unmasked invalid-arithmetic-operand (#IA) exception is generated.\nAn unordered comparison checks the class of the numbers being compared (see “FXAM—Examine ModR/M” in this chapter). The FUCOMI/FUCOMIP instructions perform the same operations as the FCOMI/FCOMIP instructions. The only difference is that the FUCOMI/FUCOMIP instructions raise the invalid-arithmetic-operand exception (#IA) only when either or both operands are an SNaN or are in an unsupported format; QNaNs cause the condition code flags to be set to unordered, but do not cause an exception to be generated. The FCOMI/FCOMIP instructions raise an invalid-operation exception when either or both of the operands are a NaN value of any kind or are in an unsup-ported format.\nIf the operation results in an invalid-arithmetic-operand exception being raised, the status flags in the EFLAGS register are set only if the exception is masked.\nThe FCOMI/FCOMIP and FUCOMI/FUCOMIP instructions set the OF, SF and AF flags to zero in the EFLAGS register (regardless of whether an invalid-operation exception is detected).\nThe FCOMIP and FUCOMIP instructions also pop the register stack following the comparison operation. To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FCOS",
"Alias": [],
"Brief": "Cosine",
"Description": "\nComputes the cosine of the source operand in register ST(0) and stores the result in ST(0). The source operand must be given in radians and must be within the range −263 to +263. The following table shows the results obtained when taking the cosine of various classes of numbers.\nTable 3-33. FCOS Results\n\n\nST(0) SRC\nST(0) DEST\n\n− ∞\n*\n\n− F\n−1 to +1\n\n− 0\n+ 1\n\n+ 0\n+ 1\n\n+ F\n− 1 to + 1\n\n+ ∞\n*\n\nNaN\nNaN\nNOTES:\nF Means finite floating-point value.\n*\nIndicates floating-point invalid-arithmetic-operand (#IA) exception.\nIf the source operand is outside the acceptable range, the C2 flag in the FPU status word is set, and the value in register ST(0) remains unchanged. The instruction does not raise an exception when the source operand is out of range. It is up to the program to check the C2 flag for out-of-range conditions. Source values outside the range − 263 to +263 can be reduced to the range of the instruction by subtracting an appropriate integer multiple of 2π or by using the FPREM instruction with a divisor of 2π. See the section titled “Pi” in Chapter 8 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for a discussion of the proper value to use for π in performing such reductions.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FDECSTP",
"Alias": [],
"Brief": "Decrement Stack",
"Description": "\nSubtracts one from the TOP field of the FPU status word (decrements the top-of-stack pointer). If the TOP field contains a 0, it is set to 7. The effect of this instruction is to rotate the stack by one position. The contents of the FPU data registers and tag register are not affected.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FDIV",
"Alias": [
"FDIVP",
"FIDIV"
],
"Brief": "Divide",
"Description": "\nDivides the destination operand by the source operand and stores the result in the destination location. The desti-nation operand (dividend) is always in an FPU register; the source operand (divisor) can be a register or a memory location. Source operands in memory can be in single-precision or double-precision floating-point format, word or doubleword integer format.\nThe no-operand version of the instruction divides the contents of the ST(1) register by the contents of the ST(0) register. The one-operand version divides the contents of the ST(0) register by the contents of a memory location (either a floating-point or an integer value). The two-operand version, divides the contents of the ST(0) register by the contents of the ST(i) register or vice versa.\nThe FDIVP instructions perform the additional operation of popping the FPU register stack after storing the result. To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1. The no-operand version of the floating-point divide instructions always results in the register stack being popped. In some assemblers, the mnemonic for this instruction is FDIV rather than FDIVP.\nThe FIDIV instructions convert an integer source operand to double extended-precision floating-point format before performing the division. When the source operand is an integer 0, it is treated as a +0.\nIf an unmasked divide-by-zero exception (#Z) is generated, no result is stored; if the exception is masked, an ∞ of the appropriate sign is stored in the destination operand.\nThe following table shows the results obtained when dividing various classes of numbers, assuming that neither overflow nor underflow occurs.\nTable 3-34. 
FDIV/FDIVP/FIDIV Results\nDEST\n− ∞\n− F\n− 0\n+ 0\n+ F\n+ ∞\nNaN\n− ∞\n+ 0\n+ 0\n− 0\n− 0\n*\n*\nNaN\n− F\n+ ∞\n+ F\n+ 0\n− 0\n− F\n− ∞\nNaN\n− I\n+ ∞\n+ F\n+ 0\n− 0\n− F\n− ∞\nNaN\n− 0\n+ ∞\n− ∞\nSRC\n**\n*\n*\n**\nNaN\n+ 0\n− ∞\n+ ∞\n**\n*\n*\n**\nNaN\n+ I\n− ∞\n− F\n− 0\n+ 0\n+ F\n+ ∞\nNaN\n+ F\n− ∞\n− F\n− 0\n+ 0\n+ F\n+ ∞\nNaN\n+ ∞\n− 0\n− 0\n+ 0\n+ 0\n*\n*\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNOTES:\nF Means finite floating-point value.\nI\nMeans integer.\n*\nIndicates floating-point invalid-arithmetic-operand (#IA) exception.\n** Indicates floating-point zero-divide (#Z) exception.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FDIVR",
"Alias": [
"FDIVRP",
"FIDIVR"
],
"Brief": "Reverse Divide",
"Description": "\nDivides the source operand by the destination operand and stores the result in the destination location. The desti-nation operand (divisor) is always in an FPU register; the source operand (dividend) can be a register or a memory location. Source operands in memory can be in single-precision or double-precision floating-point format, word or doubleword integer format.\nThese instructions perform the reverse operations of the FDIV, FDIVP, and FIDIV instructions. They are provided to support more efficient coding.\nThe no-operand version of the instruction divides the contents of the ST(0) register by the contents of the ST(1) register. The one-operand version divides the contents of a memory location (either a floating-point or an integer value) by the contents of the ST(0) register. The two-operand version, divides the contents of the ST(i) register by the contents of the ST(0) register or vice versa.\nThe FDIVRP instructions perform the additional operation of popping the FPU register stack after storing the result. To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1. The no-operand version of the floating-point divide instructions always results in the register stack being popped. In some assemblers, the mnemonic for this instruction is FDIVR rather than FDIVRP.\nThe FIDIVR instructions convert an integer source operand to double extended-precision floating-point format before performing the division.\nIf an unmasked divide-by-zero exception (#Z) is generated, no result is stored; if the exception is masked, an ∞ of the appropriate sign is stored in the destination operand.\nThe following table shows the results obtained when dividing various classes of numbers, assuming that neither overflow nor underflow occurs.\nTable 3-35. 
FDIVR/FDIVRP/FIDIVR Results\nDEST\nNaN\n− ∞\n− F\n− 0\n+ 0\n+ F\n+ ∞\n*\n*\nNaN\n− ∞\n+ ∞\n+ ∞\n− ∞\n− ∞\n**\n**\nNaN\n− F\n+ 0\n+ F\n− F\n− 0\nSRC\n**\n**\nNaN\n− I\n+ 0\n+ F\n− F\n− 0\n*\n*\nNaN\n− 0\n+ 0\n+ 0\n− 0\n− 0\n*\n*\nNaN\n+ 0\n− 0\n− 0\n+ 0\n+ 0\n**\n**\nNaN\n+ I\n− 0\n− F\n+ F\n+ 0\n**\n**\nNaN\n+ F\n− 0\n− F\n+ F\n+ 0\n*\n*\nNaN\n+ ∞\n− ∞\n− ∞\n+ ∞\n+ ∞\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNOTES:\nF Means finite floating-point value.\nI\nMeans integer.\n*\nIndicates floating-point invalid-arithmetic-operand (#IA) exception.\n** Indicates floating-point zero-divide (#Z) exception.\nWhen the source operand is an integer 0, it is treated as a +0. This instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FFREE",
"Alias": [],
"Brief": "Free Floating",
"Description": "\nSets the tag in the FPU tag register associated with register ST(i) to empty (11B). The contents of ST(i) and the FPU stack-top pointer (TOP) are not affected.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FICOM",
"Alias": [
"FICOMP"
],
"Brief": "Compare Integer",
"Description": "\nCompares the value in ST(0) with an integer source operand and sets the condition code flags C0, C2, and C3 in the FPU status word according to the results (see table below). The integer value is converted to double extended-precision floating-point format before the comparison is made.\nTable 3-36. FICOM/FICOMP Results\n\n\nCondition\nC3\nC2\nC0\n\nST(0) > SRC\n0\n0\n0\n\nST(0) < SRC\n0\n0\n1\n\nST(0) = SRC\n1\n0\n0\n\nUnordered\n1\n1\n1\nThese instructions perform an “unordered comparison.” An unordered comparison also checks the class of the numbers being compared (see “FXAM—Examine ModR/M” in this chapter). If either operand is a NaN or is in an undefined format, the condition flags are set to “unordered.”\nThe sign of zero is ignored, so that –0.0 ← +0.0.\nThe FICOMP instructions pop the register stack following the comparison. To pop the register stack, the processor marks the ST(0) register empty and increments the stack pointer (TOP) by 1.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FILD",
"Alias": [],
"Brief": "Load Integer",
"Description": "\nConverts the signed-integer source operand into double extended-precision floating-point format and pushes the value onto the FPU register stack. The source operand can be a word, doubleword, or quadword integer. It is loaded without rounding errors. The sign of the source operand is preserved.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FMUL",
"Alias": [
"FMULP",
"FIMUL"
],
"Brief": "Multiply",
"Description": "\nMultiplies the destination and source operands and stores the product in the destination location. The destination operand is always an FPU data register; the source operand can be an FPU data register or a memory location. Source operands in memory can be in single-precision or double-precision floating-point format or in word or doubleword integer format.\nThe no-operand version of the instruction multiplies the contents of the ST(1) register by the contents of the ST(0) register and stores the product in the ST(1) register. The one-operand version multiplies the contents of the ST(0) register by the contents of a memory location (either a floating point or an integer value) and stores the product in the ST(0) register. The two-operand version, multiplies the contents of the ST(0) register by the contents of the ST(i) register, or vice versa, with the result being stored in the register specified with the first operand (the desti-nation operand).\nThe FMULP instructions perform the additional operation of popping the FPU register stack after storing the product. To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1. The no-operand version of the floating-point multiply instructions always results in the register stack being popped. In some assemblers, the mnemonic for this instruction is FMUL rather than FMULP.\nThe FIMUL instructions convert an integer source operand to double extended-precision floating-point format before performing the multiplication.\nThe sign of the result is always the exclusive-OR of the source signs, even if one or more of the values being multi-plied is 0 or ∞. When the source operand is an integer 0, it is treated as a +0.\nThe following table shows the results obtained when multiplying various classes of numbers, assuming that neither overflow nor underflow occurs.\nTable 3-39. 
FMUL/FMULP/FIMUL Results\n\n\n\n\n\n\nDEST\n\n\n\n\n\n\n\n− ∞\n− F\n− 0\n+ 0\n+ F\n+ ∞\nNaN\n\n\n− ∞\n+ ∞\n+ ∞\n*\n*\n− ∞\n− ∞\nNaN\n\n\n− F\n+ ∞\n+ F\n+ 0\n− 0\n− F\n− ∞\nNaN\n\n\n− I\n+ ∞\n+ F\n+ 0\n− 0\n− F\n− ∞\nNaN\n\nSRC\n− 0\n*\n+ 0\n+ 0\n− 0\n− 0\n*\nNaN\n\n\n+ 0\n*\n− 0\n− 0\n+ 0\n+ 0\n*\nNaN\n\n\n+ I\n− ∞\n− F\n− 0\n+ 0\n+ F\n+ ∞\nNaN\n\n\n+ F\n− ∞\n− F\n− 0\n+ 0\n+ F\n+ ∞\nNaN\n\n\n+ ∞\n− ∞\n− ∞\n*\n*\n+ ∞\n+ ∞\nNaN\n\n\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNOTES:\nF Means finite floating-point value.\nI\nMeans Integer.\n*\nIndicates invalid-arithmetic-operand (#IA) exception.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FINCSTP",
"Alias": [],
"Brief": "Increment Stack",
"Description": "\nAdds one to the TOP field of the FPU status word (increments the top-of-stack pointer). If the TOP field contains a 7, it is set to 0. The effect of this instruction is to rotate the stack by one position. The contents of the FPU data registers and tag register are not affected. This operation is not equivalent to popping the stack, because the tag for the previous top-of-stack register is not marked empty.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FINIT",
"Alias": [
"FNINIT"
],
"Brief": "Initialize Floating",
"Description": "\nSets the FPU control, status, tag, instruction pointer, and data pointer registers to their default states. The FPU control word is set to 037FH (round to nearest, all exceptions masked, 64-bit precision). The status word is cleared (no exception flags set, TOP is set to 0). The data registers in the register stack are left unchanged, but they are all tagged as empty (11B). Both the instruction and data pointers are cleared.\nThe FINIT instruction checks for and handles any pending unmasked floating-point exceptions before performing the initialization; the FNINIT instruction does not.\nThe assembler issues two instructions for the FINIT instruction (an FWAIT instruction followed by an FNINIT instruction), and the processor executes each of these instructions in separately. If an exception is generated for either of these instructions, the save EIP points to the instruction that caused the exception.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FIST",
"Alias": [
"FISTP"
],
"Brief": "Store Integer",
"Description": "\nThe FIST instruction converts the value in the ST(0) register to a signed integer and stores the result in the desti-nation operand. Values can be stored in word or doubleword integer format. The destination operand specifies the address where the first byte of the destination value is to be stored.\nThe FISTP instruction performs the same operation as the FIST instruction and then pops the register stack. To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1. The FISTP instruction also stores values in quadword integer format.\nThe following table shows the results obtained when storing various classes of numbers in integer format.\nTable 3-37. FIST/FISTP Results\n\n\nST(0)\nDEST\n\n− ∞ or Value Too Large for DEST Format\n*\n\nF ≤ −1\n− I\n\n−1 < F < −0\n**\n\n− 0\n0\n\n+ 0\n0\n\n+ 0 < F < + 1\n**\n\nF ≥ + 1\n+ I\n\n+ ∞ or Value Too Large for DEST Format\n*\n\nNaN\n*\n\n\n\nNOTES:\nF Means finite floating-point value.\nI\nMeans integer.\n*\nIndicates floating-point invalid-operation (#IA) exception.\n** 0 or ±1, depending on the rounding mode.\nIf the source value is a non-integral value, it is rounded to an integer value, according to the rounding mode spec-ified by the RC field of the FPU control word.\nIf the converted value is too large for the destination format, or if the source operand is an ∞, SNaN, QNAN, or is in an unsupported format, an invalid-arithmetic-operand condition is signaled. If the invalid-operation exception is not masked, an invalid-arithmetic-operand exception (#IA) is generated and no value is stored in the destination operand. If the invalid-operation exception is masked, the integer indefinite value is stored in memory.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FISTTP",
"Alias": [],
"Brief": "Store Integer with Truncation",
"Description": "\nFISTTP converts the value in ST into a signed integer using truncation (chop) as rounding mode, transfers the result to the destination, and pop ST. FISTTP accepts word, short integer, and long integer destinations.\nThe following table shows the results obtained when storing various classes of numbers in integer format.\nTable 3-38. FISTTP Results\n\n\nST(0)\nDEST\n\n− ∞ or Value Too Large for DEST Format\n*\n\nF ≤ − 1\n− I\n\n− 1 < F < + 1\n0\n\nF Š + 1\n+ I\n\n+ ∞ or Value Too Large for DEST Format\n*\n\nNaN\n*\nNOTES:\nF Means finite floating-point value.\nΙ\nMeans integer.\n∗ Indicates floating-point invalid-operation (#IA) exception.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FSUB",
"Alias": [
"FSUBP",
"FISUB"
],
"Brief": "Subtract",
"Description": "\nSubtracts the source operand from the destination operand and stores the difference in the destination location. The destination operand is always an FPU data register; the source operand can be a register or a memory location. Source operands in memory can be in single-precision or double-precision floating-point format or in word or doubleword integer format.\nThe no-operand version of the instruction subtracts the contents of the ST(0) register from the ST(1) register and stores the result in ST(1). The one-operand version subtracts the contents of a memory location (either a floating-point or an integer value) from the contents of the ST(0) register and stores the result in ST(0). The two-operand version, subtracts the contents of the ST(0) register from the ST(i) register or vice versa.\nThe FSUBP instructions perform the additional operation of popping the FPU register stack following the subtrac-tion. To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1. The no-operand version of the floating-point subtract instructions always results in the register stack being popped. In some assemblers, the mnemonic for this instruction is FSUB rather than FSUBP.\nThe FISUB instructions convert an integer source operand to double extended-precision floating-point format before performing the subtraction.\nTable 3-48 shows the results obtained when subtracting various classes of numbers from one another, assuming that neither overflow nor underflow occurs. Here, the SRC value is subtracted from the DEST value (DEST − SRC = result).\nWhen the difference between two operands of like sign is 0, the result is +0, except for the round toward −∞ mode, in which case the result is −0. This instruction also guarantees that +0 − (−0) = +0, and that −0 − (+0) = −0. When the source operand is an integer 0, it is treated as a +0.\nWhen one operand is ∞, the result is ∞ of the expected sign. 
If both operands are ∞ of the same sign, an invalid-operation exception is generated.\nTable 3-48. FSUB/FSUBP/FISUB Results\n\n\n\n\n\n\nSRC\n\n\n\n\n\n\n\n− ∞\n− F or − I\n− 0\n+ 0\n+ F or + I\n+ ∞\nNaN\n\n\n− ∞\n*\n− ∞\n− ∞\n− ∞\n− ∞\n− ∞\nNaN\n\n\n− F\n+ ∞\n±F or ±0\nDEST\nDEST\n− F\n− ∞\nNaN\n\nDEST\n− 0\n+ ∞\n−SRC\n±0\n− 0\n− SRC\n− ∞\nNaN\n\n\n+ 0\n+ ∞\n−SRC\n+ 0\n±0\n− SRC\n− ∞\nNaN\n\n\n+ F\n+ ∞\n+ F\nDEST\nDEST\n±F or ±0\n− ∞\nNaN\n\n\n+ ∞\n+ ∞\n+ ∞\n+ ∞\n+ ∞\n+ ∞\n*\nNaN\n\n\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNOTES:\nF Means finite floating-point value.\nI\nMeans integer.\n*\nIndicates floating-point invalid-arithmetic-operand (#IA) exception.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FSUBR",
"Alias": [
"FSUBRP",
"FISUBR"
],
"Brief": "Reverse Subtract",
"Description": "\nSubtracts the destination operand from the source operand and stores the difference in the destination location. The destination operand is always an FPU register; the source operand can be a register or a memory location. Source operands in memory can be in single-precision or double-precision floating-point format or in word or doubleword integer format.\nThese instructions perform the reverse operations of the FSUB, FSUBP, and FISUB instructions. They are provided to support more efficient coding.\nThe no-operand version of the instruction subtracts the contents of the ST(1) register from the ST(0) register and stores the result in ST(1). The one-operand version subtracts the contents of the ST(0) register from the contents of a memory location (either a floating-point or an integer value) and stores the result in ST(0). The two-operand version, subtracts the contents of the ST(i) register from the ST(0) register or vice versa.\nThe FSUBRP instructions perform the additional operation of popping the FPU register stack following the subtrac-tion. To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1. The no-operand version of the floating-point reverse subtract instructions always results in the register stack being popped. In some assemblers, the mnemonic for this instruction is FSUBR rather than FSUBRP.\nThe FISUBR instructions convert an integer source operand to double extended-precision floating-point format before performing the subtraction.\nThe following table shows the results obtained when subtracting various classes of numbers from one another, assuming that neither overflow nor underflow occurs. Here, the DEST value is subtracted from the SRC value (SRC − DEST = result).\nWhen the difference between two operands of like sign is 0, the result is +0, except for the round toward −∞ mode, in which case the result is −0. 
This instruction also guarantees that +0 − (−0) = +0, and that −0 − (+0) = −0. When the source operand is an integer 0, it is treated as a +0.\nWhen one operand is ∞, the result is ∞ of the expected sign. If both operands are ∞ of the same sign, an invalid-operation exception is generated.\nTable 3-49. FSUBR/FSUBRP/FISUBR Results\n\n\n\n\n\n\nSRC\n\n\n\n\n\n\n\n− ∞\n−F or −I\n−0\n+0\n+F or +I\n+ ∞\nNaN\n\n\n− ∞\n*\n+ ∞\n+ ∞\n+ ∞\n+ ∞\n+ ∞\nNaN\n\n\n− F\n− ∞\n±F or ±0\n−DEST\n−DEST\n+ F\n+ ∞\nNaN\n\nDEST\n− 0\n− ∞\nSRC\n±0\n+ 0\nSRC\n+ ∞\nNaN\n\n\n+ 0\n− ∞\nSRC\n− 0\n±0\nSRC\n+ ∞\nNaN\n\n\n+ F\n− ∞\n− F\n−DEST\n−DEST\n±F or ±0\n+ ∞\nNaN\n\n\n+ ∞\n− ∞\n− ∞\n− ∞\n− ∞\n− ∞\n*\nNaN\n\n\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNOTES:\nF Means finite floating-point value.\nI\nMeans integer.\n*\nIndicates floating-point invalid-arithmetic-operand (#IA) exception.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FLD",
"Alias": [],
"Brief": "Load Floating Point Value",
"Description": "\nPushes the source operand onto the FPU register stack. The source operand can be in single-precision, double-precision, or double extended-precision floating-point format. If the source operand is in single-precision or double-precision floating-point format, it is automatically converted to the double extended-precision floating-point format before being pushed on the stack.\nThe FLD instruction can also push the value in a selected FPU register [ST(i)] onto the stack. Here, pushing register ST(0) duplicates the stack top.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FLD1",
"Alias": [
"FLDL2T",
"FLDL2E",
"FLDPI",
"FLDLG2",
"FLDLN2",
"FLDZ"
],
"Brief": "Load Constant",
"Description": "\nPush one of seven commonly used constants (in double extended-precision floating-point format) onto the FPU register stack. The constants that can be loaded with these instructions include +1.0, +0.0, log210, log2e, π, log102, and loge2. For each constant, an internal 66-bit constant is rounded (as specified by the RC field in the FPU control word) to double extended-precision floating-point format. The inexact-result exception (#P) is not generated as a result of the rounding, nor is the C1 flag set in the x87 FPU status word if the value is rounded up.\nSee the section titled “Pi” in Chapter 8 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for a description of the π constant.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FLDCW",
"Alias": [],
"Brief": "Load x87 FPU Control Word",
"Description": "\nLoads the 16-bit source operand into the FPU control word. The source operand is a memory location. This instruc-tion is typically used to establish or change the FPU’s mode of operation.\nIf one or more exception flags are set in the FPU status word prior to loading a new FPU control word and the new control word unmasks one or more of those exceptions, a floating-point exception will be generated upon execution of the next floating-point instruction (except for the no-wait floating-point instructions, see the section titled “Soft-ware Exception Handling” in Chapter 8 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1). To avoid raising exceptions when changing FPU operating modes, clear any pending exceptions (using the FCLEX or FNCLEX instruction) before loading the new control word.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FLDENV",
"Alias": [],
"Brief": "Load x87 FPU Environment",
"Description": "\nLoads the complete x87 FPU operating environment from memory into the FPU registers. The source operand spec-ifies the first byte of the operating-environment data in memory. This data is typically written to the specified memory location by a FSTENV or FNSTENV instruction.\nThe FPU operating environment consists of the FPU control word, status word, tag word, instruction pointer, data pointer, and last opcode. Figures 8-9 through 8-12 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, show the layout in memory of the loaded environment, depending on the operating mode of the processor (protected or real) and the current operand-size attribute (16-bit or 32-bit). In virtual-8086 mode, the real mode layouts are used.\nThe FLDENV instruction should be executed in the same operating mode as the corresponding FSTENV/FNSTENV instruction.\nIf one or more unmasked exception flags are set in the new FPU status word, a floating-point exception will be generated upon execution of the next floating-point instruction (except for the no-wait floating-point instructions, see the section titled “Software Exception Handling” in Chapter 8 of the Intel® 64 and IA-32 Architectures Soft-ware Developer’s Manual, Volume 1). To avoid generating exceptions when loading a new environment, clear all the exception flags in the FPU status word that is being loaded.\nIf a page or limit fault occurs during the execution of this instruction, the state of the x87 FPU registers as seen by the fault handler may be different than the state being loaded from memory. In such situations, the fault handler should ignore the status of the x87 FPU registers, handle the fault, and return. The FLDENV instruction will then complete the loading of the x87 FPU registers with no resulting context inconsistency.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FNOP",
"Alias": [],
"Brief": "No Operation",
"Description": "\nPerforms no FPU operation. This instruction takes up space in the instruction stream but does not affect the FPU or machine context, except the EIP register.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FSAVE",
"Alias": [
"FNSAVE"
],
"Brief": "Store x87 FPU State",
"Description": "\nStores the current FPU state (operating environment and register stack) at the specified destination in memory, and then re-initializes the FPU. The FSAVE instruction checks for and handles pending unmasked floating-point exceptions before storing the FPU state; the FNSAVE instruction does not.\nThe FPU operating environment consists of the FPU control word, status word, tag word, instruction pointer, data pointer, and last opcode. Figures 8-9 through 8-12 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, show the layout in memory of the stored environment, depending on the operating mode of the processor (protected or real) and the current operand-size attribute (16-bit or 32-bit). In virtual-8086 mode, the real mode layouts are used. The contents of the FPU register stack are stored in the 80 bytes immediately follow the operating environment image.\nThe saved image reflects the state of the FPU after all floating-point instructions preceding the FSAVE/FNSAVE instruction in the instruction stream have been executed.\nAfter the FPU state has been saved, the FPU is reset to the same default values it is set to with the FINIT/FNINIT instructions (see “FINIT/FNINIT—Initialize Floating-Point Unit” in this chapter).\nThe FSAVE/FNSAVE instructions are typically used when the operating system needs to perform a context switch, an exception handler needs to use the FPU, or an application program needs to pass a “clean” FPU to a procedure.\nThe assembler issues two instructions for the FSAVE instruction (an FWAIT instruction followed by an FNSAVE instruction), and the processor executes each of these instructions separately. If an exception is generated for either of these instructions, the save EIP points to the instruction that caused the exception.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FSTCW",
"Alias": [
"FNSTCW"
],
"Brief": "Store x87 FPU Control Word",
"Description": "\nStores the current value of the FPU control word at the specified destination in memory. The FSTCW instruction checks for and handles pending unmasked floating-point exceptions before storing the control word; the FNSTCW instruction does not.\nThe assembler issues two instructions for the FSTCW instruction (an FWAIT instruction followed by an FNSTCW instruction), and the processor executes each of these instructions in separately. If an exception is generated for either of these instructions, the save EIP points to the instruction that caused the exception.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FSTENV",
"Alias": [
"FNSTENV"
],
"Brief": "Store x87 FPU Environment",
"Description": "\nSaves the current FPU operating environment at the memory location specified with the destination operand, and then masks all floating-point exceptions. The FPU operating environment consists of the FPU control word, status word, tag word, instruction pointer, data pointer, and last opcode. Figures 8-9 through 8-12 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, show the layout in memory of the stored environ-ment, depending on the operating mode of the processor (protected or real) and the current operand-size attribute (16-bit or 32-bit). In virtual-8086 mode, the real mode layouts are used.\nThe FSTENV instruction checks for and handles any pending unmasked floating-point exceptions before storing the FPU environment; the FNSTENV instruction does not. The saved image reflects the state of the FPU after all floating-point instructions preceding the FSTENV/FNSTENV instruction in the instruction stream have been executed.\nThese instructions are often used by exception handlers because they provide access to the FPU instruction and data pointers. The environment is typically saved in the stack. Masking all exceptions after saving the environment prevents floating-point exceptions from interrupting the exception handler.\nThe assembler issues two instructions for the FSTENV instruction (an FWAIT instruction followed by an FNSTENV instruction), and the processor executes each of these instructions separately. If an exception is generated for either of these instructions, the save EIP points to the instruction that caused the exception.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FSTSW",
"Alias": [
"FNSTSW"
],
"Brief": "Store x87 FPU Status Word",
"Description": "\nStores the current value of the x87 FPU status word in the destination location. The destination operand can be either a two-byte memory location or the AX register. The FSTSW instruction checks for and handles pending unmasked floating-point exceptions before storing the status word; the FNSTSW instruction does not.\nThe FNSTSW AX form of the instruction is used primarily in conditional branching (for instance, after an FPU comparison instruction or an FPREM, FPREM1, or FXAM instruction), where the direction of the branch depends on the state of the FPU condition code flags. (See the section titled “Branching and Conditional Moves on FPU Condi-tion Codes” in Chapter 8 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1.) This instruction can also be used to invoke exception handlers (by examining the exception flags) in environments that do not use interrupts. When the FNSTSW AX instruction is executed, the AX register is updated before the processor executes any further instructions. The status stored in the AX register is thus guaranteed to be from the completion of the prior FPU instruction.\nThe assembler issues two instructions for the FSTSW instruction (an FWAIT instruction followed by an FNSTSW instruction), and the processor executes each of these instructions separately. If an exception is generated for either of these instructions, the save EIP points to the instruction that caused the exception.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FPATAN",
"Alias": [],
"Brief": "Partial Arctangent",
"Description": "\nComputes the arctangent of the source operand in register ST(1) divided by the source operand in register ST(0), stores the result in ST(1), and pops the FPU register stack. The result in register ST(0) has the same sign as the source operand ST(1) and a magnitude less than +π.\nThe FPATAN instruction returns the angle between the X axis and the line from the origin to the point (X,Y), where Y (the ordinate) is ST(1) and X (the abscissa) is ST(0). The angle depends on the sign of X and Y independently, not just on the sign of the ratio Y/X. This is because a point (−X,Y) is in the second quadrant, resulting in an angle between π/2 and π, while a point (X,−Y) is in the fourth quadrant, resulting in an angle between 0 and −π/2. A point (−X,−Y) is in the third quadrant, giving an angle between −π/2 and −π.\nThe following table shows the results obtained when computing the arctangent of various classes of numbers, assuming that underflow does not occur.\nTable 3-40. FPATAN Results\n\n\n\n\n\n\nST(0)\n\n\n\n\n\n\n\n− ∞\n− F\n− 0\n+ 0\n+ F\n+ ∞\nNaN\n\n\n− ∞\n− 3π/4*\n− π/2\n− π/2\n− π/2\n− π/2\n− π/4*\nNaN\n\nST(1)\n− F\n-p\n−π to −π/2\n−π/2\n−π/2\n−π/2 to −0\n- 0\nNaN\n\n\n− 0\n-p\n-p\n-p*\n− 0*\n− 0\n− 0\nNaN\n\n\n+ 0\n+p\n+ p\n+ π*\n+ 0*\n+ 0\n+ 0\nNaN\n\n\n+ F\n+p\n+π to +π/2\n+ π/2\n+π/2\n+π/2 to +0\n+ 0\nNaN\n\n\n+ ∞\n+3π/4*\n+π/2\n+π/2\n+π/2\n+ π/2\n+ π/4*\nNaN\n\n\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNOTES:\nF Means finite floating-point value.\n*\nTable 8-10 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, specifies that the ratios 0/0 and ∞/∞ generate the floating-point invalid arithmetic-operation exception and, if this exception is masked, the floating-point QNaN indefi-nite value is returned. With the FPATAN instruction, the 0/0 or ∞/∞ value is actually not calculated using division. 
Instead, the arctangent of the two variables is derived from a standard mathematical formulation that is generalized to allow complex numbers as arguments. In this complex variable formulation, arctangent(0,0) and the similar cases have well-defined values. These values are needed to develop a library to compute transcendental functions with complex arguments, based on the FPU functions that only allow floating-point values as arguments.\nThere is no restriction on the range of source operands that FPATAN can accept.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FPREM",
"Alias": [],
"Brief": "Partial Remainder",
"Description": "\nComputes the remainder obtained from dividing the value in the ST(0) register (the dividend) by the value in the ST(1) register (the divisor or modulus), and stores the result in ST(0). The remainder represents the following value:\nRemainder ← ST(0) − (Q ∗ ST(1))\nHere, Q is an integer value that is obtained by truncating the floating-point number quotient of [ST(0) / ST(1)] toward zero. The sign of the remainder is the same as the sign of the dividend. The magnitude of the remainder is less than that of the modulus, unless a partial remainder was computed (as described below).\nThis instruction produces an exact result; the inexact-result exception does not occur and the rounding control has no effect. The following table shows the results obtained when computing the remainder of various classes of numbers, assuming that underflow does not occur.\nTable 3-41. FPREM Results\n\n\n\n\n\n\nST(1)\n\n\n\n\n\n\n\n-∞\n-F\n-0\n+0\n+F\n+∞\nNaN\n\n\n-∞\n*\n*\n*\n*\n*\n*\nNaN\n\nST(0)\n-F\nST(0)\n-F or -0\n**\n**\n-F or -0\nST(0)\nNaN\n\n\n-0\n-0\n-0\n*\n*\n-0\n-0\nNaN\n\n\n+0\n+0\n+0\n*\n*\n+0\n+0\nNaN\n\n\n+F\nST(0)\n+F or +0\n**\n**\n+F or +0\nST(0)\nNaN\n\n\n+∞\n*\n*\n*\n*\n*\n*\nNaN\n\n\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNOTES:\nF Means finite floating-point value.\n*\nIndicates floating-point invalid-arithmetic-operand (#IA) exception.\n** Indicates floating-point zero-divide (#Z) exception.\nWhen the result is 0, its sign is the same as that of the dividend. When the modulus is ∞, the result is equal to the value in ST(0).\nThe FPREM instruction does not compute the remainder specified in IEEE Std 754. The IEEE specified remainder can be computed with the FPREM1 instruction. The FPREM instruction is provided for compatibility with the Intel 8087 and Intel287 math coprocessors.\nThe FPREM instruction gets its name “partial remainder” because of the way it computes the remainder. 
This instruction arrives at a remainder through iterative subtraction. It can, however, reduce the exponent of ST(0) by no more than 63 in one execution of the instruction. If the instruction succeeds in producing a remainder that is less than the modulus, the operation is complete and the C2 flag in the FPU status word is cleared. Otherwise, C2 is set, and the result in ST(0) is called the partial remainder. The exponent of the partial remainder will be less than the exponent of the original dividend by at least 32. Software can re-execute the instruction (using the partial remainder in ST(0) as the dividend) until C2 is cleared. (Note that while executing such a remainder-computation loop, a higher-priority interrupting routine that needs the FPU can force a context switch between the instructions in the loop.)\nAn important use of the FPREM instruction is to reduce the arguments of periodic functions. When reduction is complete, the instruction stores the three least-significant bits of the quotient in the C3, C1, and C0 flags of the FPU status word. This information is important in argument reduction for the tangent function (using a modulus of π/4), because it locates the original angle in the correct one of eight sectors of the unit circle.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FPREM1",
"Alias": [],
"Brief": "Partial Remainder",
"Description": "\nComputes the IEEE remainder obtained from dividing the value in the ST(0) register (the dividend) by the value in the ST(1) register (the divisor or modulus), and stores the result in ST(0). The remainder represents the following value:\nRemainder ← ST(0) − (Q ∗ ST(1))\nHere, Q is an integer value that is obtained by rounding the floating-point number quotient of [ST(0) / ST(1)] toward the nearest integer value. The magnitude of the remainder is less than or equal to half the magnitude of the modulus, unless a partial remainder was computed (as described below).\nThis instruction produces an exact result; the precision (inexact) exception does not occur and the rounding control has no effect. The following table shows the results obtained when computing the remainder of various classes of numbers, assuming that underflow does not occur.\nTable 3-42. FPREM1 Results\n\n\n\n\n\n\nST(1)\n\n\n\n\n\n\n\n− ∞\n− F\n− 0\n+ 0\n+ F\n+ ∞\nNaN\n\n\n− ∞\n*\n*\n*\n*\n*\n*\nNaN\n\nST(0)\n− F\nST(0)\n±F or −0\n**\n**\n± F or − 0\nST(0)\nNaN\n\n\n− 0\n− 0\n− 0\n*\n*\n− 0\n-0\nNaN\n\n\n+ 0\n+ 0\n+ 0\n*\n*\n+ 0\n+0\nNaN\n\n\n+ F\nST(0)\n± F or + 0\n**\n**\n± F or + 0\nST(0)\nNaN\n\n\n+ ∞\n*\n*\n*\n*\n*\n*\nNaN\n\n\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNOTES:\nF Means finite floating-point value.\n*\nIndicates floating-point invalid-arithmetic-operand (#IA) exception.\n** Indicates floating-point zero-divide (#Z) exception.\nWhen the result is 0, its sign is the same as that of the dividend. When the modulus is ∞, the result is equal to the value in ST(0).\nThe FPREM1 instruction computes the remainder specified in IEEE Standard 754. 
This instruction operates differently from the FPREM instruction in the way that it rounds the quotient of ST(0) divided by ST(1) to an integer (see the “Operation” section below).\nLike the FPREM instruction, FPREM1 computes the remainder through iterative subtraction, but can reduce the exponent of ST(0) by no more than 63 in one execution of the instruction. If the instruction succeeds in producing a remainder that is less than one half the modulus, the operation is complete and the C2 flag in the FPU status word is cleared. Otherwise, C2 is set, and the result in ST(0) is called the partial remainder. The exponent of the partial remainder will be less than the exponent of the original dividend by at least 32. Software can re-execute the instruction (using the partial remainder in ST(0) as the dividend) until C2 is cleared. (Note that while executing such a remainder-computation loop, a higher-priority interrupting routine that needs the FPU can force a context switch between the instructions in the loop.)\nAn important use of the FPREM1 instruction is to reduce the arguments of periodic functions. When reduction is complete, the instruction stores the three least-significant bits of the quotient in the C3, C1, and C0 flags of the FPU status word. This information is important in argument reduction for the tangent function (using a modulus of π/4), because it locates the original angle in the correct one of eight sectors of the unit circle.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FPTAN",
"Alias": [],
"Brief": "Partial Tangent",
"Description": "\nComputes the tangent of the source operand in register ST(0), stores the result in ST(0), and pushes a 1.0 onto the FPU register stack. The source operand must be given in radians and must be less than ±263. The following table shows the unmasked results obtained when computing the partial tangent of various classes of numbers, assuming that underflow does not occur.\nTable 3-43. FPTAN Results\n\n\nST(0) SRC\nST(0) DEST\n\n− ∞\n*\n\n− F\n− F to + F\n\n− 0\n- 0\n\n+ 0\n+ 0\n\n+ F\n− F to + F\n\n+ ∞\n*\n\nNaN\nNaN\nNOTES:\nF Means finite floating-point value.\n*\nIndicates floating-point invalid-arithmetic-operand (#IA) exception.\nIf the source operand is outside the acceptable range, the C2 flag in the FPU status word is set, and the value in register ST(0) remains unchanged. The instruction does not raise an exception when the source operand is out of range. It is up to the program to check the C2 flag for out-of-range conditions. Source values outside the range − 263 to +263 can be reduced to the range of the instruction by subtracting an appropriate integer multiple of 2π or by using the FPREM instruction with a divisor of 2π. See the section titled “Pi” in Chapter 8 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for a discussion of the proper value to use for π in performing such reductions.\nThe value 1.0 is pushed onto the register stack after the tangent has been computed to maintain compatibility with the Intel 8087 and Intel287 math coprocessors. This operation also simplifies the calculation of other trigonometric functions. For instance, the cotangent (which is the reciprocal of the tangent) can be computed by executing a FDIVR instruction after the FPTAN instruction.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FRNDINT",
"Alias": [],
"Brief": "Round to Integer",
"Description": "\nRounds the source value in the ST(0) register to the nearest integral value, depending on the current rounding mode (setting of the RC field of the FPU control word), and stores the result in ST(0).\nIf the source value is ∞, the value is not changed. If the source value is not an integral value, the floating-point inexact-result exception (#P) is generated.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FRSTOR",
"Alias": [],
"Brief": "Restore x87 FPU State",
"Description": "\nLoads the FPU state (operating environment and register stack) from the memory area specified with the source operand. This state data is typically written to the specified memory location by a previous FSAVE/FNSAVE instruc-tion.\nThe FPU operating environment consists of the FPU control word, status word, tag word, instruction pointer, data pointer, and last opcode. Figures 8-9 through 8-12 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, show the layout in memory of the stored environment, depending on the operating mode of the processor (protected or real) and the current operand-size attribute (16-bit or 32-bit). In virtual-8086 mode, the real mode layouts are used. The contents of the FPU register stack are stored in the 80 bytes immediately following the operating environment image.\nThe FRSTOR instruction should be executed in the same operating mode as the corresponding FSAVE/FNSAVE instruction.\nIf one or more unmasked exception bits are set in the new FPU status word, a floating-point exception will be generated. To avoid raising exceptions when loading a new operating environment, clear all the exception flags in the FPU status word that is being loaded.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FSCALE",
"Alias": [],
"Brief": "Scale",
"Description": "\nTruncates the value in the source operand (toward 0) to an integral value and adds that value to the exponent of the destination operand. The destination and source operands are floating-point values located in registers ST(0) and ST(1), respectively. This instruction provides rapid multiplication or division by integral powers of 2. The following table shows the results obtained when scaling various classes of numbers, assuming that neither overflow nor underflow occurs.\nTable 3-44. FSCALE Results\n\n\nST(1)\nST(1)\n\n\n\n\n\n\n\n\n\n\n− ∞\n− F\n− 0\n+ 0\n+ F\n+ ∞\nNaN\n\n\n− ∞\nNaN\n− ∞\n− ∞\n− ∞\n− ∞\n− ∞\nNaN\n\nST(0)\n− F\n− 0\n− F\n− F\n− F\n− F\n− ∞\nNaN\n\n\n− 0\n− 0\n− 0\n− 0\n− 0\n− 0\nNaN\nNaN\n\n\n+ 0\n+ 0\n+ 0\n+ 0\n+ 0\n+ 0\nNaN\nNaN\n\n\n+ F\n+ 0\n+ F\n+ F\n+ F\n+ F\n+ ∞\nNaN\n\n\n+ ∞\nNaN\n+ ∞\n+ ∞\n+ ∞\n+ ∞\n+ ∞\nNaN\n\n\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNaN\nNOTES:\nF Means finite floating-point value.\nIn most cases, only the exponent is changed and the mantissa (significand) remains unchanged. However, when the value being scaled in ST(0) is a denormal value, the mantissa is also changed and the result may turn out to be a normalized number. Similarly, if overflow or underflow results from a scale operation, the resulting mantissa will differ from the source’s mantissa.\nThe FSCALE instruction can also be used to reverse the action of the FXTRACT instruction, as shown in the following example:\nFXTRACT;\nFSCALE;\nFSTP ST(1);\nIn this example, the FXTRACT instruction extracts the significand and exponent from the value in ST(0) and stores them in ST(0) and ST(1) respectively. The FSCALE then scales the significand in ST(0) by the exponent in ST(1), recreating the original value before the FXTRACT operation was performed. 
The FSTP ST(1) instruction overwrites the exponent (extracted by the FXTRACT instruction) with the recreated value, which returns the stack to its original state with only one register [ST(0)] occupied.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FSIN",
"Alias": [],
"Brief": "Sine",
"Description": "\nComputes the sine of the source operand in register ST(0) and stores the result in ST(0). The source operand must be given in radians and must be within the range −263 to +263. The following table shows the results obtained when taking the sine of various classes of numbers, assuming that underflow does not occur.\nTable 3-45. FSIN Results\n\n\nSRC (ST(0))\nDEST (ST(0))\n\n− ∞\n*\n\n− F\n− 1 to + 1\n\n− 0\n−0\n\n+ 0\n+ 0\n\n+ F\n− 1 to +1\n\n+ ∞\n*\n\nNaN\nNaN\nNOTES:\nF Means finite floating-point value.\n*\nIndicates floating-point invalid-arithmetic-operand (#IA) exception.\nIf the source operand is outside the acceptable range, the C2 flag in the FPU status word is set, and the value in register ST(0) remains unchanged. The instruction does not raise an exception when the source operand is out of range. It is up to the program to check the C2 flag for out-of-range conditions. Source values outside the range − 263 to +263 can be reduced to the range of the instruction by subtracting an appropriate integer multiple of 2π or by using the FPREM instruction with a divisor of 2π. See the section titled “Pi” in Chapter 8 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for a discussion of the proper value to use for π in performing such reductions.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FSINCOS",
"Alias": [],
"Brief": "Sine and Cosine",
"Description": "\nComputes both the sine and the cosine of the source operand in register ST(0), stores the sine in ST(0), and pushes the cosine onto the top of the FPU register stack. (This instruction is faster than executing the FSIN and FCOS instructions in succession.)\nThe source operand must be given in radians and must be within the range −263 to +263. The following table shows the results obtained when taking the sine and cosine of various classes of numbers, assuming that underflow does not occur.\nTable 3-46. FSINCOS Results\nSRC\nDEST\n\n\n\n\n\n\nST(0)\nST(1) Cosine\nST(0) Sine\n\n− ∞\n*\n*\n\n− F\n− 1 to + 1\n− 1 to + 1\n\n− 0\n+ 1\n− 0\n\n+ 0\n+ 1\n+ 0\n\n+ F\n− 1 to + 1\n− 1 to + 1\n\n+ ∞\n*\n*\n\nNaN\nNaN\nNaN\nNOTES:\nF Means finite floating-point value.\n*\nIndicates floating-point invalid-arithmetic-operand (#IA) exception.\nIf the source operand is outside the acceptable range, the C2 flag in the FPU status word is set, and the value in register ST(0) remains unchanged. The instruction does not raise an exception when the source operand is out of range. It is up to the program to check the C2 flag for out-of-range conditions. Source values outside the range − 263 to +263 can be reduced to the range of the instruction by subtracting an appropriate integer multiple of 2π or by using the FPREM instruction with a divisor of 2π. See the section titled “Pi” in Chapter 8 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for a discussion of the proper value to use for π in performing such reductions.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FSQRT",
"Alias": [],
"Brief": "Square Root",
"Description": "\nComputes the square root of the source value in the ST(0) register and stores the result in ST(0).\nThe following table shows the results obtained when taking the square root of various classes of numbers, assuming that neither overflow nor underflow occurs.\nTable 3-47. FSQRT Results\n\n\nSRC (ST(0))\nDEST (ST(0))\n\n− ∞\n*\n\n− F\n*\n\n− 0\n− 0\n\n+ 0\n+ 0\n\n+ F\n+ F\n\n+ ∞\n+ ∞\n\nNaN\nNaN\nNOTES:\nF Means finite floating-point value.\n*\nIndicates floating-point invalid-arithmetic-operand (#IA) exception.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FST",
"Alias": [
"FSTP"
],
"Brief": "Store Floating Point Value",
"Description": "\nThe FST instruction copies the value in the ST(0) register to the destination operand, which can be a memory loca-tion or another register in the FPU register stack. When storing the value in memory, the value is converted to single-precision or double-precision floating-point format.\nThe FSTP instruction performs the same operation as the FST instruction and then pops the register stack. To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1. The FSTP instruction can also store values in memory in double extended-precision floating-point format.\nIf the destination operand is a memory location, the operand specifies the address where the first byte of the desti-nation value is to be stored. If the destination operand is a register, the operand specifies a register in the register stack relative to the top of the stack.\nIf the destination size is single-precision or double-precision, the significand of the value being stored is rounded to the width of the destination (according to the rounding mode specified by the RC field of the FPU control word), and the exponent is converted to the width and bias of the destination format. If the value being stored is too large for the destination format, a numeric overflow exception (#O) is generated and, if the exception is unmasked, no value is stored in the destination operand. If the value being stored is a denormal value, the denormal exception (#D) is not generated. This condition is simply signaled as a numeric underflow exception (#U) condition.\nIf the value being stored is ±0, ±∞, or a NaN, the least-significant bits of the significand and the exponent are trun-cated to fit the destination format. 
This operation preserves the value’s identity as a 0, ∞, or NaN.\nIf the destination operand is a non-empty register, the invalid-operation exception is not generated.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FTST",
"Alias": [],
"Brief": "TEST",
"Description": "\nCompares the value in the ST(0) register with 0.0 and sets the condition code flags C0, C2, and C3 in the FPU status word according to the results (see table below).\nTable 3-50. FTST Results\n\n\nCondition\nC3\nC2\nC0\n\nST(0) > 0.0\n0\n0\n0\n\nST(0) < 0.0\n0\n0\n1\n\nST(0) = 0.0\n1\n0\n0\n\nUnordered\n1\n1\n1\nThis instruction performs an “unordered comparison.” An unordered comparison also checks the class of the numbers being compared (see “FXAM—Examine ModR/M” in this chapter). If the value in register ST(0) is a NaN or is in an undefined format, the condition flags are set to “unordered” and the invalid operation exception is gener-ated.\nThe sign of zero is ignored, so that (– 0.0 ← +0.0).\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FUCOM",
"Alias": [
"FUCOMP",
"FUCOMPP"
],
"Brief": "Unordered Compare Floating Point Values",
"Description": "\nPerforms an unordered comparison of the contents of register ST(0) and ST(i) and sets condition code flags C0, C2, and C3 in the FPU status word according to the results (see the table below). If no operand is specified, the contents of registers ST(0) and ST(1) are compared. The sign of zero is ignored, so that –0.0 is equal to +0.0.\nTable 3-51. FUCOM/FUCOMP/FUCOMPP Results\n\n\nComparison Results*\nC3\nC2\nC0\n\nST0 > ST(i)\n0\n0\n0\n\nST0 < ST(i)\n0\n0\n1\n\nST0 = ST(i)\n1\n0\n0\n\nUnordered\n1\n1\n1\nNOTES:\n*\nFlags not set if unmasked invalid-arithmetic-operand (#IA) exception is generated.\nAn unordered comparison checks the class of the numbers being compared (see “FXAM—Examine ModR/M” in this chapter). The FUCOM/FUCOMP/FUCOMPP instructions perform the same operations as the FCOM/FCOMP/FCOMPP instructions. The only difference is that the FUCOM/FUCOMP/FUCOMPP instructions raise the invalid-arithmetic-operand exception (#IA) only when either or both operands are an SNaN or are in an unsupported format; QNaNs cause the condition code flags to be set to unordered, but do not cause an exception to be generated. The FCOM/FCOMP/FCOMPP instructions raise an invalid-operation exception when either or both of the operands are a NaN value of any kind or are in an unsupported format.\nAs with the FCOM/FCOMP/FCOMPP instructions, if the operation results in an invalid-arithmetic-operand exception being raised, the condition code flags are set only if the exception is masked.\nThe FUCOMP instruction pops the register stack following the comparison operation and the FUCOMPP instruction pops the register stack twice following the comparison operation. To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "WAIT",
"Alias": [
"FWAIT"
],
"Brief": "Wait",
"Description": "\nCauses the processor to check for and handle pending, unmasked, floating-point exceptions before proceeding. (FWAIT is an alternate mnemonic for WAIT.)\nThis instruction is useful for synchronizing exceptions in critical sections of code. Coding a WAIT instruction after a floating-point instruction ensures that any unmasked floating-point exceptions the instruction may raise are handled before the processor can modify the instruction’s results. See the section titled “Floating-Point Exception Synchronization” in Chapter 8 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for more information on using the WAIT/FWAIT instruction.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FXAM",
"Alias": [],
"Brief": "Examine ModR/M",
"Description": "\nExamines the contents of the ST(0) register and sets the condition code flags C0, C2, and C3 in the FPU status word to indicate the class of value or number in the register (see the table below).\nTable 3-52. FXAM Results\n.\n\n\nClass\nC3\nC2\nC0\n\nUnsupported\n0\n0\n0\n\nNaN\n0\n0\n1\n\nNormal finite number\n0\n1\n0\n\nInfinity\n0\n1\n1\n\nZero\n1\n0\n0\n\nEmpty\n1\n0\n1\n\nDenormal number\n1\n1\n0\nThe C1 flag is set to the sign of the value in ST(0), regardless of whether the register is empty or full.\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FXCH",
"Alias": [],
"Brief": "Exchange Register Contents",
"Description": "\nExchanges the contents of registers ST(0) and ST(i). If no source operand is specified, the contents of ST(0) and ST(1) are exchanged.\nThis instruction provides a simple means of moving values in the FPU register stack to the top of the stack [ST(0)], so that they can be operated on by those floating-point instructions that can only operate on values in ST(0). For example, the following instruction sequence takes the square root of the third register from the top of the register stack:\nFXCH ST(3);\nFSQRT;\nFXCH ST(3);\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FXRSTOR",
"Alias": [],
"Brief": "Restore x87 FPU, MMX, XMM, and MXCSR State",
"Description": "\nReloads the x87 FPU, MMX technology, XMM, and MXCSR registers from the 512-byte memory image specified in the source operand. This data should have been written to memory previously using the FXSAVE instruction, and in the same format as required by the operating modes. The first byte of the data should be located on a 16-byte boundary. There are three distinct layouts of the FXSAVE state map: one for legacy and compatibility mode, a second for 64-bit mode FXSAVE/FXRSTOR with REX.W=0, and a third for 64-bit mode with FXSAVE64/FXRSTOR64. Table 3-53 shows the layout of the legacy/compatibility mode state information in memory and describes the fields in the memory image for the FXRSTOR and FXSAVE instructions. Table 3-56 shows the layout of the 64-bit mode state information when REX.W is set (FXSAVE64/FXRSTOR64). Table 3-57 shows the layout of the 64-bit mode state information when REX.W is clear (FXSAVE/FXRSTOR).\nThe state image referenced with an FXRSTOR instruction must have been saved using an FXSAVE instruction or be in the same format as required by Table 3-53, Table 3-56, or Table 3-57. Referencing a state image saved with an FSAVE or FNSAVE instruction, or one with an incompatible field layout, will result in an incorrect state restoration.\nThe FXRSTOR instruction does not flush pending x87 FPU exceptions. To check and raise exceptions when loading x87 FPU state information with the FXRSTOR instruction, use an FWAIT instruction after the FXRSTOR instruction.\nIf the OSFXSR bit in control register CR4 is not set, the FXRSTOR instruction may not restore the states of the XMM and MXCSR registers. This behavior is implementation dependent.\nIf the MXCSR state contains an unmasked exception with a corresponding status flag also set, loading the register with the FXRSTOR instruction will not result in a SIMD floating-point error condition being generated. Only the next occurrence of this unmasked exception will result in the exception being generated.\nBits 16 through 31 of the MXCSR register are defined as reserved and should be set to 0. Attempting to write a 1 in any of these bits from the saved state image will result in a general protection exception (#GP) being generated.\nBytes 464:511 of an FXSAVE image are available for software use. FXRSTOR ignores the content of bytes 464:511 in an FXSAVE state image.\n"
},
{
"Name": "FXSAVE",
"Alias": [],
"Brief": "Save x87 FPU, MMX Technology, and SSE State",
"Description": "\nSaves the current state of the x87 FPU, MMX technology, XMM, and MXCSR registers to a 512-byte memory location specified in the destination operand. The content layout of the 512-byte region depends on whether the processor is operating in non-64-bit operating modes or 64-bit sub-mode of IA-32e mode.\nBytes 464:511 are available for software use. The processor does not write to bytes 464:511 of an FXSAVE area.\nThe operation of FXSAVE in non-64-bit modes is described first.\n"
},
{
"Name": "FXTRACT",
"Alias": [],
"Brief": "Extract Exponent and Significand",
"Description": "\nSeparates the source value in the ST(0) register into its exponent and significand, stores the exponent in ST(0), and pushes the significand onto the register stack. Following this operation, the new top-of-stack register ST(0) contains the value of the original significand expressed as a floating-point value. The sign and significand of this value are the same as those found in the source operand, and the exponent is 3FFFH (biased value for a true exponent of zero). The ST(1) register contains the value of the original operand’s true (unbiased) exponent expressed as a floating-point value. (The operation performed by this instruction is a superset of the IEEE-recommended logb(x) function.)\nThis instruction and the F2XM1 instruction are useful for performing power and range scaling operations. The FXTRACT instruction is also useful for converting numbers in double extended-precision floating-point format to decimal representations (e.g., for printing or displaying).\nIf the floating-point zero-divide exception (#Z) is masked and the source operand is zero, an exponent value of −∞ is stored in register ST(1) and 0 with the sign of the source operand is stored in register ST(0).\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{
"Name": "FYL2X",
"Alias": [],
"Brief": "Compute y ∗ log2(x)",
"Description": "\nComputes (ST(1) ∗ log2 (ST(0))), stores the result in register ST(1), and pops the FPU register stack. The source operand in ST(0) must be a non-zero positive number.\nThe following table shows the results obtained when taking the log of various classes of numbers, assuming that neither overflow nor underflow occurs.\nTable 3-58. FYL2X Results\nRows give the value in ST(1); columns give the value in ST(0).\nST(1) | −∞ | −F | ±0 | +0<+F<+1 | +1 | +F>+1 | +∞ | NaN\n−∞ | * | * | +∞ | +∞ | * | −∞ | −∞ | NaN\n−F | * | * | ** | +F | −0 | −F | −∞ | NaN\n−0 | * | * | * | +0 | −0 | −0 | * | NaN\n+0 | * | * | * | −0 | +0 | +0 | * | NaN\n+F | * | * | ** | −F | +0 | +F | +∞ | NaN\n+∞ | * | * | −∞ | −∞ | * | +∞ | +∞ | NaN\nNaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN\nNOTES:\nF Means finite floating-point value.\n* Indicates floating-point invalid-operation (#IA) exception.\n** Indicates floating-point zero-divide (#Z) exception.\nIf the divide-by-zero exception is masked and register ST(0) contains ±0, the instruction returns ∞ with a sign that is the opposite of the sign of the source operand in register ST(1).\nThe FYL2X instruction is designed with a built-in multiplication to optimize the calculation of logarithms with an arbitrary positive base (b):\nlog_b(x) ← (log2(b))^(−1) ∗ log2(x)\nThis instruction’s operation is the same in non-64-bit modes and 64-bit mode.\n"
},
{