Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 1: Basic Architecture

✍ Scribed by coll.

Year: 2019
Tongue: English
Leaves: 4922
Category: Library

✦ Table of Contents

Volume 1: Basic Architecture
Chapter 1 About This Manual
1.1 Intel® 64 and IA-32 Processors Covered in this Manual
1.2 Overview of Volume 1: Basic Architecture
1.3 Notational Conventions
1.3.1 Bit and Byte Order
1.3.2 Reserved Bits and Software Compatibility
1.3.2.1 Instruction Operands
1.3.3 Hexadecimal and Binary Numbers
1.3.4 Segmented Addressing
1.3.5 A New Syntax for CPUID, CR, and MSR Values
1.3.6 Exceptions
1.4 Related Literature
Chapter 2 Intel® 64 and IA-32 Architectures
2.1 Brief History of Intel® 64 and IA-32 Architecture
2.1.1 16-bit Processors and Segmentation (1978)
2.1.2 The Intel® 286 Processor (1982)
2.1.3 The Intel386™ Processor (1985)
2.1.4 The Intel486™ Processor (1989)
2.1.5 The Intel® Pentium® Processor (1993)
2.1.6 The P6 Family of Processors (1995-1999)
2.1.7 The Intel® Pentium® 4 Processor Family (2000-2006)
2.1.8 The Intel® Xeon® Processor (2001-2007)
2.1.9 The Intel® Pentium® M Processor (2003-2006)
2.1.10 The Intel® Pentium® Processor Extreme Edition (2005)
2.1.11 The Intel® Core™ Duo and Intel® Core™ Solo Processors (2006-2007)
2.1.12 The Intel® Xeon® Processor 5100, 5300 Series and Intel® Core™2 Processor Family (2006)
2.1.13 The Intel® Xeon® Processor 5200, 5400, 7400 Series and Intel® Core™2 Processor Family (2007)
2.1.14 The Intel® Atom™ Processor Family (2008)
2.1.15 The Intel® Atom™ Processor Family Based on Silvermont Microarchitecture (2013)
2.1.16 The Intel® Core™ i7 Processor Family (2008)
2.1.17 The Intel® Xeon® Processor 7500 Series (2010)
2.1.18 2010 Intel® Core™ Processor Family (2010)
2.1.19 The Intel® Xeon® Processor 5600 Series (2010)
2.1.20 The Second Generation Intel® Core™ Processor Family (2011)
2.1.21 The Third Generation Intel® Core™ Processor Family (2012)
2.1.22 The Fourth Generation Intel® Core™ Processor Family (2013)
2.2 More on Specific Advances
2.2.1 P6 Family Microarchitecture
2.2.2 Intel NetBurst® Microarchitecture
2.2.2.1 The Front End Pipeline
2.2.2.2 Out-Of-Order Execution Core
2.2.2.3 Retirement Unit
2.2.3 Intel® Core™ Microarchitecture
2.2.3.1 The Front End
2.2.3.2 Execution Core
2.2.4 Intel® Atom™ Microarchitecture
2.2.5 Intel® Microarchitecture Code Name Nehalem
2.2.6 Intel® Microarchitecture Code Name Sandy Bridge
2.2.7 SIMD Instructions
2.2.8 Intel® Hyper-Threading Technology
2.2.8.1 Some Implementation Notes
2.2.9 Multi-Core Technology
2.2.10 Intel® 64 Architecture
2.2.11 Intel® Virtualization Technology (Intel® VT)
2.3 Intel® 64 and IA-32 Processor Generations
2.4 Proposed Removal of Intel Instruction Set Architecture and Features from Upcoming Products
2.5 Intel Instruction Set Architecture and Features Removed
Chapter 3 Basic Execution Environment
3.1 Modes of Operation
3.1.1 Intel® 64 Architecture
3.2 Overview of the Basic Execution Environment
3.2.1 64-Bit Mode Execution Environment
3.3 Memory Organization
3.3.1 IA-32 Memory Models
3.3.2 Paging and Virtual Memory
3.3.3 Memory Organization in 64-Bit Mode
3.3.4 Modes of Operation vs. Memory Model
3.3.5 32-Bit and 16-Bit Address and Operand Sizes
3.3.6 Extended Physical Addressing in Protected Mode
3.3.7 Address Calculations in 64-Bit Mode
3.3.7.1 Canonical Addressing
3.4 Basic Program Execution Registers
3.4.1 General-Purpose Registers
3.4.1.1 General-Purpose Registers in 64-Bit Mode
3.4.2 Segment Registers
3.4.2.1 Segment Registers in 64-Bit Mode
3.4.3 EFLAGS Register
3.4.3.1 Status Flags
3.4.3.2 DF Flag
3.4.3.3 System Flags and IOPL Field
3.4.3.4 RFLAGS Register in 64-Bit Mode
3.5 Instruction Pointer
3.5.1 Instruction Pointer in 64-Bit Mode
3.6 Operand-Size and Address-Size Attributes
3.6.1 Operand Size and Address Size in 64-Bit Mode
3.7 Operand Addressing
3.7.1 Immediate Operands
3.7.2 Register Operands
3.7.2.1 Register Operands in 64-Bit Mode
3.7.3 Memory Operands
3.7.3.1 Memory Operands in 64-Bit Mode
3.7.4 Specifying a Segment Selector
3.7.4.1 Segmentation in 64-Bit Mode
3.7.5 Specifying an Offset
3.7.5.1 Specifying an Offset in 64-Bit Mode
3.7.6 Assembler and Compiler Addressing Modes
3.7.7 I/O Port Addressing
Chapter 4 Data Types
4.1 Fundamental Data Types
4.1.1 Alignment of Words, Doublewords, Quadwords, and Double Quadwords
4.2 Numeric Data Types
4.2.1 Integers
4.2.1.1 Unsigned Integers
4.2.1.2 Signed Integers
4.2.2 Floating-Point Data Types
4.3 Pointer Data Types
4.3.1 Pointer Data Types in 64-Bit Mode
4.4 Bit Field Data Type
4.5 String Data Types
4.6 Packed SIMD Data Types
4.6.1 64-Bit SIMD Packed Data Types
4.6.2 128-Bit Packed SIMD Data Types
4.7 BCD and Packed BCD Integers
4.8 Real Numbers and Floating-Point Formats
4.8.1 Real Number System
4.8.2 Floating-Point Format
4.8.2.1 Normalized Numbers
4.8.2.2 Biased Exponent
4.8.3 Real Number and Non-number Encodings
4.8.3.1 Signed Zeros
4.8.3.2 Normalized and Denormalized Finite Numbers
4.8.3.3 Signed Infinities
4.8.3.4 NaNs
4.8.3.5 Operating on SNaNs and QNaNs
4.8.3.6 Using SNaNs and QNaNs in Applications
4.8.3.7 QNaN Floating-Point Indefinite
4.8.3.8 Half-Precision Floating-Point Operation
4.8.4 Rounding
4.8.4.1 Rounding Control (RC) Fields
4.8.4.2 Truncation with SSE and SSE2 Conversion Instructions
4.9 Overview of Floating-Point Exceptions
4.9.1 Floating-Point Exception Conditions
4.9.1.1 Invalid Operation Exception (#I)
4.9.1.2 Denormal Operand Exception (#D)
4.9.1.3 Divide-By-Zero Exception (#Z)
4.9.1.4 Numeric Overflow Exception (#O)
4.9.1.5 Numeric Underflow Exception (#U)
4.9.1.6 Inexact-Result (Precision) Exception (#P)
4.9.2 Floating-Point Exception Priority
4.9.3 Typical Actions of a Floating-Point Exception Handler
Chapter 5 Instruction Set Summary
5.1 General-Purpose Instructions
5.1.1 Data Transfer Instructions
5.1.2 Binary Arithmetic Instructions
5.1.3 Decimal Arithmetic Instructions
5.1.4 Logical Instructions
5.1.5 Shift and Rotate Instructions
5.1.6 Bit and Byte Instructions
5.1.7 Control Transfer Instructions
5.1.8 String Instructions
5.1.9 I/O Instructions
5.1.10 Enter and Leave Instructions
5.1.11 Flag Control (EFLAG) Instructions
5.1.12 Segment Register Instructions
5.1.13 Miscellaneous Instructions
5.1.14 User Mode Extended State Save/Restore Instructions
5.1.15 Random Number Generator Instructions
5.1.16 BMI1, BMI2
5.1.16.1 Detection of VEX-encoded GPR Instructions, LZCNT and TZCNT, PREFETCHW
5.2 x87 FPU Instructions
5.2.1 x87 FPU Data Transfer Instructions
5.2.2 x87 FPU Basic Arithmetic Instructions
5.2.3 x87 FPU Comparison Instructions
5.2.4 x87 FPU Transcendental Instructions
5.2.5 x87 FPU Load Constants Instructions
5.2.6 x87 FPU Control Instructions
5.3 x87 FPU and SIMD State Management Instructions
5.4 MMX™ Instructions
5.4.1 MMX Data Transfer Instructions
5.4.2 MMX Conversion Instructions
5.4.3 MMX Packed Arithmetic Instructions
5.4.4 MMX Comparison Instructions
5.4.5 MMX Logical Instructions
5.4.6 MMX Shift and Rotate Instructions
5.4.7 MMX State Management Instructions
5.5 SSE Instructions
5.5.1 SSE SIMD Single-Precision Floating-Point Instructions
5.5.1.1 SSE Data Transfer Instructions
5.5.1.2 SSE Packed Arithmetic Instructions
5.5.1.3 SSE Comparison Instructions
5.5.1.4 SSE Logical Instructions
5.5.1.5 SSE Shuffle and Unpack Instructions
5.5.1.6 SSE Conversion Instructions
5.5.2 SSE MXCSR State Management Instructions
5.5.3 SSE 64-Bit SIMD Integer Instructions
5.5.4 SSE Cacheability Control, Prefetch, and Instruction Ordering Instructions
5.6 SSE2 Instructions
5.6.1 SSE2 Packed and Scalar Double-Precision Floating-Point Instructions
5.6.1.1 SSE2 Data Movement Instructions
5.6.1.2 SSE2 Packed Arithmetic Instructions
5.6.1.3 SSE2 Logical Instructions
5.6.1.4 SSE2 Compare Instructions
5.6.1.5 SSE2 Shuffle and Unpack Instructions
5.6.1.6 SSE2 Conversion Instructions
5.6.2 SSE2 Packed Single-Precision Floating-Point Instructions
5.6.3 SSE2 128-Bit SIMD Integer Instructions
5.6.4 SSE2 Cacheability Control and Ordering Instructions
5.7 SSE3 Instructions
5.7.1 SSE3 x87-FP Integer Conversion Instruction
5.7.2 SSE3 Specialized 128-bit Unaligned Data Load Instruction
5.7.3 SSE3 SIMD Floating-Point Packed ADD/SUB Instructions
5.7.4 SSE3 SIMD Floating-Point Horizontal ADD/SUB Instructions
5.7.5 SSE3 SIMD Floating-Point LOAD/MOVE/DUPLICATE Instructions
5.7.6 SSE3 Agent Synchronization Instructions
5.8 Supplemental Streaming SIMD Extensions 3 (SSSE3) Instructions
5.8.1 Horizontal Addition/Subtraction
5.8.2 Packed Absolute Values
5.8.3 Multiply and Add Packed Signed and Unsigned Bytes
5.8.4 Packed Multiply High with Round and Scale
5.8.5 Packed Shuffle Bytes
5.8.6 Packed Sign
5.8.7 Packed Align Right
5.9 SSE4 Instructions
5.10 SSE4.1 Instructions
5.10.1 Dword Multiply Instructions
5.10.2 Floating-Point Dot Product Instructions
5.10.3 Streaming Load Hint Instruction
5.10.4 Packed Blending Instructions
5.10.5 Packed Integer MIN/MAX Instructions
5.10.6 Floating-Point Round Instructions with Selectable Rounding Mode
5.10.7 Insertion and Extractions from XMM Registers
5.10.8 Packed Integer Format Conversions
5.10.9 Improved Sums of Absolute Differences (SAD) for 4-Byte Blocks
5.10.10 Horizontal Search
5.10.11 Packed Test
5.10.12 Packed Qword Equality Comparisons
5.10.13 Dword Packing With Unsigned Saturation
5.11 SSE4.2 Instruction Set
5.11.1 String and Text Processing Instructions
5.11.2 Packed Comparison SIMD Integer Instruction
5.12 AESNI and PCLMULQDQ
5.13 Intel® Advanced Vector Extensions (Intel® AVX)
5.14 16-bit Floating-Point Conversion
5.15 Fused-Multiply-ADD (FMA)
5.16 Intel® Advanced Vector Extensions 2 (Intel® AVX2)
5.17 Intel® Transactional Synchronization Extensions (Intel® TSX)
5.18 Intel® SHA Extensions
5.19 Intel® Advanced Vector Extensions 512 (Intel® AVX-512)
5.20 System Instructions
5.21 64-Bit Mode Instructions
5.22 Virtual-Machine Extensions
5.23 Safer Mode Extensions
5.24 Intel® Memory Protection Extensions
5.25 Intel® Software Guard Extensions
Chapter 6 Procedure Calls, Interrupts, and Exceptions
6.1 Procedure Call Types
6.2 Stacks
6.2.1 Setting Up a Stack
6.2.2 Stack Alignment
6.2.3 Address-Size Attributes for Stack Accesses
6.2.4 Procedure Linking Information
6.2.4.1 Stack-Frame Base Pointer
6.2.4.2 Return Instruction Pointer
6.2.5 Stack Behavior in 64-Bit Mode
6.3 Calling Procedures Using CALL and RET
6.3.1 Near CALL and RET Operation
6.3.2 Far CALL and RET Operation
6.3.3 Parameter Passing
6.3.3.1 Passing Parameters Through the General-Purpose Registers
6.3.3.2 Passing Parameters on the Stack
6.3.3.3 Passing Parameters in an Argument List
6.3.4 Saving Procedure State Information
6.3.5 Calls to Other Privilege Levels
6.3.6 CALL and RET Operation Between Privilege Levels
6.3.7 Branch Functions in 64-Bit Mode
6.4 Interrupts and Exceptions
6.4.1 Call and Return Operation for Interrupt or Exception Handling Procedures
6.4.2 Calls to Interrupt or Exception Handler Tasks
6.4.3 Interrupt and Exception Handling in Real-Address Mode
6.4.4 INT n, INTO, INT3, INT1, and BOUND Instructions
6.4.5 Handling Floating-Point Exceptions
6.4.6 Interrupt and Exception Behavior in 64-Bit Mode
6.5 Procedure Calls for Block-Structured Languages
6.5.1 ENTER Instruction
6.5.2 LEAVE Instruction
Chapter 7 Programming With General-Purpose Instructions
7.1 Programming Environment for GP Instructions
7.2 Programming Environment for GP Instructions in 64-Bit Mode
7.3 Summary of GP Instructions
7.3.1 Data Transfer Instructions
7.3.1.1 General Data Movement Instructions
7.3.1.2 Exchange Instructions
7.3.1.3 Exchange Instructions in 64-Bit Mode
7.3.1.4 Stack Manipulation Instructions
7.3.1.5 Stack Manipulation Instructions in 64-Bit Mode
7.3.1.6 Type Conversion Instructions
7.3.1.7 Type Conversion Instructions in 64-Bit Mode
7.3.2 Binary Arithmetic Instructions
7.3.2.1 Addition and Subtraction Instructions
7.3.2.2 Increment and Decrement Instructions
7.3.2.3 Increment and Decrement Instructions in 64-Bit Mode
7.3.2.4 Comparison and Sign Change Instructions
7.3.2.5 Multiplication and Division Instructions
7.3.3 Decimal Arithmetic Instructions
7.3.3.1 Packed BCD Adjustment Instructions
7.3.3.2 Unpacked BCD Adjustment Instructions
7.3.4 Decimal Arithmetic Instructions in 64-Bit Mode
7.3.5 Logical Instructions
7.3.6 Shift and Rotate Instructions
7.3.6.1 Shift Instructions
7.3.6.2 Double-Shift Instructions
7.3.6.3 Rotate Instructions
7.3.7 Bit and Byte Instructions
7.3.7.1 Bit Test and Modify Instructions
7.3.7.2 Bit Scan Instructions
7.3.7.3 Byte Set on Condition Instructions
7.3.7.4 Test Instruction
7.3.8 Control Transfer Instructions
7.3.8.1 Unconditional Transfer Instructions
7.3.8.2 Conditional Transfer Instructions
7.3.8.3 Control Transfer Instructions in 64-Bit Mode
7.3.8.4 Software Interrupt Instructions
7.3.8.5 Software Interrupt Instructions in 64-bit Mode and Compatibility Mode
7.3.9 String Operations
7.3.9.1 String Instructions
7.3.9.2 Repeated String Operations
7.3.9.3 Fast-String Operation
7.3.9.4 String Operations in 64-Bit Mode
7.3.10 I/O Instructions
7.3.11 I/O Instructions in 64-Bit Mode
7.3.12 Enter and Leave Instructions
7.3.13 Flag Control (EFLAG) Instructions
7.3.13.1 Carry and Direction Flag Instructions
7.3.13.2 EFLAGS Transfer Instructions
7.3.13.3 Interrupt Flag Instructions
7.3.14 Flag Control (RFLAG) Instructions in 64-Bit Mode
7.3.15 Segment Register Instructions
7.3.15.1 Segment-Register Load and Store Instructions
7.3.15.2 Far Control Transfer Instructions
7.3.15.3 Software Interrupt Instructions
7.3.15.4 Load Far Pointer Instructions
7.3.16 Miscellaneous Instructions
7.3.16.1 Address Computation Instruction
7.3.16.2 Table Lookup Instructions
7.3.16.3 Processor Identification Instruction
7.3.16.4 No-Operation and Undefined Instructions
7.3.17 Random Number Generator Instructions
7.3.17.1 RDRAND
7.3.17.2 RDSEED
Chapter 8 Programming with the x87 FPU
8.1 x87 FPU Execution Environment
8.1.1 x87 FPU in 64-Bit Mode and Compatibility Mode
8.1.2 x87 FPU Data Registers
8.1.2.1 Parameter Passing With the x87 FPU Register Stack
8.1.3 x87 FPU Status Register
8.1.3.1 Top of Stack (TOP) Pointer
8.1.3.2 Condition Code Flags
8.1.3.3 x87 FPU Floating-Point Exception Flags
8.1.3.4 Stack Fault Flag
8.1.4 Branching and Conditional Moves on Condition Codes
8.1.5 x87 FPU Control Word
8.1.5.1 x87 FPU Floating-Point Exception Mask Bits
8.1.5.2 Precision Control Field
8.1.5.3 Rounding Control Field
8.1.6 Infinity Control Flag
8.1.7 x87 FPU Tag Word
8.1.8 x87 FPU Instruction and Data (Operand) Pointers
8.1.9 Last Instruction Opcode
8.1.9.1 Fopcode Compatibility Sub-mode
8.1.10 Saving the x87 FPU’s State with FSTENV/FNSTENV and FSAVE/FNSAVE
8.1.11 Saving the x87 FPU’s State with FXSAVE
8.2 x87 FPU Data Types
8.2.1 Indefinites
8.2.2 Unsupported Double Extended-Precision Floating-Point Encodings and Pseudo-Denormals
8.3 x87 FPU Instruction Set
8.3.1 Escape (ESC) Instructions
8.3.2 x87 FPU Instruction Operands
8.3.3 Data Transfer Instructions
8.3.4 Load Constant Instructions
8.3.5 Basic Arithmetic Instructions
8.3.6 Comparison and Classification Instructions
8.3.6.1 Branching on the x87 FPU Condition Codes
8.3.7 Trigonometric Instructions
8.3.8 Approximation of Pi
8.3.9 Logarithmic, Exponential, and Scale
8.3.10 Transcendental Instruction Accuracy
8.3.11 x87 FPU Control Instructions
8.3.12 Waiting vs. Non-waiting Instructions
8.3.13 Unsupported x87 FPU Instructions
8.4 x87 FPU Floating-Point Exception Handling
8.4.1 Arithmetic vs. Non-arithmetic Instructions
8.5 x87 FPU Floating-Point Exception Conditions
8.5.1 Invalid Operation Exception
8.5.1.1 Stack Overflow or Underflow Exception (#IS)
8.5.1.2 Invalid Arithmetic Operand Exception (#IA)
8.5.2 Denormal Operand Exception (#D)
8.5.3 Divide-By-Zero Exception (#Z)
8.5.4 Numeric Overflow Exception (#O)
8.5.5 Numeric Underflow Exception (#U)
8.5.6 Inexact-Result (Precision) Exception (#P)
8.6 x87 FPU Exception Synchronization
8.7 Handling x87 FPU Exceptions in Software
8.7.1 Native Mode
8.7.2 MS-DOS Compatibility Sub-mode
8.7.3 Handling x87 FPU Exceptions in Software
Chapter 9 Programming with Intel® MMX™ Technology
9.1 Overview of MMX Technology
9.2 The MMX Technology Programming Environment
9.2.1 MMX Technology in 64-Bit Mode and Compatibility Mode
9.2.2 MMX Registers
9.2.3 MMX Data Types
9.2.4 Memory Data Formats
9.2.5 Single Instruction, Multiple Data (SIMD) Execution Model
9.3 Saturation and Wraparound Modes
9.4 MMX Instructions
9.4.1 Data Transfer Instructions
9.4.2 Arithmetic Instructions
9.4.3 Comparison Instructions
9.4.4 Conversion Instructions
9.4.5 Unpack Instructions
9.4.6 Logical Instructions
9.4.7 Shift Instructions
9.4.8 EMMS Instruction
9.5 Compatibility with x87 FPU Architecture
9.5.1 MMX Instructions and the x87 FPU Tag Word
9.6 Writing Applications with MMX Code
9.6.1 Checking for MMX Technology Support
9.6.2 Transitions Between x87 FPU and MMX Code
9.6.3 Using the EMMS Instruction
9.6.4 Mixing MMX and x87 FPU Instructions
9.6.5 Interfacing with MMX Code
9.6.6 Using MMX Code in a Multitasking Operating System Environment
9.6.7 Exception Handling in MMX Code
9.6.8 Register Mapping
9.6.9 Effect of Instruction Prefixes on MMX Instructions
Chapter 10 Programming with Intel® Streaming SIMD Extensions (Intel® SSE)
10.1 Overview of SSE Extensions
10.2 SSE Programming Environment
10.2.1 SSE in 64-Bit Mode and Compatibility Mode
10.2.2 XMM Registers
10.2.3 MXCSR Control and Status Register
10.2.3.1 SIMD Floating-Point Mask and Flag Bits
10.2.3.2 SIMD Floating-Point Rounding Control Field
10.2.3.3 Flush-To-Zero
10.2.3.4 Denormals-Are-Zeros
10.2.4 Compatibility of SSE Extensions with SSE2/SSE3/MMX and the x87 FPU
10.3 SSE Data Types
10.4 SSE Instruction Set
10.4.1 SSE Packed and Scalar Floating-Point Instructions
10.4.1.1 SSE Data Movement Instructions
10.4.1.2 SSE Arithmetic Instructions
10.4.2 SSE Logical Instructions
10.4.2.1 SSE Comparison Instructions
10.4.2.2 SSE Shuffle and Unpack Instructions
10.4.3 SSE Conversion Instructions
10.4.4 SSE 64-Bit SIMD Integer Instructions
10.4.5 MXCSR State Management Instructions
10.4.6 Cacheability Control, Prefetch, and Memory Ordering Instructions
10.4.6.1 Cacheability Control Instructions
10.4.6.2 Caching of Temporal vs. Non-Temporal Data
10.4.6.3 PREFETCHh Instructions
10.4.6.4 SFENCE Instruction
10.5 FXSAVE and FXRSTOR Instructions
10.5.1 FXSAVE Area
10.5.1.1 x87 State
10.5.1.2 SSE State
10.5.2 Operation of FXSAVE
10.5.3 Operation of FXRSTOR
10.6 Handling SSE Instruction Exceptions
10.7 Writing Applications with the SSE Extensions
Chapter 11 Programming with Intel® Streaming SIMD Extensions 2 (Intel® SSE2)
11.1 Overview of SSE2 Extensions
11.2 SSE2 Programming Environment
11.2.1 SSE2 in 64-Bit Mode and Compatibility Mode
11.2.2 Compatibility of SSE2 Extensions with SSE, MMX Technology and x87 FPU Programming Environment
11.2.3 Denormals-Are-Zeros Flag
11.3 SSE2 Data Types
11.4 SSE2 Instructions
11.4.1 Packed and Scalar Double-Precision Floating-Point Instructions
11.4.1.1 Data Movement Instructions
11.4.1.2 SSE2 Arithmetic Instructions
11.4.1.3 SSE2 Logical Instructions
11.4.1.4 SSE2 Comparison Instructions
11.4.1.5 SSE2 Shuffle and Unpack Instructions
11.4.1.6 SSE2 Conversion Instructions
11.4.2 SSE2 64-Bit and 128-Bit SIMD Integer Instructions
11.4.3 128-Bit SIMD Integer Instruction Extensions
11.4.4 Cacheability Control and Memory Ordering Instructions
11.4.4.1 FLUSH Cache Line
11.4.4.2 Cacheability Control Instructions
11.4.4.3 Memory Ordering Instructions
11.4.4.4 Pause
11.4.5 Branch Hints
11.5 SSE, SSE2, and SSE3 Exceptions
11.5.1 SIMD Floating-Point Exceptions
11.5.2 SIMD Floating-Point Exception Conditions
11.5.2.1 Invalid Operation Exception (#I)
11.5.2.2 Denormal-Operand Exception (#D)
11.5.2.3 Divide-By-Zero Exception (#Z)
11.5.2.4 Numeric Overflow Exception (#O)
11.5.2.5 Numeric Underflow Exception (#U)
11.5.2.6 Inexact-Result (Precision) Exception (#P)
11.5.3 Generating SIMD Floating-Point Exceptions
11.5.3.1 Handling Masked Exceptions
11.5.3.2 Handling Unmasked Exceptions
11.5.3.3 Handling Combinations of Masked and Unmasked Exceptions
11.5.4 Handling SIMD Floating-Point Exceptions in Software
11.5.5 Interaction of SIMD and x87 FPU Floating-Point Exceptions
11.6 Writing Applications with SSE/SSE2 Extensions
11.6.1 General Guidelines for Using SSE/SSE2 Extensions
11.6.2 Checking for SSE/SSE2 Support
11.6.3 Checking for the DAZ Flag in the MXCSR Register
11.6.4 Initialization of SSE/SSE2 Extensions
11.6.5 Saving and Restoring the SSE/SSE2 State
11.6.6 Guidelines for Writing to the MXCSR Register
11.6.7 Interaction of SSE/SSE2 Instructions with x87 FPU and MMX Instructions
11.6.8 Compatibility of SIMD and x87 FPU Floating-Point Data Types
11.6.9 Mixing Packed and Scalar Floating-Point and 128-Bit SIMD Integer Instructions and Data
11.6.10 Interfacing with SSE/SSE2 Procedures and Functions
11.6.10.1 Passing Parameters in XMM Registers
11.6.10.2 Saving XMM Register State on a Procedure or Function Call
11.6.10.3 Caller-Save Recommendation for Procedure and Function Calls
11.6.11 Updating Existing MMX Technology Routines Using 128-Bit SIMD Integer Instructions
11.6.12 Branching on Arithmetic Operations
11.6.13 Cacheability Hint Instructions
11.6.14 Effect of Instruction Prefixes on the SSE/SSE2 Instructions
Chapter 12 Programming with Intel® SSE3, SSSE3, Intel® SSE4 and Intel® AESNI
12.1 Programming Environment and Data Types
12.1.1 SSE3, SSSE3, SSE4 in 64-Bit Mode and Compatibility Mode
12.1.2 Compatibility of SSE3/SSSE3 with MMX Technology, the x87 FPU Environment, and SSE/SSE2 Extensions
12.1.3 Horizontal and Asymmetric Processing
12.2 Overview of SSE3 Instructions
12.3 SSE3 Instructions
12.3.1 x87 FPU Instruction for Integer Conversion
12.3.2 SIMD Integer Instruction for Specialized 128-bit Unaligned Data Load
12.3.3 SIMD Floating-Point Instructions That Enhance LOAD/MOVE/DUPLICATE Performance
12.3.4 SIMD Floating-Point Instructions Provide Packed Addition/Subtraction
12.3.5 SIMD Floating-Point Instructions Provide Horizontal Addition/Subtraction
12.3.6 Two Thread Synchronization Instructions
12.4 Writing Applications with SSE3 Extensions
12.4.1 Guidelines for Using SSE3 Extensions
12.4.2 Checking for SSE3 Support
12.4.3 Enable FTZ and DAZ for SIMD Floating-Point Computation
12.4.4 Programming SSE3 with SSE/SSE2 Extensions
12.5 Overview of SSSE3 Instructions
12.6 SSSE3 Instructions
12.6.1 Horizontal Addition/Subtraction
12.6.2 Packed Absolute Values
12.6.3 Multiply and Add Packed Signed and Unsigned Bytes
12.6.4 Packed Multiply High with Round and Scale
12.6.5 Packed Shuffle Bytes
12.6.6 Packed Sign
12.6.7 Packed Align Right
12.7 Writing Applications with SSSE3 Extensions
12.7.1 Guidelines for Using SSSE3 Extensions
12.7.2 Checking for SSSE3 Support
12.8 SSE3/SSSE3 And SSE4 Exceptions
12.8.1 Device Not Available (DNA) Exceptions
12.8.2 Numeric Error Flag and IGNNE#
12.8.3 Emulation
12.8.4 IEEE 754 Compliance of SSE4.1 Floating-Point Instructions
12.9 SSE4 Overview
12.10 SSE4.1 Instruction Set
12.10.1 Dword Multiply Instructions
12.10.2 Floating-Point Dot Product Instructions
12.10.3 Streaming Load Hint Instruction
12.10.4 Packed Blending Instructions
12.10.5 Packed Integer MIN/MAX Instructions
12.10.6 Floating-Point Round Instructions with Selectable Rounding Mode
12.10.7 Insertion and Extractions from XMM Registers
12.10.8 Packed Integer Format Conversions
12.10.9 Improved Sums of Absolute Differences (SAD) for 4-Byte Blocks
12.10.10 Horizontal Search
12.10.11 Packed Test
12.10.12 Packed Qword Equality Comparisons
12.10.13 Dword Packing With Unsigned Saturation
12.11 SSE4.2 Instruction Set
12.11.1 String and Text Processing Instructions
12.11.1.1 Memory Operand Alignment
12.11.2 Packed Comparison SIMD Integer Instruction
12.12 Writing Applications with SSE4 Extensions
12.12.1 Guidelines for Using SSE4 Extensions
12.12.2 Checking for SSE4.1 Support
12.12.3 Checking for SSE4.2 Support
12.13 AESNI Overview
12.13.1 Little-Endian Architecture and Big-Endian Specification (FIPS 197)
12.13.1.1 AES Data Structure in Intel 64 Architecture
12.13.2 AES Transformations and Functions
12.13.3 PCLMULQDQ
12.13.4 Checking for AESNI Support
Chapter 13 Managing State Using the XSAVE Feature Set
13.1 XSAVE-Supported Features and State-Component Bitmaps
13.2 Enumeration of CPU Support for XSAVE Instructions and XSAVE-Supported Features
13.3 Enabling the XSAVE Feature Set and XSAVE-Enabled Features
13.4 XSAVE Area
13.4.1 Legacy Region of an XSAVE Area
13.4.2 XSAVE Header
13.4.3 Extended Region of an XSAVE Area
13.5 XSAVE-Managed State
13.5.1 x87 State
13.5.2 SSE State
13.5.3 AVX State
13.5.4 MPX State
13.5.5 AVX-512 State
13.5.6 PT State
13.5.7 PKRU State
13.5.8 HDC State
13.6 Processor Tracking of XSAVE-Managed State
13.7 Operation of XSAVE
13.8 Operation of XRSTOR
13.8.1 Standard Form of XRSTOR
13.8.2 Compacted Form of XRSTOR
13.8.3 XRSTOR and the Init and Modified Optimizations
13.9 Operation of XSAVEOPT
13.10 Operation of XSAVEC
13.11 Operation of XSAVES
13.12 Operation of XRSTORS
13.13 Memory Accesses by the XSAVE Feature Set
Chapter 14 Programming with AVX, FMA and AVX2
14.1 Intel AVX Overview
14.1.1 256-Bit Wide SIMD Register Support
14.1.2 Instruction Syntax Enhancements
14.1.3 VEX Prefix Instruction Encoding Support
14.2 Functional Overview
14.2.1 256-bit Floating-Point Arithmetic Processing Enhancements
14.2.2 256-bit Non-Arithmetic Instruction Enhancements
14.2.3 Arithmetic Primitives for 128-bit Vector and Scalar Processing
14.2.4 Non-Arithmetic Primitives for 128-bit Vector and Scalar Processing
14.3 Detection of AVX Instructions
14.3.1 Detection of VEX-Encoded AES and VPCLMULQDQ
14.4 Half-Precision Floating-Point Conversion
14.4.1 Detection of F16C Instructions
14.5 Fused-Multiply-ADD (FMA) Extensions
14.5.1 FMA Instruction Operand Order and Arithmetic Behavior
14.5.2 Fused-Multiply-ADD (FMA) Numeric Behavior
14.5.3 Detection of FMA
14.6 Overview of Intel® Advanced Vector Extensions 2 (Intel® AVX2)
14.6.1 AVX2 and 256-bit Vector Integer Processing
14.7 Promoted Vector Integer Instructions in AVX2
14.7.1 Detection of AVX2
14.8 Accessing YMM Registers
14.9 Memory Alignment
14.10 SIMD Floating-Point Exceptions
14.11 Emulation
14.12 Writing AVX Floating-Point Exception Handlers
14.13 General Purpose Instruction Set Enhancements
Chapter 15 Programming with Intel® AVX-512
15.1 Overview
15.1.1 512-Bit Wide SIMD Register Support
15.1.2 32 SIMD Register Support
15.1.3 Eight Opmask Register Support
15.1.4 Instruction Syntax Enhancement
15.1.5 EVEX Instruction Encoding Support
15.2 Detection of AVX-512 Foundation Instructions
15.2.1 Additional 512-bit Instruction Extensions of the Intel AVX-512 Family
15.3 Detection of 512-bit Instruction Groups of Intel® AVX-512 Family
15.4 Detection of Intel AVX-512 Instruction Groups Operating at 256 and 128-bit Vector Lengths
15.5 Accessing XMM, YMM and ZMM Registers
15.6 Enhanced Vector Programming Environment Using EVEX Encoding
15.6.1 OPMASK Register to Predicate Vector Data Processing
15.6.1.1 Opmask Register K0
15.6.1.2 Example of Opmask Usages
15.6.2 OpMask Instructions
15.6.3 Broadcast
15.6.4 Static Rounding Mode and Suppress All Exceptions
15.6.5 Compressed Disp8N Encoding
15.7 Memory Alignment
15.8 SIMD Floating-Point Exceptions
15.9 Instruction Exception Specification
15.10 Emulation
15.11 Writing Floating-Point Exception Handlers
Chapter 16 Programming with Intel® Transactional Synchronization Extensions
16.1 Overview
16.2 Intel® Transactional Synchronization Extensions
16.2.1 HLE Software Interface
16.2.2 RTM Software Interface
16.3 Intel® TSX Application Programming Model
16.3.1 Detection of Transactional Synchronization Support
16.3.1.1 Detection of HLE Support
16.3.1.2 Detection of RTM Support
16.3.1.3 Detection of XTEST Instruction
16.3.2 Querying Transactional Execution Status
16.3.3 Requirements for HLE Locks
16.3.4 Transactional Nesting
16.3.4.1 HLE Nesting and Elision
16.3.4.2 RTM Nesting
16.3.4.3 Nesting HLE and RTM
16.3.5 RTM Abort Status Definition
16.3.6 RTM Memory Ordering
16.3.7 RTM-Enabled Debugger Support
16.3.8 Programming Considerations
16.3.8.1 Instruction Based Considerations
16.3.8.2 Runtime Considerations
Chapter 17 Intel® Memory Protection Extensions
17.1 Intel® Memory Protection Extensions (Intel® MPX)
17.2 Introduction
17.3 Intel MPX Programming Environment
17.3.1 Detection and Enumeration of Intel MPX Interfaces
17.3.2 Bounds Registers
17.3.3 Configuration and Status Registers
17.3.4 Read and Write of IA32_BNDCFGS
17.4 Intel MPX Instruction Summary
17.4.1 Instruction Encoding
17.4.2 Usage and Examples
17.4.3 Loading and Storing Bounds in Memory
17.4.3.1 BNDLDX and BNDSTX in 64-Bit Mode
17.4.3.2 BNDLDX and BNDSTX Outside 64-Bit Mode
17.5 Interactions with Intel MPX
17.5.1 Intel MPX and Operating Modes
17.5.2 Intel MPX Support for Pointer Operations with Branching
17.5.3 CALL, RET, JMP and All Jcc
17.5.4 BOUND Instruction and Intel MPX
17.5.5 Programming Considerations
17.5.6 Intel MPX and System Management Mode
17.5.7 Support of Intel MPX in VMCS
17.5.8 Support of Intel MPX in Intel TSX
Chapter 18 Input/Output
18.1 I/O Port Addressing
18.2 I/O Port Hardware
18.3 I/O Address Space
18.3.1 Memory-Mapped I/O
18.4 I/O Instructions
18.5 Protected-Mode I/O
18.5.1 I/O Privilege Level
18.5.2 I/O Permission Bit Map
18.6 Ordering I/O
Chapter 19 Processor Identification and Feature Determination
19.1 Using the CPUID Instruction
19.1.1 Notes on Where to Start
19.1.2 Identification of Earlier IA-32 Processors
Appendix A EFLAGS Cross-Reference
A.1 EFLAGS and Instructions
Appendix B EFLAGS Condition Codes
B.1 Condition Codes
Appendix C Floating-Point Exceptions Summary
C.1 Overview
C.2 x87 FPU Instructions
C.3 SSE Instructions
C.4 SSE2 Instructions
C.5 SSE3 Instructions
C.6 SSSE3 Instructions
C.7 SSE4 Instructions
Appendix D Guidelines for Writing x87 FPU Exception Handlers
D.1 MS-DOS Compatibility Sub-mode for Handling x87 FPU Exceptions
D.2 Implementation of the MS-DOS Compatibility Sub-mode in the Intel486™, Pentium®, and P6 Processor Family, and Pentium® 4 Processors
D.2.1 MS-DOS Compatibility Sub-mode in the Intel486™ and Pentium® Processors
D.2.1.1 Basic Rules: When FERR# Is Generated
D.2.1.2 Recommended External Hardware to Support the MS-DOS Compatibility Sub-mode
D.2.1.3 No-Wait x87 FPU Instructions Can Get x87 FPU Interrupt in Window
D.2.2 MS-DOS Compatibility Sub-mode in the P6 Family and Pentium® 4 Processors
D.3 Recommended Protocol for MS-DOS Compatibility Handlers
D.3.1 Floating-Point Exceptions and Their Defaults
D.3.2 Two Options for Handling Numeric Exceptions
D.3.2.1 Automatic Exception Handling: Using Masked Exceptions
D.3.2.2 Software Exception Handling
D.3.3 Synchronization Required for Use of x87 FPU Exception Handlers
D.3.3.1 Exception Synchronization: What, Why, and When
D.3.3.2 Exception Synchronization Examples
D.3.3.3 Proper Exception Synchronization
D.3.4 x87 FPU Exception Handling Examples
D.3.5 Need for Storing State of IGNNE# Circuit If Using x87 FPU and SMM
D.3.6 Considerations When x87 FPU Shared Between Tasks
D.3.6.1 Speculatively Deferring x87 FPU Saves, General Overview
D.3.6.2 Tracking x87 FPU Ownership
D.3.6.3 Interaction of x87 FPU State Saves and Floating-Point Exception Association
D.3.6.4 Interrupt Routing From the Kernel
D.3.6.5 Special Considerations for Operating Systems that Support Streaming SIMD Extensions
D.4 Differences For Handlers Using Native Mode
D.4.1 Origin with the Intel 286 and Intel 287, and Intel386 and Intel 387 Processors
D.4.2 Changes with Intel486, Pentium and Pentium Pro Processors with CR0.NE[bit 5] = 1
D.4.3 Considerations When x87 FPU Shared Between Tasks Using Native Mode
Appendix E Guidelines for Writing SIMD Floating-Point Exception Handlers
E.1 Two Options for Handling Floating-Point Exceptions
E.2 Software Exception Handling
E.3 Exception Synchronization
E.4 SIMD Floating-Point Exceptions and the IEEE Standard 754
E.4.1 Floating-Point Emulation
E.4.2 SSE/SSE2/SSE3 Response To Floating-Point Exceptions
E.4.2.1 Numeric Exceptions
E.4.2.2 Results of Operations with NaN Operands or a NaN Result for SSE/SSE2/SSE3 Numeric Instructions
E.4.2.3 Condition Codes, Exception Flags, and Response for Masked and Unmasked Numeric Exceptions
E.4.3 Example SIMD Floating-Point Emulation Implementation
Volume 2 (2A, 2B, 2C & 2D):Instruction Set Reference, A-Z
Chapter 1 About This Manual
1.1 Intel® 64 and IA-32 Processors Covered in this Manual
1.2 Overview of Volume 2A, 2B, 2C and 2D: Instruction Set Reference
1.3 Notational Conventions
1.3.1 Bit and Byte Order
1.3.2 Reserved Bits and Software Compatibility
1.3.3 Instruction Operands
1.3.4 Hexadecimal and Binary Numbers
1.3.5 Segmented Addressing
1.3.6 Exceptions
1.3.7 A New Syntax for CPUID, CR, and MSR Values
1.4 Related Literature
Chapter 2 Instruction Format
2.1 Instruction Format for Protected Mode, Real-Address Mode, and Virtual-8086 Mode
2.1.1 Instruction Prefixes
2.1.2 Opcodes
2.1.3 ModR/M and SIB Bytes
2.1.4 Displacement and Immediate Bytes
2.1.5 Addressing-Mode Encoding of ModR/M and SIB Bytes
2.2 IA-32e Mode
2.2.1 REX Prefixes
2.2.1.1 Encoding
2.2.1.2 More on REX Prefix Fields
2.2.1.3 Displacement
2.2.1.4 Direct Memory-Offset MOVs
2.2.1.5 Immediates
2.2.1.6 RIP-Relative Addressing
2.2.1.7 Default 64-Bit Operand Size
2.2.2 Additional Encodings for Control and Debug Registers
2.3 Intel® Advanced Vector Extensions (Intel® AVX)
2.3.1 Instruction Format
2.3.2 VEX and the LOCK prefix
2.3.3 VEX and the 66H, F2H, and F3H prefixes
2.3.4 VEX and the REX prefix
2.3.5 The VEX Prefix
2.3.5.1 VEX Byte 0, bits[7:0]
2.3.5.2 VEX Byte 1, bit [7] - ‘R’
2.3.5.3 3-byte VEX byte 1, bit[6] - ‘X’
2.3.5.4 3-byte VEX byte 1, bit[5] - ‘B’
2.3.5.5 3-byte VEX byte 2, bit[7] - ‘W’
2.3.5.6 2-byte VEX Byte 1, bits[6:3] and 3-byte VEX Byte 2, bits [6:3]- ‘vvvv’ the Source or Dest Register Specifier
2.3.6 Instruction Operand Encoding and VEX.vvvv, ModR/M
2.3.6.1 3-byte VEX byte 1, bits[4:0] - “m-mmmm”
2.3.6.2 2-byte VEX byte 1, bit[2], and 3-byte VEX byte 2, bit [2]- “L”
2.3.6.3 2-byte VEX byte 1, bits[1:0], and 3-byte VEX byte 2, bits [1:0]- “pp”
2.3.7 The Opcode Byte
2.3.8 The MODRM, SIB, and Displacement Bytes
2.3.9 The Third Source Operand (Immediate Byte)
2.3.10 AVX Instructions and the Upper 128-bits of YMM registers
2.3.10.1 Vector Length Transition and Programming Considerations
2.3.11 AVX Instruction Length
2.3.12 Vector SIB (VSIB) Memory Addressing
2.3.12.1 64-bit Mode VSIB Memory Addressing
2.4 AVX and SSE Instruction Exception Specification
2.4.1 Exceptions Type 1 (Aligned memory reference)
2.4.2 Exceptions Type 2 (>=16 Byte Memory Reference, Unaligned)
2.4.3 Exceptions Type 3 (<16 Byte memory argument)
2.4.4 Exceptions Type 4 (>=16 Byte mem arg no alignment, no floating-point exceptions)
2.4.5 Exceptions Type 5 (<16 Byte mem arg and no FP exceptions)
2.4.6 Exceptions Type 6 (VEX-Encoded Instructions Without Legacy SSE Analogues)
2.4.7 Exceptions Type 7 (No FP exceptions, no memory arg)
2.4.8 Exceptions Type 8 (AVX and no memory argument)
2.4.9 Exceptions Type 11 (VEX-only, mem arg no AC, floating-point exceptions)
2.4.10 Exceptions Type 12 (VEX-only, VSIB mem arg, no AC, no floating-point exceptions)
2.5 VEX Encoding Support for GPR Instructions
2.5.1 Exceptions Type 13 (VEX-Encoded GPR Instructions)
2.6 Intel® AVX-512 Encoding
2.6.1 Instruction Format and EVEX
2.6.2 Register Specifier Encoding and EVEX
2.6.3 Opmask Register Encoding
2.6.4 Masking Support in EVEX
2.6.5 Compressed Displacement (disp8N) Support in EVEX
2.6.6 EVEX Encoding of Broadcast/Rounding/SAE Support
2.6.7 Embedded Broadcast Support in EVEX
2.6.8 Static Rounding Support in EVEX
2.6.9 SAE Support in EVEX
2.6.10 Vector Length Orthogonality
2.6.11 #UD Equations for EVEX
2.6.11.1 State Dependent #UD
2.6.11.2 Opcode Independent #UD
2.6.11.3 Opcode Dependent #UD
2.6.12 Device Not Available
2.6.13 Scalar Instructions
2.7 Exception Classifications of EVEX-Encoded instructions
2.7.1 Exceptions Type E1 and E1NF of EVEX-Encoded Instructions
2.7.2 Exceptions Type E2 of EVEX-Encoded Instructions
2.7.3 Exceptions Type E3 and E3NF of EVEX-Encoded Instructions
2.7.4 Exceptions Type E4 and E4NF of EVEX-Encoded Instructions
2.7.5 Exceptions Type E5 and E5NF
2.7.6 Exceptions Type E6 and E6NF
2.7.7 Exceptions Type E7NM
2.7.8 Exceptions Type E9 and E9NF
2.7.9 Exceptions Type E10 and E10NF
2.7.10 Exception Type E11 (EVEX-only, mem arg no AC, floating-point exceptions)
2.7.11 Exception Type E12 and E12NP (VSIB mem arg, no AC, no floating-point exceptions)
2.8 Exception Classifications of Opmask instructions
Chapter 3 Instruction Set Reference, A-L
3.1 Interpreting the Instruction Reference Pages
3.1.1 Instruction Format
3.1.1.1 Opcode Column in the Instruction Summary Table (Instructions without VEX Prefix)
3.1.1.2 Opcode Column in the Instruction Summary Table (Instructions with VEX prefix)
3.1.1.3 Instruction Column in the Opcode Summary Table
3.1.1.4 Operand Encoding Column in the Instruction Summary Table
3.1.1.5 64/32-bit Mode Column in the Instruction Summary Table
3.1.1.6 CPUID Support Column in the Instruction Summary Table
3.1.1.7 Description Column in the Instruction Summary Table
3.1.1.8 Description Section
3.1.1.9 Operation Section
3.1.1.10 Intel® C/C++ Compiler Intrinsics Equivalents Section
3.1.1.11 Flags Affected Section
3.1.1.12 FPU Flags Affected Section
3.1.1.13 Protected Mode Exceptions Section
3.1.1.14 Real-Address Mode Exceptions Section
3.1.1.15 Virtual-8086 Mode Exceptions Section
3.1.1.16 Floating-Point Exceptions Section
3.1.1.17 SIMD Floating-Point Exceptions Section
3.1.1.18 Compatibility Mode Exceptions Section
3.1.1.19 64-Bit Mode Exceptions Section
3.2 Instructions (A-L)
AAA—ASCII Adjust After Addition
AAD—ASCII Adjust AX Before Division
AAM—ASCII Adjust AX After Multiply
AAS—ASCII Adjust AL After Subtraction
ADC—Add with Carry
ADCX — Unsigned Integer Addition of Two Operands with Carry Flag
ADD—Add
ADDPD—Add Packed Double-Precision Floating-Point Values
ADDPS—Add Packed Single-Precision Floating-Point Values
ADDSD—Add Scalar Double-Precision Floating-Point Values
ADDSS—Add Scalar Single-Precision Floating-Point Values
ADDSUBPD—Packed Double-FP Add/Subtract
ADDSUBPS—Packed Single-FP Add/Subtract
ADOX — Unsigned Integer Addition of Two Operands with Overflow Flag
AESDEC—Perform One Round of an AES Decryption Flow
AESDECLAST—Perform Last Round of an AES Decryption Flow
AESENC—Perform One Round of an AES Encryption Flow
AESENCLAST—Perform Last Round of an AES Encryption Flow
AESIMC—Perform the AES InvMixColumn Transformation
AESKEYGENASSIST—AES Round Key Generation Assist
AND—Logical AND
ANDN — Logical AND NOT
ANDPD—Bitwise Logical AND of Packed Double Precision Floating-Point Values
ANDPS—Bitwise Logical AND of Packed Single Precision Floating-Point Values
ANDNPD—Bitwise Logical AND NOT of Packed Double Precision Floating-Point Values
ANDNPS—Bitwise Logical AND NOT of Packed Single Precision Floating-Point Values
ARPL—Adjust RPL Field of Segment Selector
BEXTR — Bit Field Extract
BLENDPD — Blend Packed Double Precision Floating-Point Values
BLENDPS — Blend Packed Single Precision Floating-Point Values
BLENDVPD — Variable Blend Packed Double Precision Floating-Point Values
BLENDVPS — Variable Blend Packed Single Precision Floating-Point Values
BLSI — Extract Lowest Set Isolated Bit
BLSMSK — Get Mask Up to Lowest Set Bit
BLSR — Reset Lowest Set Bit
BNDCL—Check Lower Bound
BNDCU/BNDCN—Check Upper Bound
BNDLDX—Load Extended Bounds Using Address Translation
BNDMK—Make Bounds
BNDMOV—Move Bounds
BNDSTX—Store Extended Bounds Using Address Translation
BOUND—Check Array Index Against Bounds
BSF—Bit Scan Forward
BSR—Bit Scan Reverse
BSWAP—Byte Swap
BT—Bit Test
BTC—Bit Test and Complement
BTR—Bit Test and Reset
BTS—Bit Test and Set
BZHI — Zero High Bits Starting with Specified Bit Position
CALL—Call Procedure
CBW/CWDE/CDQE—Convert Byte to Word/Convert Word to Doubleword/Convert Doubleword to Quadword
CLAC—Clear AC Flag in EFLAGS Register
CLC—Clear Carry Flag
CLD—Clear Direction Flag
CLDEMOTE—Cache Line Demote
CLFLUSH—Flush Cache Line
CLFLUSHOPT—Flush Cache Line Optimized
CLI — Clear Interrupt Flag
CLTS—Clear Task-Switched Flag in CR0
CLWB—Cache Line Write Back
CMC—Complement Carry Flag
CMOVcc—Conditional Move
CMP—Compare Two Operands
CMPPD—Compare Packed Double-Precision Floating-Point Values
CMPPS—Compare Packed Single-Precision Floating-Point Values
CMPS/CMPSB/CMPSW/CMPSD/CMPSQ—Compare String Operands
CMPSD—Compare Scalar Double-Precision Floating-Point Value
CMPSS—Compare Scalar Single-Precision Floating-Point Value
CMPXCHG—Compare and Exchange
CMPXCHG8B/CMPXCHG16B—Compare and Exchange Bytes
COMISD—Compare Scalar Ordered Double-Precision Floating-Point Values and Set EFLAGS
COMISS—Compare Scalar Ordered Single-Precision Floating-Point Values and Set EFLAGS
CPUID—CPU Identification
CRC32 — Accumulate CRC32 Value
CVTDQ2PD—Convert Packed Doubleword Integers to Packed Double-Precision Floating-Point Values
CVTDQ2PS—Convert Packed Doubleword Integers to Packed Single-Precision Floating-Point Values
CVTPD2DQ—Convert Packed Double-Precision Floating-Point Values to Packed Doubleword Integers
CVTPD2PI—Convert Packed Double-Precision FP Values to Packed Dword Integers
CVTPD2PS—Convert Packed Double-Precision Floating-Point Values to Packed Single-Precision Floating-Point Values
CVTPI2PD—Convert Packed Dword Integers to Packed Double-Precision FP Values
CVTPI2PS—Convert Packed Dword Integers to Packed Single-Precision FP Values
CVTPS2DQ—Convert Packed Single-Precision Floating-Point Values to Packed Signed Doubleword Integer Values
CVTPS2PD—Convert Packed Single-Precision Floating-Point Values to Packed Double-Precision Floating-Point Values
CVTPS2PI—Convert Packed Single-Precision FP Values to Packed Dword Integers
CVTSD2SI—Convert Scalar Double-Precision Floating-Point Value to Doubleword Integer
CVTSD2SS—Convert Scalar Double-Precision Floating-Point Value to Scalar Single-Precision Floating-Point Value
CVTSI2SD—Convert Doubleword Integer to Scalar Double-Precision Floating-Point Value
CVTSI2SS—Convert Doubleword Integer to Scalar Single-Precision Floating-Point Value
CVTSS2SD—Convert Scalar Single-Precision Floating-Point Value to Scalar Double-Precision Floating-Point Value
CVTSS2SI—Convert Scalar Single-Precision Floating-Point Value to Doubleword Integer
CVTTPD2DQ—Convert with Truncation Packed Double-Precision Floating-Point Values to Packed Doubleword Integers
CVTTPD2PI—Convert with Truncation Packed Double-Precision FP Values to Packed Dword Integers
CVTTPS2DQ—Convert with Truncation Packed Single-Precision Floating-Point Values to Packed Signed Doubleword Integer Values
CVTTPS2PI—Convert with Truncation Packed Single-Precision FP Values to Packed Dword Integers
CVTTSD2SI—Convert with Truncation Scalar Double-Precision Floating-Point Value to Signed Integer
CVTTSS2SI—Convert with Truncation Scalar Single-Precision Floating-Point Value to Integer
CWD/CDQ/CQO—Convert Word to Doubleword/Convert Doubleword to Quadword
DAA—Decimal Adjust AL after Addition
DAS—Decimal Adjust AL after Subtraction
DEC—Decrement by 1
DIV—Unsigned Divide
DIVPD—Divide Packed Double-Precision Floating-Point Values
DIVPS—Divide Packed Single-Precision Floating-Point Values
DIVSD—Divide Scalar Double-Precision Floating-Point Value
DIVSS—Divide Scalar Single-Precision Floating-Point Values
DPPD — Dot Product of Packed Double Precision Floating-Point Values
DPPS — Dot Product of Packed Single Precision Floating-Point Values
EMMS—Empty MMX Technology State
ENTER—Make Stack Frame for Procedure Parameters
EXTRACTPS—Extract Packed Floating-Point Values
F2XM1—Compute 2^x–1
FABS—Absolute Value
FADD/FADDP/FIADD—Add
FBLD—Load Binary Coded Decimal
FBSTP—Store BCD Integer and Pop
FCHS—Change Sign
FCLEX/FNCLEX—Clear Exceptions
FCMOVcc—Floating-Point Conditional Move
FCOM/FCOMP/FCOMPP—Compare Floating Point Values
FCOMI/FCOMIP/FUCOMI/FUCOMIP—Compare Floating Point Values and Set EFLAGS
FCOS—Cosine
FDECSTP—Decrement Stack-Top Pointer
FDIV/FDIVP/FIDIV—Divide
FDIVR/FDIVRP/FIDIVR—Reverse Divide
FFREE—Free Floating-Point Register
FICOM/FICOMP—Compare Integer
FILD—Load Integer
FINCSTP—Increment Stack-Top Pointer
FINIT/FNINIT—Initialize Floating-Point Unit
FIST/FISTP—Store Integer
FISTTP—Store Integer with Truncation
FLD—Load Floating Point Value
FLD1/FLDL2T/FLDL2E/FLDPI/FLDLG2/FLDLN2/FLDZ—Load Constant
FLDCW—Load x87 FPU Control Word
FLDENV—Load x87 FPU Environment
FMUL/FMULP/FIMUL—Multiply
FNOP—No Operation
FPATAN—Partial Arctangent
FPREM—Partial Remainder
FPREM1—Partial Remainder
FPTAN—Partial Tangent
FRNDINT—Round to Integer
FRSTOR—Restore x87 FPU State
FSAVE/FNSAVE—Store x87 FPU State
FSCALE—Scale
FSIN—Sine
FSINCOS—Sine and Cosine
FSQRT—Square Root
FST/FSTP—Store Floating Point Value
FSTCW/FNSTCW—Store x87 FPU Control Word
FSTENV/FNSTENV—Store x87 FPU Environment
FSTSW/FNSTSW—Store x87 FPU Status Word
FSUB/FSUBP/FISUB—Subtract
FSUBR/FSUBRP/FISUBR—Reverse Subtract
FTST—TEST
FUCOM/FUCOMP/FUCOMPP—Unordered Compare Floating Point Values
FXAM—Examine Floating-Point
FXCH—Exchange Register Contents
FXRSTOR—Restore x87 FPU, MMX, XMM, and MXCSR State
FXSAVE—Save x87 FPU, MMX Technology, and SSE State
FXTRACT—Extract Exponent and Significand
FYL2X—Compute y * log2(x)
FYL2XP1—Compute y * log2(x + 1)
GF2P8AFFINEINVQB — Galois Field Affine Transformation Inverse
GF2P8AFFINEQB — Galois Field Affine Transformation
GF2P8MULB — Galois Field Multiply Bytes
HADDPD—Packed Double-FP Horizontal Add
HADDPS—Packed Single-FP Horizontal Add
HLT—Halt
HSUBPD—Packed Double-FP Horizontal Subtract
HSUBPS—Packed Single-FP Horizontal Subtract
IDIV—Signed Divide
IMUL—Signed Multiply
IN—Input from Port
INC—Increment by 1
INS/INSB/INSW/INSD—Input from Port to String
INSERTPS—Insert Scalar Single-Precision Floating-Point Value
INT n/INTO/INT3/INT1—Call to Interrupt Procedure
INVD—Invalidate Internal Caches
INVLPG—Invalidate TLB Entries
INVPCID—Invalidate Process-Context Identifier
IRET/IRETD—Interrupt Return
Jcc—Jump if Condition Is Met
JMP—Jump
KADDW/KADDB/KADDQ/KADDD—ADD Two Masks
KANDW/KANDB/KANDQ/KANDD—Bitwise Logical AND Masks
KANDNW/KANDNB/KANDNQ/KANDND—Bitwise Logical AND NOT Masks
KMOVW/KMOVB/KMOVQ/KMOVD—Move from and to Mask Registers
KNOTW/KNOTB/KNOTQ/KNOTD—NOT Mask Register
KORW/KORB/KORQ/KORD—Bitwise Logical OR Masks
KORTESTW/KORTESTB/KORTESTQ/KORTESTD—OR Masks And Set Flags
KSHIFTLW/KSHIFTLB/KSHIFTLQ/KSHIFTLD—Shift Left Mask Registers
KSHIFTRW/KSHIFTRB/KSHIFTRQ/KSHIFTRD—Shift Right Mask Registers
KTESTW/KTESTB/KTESTQ/KTESTD—Packed Bit Test Masks and Set Flags
KUNPCKBW/KUNPCKWD/KUNPCKDQ—Unpack for Mask Registers
KXNORW/KXNORB/KXNORQ/KXNORD—Bitwise Logical XNOR Masks
KXORW/KXORB/KXORQ/KXORD—Bitwise Logical XOR Masks
LAHF—Load Status Flags into AH Register
LAR—Load Access Rights Byte
LDDQU—Load Unaligned Integer 128 Bits
LDMXCSR—Load MXCSR Register
LDS/LES/LFS/LGS/LSS—Load Far Pointer
LEA—Load Effective Address
LEAVE—High Level Procedure Exit
LFENCE—Load Fence
LGDT/LIDT—Load Global/Interrupt Descriptor Table Register
LLDT—Load Local Descriptor Table Register
LMSW—Load Machine Status Word
LOCK—Assert LOCK# Signal Prefix
LODS/LODSB/LODSW/LODSD/LODSQ—Load String
LOOP/LOOPcc—Loop According to ECX Counter
LSL—Load Segment Limit
LTR—Load Task Register
LZCNT—Count the Number of Leading Zero Bits
Chapter 4 Instruction Set Reference, M-U
4.1 Imm8 Control Byte Operation for PCMPESTRI / PCMPESTRM / PCMPISTRI / PCMPISTRM
4.1.1 General Description
4.1.2 Source Data Format
4.1.3 Aggregation Operation
4.1.4 Polarity
4.1.5 Output Selection
4.1.6 Valid/Invalid Override of Comparisons
4.1.7 Summary of Imm8 Control Byte
4.1.8 Diagram Comparison and Aggregation Process
4.2 Common Transformation and Primitive Functions for SHA1XXX and SHA256XXX
4.3 Instructions (M-U)
MASKMOVDQU—Store Selected Bytes of Double Quadword
MASKMOVQ—Store Selected Bytes of Quadword
MAXPD—Maximum of Packed Double-Precision Floating-Point Values
MAXPS—Maximum of Packed Single-Precision Floating-Point Values
MAXSD—Return Maximum Scalar Double-Precision Floating-Point Value
MAXSS—Return Maximum Scalar Single-Precision Floating-Point Value
MFENCE—Memory Fence
MINPD—Minimum of Packed Double-Precision Floating-Point Values
MINPS—Minimum of Packed Single-Precision Floating-Point Values
MINSD—Return Minimum Scalar Double-Precision Floating-Point Value
MINSS—Return Minimum Scalar Single-Precision Floating-Point Value
MONITOR—Set Up Monitor Address
MOV—Move
MOV—Move to/from Control Registers
MOV—Move to/from Debug Registers
MOVAPD—Move Aligned Packed Double-Precision Floating-Point Values
MOVAPS—Move Aligned Packed Single-Precision Floating-Point Values
MOVBE—Move Data After Swapping Bytes
MOVD/MOVQ—Move Doubleword/Move Quadword
MOVDDUP—Replicate Double FP Values
MOVDIRI—Move Doubleword as Direct Store
MOVDIR64B—Move 64 Bytes as Direct Store
MOVDQA,VMOVDQA32/64—Move Aligned Packed Integer Values
MOVDQU,VMOVDQU8/16/32/64—Move Unaligned Packed Integer Values
MOVDQ2Q—Move Quadword from XMM to MMX Technology Register
MOVHLPS—Move Packed Single-Precision Floating-Point Values High to Low
MOVHPD—Move High Packed Double-Precision Floating-Point Value
MOVHPS—Move High Packed Single-Precision Floating-Point Values
MOVLHPS—Move Packed Single-Precision Floating-Point Values Low to High
MOVLPD—Move Low Packed Double-Precision Floating-Point Value
MOVLPS—Move Low Packed Single-Precision Floating-Point Values
MOVMSKPD—Extract Packed Double-Precision Floating-Point Sign Mask
MOVMSKPS—Extract Packed Single-Precision Floating-Point Sign Mask
MOVNTDQA—Load Double Quadword Non-Temporal Aligned Hint
MOVNTDQ—Store Packed Integers Using Non-Temporal Hint
MOVNTI—Store Doubleword Using Non-Temporal Hint
MOVNTPD—Store Packed Double-Precision Floating-Point Values Using Non-Temporal Hint
MOVNTPS—Store Packed Single-Precision Floating-Point Values Using Non-Temporal Hint
MOVNTQ—Store of Quadword Using Non-Temporal Hint
MOVQ—Move Quadword
MOVQ2DQ—Move Quadword from MMX Technology to XMM Register
MOVS/MOVSB/MOVSW/MOVSD/MOVSQ—Move Data from String to String
MOVSD—Move or Merge Scalar Double-Precision Floating-Point Value
MOVSHDUP—Replicate Single FP Values
MOVSLDUP—Replicate Single FP Values
MOVSS—Move or Merge Scalar Single-Precision Floating-Point Value
MOVSX/MOVSXD—Move with Sign-Extension
MOVUPD—Move Unaligned Packed Double-Precision Floating-Point Values
MOVUPS—Move Unaligned Packed Single-Precision Floating-Point Values
MOVZX—Move with Zero-Extend
MPSADBW — Compute Multiple Packed Sums of Absolute Difference
MUL—Unsigned Multiply
MULPD—Multiply Packed Double-Precision Floating-Point Values
MULPS—Multiply Packed Single-Precision Floating-Point Values
MULSD—Multiply Scalar Double-Precision Floating-Point Value
MULSS—Multiply Scalar Single-Precision Floating-Point Values
MULX — Unsigned Multiply Without Affecting Flags
MWAIT—Monitor Wait
NEG—Two's Complement Negation
NOP—No Operation
NOT—One's Complement Negation
OR—Logical Inclusive OR
ORPD—Bitwise Logical OR of Packed Double Precision Floating-Point Values
ORPS—Bitwise Logical OR of Packed Single Precision Floating-Point Values
OUT—Output to Port
OUTS/OUTSB/OUTSW/OUTSD—Output String to Port
PABSB/PABSW/PABSD/PABSQ — Packed Absolute Value
PACKSSWB/PACKSSDW—Pack with Signed Saturation
PACKUSDW—Pack with Unsigned Saturation
PACKUSWB—Pack with Unsigned Saturation
PADDB/PADDW/PADDD/PADDQ—Add Packed Integers
PADDSB/PADDSW—Add Packed Signed Integers with Signed Saturation
PADDUSB/PADDUSW—Add Packed Unsigned Integers with Unsigned Saturation
PALIGNR — Packed Align Right
PAND—Logical AND
PANDN—Logical AND NOT
PAUSE—Spin Loop Hint
PAVGB/PAVGW—Average Packed Integers
PBLENDVB — Variable Blend Packed Bytes
PBLENDW — Blend Packed Words
PCLMULQDQ — Carry-Less Multiplication Quadword
PCMPEQB/PCMPEQW/PCMPEQD—Compare Packed Data for Equal
PCMPEQQ — Compare Packed Qword Data for Equal
PCMPESTRI — Packed Compare Explicit Length Strings, Return Index
PCMPESTRM — Packed Compare Explicit Length Strings, Return Mask
PCMPGTB/PCMPGTW/PCMPGTD—Compare Packed Signed Integers for Greater Than
PCMPGTQ — Compare Packed Data for Greater Than
PCMPISTRI — Packed Compare Implicit Length Strings, Return Index
PCMPISTRM — Packed Compare Implicit Length Strings, Return Mask
PDEP — Parallel Bits Deposit
PEXT — Parallel Bits Extract
PEXTRB/PEXTRD/PEXTRQ — Extract Byte/Dword/Qword
PEXTRW—Extract Word
PHADDW/PHADDD — Packed Horizontal Add
PHADDSW — Packed Horizontal Add and Saturate
PHMINPOSUW — Packed Horizontal Word Minimum
PHSUBW/PHSUBD — Packed Horizontal Subtract
PHSUBSW — Packed Horizontal Subtract and Saturate
PINSRB/PINSRD/PINSRQ — Insert Byte/Dword/Qword
PINSRW—Insert Word
PMADDUBSW — Multiply and Add Packed Signed and Unsigned Bytes
PMADDWD—Multiply and Add Packed Integers
PMAXSB/PMAXSW/PMAXSD/PMAXSQ—Maximum of Packed Signed Integers
PMAXUB/PMAXUW—Maximum of Packed Unsigned Integers
PMAXUD/PMAXUQ—Maximum of Packed Unsigned Integers
PMINSB/PMINSW—Minimum of Packed Signed Integers
PMINSD/PMINSQ—Minimum of Packed Signed Integers
PMINUB/PMINUW—Minimum of Packed Unsigned Integers
PMINUD/PMINUQ—Minimum of Packed Unsigned Integers
PMOVMSKB—Move Byte Mask
PMOVSX—Packed Move with Sign Extend
PMOVZX—Packed Move with Zero Extend
PMULDQ—Multiply Packed Doubleword Integers
PMULHRSW — Packed Multiply High with Round and Scale
PMULHUW—Multiply Packed Unsigned Integers and Store High Result
PMULHW—Multiply Packed Signed Integers and Store High Result
PMULLD/PMULLQ—Multiply Packed Integers and Store Low Result
PMULLW—Multiply Packed Signed Integers and Store Low Result
PMULUDQ—Multiply Packed Unsigned Doubleword Integers
POP—Pop a Value from the Stack
POPA/POPAD—Pop All General-Purpose Registers
POPCNT — Return the Count of Number of Bits Set to 1
POPF/POPFD/POPFQ—Pop Stack into EFLAGS Register
POR—Bitwise Logical OR
PREFETCHh—Prefetch Data Into Caches
PREFETCHW—Prefetch Data into Caches in Anticipation of a Write
PSADBW—Compute Sum of Absolute Differences
PSHUFB — Packed Shuffle Bytes
PSHUFD—Shuffle Packed Doublewords
PSHUFHW—Shuffle Packed High Words
PSHUFLW—Shuffle Packed Low Words
PSHUFW—Shuffle Packed Words
PSIGNB/PSIGNW/PSIGND — Packed SIGN
PSLLDQ—Shift Double Quadword Left Logical
PSLLW/PSLLD/PSLLQ—Shift Packed Data Left Logical
PSRAW/PSRAD/PSRAQ—Shift Packed Data Right Arithmetic
PSRLDQ—Shift Double Quadword Right Logical
PSRLW/PSRLD/PSRLQ—Shift Packed Data Right Logical
PSUBB/PSUBW/PSUBD—Subtract Packed Integers
PSUBQ—Subtract Packed Quadword Integers
PSUBSB/PSUBSW—Subtract Packed Signed Integers with Signed Saturation
PSUBUSB/PSUBUSW—Subtract Packed Unsigned Integers with Unsigned Saturation
PTEST—Logical Compare
PTWRITE—Write Data to a Processor Trace Packet
PUNPCKHBW/PUNPCKHWD/PUNPCKHDQ/PUNPCKHQDQ—Unpack High Data
PUNPCKLBW/PUNPCKLWD/PUNPCKLDQ/PUNPCKLQDQ—Unpack Low Data
PUSH—Push Word, Doubleword or Quadword Onto the Stack
PUSHA/PUSHAD—Push All General-Purpose Registers
PUSHF/PUSHFD/PUSHFQ—Push EFLAGS Register onto the Stack
PXOR—Logical Exclusive OR
RCL/RCR/ROL/ROR—Rotate
RCPPS—Compute Reciprocals of Packed Single-Precision Floating-Point Values
RCPSS—Compute Reciprocal of Scalar Single-Precision Floating-Point Values
RDFSBASE/RDGSBASE—Read FS/GS Segment Base
RDMSR—Read from Model Specific Register
RDPID—Read Processor ID
RDPKRU—Read Protection Key Rights for User Pages
RDPMC—Read Performance-Monitoring Counters
RDRAND—Read Random Number
RDSEED—Read Random SEED
RDTSC—Read Time-Stamp Counter
RDTSCP—Read Time-Stamp Counter and Processor ID
REP/REPE/REPZ/REPNE/REPNZ—Repeat String Operation Prefix
RET—Return from Procedure
RORX — Rotate Right Logical Without Affecting Flags
ROUNDPD — Round Packed Double Precision Floating-Point Values
ROUNDPS — Round Packed Single Precision Floating-Point Values
ROUNDSD — Round Scalar Double Precision Floating-Point Values
ROUNDSS — Round Scalar Single Precision Floating-Point Values
RSM—Resume from System Management Mode
RSQRTPS—Compute Reciprocals of Square Roots of Packed Single-Precision Floating-Point Values
RSQRTSS—Compute Reciprocal of Square Root of Scalar Single-Precision Floating-Point Value
SAHF—Store AH into Flags
SAL/SAR/SHL/SHR—Shift
SARX/SHLX/SHRX — Shift Without Affecting Flags
SBB—Integer Subtraction with Borrow
SCAS/SCASB/SCASW/SCASD—Scan String
SETcc—Set Byte on Condition
SFENCE—Store Fence
SGDT—Store Global Descriptor Table Register
SHA1RNDS4—Perform Four Rounds of SHA1 Operation
SHA1NEXTE—Calculate SHA1 State Variable E after Four Rounds
SHA1MSG1—Perform an Intermediate Calculation for the Next Four SHA1 Message Dwords
SHA1MSG2—Perform a Final Calculation for the Next Four SHA1 Message Dwords
SHA256RNDS2—Perform Two Rounds of SHA256 Operation
SHA256MSG1—Perform an Intermediate Calculation for the Next Four SHA256 Message Dwords
SHA256MSG2—Perform a Final Calculation for the Next Four SHA256 Message Dwords
SHLD—Double Precision Shift Left
SHRD—Double Precision Shift Right
SHUFPD—Packed Interleave Shuffle of Pairs of Double-Precision Floating-Point Values
SHUFPS—Packed Interleave Shuffle of Quadruplets of Single-Precision Floating-Point Values
SIDT—Store Interrupt Descriptor Table Register
SLDT—Store Local Descriptor Table Register
SMSW—Store Machine Status Word
SQRTPD—Square Root of Double-Precision Floating-Point Values
SQRTPS—Square Root of Single-Precision Floating-Point Values
SQRTSD—Compute Square Root of Scalar Double-Precision Floating-Point Value
SQRTSS—Compute Square Root of Scalar Single-Precision Value
STAC—Set AC Flag in EFLAGS Register
STC—Set Carry Flag
STD—Set Direction Flag
STI—Set Interrupt Flag
STMXCSR—Store MXCSR Register State
STOS/STOSB/STOSW/STOSD/STOSQ—Store String
STR—Store Task Register
SUB—Subtract
SUBPD—Subtract Packed Double-Precision Floating-Point Values
SUBPS—Subtract Packed Single-Precision Floating-Point Values
SUBSD—Subtract Scalar Double-Precision Floating-Point Value
SUBSS—Subtract Scalar Single-Precision Floating-Point Value
SWAPGS—Swap GS Base Register
SYSCALL—Fast System Call
SYSENTER—Fast System Call
SYSEXIT—Fast Return from Fast System Call
SYSRET—Return From Fast System Call
TEST—Logical Compare
TPAUSE—Timed PAUSE
TZCNT — Count the Number of Trailing Zero Bits
UCOMISD—Unordered Compare Scalar Double-Precision Floating-Point Values and Set EFLAGS
UCOMISS—Unordered Compare Scalar Single-Precision Floating-Point Values and Set EFLAGS
UD—Undefined Instruction
UMONITOR—User Level Set Up Monitor Address
UMWAIT—User Level Monitor Wait
UNPCKHPD—Unpack and Interleave High Packed Double-Precision Floating-Point Values
UNPCKHPS—Unpack and Interleave High Packed Single-Precision Floating-Point Values
UNPCKLPD—Unpack and Interleave Low Packed Double-Precision Floating-Point Values
UNPCKLPS—Unpack and Interleave Low Packed Single-Precision Floating-Point Values
Chapter 5 Instruction Set Reference, V-Z
5.1 Ternary Bit Vector Logic Table
5.2 Instructions (V-Z)
VALIGND/VALIGNQ—Align Doubleword/Quadword Vectors
VBLENDMPD/VBLENDMPS—Blend Float64/Float32 Vectors Using an OpMask Control
VBROADCAST—Load with Broadcast Floating-Point Data
VCOMPRESSPD—Store Sparse Packed Double-Precision Floating-Point Values into Dense Memory
VCOMPRESSPS—Store Sparse Packed Single-Precision Floating-Point Values into Dense Memory
VCVTPD2QQ—Convert Packed Double-Precision Floating-Point Values to Packed Quadword Integers
VCVTPD2UDQ—Convert Packed Double-Precision Floating-Point Values to Packed Unsigned Doubleword Integers
VCVTPD2UQQ—Convert Packed Double-Precision Floating-Point Values to Packed Unsigned Quadword Integers
VCVTPH2PS—Convert 16-bit FP values to Single-Precision FP values
VCVTPS2PH—Convert Single-Precision FP value to 16-bit FP value
VCVTPS2UDQ—Convert Packed Single-Precision Floating-Point Values to Packed Unsigned Doubleword Integer Values
VCVTPS2QQ—Convert Packed Single Precision Floating-Point Values to Packed Signed Quadword Integer Values
VCVTPS2UQQ—Convert Packed Single Precision Floating-Point Values to Packed Unsigned Quadword Integer Values
VCVTQQ2PD—Convert Packed Quadword Integers to Packed Double-Precision Floating-Point Values
VCVTQQ2PS—Convert Packed Quadword Integers to Packed Single-Precision Floating-Point Values
VCVTSD2USI—Convert Scalar Double-Precision Floating-Point Value to Unsigned Doubleword Integer
VCVTSS2USI—Convert Scalar Single-Precision Floating-Point Value to Unsigned Doubleword Integer
VCVTTPD2QQ—Convert with Truncation Packed Double-Precision Floating-Point Values to Packed Quadword Integers
VCVTTPD2UDQ—Convert with Truncation Packed Double-Precision Floating-Point Values to Packed Unsigned Doubleword Integers
VCVTTPD2UQQ—Convert with Truncation Packed Double-Precision Floating-Point Values to Packed Unsigned Quadword Integers
VCVTTPS2UDQ—Convert with Truncation Packed Single-Precision Floating-Point Values to Packed Unsigned Doubleword Integer Values
VCVTTPS2QQ—Convert with Truncation Packed Single Precision Floating-Point Values to Packed Signed Quadword Integer Values
VCVTTPS2UQQ—Convert with Truncation Packed Single Precision Floating-Point Values to Packed Unsigned Quadword Integer Values
VCVTTSD2USI—Convert with Truncation Scalar Double-Precision Floating-Point Value to Unsigned Integer
VCVTTSS2USI—Convert with Truncation Scalar Single-Precision Floating-Point Value to Unsigned Integer
VCVTUDQ2PD—Convert Packed Unsigned Doubleword Integers to Packed Double-Precision Floating-Point Values
VCVTUDQ2PS—Convert Packed Unsigned Doubleword Integers to Packed Single-Precision Floating-Point Values
VCVTUQQ2PD—Convert Packed Unsigned Quadword Integers to Packed Double-Precision Floating-Point Values
VCVTUQQ2PS—Convert Packed Unsigned Quadword Integers to Packed Single-Precision Floating-Point Values
VCVTUSI2SD—Convert Unsigned Integer to Scalar Double-Precision Floating-Point Value
VCVTUSI2SS—Convert Unsigned Integer to Scalar Single-Precision Floating-Point Value
VDBPSADBW—Double Block Packed Sum-Absolute-Differences (SAD) on Unsigned Bytes
VEXPANDPD—Load Sparse Packed Double-Precision Floating-Point Values from Dense Memory
VEXPANDPS—Load Sparse Packed Single-Precision Floating-Point Values from Dense Memory
VERR/VERW—Verify a Segment for Reading or Writing
VEXTRACTF128/VEXTRACTF32x4/VEXTRACTF64x2/VEXTRACTF32x8/VEXTRACTF64x4—Extract Packed Floating-Point Values
VEXTRACTI128/VEXTRACTI32x4/VEXTRACTI64x2/VEXTRACTI32x8/VEXTRACTI64x4—Extract Packed Integer Values
VFIXUPIMMPD—Fix Up Special Packed Float64 Values
VFIXUPIMMPS—Fix Up Special Packed Float32 Values
VFIXUPIMMSD—Fix Up Special Scalar Float64 Value
VFIXUPIMMSS—Fix Up Special Scalar Float32 Value
VFMADD132PD/VFMADD213PD/VFMADD231PD—Fused Multiply-Add of Packed Double-Precision Floating-Point Values
VFMADD132PS/VFMADD213PS/VFMADD231PS—Fused Multiply-Add of Packed Single-Precision Floating-Point Values
VFMADD132SD/VFMADD213SD/VFMADD231SD—Fused Multiply-Add of Scalar Double-Precision Floating-Point Values
VFMADD132SS/VFMADD213SS/VFMADD231SS—Fused Multiply-Add of Scalar Single-Precision Floating-Point Values
VFMADDSUB132PD/VFMADDSUB213PD/VFMADDSUB231PD—Fused Multiply-Alternating Add/Subtract of Packed Double-Precision Floating-Point Values
VFMADDSUB132PS/VFMADDSUB213PS/VFMADDSUB231PS—Fused Multiply-Alternating Add/Subtract of Packed Single-Precision Floating-Point Values
VFMSUBADD132PD/VFMSUBADD213PD/VFMSUBADD231PD—Fused Multiply-Alternating Subtract/Add of Packed Double-Precision Floating-Point Values
VFMSUBADD132PS/VFMSUBADD213PS/VFMSUBADD231PS—Fused Multiply-Alternating Subtract/Add of Packed Single-Precision Floating-Point Values
VFMSUB132PD/VFMSUB213PD/VFMSUB231PD—Fused Multiply-Subtract of Packed Double-Precision Floating-Point Values
VFMSUB132PS/VFMSUB213PS/VFMSUB231PS—Fused Multiply-Subtract of Packed Single-Precision Floating-Point Values
VFMSUB132SD/VFMSUB213SD/VFMSUB231SD—Fused Multiply-Subtract of Scalar Double-Precision Floating-Point Values
VFMSUB132SS/VFMSUB213SS/VFMSUB231SS—Fused Multiply-Subtract of Scalar Single-Precision Floating-Point Values
VFNMADD132PD/VFNMADD213PD/VFNMADD231PD—Fused Negative Multiply-Add of Packed Double-Precision Floating-Point Values
VFNMADD132PS/VFNMADD213PS/VFNMADD231PS—Fused Negative Multiply-Add of Packed Single-Precision Floating-Point Values
VFNMADD132SD/VFNMADD213SD/VFNMADD231SD—Fused Negative Multiply-Add of Scalar Double-Precision Floating-Point Values
VFNMADD132SS/VFNMADD213SS/VFNMADD231SS—Fused Negative Multiply-Add of Scalar Single-Precision Floating-Point Values
VFNMSUB132PD/VFNMSUB213PD/VFNMSUB231PD—Fused Negative Multiply-Subtract of Packed Double-Precision Floating-Point Values
VFNMSUB132PS/VFNMSUB213PS/VFNMSUB231PS—Fused Negative Multiply-Subtract of Packed Single-Precision Floating-Point Values
VFNMSUB132SD/VFNMSUB213SD/VFNMSUB231SD—Fused Negative Multiply-Subtract of Scalar Double-Precision Floating-Point Values
VFNMSUB132SS/VFNMSUB213SS/VFNMSUB231SS—Fused Negative Multiply-Subtract of Scalar Single-Precision Floating-Point Values
VFPCLASSPD—Tests Types Of a Packed Float64 Values
VFPCLASSPS—Tests Types Of a Packed Float32 Values
VFPCLASSSD—Tests Types Of a Scalar Float64 Values
VFPCLASSSS—Tests Types Of a Scalar Float32 Values
VGATHERDPD/VGATHERQPD — Gather Packed DP FP Values Using Signed Dword/Qword Indices
VGATHERDPS/VGATHERQPS — Gather Packed SP FP values Using Signed Dword/Qword Indices
VGATHERDPS/VGATHERDPD—Gather Packed Single, Packed Double with Signed Dword Indices
VGATHERQPS/VGATHERQPD—Gather Packed Single, Packed Double with Signed Qword Indices
VGETEXPPD—Convert Exponents of Packed DP FP Values to DP FP Values
VGETEXPPS—Convert Exponents of Packed SP FP Values to SP FP Values
VGETEXPSD—Convert Exponents of Scalar DP FP Values to DP FP Value
VGETEXPSS—Convert Exponents of Scalar SP FP Values to SP FP Value
VGETMANTPD—Extract Float64 Vector of Normalized Mantissas from Float64 Vector
VGETMANTPS—Extract Float32 Vector of Normalized Mantissas from Float32 Vector
VGETMANTSD—Extract Float64 of Normalized Mantissas from Float64 Scalar
VGETMANTSS—Extract Float32 Vector of Normalized Mantissa from Float32 Vector
VINSERTF128/VINSERTF32x4/VINSERTF64x2/VINSERTF32x8/VINSERTF64x4—Insert Packed Floating-Point Values
VINSERTI128/VINSERTI32x4/VINSERTI64x2/VINSERTI32x8/VINSERTI64x4—Insert Packed Integer Values
VMASKMOV—Conditional SIMD Packed Loads and Stores
VPBLENDD — Blend Packed Dwords
VPBLENDMB/VPBLENDMW—Blend Byte/Word Vectors Using an Opmask Control
VPBLENDMD/VPBLENDMQ—Blend Int32/Int64 Vectors Using an OpMask Control
VPBROADCASTB/W/D/Q—Load with Broadcast Integer Data from General Purpose Register
VPBROADCAST—Load Integer and Broadcast
VPBROADCASTM—Broadcast Mask to Vector Register
VPCMPB/VPCMPUB—Compare Packed Byte Values Into Mask
VPCMPD/VPCMPUD—Compare Packed Integer Values into Mask
VPCMPQ/VPCMPUQ—Compare Packed Integer Values into Mask
VPCMPW/VPCMPUW—Compare Packed Word Values Into Mask
VPCOMPRESSD—Store Sparse Packed Doubleword Integer Values into Dense Memory/Register
VPCOMPRESSQ—Store Sparse Packed Quadword Integer Values into Dense Memory/Register
VPCONFLICTD/Q—Detect Conflicts Within a Vector of Packed Dword/Qword Values into Dense Memory/Register
VPERM2F128 — Permute Floating-Point Values
VPERM2I128 — Permute Integer Values
VPERMB—Permute Packed Bytes Elements
VPERMD/VPERMW—Permute Packed Doublewords/Words Elements
VPERMI2B—Full Permute of Bytes from Two Tables Overwriting the Index
VPERMI2W/D/Q/PS/PD—Full Permute From Two Tables Overwriting the Index
VPERMILPD—Permute In-Lane of Pairs of Double-Precision Floating-Point Values
VPERMILPS—Permute In-Lane of Quadruples of Single-Precision Floating-Point Values
VPERMPD—Permute Double-Precision Floating-Point Elements
VPERMPS—Permute Single-Precision Floating-Point Elements
VPERMQ—Qwords Element Permutation
VPERMT2B—Full Permute of Bytes from Two Tables Overwriting a Table
VPERMT2W/D/Q/PS/PD—Full Permute from Two Tables Overwriting one Table
VPEXPANDD—Load Sparse Packed Doubleword Integer Values from Dense Memory / Register
VPEXPANDQ—Load Sparse Packed Quadword Integer Values from Dense Memory / Register
VPGATHERDD/VPGATHERQD — Gather Packed Dword Values Using Signed Dword/Qword Indices
VPGATHERDD/VPGATHERDQ—Gather Packed Dword, Packed Qword with Signed Dword Indices
VPGATHERDQ/VPGATHERQQ — Gather Packed Qword Values Using Signed Dword/Qword Indices
VPGATHERQD/VPGATHERQQ—Gather Packed Dword, Packed Qword with Signed Qword Indices
VPLZCNTD/Q—Count the Number of Leading Zero Bits for Packed Dword, Packed Qword Values
VPMADD52HUQ—Packed Multiply of Unsigned 52-bit Integers and Add High 52-bit Products to 64-bit Accumulators
VPMADD52LUQ—Packed Multiply of Unsigned 52-bit Integers and Add the Low 52-bit Products to Qword Accumulators
VPMASKMOV — Conditional SIMD Integer Packed Loads and Stores
VPMOVB2M/VPMOVW2M/VPMOVD2M/VPMOVQ2M—Convert a Vector Register to a Mask
VPMOVDB/VPMOVSDB/VPMOVUSDB—Down Convert DWord to Byte
VPMOVDW/VPMOVSDW/VPMOVUSDW—Down Convert DWord to Word
VPMOVM2B/VPMOVM2W/VPMOVM2D/VPMOVM2Q—Convert a Mask Register to a Vector Register
VPMOVQB/VPMOVSQB/VPMOVUSQB—Down Convert QWord to Byte
VPMOVQD/VPMOVSQD/VPMOVUSQD—Down Convert QWord to DWord
VPMOVQW/VPMOVSQW/VPMOVUSQW—Down Convert QWord to Word
VPMOVWB/VPMOVSWB/VPMOVUSWB—Down Convert Word to Byte
VPMULTISHIFTQB – Select Packed Unaligned Bytes from Quadword Sources
VPROLD/VPROLVD/VPROLQ/VPROLVQ—Bit Rotate Left
VPRORD/VPRORVD/VPRORQ/VPRORVQ—Bit Rotate Right
VPSCATTERDD/VPSCATTERDQ/VPSCATTERQD/VPSCATTERQQ—Scatter Packed Dword, Packed Qword with Signed Dword, Signed Qword Indices
VPSLLVW/VPSLLVD/VPSLLVQ—Variable Bit Shift Left Logical
VPSRAVW/VPSRAVD/VPSRAVQ—Variable Bit Shift Right Arithmetic
VPSRLVW/VPSRLVD/VPSRLVQ—Variable Bit Shift Right Logical
VPTERNLOGD/VPTERNLOGQ—Bitwise Ternary Logic
VPTESTMB/VPTESTMW/VPTESTMD/VPTESTMQ—Logical AND and Set Mask
VPTESTNMB/W/D/Q—Logical NAND and Set
VRANGEPD—Range Restriction Calculation For Packed Pairs of Float64 Values
VRANGEPS—Range Restriction Calculation For Packed Pairs of Float32 Values
VRANGESD—Range Restriction Calculation From a Pair of Scalar Float64 Values
VRANGESS—Range Restriction Calculation From a Pair of Scalar Float32 Values
VRCP14PD—Compute Approximate Reciprocals of Packed Float64 Values
VRCP14SD—Compute Approximate Reciprocal of Scalar Float64 Value
VRCP14PS—Compute Approximate Reciprocals of Packed Float32 Values
VRCP14SS—Compute Approximate Reciprocal of Scalar Float32 Value
VREDUCEPD—Perform Reduction Transformation on Packed Float64 Values
VREDUCESD—Perform a Reduction Transformation on a Scalar Float64 Value
VREDUCEPS—Perform Reduction Transformation on Packed Float32 Values
VREDUCESS—Perform a Reduction Transformation on a Scalar Float32 Value
VRNDSCALEPD—Round Packed Float64 Values To Include A Given Number Of Fraction Bits
VRNDSCALESD—Round Scalar Float64 Value To Include A Given Number Of Fraction Bits
VRNDSCALEPS—Round Packed Float32 Values To Include A Given Number Of Fraction Bits
VRNDSCALESS—Round Scalar Float32 Value To Include A Given Number Of Fraction Bits
VRSQRT14PD—Compute Approximate Reciprocals of Square Roots of Packed Float64 Values
VRSQRT14SD—Compute Approximate Reciprocal of Square Root of Scalar Float64 Value
VRSQRT14PS—Compute Approximate Reciprocals of Square Roots of Packed Float32 Values
VRSQRT14SS—Compute Approximate Reciprocal of Square Root of Scalar Float32 Value
VSCALEFPD—Scale Packed Float64 Values With Float64 Values
VSCALEFSD—Scale Scalar Float64 Values With Float64 Values
VSCALEFPS—Scale Packed Float32 Values With Float32 Values
VSCALEFSS—Scale Scalar Float32 Value With Float32 Value
VSCATTERDPS/VSCATTERDPD/VSCATTERQPS/VSCATTERQPD—Scatter Packed Single, Packed Double with Signed Dword and Qword Indices
VSHUFF32x4/VSHUFF64x2/VSHUFI32x4/VSHUFI64x2—Shuffle Packed Values at 128-bit Granularity
VTESTPD/VTESTPS—Packed Bit Test
VZEROALL—Zero All YMM Registers
VZEROUPPER—Zero Upper Bits of YMM Registers
WAIT/FWAIT—Wait
WBINVD—Write Back and Invalidate Cache
WRFSBASE/WRGSBASE—Write FS/GS Segment Base
WRMSR—Write to Model Specific Register
WRPKRU—Write Data to User Page Key Register
XACQUIRE/XRELEASE — Hardware Lock Elision Prefix Hints
XABORT — Transactional Abort
XADD—Exchange and Add
XBEGIN — Transactional Begin
XCHG—Exchange Register/Memory with Register
XEND — Transactional End
XGETBV—Get Value of Extended Control Register
XLAT/XLATB—Table Look-up Translation
XOR—Logical Exclusive OR
XORPD—Bitwise Logical XOR of Packed Double Precision Floating-Point Values
XORPS—Bitwise Logical XOR of Packed Single Precision Floating-Point Values
XRSTOR—Restore Processor Extended States
XRSTORS—Restore Processor Extended States Supervisor
XSAVE—Save Processor Extended States
XSAVEC—Save Processor Extended States with Compaction
XSAVEOPT—Save Processor Extended States Optimized
XSAVES—Save Processor Extended States Supervisor
XSETBV—Set Extended Control Register
XTEST — Test If In Transactional Execution
Chapter 6 Safer Mode Extensions Reference
6.1 Overview
6.2 SMX Functionality
6.2.1 Detecting and Enabling SMX
6.2.2 SMX Instruction Summary
6.2.2.1 GETSEC[CAPABILITIES]
6.2.2.2 GETSEC[ENTERACCS]
6.2.2.3 GETSEC[EXITAC]
6.2.2.4 GETSEC[SENTER]
6.2.2.5 GETSEC[SEXIT]
6.2.2.6 GETSEC[PARAMETERS]
6.2.2.7 GETSEC[SMCTRL]
6.2.2.8 GETSEC[WAKEUP]
6.2.3 Measured Environment and SMX
6.3 GETSEC Leaf Functions
GETSEC[CAPABILITIES] - Report the SMX Capabilities
GETSEC[ENTERACCS] - Execute Authenticated Chipset Code
GETSEC[EXITAC]—Exit Authenticated Code Execution Mode
GETSEC[SENTER]—Enter a Measured Environment
GETSEC[SEXIT]—Exit Measured Environment
GETSEC[PARAMETERS]—Report the SMX Parameters
GETSEC[SMCTRL]—SMX Mode Control
GETSEC[WAKEUP]—Wake Up Sleeping Processors in Measured Environment
Chapter 7 Instruction Set Reference Unique to Intel® Xeon Phi™ Processors
PREFETCHWT1—Prefetch Vector Data Into Caches with Intent to Write and T1 Hint
V4FMADDPS/V4FNMADDPS — Packed Single-Precision Floating-Point Fused Multiply-Add (4-iterations)
V4FMADDSS/V4FNMADDSS —Scalar Single-Precision Floating-Point Fused Multiply-Add (4-iterations)
VEXP2PD—Approximation to the Exponential 2^x of Packed Double-Precision Floating-Point Values with Less Than 2^-23 Relative Error
VEXP2PS—Approximation to the Exponential 2^x of Packed Single-Precision Floating-Point Values with Less Than 2^-23 Relative Error
VGATHERPF0DPS/VGATHERPF0QPS/VGATHERPF0DPD/VGATHERPF0QPD—Sparse Prefetch Packed SP/DP Data Values with Signed Dword, Signed Qword Indices Using T0 Hint
VGATHERPF1DPS/VGATHERPF1QPS/VGATHERPF1DPD/VGATHERPF1QPD—Sparse Prefetch Packed SP/DP Data Values with Signed Dword, Signed Qword Indices Using T1 Hint
VP4DPWSSDS — Dot Product of Signed Words with Dword Accumulation and Saturation (4-iterations)
VP4DPWSSD — Dot Product of Signed Words with Dword Accumulation (4-iterations)
VRCP28PD—Approximation to the Reciprocal of Packed Double-Precision Floating-Point Values with Less Than 2^-28 Relative Error
VRCP28SD—Approximation to the Reciprocal of Scalar Double-Precision Floating-Point Value with Less Than 2^-28 Relative Error
VRCP28PS—Approximation to the Reciprocal of Packed Single-Precision Floating-Point Values with Less Than 2^-28 Relative Error
VRCP28SS—Approximation to the Reciprocal of Scalar Single-Precision Floating-Point Value with Less Than 2^-28 Relative Error
VRSQRT28PD—Approximation to the Reciprocal Square Root of Packed Double-Precision Floating-Point Values with Less Than 2^-28 Relative Error
VRSQRT28SD—Approximation to the Reciprocal Square Root of Scalar Double-Precision Floating-Point Value with Less Than 2^-28 Relative Error
VRSQRT28PS—Approximation to the Reciprocal Square Root of Packed Single-Precision Floating-Point Values with Less Than 2^-28 Relative Error
VRSQRT28SS—Approximation to the Reciprocal Square Root of Scalar Single-Precision Floating-Point Value with Less Than 2^-28 Relative Error
VSCATTERPF0DPS/VSCATTERPF0QPS/VSCATTERPF0DPD/VSCATTERPF0QPD—Sparse Prefetch Packed SP/DP Data Values with Signed Dword, Signed Qword Indices Using T0 Hint with Intent to Write
VSCATTERPF1DPS/VSCATTERPF1QPS/VSCATTERPF1DPD/VSCATTERPF1QPD—Sparse Prefetch Packed SP/DP Data Values with Signed Dword, Signed Qword Indices Using T1 Hint with Intent to Write
Appendix A Opcode Map
A.1 Using Opcode Tables
A.2 Key to Abbreviations
A.2.1 Codes for Addressing Method
A.2.2 Codes for Operand Type
A.2.3 Register Codes
A.2.4 Opcode Look-up Examples for One, Two, and Three-Byte Opcodes
A.2.4.1 One-Byte Opcode Instructions
A.2.4.2 Two-Byte Opcode Instructions
A.2.4.3 Three-Byte Opcode Instructions
A.2.4.4 VEX Prefix Instructions
A.2.5 Superscripts Utilized in Opcode Tables
A.3 One, Two, and Three-Byte Opcode Maps
A.4 Opcode Extensions For One-Byte And Two-byte Opcodes
A.4.1 Opcode Look-up Examples Using Opcode Extensions
A.4.2 Opcode Extension Tables
A.5 Escape Opcode Instructions
A.5.1 Opcode Look-up Examples for Escape Instruction Opcodes
A.5.2 Escape Opcode Instruction Tables
A.5.2.1 Escape Opcodes with D8 as First Byte
A.5.2.2 Escape Opcodes with D9 as First Byte
A.5.2.3 Escape Opcodes with DA as First Byte
A.5.2.4 Escape Opcodes with DB as First Byte
A.5.2.5 Escape Opcodes with DC as First Byte
A.5.2.6 Escape Opcodes with DD as First Byte
A.5.2.7 Escape Opcodes with DE as First Byte
A.5.2.8 Escape Opcodes with DF as First Byte
Appendix B Instruction Formats and Encodings
B.1 Machine Instruction Format
B.1.1 Legacy Prefixes
B.1.2 REX Prefixes
B.1.3 Opcode Fields
B.1.4 Special Fields
B.1.4.1 Reg Field (reg) for Non-64-Bit Modes
B.1.4.2 Reg Field (reg) for 64-Bit Mode
B.1.4.3 Encoding of Operand Size (w) Bit
B.1.4.4 Sign-Extend (s) Bit
B.1.4.5 Segment Register (sreg) Field
B.1.4.6 Special-Purpose Register (eee) Field
B.1.4.7 Condition Test (tttn) Field
B.1.4.8 Direction (d) Bit
B.1.5 Other Notes
B.2 General-Purpose Instruction Formats and Encodings for Non-64-Bit Modes
B.2.1 General Purpose Instruction Formats and Encodings for 64-Bit Mode
B.3 Pentium® Processor Family Instruction Formats and Encodings
B.4 64-bit Mode Instruction Encodings for SIMD Instruction Extensions
B.5 MMX Instruction Formats and Encodings
B.5.1 Granularity Field (gg)
B.5.2 MMX Technology and General-Purpose Register Fields (mmxreg and reg)
B.5.3 MMX Instruction Formats and Encodings Table
B.6 Processor Extended State Instruction Formats and Encodings
B.7 P6 Family Instruction Formats and Encodings
B.8 SSE Instruction Formats and Encodings
B.9 SSE2 Instruction Formats and Encodings
B.9.1 Granularity Field (gg)
B.10 SSE3 Formats and Encodings Table
B.11 SSSE3 Formats and Encoding Table
B.12 AESNI and PCLMULQDQ Instruction Formats and Encodings
B.13 Special Encodings for 64-Bit Mode
B.14 SSE4.1 Formats and Encoding Table
B.15 SSE4.2 Formats and Encoding Table
B.16 AVX Formats and Encoding Table
B.17 Floating-Point Instruction Formats and Encodings
B.18 VMX Instructions
B.19 SMX Instructions
Appendix C Intel® C/C++ Compiler Intrinsics and Functional Equivalents
C.1 Simple Intrinsics
C.2 Composite Intrinsics
Volume 3 (3A, 3B, 3C & 3D):System Programming Guide
Chapter 1 About This Manual
1.1 Intel® 64 and IA-32 Processors Covered in this Manual
1.2 Overview of the System Programming Guide
1.3 Notational Conventions
1.3.1 Bit and Byte Order
1.3.2 Reserved Bits and Software Compatibility
1.3.3 Instruction Operands
1.3.4 Hexadecimal and Binary Numbers
1.3.5 Segmented Addressing
1.3.6 Syntax for CPUID, CR, and MSR Values
1.3.7 Exceptions
1.4 Related Literature
Chapter 2 System Architecture Overview
2.1 Overview of the System-Level Architecture
2.1.1 Global and Local Descriptor Tables
2.1.1.1 Global and Local Descriptor Tables in IA-32e Mode
2.1.2 System Segments, Segment Descriptors, and Gates
2.1.2.1 Gates in IA-32e Mode
2.1.3 Task-State Segments and Task Gates
2.1.3.1 Task-State Segments in IA-32e Mode
2.1.4 Interrupt and Exception Handling
2.1.4.1 Interrupt and Exception Handling in IA-32e Mode
2.1.5 Memory Management
2.1.5.1 Memory Management in IA-32e Mode
2.1.6 System Registers
2.1.6.1 System Registers in IA-32e Mode
2.1.7 Other System Resources
2.2 Modes of Operation
2.2.1 Extended Feature Enable Register
2.3 System Flags and Fields in the EFLAGS Register
2.3.1 System Flags and Fields in IA-32e Mode
2.4 Memory-Management Registers
2.4.1 Global Descriptor Table Register (GDTR)
2.4.2 Local Descriptor Table Register (LDTR)
2.4.3 Interrupt Descriptor Table Register (IDTR)
2.4.4 Task Register (TR)
2.5 Control Registers
2.5.1 CPUID Qualification of Control Register Flags
2.6 Extended Control Registers (Including XCR0)
2.7 Protection Key Rights Register (PKRU)
2.8 System Instruction Summary
2.8.1 Loading and Storing System Registers
2.8.2 Verifying of Access Privileges
2.8.3 Loading and Storing Debug Registers
2.8.4 Invalidating Caches and TLBs
2.8.5 Controlling the Processor
2.8.6 Reading Performance-Monitoring and Time-Stamp Counters
2.8.6.1 Reading Counters in 64-Bit Mode
2.8.7 Reading and Writing Model-Specific Registers
2.8.7.1 Reading and Writing Model-Specific Registers in 64-Bit Mode
2.8.8 Enabling Processor Extended States
Chapter 3 Protected-Mode Memory Management
3.1 Memory Management Overview
3.2 Using Segments
3.2.1 Basic Flat Model
3.2.2 Protected Flat Model
3.2.3 Multi-Segment Model
3.2.4 Segmentation in IA-32e Mode
3.2.5 Paging and Segmentation
3.3 Physical Address Space
3.3.1 Intel® 64 Processors and Physical Address Space
3.4 Logical and Linear Addresses
3.4.1 Logical Address Translation in IA-32e Mode
3.4.2 Segment Selectors
3.4.3 Segment Registers
3.4.4 Segment Loading Instructions in IA-32e Mode
3.4.5 Segment Descriptors
3.4.5.1 Code- and Data-Segment Descriptor Types
3.5 System Descriptor Types
3.5.1 Segment Descriptor Tables
3.5.2 Segment Descriptor Tables in IA-32e Mode
Chapter 4 Paging
4.1 Paging Modes and Control Bits
4.1.1 Three Paging Modes
4.1.2 Paging-Mode Enabling
4.1.3 Paging-Mode Modifiers
4.1.4 Enumeration of Paging Features by CPUID
4.2 Hierarchical Paging Structures: an Overview
4.3 32-Bit Paging
4.4 PAE Paging
4.4.1 PDPTE Registers
4.4.2 Linear-Address Translation with PAE Paging
4.5 4-Level Paging
4.6 Access Rights
4.6.1 Determination of Access Rights
4.6.2 Protection Keys
4.7 Page-Fault Exceptions
4.8 Accessed and Dirty Flags
4.9 Paging and Memory Typing
4.9.1 Paging and Memory Typing When the PAT is Not Supported (Pentium Pro and Pentium II Processors)
4.9.2 Paging and Memory Typing When the PAT is Supported (Pentium III and More Recent Processor Families)
4.9.3 Caching Paging-Related Information about Memory Typing
4.10 Caching Translation Information
4.10.1 Process-Context Identifiers (PCIDs)
4.10.2 Translation Lookaside Buffers (TLBs)
4.10.2.1 Page Numbers, Page Frames, and Page Offsets
4.10.2.2 Caching Translations in TLBs
4.10.2.3 Details of TLB Use
4.10.2.4 Global Pages
4.10.3 Paging-Structure Caches
4.10.3.1 Caches for Paging Structures
4.10.3.2 Using the Paging-Structure Caches to Translate Linear Addresses
4.10.3.3 Multiple Cached Entries for a Single Paging-Structure Entry
4.10.4 Invalidation of TLBs and Paging-Structure Caches
4.10.4.1 Operations that Invalidate TLBs and Paging-Structure Caches
4.10.4.2 Recommended Invalidation
4.10.4.3 Optional Invalidation
4.10.4.4 Delayed Invalidation
4.10.5 Propagation of Paging-Structure Changes to Multiple Processors
4.11 Interactions with Virtual-Machine Extensions (VMX)
4.11.1 VMX Transitions
4.11.2 VMX Support for Address Translation
4.12 Using Paging for Virtual Memory
4.13 Mapping Segments to Pages
Chapter 5 Protection
5.1 Enabling and Disabling Segment and Page Protection
5.2 Fields and Flags Used for Segment-Level and Page-Level Protection
5.2.1 Code-Segment Descriptor in 64-bit Mode
5.3 Limit Checking
5.3.1 Limit Checking in 64-bit Mode
5.4 Type Checking
5.4.1 Null Segment Selector Checking
5.4.1.1 NULL Segment Checking in 64-bit Mode
5.5 Privilege Levels
5.6 Privilege Level Checking When Accessing Data Segments
5.6.1 Accessing Data in Code Segments
5.7 Privilege Level Checking When Loading the SS Register
5.8 Privilege Level Checking When Transferring Program Control Between Code Segments
5.8.1 Direct Calls or Jumps to Code Segments
5.8.1.1 Accessing Nonconforming Code Segments
5.8.1.2 Accessing Conforming Code Segments
5.8.2 Gate Descriptors
5.8.3 Call Gates
5.8.3.1 IA-32e Mode Call Gates
5.8.4 Accessing a Code Segment Through a Call Gate
5.8.5 Stack Switching
5.8.5.1 Stack Switching in 64-bit Mode
5.8.6 Returning from a Called Procedure
5.8.7 Performing Fast Calls to System Procedures with the SYSENTER and SYSEXIT Instructions
5.8.7.1 SYSENTER and SYSEXIT Instructions in IA-32e Mode
5.8.8 Fast System Calls in 64-Bit Mode
5.9 Privileged Instructions
5.10 Pointer Validation
5.10.1 Checking Access Rights (LAR Instruction)
5.10.2 Checking Read/Write Rights (VERR and VERW Instructions)
5.10.3 Checking That the Pointer Offset Is Within Limits (LSL Instruction)
5.10.4 Checking Caller Access Privileges (ARPL Instruction)
5.10.5 Checking Alignment
5.11 Page-Level Protection
5.11.1 Page-Protection Flags
5.11.2 Restricting Addressable Domain
5.11.3 Page Type
5.11.4 Combining Protection of Both Levels of Page Tables
5.11.5 Overrides to Page Protection
5.12 Combining Page and Segment Protection
5.13 Page-Level Protection and Execute-Disable Bit
5.13.1 Detecting and Enabling the Execute-Disable Capability
5.13.2 Execute-Disable Page Protection
5.13.3 Reserved Bit Checking
5.13.4 Exception Handling
Chapter 6 Interrupt and Exception Handling
6.1 Interrupt and Exception Overview
6.2 Exception and Interrupt Vectors
6.3 Sources of Interrupts
6.3.1 External Interrupts
6.3.2 Maskable Hardware Interrupts
6.3.3 Software-Generated Interrupts
6.4 Sources of Exceptions
6.4.1 Program-Error Exceptions
6.4.2 Software-Generated Exceptions
6.4.3 Machine-Check Exceptions
6.5 Exception Classifications
6.6 Program or Task Restart
6.7 Nonmaskable Interrupt (NMI)
6.7.1 Handling Multiple NMIs
6.8 Enabling and Disabling Interrupts
6.8.1 Masking Maskable Hardware Interrupts
6.8.2 Masking Instruction Breakpoints
6.8.3 Masking Exceptions and Interrupts When Switching Stacks
6.9 Priority Among Simultaneous Exceptions and Interrupts
6.10 Interrupt Descriptor Table (IDT)
6.11 IDT Descriptors
6.12 Exception and Interrupt Handling
6.12.1 Exception- or Interrupt-Handler Procedures
6.12.1.1 Protection of Exception- and Interrupt-Handler Procedures
6.12.1.2 Flag Usage By Exception- or Interrupt-Handler Procedure
6.12.2 Interrupt Tasks
6.13 Error Code
6.14 Exception and Interrupt Handling in 64-bit Mode
6.14.1 64-Bit Mode IDT
6.14.2 64-Bit Mode Stack Frame
6.14.3 IRET in IA-32e Mode
6.14.4 Stack Switching in IA-32e Mode
6.14.5 Interrupt Stack Table
6.15 Exception and Interrupt Reference
Interrupt 0—Divide Error Exception (#DE)
Interrupt 1—Debug Exception (#DB)
Interrupt 2—NMI Interrupt
Interrupt 3—Breakpoint Exception (#BP)
Interrupt 4—Overflow Exception (#OF)
Interrupt 5—BOUND Range Exceeded Exception (#BR)
Interrupt 6—Invalid Opcode Exception (#UD)
Interrupt 7—Device Not Available Exception (#NM)
Interrupt 8—Double Fault Exception (#DF)
Interrupt 9—Coprocessor Segment Overrun
Interrupt 10—Invalid TSS Exception (#TS)
Interrupt 11—Segment Not Present (#NP)
Interrupt 12—Stack Fault Exception (#SS)
Interrupt 13—General Protection Exception (#GP)
Interrupt 14—Page-Fault Exception (#PF)
Interrupt 16—x87 FPU Floating-Point Error (#MF)
Interrupt 17—Alignment Check Exception (#AC)
Interrupt 18—Machine-Check Exception (#MC)
Interrupt 19—SIMD Floating-Point Exception (#XM)
Interrupt 20—Virtualization Exception (#VE)
Interrupts 32 to 255—User Defined Interrupts
Chapter 7 Task Management
7.1 Task Management Overview
7.1.1 Task Structure
7.1.2 Task State
7.1.3 Executing a Task
7.2 Task Management Data Structures
7.2.1 Task-State Segment (TSS)
7.2.2 TSS Descriptor
7.2.3 TSS Descriptor in 64-bit mode
7.2.4 Task Register
7.2.5 Task-Gate Descriptor
7.3 Task Switching
7.4 Task Linking
7.4.1 Use of Busy Flag To Prevent Recursive Task Switching
7.4.2 Modifying Task Linkages
7.5 Task Address Space
7.5.1 Mapping Tasks to the Linear and Physical Address Spaces
7.5.2 Task Logical Address Space
7.6 16-Bit Task-State Segment (TSS)
7.7 Task Management in 64-bit Mode
Chapter 8 Multiple-Processor Management
8.1 Locked Atomic Operations
8.1.1 Guaranteed Atomic Operations
8.1.2 Bus Locking
8.1.2.1 Automatic Locking
8.1.2.2 Software Controlled Bus Locking
8.1.3 Handling Self- and Cross-Modifying Code
8.1.4 Effects of a LOCK Operation on Internal Processor Caches
8.2 Memory Ordering
8.2.1 Memory Ordering in the Intel® Pentium® and Intel486™ Processors
8.2.2 Memory Ordering in P6 and More Recent Processor Families
8.2.3 Examples Illustrating the Memory-Ordering Principles
8.2.3.1 Assumptions, Terminology, and Notation
8.2.3.2 Neither Loads Nor Stores Are Reordered with Like Operations
8.2.3.3 Stores Are Not Reordered With Earlier Loads
8.2.3.4 Loads May Be Reordered with Earlier Stores to Different Locations
8.2.3.5 Intra-Processor Forwarding Is Allowed
8.2.3.6 Stores Are Transitively Visible
8.2.3.7 Stores Are Seen in a Consistent Order by Other Processors
8.2.3.8 Locked Instructions Have a Total Order
8.2.3.9 Loads and Stores Are Not Reordered with Locked Instructions
8.2.4 Fast-String Operation and Out-of-Order Stores
8.2.4.1 Memory-Ordering Model for String Operations on Write-Back (WB) Memory
8.2.4.2 Examples Illustrating Memory-Ordering Principles for String Operations
8.2.5 Strengthening or Weakening the Memory-Ordering Model
8.3 Serializing Instructions
8.4 Multiple-Processor (MP) Initialization
8.4.1 BSP and AP Processors
8.4.2 MP Initialization Protocol Requirements and Restrictions
8.4.3 MP Initialization Protocol Algorithm for MP Systems
8.4.4 MP Initialization Example
8.4.4.1 Typical BSP Initialization Sequence
8.4.4.2 Typical AP Initialization Sequence
8.4.5 Identifying Logical Processors in an MP System
8.5 Intel® Hyper-Threading Technology and Intel® Multi-Core Technology
8.6 Detecting Hardware Multi-Threading Support and Topology
8.6.1 Initializing Processors Supporting Hyper-Threading Technology
8.6.2 Initializing Multi-Core Processors
8.6.3 Executing Multiple Threads on an Intel® 64 or IA-32 Processor Supporting Hardware Multi-Threading
8.6.4 Handling Interrupts on an IA-32 Processor Supporting Hardware Multi-Threading
8.7 Intel® Hyper-Threading Technology Architecture
8.7.1 State of the Logical Processors
8.7.2 APIC Functionality
8.7.3 Memory Type Range Registers (MTRR)
8.7.4 Page Attribute Table (PAT)
8.7.5 Machine Check Architecture
8.7.6 Debug Registers and Extensions
8.7.7 Performance Monitoring Counters
8.7.8 IA32_MISC_ENABLE MSR
8.7.9 Memory Ordering
8.7.10 Serializing Instructions
8.7.11 Microcode Update Resources
8.7.12 Self Modifying Code
8.7.13 Implementation-Specific Intel HT Technology Facilities
8.7.13.1 Processor Caches
8.7.13.2 Processor Translation Lookaside Buffers (TLBs)
8.7.13.3 Thermal Monitor
8.7.13.4 External Signal Compatibility
8.8 Multi-Core Architecture
8.8.1 Logical Processor Support
8.8.2 Memory Type Range Registers (MTRR)
8.8.3 Performance Monitoring Counters
8.8.4 IA32_MISC_ENABLE MSR
8.8.5 Microcode Update Resources
8.9 Programming Considerations for Hardware Multi-Threading Capable Processors
8.9.1 Hierarchical Mapping of Shared Resources
8.9.2 Hierarchical Mapping of CPUID Extended Topology Leaf
8.9.3 Hierarchical ID of Logical Processors in an MP System
8.9.3.1 Hierarchical ID of Logical Processors with x2APIC ID
8.9.4 Algorithm for Three-Level Mappings of APIC_ID
8.9.5 Identifying Topological Relationships in a MP System
8.10 Management of Idle and Blocked Conditions
8.10.1 HLT Instruction
8.10.2 PAUSE Instruction
8.10.3 Detecting Support for the MONITOR/MWAIT Instruction
8.10.4 MONITOR/MWAIT Instruction
8.10.5 Monitor/Mwait Address Range Determination
8.10.6 Required Operating System Support
8.10.6.1 Use the PAUSE Instruction in Spin-Wait Loops
8.10.6.2 Potential Usage of MONITOR/MWAIT in C0 Idle Loops
8.10.6.3 Halt Idle Logical Processors
8.10.6.4 Potential Usage of MONITOR/MWAIT in C1 Idle Loops
8.10.6.5 Guidelines for Scheduling Threads on Logical Processors Sharing Execution Resources
8.10.6.6 Eliminate Execution-Based Timing Loops
8.10.6.7 Place Locks and Semaphores in Aligned, 128-Byte Blocks of Memory
8.11 MP Initialization For P6 Family Processors
8.11.1 Overview of the MP Initialization Process For P6 Family Processors
8.11.2 MP Initialization Protocol Algorithm
8.11.2.1 Error Detection and Handling During the MP Initialization Protocol
Chapter 9 Processor Management and Initialization
9.1 Initialization Overview
9.1.1 Processor State After Reset
9.1.2 Processor Built-In Self-Test (BIST)
9.1.3 Model and Stepping Information
9.1.4 First Instruction Executed
9.2 x87 FPU Initialization
9.2.1 Configuring the x87 FPU Environment
9.2.2 Setting the Processor for x87 FPU Software Emulation
9.3 Cache Enabling
9.4 Model-Specific Registers (MSRs)
9.5 Memory Type Range Registers (MTRRs)
9.6 Initializing SSE/SSE2/SSE3/SSSE3 Extensions
9.7 Software Initialization for Real-Address Mode Operation
9.7.1 Real-Address Mode IDT
9.7.2 NMI Interrupt Handling
9.8 Software Initialization for Protected-Mode Operation
9.8.1 Protected-Mode System Data Structures
9.8.2 Initializing Protected-Mode Exceptions and Interrupts
9.8.3 Initializing Paging
9.8.4 Initializing Multitasking
9.8.5 Initializing IA-32e Mode
9.8.5.1 IA-32e Mode System Data Structures
9.8.5.2 IA-32e Mode Interrupts and Exceptions
9.8.5.3 64-bit Mode and Compatibility Mode Operation
9.8.5.4 Switching Out of IA-32e Mode Operation
9.9 Mode Switching
9.9.1 Switching to Protected Mode
9.9.2 Switching Back to Real-Address Mode
9.10 Initialization and Mode Switching Example
9.10.1 Assembler Usage
9.10.2 STARTUP.ASM Listing
9.10.3 MAIN.ASM Source Code
9.10.4 Supporting Files
9.11 Microcode Update Facilities
9.11.1 Microcode Update
9.11.2 Optional Extended Signature Table
9.11.3 Processor Identification
9.11.4 Platform Identification
9.11.5 Microcode Update Checksum
9.11.6 Microcode Update Loader
9.11.6.1 Hard Resets in Update Loading
9.11.6.2 Update in a Multiprocessor System
9.11.6.3 Update in a System Supporting Intel Hyper-Threading Technology
9.11.6.4 Update in a System Supporting Dual-Core Technology
9.11.6.5 Update Loader Enhancements
9.11.7 Update Signature and Verification
9.11.7.1 Determining the Signature
9.11.7.2 Authenticating the Update
9.11.8 Optional Processor Microcode Update Specifications
9.11.8.1 Responsibilities of the BIOS
9.11.8.2 Responsibilities of the Calling Program
9.11.8.3 Microcode Update Functions
9.11.8.4 INT 15H-based Interface
9.11.8.5 Function 00H—Presence Test
9.11.8.6 Function 01H—Write Microcode Update Data
9.11.8.7 Function 02H—Microcode Update Control
9.11.8.8 Function 03H—Read Microcode Update Data
9.11.8.9 Return Codes
Chapter 10 Advanced Programmable Interrupt Controller (APIC)
10.1 Local and I/O APIC Overview
10.2 System Bus Vs. APIC Bus
10.3 The Intel® 82489DX External APIC, the APIC, the xAPIC, and the x2APIC
10.4 Local APIC
10.4.1 The Local APIC Block Diagram
10.4.2 Presence of the Local APIC
10.4.3 Enabling or Disabling the Local APIC
10.4.4 Local APIC Status and Location
10.4.5 Relocating the Local APIC Registers
10.4.6 Local APIC ID
10.4.7 Local APIC State
10.4.7.1 Local APIC State After Power-Up or Reset
10.4.7.2 Local APIC State After It Has Been Software Disabled
10.4.7.3 Local APIC State After an INIT Reset (“Wait-for-SIPI” State)
10.4.7.4 Local APIC State After It Receives an INIT-Deassert IPI
10.4.8 Local APIC Version Register
10.5 Handling Local Interrupts
10.5.1 Local Vector Table
10.5.2 Valid Interrupt Vectors
10.5.3 Error Handling
10.5.4 APIC Timer
10.5.4.1 TSC-Deadline Mode
10.5.5 Local Interrupt Acceptance
10.6 Issuing Interprocessor Interrupts
10.6.1 Interrupt Command Register (ICR)
10.6.2 Determining IPI Destination
10.6.2.1 Physical Destination Mode
10.6.2.2 Logical Destination Mode
10.6.2.3 Broadcast/Self Delivery Mode
10.6.2.4 Lowest Priority Delivery Mode
10.6.3 IPI Delivery and Acceptance
10.7 System and APIC Bus Arbitration
10.8 Handling Interrupts
10.8.1 Interrupt Handling with the Pentium 4 and Intel Xeon Processors
10.8.2 Interrupt Handling with the P6 Family and Pentium Processors
10.8.3 Interrupt, Task, and Processor Priority
10.8.3.1 Task and Processor Priorities
10.8.4 Interrupt Acceptance for Fixed Interrupts
10.8.5 Signaling Interrupt Servicing Completion
10.8.6 Task Priority in IA-32e Mode
10.8.6.1 Interaction of Task Priorities between CR8 and APIC
10.9 Spurious Interrupt
10.10 APIC Bus Message Passing Mechanism and Protocol (P6 Family, Pentium Processors)
10.10.1 Bus Message Formats
10.11 Message Signalled Interrupts
10.11.1 Message Address Register Format
10.11.2 Message Data Register Format
10.12 Extended XAPIC (x2APIC)
10.12.1 Detecting and Enabling x2APIC Mode
10.12.1.1 Instructions to Access APIC Registers
10.12.1.2 x2APIC Register Address Space
10.12.1.3 Reserved Bit Checking
10.12.2 x2APIC Register Availability
10.12.3 MSR Access in x2APIC Mode
10.12.4 VM-Exit Controls for MSRs and x2APIC Registers
10.12.5 x2APIC State Transitions
10.12.5.1 x2APIC States
x2APIC After Reset
x2APIC Transitions From x2APIC Mode
x2APIC Transitions From Disabled Mode
State Changes From xAPIC Mode to x2APIC Mode
10.12.6 Routing of Device Interrupts in x2APIC Mode
10.12.7 Initialization by System Software
10.12.8 CPUID Extensions And Topology Enumeration
10.12.8.1 Consistency of APIC IDs and CPUID
10.12.9 ICR Operation in x2APIC Mode
10.12.10 Determining IPI Destination in x2APIC Mode
10.12.10.1 Logical Destination Mode in x2APIC Mode
10.12.10.2 Deriving Logical x2APIC ID from the Local x2APIC ID
10.12.11 SELF IPI Register
10.13 APIC Bus Message Formats
10.13.1 Bus Message Formats
10.13.2 EOI Message
10.13.2.1 Short Message
10.13.2.2 Non-focused Lowest Priority Message
10.13.2.3 APIC Bus Status Cycles
Chapter 11 Memory Cache Control
11.1 Internal Caches, TLBs, and Buffers
11.2 Caching Terminology
11.3 Methods of Caching Available
11.3.1 Buffering of Write Combining Memory Locations
11.3.2 Choosing a Memory Type
11.3.3 Code Fetches in Uncacheable Memory
11.4 Cache Control Protocol
11.5 Cache Control
11.5.1 Cache Control Registers and Bits
11.5.2 Precedence of Cache Controls
11.5.2.1 Selecting Memory Types for Pentium Pro and Pentium II Processors
11.5.2.2 Selecting Memory Types for Pentium III and More Recent Processor Families
11.5.2.3 Writing Values Across Pages with Different Memory Types
11.5.3 Preventing Caching
11.5.4 Disabling and Enabling the L3 Cache
11.5.5 Cache Management Instructions
11.5.6 L1 Data Cache Context Mode
11.5.6.1 Adaptive Mode
11.5.6.2 Shared Mode
11.6 Self-Modifying Code
11.7 Implicit Caching (Pentium 4, Intel Xeon, and P6 Family Processors)
11.8 Explicit Caching
11.9 Invalidating the Translation Lookaside Buffers (TLBs)
11.10 Store Buffer
11.11 Memory Type Range Registers (MTRRs)
11.11.1 MTRR Feature Identification
11.11.2 Setting Memory Ranges with MTRRs
11.11.2.1 IA32_MTRR_DEF_TYPE MSR
11.11.2.2 Fixed Range MTRRs
11.11.2.3 Variable Range MTRRs
11.11.2.4 System-Management Range Register Interface
11.11.3 Example Base and Mask Calculations
11.11.3.1 Base and Mask Calculations for Greater-Than 36-bit Physical Address Support
11.11.4 Range Size and Alignment Requirement
11.11.4.1 MTRR Precedences
11.11.5 MTRR Initialization
11.11.6 Remapping Memory Types
11.11.7 MTRR Maintenance Programming Interface
11.11.7.1 MemTypeGet() Function
11.11.7.2 MemTypeSet() Function
11.11.8 MTRR Considerations in MP Systems
11.11.9 Large Page Size Considerations
11.12 Page Attribute Table (PAT)
11.12.1 Detecting Support for the PAT Feature
11.12.2 IA32_PAT MSR
11.12.3 Selecting a Memory Type from the PAT
11.12.4 Programming the PAT
11.12.5 PAT Compatibility with Earlier IA-32 Processors
Chapter 12 Intel® MMX™ Technology System Programming
12.1 Emulation of the MMX Instruction Set
12.2 The MMX State and MMX Register Aliasing
12.2.1 Effect of MMX, x87 FPU, FXSAVE, and FXRSTOR Instructions on the x87 FPU Tag Word
12.3 Saving and Restoring the MMX State and Registers
12.4 Saving MMX State on Task or Context Switches
12.5 Exceptions That Can Occur When Executing MMX Instructions
12.5.1 Effect of MMX Instructions on Pending x87 Floating-Point Exceptions
12.6 Debugging MMX Code
Chapter 13 System Programming for Instruction Set Extensions and Processor Extended States
13.1 Providing Operating System Support for SSE Extensions
13.1.1 Adding Support to an Operating System for SSE Extensions
13.1.2 Checking for CPU Support
13.1.3 Initialization of the SSE Extensions
13.1.4 Providing Non-Numeric Exception Handlers for Exceptions Generated by the SSE Instructions
13.1.5 Providing a Handler for the SIMD Floating-Point Exception (#XM)
13.1.5.1 Numeric Error flag and IGNNE#
13.2 Emulation of SSE Extensions
13.3 Saving and Restoring SSE State
13.4 Designing OS Facilities for Saving x87 FPU, SSE and Extended States on Task or Context Switches
13.4.1 Using the TS Flag to Control the Saving of the x87 FPU and SSE State
13.5 The XSAVE Feature Set and Processor Extended State Management
13.5.1 Checking the Support for XSAVE Feature Set
13.5.2 Determining the XSAVE Managed Feature States And The Required Buffer Size
13.5.3 Enable the Use Of XSAVE Feature Set And XSAVE State Components
13.5.4 Provide an Initialization for the XSAVE State Components
13.5.5 Providing the Required Exception Handlers
13.6 Interoperability Of The XSAVE Feature Set And FXSAVE/FXRSTOR
13.7 The XSAVE Feature Set And Processor Supervisor State Management
13.8 System Programming For XSAVE Managed Features
13.8.1 Intel® Advanced Vector Extensions (Intel® AVX)
13.8.2 Intel® Advanced Vector Extensions 512 (Intel® AVX-512)
Chapter 14 Power and Thermal Management
14.1 Enhanced Intel Speedstep® Technology
14.1.1 Software Interface For Initiating Performance State Transitions
14.2 P-State Hardware Coordination
14.3 System Software Considerations and Opportunistic Processor Performance Operation
14.3.1 Intel® Dynamic Acceleration Technology
14.3.2 System Software Interfaces for Opportunistic Processor Performance Operation
14.3.2.1 Discover Hardware Support and Enabling of Opportunistic Processor Performance Operation
14.3.2.2 OS Control of Opportunistic Processor Performance Operation
14.3.2.3 Required Changes to OS Power Management P-State Policy
14.3.3 Intel® Turbo Boost Technology
14.3.4 Performance and Energy Bias Hint Support
14.4 Hardware-Controlled Performance States (HWP)
14.4.1 HWP Programming Interfaces
14.4.2 Enabling HWP
14.4.3 HWP Performance Range and Dynamic Capabilities
14.4.4 Managing HWP
14.4.4.1 IA32_HWP_REQUEST MSR (Address: 0x774 Logical Processor Scope)
14.4.4.2 IA32_HWP_REQUEST_PKG MSR (Address: 0x772 Package Scope)
14.4.4.3 IA32_HWP_PECI_REQUEST_INFO MSR (Address: 0x775 Package Scope)
14.4.5 HWP Feedback
14.4.5.1 Non-Architectural HWP Feedback
14.4.6 HWP Notifications
14.4.7 Idle Logical Processor Impact on Core Frequency
14.4.8 Fast Write of Uncore MSR (Model Specific Feature)
14.4.8.1 FAST_UNCORE_MSRS_CAPABILITY (Address: 0x65F, Logical Processor Scope)
14.4.8.2 FAST_UNCORE_MSRS_CTL (Address: 0x657, Logical Processor Scope)
14.4.8.3 FAST_UNCORE_MSRS_STATUS (Address: 0x65E, Logical Processor Scope)
14.4.9 Fast_IA32_HWP_REQUEST CPUID
14.4.10 Recommendations for OS use of HWP Controls
14.5 Hardware Duty Cycling (HDC)
14.5.1 Hardware Duty Cycling Programming Interfaces
14.5.2 Package Level Enabling HDC
14.5.3 Logical-Processor Level HDC Control
14.5.4 HDC Residency Counters
14.5.4.1 IA32_THREAD_STALL
14.5.4.2 Non-Architectural HDC Residency Counters
14.5.5 MPERF and APERF Counters Under HDC
14.6 MWAIT Extensions for Advanced Power Management
14.7 Thermal Monitoring and Protection
14.7.1 Catastrophic Shutdown Detector
14.7.2 Thermal Monitor
14.7.2.1 Thermal Monitor 1
14.7.2.2 Thermal Monitor 2
14.7.2.3 Two Methods for Enabling TM2
14.7.2.4 Performance State Transitions and Thermal Monitoring
14.7.2.5 Thermal Status Information
14.7.2.6 Adaptive Thermal Monitor
14.7.3 Software Controlled Clock Modulation
14.7.3.1 Extension of Software Controlled Clock Modulation
14.7.4 Detection of Thermal Monitor and Software Controlled Clock Modulation Facilities
14.7.4.1 Detection of Software Controlled Clock Modulation Extension
14.7.5 On Die Digital Thermal Sensors
14.7.5.1 Digital Thermal Sensor Enumeration
14.7.5.2 Reading the Digital Sensor
14.7.6 Power Limit Notification
14.8 Package Level Thermal Management
14.8.1 Support for Passive and Active Cooling
14.9 Platform Specific Power Management Support
14.9.1 RAPL Interfaces
14.9.2 RAPL Domains and Platform Specificity
14.9.3 Package RAPL Domain
14.9.4 PP0/PP1 RAPL Domains
14.9.5 DRAM RAPL Domain
Chapter 15 Machine-Check Architecture
15.1 Machine-Check Architecture
15.2 Compatibility with Pentium Processor
15.3 Machine-Check MSRs
15.3.1 Machine-Check Global Control MSRs
15.3.1.1 IA32_MCG_CAP MSR
15.3.1.2 IA32_MCG_STATUS MSR
15.3.1.3 IA32_MCG_CTL MSR
15.3.1.4 IA32_MCG_EXT_CTL MSR
15.3.1.5 Enabling Local Machine Check
15.3.2 Error-Reporting Register Banks
15.3.2.1 IA32_MCi_CTL MSRs
15.3.2.2 IA32_MCi_STATUS MSRs
15.3.2.3 IA32_MCi_ADDR MSRs
15.3.2.4 IA32_MCi_MISC MSRs
15.3.2.5 IA32_MCi_CTL2 MSRs
15.3.2.6 IA32_MCG Extended Machine Check State MSRs
15.3.3 Mapping of the Pentium Processor Machine-Check Errors to the Machine-Check Architecture
15.4 Enhanced Cache Error Reporting
15.5 Corrected Machine Check Error Interrupt
15.5.1 CMCI Local APIC Interface
15.5.2 System Software Recommendation for Managing CMCI and Machine Check Resources
15.5.2.1 CMCI Initialization
15.5.2.2 CMCI Threshold Management
15.5.2.3 CMCI Interrupt Handler
15.6 Recovery of Uncorrected Recoverable (UCR) Errors
15.6.1 Detection of Software Error Recovery Support
15.6.2 UCR Error Reporting and Logging
15.6.3 UCR Error Classification
15.6.4 UCR Error Overwrite Rules
15.7 Machine-Check Availability
15.8 Machine-Check Initialization
15.9 Interpreting the MCA Error Codes
15.9.1 Simple Error Codes
15.9.2 Compound Error Codes
15.9.2.1 Correction Report Filtering (F) Bit
15.9.2.2 Transaction Type (TT) Sub-Field
15.9.2.3 Level (LL) Sub-Field
15.9.2.4 Request (RRRR) Sub-Field
15.9.2.5 Bus and Interconnect Errors
15.9.2.6 Memory Controller and Extended Memory Errors
15.9.3 Architecturally Defined UCR Errors
15.9.3.1 Architecturally Defined SRAO Errors
15.9.3.2 Architecturally Defined SRAR Errors
15.9.4 Multiple MCA Errors
15.9.5 Machine-Check Error Codes Interpretation
15.10 Guidelines for Writing Machine-Check Software
15.10.1 Machine-Check Exception Handler
15.10.2 Pentium Processor Machine-Check Exception Handling
15.10.3 Logging Correctable Machine-Check Errors
15.10.4 Machine-Check Software Handler Guidelines for Error Recovery
15.10.4.1 Machine-Check Exception Handler for Error Recovery
15.10.4.2 Corrected Machine-Check Handler for Error Recovery
Chapter 16 Interpreting Machine-Check Error Codes
16.1 Incremental Decoding Information: Processor Family 06H Machine Error Codes For Machine Check
16.2 Incremental Decoding Information: Intel Core 2 Processor Family Machine Error Codes For Machine Check
16.2.1 Model-Specific Machine Check Error Codes for Intel Xeon Processor 7400 Series
16.2.1.1 Processor Machine Check Status Register Incremental MCA Error Code Definition
16.2.2 Intel Xeon Processor 7400 Model Specific Error Code Field
16.2.2.1 Processor Model Specific Error Code Field Type B: Bus and Interconnect Error
16.2.2.2 Processor Model Specific Error Code Field Type C: Cache Bus Controller Error
16.3 Incremental Decoding Information: Processor Family with CPUID DisplayFamily_DisplayModel Signature 06_1AH, Machine Error Codes For Machine Check
16.3.1 Intel QPI Machine Check Errors
16.3.2 Internal Machine Check Errors
16.3.3 Memory Controller Errors
16.4 Incremental Decoding Information: Processor Family with CPUID DisplayFamily_DisplayModel Signature 06_2DH, Machine Error Codes For Machine Check
16.4.1 Internal Machine Check Errors
16.4.2 Intel QPI Machine Check Errors
16.4.3 Integrated Memory Controller Machine Check Errors
16.5 Incremental Decoding Information: Processor Family with CPUID DisplayFamily_DisplayModel Signature 06_3EH, Machine Error Codes For Machine Check
16.5.1 Internal Machine Check Errors
16.5.2 Integrated Memory Controller Machine Check Errors
16.5.3 Home Agent Machine Check Errors
16.6 Incremental Decoding Information: Processor Family with CPUID DisplayFamily_DisplayModel Signature 06_3FH, Machine Error Codes For Machine Check
16.6.1 Internal Machine Check Errors
16.6.2 Intel QPI Machine Check Errors
16.6.3 Integrated Memory Controller Machine Check Errors
16.6.4 Home Agent Machine Check Errors
16.7 Incremental Decoding Information: Processor Family with CPUID DisplayFamily_DisplayModel Signature 06_56H, Machine Error Codes For Machine Check
16.7.1 Internal Machine Check Errors
16.7.2 Integrated Memory Controller Machine Check Errors
16.8 Incremental Decoding Information: Processor Family with CPUID DisplayFamily_DisplayModel Signature 06_4FH, Machine Error Codes For Machine Check
16.8.1 Integrated Memory Controller Machine Check Errors
16.8.2 Home Agent Machine Check Errors
16.9 Incremental Decoding Information: Intel® Xeon® Processor Scalable Family, Machine Error Codes For Machine Check
16.9.1 Internal Machine Check Errors
16.9.2 Interconnect Machine Check Errors
16.9.3 Integrated Memory Controller Machine Check Errors
16.9.4 M2M Machine Check Errors
16.9.5 Home Agent Machine Check Errors
16.10 Incremental Decoding Information: Processor Family with CPUID DisplayFamily_DisplayModel Signature 06_5FH, Machine Error Codes For Machine Check
16.10.1 Integrated Memory Controller Machine Check Errors
16.11 Incremental Decoding Information: Processor Family 0FH Machine Error Codes For Machine Check
16.11.1 Model-Specific Machine Check Error Codes for Intel Xeon Processor MP 7100 Series
16.11.1.1 Processor Machine Check Status Register MCA Error Code Definition
16.11.2 Other_Info Field (all MCA Error Types)
16.11.3 Processor Model Specific Error Code Field
16.11.3.1 MCA Error Type A: L3 Error
16.11.3.2 Processor Model Specific Error Code Field Type B: Bus and Interconnect Error
16.11.3.3 Processor Model Specific Error Code Field Type C: Cache Bus Controller Error
Chapter 17 Debug, Branch Profile, TSC, and Intel® Resource Director Technology (Intel® RDT) Features
17.1 Overview of Debug Support Facilities
17.2 Debug Registers
17.2.1 Debug Address Registers (DR0-DR3)
17.2.2 Debug Registers DR4 and DR5
17.2.3 Debug Status Register (DR6)
17.2.4 Debug Control Register (DR7)
17.2.5 Breakpoint Field Recognition
17.2.6 Debug Registers and Intel® 64 Processors
17.3 Debug Exceptions
17.3.1 Debug Exception (#DB)—Interrupt Vector 1
17.3.1.1 Instruction-Breakpoint Exception Condition
17.3.1.2 Data Memory and I/O Breakpoint Exception Conditions
17.3.1.3 General-Detect Exception Condition
17.3.1.4 Single-Step Exception Condition
17.3.1.5 Task-Switch Exception Condition
17.3.2 Breakpoint Exception (#BP)—Interrupt Vector 3
17.3.3 Debug Exceptions, Breakpoint Exceptions, and Restricted Transactional Memory (RTM)
17.4 Last Branch, Interrupt, and Exception Recording Overview
17.4.1 IA32_DEBUGCTL MSR
17.4.2 Monitoring Branches, Exceptions, and Interrupts
17.4.3 Single-Stepping on Branches
17.4.4 Branch Trace Messages
17.4.4.1 Branch Trace Message Visibility
17.4.5 Branch Trace Store (BTS)
17.4.6 CPL-Qualified Branch Trace Mechanism
17.4.7 Freezing LBR and Performance Counters on PMI
17.4.8 LBR Stack
17.4.8.1 LBR Stack and Intel® 64 Processors
17.4.8.2 LBR Stack and IA-32 Processors
17.4.8.3 Last Exception Records and Intel 64 Architecture
17.4.9 BTS and DS Save Area
17.4.9.1 64 Bit Format of the DS Save Area
17.4.9.2 Setting Up the DS Save Area
17.4.9.3 Setting Up the BTS Buffer
17.4.9.4 Setting Up CPL-Qualified BTS
17.4.9.5 Writing the DS Interrupt Service Routine
17.5 Last Branch, Interrupt, and Exception Recording (Intel® Core™ 2 Duo and Intel® Atom™ Processors)
17.5.1 LBR Stack
17.5.2 LBR Stack in Intel Atom Processors based on the Silvermont Microarchitecture
17.6 Last Branch, Call Stack, Interrupt, and Exception Recording for Processors based on Goldmont Microarchitecture
17.7 Last Branch, Call Stack, Interrupt, and Exception Recording for Processors based on Goldmont Plus Microarchitecture
17.8 Last Branch, Interrupt and Exception Recording for Intel® Xeon Phi™ Processor 7200/5200/3200
17.9 Last Branch, Interrupt, and Exception Recording for Processors based on Intel® Microarchitecture code name Nehalem
17.9.1 LBR Stack
17.9.2 Filtering of Last Branch Records
17.10 Last Branch, Interrupt, and Exception Recording for Processors based on Intel® Microarchitecture code name Sandy Bridge
17.11 Last Branch, Call Stack, Interrupt, and Exception Recording for Processors based on Haswell Microarchitecture
17.11.1 LBR Stack Enhancement
17.12 Last Branch, Call Stack, Interrupt, and Exception Recording for Processors based on Skylake Microarchitecture
17.12.1 MSR_LBR_INFO_x MSR
17.12.2 Streamlined Freeze_LBRs_On_PMI Operation
17.12.3 LBR Behavior and Deep C-State
17.13 Last Branch, Interrupt, and Exception Recording (Processors based on Intel NetBurst® Microarchitecture)
17.13.1 MSR_DEBUGCTLA MSR
17.13.2 LBR Stack for Processors Based on Intel NetBurst® Microarchitecture
17.13.3 Last Exception Records
17.14 Last Branch, Interrupt, and Exception Recording (Intel® Core™ Solo and Intel® Core™ Duo Processors)
17.15 Last Branch, Interrupt, and Exception Recording (Pentium M Processors)
17.16 Last Branch, Interrupt, and Exception Recording (P6 Family Processors)
17.16.1 DEBUGCTLMSR Register
17.16.2 Last Branch and Last Exception MSRs
17.16.3 Monitoring Branches, Exceptions, and Interrupts
17.17 Time-Stamp Counter
17.17.1 Invariant TSC
17.17.2 IA32_TSC_AUX Register and RDTSCP Support
17.17.3 Time-Stamp Counter Adjustment
17.17.4 Invariant Time-Keeping
17.18 Intel® Resource Director Technology (Intel® RDT) Monitoring Features
17.18.1 Overview of Cache Monitoring Technology and Memory Bandwidth Monitoring
17.18.2 Enabling Monitoring: Usage Flow
17.18.3 Enumeration and Detecting Support of Cache Monitoring Technology and Memory Bandwidth Monitoring
17.18.4 Monitoring Resource Type and Capability Enumeration
17.18.5 Feature-Specific Enumeration
17.18.5.1 Cache Monitoring Technology
17.18.5.2 Memory Bandwidth Monitoring
17.18.6 Monitoring Resource RMID Association
17.18.7 Monitoring Resource Selection and Reporting Infrastructure
17.18.8 Monitoring Programming Considerations
17.18.8.1 Monitoring Dynamic Configuration
17.18.8.2 Monitoring Operation With Power Saving Features
17.18.8.3 Monitoring Operation with Other Operating Modes
17.18.8.4 Monitoring Operation with RAS Features
17.19 Intel® Resource Director Technology (Intel® RDT) Allocation Features
17.19.1 Introduction to Cache Allocation Technology (CAT)
17.19.2 Cache Allocation Technology Architecture
17.19.3 Code and Data Prioritization (CDP) Technology
17.19.4 Enabling Cache Allocation Technology Usage Flow
17.19.4.1 Enumeration and Detection Support of Cache Allocation Technology
17.19.4.2 Cache Allocation Technology: Resource Type and Capability Enumeration
17.19.4.3 Cache Allocation Technology: Cache Mask Configuration
17.19.4.4 Class of Service to Cache Mask Association: Common Across Allocation Features
17.19.5 Code and Data Prioritization (CDP): Enumerating and Enabling L3 CDP Technology
17.19.5.1 Mapping Between L3 CDP Masks and CAT Masks
17.19.6 Code and Data Prioritization (CDP): Enumerating and Enabling L2 CDP Technology
17.19.6.1 Mapping Between L2 CDP Masks and L2 CAT Masks
17.19.6.2 Common L2 and L3 CDP Programming Considerations
17.19.6.3 Cache Allocation Technology Dynamic Configuration
17.19.6.4 Cache Allocation Technology Operation With Power Saving Features
17.19.6.5 Cache Allocation Technology Operation with Other Operating Modes
17.19.6.6 Associating Threads with CAT/CDP Classes of Service
17.19.7 Introduction to Memory Bandwidth Allocation
17.19.7.1 Memory Bandwidth Allocation Enumeration
17.19.7.2 Memory Bandwidth Allocation Configuration
17.19.7.3 Memory Bandwidth Allocation Usage Considerations
Chapter 18 Performance Monitoring
18.1 Performance Monitoring Overview
18.2 Architectural Performance Monitoring
18.2.1 Architectural Performance Monitoring Version 1
18.2.1.1 Architectural Performance Monitoring Version 1 Facilities
18.2.1.2 Pre-defined Architectural Performance Events
18.2.2 Architectural Performance Monitoring Version 2
18.2.3 Architectural Performance Monitoring Version 3
18.2.3.1 AnyThread Counting and Software Evolution
18.2.4 Architectural Performance Monitoring Version 4
18.2.4.1 Enhancement in IA32_PERF_GLOBAL_STATUS
18.2.4.2 IA32_PERF_GLOBAL_STATUS_RESET and IA32_PERF_GLOBAL_STATUS_SET MSRs
18.2.4.3 IA32_PERF_GLOBAL_INUSE MSR
18.2.5 Architectural Performance Monitoring Version 5
18.2.5.1 AnyThread Mode Deprecation
18.2.5.2 Fixed Counter Enumeration
18.2.6 Full-Width Writes to Performance Counter Registers
18.3 Performance Monitoring (Intel® Core™ Processors and Intel® Xeon® Processors)
18.3.1 Performance Monitoring for Processors Based on Intel® Microarchitecture Code Name Nehalem
18.3.1.1 Enhancements of Performance Monitoring in the Processor Core
18.3.1.2 Performance Monitoring Facility in the Uncore
18.3.1.3 Intel® Xeon® Processor 7500 Series Performance Monitoring Facility
18.3.2 Performance Monitoring for Processors Based on Intel® Microarchitecture Code Name Westmere
18.3.3 Intel® Xeon® Processor E7 Family Performance Monitoring Facility
18.3.4 Performance Monitoring for Processors Based on Intel® Microarchitecture Code Name Sandy Bridge
18.3.4.1 Global Counter Control Facilities In Intel® Microarchitecture Code Name Sandy Bridge
18.3.4.2 Counter Coalescence
18.3.4.3 Full Width Writes to Performance Counters
18.3.4.4 PEBS Support in Intel® Microarchitecture Code Name Sandy Bridge
18.3.4.5 Off-core Response Performance Monitoring
18.3.4.6 Uncore Performance Monitoring Facilities In Intel® Core™ i7-2xxx, Intel® Core™ i5-2xxx, Intel® Core™ i3-2xxx Processor Series
18.3.4.7 Intel® Xeon® Processor E5 Family Performance Monitoring Facility
18.3.4.8 Intel® Xeon® Processor E5 Family Uncore Performance Monitoring Facility
18.3.5 3rd Generation Intel® Core™ Processor Performance Monitoring Facility
18.3.5.1 Intel® Xeon® Processor E5 v2 and E7 v2 Family Uncore Performance Monitoring Facility
18.3.6 4th Generation Intel® Core™ Processor Performance Monitoring Facility
18.3.6.1 Processor Event Based Sampling (PEBS) Facility
18.3.6.2 PEBS Data Format
18.3.6.3 PEBS Data Address Profiling
18.3.6.4 Off-core Response Performance Monitoring
18.3.6.5 Performance Monitoring and Intel® TSX
18.3.6.6 Uncore Performance Monitoring Facilities in the 4th Generation Intel® Core™ Processors
18.3.6.7 Intel® Xeon® Processor E5 v3 Family Uncore Performance Monitoring Facility
18.3.7 5th Generation Intel® Core™ Processor and Intel® Core™ M Processor Performance Monitoring Facility
18.3.8 6th Generation, 7th Generation and 8th Generation Intel® Core™ Processor Performance Monitoring Facility
18.3.8.1 Processor Event Based Sampling (PEBS) Facility
18.3.8.2 Off-core Response Performance Monitoring
18.3.8.3 Uncore Performance Monitoring Facilities on Intel® Core™ Processors Based on Cannon Lake Microarchitecture
18.3.9 Next Generation Intel® Core™ Processor Performance Monitoring Facility
18.3.9.1 Processor Event Based Sampling (PEBS) Facility
18.3.9.2 Off-core Response Performance Monitoring
18.3.9.3 Performance Metrics
18.4 Performance Monitoring (Intel® Xeon Phi™ Processors)
18.4.1 Intel® Xeon Phi™ Processor 7200/5200/3200 Performance Monitoring
18.4.1.1 Enhancements of Performance Monitoring in the Intel® Xeon Phi™ Processor Tile
18.5 Performance Monitoring (Intel® Atom™ Processors)
18.5.1 Performance Monitoring (45 nm and 32 nm Intel® Atom™ Processors)
18.5.2 Performance Monitoring for Silvermont Microarchitecture
18.5.2.1 Enhancements of Performance Monitoring in the Processor Core
18.5.2.2 Offcore Response Event
18.5.2.3 Average Offcore Request Latency Measurement
18.5.3 Performance Monitoring for Goldmont Microarchitecture
18.5.3.1 Processor Event Based Sampling (PEBS)
18.5.3.2 Offcore Response Event
18.5.3.3 Average Offcore Request Latency Measurement
18.5.4 Performance Monitoring for Goldmont Plus Microarchitecture
18.5.4.1 Extended PEBS
18.5.5 Performance Monitoring for Tremont Microarchitecture
18.5.5.1 Adaptive PEBS
18.5.5.2 PEBS Output to Intel® Processor Trace
18.5.5.3 Precise Distribution Support on Fixed Counter 0
18.5.5.4 Compatibility Enhancements to Offcore Response MSRs
18.6 Performance Monitoring (Legacy Intel Processors)
18.6.1 Performance Monitoring (Intel® Core™ Solo and Intel® Core™ Duo Processors)
18.6.2 Performance Monitoring (Processors Based on Intel® Core™ Microarchitecture)
18.6.2.1 Fixed-function Performance Counters
18.6.2.2 Global Counter Control Facilities
18.6.2.3 At-Retirement Events
18.6.2.4 Processor Event Based Sampling (PEBS)
18.6.3 Performance Monitoring (Processors Based on Intel NetBurst® Microarchitecture)
18.6.3.1 ESCR MSRs
18.6.3.2 Performance Counters
18.6.3.3 CCCR MSRs
18.6.3.4 Debug Store (DS) Mechanism
18.6.3.5 Programming the Performance Counters for Non-Retirement Events
18.6.3.6 At-Retirement Counting
18.6.3.7 Tagging Mechanism for Replay_event
18.6.3.8 Processor Event-Based Sampling (PEBS)
18.6.3.9 Operating System Implications
18.6.4 Performance Monitoring and Intel Hyper-Threading Technology in Processors Based on Intel NetBurst® Microarchitecture
18.6.4.1 ESCR MSRs
18.6.4.2 CCCR MSRs
18.6.4.3 IA32_PEBS_ENABLE MSR
18.6.4.4 Performance Monitoring Events
18.6.4.5 Counting Clocks on systems with Intel Hyper-Threading Technology in Processors Based on Intel NetBurst® Microarchitecture
18.6.5 Performance Monitoring and Dual-Core Technology
18.6.6 Performance Monitoring on 64-bit Intel Xeon Processor MP with Up to 8-MByte L3 Cache
18.6.7 Performance Monitoring on L3 and Caching Bus Controller Sub-Systems
18.6.7.1 Overview of Performance Monitoring with L3/Caching Bus Controller
18.6.7.2 GBSQ Event Interface
18.6.7.3 GSNPQ Event Interface
18.6.7.4 FSB Event Interface
18.6.7.5 Common Event Control Interface
18.6.8 Performance Monitoring (P6 Family Processor)
18.6.8.1 PerfEvtSel0 and PerfEvtSel1 MSRs
18.6.8.2 PerfCtr0 and PerfCtr1 MSRs
18.6.8.3 Starting and Stopping the Performance-Monitoring Counters
18.6.8.4 Event and Time-Stamp Monitoring Software
18.6.8.5 Monitoring Counter Overflow
18.6.9 Performance Monitoring (Pentium Processors)
18.6.9.1 Control and Event Select Register (CESR)
18.6.9.2 Use of the Performance-Monitoring Pins
18.6.9.3 Events Counted
18.7 Counting Clocks
18.7.1 Non-Halted Reference Clockticks
18.7.2 Cycle Counting and Opportunistic Processor Operation
18.7.3 Determining the Processor Base Frequency
18.7.3.1 For Intel® Processors Based on Microarchitecture Code Name Sandy Bridge, Ivy Bridge, Haswell and Broadwell
18.7.3.2 For Intel® Processors Based on Microarchitecture Code Name Nehalem
18.7.3.3 For Intel® Atom™ Processors Based on the Silvermont Microarchitecture (Including Intel Processors Based on Airmont Microarchitecture)
18.7.3.4 For Intel® Core™ 2 Processor Family and for Intel® Xeon® Processors Based on Intel Core Microarchitecture
18.8 IA32_PERF_CAPABILITIES MSR Enumeration
18.8.1 Filtering of SMM Handler Overhead
18.9 PEBS Facility
18.9.1 Extended PEBS
18.9.2 Adaptive PEBS
18.9.2.1 Adaptive_Record Counter Control
18.9.2.2 PEBS Record Format
18.9.2.3 MSR_PEBS_DATA_CFG
18.9.2.4 PEBS Record Examples
18.9.3 Precise Distribution of Instructions Retired (PDIR) Facility
18.9.4 Reduced Skid PEBS
Chapter 19 Performance Monitoring Events
19.1 Architectural Performance Monitoring Events
19.2 Performance Monitoring Events for Intel® Xeon® Processor Scalable Family
19.3 Performance Monitoring Events for Future Intel® Core™ Processors
19.4 Performance Monitoring Events for 6th Generation, 7th Generation and 8th Generation Intel® Core™ Processors
19.5 Performance Monitoring Events for Intel® Xeon Phi™ Processor 3200, 5200, 7200 Series and Intel® Xeon Phi™ Processor 7215, 7285, 7295 Series
19.6 Performance Monitoring Events for the Intel® Core™ M and 5th Generation Intel® Core™ Processors
19.7 Performance Monitoring Events for the 4th Generation Intel® Core™ Processors
19.7.1 Performance Monitoring Events in the Processor Core of Intel Xeon Processor E5 v3 Family
19.8 Performance Monitoring Events for 3rd Generation Intel® Core™ Processors
19.8.1 Performance Monitoring Events in the Processor Core of Intel Xeon Processor E5 v2 Family and Intel Xeon Processor E7 v2 Family
19.9 Performance Monitoring Events for 2nd Generation Intel® Core™ i7-2xxx, Intel® Core™ i5-2xxx, Intel® Core™ i3-2xxx Processor Series
19.10 Performance Monitoring Events for Intel® Core™ i7 Processor Family and Intel® Xeon® Processor Family
19.11 Performance Monitoring Events for Processors Based on Intel® Microarchitecture Code Name Westmere
19.12 Performance Monitoring Events for Intel® Xeon® Processor 5200, 5400 Series and Intel® Core™2 Extreme Processors QX 9000 Series
19.13 Performance Monitoring Events for Intel® Xeon® Processor 3000, 3200, 5100, 5300 Series and Intel® Core™2 Duo Processors
19.14 Performance Monitoring Events for Processors Based on the Goldmont Plus Microarchitecture
19.15 Performance Monitoring Events for Processors Based on the Goldmont Microarchitecture
19.16 Performance Monitoring Events for Processors Based on the Silvermont Microarchitecture
19.16.1 Performance Monitoring Events for Processors Based on the Airmont Microarchitecture
19.17 Performance Monitoring Events for 45 nm and 32 nm Intel® Atom™ Processors
19.18 Performance Monitoring Events for Intel® Core™ Solo and Intel® Core™ Duo Processors
19.19 Pentium® 4 and Intel® Xeon® Processor Performance Monitoring Events
19.20 Performance Monitoring Events for Intel® Pentium® M Processors
19.21 P6 Family Processor Performance Monitoring Events
19.22 Pentium Processor Performance Monitoring Events
Chapter 20 8086 Emulation
20.1 Real-Address Mode
20.1.1 Address Translation in Real-Address Mode
20.1.2 Registers Supported in Real-Address Mode
20.1.3 Instructions Supported in Real-Address Mode
20.1.4 Interrupt and Exception Handling
20.2 Virtual-8086 Mode
20.2.1 Enabling Virtual-8086 Mode
20.2.2 Structure of a Virtual-8086 Task
20.2.3 Paging of Virtual-8086 Tasks
20.2.4 Protection within a Virtual-8086 Task
20.2.5 Entering Virtual-8086 Mode
20.2.6 Leaving Virtual-8086 Mode
20.2.7 Sensitive Instructions
20.2.8 Virtual-8086 Mode I/O
20.2.8.1 I/O-Port-Mapped I/O
20.2.8.2 Memory-Mapped I/O
20.2.8.3 Special I/O Buffers
20.3 Interrupt and Exception Handling in Virtual-8086 Mode
20.3.1 Class 1—Hardware Interrupt and Exception Handling in Virtual-8086 Mode
20.3.1.1 Handling an Interrupt or Exception Through a Protected-Mode Trap or Interrupt Gate
20.3.1.2 Handling an Interrupt or Exception With an 8086 Program Interrupt or Exception Handler
20.3.1.3 Handling an Interrupt or Exception Through a Task Gate
20.3.2 Class 2—Maskable Hardware Interrupt Handling in Virtual-8086 Mode Using the Virtual Interrupt Mechanism
20.3.3 Class 3—Software Interrupt Handling in Virtual-8086 Mode
20.3.3.1 Method 1: Software Interrupt Handling
20.3.3.2 Methods 2 and 3: Software Interrupt Handling
20.3.3.3 Method 4: Software Interrupt Handling
20.3.3.4 Method 5: Software Interrupt Handling
20.3.3.5 Method 6: Software Interrupt Handling
20.4 Protected-Mode Virtual Interrupts
Chapter 21 Mixing 16-Bit and 32-Bit Code
21.1 Defining 16-Bit and 32-Bit Program Modules
21.2 Mixing 16-Bit and 32-Bit Operations Within a Code Segment
21.3 Sharing Data Among Mixed-Size Code Segments
21.4 Transferring Control Among Mixed-Size Code Segments
21.4.1 Code-Segment Pointer Size
21.4.2 Stack Management for Control Transfer
21.4.2.1 Controlling the Operand-Size Attribute For a Call
21.4.2.2 Passing Parameters With a Gate
21.4.3 Interrupt Control Transfers
21.4.4 Parameter Translation
21.4.5 Writing Interface Procedures
Chapter 22 Architecture Compatibility
22.1 Processor Families and Categories
22.2 Reserved Bits
22.3 Enabling New Functions and Modes
22.4 Detecting the Presence of New Features Through Software
22.5 Intel MMX Technology
22.6 Streaming SIMD Extensions (SSE)
22.7 Streaming SIMD Extensions 2 (SSE2)
22.8 Streaming SIMD Extensions 3 (SSE3)
22.9 Additional Streaming SIMD Extensions
22.10 Intel Hyper-Threading Technology
22.11 Multi-Core Technology
22.12 Specific Features of Dual-Core Processor
22.13 New Instructions In the Pentium and Later IA-32 Processors
22.13.1 Instructions Added Prior to the Pentium Processor
22.14 Obsolete Instructions
22.15 Undefined Opcodes
22.16 New Flags in the EFLAGS Register
22.16.1 Using EFLAGS Flags to Distinguish Between 32-Bit IA-32 Processors
22.17 Stack Operations and User Software
22.17.1 PUSH SP
22.17.2 EFLAGS Pushed on the Stack
22.18 x87 FPU
22.18.1 Control Register CR0 Flags
22.18.2 x87 FPU Status Word
22.18.2.1 Condition Code Flags (C0 through C3)
22.18.2.2 Stack Fault Flag
22.18.3 x87 FPU Control Word
22.18.4 x87 FPU Tag Word
22.18.5 Data Types
22.18.5.1 NaNs
22.18.5.2 Pseudo-zero, Pseudo-NaN, Pseudo-infinity, and Unnormal Formats
22.18.6 Floating-Point Exceptions
22.18.6.1 Denormal Operand Exception (#D)
22.18.6.2 Numeric Overflow Exception (#O)
22.18.6.3 Numeric Underflow Exception (#U)
22.18.6.4 Exception Precedence
22.18.6.5 CS and EIP For FPU Exceptions
22.18.6.6 FPU Error Signals
22.18.6.7 Assertion of the FERR# Pin
22.18.6.8 Invalid Operation Exception On Denormals
22.18.6.9 Alignment Check Exceptions (#AC)
22.18.6.10 Segment Not Present Exception During FLDENV
22.18.6.11 Device Not Available Exception (#NM)
22.18.6.12 Coprocessor Segment Overrun Exception
22.18.6.13 General Protection Exception (#GP)
22.18.6.14 Floating-Point Error Exception (#MF)
22.18.7 Changes to Floating-Point Instructions
22.18.7.1 FDIV, FPREM, and FSQRT Instructions
22.18.7.2 FSCALE Instruction
22.18.7.3 FPREM1 Instruction
22.18.7.4 FPREM Instruction
22.18.7.5 FUCOM, FUCOMP, and FUCOMPP Instructions
22.18.7.6 FPTAN Instruction
22.18.7.7 Stack Overflow
22.18.7.8 FSIN, FCOS, and FSINCOS Instructions
22.18.7.9 FPATAN Instruction
22.18.7.10 F2XM1 Instruction
22.18.7.11 FLD Instruction
22.18.7.12 FXTRACT Instruction
22.18.7.13 Load Constant Instructions
22.18.7.14 FXAM Instruction
22.18.7.15 FSAVE and FSTENV Instructions
22.18.8 Transcendental Instructions
22.18.9 Obsolete Instructions and Undefined Opcodes
22.18.10 WAIT/FWAIT Prefix Differences
22.18.11 Operands Split Across Segments and/or Pages
22.18.12 FPU Instruction Synchronization
22.19 Serializing Instructions
22.20 FPU and Math Coprocessor Initialization
22.20.1 Intel® 387 and Intel® 287 Math Coprocessor Initialization
22.20.2 Intel486 SX Processor and Intel 487 SX Math Coprocessor Initialization
22.21 Control Registers
22.22 Memory Management Facilities
22.22.1 New Memory Management Control Flags
22.22.1.1 Physical Memory Addressing Extension
22.22.1.2 Global Pages
22.22.1.3 Larger Page Sizes
22.22.2 CD and NW Cache Control Flags
22.22.3 Descriptor Types and Contents
22.22.4 Changes in Segment Descriptor Loads
22.23 Debug Facilities
22.23.1 Differences in Debug Register DR6
22.23.2 Differences in Debug Register DR7
22.23.3 Debug Registers DR4 and DR5
22.24 Recognition of Breakpoints
22.25 Exceptions and/or Exception Conditions
22.25.1 Machine-Check Architecture
22.25.2 Priority of Exceptions
22.25.3 Exception Conditions of Legacy SIMD Instructions Operating on MMX Registers
22.26 Interrupts
22.26.1 Interrupt Propagation Delay
22.26.2 NMI Interrupts
22.26.3 IDT Limit
22.27 Advanced Programmable Interrupt Controller (APIC)
22.27.1 Software Visible Differences Between the Local APIC and the 82489DX
22.27.2 New Features Incorporated in the Local APIC for the P6 Family and Pentium Processors
22.27.3 New Features Incorporated in the Local APIC of the Pentium 4 and Intel Xeon Processors
22.28 Task Switching and TSSs
22.28.1 P6 Family and Pentium Processor TSS
22.28.2 TSS Selector Writes
22.28.3 Order of Reads/Writes to the TSS
22.28.4 Using A 16-Bit TSS with 32-Bit Constructs
22.28.5 Differences in I/O Map Base Addresses
22.29 Cache Management
22.29.1 Self-Modifying Code with Cache Enabled
22.29.2 Disabling the L3 Cache
22.30 Paging
22.30.1 Large Pages
22.30.2 PCD and PWT Flags
22.30.3 Enabling and Disabling Paging
22.31 Stack Operations and Supervisor Software
22.31.1 Selector Pushes and Pops
22.31.2 Error Code Pushes
22.31.3 Fault Handling Effects on the Stack
22.31.4 Interlevel RET/IRET From a 16-Bit Interrupt or Call Gate
22.32 Mixing 16- and 32-Bit Segments
22.33 Segment and Address Wraparound
22.33.1 Segment Wraparound
22.34 Store Buffers and Memory Ordering
22.35 Bus Locking
22.36 Bus Hold
22.37 Model-Specific Extensions to the IA-32
22.37.1 Model-Specific Registers
22.37.2 RDMSR and WRMSR Instructions
22.37.3 Memory Type Range Registers
22.37.4 Machine-Check Exception and Architecture
22.37.5 Performance-Monitoring Counters
22.38 Two Ways to Run Intel 286 Processor Tasks
22.39 Initial State of Pentium, Pentium Pro and Pentium 4 Processors
Chapter 23 Introduction to Virtual Machine Extensions
23.1 Overview
23.2 Virtual Machine Architecture
23.3 Introduction to VMX Operation
23.4 Life Cycle of VMM Software
23.5 Virtual-Machine Control Structure
23.6 Discovering Support for VMX
23.7 Enabling and Entering VMX Operation
23.8 Restrictions on VMX Operation
Chapter 24 Virtual Machine Control Structures
24.1 Overview
24.2 Format of the VMCS Region
24.3 Organization of VMCS Data
24.4 Guest-State Area
24.4.1 Guest Register State
24.4.2 Guest Non-Register State
24.5 Host-State Area
24.6 VM-Execution Control Fields
24.6.1 Pin-Based VM-Execution Controls
24.6.2 Processor-Based VM-Execution Controls
24.6.3 Exception Bitmap
24.6.4 I/O-Bitmap Addresses
24.6.5 Time-Stamp Counter Offset and Multiplier
24.6.6 Guest/Host Masks and Read Shadows for CR0 and CR4
24.6.7 CR3-Target Controls
24.6.8 Controls for APIC Virtualization
24.6.9 MSR-Bitmap Address
24.6.10 Executive-VMCS Pointer
24.6.11 Extended-Page-Table Pointer (EPTP)
24.6.12 Virtual-Processor Identifier (VPID)
24.6.13 Controls for PAUSE-Loop Exiting
24.6.14 VM-Function Controls
24.6.15 VMCS Shadowing Bitmap Addresses
24.6.16 ENCLS-Exiting Bitmap
24.6.17 ENCLV-Exiting Bitmap
24.6.18 Control Field for Page-Modification Logging
24.6.19 Controls for Virtualization Exceptions
24.6.20 XSS-Exiting Bitmap
24.6.21 Sub-Page-Permission-Table Pointer (SPPTP)
24.7 VM-Exit Control Fields
24.7.1 VM-Exit Controls
24.7.2 VM-Exit Controls for MSRs
24.8 VM-Entry Control Fields
24.8.1 VM-Entry Controls
24.8.2 VM-Entry Controls for MSRs
24.8.3 VM-Entry Controls for Event Injection
24.9 VM-Exit Information Fields
24.9.1 Basic VM-Exit Information
24.9.2 Information for VM Exits Due to Vectored Events
24.9.3 Information for VM Exits That Occur During Event Delivery
24.9.4 Information for VM Exits Due to Instruction Execution
24.9.5 VM-Instruction Error Field
24.10 VMCS Types: Ordinary and Shadow
24.11 Software Use of the VMCS and Related Structures
24.11.1 Software Use of Virtual-Machine Control Structures
24.11.2 VMREAD, VMWRITE, and Encodings of VMCS Fields
24.11.3 Initializing a VMCS
24.11.4 Software Access to Related Structures
24.11.5 VMXON Region
Chapter 25 VMX Non-Root Operation
25.1 Instructions That Cause VM Exits
25.1.1 Relative Priority of Faults and VM Exits
25.1.2 Instructions That Cause VM Exits Unconditionally
25.1.3 Instructions That Cause VM Exits Conditionally
25.2 Other Causes of VM Exits
25.3 Changes to Instruction Behavior in VMX Non-Root Operation
25.4 Other Changes in VMX Non-Root Operation
25.4.1 Event Blocking
25.4.2 Treatment of Task Switches
25.5 Features Specific to VMX Non-Root Operation
25.5.1 VMX-Preemption Timer
25.5.2 Monitor Trap Flag
25.5.3 Translation of Guest-Physical Addresses Using EPT
25.5.3.1 Guest-Physical Address Translation for Intel PT: Details
25.5.3.2 Trace-Address Pre-Translation (TAPT)
25.5.4 APIC Virtualization
25.5.5 VM Functions
25.5.5.1 Enabling VM Functions
25.5.5.2 General Operation of the VMFUNC Instruction
25.5.5.3 EPTP Switching
25.5.6 Virtualization Exceptions
25.5.6.1 Convertible EPT Violations
25.5.6.2 Virtualization-Exception Information
25.5.6.3 Delivery of Virtualization Exceptions
25.6 Unrestricted Guests
Chapter 26 VM Entries
26.1 Basic VM-Entry Checks
26.2 Checks on VMX Controls and Host-State Area
26.2.1 Checks on VMX Controls
26.2.1.1 VM-Execution Control Fields
26.2.1.2 VM-Exit Control Fields
26.2.1.3 VM-Entry Control Fields
26.2.2 Checks on Host Control Registers and MSRs
26.2.3 Checks on Host Segment and Descriptor-Table Registers
26.2.4 Checks Related to Address-Space Size
26.3 Checking and Loading Guest State
26.3.1 Checks on the Guest State Area
26.3.1.1 Checks on Guest Control Registers, Debug Registers, and MSRs
26.3.1.2 Checks on Guest Segment Registers
26.3.1.3 Checks on Guest Descriptor-Table Registers
26.3.1.4 Checks on Guest RIP and RFLAGS
26.3.1.5 Checks on Guest Non-Register State
26.3.1.6 Checks on Guest Page-Directory-Pointer-Table Entries
26.3.2 Loading Guest State
26.3.2.1 Loading Guest Control Registers, Debug Registers, and MSRs
26.3.2.2 Loading Guest Segment Registers and Descriptor-Table Registers
26.3.2.3 Loading Guest RIP, RSP, and RFLAGS
26.3.2.4 Loading Page-Directory-Pointer-Table Entries
26.3.2.5 Updating Non-Register State
26.3.3 Clearing Address-Range Monitoring
26.4 Loading MSRs
26.5 Trace-Address Pre-Translation (TAPT)
26.6 Event Injection
26.6.1 Vectored-Event Injection
26.6.1.1 Details of Vectored-Event Injection
26.6.1.2 VM Exits During Event Injection
26.6.1.3 Event Injection for VM Entries to Real-Address Mode
26.6.2 Injection of Pending MTF VM Exits
26.7 Special Features of VM Entry
26.7.1 Interruptibility State
26.7.2 Activity State
26.7.3 Delivery of Pending Debug Exceptions after VM Entry
26.7.4 VMX-Preemption Timer
26.7.5 Interrupt-Window Exiting and Virtual-Interrupt Delivery
26.7.6 NMI-Window Exiting
26.7.7 VM Exits Induced by the TPR Threshold
26.7.8 Pending MTF VM Exits
26.7.9 VM Entries and Advanced Debugging Features
26.8 VM-Entry Failures During or After Loading Guest State
26.9 Machine-Check Events During VM Entry
Chapter 27 VM Exits
27.1 Architectural State Before a VM Exit
27.2 Recording VM-Exit Information and Updating VM-Entry Control Fields
27.2.1 Basic VM-Exit Information
27.2.2 Information for VM Exits Due to Vectored Events
27.2.3 Information About NMI Unblocking Due to IRET
27.2.4 Information for VM Exits During Event Delivery
27.2.5 Information for VM Exits Due to Instruction Execution
27.3 Saving Guest State
27.3.1 Saving Control Registers, Debug Registers, and MSRs
27.3.2 Saving Segment Registers and Descriptor-Table Registers
27.3.3 Saving RIP, RSP, and RFLAGS
27.3.4 Saving Non-Register State
27.4 Saving MSRs
27.5 Loading Host State
27.5.1 Loading Host Control Registers, Debug Registers, MSRs
27.5.2 Loading Host Segment and Descriptor-Table Registers
27.5.3 Loading Host RIP, RSP, and RFLAGS
27.5.4 Checking and Loading Host Page-Directory-Pointer-Table Entries
27.5.5 Updating Non-Register State
27.5.6 Clearing Address-Range Monitoring
27.6 Loading MSRs
27.7 VMX Aborts
27.8 Machine-Check Events During VM Exit
Chapter 28 VMX Support for Address Translation
28.1 Virtual Processor Identifiers (VPIDs)
28.2 The Extended Page Table Mechanism (EPT)
28.2.1 EPT Overview
28.2.2 EPT Translation Mechanism
28.2.3 EPT-Induced VM Exits
28.2.3.1 EPT Misconfigurations
28.2.3.2 EPT Violations
28.2.3.3 Prioritization of EPT Misconfigurations and EPT Violations
28.2.4 Sub-Page Write Permissions
28.2.4.1 Write Accesses That Are Eligible for Sub-Page Write Permissions
28.2.4.2 Determining an Access’s Sub-Page Write Permission
28.2.5 Accessed and Dirty Flags for EPT
28.2.6 Page-Modification Logging
28.2.7 EPT and Memory Typing
28.2.7.1 Memory Type Used for Accessing EPT Paging Structures
28.2.7.2 Memory Type Used for Translated Guest-Physical Addresses
28.3 Caching Translation Information
28.3.1 Information That May Be Cached
28.3.2 Creating and Using Cached Translation Information
28.3.3 Invalidating Cached Translation Information
28.3.3.1 Operations that Invalidate Cached Mappings
28.3.3.2 Operations that Need Not Invalidate Cached Mappings
28.3.3.3 Guidelines for Use of the INVVPID Instruction
28.3.3.4 Guidelines for Use of the INVEPT Instruction
Chapter 29 APIC Virtualization and Virtual Interrupts
29.1 Virtual APIC State
29.1.1 Virtualized APIC Registers
29.1.2 TPR Virtualization
29.1.3 PPR Virtualization
29.1.4 EOI Virtualization
29.1.5 Self-IPI Virtualization
29.2 Evaluation and Delivery of Virtual Interrupts
29.2.1 Evaluation of Pending Virtual Interrupts
29.2.2 Virtual-Interrupt Delivery
29.3 Virtualizing CR8-Based TPR Accesses
29.4 Virtualizing Memory-Mapped APIC Accesses
29.4.1 Priority of APIC-Access VM Exits
29.4.2 Virtualizing Reads from the APIC-Access Page
29.4.3 Virtualizing Writes to the APIC-Access Page
29.4.3.1 Determining Whether a Write Access is Virtualized
29.4.3.2 APIC-Write Emulation
29.4.3.3 APIC-Write VM Exits
29.4.4 Instruction-Specific Considerations
29.4.5 Issues Pertaining to Page Size and TLB Management
29.4.6 APIC Accesses Not Directly Resulting From Linear Addresses
29.4.6.1 Guest-Physical Accesses to the APIC-Access Page
29.4.6.2 Physical Accesses to the APIC-Access Page
29.5 Virtualizing MSR-Based APIC Accesses
29.6 Posted-Interrupt Processing
Chapter 30 VMX Instruction Reference
30.1 Overview
30.2 Conventions
30.3 VMX Instructions
INVEPT—Invalidate Translations Derived from EPT
INVVPID—Invalidate Translations Based on VPID
VMCALL—Call to VM Monitor
VMCLEAR—Clear Virtual-Machine Control Structure
VMFUNC—Invoke VM Function
VMLAUNCH/VMRESUME—Launch/Resume Virtual Machine
VMPTRLD—Load Pointer to Virtual-Machine Control Structure
VMPTRST—Store Pointer to Virtual-Machine Control Structure
VMREAD—Read Field from Virtual-Machine Control Structure
VMRESUME—Resume Virtual Machine
VMWRITE—Write Field to Virtual-Machine Control Structure
VMXOFF—Leave VMX Operation
VMXON—Enter VMX Operation
30.4 VM Instruction Error Numbers
Chapter 31 Virtual-Machine Monitor Programming Considerations
31.1 VMX System Programming Overview
31.2 Supporting Processor Operating Modes in Guest Environments
31.2.1 Using Unrestricted Guest Mode
31.3 Managing VMCS Regions and Pointers
31.4 Using VMX Instructions
31.5 VMM Setup & Tear Down
31.5.1 Algorithms for Determining VMX Capabilities
31.6 Preparation and Launching a Virtual Machine
31.7 Handling of VM Exits
31.7.1 Handling VM Exits Due to Exceptions
31.7.1.1 Reflecting Exceptions to Guest Software
31.7.1.2 Resuming Guest Software after Handling an Exception
31.8 Multi-Processor Considerations
31.8.1 Initialization
31.8.2 Moving a VMCS Between Processors
31.8.3 Paired Index-Data Registers
31.8.4 External Data Structures
31.8.5 CPUID Emulation
31.9 32-Bit and 64-Bit Guest Environments
31.9.1 Operating Modes of Guest Environments
31.9.2 Handling Widths of VMCS Fields
31.9.2.1 Natural-Width VMCS Fields
31.9.2.2 64-Bit VMCS Fields
31.9.3 IA-32e Mode Hosts
31.9.4 IA-32e Mode Guests
31.9.5 32-Bit Guests
31.10 Handling Model Specific Registers
31.10.1 Using VM-Execution Controls
31.10.2 Using VM-Exit Controls for MSRs
31.10.3 Using VM-Entry Controls for MSRs
31.10.4 Handling Special-Case MSRs and Instructions
31.10.4.1 Handling IA32_EFER MSR
31.10.4.2 Handling the SYSENTER and SYSEXIT Instructions
31.10.4.3 Handling the SYSCALL and SYSRET Instructions
31.10.4.4 Handling the SWAPGS Instruction
31.10.4.5 Implementation Specific Behavior on Writing to Certain MSRs
31.10.5 Handling Accesses to Reserved MSR Addresses
31.11 Handling Accesses to Control Registers
31.12 Performance Considerations
31.13 Use of The VMX-Preemption Timer
Chapter 32 Virtualization of System Resources
32.1 Overview
32.2 Virtualization Support for Debugging Facilities
32.2.1 Debug Exceptions
32.3 Memory Virtualization
32.3.1 Processor Operating Modes & Memory Virtualization
32.3.2 Guest & Host Physical Address Spaces
32.3.3 Virtualizing Virtual Memory by Brute Force
32.3.4 Alternate Approach to Memory Virtualization
32.3.5 Details of Virtual TLB Operation
32.3.5.1 Initialization of Virtual TLB
32.3.5.2 Response to Page Faults
32.3.5.3 Response to Uses of INVLPG
32.3.5.4 Response to CR3 Writes
32.4 Microcode Update Facility
32.4.1 Early Load of Microcode Updates
32.4.2 Late Load of Microcode Updates
Chapter 33 Handling Boundary Conditions in a Virtual Machine Monitor
33.1 Overview
33.2 Interrupt Handling in VMX Operation
33.3 External Interrupt Virtualization
33.3.1 Virtualization of Interrupt Vector Space
33.3.2 Control of Platform Interrupts
33.3.2.1 PIC Virtualization
33.3.2.2 xAPIC Virtualization
33.3.2.3 Local APIC Virtualization
33.3.2.4 I/O APIC Virtualization
33.3.2.5 Virtualization of Message Signaled Interrupts
33.3.3 Examples of Handling of External Interrupts
33.3.3.1 Guest Setup
33.3.3.2 Processor Treatment of External Interrupt
33.3.3.3 Processing of External Interrupts by VMM
33.3.3.4 Generation of Virtual Interrupt Events by VMM
33.4 Error Handling by VMM
33.4.1 VM-Exit Failures
33.4.2 Machine-Check Considerations
33.4.3 MCA Error Handling Guidelines for VMM
33.4.3.1 VMM Error Handling Strategies
33.4.3.2 Basic VMM MCA Error Recovery Handling
33.4.3.3 Implementation Considerations for the Basic Model
33.4.3.4 MCA Virtualization
33.4.3.5 Implementation Considerations for the MCA Virtualization Model
33.5 Handling Activity States by VMM
Chapter 34 System Management Mode
34.1 System Management Mode Overview
34.1.1 System Management Mode and VMX Operation
34.2 System Management Interrupt (SMI)
34.3 Switching Between SMM and the Other Processor Operating Modes
34.3.1 Entering SMM
34.3.2 Exiting From SMM
34.4 SMRAM
34.4.1 SMRAM State Save Map
34.4.1.1 SMRAM State Save Map and Intel 64 Architecture
34.4.2 SMRAM Caching
34.4.2.1 System Management Range Registers (SMRR)
34.5 SMI Handler Execution Environment
34.5.1 Initial SMM Execution Environment
34.5.2 SMI Handler Operating Mode Switching
34.6 Exceptions and Interrupts Within SMM
34.7 Managing Synchronous and Asynchronous System Management Interrupts
34.7.1 I/O State Implementation
34.8 NMI Handling While in SMM
34.9 SMM Revision Identifier
34.10 Auto HALT Restart
34.10.1 Executing the HLT Instruction in SMM
34.11 SMBASE Relocation
34.12 I/O Instruction Restart
34.12.1 Back-to-Back SMI Interrupts When I/O Instruction Restart Is Being Used
34.13 SMM Multiple-Processor Considerations
34.14 Default Treatment of SMIs and SMM with VMX Operation and SMX Operation
34.14.1 Default Treatment of SMI Delivery
34.14.2 Default Treatment of RSM
34.14.3 Protection of CR4.VMXE in SMM
34.14.4 VMXOFF and SMI Unblocking
34.15 Dual-Monitor Treatment of SMIs and SMM
34.15.1 Dual-Monitor Treatment Overview
34.15.2 SMM VM Exits
34.15.2.1 Architectural State Before a VM Exit
34.15.2.2 Updating the Current-VMCS and Executive-VMCS Pointers
34.15.2.3 Recording VM-Exit Information
34.15.2.4 Saving Guest State
34.15.2.5 Updating State
34.15.3 Operation of the SMM-Transfer Monitor
34.15.4 VM Entries that Return from SMM
34.15.4.1 Checks on the Executive-VMCS Pointer Field
34.15.4.2 Checks on VM-Execution Control Fields
34.15.4.3 Checks on VM-Entry Control Fields
34.15.4.4 Checks on the Guest State Area
34.15.4.5 Loading Guest State
34.15.4.6 VMX-Preemption Timer
34.15.4.7 Updating the Current-VMCS and SMM-Transfer VMCS Pointers
34.15.4.8 VM Exits Induced by VM Entry
34.15.4.9 SMI Blocking
34.15.4.10 Failures of VM Entries That Return from SMM
34.15.5 Enabling the Dual-Monitor Treatment
34.15.6 Activating the Dual-Monitor Treatment
34.15.6.1 Initial Checks
34.15.6.2 Updating the Current-VMCS and Executive-VMCS Pointers
34.15.6.3 Saving Guest State
34.15.6.4 Saving MSRs
34.15.6.5 Loading Host State
34.15.6.6 Loading MSRs
34.15.7 Deactivating the Dual-Monitor Treatment
34.16 SMI and Processor Extended State Management
34.17 Model-Specific System Management Enhancement
34.17.1 SMM Handler Code Access Control
34.17.2 SMI Delivery Delay Reporting
34.17.3 Blocked SMI Reporting
Chapter 35 Intel® Processor Trace
35.1 Overview
35.1.1 Features and Capabilities
35.1.1.1 Packet Summary
35.2 Intel® Processor Trace Operational Model
35.2.1 Change of Flow Instruction (COFI) Tracing
35.2.1.1 Direct Transfer COFI
35.2.1.2 Indirect Transfer COFI
35.2.1.3 Far Transfer COFI
35.2.2 Software Trace Instrumentation with PTWRITE
35.2.3 Power Event Tracing
35.2.4 Trace Filtering
35.2.4.1 Filtering by Current Privilege Level (CPL)
35.2.4.2 Filtering by CR3
35.2.4.3 Filtering by IP
35.2.5 Packet Generation Enable Controls
35.2.5.1 Packet Enable (PacketEn)
35.2.5.2 Trigger Enable (TriggerEn)
35.2.5.3 Context Enable (ContextEn)
35.2.5.4 Branch Enable (BranchEn)
35.2.5.5 Filter Enable (FilterEn)
35.2.6 Trace Output
35.2.6.1 Single Range Output
35.2.6.2 Table of Physical Addresses (ToPA)
Single Output Region ToPA Implementation
ToPA Table Entry Format
ToPA STOP
ToPA PMI
ToPA PMI and Single Output Region ToPA Implementation
ToPA PMI and XSAVES/XRSTORS State Handling
ToPA Errors
35.2.6.3 Trace Transport Subsystem
35.2.6.4 Restricted Memory Access
Modifications to Restricted Memory Regions
35.2.7 Enabling and Configuration MSRs
35.2.7.1 General Considerations
35.2.7.2 IA32_RTIT_CTL MSR
35.2.7.3 Enabling and Disabling Packet Generation with TraceEn
Disabling Packet Generation
Other Writes to IA32_RTIT_CTL
35.2.7.4 IA32_RTIT_STATUS MSR
35.2.7.5 IA32_RTIT_ADDRn_A and IA32_RTIT_ADDRn_B MSRs
35.2.7.6 IA32_RTIT_CR3_MATCH MSR
35.2.7.7 IA32_RTIT_OUTPUT_BASE MSR
35.2.7.8 IA32_RTIT_OUTPUT_MASK_PTRS MSR
35.2.8 Interaction of Intel® Processor Trace and Other Processor Features
35.2.8.1 Intel® Transactional Synchronization Extensions (Intel® TSX)
35.2.8.2 TSX and IP Filtering
35.2.8.3 System Management Mode (SMM)
35.2.8.4 Virtual-Machine Extensions (VMX)
35.2.8.5 Intel® Software Guard Extensions (Intel® SGX)
35.2.8.6 SENTER/ENTERACCS and ACM
35.2.8.7 Intel® Memory Protection Extensions (Intel® MPX)
35.3 Configuration and Programming Guideline
35.3.1 Detection of Intel Processor Trace and Capability Enumeration
35.3.1.1 Packet Decoding of RIP versus LIP
35.3.1.2 Model Specific Capability Restrictions
35.3.2 Enabling and Configuration of Trace Packet Generation
35.3.2.1 Enabling Packet Generation
35.3.2.2 Disabling Packet Generation
35.3.3 Flushing Trace Output
35.3.4 Warm Reset
35.3.5 Context Switch Consideration
35.3.5.1 Manual Trace Configuration Context Switch
35.3.5.2 Trace Configuration Context Switch Using XSAVES/XRSTORS
35.3.6 Cycle-Accurate Mode
35.3.6.1 Cycle Counter
35.3.6.2 Cycle Packet Semantics
35.3.6.3 Cycle Thresholds
35.3.7 Decoder Synchronization (PSB+)
35.3.8 Internal Buffer Overflow
35.3.8.1 Overflow Impact on Enables
35.3.8.2 Overflow Impact on Timing Packets
35.3.9 Operational Errors
35.4 Trace Packets and Data Types
35.4.1 Packet Relationships and Ordering
35.4.1.1 Packet Blocks
Decoder Implications
35.4.2 Packet Definitions
35.4.2.1 Taken/Not-taken (TNT) Packet
35.4.2.2 Target IP (TIP) Packet
IP Compression
Indirect Transfer Compression for Returns (RET)
35.4.2.3 Deferred TIPs
35.4.2.4 Packet Generation Enable (TIP.PGE) Packet
35.4.2.5 Packet Generation Disable (TIP.PGD) Packet
35.4.2.6 Flow Update (FUP) Packet
FUP IP Payload
35.4.2.7 Paging Information (PIP) Packet
35.4.2.8 MODE Packets
MODE.Exec Packet
MODE.TSX Packet
35.4.2.9 TraceStop Packet
35.4.2.10 Core:Bus Ratio (CBR) Packet
35.4.2.11 Timestamp Counter (TSC) Packet
35.4.2.12 Mini Time Counter (MTC) Packet
35.4.2.13 TSC/MTC Alignment (TMA) Packet
35.4.2.14 Cycle Count (CYC) Packet
35.4.2.15 VMCS Packet
35.4.2.16 Overflow (OVF) Packet
35.4.2.17 Packet Stream Boundary (PSB) Packet
35.4.2.18 PSBEND Packet
35.4.2.19 Maintenance (MNT) Packet
35.4.2.20 PAD Packet
35.4.2.21 PTWRITE (PTW) Packet
35.4.2.22 Execution Stop (EXSTOP) Packet
35.4.2.23 MWAIT Packet
35.4.2.24 Power Entry (PWRE) Packet
35.4.2.25 Power Exit (PWRX) Packet
35.4.2.26 Block Begin Packet (BBP)
35.4.2.27 Block Item Packet (BIP)
BIP State Value Encodings
35.4.2.28 Block End Packet (BEP)
35.5 Tracing in VMX Operation
35.5.1 VMX-Specific Packets and VMCS Controls
35.5.2 Managing Trace Packet Generation Across VMX Transitions
35.5.2.1 System-Wide Tracing
35.5.2.2 Host-Only Tracing
35.5.2.3 Guest-Only Tracing
35.5.2.4 Virtualization of Guest Output Packet Streams
35.5.2.5 Emulation of Intel PT Traced State
35.5.2.6 TSC Scaling
35.5.2.7 Failed VM Entry
35.5.2.8 VMX Abort
35.6 Tracing and SMM Transfer Monitor (STM)
35.7 Packet Generation Scenarios
35.8 Software Considerations
35.8.1 Tracing SMM Code
35.8.2 Cooperative Transition of Multiple Trace Collection Agents
35.8.3 Tracking Time
35.8.3.1 Time Domain Relationships
35.8.3.2 Estimating TSC within Intel PT
35.8.3.3 VMX TSC Manipulation
35.8.3.4 Calculating Frequency with Intel PT
Chapter 36 Introduction to Intel® Software Guard Extensions
36.1 Overview
36.2 Enclave Interaction and Protection
36.3 Enclave Life Cycle
36.4 Data Structures and Enclave Operation
36.5 Enclave Page Cache
36.5.1 Enclave Page Cache Map (EPCM)
36.6 Enclave Instructions and Intel® SGX
36.7 Discovering Support for Intel® SGX and Enabling Enclave Instructions
36.7.1 Intel® SGX Opt-In Configuration
36.7.2 Intel® SGX Resource Enumeration Leaves
Chapter 37 Enclave Access Control and Data Structures
37.1 Overview of Enclave Execution Environment
37.2 Terminology
37.3 Access-control Requirements
37.4 Segment-based Access Control
37.5 Page-based Access Control
37.5.1 Access-control for Accesses that Originate from non-SGX Instructions
37.5.2 Memory Accesses that Split across ELRANGE
37.5.3 Implicit vs. Explicit Accesses
37.5.3.1 Explicit Accesses
37.5.3.2 Implicit Accesses
37.6 Intel® SGX Data Structures Overview
37.7 SGX Enclave Control Structure (SECS)
37.7.1 ATTRIBUTES
37.7.2 SECS.MISCSELECT Field
37.8 Thread Control Structure (TCS)
37.8.1 TCS.FLAGS
37.8.2 State Save Area Offset (OSSA)
37.8.3 Current State Save Area Frame (CSSA)
37.8.4 Number of State Save Area Frames (NSSA)
37.9 State Save Area (SSA) Frame
37.9.1 GPRSGX Region
37.9.1.1 EXITINFO
37.9.1.2 VECTOR Field Definition
37.9.2 MISC Region
37.9.2.1 EXINFO Structure
37.9.2.2 Page Fault Error Code
37.10 Page Information (PAGEINFO)
37.11 Security Information (SECINFO)
37.11.1 SECINFO.FLAGS
37.11.2 PAGE_TYPE Field Definition
37.12 Paging Crypto MetaData (PCMD)
37.13 Enclave Signature Structure (SIGSTRUCT)
37.14 EINIT Token Structure (EINITTOKEN)
37.15 Report (REPORT)
37.15.1 REPORTDATA
37.16 Report Target Info (TARGETINFO)
37.17 Key Request (KEYREQUEST)
37.17.1 KEY REQUEST KeyNames
37.17.2 Key Request Policy Structure
37.18 Version Array (VA)
37.19 Enclave Page Cache Map (EPCM)
37.20 Read Info (RDINFO)
37.20.1 RDINFO Status Structure
37.20.2 RDINFO Flags Structure
Chapter 38 Enclave Operation
38.1 Constructing an Enclave
1. The application hands over the enclave content along with additional information required by the enclave creation API to the enclave creation service running at privilege level 0.
38.1.1 ECREATE
38.1.2 EADD and EEXTEND Interaction
38.1.3 EINIT Interaction
38.1.4 Intel® SGX Launch Control Configuration
38.2 Enclave Entry and Exiting
38.2.1 Controlled Entry and Exit
1. Check that TCS is not busy and flush all cached linear-to-physical mappings.
1. Clear enclave mode and flush all cached linear-to-physical mappings.
38.2.2 Asynchronous Enclave Exit (AEX)
38.2.3 Resuming Execution after AEX
38.2.3.1 ERESUME Interaction
38.3 Calling Enclave Procedures
38.3.1 Calling Convention
38.3.2 Register Preservation
38.3.3 Returning to Caller
38.4 Intel® SGX Key and Attestation
38.4.1 Enclave Measurement and Identification
38.4.1.1 MRENCLAVE
38.4.1.2 MRSIGNER
38.4.1.3 CONFIGID
38.4.2 Security Version Numbers (SVN)
38.4.2.1 Enclave Security Version
38.4.2.2 Hardware Security Version
38.4.2.3 CONFIGID Security Version
38.4.3 Keys
38.4.3.1 Sealing Enclave Data
38.4.3.2 Using REPORTs for Local Attestation
1. The source enclave determines the identity of the target enclave to populate TARGETINFO.
38.5 EPC and Management of EPC Pages
38.5.1 EPC Implementation
38.5.2 OS Management of EPC Pages
38.5.2.1 Enhancement to Managing EPC Pages
38.5.3 Eviction of Enclave Pages
1. For each page to be evicted from the EPC:
a. Select an empty slot in a Version Array (VA) page.
38.5.4 Loading an Enclave Page
1. Execute ELDB/ELDU (depending on the desired BLOCKED state for the page), passing as parameters: the EPC page linear address, the VA slot, the encrypted page, and the page metadata.
38.5.5 Eviction of an SECS Page
1. Ensure all pages are evicted from enclave.
38.5.6 Eviction of a Version Array Page
1. Select a slot in a Version Array page other than the page being evicted.
38.5.7 Allocating a Regular Page
1. Enclave requests additional memory from OS when the current allocation becomes insufficient.
a. EAUG may only be called on a free EPC page.
38.5.8 Allocating a TCS Page
1. Enclave requests an additional page from the OS.
a. EAUG may only be called on a free EPC page.
a. The parameters to EMODT indicate that the regular page should be converted into a TCS.
38.5.9 Trimming a Page
1. Enclave signals OS that a particular page is no longer in use.
a. SECS and VA pages cannot be trimmed in this way, so the initial type of the page must be PT_REG or PT_TCS.
38.5.10 Restricting the EPCM Permissions of a Page
1. Enclave requests that the OS restrict the permissions of an EPC page.
a. Invokes the EMODPR leaf function to restrict permissions (EMODPR may only be called on VALID pages).
a. Enclave may access the page throughout the entire process.
38.5.11 Extending the EPCM Permissions of a Page
1. Enclave invokes EMODPE to extend the EPCM permissions associated with an EPC page (EMODPE may only be called on VALID pages).
a. If cached linear-address to physical-address translations are present to the more restrictive permissions, the enclave thread will page fault. The SGX2-aware OS will see that the page tables permit the access and resume the thread, which can now s...
38.5.12 VMM Oversubscription of EPC
1. VMM creates data structures for SECS tracking including a count of child pages.
a. ENCLAVECONTEXT field in RDINFO structure will indicate the location of SECS, and the PAGE_TYPE field will indicate the page type.
38.6 Changes to Instruction Behavior Inside an Enclave
38.6.1 Illegal Instructions
38.6.2 RDRAND and RDSEED Instructions
38.6.3 PAUSE Instruction
38.6.4 Executions of INT1 and INT3 Inside an Enclave
38.6.5 INVD Handling when Enclaves Are Enabled
Chapter 39 Enclave Exiting Events
39.1 Compatible Switch to the Exiting Stack of AEX
39.2 State Saving by AEX
39.3 Synthetic State on Asynchronous Enclave Exit
39.3.1 Processor Synthetic State on Asynchronous Enclave Exit
39.3.2 Synthetic State for Extended Features
39.3.3 Synthetic State for MISC Features
39.4 AEX Flow
1. The exact processor state saved into the current SSA frame depends on whether the enclave is a 32-bit or a 64-bit enclave. In 32-bit mode (IA32_EFER.LMA = 0 || CS.L = 0), the low 32 bits of the legacy registers (EAX, EBX, ECX, EDX, ESP, EBP, ESI,...
39.4.1 AEX Operational Detail
Chapter 40 SGX Instruction References
40.1 Intel® SGX Instruction Syntax and Operation
40.1.1 ENCLS Register Usage Summary
40.1.2 ENCLU Register Usage Summary
40.1.3 ENCLV Register Usage Summary
40.1.4 Information and Error Codes
40.1.5 Internal CREGs
40.1.6 Concurrent Operation Restrictions
40.1.6.1 Concurrency Tables of Intel® SGX Instructions
40.2 Intel® SGX Instruction Reference
ENCLS—Execute an Enclave System Function of Specified Leaf Number
ENCLU—Execute an Enclave User Function of Specified Leaf Number
ENCLV—Execute an Enclave VMM Function of Specified Leaf Number
40.3 Intel® SGX System Leaf Function Reference
EADD—Add a Page to an Uninitialized Enclave
EAUG—Add a Page to an Initialized Enclave
EBLOCK—Mark a Page in EPC as Blocked
ECREATE—Create an SECS Page in the Enclave Page Cache
EDBGRD—Read From a Debug Enclave
EDBGWR—Write to a Debug Enclave
EEXTEND—Extend Uninitialized Enclave Measurement by 256 Bytes
EINIT—Initialize an Enclave for Execution
ELDB/ELDU/ELDBC/ELDUC—Load an EPC Page and Mark its State
EMODPR—Restrict the Permissions of an EPC Page
EMODT—Change the Type of an EPC Page
EPA—Add Version Array
ERDINFO—Read Type and Status Information About an EPC Page
EREMOVE—Remove a Page from the EPC
ETRACK—Activates EBLOCK Checks
ETRACKC—Activates EBLOCK Checks
EWB—Invalidate an EPC Page and Write out to Main Memory
40.4 Intel® SGX User Leaf Function Reference
EACCEPT—Accept Changes to an EPC Page
EACCEPTCOPY—Initialize a Pending Page
EENTER—Enters an Enclave
EEXIT—Exits an Enclave
EGETKEY—Retrieves a Cryptographic Key
EMODPE—Extend an EPC Page Permissions
EREPORT—Create a Cryptographic Report of the Enclave
ERESUME—Re-Enters an Enclave
40.5 Intel® SGX Virtualization Leaf Function Reference
EDECVIRTCHILD—Decrement VIRTCHILDCNT in SECS
EINCVIRTCHILD—Increment VIRTCHILDCNT in SECS
ESETCONTEXT—Set the ENCLAVECONTEXT Field in SECS
Chapter 41 Intel® SGX Interactions with IA32 and Intel® 64 Architecture
41.1 Intel® SGX Availability in Various Processor Modes
41.2 IA32_FEATURE_CONTROL
41.2.1 Availability of Intel SGX
41.2.2 Intel SGX Launch Control Configuration
41.3 Interactions with Segmentation
41.3.1 Scope of Interaction
41.3.2 Interactions of Intel® SGX Instructions with Segment, Operand, and Addressing Prefixes
41.3.3 Interaction of Intel® SGX Instructions with Segmentation
41.3.4 Interactions of Enclave Execution with Segmentation
41.4 Interactions with Paging
41.5 Interactions with VMX
41.5.1 VMM Controls to Configure Guest Support of Intel® SGX
41.5.2 Interactions with the Extended Page Table Mechanism (EPT)
41.5.3 Interactions with APIC Virtualization
41.5.4 Interactions with VT and SGX concurrency
41.5.5 Virtual Child Tracking
41.5.6 Handling EPCM Entry Lock Conflicts
41.5.7 Context Tracking
41.6 Intel® SGX Interactions with Architecturally-visible Events
41.7 Interactions with the Processor Extended State and Miscellaneous State
41.7.1 Requirements and Architecture Overview
41.7.2 Relevant Fields in Various Data Structures
41.7.2.1 SECS.ATTRIBUTES.XFRM
41.7.2.2 SECS.SSAFRAMESIZE
41.7.2.3 XSAVE Area in SSA
41.7.2.4 MISC Area in SSA
41.7.2.5 SIGSTRUCT Fields
41.7.2.6 REPORT.ATTRIBUTES.XFRM and REPORT.MISCSELECT
41.7.2.7 KEYREQUEST
41.7.3 Processor Extended States and ENCLS[ECREATE]
41.7.4 Processor Extended States and ENCLU[EENTER]
41.7.4.1 Fault Checking
41.7.4.2 State Loading
41.7.5 Processor Extended States and AEX
41.7.5.1 State Saving
41.7.5.2 State Synthesis
41.7.6 Processor Extended States and ENCLU[ERESUME]
41.7.6.1 Fault Checking
41.7.6.2 State Loading
41.7.7 Processor Extended States and ENCLU[EEXIT]
41.7.8 Processor Extended States and ENCLU[EREPORT]
41.7.9 Processor Extended States and ENCLU[EGETKEY]
41.8 Interactions with SMM
41.8.1 Availability of Intel® SGX instructions in SMM
41.8.2 SMI while Inside an Enclave
41.8.3 SMRAM Synthetic State of AEX Triggered by SMI
41.9 Interactions of INIT, SIPI, and Wait-for-SIPI with Intel® SGX
41.10 Interactions with DMA
41.11 Interactions with TXT
41.11.1 Enclaves Created Prior to Execution of GETSEC
41.11.2 Interaction of GETSEC with Intel® SGX
41.11.3 Interactions with Authenticated Code Modules (ACMs)
41.12 Interactions with Caching of Linear-address Translations
41.13 Interactions with Intel® Transactional Synchronization Extensions (Intel® TSX)
41.13.1 HLE and RTM Debug
41.14 Intel® SGX Interactions with S states
41.15 Intel® SGX Interactions with Machine Check Architecture (MCA)
41.15.1 Interactions with MCA Events
41.15.2 Machine Check Enables (IA32_MCi_CTL)
41.15.3 CR4.MCE
41.16 Intel® SGX Interactions with Protected Mode Virtual Interrupts
41.17 Intel SGX Interaction with Protection Keys
Chapter 42 Enclave Code Debug and Profiling
42.1 Configuration and Controls
42.1.1 Debug Enclave vs. Production Enclave
42.1.2 Tool-Chain Opt-in
42.2 Single Step Debug
42.2.1 Single Stepping ENCLS Instruction Leafs
42.2.2 Single Stepping ENCLU Instruction Leafs
42.2.3 Single-Stepping Enclave Entry with Opt-out Entry
42.2.3.1 Single Stepping without AEX
42.2.3.2 Single Step Preempted by AEX Due to Non-SMI Event
42.2.4 RFLAGS.TF Treatment on AEX
42.2.5 Restriction on Setting of TF after an Opt-Out Entry
42.2.6 Trampoline Code Considerations
42.3 Code and Data Breakpoints
42.3.1 Breakpoint Suppression
42.3.2 Reporting of Instruction Breakpoint on Next Instruction on a Debug Trap
42.3.3 RF Treatment on AEX
42.3.4 Breakpoint Matching in Intel® SGX Instruction Flows
42.4 Consideration of the INT1 and INT3 Instructions
42.4.1 Behavior of INT1 and INT3 Inside an Enclave
42.4.2 Debugger Considerations
42.4.3 VMM Considerations
42.5 Branch Tracing
42.5.1 BTF Treatment
42.5.2 LBR Treatment
42.5.2.1 LBR Stack on Opt-in Entry
42.5.2.2 LBR Stack on Opt-out Entry
42.5.2.3 Mispredict Bit, Record Type, and Filtering
42.6 Interaction with Performance Monitoring
42.6.1 IA32_PERF_GLOBAL_STATUS Enhancement
42.6.2 Performance Monitoring with Opt-in Entry
42.6.3 Performance Monitoring with Opt-out Entry
42.6.4 Enclave Exit and Performance Monitoring
42.6.5 PEBS Record Generation on Intel® SGX Instructions
42.6.6 Exception-Handling on PEBS/BTS Loads/Stores after AEX
42.6.6.1 Other Interactions with Performance Monitoring
Appendix A VMX Capability Reporting Facility
A.1 Basic VMX Information
A.2 Reserved Controls and Default Settings
A.3 VM-Execution Controls
A.3.1 Pin-Based VM-Execution Controls
A.3.2 Primary Processor-Based VM-Execution Controls
A.3.3 Secondary Processor-Based VM-Execution Controls
A.4 VM-Exit Controls
A.5 VM-Entry Controls
A.6 Miscellaneous Data
A.7 VMX-Fixed Bits in CR0
A.8 VMX-Fixed Bits in CR4
A.9 VMCS Enumeration
A.10 VPID and EPT Capabilities
A.11 VM Functions
Appendix B Field Encoding in VMCS
B.1 16-Bit Fields
B.1.1 16-Bit Control Fields
B.1.2 16-Bit Guest-State Fields
B.1.3 16-Bit Host-State Fields
B.2 64-Bit Fields
B.2.1 64-Bit Control Fields
B.2.2 64-Bit Read-Only Data Field
B.2.3 64-Bit Guest-State Fields
B.2.4 64-Bit Host-State Fields
B.3 32-Bit Fields
B.3.1 32-Bit Control Fields
B.3.2 32-Bit Read-Only Data Fields
B.3.3 32-Bit Guest-State Fields
B.3.4 32-Bit Host-State Field
B.4 Natural-Width Fields
B.4.1 Natural-Width Control Fields
B.4.2 Natural-Width Read-Only Data Fields
B.4.3 Natural-Width Guest-State Fields
B.4.4 Natural-Width Host-State Fields
Appendix C VMX Basic Exit Reasons
Volume 4:Model-Specific Registers
Chapter 1 About This Manual
1.1 Intel® 64 and IA-32 Processors Covered in this Manual
1.2 Overview of the System Programming Guide
1.3 Notational Conventions
1.3.1 Bit and Byte Order
1.3.2 Reserved Bits and Software Compatibility
1.3.3 Instruction Operands
1.3.4 Hexadecimal and Binary Numbers
1.3.5 Segmented Addressing
1.3.6 Syntax for CPUID, CR, and MSR Values
1.3.7 Exceptions
1.4 Related Literature
Chapter 2 Model-Specific Registers (MSRs)
2.1 Architectural MSRs
2.2 MSRs In the Intel® Core™ 2 Processor Family
2.3 MSRs In the 45 nm and 32 nm Intel® Atom™ Processor Family
2.4 MSRs In Intel Processors Based on Silvermont Microarchitecture
2.4.1 MSRs with Model-Specific Behavior in the Silvermont Microarchitecture
2.4.2 MSRs In Intel Atom Processors Based on Airmont Microarchitecture
2.5 MSRs In Intel Atom Processors based on Goldmont Microarchitecture
2.6 MSRs In Intel Atom Processors Based on Goldmont Plus Microarchitecture
2.7 MSRs In Intel Atom Processors Based on Tremont Microarchitecture
2.8 MSRs In the Intel® Microarchitecture Code Name Nehalem
2.8.1 Additional MSRs in the Intel® Xeon® Processor 5500 and 3400 Series
2.8.2 Additional MSRs in the Intel® Xeon® Processor 7500 Series
2.9 MSRs In the Intel® Xeon® Processor 5600 Series (Based on Intel® Microarchitecture Code Name Westmere)
2.10 MSRs In the Intel® Xeon® Processor E7 Family (Based on Intel® Microarchitecture Code Name Westmere)
2.11 MSRs In Intel® Processor Family Based on Intel® Microarchitecture Code Name Sandy Bridge
2.11.1 MSRs In 2nd Generation Intel® Core™ Processor Family (Based on Intel® Microarchitecture Code Name Sandy Bridge)
2.11.2 MSRs In Intel® Xeon® Processor E5 Family (Based on Intel® Microarchitecture Code Name Sandy Bridge)
2.11.3 Additional Uncore PMU MSRs in the Intel® Xeon® Processor E5 Family
2.12 MSRs In the 3rd Generation Intel® Core™ Processor Family (Based on Intel® microarchitecture code name Ivy Bridge)
2.12.1 MSRs In Intel® Xeon® Processor E5 v2 Product Family (Based on Ivy Bridge-E Microarchitecture)
2.12.2 Additional MSRs Supported by Intel® Xeon® Processor E7 v2 Family
2.12.3 Additional Uncore PMU MSRs in the Intel® Xeon® Processor E5 v2 and E7 v2 Families
2.13 MSRs In the 4th Generation Intel® Core™ Processors (Based on Haswell Microarchitecture)
2.13.1 MSRs in 4th Generation Intel® Core™ Processor Family (based on Haswell Microarchitecture)
2.13.2 Additional Residency MSRs Supported in 4th Generation Intel® Core™ Processors
2.14 MSRs In Intel® Xeon® Processor E5 v3 and E7 v3 Product Family
2.14.1 Additional Uncore PMU MSRs in the Intel® Xeon® Processor E5 v3 Family
2.15 MSRs In Intel® Core™ M Processors and 5th Generation Intel Core Processors
2.16 MSRs In Intel® Xeon® Processors E5 v4 Family
2.16.1 Additional MSRs Supported in the Intel® Xeon® Processor D Product Family
2.16.2 Additional MSRs Supported in Intel® Xeon® Processors E5 v4 and E7 v4 Families
2.17 MSRs In the 6th Generation, 7th Generation, 8th Generation, and 9th Generation Intel® Core™ Processors, Intel® Xeon® Processor Scalable Family, 8th Generation Intel® Core™ i3 Processors, and Intel® Xeon® E processors
2.17.1 MSRs Specific to 7th Generation and 8th Generation Intel® Core™ Processors based on Kaby Lake Microarchitecture and Coffee Lake Microarchitecture
2.17.2 MSRs Specific to 8th Generation Intel® Core™ i3 Processors
2.17.3 MSRs Specific to Intel® Xeon® Processor Scalable Family
2.18 MSRs In Intel® Xeon Phi™ Processor 3200/5200/7200 Series and Intel® Xeon Phi™ Processor 7215/7285/7295 Series
2.19 MSRs In the Pentium® 4 and Intel® Xeon® Processors
2.19.1 MSRs Unique to Intel® Xeon® Processor MP with L3 Cache
2.20 MSRs In Intel® Core™ Solo and Intel® Core™ Duo Processors
2.21 MSRs In the Pentium M Processor
2.22 MSRs In the P6 Family Processors
2.23 MSRs in Pentium Processors
2.24 MSR Index

📜 SIMILAR VOLUMES

📁 Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1: Basic Architecture

✍ Intel Corporation 📂 Library 📅 2006 🌐 English

The Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 1, describes the basic architecture and programming environment of an Intel 64 and IA-32 processor.

📁 IA-32 Intel® Architecture Software Developer’s Manual. Volume 1: Basic Architecture

✍ Intel Corporation 📂 Library 📅 2003 🌐 English

📁 IA-32 Intel® Architecture Software Developer’s Manual, Volume 1: Basic Architecture

✍ Intel Corporation 📂 Library 📅 2004 🌐 English

The IA-32 Intel Architecture Software Developer's Manual, Volume 1, describes the basic architecture and programming environment of an IA-32 processor.

📁 IA-32 Intel® Architecture Software Developer’s Manual. Volume 1: Basic Architecture

✍ Intel Corporation 📂 Library 📅 2002 🌐 English

📁 Intel® 64 and IA-32 Architectures Software Developer’s Manual

✍ coll. 📂 Library 📅 2020 🌐 English

📁 Intel® 64 and IA-32 Architectures Software Developer’s Manual

✍ Intel Corporation 📂 Library 📅 2017 🌐 English

This document contains the following: Volume 1: Describes the architecture and programming environment of processors supporting IA-32 and Intel® 64 architectures. Volume 2: Includes the full instruction set reference, A-Z. Describes the format of the instruction and provides reference pages fo