The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers

Introduction

About this manual

This is the third in a series of five manuals:

  1. Optimizing software in C++: An optimization guide for Windows, Linux, and Mac platforms.
  2. Optimizing subroutines in assembly language: An optimization guide for x86 platforms.
  3. The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers.
  4. Instruction tables: Lists of instruction latencies, throughputs and micro-operation breakdowns for Intel, AMD and VIA CPUs.
  5. Calling conventions for different C++ compilers and operating systems.

The latest versions of these manuals are always available from www.agner.org/optimize. Copyright conditions are listed on page 271 below.

The present manual describes the details of the microarchitectures of x86 microprocessors from Intel, AMD, and VIA. The Itanium processor is not covered. The purpose of this manual is to enable assembly programmers and compiler makers to optimize software for these microprocessors. The main focus is on details that are relevant to calculations of how much time a piece of code takes to execute, such as the latencies of different execution units and the throughputs of various parts of the pipelines. Branch prediction algorithms are also covered.

This manual will also be interesting to students of microarchitecture. But it must be noted that the technical descriptions are mostly based on my own research, which is limited to what is measurable. The descriptions of the “mechanics” of the pipelines are therefore limited to what can be measured by counting clock cycles or micro-operations (µops) and what can be deduced from these measurements and from documents from the vendors. Mechanistic explanations in this manual should be regarded as a model which is useful for predicting microprocessor behavior. I have no way of knowing with certainty whether it is in accordance with the actual physical structure of the microprocessors. The main purpose of providing this information is to enable programmers and compiler makers to optimize their code.

On the other hand, my method of deducing information from measurements rather than relying on information published by microprocessor vendors provides a lot of new information that cannot be found anywhere else. Technical details published by microprocessor vendors are often superficial, incomplete, selective, and sometimes misleading.

My findings are sometimes in disagreement with data published by microprocessor vendors. One reason for such discrepancies may be that vendor data are theoretical, while my data are obtained experimentally under a particular set of testing conditions. I do not claim that all information in this manual is exact. Some timings etc. can be difficult or impossible to measure exactly, and I do not have access to the inside information on technical implementations that microprocessor vendors base their technical manuals on.

The tests are done mostly in 32-bit and 64-bit protected mode. Most timing results are independent of the processor mode. Important differences are noted where appropriate. Far jumps, far calls and interrupts have mostly been tested in 16-bit mode for older processors. Call gates etc. have not been tested. The detailed timing results are listed in manual 4: “Instruction tables”.

Most of the information in this manual is based on my own research. Many people have sent me useful information and corrections, which I am very thankful for. I keep updating the manual whenever I have new important information. This manual is therefore more detailed, comprehensive and exact than other sources of information, and it contains many details not found anywhere else.

This manual is not for beginners. It is assumed that the reader has a good understanding of assembly programming and microprocessor architecture. If not, then please read some books on the subject and get some programming experience before you begin doing complicated optimizations. See the literature list in manual 2: “Optimizing subroutines in assembly language” or follow the links from www.agner.org/optimize.

Readers may skip chapters describing old microprocessor designs unless they are using these processors in embedded systems or are interested in historical developments in microarchitecture.

Please do not send your programming questions to me. I am not doing your homework for you! There are various discussion forums on the Internet where you can get answers to your programming questions if you cannot find the answers in the relevant books and manuals.

Attribution

Agner Fog. The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers. https://www.agner.org/optimize/
