The MSI Driver Guide HOWTO

10/03/2003

1. About this guide

This guide describes the basics of Message Signaled Interrupts (MSI),
the advantages of using MSI over traditional interrupt mechanisms, and
how to enable your driver to use MSI or MSI-X. A list of Frequently
Asked Questions is also included.

2. Copyright 2003 Intel Corporation

3. What is MSI/MSI-X?

Message Signaled Interrupt (MSI), as described in the PCI Local Bus
Specification Revision 2.3 or later, is an optional feature for PCI
devices and a required feature for PCI Express devices. MSI enables a
device function to request service by sending an Inbound Memory Write
on its PCI bus to the FSB as a Message Signal Interrupt transaction.
Because an MSI is generated in the form of a Memory Write, all
transaction conditions, such as a Retry, Master-Abort, Target-Abort
or normal completion, are supported.

A PCI device that supports MSI must also support the pin IRQ assertion
interrupt mechanism to provide backward compatibility for systems that
do not support MSI. On systems that support MSI, the bus driver is
responsible for initializing the message address and message data of
the device function's MSI/MSI-X capability structure during initial
device configuration.

An MSI-capable device function indicates MSI support by implementing
the MSI/MSI-X capability structure in its PCI capability list. The
device function may implement both the MSI capability structure and
the MSI-X capability structure; however, the bus driver should not
enable both, but instead should enable only the MSI-X capability
structure.
The MSI capability structure contains the Message Control register,
the Message Address register and the Message Data register. These
registers provide the bus driver control over MSI. The Message Control
register indicates the MSI capability supported by the device. The
Message Address register specifies the target address, and the Message
Data register specifies the characteristics of the message. To request
service, the device function writes the content of the Message Data
register to the target address. The device and its software driver are
prohibited from writing to these registers.
The MSI-X capability structure is an optional extension to MSI. It
uses an independent and separate capability structure. There are
some key advantages to implementing the MSI-X capability structure
over the MSI capability structure, as described below.

   - Support for a larger maximum number of vectors per function.

   - The ability for system software to configure each vector with
     an independent message address and message data, specified by
     a table that resides in Memory Space.

   - MSI and MSI-X both support per-vector masking. Per-vector
     masking is an optional extension of MSI but a required
     feature of MSI-X. Per-vector masking gives the kernel the
     ability to mask/unmask a vector while servicing its software
     interrupt service routine. If per-vector masking is not
     supported, the device driver should provide the
     hardware/software synchronization needed to ensure that the
     device generates an MSI only when the driver wants it to.

4. Why use MSI?

One benefit of MSI is the simplification of board design: MSI allows
board designers to remove out-of-band interrupt routing. MSI is
another step towards a legacy-free environment.

Due to increasing pressure on chipset and processor packages to
reduce pin count, the need for interrupt pins is expected to
diminish over time. Devices, due to pin constraints, may implement
messages to increase performance.

PCI Express endpoints use INTx emulation (in-band messages) instead
of IRQ pin assertion. Using INTx emulation requires interrupt
sharing among devices connected to the same node (PCI bridge), while
each MSI is unique (non-shared) and does not require BIOS
configuration support. As a result, the PCI Express technology
requires MSI support for better interrupt performance.

Using MSI enables a device function to support two or more vectors,
which can be configured to target different CPUs to increase
scalability.

5. Configuring a driver to use MSI/MSI-X

Even once the patch is installed, the kernel does not enable
MSI/MSI-X by default on devices that support this capability. A
kernel configuration option must be selected to enable MSI/MSI-X
support.
5.1 Including MSI support into the kernel

Including MSI support in the kernel requires users to apply the
VECTOR-base patch first and then the MSI patch, because MSI support
depends on the VECTOR-based scheme. Once these patches are installed,
setting CONFIG_PCI_USE_VECTOR enables the VECTOR-based scheme and
gives MSI-capable device drivers the option of selectively enabling
MSI (using pci_enable_msi as described below).

Since the target of the inbound message is the local APIC,
CONFIG_PCI_USE_VECTOR depends on whether CONFIG_X86_LOCAL_APIC is
enabled.

int pci_enable_msi(struct pci_dev *)

With this new API, any existing device driver that would like MSI
enabled on its device function must call it explicitly. A successful
call will initialize the MSI/MSI-X capability structure with ONE
vector, regardless of whether the device function is capable of
supporting multiple messages. This vector replaces the pre-assigned
dev->irq with a new MSI vector. To avoid a conflict between the newly
assigned vector and the existing pre-assigned vector, the device
driver must call this API before calling request_irq(...).

The diagram below shows the events that switch the interrupt mode of
an MSI-capable device function between MSI mode and PIN-IRQ assertion
mode.

 ------------   pci_enable_msi    ------------------------
|            | <===============  |                        |
|  MSI MODE  |                   | PIN-IRQ ASSERTION MODE |
|            |  ===============> |                        |
 ------------      free_irq       ------------------------

5.2 Configuring for MSI support

Due to the non-contiguous fashion of vector assignment in the
existing Linux kernel, this patch does not support multiple messages,
regardless of whether the device function is capable of supporting
more than one vector. The bus driver initializes only entry 0 of
this capability if pci_enable_msi(...) is called successfully by
the device driver.

5.3 Configuring for MSI-X support

Both the MSI capability structure and the MSI-X capability structure
share the same semantics described above; however, because system
software can configure each vector of the MSI-X capability structure
with an independent message address and message data, the
non-contiguous fashion of vector assignment in the existing Linux
kernel has no impact on supporting multiple messages on an MSI-X
capable device function. By default, as mentioned above, ONE vector
is always allocated to the MSI-X capability structure at entry 0.
The bus driver does not initialize the other entries of the MSI-X
table.

Note that the PCI subsystem should have full control of an MSI-X
table that resides in Memory Space. The software device driver
should not access this table.

To request additional vectors, the device's software driver should
call the function msi_alloc_vectors(). It is recommended that the
driver call this function once, during its initialization phase.

The function msi_alloc_vectors(), once invoked, enables either all
of the requested vectors or none of them, depending on the current
availability of vector resources. If no vector resources are
available, the device function still works with ONE vector. If
vector resources are available for the number of vectors requested
by the driver, this function will reconfigure the MSI-X capability
structure of the device with additional messages, starting from
entry 1. To illustrate why this matters: a device may be capable of
supporting a maximum of 32 vectors while its software driver
typically requests only 4.

After this call succeeds, the device driver is responsible for
calling other functions such as request_irq(), enable_irq(), etc.
to enable each vector with its corresponding interrupt service
handler. It is the device driver's choice to have all vectors share
the same interrupt service handler or to give each vector a unique
interrupt service handler.

In addition to the function msi_alloc_vectors(), another function,
msi_free_vectors(), is provided to allow the software driver to
release a number of vectors back to the vector resource pool. Once
invoked, the PCI subsystem disables (masks) each vector released.
These vectors are no longer valid for the hardware device or its
software driver to use. As with free_irq, it is recommended that
the device driver call msi_free_vectors() to release all additional
vectors previously requested.

int msi_alloc_vectors(struct pci_dev *dev, int *vector, int nvec)

This API enables the software driver to request that the PCI
subsystem allocate additional messages. Depending on the number of
vectors available, the PCI subsystem enables either all or none of
the requested vectors.

Argument dev points to the device (pci_dev) structure.
Argument vector is a pointer to an array of integers; the number of
elements is indicated by argument nvec.
Argument nvec is an integer indicating the number of messages
requested.
A return of zero indicates that the requested number of vectors was
successfully allocated; otherwise, the resources are not available.

int msi_free_vectors(struct pci_dev *dev, int *vector, int nvec)

This API enables the software driver to inform the PCI subsystem
that it is willing to release a number of vectors back to the MSI
resource pool. Once invoked, the PCI subsystem disables each MSI-X
entry associated with each vector stored in the second argument.
These vectors are no longer valid for the hardware device or its
software driver to use.

Argument dev points to the device (pci_dev) structure.
Argument vector is a pointer to an array of integers; the number of
elements is indicated by argument nvec.
Argument nvec is an integer indicating the number of messages
released.
A return of zero indicates that the requested number of vectors was
successfully released; otherwise, it indicates a failure.

5.4 Hardware requirements for MSI support

MSI support requires support from both the system hardware and the
individual hardware device functions.

5.4.1 System hardware support

Since the target of an MSI address is the local APIC of a CPU,
enabling MSI support in the Linux kernel depends on whether the
existing system hardware supports a local APIC. Users should verify
that their system runs correctly with CONFIG_X86_LOCAL_APIC=y.

In an SMP environment, CONFIG_X86_LOCAL_APIC is set automatically;
however, in a UP environment, users must set CONFIG_X86_LOCAL_APIC
manually. Once CONFIG_X86_LOCAL_APIC=y, setting
CONFIG_PCI_USE_VECTOR enables the VECTOR-based scheme and gives
MSI-capable device drivers the option of selectively enabling MSI
(using pci_enable_msi as described above).

Note that the CONFIG_X86_IO_APIC setting is irrelevant, because an
MSI vector is newly allocated at runtime and MSI support does not
depend on BIOS support. This key independence enables MSI support
on future IOxAPIC-free platforms.

5.4.2 Device hardware support

A hardware device function indicates MSI support by including the
MSI/MSI-X capability structure in its PCI capability list. By
default, this capability structure will not be initialized by the
kernel to enable MSI during system boot. In other words, the device
function runs in its default pin assertion mode. Note that in many
cases hardware supporting MSI has bugs, which may result in a
system hang. The software driver of specific MSI-capable hardware
is responsible for deciding whether to call pci_enable_msi. A
return of zero indicates that the kernel has successfully
initialized the MSI/MSI-X capability structure of the device
function; the device function is now running in MSI mode.

5.5 How to tell whether MSI is enabled on a device function

At the driver level, a return of zero from pci_enable_msi(...)
indicates to the device driver that its device function is
initialized successfully and ready to run in MSI mode.

At the user level, users can use the command 'cat /proc/interrupts'
to display the vectors allocated for devices and their interrupt
modes, as shown below.

           CPU0       CPU1
  0:     324639          0    IO-APIC-edge  timer
  1:       1186          0    IO-APIC-edge  i8042
  2:          0          0          XT-PIC  cascade
 12:       2797          0    IO-APIC-edge  i8042
 14:       6543          0    IO-APIC-edge  ide0
 15:          1          0    IO-APIC-edge  ide1
169:          0          0   IO-APIC-level  uhci-hcd
185:          0          0   IO-APIC-level  uhci-hcd
193:        138         10         PCI MSI  aic79xx
201:         30          0         PCI MSI  aic79xx
225:         30          0   IO-APIC-level  aic7xxx
233:         30          0   IO-APIC-level  aic7xxx
NMI:          0          0
LOC:     324553     325068
ERR:          0
MIS:          0

6. FAQ

Q1. Are there any limitations on using MSI?

A1. If the PCI device supports MSI and conforms to the
specification, and the platform supports the APIC local bus, then
using MSI should work.

Q2. Will it work on all Pentium processors (P3, P4, Xeon, AMD
processors)? In the P3, IPIs are transmitted on the APIC local bus,
and in the P4 and Xeon they are transmitted on the system bus. Are
there any implications with this?

A2. MSI support enables a PCI device to send an inbound memory
write (0xfeexxxxx as the target address) on its PCI bus directly to
the FSB. Since the message address has the redirection hint bit
cleared, it should work.

Q3. The target address 0xfeexxxxx will be translated by the Host
Bridge into an interrupt message. Are there any limitations on
chipsets such as the Intel 8xx, Intel e7xxx, or VIA?

A3. If these chipsets support an inbound memory write with the
target address set to 0xfeexxxxx, in conformance with the PCI
specification, Revision 2.3 or later, then it should work.

Q4. From the driver's point of view, if an MSI is lost because
errors occur during the inbound memory write, the driver may wait
forever. Is there a mechanism for it to recover?

A4. Since the target of the transaction is an inbound memory write,
all transaction termination conditions (Retry, Master-Abort,
Target-Abort, or normal completion) are supported. A device sending
an MSI must abide by all the PCI rules and conditions regarding
that inbound memory write. So, if a retry is signaled, it must
retry, etc. We believe that the recommendation for an Abort is also
a retry (refer to the PCI specification, Revision 2.3 or later).