A PXE network card initialization method compatible with 64-bit addressing, device and medium
By constructing a dual-path mechanism of MMIO and PCI configuration spaces during PXE network card initialization, the compatibility issue of network card initialization under 64-bit hardware architecture is resolved. This enables flexible adaptation and robust error handling for network cards from different manufacturers, ensuring the integrity and stability of the network boot process.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SHANGHAI YUNMAI XINLIAN TECH CO LTD
- Filing Date
- 2026-03-30
- Publication Date
- 2026-06-26
AI Technical Summary
Traditional PXE implementations cannot directly access Option ROM and MMIO resources on 64-bit hardware architectures, leading to boot failures. Furthermore, hardware differences between network cards from different manufacturers result in complex driver adaptation, a lack of a unified access mechanism, and insufficient systematic error handling, which limits functional expansion and compatibility.
By constructing a dual-path mechanism of Memory Mapped Input/Output (MMIO) and PCI configuration space access, the initialization code is allowed to switch to a custom capability structure in the PCI configuration space to perform read and write operations on hardware registers when MMIO fails. This provides a standardized alternative access channel and ensures the continued execution of the hardware initialization process.
It improves the robustness and hardware compatibility of network card initialization, solves the incompatibility problem between 64-bit BAR and 32-bit protocol stack, enhances the adaptability to different hardware designs, ensures the smooth progress of the network boot process, and improves the deployment success rate and system stability in large-scale batch installation scenarios.
Figure CN121934902B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of network card initialization technology, and in particular to a method, device and medium for initializing a PXE network card compatible with 64-bit addressing.
Background Technology
[0002] Preboot Execution Environment (PXE) is a key technology for large-scale operating system deployment and diskless booting. It enables network interaction with remote servers through boot code in the Option ROM of the network interface card (NIC). Traditional PXE implementations rely on the BIOS scanning PCI devices, then mapping and executing the Option ROM during the POST phase. However, with the widespread adoption of 64-bit hardware architectures, existing solutions face significant challenges. First, the 32-bit PXE protocol stack cannot directly access Option ROM and MMIO resources located in the 64-bit address space, leading to boot failures. Second, hardware differences between NICs from different manufacturers complicate driver adaptation, especially because there is no unified alternative access mechanism when MMIO mapping fails or the BAR operates in compressed mode. Third, existing frameworks lack systematic error handling and rollback processes in resource management, which easily leads to memory leaks or device initialization failures, and their insufficient use of vendor-specific capabilities (VSC) limits functional expansion and compatibility. Therefore, there is an urgent need for a NIC initialization and PXE booting solution that is compatible with 64-bit addressing, supports flexible hardware adaptation, and possesses robust error handling capabilities.
Summary of the Invention
[0003] To address the aforementioned technical problems, the technical solution adopted by this invention is as follows:
[0004] According to a first aspect of this application, a method for initializing a PXE network interface card compatible with 64-bit addressing is provided, the method comprising the following steps:
[0005] S100, in response to the PXE boot command issued by the user through the baseboard management controller, the system firmware scans the PCI bus during the power-on self-test process and discovers the network card device; the network card device has a preset PCI configuration space;
[0006] S200: Read the PCI configuration space of the network card device, obtain the extended ROM base address register information, and map the network card Option ROM to the memory shadow area;
[0007] S300, execute the initialization code in the Option ROM, so that the initialization code accesses the hardware registers of the network card device through memory-mapped input/output to complete hardware initialization;
[0008] S400, when the memory-mapped input/output access fails, the initialization code performs read and write operations on the hardware registers through the custom capability structure in the PCI configuration space of the network card device;
[0009] S500, based on the successful access to the hardware registers, complete the hardware initialization of the network card device and establish the PXE protocol stack structure in memory;
[0010] S600, register with the system firmware as a PXE boot device and enter the protocol interaction phase with the PXE server.
[0011] According to another aspect of this application, a non-transitory computer-readable storage medium is also provided, wherein at least one instruction or at least one program is stored in the storage medium, and the at least one instruction or at least one program is loaded and executed by a processor to implement the above-described 64-bit addressable PXE network card initialization method.
[0012] According to another aspect of this application, an electronic device is also provided, including a processor and the aforementioned non-transitory computer-readable storage medium.
[0013] The present invention has at least the following beneficial effects:
[0014] The 64-bit addressable PXE network card initialization method of this invention significantly improves the robustness and hardware compatibility of network card initialization by constructing a dual-path mechanism of Memory Mapped Input/Output (MMIO) and PCI configuration space access. Specifically, the system firmware or Option ROM code may fail to access hardware registers via standard MMIO in a 32-bit PXE protocol stack environment, for example because the network card BAR is located in a 64-bit address space, the BAR operates in compressed mode, or a resource mapping function returns an exception. In that case, this invention allows the initialization code to seamlessly switch to reading and writing hardware registers through a custom capability structure in the PCI configuration space. This bypasses the addressing limitations or mapping failures of the MMIO path, ensuring that the hardware initialization process can continue. This mechanism not only solves the PXE boot failure problem caused by the incompatibility between the 64-bit BAR and the 32-bit protocol stack in traditional solutions, but also provides a standardized alternative access channel for network cards from different manufacturers, reducing dependence on specific BAR layouts and enhancing the adaptability of the solution to different hardware designs. Meanwhile, based on the successfully accessed hardware registers, the network card is initialized, the PXE protocol stack is established, and it is registered as a boot device, ensuring the smooth progress of the subsequent network boot process and improving the deployment success rate and system stability in large-scale batch installation scenarios.
Attached Figure Description
[0015] To more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0016] Figure 1 is a flowchart of a PXE network card initialization method compatible with 64-bit addressing provided in an embodiment of the present invention.
Detailed Implementation
[0017] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0018] It should be noted that, based on this disclosure, those skilled in the art will understand that one aspect described herein can be implemented independently of any other aspect, and two or more of these aspects can be combined in various ways. For example, any number of aspects set forth herein can be used to implement the device and/or practice the method. Furthermore, the device can be implemented and/or the method practiced using other structures and/or functionalities besides one or more of the aspects set forth herein.
[0019] A method for initializing a PXE network card compatible with 64-bit addressing is described below with reference to the flowchart shown in Figure 1.
[0020] The 64-bit addressable PXE network card initialization method may include the following steps:
[0021] 1. A method for initializing a PXE network card compatible with 64-bit addressing, characterized in that the method includes the following steps:
[0022] S100, in response to the PXE boot command issued by the user through the baseboard management controller, the system firmware scans the PCI bus during the power-on self-test process and discovers the network card device; the network card device has a preset PCI configuration space.
[0023] The user sends the IPMI command `ipmitool chassis bootdev pxe` to the Baseboard Management Controller (BMC) of the target server via the management network. Upon receiving the command, the BMC translates it into a system boot configuration, sets the boot order flag in non-volatile memory (NVRAM), and instructs the system to boot from PXE on the next system startup.
[0024] After the system powers on or resets, the CPU first executes the Power-On Self-Test (POST) code in the firmware (such as BIOS or UEFI). In the early stages of POST, the firmware enumerates the PCI bus: identifying the devices present on the bus by reading the header of the PCI configuration space (such as Vendor ID and Device ID).
[0025] Specifically, the firmware sequentially accesses the configuration space of each PCI Bus/Device/Function (BDF), reading the class code and programming interface to identify whether the device is an Ethernet controller (typically base class 0x02, network controller, with subclass 0x00, Ethernet). When a network interface card (NIC) device is identified, the firmware records the device's BDF and configuration space base address, and allocates the necessary resources (such as I/O space and memory space). This step also involves detecting whether the device supports an expansion ROM, i.e., checking whether the Expansion ROM Base Address Register in the configuration space is writable and non-zero.
[0026] By remotely issuing PXE boot commands via the BMC, unattended remote network booting is achieved, forming the basis for large-scale batch installations. During the POST phase, the firmware automatically scans the PCI bus and identifies the network card without manual intervention, ensuring automated and accurate hardware discovery. Simultaneously, recording the device's PCI configuration space lays the foundation for subsequent Option ROM loading and hardware access.
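The scan described in [0024] and [0025] can be sketched roughly as follows. This is a minimal illustration, not the firmware's actual code: `cfg_read32_fn`, `find_ethernet_controllers` and `mock_cfg_read32` are hypothetical names, the mock stands in for real configuration space access, and the vendor/device IDs are placeholders. For brevity the sketch probes every function number instead of checking the multi-function bit in the header type.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accessor: returns the 32-bit configuration dword at
 * `offset` for a bus/device/function. Not a real firmware API. */
typedef uint32_t (*cfg_read32_fn)(uint8_t bus, uint8_t dev, uint8_t func,
                                  uint8_t offset);

typedef struct {
    uint8_t bus, dev, func;
} pci_bdf_t;

/* Mock configuration space: one Ethernet controller at 00:03.0 with
 * placeholder vendor/device IDs; every other slot reads as all ones. */
static uint32_t mock_cfg_read32(uint8_t bus, uint8_t dev, uint8_t func,
                                uint8_t offset)
{
    if (bus == 0 && dev == 3 && func == 0) {
        if (offset == 0x00) return 0x56781234u; /* device ID : vendor ID */
        if (offset == 0x08) return 0x02000000u; /* class 0x02, subclass 0x00 */
        return 0;
    }
    return 0xFFFFFFFFu;
}

/* Walk all BDFs and record functions whose base class is 0x02
 * (network controller) and subclass is 0x00 (Ethernet). */
static size_t find_ethernet_controllers(cfg_read32_fn rd, pci_bdf_t *out,
                                        size_t max_out)
{
    size_t found = 0;
    for (unsigned bus = 0; bus < 256; bus++) {
        for (unsigned dev = 0; dev < 32; dev++) {
            for (unsigned func = 0; func < 8; func++) {
                uint32_t id = rd((uint8_t)bus, (uint8_t)dev, (uint8_t)func, 0x00);
                if ((id & 0xFFFFu) == 0xFFFFu)
                    continue; /* vendor ID 0xFFFF: nothing present */
                uint32_t class_reg = rd((uint8_t)bus, (uint8_t)dev, (uint8_t)func, 0x08);
                uint8_t base_class = (uint8_t)(class_reg >> 24);
                uint8_t sub_class = (uint8_t)(class_reg >> 16);
                if (base_class == 0x02 && sub_class == 0x00 && found < max_out)
                    out[found++] = (pci_bdf_t){(uint8_t)bus, (uint8_t)dev, (uint8_t)func};
            }
        }
    }
    return found;
}
```

Calling `find_ethernet_controllers(mock_cfg_read32, hits, 4)` on the mock returns one hit at bus 0, device 3, function 0; real firmware would pass its own configuration space accessor instead.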
[0027] S200: Read the PCI configuration space of the network card device, obtain the extended ROM base address register information, and map the network card Option ROM to the memory shadow area.
[0028] After recognizing the network interface card (NIC), the firmware further reads the Expansion ROM Base Address Register at offset 0x30 in its PCI configuration space. This register indicates the physical base address and size of the Option ROM on the NIC.
[0029] The firmware first writes all 1s to the register, then reads back the ROM size (calculated using the alignment mask), and restores the original value. Next, the firmware allocates a contiguous shadow RAM region within the lower 4GB of system physical memory (compatible with 32-bit addressing), and copies the contents of the Option ROM from the device to this region. Specifically, the firmware sets the ROM base address register to the allocated memory address (enabling memory mapping), then reads the code from the ROM and writes it to the shadow RAM through memory read cycles. After mapping is complete, the firmware records the Option ROM entry point in a system data structure for subsequent calls.
[0030] Mapping the Option ROM to the memory shadow area allows the CPU to directly execute code in the ROM, avoiding the need to read the slow ROM via the PCI bus on every access, significantly improving initialization speed. Simultaneously, by reading the extended ROM base address register, the firmware can dynamically identify the presence and size of the ROM, ensuring compatibility with network cards from different manufacturers.
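The sizing step in [0029] (write all 1s, read back, apply the alignment mask) can be illustrated with the small sketch below. It assumes the standard layout of the Expansion ROM Base Address Register, where bit 0 is the enable bit and bits 31:11 hold the base address; the function names are hypothetical and the hardware write/read-back itself is left out.

```c
#include <stdint.h>

#define PCI_ROM_ADDRESS_MASK 0xFFFFF800u /* bits 31:11: ROM base address */
#define PCI_ROM_ENABLE       0x00000001u /* bit 0: address decode enable */

/* Given the value read back after writing all 1s to the address bits,
 * return the ROM size in bytes (0 if no ROM is implemented). */
static uint32_t rom_size_from_readback(uint32_t readback)
{
    uint32_t mask = readback & PCI_ROM_ADDRESS_MASK;
    if (mask == 0)
        return 0;
    return ~mask + 1u; /* lowest writable address bit gives the size */
}

/* Compose the register value that maps the ROM at `shadow_base`
 * (which must lie below 4 GiB and be aligned to the ROM size). */
static uint32_t rom_bar_value(uint32_t shadow_base)
{
    return (shadow_base & PCI_ROM_ADDRESS_MASK) | PCI_ROM_ENABLE;
}
```

For instance, a read-back of 0xFFFF0000 corresponds to a 64 KiB ROM, and `rom_bar_value(0x000C0000)` yields 0x000C0001.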
[0031] S300, execute the initialization code in the Option ROM, so that the initialization code accesses the hardware registers of the network card device through memory-mapped input/output to complete hardware initialization.
[0032] The firmware executes the initialization entry point of the Option ROM (usually the initialization function pointer at offset 0x03) via a far call or interrupt call.
[0033] The code in the Option ROM (usually a PXE extended ROM) first obtains the system-provided entry parameters (such as PnP extended information), and then begins to initialize the network card hardware.
[0034] The initialization code attempts to access the network card registers via Memory-Mapped Input/Output (MMIO): it reads the contents of the base address register (BAR0) from the PCI configuration space, which provides the physical base address at which the device is mapped into memory space. The code then calls a firmware-provided service (such as pci_ioremap) to map this physical address into a virtual address space (in real mode, the physical address may be used directly). Subsequently, it reads and writes the mapped addresses to manipulate the network card's transmit/receive queues, MAC address, interrupt control, and other hardware registers, thus completing the basic hardware setup. For example, the code might write to control registers to enable DMA or set the link speed.
[0035] MMIO is the standard way for modern PCI devices to access registers, offering advantages such as low access latency and support for burst transmissions. By directly accessing hardware registers via MMIO, initialization code can efficiently complete network card configuration, laying the foundation for subsequent PXE protocol interaction.
[0036] S400, when the memory-mapped input/output access fails, the initialization code performs read and write operations on the hardware registers through the custom capability structure in the PCI configuration space of the network card device.
[0037] In some cases, MMIO may fail, for example: the network card's BAR0 is a 64-bit address, but the current PXE protocol stack is running in 32-bit mode and cannot directly access the 64-bit address space; or the BAR mapping is unavailable due to hardware compressed mode, causing standard registers to be unavailable; or the pci_ioremap function returns a null value due to insufficient resources. In these situations, the initialization code switches to an alternative mechanism: indirectly accessing hardware registers through vendor-specific capabilities (VSCs) in the PCI configuration space.
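A rough sketch of how the initialization code might decide between the two paths is given below. It covers only two of the conditions listed in [0037] (a BAR above 4 GiB and a failed mapping); the compressed-mode check is vendor-specific and omitted. The BAR bit layout is the standard PCI one (bit 0 selects I/O vs. memory, bits 2:1 equal to 10b mark a 64-bit BAR), while `choose_access_path` and the treatment of an I/O BAR as a fallback case are assumptions made for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { ACCESS_MMIO, ACCESS_VSC } access_path_t;

static bool bar_is_memory(uint32_t bar_lo) { return (bar_lo & 0x1u) == 0; }
static bool bar_is_64bit(uint32_t bar_lo)  { return ((bar_lo >> 1) & 0x3u) == 0x2u; }

/* Combine BAR0 (and BAR1 for a 64-bit BAR) into a physical base address. */
static uint64_t bar_base(uint32_t bar_lo, uint32_t bar_hi)
{
    uint64_t base = bar_lo & 0xFFFFFFF0u;
    if (bar_is_64bit(bar_lo))
        base |= (uint64_t)bar_hi << 32;
    return base;
}

/* `mapped` is whatever the mapping service returned (NULL on failure). */
static access_path_t choose_access_path(uint32_t bar_lo, uint32_t bar_hi,
                                        const void *mapped)
{
    if (!bar_is_memory(bar_lo))
        return ACCESS_VSC; /* assumption: no MMIO window to use */
    if (bar_base(bar_lo, bar_hi) > 0xFFFFFFFFull)
        return ACCESS_VSC; /* unreachable from a 32-bit stack */
    if (mapped == NULL)
        return ACCESS_VSC; /* mapping service failed */
    return ACCESS_MMIO;
}
```

With BAR0 = 0x00000004 and BAR1 = 0x00000020 the base lies above 4 GiB, so the sketch selects the VSC path even if a mapping pointer is supplied.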
[0038] This mechanism provides a hardware access path independent of MMIO through the standard PCI configuration space channel, effectively solving the compatibility issues between 64-bit BARs and 32-bit protocol stacks, as well as access challenges in BAR mapping failures or compressed modes. The step-by-step read/write process, combined with the opcode design, enables flexible transmission of register data of arbitrary length and type, and the correctness of multi-step operations is guaranteed by sequence numbers. In addition, the atomic locking mechanism for configuration space access ensures that an operation is not interrupted, improving reliability. This provides a unified alternative access method for network cards from different manufacturers, significantly enhancing the robustness of PXE initialization and hardware adaptability.
[0039] S500, based on the successful access to the hardware registers, complete the hardware initialization of the network card device and establish the PXE protocol stack structure in memory.
[0040] After successfully reading or writing hardware registers via MMIO or VSC, the initialization code continues to execute the complete hardware initialization process, including:
[0041] Set the MAC address (possibly read from EEPROM or provided by firmware).
[0042] Initialize the receive and transmit descriptor rings and allocate DMA buffers.
[0043] Configure interrupts (although polling may be used during the PXE phase).
[0044] Initiate link negotiation and wait for the physical link to be established.
[0045] Once completed, the initialization code constructs the data structures required for the PXE protocol stack in memory, primarily the PXE structure (!PXE) and the UNDI structure (UNDI). The !PXE structure contains an entry point function pointer, API table address, version information, etc., and is used to provide a standardized PXE interface to the system. Specifically, the code sets the Signature field of the structure to the characters !PXE, sets the structure length, and points the API entry address to the internally implemented PXE API function (such as the PXENV+ entry). The initialization code may also optionally register interrupt handling. Finally, this data structure is placed in a pre-defined memory location (e.g., after 0x7C00) for subsequent BIOS calls.
[0046] Through complete hardware initialization and PXE protocol stack construction, the network card acquires network communication capabilities and exposes standard PXE APIs, enabling upper-layer firmware to call these APIs for network booting. The in-memory structures provide the foundation for subsequent device registration and protocol interaction.
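As a loose illustration of populating the !PXE header described in [0045], the sketch below fills in the signature and length and then sets a checksum byte so that all bytes of the structure sum to zero, which is how the PXE 2.1 specification describes the StructCksum field. The struct here is a simplified stand-in: only the leading header bytes are modelled, the remaining fields are an opaque placeholder, and the names are hypothetical, so the layout should be checked against the specification before any real use.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the start of the !PXE structure. */
typedef struct {
    uint8_t signature[4];  /* the characters '!', 'P', 'X', 'E' */
    uint8_t struct_length; /* length of the structure in bytes */
    uint8_t struct_cksum;  /* makes the byte sum of the structure zero */
    uint8_t struct_rev;
    uint8_t reserved;
    uint8_t payload[24];   /* placeholder for entry points, descriptors, ... */
} pxe_header_sketch_t;

/* Sum of all bytes modulo 256. */
static uint8_t byte_sum(const void *p, size_t n)
{
    const uint8_t *b = p;
    uint8_t sum = 0;
    while (n--)
        sum = (uint8_t)(sum + *b++);
    return sum;
}

/* Fill in signature and length, then fix up the checksum byte. */
static void pxe_header_seal(pxe_header_sketch_t *h)
{
    memcpy(h->signature, "!PXE", 4);
    h->struct_length = (uint8_t)sizeof *h;
    h->struct_cksum = 0;
    h->struct_cksum = (uint8_t)(0u - byte_sum(h, sizeof *h));
}
```

After `pxe_header_seal`, `byte_sum` over the whole structure is zero regardless of the payload contents, which lets a consumer validate the structure before trusting its entry points.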
[0047] S600, register with the system firmware as a PXE boot device and enter the protocol interaction phase with the PXE server.
[0048] The initialization code registers itself as a boot device by calling services provided by the firmware (such as the PnP extension of the PCI Option ROM or the INT 1Ah function of the BIOS).
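[0049] For example, in a traditional BIOS, the Option ROM can add the device to the boot order list using the BCV (Boot Connection Vector) mechanism; for network boot devices, the BIOS Boot Specification also defines the BEV (Bootstrap Entry Vector) for this purpose.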
[0050] Specifically, the code fills in a boot entry, specifying the device type as a network card, and returns control to the BIOS. The BIOS then calls the boot entry point for that device according to the user-defined boot order (e.g., PXE first). When the system boots to the boot device selection screen, the BIOS calls the entry point of the PXE structure (usually via `Int 1Ah PXENV+`) to enter the PXE protocol interaction phase. At this point, the PXE client begins broadcasting a DHCP request using the UDP / IP protocol to obtain an IP address and boot file name, then downloads the network bootloader (e.g., pxelinux.0) via TFTP, and finally loads and executes the program to complete the operating system installation or boot.
[0051] Registering the boot device makes PXE an optional boot option for the system, alongside local hard drives and optical drives. By interacting with the PXE server using standard protocols, the goal of obtaining the operating system image from the network is achieved, supporting large-scale unattended deployments. This step completes a seamless transition from hardware initialization to network protocol stack operation, ensuring the integrity of the entire PXE boot process.
[0052] In this embodiment, by constructing a dual-path mechanism of Memory Mapped Input/Output (MMIO) and PCI configuration space access, the robustness and hardware compatibility of network card initialization are significantly improved. Specifically, the system firmware or Option ROM code may fail to access hardware registers via standard MMIO in a 32-bit PXE protocol stack environment, for example because the network card BAR is located in a 64-bit address space, the BAR operates in compressed mode, or a resource mapping function returns an exception. In that case, this invention allows the initialization code to seamlessly switch to reading and writing hardware registers via a custom capability structure in the PCI configuration space, thereby bypassing the addressing limitations or mapping failures of the MMIO path and ensuring that the hardware initialization process can continue to execute. This mechanism not only solves the PXE boot failure problem caused by the incompatibility between 64-bit BARs and 32-bit protocol stacks in traditional solutions, but also provides a standardized alternative access channel for network cards from different manufacturers, reducing dependence on specific BAR layouts and enhancing the adaptability of the solution to different hardware designs. Meanwhile, based on the successfully accessed hardware registers, the network card is initialized, the PXE protocol stack is established, and it is registered as a boot device, ensuring the smooth progress of the subsequent network boot process and improving the deployment success rate and system stability in large-scale batch installation scenarios.
[0053] Furthermore, the custom capability structure includes an opcode field and an address / data field in the PCI configuration space; the opcode field is used to store predefined operation type identifiers, which include at least an address setting identifier, a length setting identifier, a sequence control identifier, a value identifier, and a reset identifier.
[0054] In this embodiment, the implementation of the custom capability structure in the PCI configuration space follows the vendor-specific capability structure specification defined by PCI SIG, and its structural layout is customized by the network card manufacturer.
[0055] In practice, the capability structure includes a standard capability header and a vendor-defined area. The capability linked list starts at the offset stored in the Capabilities Pointer register at configuration space offset 0x34, and each capability structure begins with the standard header. For this capability, the header contains a Capability ID (fixed at 0x09, indicating a vendor-specific capability), a Next Capability Pointer (pointing to the next capability structure), and a Capability Length (indicating the length of the entire capability structure in bytes).
[0056] The manufacturer-defined area is divided according to design requirements. In this embodiment, a 32-bit opcode register is set at offset 0x4c of the capability structure to store the predefined operation type identifier; a 32-bit address / data register (addr / val) is set at offset 0x50 to store the internal address of the target hardware register or the data to be read or written.
[0057] The operation type identifier must contain at least the following five enumeration values: address setting identifier (VCS_OP_ADDR, value 1), length setting identifier (VCS_OP_LEN, value 2), sequence control identifier (VCS_OP_SEQ, value 3), value identifier (VCS_OP_VALUE, value 4), and reset identifier (VCS_OP_RESET, value 5).
[0058] When the initialization code needs to access hardware registers through the configuration space, it first traverses the capability list of the PCI configuration space to locate the vendor-specific capability structure, and then achieves step-by-step indirect register access by writing different opcodes to offset 0x4c and coordinating with reading and writing to offset 0x50.
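The capability list traversal mentioned in [0058] might look like the sketch below. It operates on a 256-byte snapshot of the configuration space rather than on live hardware, and `find_vendor_capability` is a hypothetical helper name. The Status register bit, the pointer at 0x34 and capability ID 0x09 follow the standard PCI layout.

```c
#include <stdint.h>

#define PCI_STATUS          0x06 /* low byte of the Status register */
#define PCI_STATUS_CAP_LIST 0x10 /* bit 4: capability list present */
#define PCI_CAP_PTR         0x34 /* Capabilities Pointer */
#define PCI_CAP_ID_VNDR     0x09 /* vendor-specific capability */

/* Return the configuration space offset of the first vendor-specific
 * capability, or 0 if none is found. */
static uint8_t find_vendor_capability(const uint8_t cfg[256])
{
    if (!(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST))
        return 0;

    uint8_t pos = cfg[PCI_CAP_PTR] & 0xFC; /* low two bits are reserved */
    /* The guard bounds the walk in case of a malformed, looping list. */
    for (int guard = 0; guard < 48 && pos >= 0x40; guard++) {
        if (cfg[pos] == PCI_CAP_ID_VNDR)
            return pos;
        pos = cfg[pos + 1] & 0xFC; /* follow Next Capability Pointer */
    }
    return 0;
}
```

For a snapshot whose list runs 0x40 (some other capability) and then 0x48 (ID 0x09), the function returns 0x48; the placement at 0x48 is only an example and is not stated in the text.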
[0059] For example, in a read operation, VCS_OP_ADDR (sets the register address), VCS_OP_LEN (sets the read length), and VCS_OP_SEQ (sets the sequence number) are written sequentially. Then, data is read from 0x50, and this process is repeated until all data is obtained. Finally, VCS_OP_RESET is written to clear the status. In a write operation, in addition to setting the address and length as described above, VCS_OP_VALUE needs to be written immediately after VCS_OP_SEQ, along with the data value. This process is repeated until all data has been written.
[0060] The advantages of this architecture design are as follows: a standardized PCI configuration space access interface provides a reliable alternative path for scenarios where memory-mapped input/output fails; the variety of opcodes allows configuration space accesses to convey complex operational semantics (such as address, length, sequence number, data, and reset), thus supporting multi-byte and multi-step register access; fixed offsets (0x4c and 0x50) simplify driver code implementation, eliminating the need for dynamic resolution of the capability structure's internal layout; an atomic locking mechanism ensures that multi-step operations are not interrupted, guaranteeing the correctness of reads and writes; and the reset identifier promptly cleans up the internal state machine, preventing residual state from interfering with subsequent accesses. In summary, this custom capability structure, through concise register mapping and rich opcode definitions, constructs a flexible, reliable, and highly compatible indirect hardware access mechanism.
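The write sequence outlined in [0059] can be sketched against a small software mock as follows. Several things here are assumptions for illustration only: the mock device model (`mock_nic_t`) and its byte-per-step granularity, the use of a mock pointer in place of the `bdf` argument, and the final VCS_OP_RESET after a write, which the text states explicitly only for reads.

```c
#include <stdint.h>

enum { VCS_OP_ADDR = 1, VCS_OP_LEN = 2, VCS_OP_SEQ = 3,
       VCS_OP_VALUE = 4, VCS_OP_RESET = 5 };
#define VSC_OPCODE_OFF 0x4c
#define VSC_DATA_OFF   0x50

/* Mock of the device-side state behind the two registers. */
typedef struct {
    uint32_t opcode, addr, len, seq;
    uint8_t regs[0x200]; /* pretend internal register file */
} mock_nic_t;

/* Stand-in for pci_config_write(bdf, offset, value). */
static void pci_config_write(mock_nic_t *nic, uint8_t off, uint32_t val)
{
    if (off == VSC_OPCODE_OFF) {
        if (val == VCS_OP_RESET)
            nic->opcode = nic->addr = nic->len = nic->seq = 0;
        else
            nic->opcode = val;
        return;
    }
    if (off != VSC_DATA_OFF)
        return;
    switch (nic->opcode) {
    case VCS_OP_ADDR: nic->addr = val; break;
    case VCS_OP_LEN:  nic->len = val;  break;
    case VCS_OP_SEQ:  nic->seq = val;  break;
    case VCS_OP_VALUE: /* store one byte at addr + seq */
        if (nic->seq < nic->len && nic->addr + nic->seq < sizeof nic->regs)
            nic->regs[nic->addr + nic->seq] = (uint8_t)val;
        break;
    default: break;
    }
}

/* Driver side: address, length, then SEQ + VALUE per byte, then reset. */
static void vsc_write_bytes(mock_nic_t *nic, uint32_t reg_addr,
                            const uint8_t *data, uint32_t len)
{
    pci_config_write(nic, VSC_OPCODE_OFF, VCS_OP_ADDR);
    pci_config_write(nic, VSC_DATA_OFF, reg_addr);
    pci_config_write(nic, VSC_OPCODE_OFF, VCS_OP_LEN);
    pci_config_write(nic, VSC_DATA_OFF, len);
    for (uint32_t seq = 0; seq < len; seq++) {
        pci_config_write(nic, VSC_OPCODE_OFF, VCS_OP_SEQ);
        pci_config_write(nic, VSC_DATA_OFF, seq);
        pci_config_write(nic, VSC_OPCODE_OFF, VCS_OP_VALUE);
        pci_config_write(nic, VSC_DATA_OFF, data[seq]);
    }
    pci_config_write(nic, VSC_OPCODE_OFF, VCS_OP_RESET);
}
```

Writing the four bytes 0x78, 0x56, 0x34, 0x12 to internal address 0x100 leaves them in `regs[0x100..0x103]` of the mock and returns its state machine to idle.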
[0061] Furthermore, step S400, which involves reading and writing hardware registers using a custom capability structure, includes the following steps:
[0062] S411, write the address setting identifier to the opcode field and write the internal address of the target hardware register to the address/data field.
[0063] The initialization code first writes the address setting identifier to the opcode field of the custom capability structure through PCI configuration space access functions (such as pci_config_write).
[0064] As defined in the above embodiments, the opcode field is located at offset 0x4c in the PCI configuration space, and the address setting identifier has the value 1 (VCS_OP_ADDR). After writing the opcode, the internal address of the target hardware register is immediately written to the address/data field (offset 0x50). This internal address is a register number defined by the network card manufacturer; for example, the offset of a certain control register is 0x100. On the x86 architecture, the PCI configuration space can be accessed either through port I/O (IN/OUT instructions) or through memory-mapped access. In specific implementations, ports 0xCF8 (address) and 0xCFC (data) are typically used for configuration space read and write operations.
[0065] For example, to access the configuration space offset 0x4c of bus 0, device 0, and function 0, you need to first write the address (0x80000000|(bus<<16)|(dev<<11)|(func<<8)|offset) to 0xCF8, and then read or write data from 0xCFC. A code example is shown below:
[0066] // Set address
[0067] pci_config_write(bdf, 0x4c, 1); // Write VCS_OP_ADDR to the opcode field
[0068] pci_config_write(bdf, 0x50, 0x100); // Write the register's internal address, 0x100
[0069] By writing the address setting identifier and register address, the target register for subsequent operations is explicitly communicated to the firmware, establishing a context for multi-step access. This step translates the abstract hardware register number into an internal address that the firmware can recognize, avoiding the redundancy of repeatedly specifying the address for each access, while ensuring the accuracy of the operation.
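The 0xCF8 address word given in [0065] can be computed as in the sketch below. The helper name `pci_cf8_address` is hypothetical; masking the offset to a dword boundary is an addition that reflects how the legacy mechanism addresses 32-bit registers, and the actual port writes (OUT to 0xCF8, then IN/OUT on 0xCFC) are omitted because they require privileged execution.

```c
#include <stdint.h>

/* Build the value written to port 0xCF8 to select a configuration
 * register: enable bit | bus | device | function | dword offset. */
static uint32_t pci_cf8_address(uint8_t bus, uint8_t dev, uint8_t func,
                                uint8_t offset)
{
    return 0x80000000u
         | ((uint32_t)bus << 16)
         | ((uint32_t)(dev & 0x1F) << 11)
         | ((uint32_t)(func & 0x07) << 8)
         | (offset & 0xFCu);
}
```

For bus 0, device 0, function 0 and offset 0x4c this evaluates to 0x8000004C, matching the expression in the text.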
[0070] S412, write the length setting flag to the opcode field and write the length value of the data to be read to the address / data field.
[0071] After setting the address, the initialization code continues by writing the length setting flag (VCS_OP_LEN, value 2) to offset 0x4c, and then writes the length value of the data to be read to offset 0x50. The length value is in bytes; for example, if a 4-byte register needs to be read, the length value is 4. For scenarios where multiple consecutive registers need to be read (such as a descriptor ring), the length value can be set to a value greater than 4, representing the total number of bytes to be read. Code example is as follows:
[0072] pci_config_write(bdf, 0x4c, 2); // Write VCS_OP_LEN to the opcode field
[0073] pci_config_write(bdf, 0x50, 4); // Read length is 4 bytes
[0074] The length setting flag allows the operation to adapt to data transfer requirements of different sizes, supporting both single register reads and batch reads of multiple register values. The firmware predicts the total amount of data to be returned based on the length value, thus preparing sufficient buffer space and controlling the number of subsequent reads, improving the flexibility of data transfer.
[0075] S413, write the sequence control identifier to the opcode field and write the sequence number of this read operation to the address / data field.
[0076] Next, we enter the preparation phase for the loop read. The initialization code writes the sequence control flag (VCS_OP_SEQ, value 3) to offset 0x4c, and then writes the sequence number for this read to offset 0x50. The sequence number starts from 0 and increments sequentially. A code example is shown below:
[0077] for (seq = 0; seq < length; seq++) {
[0078]     pci_config_write(bdf, 0x4c, 3); // Write VCS_OP_SEQ to the opcode field
[0079]     pci_config_write(bdf, 0x50, seq); // Write the sequence number
[0080]     // ...subsequent read operations
[0081] }
[0082] Each time a sequence number is written, the internal logic of the firmware associates that sequence number with the previously set address and length, preparing to return the data at the corresponding offset.
[0083] The sequence control identifier, in conjunction with the sequence number, enables sequential access, overcoming the limitation that a single access to the PCI configuration space can only transfer a limited amount of data. Through the sequence number, the firmware can distinguish different data units within the same batch of read operations, ensuring a one-to-one correspondence between the read data and the register address, thus preventing data corruption.
[0084] S414, Read the returned data from the address / data field, the data corresponding to the register value of the sequence number.
[0085] After writing the sequence number, data is immediately read from offset 0x50. The data read at this time is the register value returned by the firmware for the current sequence number. For example, when seq=0, the value read is that of the register at base address 0x100; when seq=1, it is the value at base address + 4 (assuming a data width of 4 bytes). Code example is as follows:
[0086] uint32_t value = pci_config_read(bdf, 0x50); // Read the data for the current sequence number
[0087] At the hardware level, writing the sequence number triggers the PIO module inside the firmware to encapsulate the access request into a TLP (Transaction Layer Packet) and send it to the management engine or firmware for processing. After the firmware parses the request, it reads the value of the corresponding address from the hardware register and temporarily stores the data, waiting for the initialization code to read it from offset 0x50.
[0088] By reading the address/data field, the initialization code can retrieve register values one by one without needing to understand the specific implementation details within the firmware. This approach abstracts complex hardware access into simple configuration space read/write operations, reducing the complexity of driver development. Furthermore, binding the read operation to the sequence number ensures the correct order of data.
[0089] S415, repeat steps S413 to S414, incrementing the sequence number each time, until the number of values specified by the length value has been obtained.
[0090] Based on the length value set in step S412 (e.g., 4 bytes), the initialization code iteratively executes steps S413 and S414, incrementing the sequence number from 0 to length-1. Each loop retrieves one data unit until the total number of bytes read reaches the length value. For example, if length=4, the loop executes 4 times, reading 4 bytes of data (which may be combined into a single 32-bit register value or distributed across multiple buffers). A code example is shown below:
[0091] uint8_t buffer[4];
[0092] for (uint32_t seq = 0; seq < 4; seq++) {
[0093] pci_config_write(bdf, 0x4c, 3); // VCS_OP_SEQ
[0094] pci_config_write(bdf, 0x50, seq); // Select the unit to read
[0095] buffer[seq] = pci_config_read(bdf, 0x50) & 0xFF; // Assuming one byte is read each time.
[0096] }
[0097] On the firmware side, each time a serial number is received for writing, the register address corresponding to the current serial number is calculated (base address + serial number × data width), and then the actual hardware read operation is performed. The firmware internally maintains a state machine to record the address, length, and processed serial numbers of the current operation, ensuring that data is returned in order.
[0098] The loop mechanism allows multi-byte read operations to be completed in steps, breaking through the 4-byte limit of a single access to the PCI configuration space. By incrementing the serial number, a simple flow control protocol is established between the firmware and the driver, ensuring data integrity while preventing excessively long accesses from occupying configuration space for too long. This design is particularly suitable for scenarios requiring batch data retrieval, such as reading descriptor rings and statistical information.
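As a non-limiting illustration, the bytes collected by the loop can be combined into a single register value. The sketch below assumes that sequence number 0 carries the least significant byte, which matches the low-byte-first order used in the write example but is not stated for reads; the helper names `assemble_le` and `byte_for_seq` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Combine up to four bytes, received in sequence-number order, into one
 * value. Assumption: sequence number 0 is the least significant byte. */
uint32_t assemble_le(const uint8_t *bytes, size_t len)
{
    uint32_t value = 0;
    for (size_t seq = 0; seq < len && seq < 4; seq++)
        value |= (uint32_t)bytes[seq] << (8 * seq);
    return value;
}

/* Inverse helper: the byte that travels with sequence number `seq`. */
uint8_t byte_for_seq(uint32_t value, unsigned seq)
{
    return (uint8_t)(value >> (8u * (seq & 3u)));
}
```

With a buffer holding 0x78, 0x56, 0x34, 0x12 in that order, `assemble_le(buffer, 4)` yields 0x12345678.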
[0099] S416, write a reset flag to the opcode field to clear the status of this read process.
[0100] After all data has been read, the initialization code writes a reset flag (VCS_OP_RESET, value 5) to offset 0x4c. This operation instructs the firmware to clear the state machine of this read operation, including the previously set address, length, and serial number counters, in order to perform the next independent read / write operation.
[0101] Upon receiving a reset flag, the firmware releases its internal buffer, resets the state machine to an idle state, and prepares for the next access. In some implementations, the reset operation also clears any remaining error flags to ensure that the next operation is not affected by residual state.
[0102] The use of a reset flag is crucial for ensuring the reliability of multi-step operations. It avoids state interference between different operations and prevents residual information from previous operations (such as addresses and lengths) from affecting the correctness of subsequent accesses. Simultaneously, the reset operation allows firmware to reclaim internal resources, preventing memory leaks or state machine deadlocks, thus improving system stability and reentrancy.
[0103] Through the aforementioned step-by-step read operation process, this invention constructs a reliable mechanism for indirectly accessing hardware registers via the PCI configuration space. This mechanism provides a standardized alternative path for network card initialization in scenarios where memory-mapped input / output fails. Specific beneficial effects include: First, by setting the address and length, precise location and variable-length data reading of any hardware register are achieved, overcoming the limitation of traditional configuration space access which can only read and write fixed-width registers; second, the sequence control and cyclic reading mechanism allows single multi-byte read operations to be completed in steps, breaking through the bandwidth bottleneck of 4 bytes per access in the PCI configuration space, making it particularly suitable for scenarios requiring batch acquisition of hardware status (such as descriptor rings and statistical information); third, the strict correspondence between the sequence number and the data ensures the consistency of multi-byte data, avoiding data corruption; finally, the introduction of the reset flag ensures the independence of the state of each operation, preventing mutual interference between operations and improving system stability and reentrancy. Overall, this read operation process, through its ingenious step-by-step design and state management, expands the limited PCI configuration space access capability into a fully functional hardware register indirect access channel, providing a solid guarantee for the successful execution of PXE initialization in complex hardware environments.
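The read process of steps S411 to S416 can be sketched end to end as follows. This is a minimal illustration rather than the actual firmware: the `sim_*` names are hypothetical and stand in for real configuration space accesses and for the device firmware, and the simulation assumes one byte per sequence number.

```c
#include <stdint.h>

/* Offsets and opcode values as given in the description. */
enum { VCS_OPCODE = 0x4c, VCS_DATA = 0x50 };
enum { VCS_OP_ADDR = 1, VCS_OP_LEN = 2, VCS_OP_SEQ = 3, VCS_OP_RESET = 5 };

/* Simulated device (assumption: replaces real firmware for illustration). */
static uint8_t  sim_regs[0x400];   /* fake hardware register file */
static uint32_t sim_op, sim_addr, sim_len, sim_out;

void sim_cfg_write(uint32_t off, uint32_t val)
{
    if (off == VCS_OPCODE) {
        sim_op = val;
        if (val == VCS_OP_RESET)               /* clear all per-operation state */
            sim_addr = sim_len = sim_out = 0;
        return;
    }
    if (off != VCS_DATA)
        return;
    if (sim_op == VCS_OP_ADDR)
        sim_addr = val;
    else if (sim_op == VCS_OP_LEN)
        sim_len = val;
    else if (sim_op == VCS_OP_SEQ)             /* latch one byte per sequence number */
        sim_out = (val < sim_len && sim_addr < sizeof sim_regs
                   && val < sizeof sim_regs - sim_addr)
                ? sim_regs[sim_addr + val] : 0xFFu;
}

uint32_t sim_cfg_read(uint32_t off)
{
    return off == VCS_DATA ? sim_out : sim_op;
}

/* Driver side: steps S411 to S416. */
void vcs_read_block(uint32_t reg_addr, uint8_t *buf, uint32_t len)
{
    sim_cfg_write(VCS_OPCODE, VCS_OP_ADDR);    /* S411: target address */
    sim_cfg_write(VCS_DATA, reg_addr);
    sim_cfg_write(VCS_OPCODE, VCS_OP_LEN);     /* S412: byte count */
    sim_cfg_write(VCS_DATA, len);
    for (uint32_t seq = 0; seq < len; seq++) { /* S415: loop over units */
        sim_cfg_write(VCS_OPCODE, VCS_OP_SEQ); /* S413: select unit */
        sim_cfg_write(VCS_DATA, seq);
        buf[seq] = (uint8_t)(sim_cfg_read(VCS_DATA) & 0xFF); /* S414 */
    }
    sim_cfg_write(VCS_OPCODE, VCS_OP_RESET);   /* S416: clear firmware state */
}
```

In a real driver the two `sim_cfg_*` calls would be replaced by the `pci_config_write` and `pci_config_read` functions used in the examples above.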
[0104] Furthermore, step S400, which involves reading and writing hardware registers using a custom capability structure, includes the following steps:
[0105] S421, writes the address setting flag to the opcode field and writes the internal address of the target hardware register to the address / data field.
[0106] The initialization code writes the address setting flag to the opcode field of the custom capability structure via PCI configuration space access functions. The opcode field is located at PCI configuration space offset 0x4c, and the address setting flag value is 1 (VCS_OP_ADDR). Immediately after writing the opcode, the internal address of the target hardware register is written to the address/data field (offset 0x50). This address is defined by the network card manufacturer; for example, the offset of a control register might be 0x200. On the x86 platform, the PCI configuration space is typically accessed through I/O ports 0xCF8 and 0xCFC: first, the target device's BDF (Bus/Device/Function) and register offset are written to 0xCF8, and then a 32-bit read or write is performed via 0xCFC. A code example is shown below:
[0107] // Set the target register address to 0x200
[0108] pci_config_write(bdf, 0x4c, 1); // VCS_OP_ADDR
[0109] pci_config_write(bdf, 0x50, 0x200); // Internal address of the register.
[0110] At the hardware level, a write to the configuration space causes the PCH (Platform Controller Hub) to send a TLP to the firmware or management engine, which records the register context of the current operation.
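For reference, the value written to port 0xCF8 under the legacy x86 configuration mechanism can be sketched as below. The bit layout is the standard one; the function name `pci_cf8_address` is hypothetical, and how `pci_config_write` builds this value internally is not specified in this description.

```c
#include <stdint.h>

#define PCI_CONFIG_ADDRESS 0xCF8u  /* address port */
#define PCI_CONFIG_DATA    0xCFCu  /* data port */

/* Build the 32-bit value written to 0xCF8:
 * bit 31 enable, bits 23:16 bus, bits 15:11 device, bits 10:8 function,
 * bits 7:2 dword-aligned register offset. */
uint32_t pci_cf8_address(uint8_t bus, uint8_t dev, uint8_t func, uint8_t offset)
{
    return 0x80000000u
         | ((uint32_t)bus << 16)
         | ((uint32_t)(dev & 0x1Fu) << 11)
         | ((uint32_t)(func & 0x07u) << 8)
         | (uint32_t)(offset & 0xFCu);
}
```

For bus 0, device 0, function 0, the opcode field at offset 0x4c corresponds to the address value 0x8000004C.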
[0111] By explicitly specifying the hardware register to be operated on through address setting identifiers, a target context is established for subsequent write operations. This step passes the register number abstracted by the driver layer to the firmware, ensuring that subsequent data can be written to the correct hardware unit and avoiding address ambiguity in multi-step operations.
[0112] S422, write the length setting flag to the opcode field and write the length value of the data to be written to the address / data field.
[0113] After setting the address, the initialization code writes the length setting flag (VCS_OP_LEN, value 2) to offset 0x4c, and then writes the length value (in bytes) of the data to be written to offset 0x50. For example, to write to a 32-bit register, the length value is 4; if multiple consecutive registers need to be written (such as a configuration descriptor ring), the length value can be set to the total number of bytes (e.g., 16 bytes). Code example is as follows:
[0114] pci_config_write(bdf, 0x4c, 2); // VCS_OP_LEN
[0115] pci_config_write(bdf, 0x50, 4); // Write 4 bytes of data.
[0116] After receiving the length setting, the firmware will allocate an internal buffer or prepare a state machine to receive subsequent serial numbers and data.
[0117] The length setting flag enables write operations to adapt to different data volumes, supporting both single register writes and batch data updates. The firmware predicts the total amount of data to be received based on the length value, thereby managing the buffer effectively and controlling the number of subsequent loops, improving the flexibility and efficiency of data writing.
[0118] S423, write the sequence control identifier to the opcode field and write the sequence number of this write operation to the address / data field.
[0119] Next, the loop writing phase begins. The initialization code first writes the sequence control flag (VCS_OP_SEQ, value 3) to offset 0x4c, and then writes the sequence number for this write operation to offset 0x50. The sequence number starts from 0 and increments sequentially, indicating which unit in the data block is being written. For example, when writing 4 bytes of data, it might be written in four separate byte-by-byte operations, with sequence numbers 0, 1, 2, and 3 respectively. A code example is shown below:
[0120] for (uint32_t seq = 0; seq < length; seq++) {
[0121] pci_config_write(bdf, 0x4c, 3); // VCS_OP_SEQ
[0122] pci_config_write(bdf, 0x50, seq); // Write the current sequence number
[0123] // Next, data will be written...
[0124] }
[0125] Each time a serial number is written, the firmware records the current serial number and prepares to receive the corresponding data value.
[0126] The sequence control identifier, in conjunction with the serial number, enables sequential management of multi-byte write operations, ensuring that data is received by the firmware in the correct order. Through the serial number, the firmware can associate subsequently received data values with a specific offset, preventing out-of-order or overwriting errors.
[0127] S424: Write a numeric identifier to the opcode field and write the register configuration value to be written to the address / data field.
[0128] Following the previous step, the initialization code immediately writes a numerical identifier (VCS_OP_VALUE, value 4) to offset 0x4c, and then writes the register configuration value corresponding to the current sequence number to offset 0x50. For example, when writing 4 bytes of data 0x12345678, it can be split into four writes: the first write is the low byte 0x78, the second is 0x56, the third is 0x34, and the fourth is 0x12. A code example is shown below:
[0129] uint32_t data = 0x12345678;
[0130] for (uint32_t seq = 0; seq < 4; seq++) {
[0131] pci_config_write(bdf, 0x4c, 3); // VCS_OP_SEQ
[0132] pci_config_write(bdf, 0x50, seq); // Sequence number
[0133] pci_config_write(bdf, 0x4c, 4); // VCS_OP_VALUE
[0134] pci_config_write(bdf, 0x50, (data >> (seq * 8)) & 0xFF); // Write byte by byte, low byte first
[0135] }
[0136] When the firmware receives VCS_OP_VALUE, it temporarily stores the data corresponding to the current sequence number in an internal buffer. When the length is greater than 1, the firmware will wait for the data of all sequence numbers to arrive before performing the actual hardware write, but depending on the implementation, it can also write while receiving. In a typical implementation, the firmware maintains an array value[seq], and triggers the hardware write operation only when the last sequence number (length-1) is received.
[0137] The combination of numerical identifiers and serial numbers overcomes the limitation that a single configuration space write operation can only transfer a small amount of data (typically 4 bytes), enabling the writing of data of arbitrary length through step-by-step transmission. This approach is firmware-friendly because the firmware can receive and cache data sequentially, ultimately writing it to the hardware in one go or in batches, reducing the number of interactions with the hardware.
[0138] S425, Repeat steps S423 to S424, incrementing the sequence number sequentially until the number of data specified by the length value is written.
[0139] S423 and S424 are executed repeatedly, with the sequence number incremented from 0 to (length-1). Each iteration writes a sequence number and its corresponding data value. After the loop ends, all data to be written has been transferred to the firmware's internal buffer. For example, if length=4, the loop will run 4 times, transferring 4 bytes of data. A code example has been given in the previous step. Firmware behavior: When a VCS_OP_VALUE with sequence number i is received, the data is stored in the buffer at index i; when a VCS_OP_VALUE with sequence number length-1 is received, the firmware detects that all data is ready (because the length was specified in S422), and then calls a hardware abstraction layer function (such as hw_write(reg_addr, buffer, length)) to write the buffer contents to the target hardware register (reg_addr is the internal address set in S421) all at once. If a single register is being written to, only one hardware write may be needed; if multiple consecutive registers are being written to, the firmware will write them sequentially in ascending order of address.
[0140] The loop mechanism ensures that data of any length can be transmitted through multiple accesses to the configuration space, and the incrementing sequence number guarantees data order. The firmware writes all data to the hardware at once after it has been received, which optimizes hardware access performance (e.g., merged write operations) and avoids intermediate states caused by partial writes. This mechanism is transparent to the driver; the driver only needs to send data in order and does not need to concern itself with how the firmware internally merges data.
[0141] S426, Write a reset flag to the opcode field to clear the status of this write process.
[0142] After all data has been written, the initialization code writes a reset flag (VCS_OP_RESET, value 5) to offset 0x4c. This operation notifies the firmware to clear the state machine of this write operation, including the previously set address, length, sequence number counter, and internal data buffer, in order to perform the next independent read / write operation.
[0143] Upon receiving a reset flag, the firmware releases its internal buffer, resets the state machine to idle, and prepares to process the next operation request. Reset can also be used for error recovery if an error occurs during the write process (such as a data verification failure).
[0144] The reset flag is crucial for ensuring the integrity and isolation of multi-step operations. It prevents residual state from previous write operations (such as partially cached data or address pointers) from affecting subsequent operations, thus avoiding data corruption. Simultaneously, reset allows firmware to reclaim resources, improving system stability and reentrancy. Even in the event of an operation failure, the driver can forcefully clear the state through reset, restoring the device to a known state.
[0145] Through the step-by-step write operation process described above, this embodiment constructs a reliable mechanism for indirectly writing to hardware registers via the PCI configuration space, forming a complete and symmetrical alternative access path with read operations. This mechanism provides a standardized data write channel for network card initialization when memory-mapped input / output fails. Specific beneficial effects include: First, the address setting and length setting steps accurately locate the target register and declare the data volume, laying the foundation for subsequent batch transmission; second, the alternating use of sequence control and numerical identifiers allows data of any length to be transmitted through multiple configuration space writes, breaking the 4-byte limit of a single PCI configuration space access, making it particularly suitable for scenarios requiring the configuration of large blocks of data (such as MAC address tables and descriptor rings); third, the strict incrementing of the sequence number during the cyclic write process ensures the correctness of the data order, and the firmware, through caching and a final one-time write, guarantees data consistency and optimizes hardware access efficiency; finally, the introduction of a reset flag isolates the state of each operation, preventing residual interference and improving the robustness of the system. Overall, this write operation process, through its ingenious step-by-step design and state management, expands the limited PCI configuration space access capability into a fully functional hardware register indirect write channel. Together with the read operation, it constitutes a complete backup hardware access scheme, significantly improving the success rate and reliability of PXE initialization in complex hardware environments.
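The write process of steps S421 to S426, together with the buffering behavior described for the firmware, can be sketched as follows. The `sim_*` names, the 16-byte staging limit, and the commit counter are assumptions made for illustration; the simulated firmware commits to its fake register file only when the value for sequence number length-1 arrives.

```c
#include <stdint.h>
#include <string.h>

enum { VCS_OPCODE = 0x4c, VCS_DATA = 0x50 };
enum { VCS_OP_ADDR = 1, VCS_OP_LEN = 2, VCS_OP_SEQ = 3,
       VCS_OP_VALUE = 4, VCS_OP_RESET = 5 };

/* Simulated firmware (assumption: stands in for the real device). */
#define SIM_MAX_LEN 16u
static uint8_t  sim_regs[0x400];       /* fake hardware register file */
static uint8_t  sim_buf[SIM_MAX_LEN];  /* staging buffer, value[seq] */
static uint32_t sim_op, sim_addr, sim_len, sim_seq;
static unsigned sim_commits;           /* number of hardware writes */

void sim_cfg_write(uint32_t off, uint32_t val)
{
    if (off == VCS_OPCODE) {
        sim_op = val;
        if (val == VCS_OP_RESET) {     /* S426: drop all per-operation state */
            sim_addr = sim_len = sim_seq = 0;
            memset(sim_buf, 0, sizeof sim_buf);
        }
        return;
    }
    if (off != VCS_DATA)
        return;
    switch (sim_op) {
    case VCS_OP_ADDR: sim_addr = val; break;
    case VCS_OP_LEN:  sim_len = val <= SIM_MAX_LEN ? val : 0; break;
    case VCS_OP_SEQ:  sim_seq = val; break;
    case VCS_OP_VALUE:
        if (sim_seq >= sim_len)
            break;                     /* out-of-range unit: ignore */
        sim_buf[sim_seq] = (uint8_t)val;
        if (sim_seq == sim_len - 1 && sim_addr <= sizeof sim_regs - sim_len) {
            memcpy(&sim_regs[sim_addr], sim_buf, sim_len); /* single commit */
            sim_commits++;
        }
        break;
    default:
        break;
    }
}

/* Driver side: steps S421 to S426. */
void vcs_write_block(uint32_t reg_addr, const uint8_t *data, uint32_t len)
{
    sim_cfg_write(VCS_OPCODE, VCS_OP_ADDR);      /* S421 */
    sim_cfg_write(VCS_DATA, reg_addr);
    sim_cfg_write(VCS_OPCODE, VCS_OP_LEN);       /* S422 */
    sim_cfg_write(VCS_DATA, len);
    for (uint32_t seq = 0; seq < len; seq++) {   /* S425 */
        sim_cfg_write(VCS_OPCODE, VCS_OP_SEQ);   /* S423 */
        sim_cfg_write(VCS_DATA, seq);
        sim_cfg_write(VCS_OPCODE, VCS_OP_VALUE); /* S424 */
        sim_cfg_write(VCS_DATA, data[seq]);
    }
    sim_cfg_write(VCS_OPCODE, VCS_OP_RESET);     /* S426 */
}
```

Writing the four bytes 0x78, 0x56, 0x34, 0x12 to address 0x200 in this simulation results in exactly one commit, which illustrates the "cache first, write once" behavior without claiming that every firmware implements it this way.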
[0146] Furthermore, when performing read and write operations through a custom capability structure, the supported data widths include 1 byte, 2 bytes, and 4 bytes; and multi-step operations are guaranteed to be atomic through a locking mechanism, preventing interruption by other accesses.
[0147] PCI configuration space access supports multiple data widths, namely 1 byte (Byte), 2 bytes (Word), and 4 bytes (Dword), as defined by the PCI bus specification. When accessing registers indirectly through the custom capability structure, the initialization code (such as the iPXE driver) needs to select an appropriate width for each read and write so that the access aligns with the natural boundaries of the hardware registers and matches how the firmware parses the operation length.
[0148] The driver encapsulates low-level PCI configuration space read / write functions, supporting width parameters. For example, on x86 platforms, when accessing via I / O ports 0xCF8 and 0xCFC, different widths can be achieved using different assembly instructions. Alternatively, a unified `pci_config_read` function can be used, with the `size` parameter determining whether to call `inb`, `inw`, or `inl`.
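A width-aware access helper might select the data port and extract the field as sketched below. Port I/O instructions cannot run in an ordinary hosted program, so this sketch only computes which port a 1-, 2-, or 4-byte access would use (0xCFC plus the low two offset bits, as in common x86 implementations) and how the same field is cut out of a full dword; the names `pci_data_port` and `pci_extract` are hypothetical.

```c
#include <stdint.h>

/* Data port for a 1-, 2- or 4-byte access at config offset `off`.
 * Returns 0 if the size is unsupported or the access is misaligned. */
uint16_t pci_data_port(uint8_t off, unsigned size)
{
    if (size != 1 && size != 2 && size != 4)
        return 0;
    if (off % size != 0)
        return 0;
    return (uint16_t)(0xCFCu + (off & 3u));
}

/* Extract the same field from a full dword, for code paths that can
 * only issue 32-bit reads. Expects a size and offset that
 * pci_data_port() would accept. */
uint32_t pci_extract(uint32_t dword, uint8_t off, unsigned size)
{
    unsigned shift = 8u * (off & 3u);
    uint32_t mask = size == 4 ? 0xFFFFFFFFu : (1u << (8u * size)) - 1u;
    return (dword >> shift) & mask;
}
```

A real `pci_config_read` with a `size` parameter would then issue `inb`, `inw`, or `inl` on the port returned by the first helper.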
[0149] In S411~S416 (read process) and S421~S426 (write process), each access to offsets 0x4c and 0x50 may involve different widths. For example:
[0150] When writing to the opcode field (offset 0x4c), 4 bytes are typically used because the opcode is defined as a 32-bit value.
[0151] When writing to the address/data field (offset 0x50), the width depends on the actual data width: if the register address to be written is 16 bits, 2 bytes can be used; if the data value is 8 bits, 1 byte can be used. To simplify the implementation, a 4-byte access can be used instead, in which case the unused high-order bytes carry no meaning.
[0152] When reading data, the driver determines the width of each read based on the previously set length value. For example, if the length value indicates that 4 bytes of data need to be read, and the hardware register is 32 bits wide, it may loop 4 times to read 1 byte each time, or loop 2 times to read 2 bytes each time, or directly read 4 bytes at once (but note that the serial number control mechanism may require reading byte by byte). The specific implementation depends on the firmware, but the driver must adapt to the granularity expected by the firmware.
[0153] Firmware Processing: When the firmware receives a configuration space access request, it parses the requested width from the TLP (via byte enable signals or request size). The firmware's internal state machine interprets the address / data fields with the appropriate width based on the opcode and the current stage. For example, when the opcode is VCS_OP_VALUE, if the write access is 1 byte, the firmware only updates the lower 8 bits of the corresponding sequence number in the internal buffer; if it's 4 bytes, it may overwrite the buffer for 4 sequence numbers (depending on the design). To ensure compatibility, it's generally agreed that all configuration space accesses are in 4-byte increments, with the firmware ignoring extra bytes or processing them as needed.
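The width handling described above can be illustrated with a byte-enable merge. In a PCIe configuration request, a 4-bit first-DW byte-enable mask marks which byte lanes of the dword are valid; the sketch below applies such a mask to a 32-bit register image. The function names are hypothetical, and the way a particular firmware obtains the mask is left open.

```c
#include <stdint.h>

/* Merge a configuration write into a 32-bit register image.
 * `be` is the 4-bit byte-enable mask: bit n set means byte n of `wdata`
 * is valid and replaces byte n of `old`. */
uint32_t apply_byte_enables(uint32_t old, uint32_t wdata, unsigned be)
{
    uint32_t mask = 0;
    for (unsigned n = 0; n < 4; n++)
        if (be & (1u << n))
            mask |= 0xFFu << (8 * n);
    return (old & ~mask) | (wdata & mask);
}

/* Byte-enable mask for an access of `size` bytes (1, 2 or 4) starting
 * at byte lane `lane` (0..3) within the dword. */
unsigned byte_enables_for(unsigned lane, unsigned size)
{
    return (((1u << size) - 1u) << lane) & 0xFu;
}
```

A 1-byte write therefore updates only the lowest lane of the target dword, while a 4-byte write replaces it entirely.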
[0154] Principle of the atomic lock mechanism: Multi-step read/write operations (such as S411~S416 or S421~S426) consist of multiple configuration space accesses. If such a sequence is interrupted by other code (such as interrupt handlers or other CPU cores), the firmware state machine may be corrupted or the data may become inconsistent. Therefore, a lock mechanism is needed to ensure the atomicity of the entire operation sequence. In iPXE or BIOS environments, implementation methods include disabling interrupts, using spinlocks, or using semaphores.
[0155] Disable interrupts (applicable to single-processor or real mode): Before starting a multi-step operation, execute the CLI instruction to disable maskable interrupts; after the operation is completed, execute STI to resume. This prevents the current execution flow from being interrupted by interrupt service routines, ensuring continuous execution of steps.
[0156] Spinlocks (for multi-core protected mode): In multi-processor environments, spinlocks are used to protect critical sections. The driver acquires the spinlock before starting a multi-step operation and releases it after the operation is complete. Other CPUs attempting to perform the same operation will spin-wait.
[0157] Firmware-side state machine protection: The firmware itself must also ensure safe concurrent access to the same device. Typically, the firmware handles configuration space accesses serially (through PCIe primitives), but if multiple agents exist (such as simultaneous access by the BIOS and the BMC), the firmware must implement mutual exclusion. However, this solution primarily focuses on driver-side guarantees.
[0158] Implementation in iPXE: iPXE runs in x86 real mode or protected mode, and atomicity is typically achieved by disabling interrupts. Before entering the read/write process, `__asm__ __volatile__("cli" : : : "memory")` is executed, and `sti` is executed after the operation completes. The "memory" clobber acts as a compiler barrier, ensuring that the compiler does not reorder memory accesses across these instructions.
[0159] In the read operation process, S411~S416 are a series of consecutive configuration space accesses. If an interrupt occurs after the serial number is written in S413 and before the data is read in S414, the interrupt handler may also attempt to access the hardware through the same VSC structure, causing the firmware state to be overwritten, resulting in S414 reading incorrect data. Therefore, lock protection is required.
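The spinlock variant of this protection can be sketched with C11 atomics. This is an illustrative stand-in, not iPXE code: the `cli`/`sti` approach requires privileged instructions, so the sketch uses an `atomic_flag` lock instead, and all function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* One lock guards the whole opcode/data register pair. */
static atomic_flag vcs_lock = ATOMIC_FLAG_INIT;

bool vcs_trylock(void)
{
    /* test_and_set returns the previous value: false means we now hold it. */
    return !atomic_flag_test_and_set_explicit(&vcs_lock, memory_order_acquire);
}

void vcs_lock_acquire(void)
{
    while (!vcs_trylock())
        ;   /* spin until the current holder releases the lock */
}

void vcs_lock_release(void)
{
    atomic_flag_clear_explicit(&vcs_lock, memory_order_release);
}

/* Run a complete multi-step sequence (e.g. S411 to S416) under the lock. */
int vcs_locked(int (*sequence)(void *), void *ctx)
{
    vcs_lock_acquire();
    int rc = sequence(ctx);
    vcs_lock_release();
    return rc;
}

/* Example sequence: returns 1 if the lock is held while it runs. */
int vcs_demo_sequence(void *ctx)
{
    (void)ctx;
    if (vcs_trylock()) {    /* succeeds only if nobody holds the lock */
        vcs_lock_release();
        return 0;
    }
    return 1;
}
```

A spinlock of this kind protects against other CPU cores; on a single core, an interrupt handler that spins on a lock held by the interrupted code would never make progress, which is why disabling interrupts is the usual choice in that case.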
[0160] By explicitly supporting 1 / 2 / 4-byte data widths, this invention enables drivers to flexibly select access granularity based on the actual bit width of hardware registers, avoiding unnecessary byte alignment processing while maintaining compatibility with register definitions from various vendors. The introduction of a locking mechanism ensures the atomicity of multi-step read / write operations: during the execution of the configuration space access sequence, interrupts or mutual exclusion access are prohibited to prevent other code paths from interfering with the firmware's internal state machine, thereby ensuring that the strict correspondence between address, length, sequence number, and data is not disrupted. This atomicity guarantee is the cornerstone of the reliability of indirect register access, avoiding state corruption, data errors, or deadlocks caused by concurrent access, enabling the mechanism of simulating complex hardware access through the PCI configuration space to operate stably in multi-tasking or interrupt-driven environments. Ultimately, these two mechanisms together improve the success rate of PXE initialization in complex hardware environments, ensuring system stability during large-scale deployments.
[0161] Furthermore, in step S400, when performing read / write operations through a custom capability structure, the access request issued by the initialization code is transmitted to the system firmware in a predefined data format; the system firmware parses the data format, extracts at least one of the following information: opcode, register address, read / write type, serial number, and data value, and calls the underlying hardware access interface to perform actual operations on the hardware registers based on the extracted information.
[0162] This embodiment can be achieved through the following steps:
[0163] S431: The firmware receives the initialization code's access request to the PCI configuration space and parses the access offset and read / write type.
[0164] When initialization code (such as an iPXE driver) initiates read/write operations to the PCI configuration space via PIO (Programmed I/O), the CPU converts the access into a PCI configuration cycle and sends it to the target device's configuration space via the host bridge (such as the PCH). The device-side firmware (such as UEFI runtime services, BIOS interrupt handlers, or the BMC management engine) listens for these configuration space accesses. The firmware first captures the access request and determines the target bus/device/function, configuration space offset, and read/write type (configuration read or configuration write) based on information in the TLP (Transaction Layer Packet). For example, in x86 systems, access to the configuration space is ultimately converted into a PCI configuration transaction by the host bridge; the firmware can observe this transaction by hooking the CF8/CFC ports or by intercepting it through the ACPI hardware abstraction layer. The firmware records the offset (such as 0x4c or 0x50) and operation type (read/write) of this access.
[0165] The firmware accurately captures every configuration space access, providing the raw data source for subsequent parsing. By parsing the offset and read / write type, the firmware can distinguish whether the current access is to the opcode field or the address / data field, thus correctly interpreting the driver intent. This is the foundation of the entire indirect access mechanism.
[0166] S432, the firmware identifies the opcode field and address / data field based on the access offset and extracts the opcode value.
[0167] The firmware determines whether the current access is to the opcode field (offset 0x4c) or the address/data field (offset 0x50) based on the offset. If the offset is 0x4c, the data being written or read is extracted as the opcode value; if the offset is 0x50, the data is extracted as the address/data information. For write operations, the firmware directly obtains the written value; for read operations, the firmware needs to return the previously stored value (such as the data corresponding to the sequence number). The firmware internally maintains a state machine to record the current stage of the operation (such as waiting for the address, waiting for the length, or waiting for the sequence number). For example, when a write operation at offset 0x4c is detected and the data is 1 (VCS_OP_ADDR), the firmware switches the state to the "receive address" stage and saves the opcode. When a subsequent write operation at offset 0x50 arrives, the firmware knows that a register address is being written.
[0168] By using offsets to distinguish between the two key registers, the firmware can identify whether the driver is currently setting an opcode or transferring data, and can thus drive the state machine correctly. This design simplifies the firmware logic, avoids complex packet parsing, and enables complete protocol interaction using only two fixed offsets.
[0169] S433, the system firmware determines the meaning of the current step based on the opcode value, and extracts at least one of the following information: register address, data length, serial number, and data value, in conjunction with the context saved in the previous step.
[0170] The system firmware parses the information carried in this access based on the current opcode value and the previously saved context (such as the received address, length, serial number, etc.). The specific rules are as follows:
[0171] If the opcode is VCS_OP_ADDR(1), then the current stage is the address setting stage. The firmware will interpret the value of the next write (or read) to 0x50 as the internal address (reg_addr) of the target hardware register and save it to the state machine.
[0172] If the opcode is VCS_OP_LEN(2), the next value written to 0x50 will be interpreted as the length of the operation data, and the firmware will initialize the internal buffer and prepare to receive data.
[0173] If the opcode is VCS_OP_SEQ (3), the next value written to 0x50 will be interpreted as a sequence number (seq). The firmware records the current sequence number and prepares to receive the corresponding data.
[0174] If the opcode is VCS_OP_VALUE (4), the next value written to 0x50 will be interpreted as a data value (value), and the firmware will store the data into the corresponding position in the buffer according to the current sequence number (e.g., buffer[seq] = value).
[0175] If the opcode is VCS_OP_RESET (5), the firmware clears all internal states (address, length, buffer, sequence counter) and returns to the idle state.
[0176] For read operations, upon receiving VCS_OP_SEQ, the firmware needs to retrieve the corresponding value from the previously saved register data based on the current serial number, and return the value on the next read request for 0x50.
[0177] In addition, the firmware also needs to record the overall read/write type of the operation. The driver implies this type through the sequence it executes: a read process requires reading data from the hardware and returning it, while a write process requires writing data to the hardware. The firmware can infer the type from whether the driver is executing a read sequence or a write sequence, or determine it from an implicit flag (such as whether a VCS_OP_VALUE opcode appears).
[0178] By using a state machine-driven parsing approach, the firmware can accurately extract all information elements transmitted by the driver, including register addresses, data lengths, serial numbers, and data values. This design enables close collaboration between the driver and firmware, allowing complex multi-step operations to be decomposed into simple configuration space read / write sequences, reducing the complexity of firmware implementation while ensuring the integrity and accuracy of information transmission.
[0179] S434: Once all the necessary information is ready, the system firmware calls the underlying hardware access interface to perform actual read and write operations on the target hardware register.
[0180] The firmware internally maintains a "data ready" condition. For write operations, the condition is: the address and length have been received, and the data values corresponding to all sequence numbers have been received (i.e., sequence numbers from 0 to length-1 have all arrived). At this point, the firmware calls a hardware abstraction layer function, such as hw_write(reg_addr, buffer, length), which, based on hardware characteristics, writes the buffer data to the hardware register via MMIO or indirect access. For read operations, the condition is: the address and length have been received, and a read request for a specific sequence number has been received. In this case, data at the corresponding address needs to be read from the hardware. However, read operations are typically performed on demand; that is, after each VCS_OP_SEQ is received, hw_read(reg_addr+seq*width,width) is immediately called to read a single register and the result is buffered for subsequent reads. If the hardware supports batch reads, all data can be read at once and then buffered.
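The "data ready" condition for write operations can be sketched with a bitmap of received sequence numbers. This is one possible realization under stated assumptions: the structure `fw_write_ctx`, its functions, and the 32-unit limit are hypothetical, and unlike the "last sequence number" trigger described earlier, the bitmap also tolerates values that arrive out of order.

```c
#include <stdbool.h>
#include <stdint.h>

#define FW_MAX_LEN 32u   /* assumption: at most 32 units per operation */

struct fw_write_ctx {
    bool     have_addr, have_len;
    uint32_t addr, len;
    uint32_t seen;                 /* bit i set once value[i] has arrived */
    uint8_t  value[FW_MAX_LEN];
};

void fw_set_addr(struct fw_write_ctx *c, uint32_t addr)
{
    c->addr = addr;
    c->have_addr = true;
}

/* Record the length; rejects 0 and anything above the staging limit. */
bool fw_set_len(struct fw_write_ctx *c, uint32_t len)
{
    if (len == 0 || len > FW_MAX_LEN)
        return false;
    c->len = len;
    c->have_len = true;
    c->seen = 0;
    return true;
}

/* Store one unit; returns false for an out-of-range sequence number. */
bool fw_store(struct fw_write_ctx *c, uint32_t seq, uint8_t v)
{
    if (!c->have_len || seq >= c->len)
        return false;
    c->value[seq] = v;
    c->seen |= UINT32_C(1) << seq;
    return true;
}

/* "Data ready": address, length and every unit 0..len-1 are present. */
bool fw_write_ready(const struct fw_write_ctx *c)
{
    if (!c->have_addr || !c->have_len)
        return false;
    uint32_t all = c->len == 32 ? UINT32_MAX : (UINT32_C(1) << c->len) - 1;
    return c->seen == all;
}
```

Once `fw_write_ready` returns true, the firmware would call its hardware abstraction layer function, such as the `hw_write(reg_addr, buffer, length)` mentioned above.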
[0181] When firmware calls low-level interfaces, it must consider the access width (1 / 2 / 4 bytes) and endianness of hardware registers to ensure data correctness. For example, for write operations, it might use a loop to write word by word; for read operations, it would return the read value.
[0182] Separating actual hardware access from protocol parsing allows firmware to reuse existing hardware driver libraries while maintaining consistency with upper-layer protocols. Firmware operates on hardware only after information is ready, avoiding intermediate states caused by partial updates and ensuring the integrity of hardware configuration. For write operations, batch writing reduces the number of hardware accesses and improves efficiency; for read operations, on-demand reading avoids pre-reading unnecessary data.
[0183] S435: The system firmware returns the read data or operation status to the initialization code, and clears the internal status according to the reset flag after the operation is completed.
[0184] For read operations, upon receiving a read request for offset 0x50, the firmware returns to the driver the data that it previously read from the hardware and cached. For write operations, the firmware can return a success status after completing the hardware write. Configuration space writes do not themselves return a value, so the firmware can provide feedback through a status register that is read afterwards or through an interrupt; simple write operations in this scheme usually do not require explicit acknowledgment. When the driver finally writes VCS_OP_RESET, the firmware clears all internal state machine variables (address, length, buffer, sequence counter), releases memory, and prepares for the next operation. If an error occurs during hardware access (such as a timeout or an invalid address), the firmware can record the error internally and return an error code through subsequent read operations, or trigger a reset to clear the state.
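The return and reset behavior can be sketched in C as below. The context structure vsc_ctx, the error codes, and the cache size are hypothetical and exist only for this illustration; the source does not specify how errors are encoded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VSC_CACHE_WORDS  8u   /* hypothetical cache capacity */
#define VSC_ERR_NONE     0u   /* hypothetical status codes */
#define VSC_ERR_TIMEOUT  1u
#define VSC_ERR_BAD_ADDR 2u

/* Hypothetical firmware-side context for one indirect access. */
struct vsc_ctx {
    uint32_t reg_addr;
    uint32_t length;
    uint32_t seq;                     /* sequence number most recently requested */
    uint32_t cache[VSC_CACHE_WORDS];  /* values already read from hardware */
    uint32_t last_error;              /* recorded during hardware access */
};

/* Serve a read of the data field: cached word on success, status otherwise. */
static uint32_t vsc_read_data(const struct vsc_ctx *c, uint32_t *value)
{
    assert(c != NULL && value != NULL);
    if (c->last_error != VSC_ERR_NONE)
        return c->last_error;
    if (c->seq >= c->length || c->seq >= VSC_CACHE_WORDS)
        return VSC_ERR_BAD_ADDR;
    *value = c->cache[c->seq];
    return VSC_ERR_NONE;
}

/* Reset handling: drop address, length, buffer, counter, and error state. */
static void vsc_reset(struct vsc_ctx *c)
{
    assert(c != NULL);
    memset(c, 0, sizeof *c);
}
```

Clearing the whole context in one place is what makes the next operation start from a known state, regardless of how the previous one ended.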
[0185] The data return mechanism ensures that the driver can correctly obtain the read register values and complete initialization. The reset operation promptly clears the state, preventing residual data from affecting subsequent operations and enhancing system stability and reentrancy. Error handling capabilities enable the driver to detect hardware faults and take recovery measures.
[0186] Through the aforementioned firmware processing flow, this invention constructs a complete indirect register access protocol based on the custom capability structure in the PCI configuration space. The firmware acts as a bridge between the driver and the hardware: it parses the configuration space access sequence issued by the driver, extracts the opcode, address, length, sequence number, and data value, calls the underlying hardware interface at the appropriate time to perform the actual read/write operations, and finally returns the result and clears the state. The beneficial effects of this mechanism are as follows. First, the division of labor between firmware and driver is clear; the driver only needs to access two fixed offsets through the standard configuration space to complete complex operations, without needing to understand hardware details, which reduces the difficulty of driver development. Second, the firmware's internal state machine preserves the ordering and integrity of multi-step operations, preventing hardware anomalies caused by partial writes or out-of-order sequences. Third, through caching and batch processing, the firmware can optimize hardware access efficiency, for example by merging multiple write operations or pre-reading batch data. Finally, the reset flag and error handling mechanism keep the system robust and allow recovery even in abnormal situations. In summary, this firmware processing flow is the core support for achieving 64-bit addressing compatibility and hardware adaptation flexibility in this invention, providing a reliable guarantee for the successful execution of PXE initialization in various complex environments.
[0187] Furthermore, the failure of the memory-mapped input/output access method includes:
[0188] S310: When memory mapping is performed on the base address register of the network card device and the pci_ioremap function returns a null value, it is determined that the memory-mapped input/output access method has failed.
[0189] During the execution of the Option ROM initialization code, the physical address and size of the network card device's base address register (BAR0) must first be obtained through the PCI configuration space. The specific steps are as follows:
[0190] Read the BAR0 register at PCI configuration space offset 0x10 to obtain the physical base address and attributes (such as whether it is I/O space, whether it is 64-bit, and whether it is prefetchable).
[0191] The length of the memory region mapped by BAR0 can be determined by writing all 1s to BAR0 and reading the value back: for example, save the original value, write 0xFFFFFFFF to BAR0, read it back, mask off the low attribute bits, invert the result, and add 1 to obtain the size, then restore the original value.
[0192] After obtaining the physical address and length, the firmware-provided memory mapping function (such as pci_ioremap) is called to map the physical address region to the system memory's virtual address space (in real mode, the physical address may be used directly, but in protected mode, a page table mapping needs to be established). pci_ioremap returns the mapped virtual address; if it returns NULL, it indicates that the mapping failed.
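The size calculation in the steps above can be sketched in C as follows. The function name bar_mem_size is hypothetical, and the sketch covers only a 32-bit memory BAR; it takes the value read back after all 1s were written, so the configuration space accesses themselves are outside its scope.

```c
#include <stdint.h>

/* Size in bytes of a 32-bit memory BAR, computed from the value read back
 * after 0xFFFFFFFF was written to it. Returns 0 if the BAR is unimplemented. */
static uint32_t bar_mem_size(uint32_t readback)
{
    /* Bits [3:0] of a memory BAR hold the space indicator, type, and
     * prefetchable flag; they are not part of the address mask. */
    uint32_t mask = readback & 0xFFFFFFF0u;
    if (mask == 0)
        return 0;
    return ~mask + 1u;  /* invert and add 1 */
}
```

For a read-back value of 0xFFF00000, the function yields 0x00100000, which is a 1MB region.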
[0193] By checking the return value of pci_ioremap, MMIO mapping failures can be detected immediately, preventing system crashes caused by the subsequent use of invalid addresses. This explicit error detection mechanism provides clear triggering conditions for switching to alternative access paths, enhancing code robustness.
[0194] S320: When it is detected that the base address register of the network card device is working in compressed mode and cannot be accessed through standard register remapping, it is determined that the memory-mapped input/output access method has failed.
[0195] Some network interface card (NIC) hardware designs support a BAR compressed mode, in which the address space mapped by BAR0 is divided into multiple regions and the standard register remapping mechanism cannot directly access all of them. During probing, the initialization code determines whether the device is in compressed mode by checking the size of BAR0 or by reading specific registers. Code snippets such as the check `mmio_len == PF_BAR0_SIZE_256M` (256MB) suggest one way to identify compressed mode. Specific implementations might include:
[0196] Read the length of BAR0. If the length matches a preset compressed mode length value (such as 256MB), the device is considered to be in compressed mode.
[0197] Read vendor-specific capability registers, for example by querying whether compressed mode is enabled through an extended capability structure in the PCI configuration space.
[0198] Write to a standard register and read it back. If the value read back is not the expected one, a special access method may be required.
[0199] Once the device is determined to be in compressed mode, the initialization code considers the standard MMIO method unavailable (because the register layout is non-standard and cannot be reached by a simple base address plus offset), so it marks MMIO as failed and prepares to use the VSC method.
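A minimal C sketch of this decision is shown below. The structure bar_probe and the function bar_is_compressed are hypothetical; treating any one of the three indicators as sufficient is an assumption made for the example, since the text lists them as alternative implementations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PF_BAR0_SIZE_256M (256u * 1024u * 1024u)

/* Hypothetical summary of what the probe step observed. */
struct bar_probe {
    uint32_t mmio_len;          /* BAR0 length obtained during sizing */
    bool     vendor_cap_flag;   /* vendor capability reports compressed mode */
    bool     readback_mismatch; /* write/read-back test on a standard register failed */
};

/* Assumption: any single indicator is enough to treat the BAR as compressed. */
static bool bar_is_compressed(const struct bar_probe *p)
{
    assert(p != NULL);
    return p->mmio_len == PF_BAR0_SIZE_256M
        || p->vendor_cap_flag
        || p->readback_mismatch;
}
```

When bar_is_compressed returns true, the initialization code would mark MMIO as failed and continue with the VSC method.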
[0200] By identifying compressed mode, erroneous accesses caused by non-standard register layouts are avoided. Early detection and switching to the VSC method prevent subsequent driver code from accessing incorrect addresses, thus improving the solution's adaptability to different hardware designs.
[0201] S330: When it is detected that the base address register of the network card device is a 64-bit BAR while the currently executing PXE protocol stack is 32-bit, it is determined that the 64-bit address space cannot be directly accessed and that the memory-mapped input/output access method has failed.
[0202] The PCIe specification allows BARs to be 64-bit, in which case BAR0 and BAR1 are combined into a 64-bit address. Methods for detecting 64-bit BARs:
[0203] Read the lowest bits of BAR0: If bit0 is 0, it represents memory space; bits 2 and 1 indicate the type. If the type is 0x2 (i.e., bit2=1, bit1=0), it represents a 64-bit address. In this case, BAR0 contains the lower 32 bits, and BAR1 contains the higher 32 bits.
[0204] The initialization code needs to read BAR1 to obtain the complete 64-bit physical address.
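The bit tests described above can be sketched in C as follows. The function names bar_is_mem64 and bar64_base are hypothetical; the inputs are the raw BAR0 and BAR1 values already read from the configuration space.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if BAR0 describes a 64-bit memory BAR:
 * bit 0 == 0 (memory space) and bits [2:1] == 0x2 (64-bit type). */
static bool bar_is_mem64(uint32_t bar0)
{
    return (bar0 & 0x1u) == 0 && ((bar0 >> 1) & 0x3u) == 0x2u;
}

/* Combine BAR0 (lower 32 bits, attribute bits masked off) with
 * BAR1 (upper 32 bits) into the full 64-bit physical base address. */
static uint64_t bar64_base(uint32_t bar0, uint32_t bar1)
{
    return ((uint64_t)bar1 << 32) | (uint64_t)(bar0 & 0xFFFFFFF0u);
}
```

A BAR0 value of 0xE000000C with BAR1 equal to 0x1 is a prefetchable 64-bit memory BAR whose base is 0x1E0000000, which lies above 4GB.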
[0205] In a traditional 32-bit PXE protocol stack environment (such as real mode or 32-bit protected mode), the CPU cannot directly access physical addresses above 4GB, because the virtual address space is limited or the page tables do not map them. Moreover, even if the physical address in BAR0 is below 4GB, a BAR marked as 64-bit may mean that the device expects to use the entire 64-bit address range for DMA and similar purposes, so the initialization code may not be able to map it safely.
[0206] This detection mechanism directly solves the compatibility problem between 64-bit hardware and 32-bit software stacks. By identifying the 64-bit BAR and recognizing the current 32-bit environment, it proactively abandons the potentially unsuccessful MMIO method and switches to the configuration space access path, thereby ensuring that PXE initialization can continue. This judgment avoids system crashes caused by address truncation or mapping errors and is one of the key aspects of achieving 64-bit compatibility in this invention.
[0207] By accurately detecting and identifying three typical scenarios of MMIO access failure, this embodiment constructs a complete pre-emptive exception handling mechanism. When pci_ioremap returns a null value, resource mapping failure is promptly captured; when BAR compression mode is detected, access risks caused by non-standard register layouts are avoided in advance; when a mismatch is found between the 64-bit BAR and the 32-bit protocol stack, unreliable MMIO paths are proactively abandoned. These detection steps ensure that the path is only used when MMIO is indeed available; otherwise, a seamless switch to the alternative access method using a custom capability structure defined by the PCI configuration space is achieved. This mechanism fundamentally solves the problem of initialization interruption caused by MMIO failure in traditional PXE solutions under complex hardware environments, significantly improving the success rate, compatibility, and robustness of network card initialization, and providing strong support for stable deployment in large-scale batch installation scenarios.
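The three failure conditions can be folded into a single path selection, sketched in C below. The enum access_path, the structure mmio_probe, and the function select_access_path are hypothetical names for this example; the checks mirror S310, S320, and S330 in that order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum access_path { ACCESS_MMIO, ACCESS_VSC };

/* Hypothetical summary of the MMIO probing results. */
struct mmio_probe {
    const void *mapped;      /* result of the mapping call; NULL on failure (S310) */
    bool compressed_bar;     /* BAR works in compressed mode (S320) */
    bool bar_is_64bit;       /* BAR0 type field is 0x2 (S330) */
    bool stack_is_32bit;     /* PXE protocol stack runs as 32-bit code (S330) */
};

/* Use MMIO only when none of the three failure conditions holds. */
static enum access_path select_access_path(const struct mmio_probe *p)
{
    assert(p != NULL);
    if (p->mapped == NULL)
        return ACCESS_VSC;                       /* S310 */
    if (p->compressed_bar)
        return ACCESS_VSC;                       /* S320 */
    if (p->bar_is_64bit && p->stack_is_32bit)
        return ACCESS_VSC;                       /* S330 */
    return ACCESS_MMIO;
}
```

Keeping the decision in one function gives the rest of the initialization code a single place to ask which register access path is in effect.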
[0208] Furthermore, when it is detected that the base address register of the network interface card (NIC) device is a 64-bit BAR and the current PXE protocol stack is 32-bit, step S400 further includes the following preprocessing steps:
[0209] S401: Compress or remap the address space indicated by the 64-bit BAR to a 32-bit accessible memory shadow region.
[0210] When the NIC device's base address register (BAR) is detected to be 64-bit and the current PXE protocol stack is running in a 32-bit environment (real mode or 32-bit protected mode), the initialization code needs to map the device's register address space to a 32-bit accessible memory region so that the hardware registers can subsequently be accessed via MMIO. This step relies on the PCI resource reallocation mechanism provided by the system firmware or on direct programming of the BAR.
[0211] Implementation Method 1: Call the BIOS service to reallocate PCI resources
[0212] In a traditional BIOS environment, Option ROM can reconfigure PCI device resources by calling the PCI BIOS service (INT 1Ah, function B1h). The specific steps are as follows:
[0213] The initialization code first obtains the BAR0 and BAR1 values of the current device and confirms that it is a 64-bit BAR (the type field is 0x2).
[0214] The PCI BIOS's SET_PCI_REGISTER function (function B1h, subfunction 06h) is invoked to attempt to allocate a new base address for the device within the 32-bit address range. This service traverses the available address space in the system and assigns a physical address to the device that is less than 4GB.
[0215] If the allocation is successful, the BIOS will update the values of BAR0 and BAR1 and return a success status. At this point, the device's register space is remapped to a 32-bit physical address.
[0216] The initialization code needs to reread BAR0 and BAR1 to obtain the new 32-bit base address and attempt to access them via MMIO through pci_ioremap or by directly using the physical address.
[0217] Implementation Method 2: Use shadow memory
[0218] During the BIOS POST phase, the Option ROM is already mapped to the memory shadow area (usually between 0xC0000 and 0xEFFFF). For the register space of the 64-bit BAR, it is possible to attempt to map it to extended memory (such as a region above 15MB), but this requires system support and a guarantee that the region is not occupied. The operation itself can be completed by the BIOS during enumeration; the Option ROM only needs to check whether the current BAR has been allocated a 32-bit address.
[0219] Normally, the BIOS handles the remapping of 64-bit BARs during the PCI enumeration phase, and the Option ROM only needs to read the configured BARs. However, in this scheme, to handle situations where the BIOS fails to process correctly, the Option ROM can actively attempt the above operations. If all attempts fail, MMIO is marked as unavailable, and the subsequent VSC access process begins.
[0220] S402: Verify whether the remapped address can be accessed normally by the 32-bit PXE protocol stack, to ensure that the Option ROM is loaded and executed correctly.
[0221] After the address remapping is complete, the initialization code must verify that the new address can indeed be correctly accessed by a 32-bit environment. The verification steps include:
[0222] Address range check: Confirm that the new physical address is less than 4GB and does not cross the 4GB boundary after adding the device mapping length. For example, if the device requires 16MB of space, the starting address should be between 0x00000000 and 0xFF000000.
[0223] Readability test: Attempt to read a known register (such as the device ID or vendor ID register) from the mapped address and compare it to the value read in the PCI configuration space. For example, reading the 32-bit value at 0xE8000000 via MMIO should match the device ID and vendor ID in the configuration space (or, depending on the hardware specification, some register values are predictable).
[0224] Writability test: Perform a safe write operation on the register (such as write and restore) to check if it is writable without affecting the normal operation of the device. Take care to avoid damaging the device state; you can choose to test read-only registers or use temporary bits.
[0225] DMA capability verification: Although DMA may not be used in the PXE stage, this check helps ensure that DMA will work for subsequent drivers. The code could try allocating a DMA buffer within the 32-bit address range, but this step is optional.
[0226] Consistency check: If the device behaves abnormally after remapping (such as an access timeout or reads that return all 1s, for example 0xFFFFFFFF), the remapping is considered to have failed.
[0227] If the verification passes, the initialization code can continue to use MMIO to complete the hardware initialization; if the verification fails, MMIO is marked as unavailable and the flow switches to VSC access in step S400.
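Two of the verification steps, the address range check and the readability test, can be sketched in C as follows. The function names range_below_4g and id_readback_ok are hypothetical, and the sketch assumes the device exposes its ID dword through MMIO, which depends on the hardware specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the region [base, base + len) lies entirely below 4GB. */
static bool range_below_4g(uint64_t base, uint64_t len)
{
    const uint64_t limit = 0x100000000ull;  /* 4GB */
    return len != 0 && base < limit && len <= limit - base;
}

/* Readability test: the ID dword read via MMIO must match the one read
 * from the configuration space and must not be all 1s. */
static bool id_readback_ok(uint32_t mmio_id, uint32_t cfg_id)
{
    return mmio_id != 0xFFFFFFFFu && mmio_id == cfg_id;
}
```

With a 16MB region, a base of 0xFF000000 passes range_below_4g because the region ends exactly at the 4GB boundary, while 0xFF000001 does not.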
[0228] Through the preprocessing steps S401 and S402, this invention provides a proactive adaptation mechanism for the compatibility of 64-bit BAR devices in a 32-bit PXE environment. S401 compresses or remaps the device register space to a 32-bit accessible address range by calling BIOS services or directly programming the BAR, potentially salvaging MMIO paths that were previously unusable. S402, through rigorous verification testing, ensures that the remapped addresses are indeed safely accessible, preventing system crashes caused by invalid addresses or abnormal device responses. The combination of these two steps maximizes the chance of using high-performance MMIO before entering the alternative VSC access path, improving hardware utilization and ensuring system stability. Even if remapping fails, a smooth transition to the VSC mechanism ensures uninterrupted PXE initialization. This design significantly enhances the solution's adaptability to different hardware configurations, especially in large-scale installation environments, enabling compatibility with more network card models and improving deployment success rates.
[0229] Furthermore, although the steps of the method in this disclosure are described in a specific order in the accompanying drawings, this does not require or imply that the steps must be performed in that specific order, or that all the steps shown must be performed to achieve the desired result. Additionally or alternatively, some steps may be omitted, multiple steps may be combined into one step, and/or a step may be broken down into multiple steps.
[0230] Embodiments of the present invention also provide a non-transitory computer-readable storage medium that can be disposed in an electronic device to store at least one instruction or at least one program related to implementing a method in the method embodiments, wherein the at least one instruction or the at least one program is loaded and executed by the processor to implement the method provided in the above embodiments.
[0231] The program product may employ any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of readable storage media (a non-exhaustive list) include: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof.
[0232] Computer-readable signal media may include data signals propagated in baseband or as part of a carrier wave, carrying readable program code. Such propagated data signals may take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A readable signal medium may also be any readable medium other than a readable storage medium, capable of sending, propagating, or transmitting programs for use by or in conjunction with an instruction execution system, apparatus, or device.
[0233] The program code contained on the readable medium may be transmitted using any suitable medium, including but not limited to wireless, wired, optical fiber, RF, etc., or any suitable combination thereof.
[0234] Program code for performing the operations of this application can be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as C or similar languages. The program code can execute entirely on the user's computing device, partially on the user's device, as a standalone software package, partially on the user's computing device and partially on a remote computing device, or entirely on a remote computing device or server. In cases involving remote computing devices, the remote computing device can be connected to the user's computing device via any type of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computing device (e.g., via the Internet using an Internet service provider).
[0235] Embodiments of the present invention also provide an electronic device, including a processor and the aforementioned non-transitory computer-readable storage medium.
[0236] The electronic device is merely an example and should not impose any limitations on the functionality and scope of use of the embodiments in this application.
[0237] Electronic devices are manifested in the form of general-purpose computing devices. Components of an electronic device may include, but are not limited to: at least one processor, at least one memory, and a bus connecting different system components (including memory and processor).
[0238] The memory stores program code that can be executed by the processor, causing the processor to perform the steps in the various embodiments described in this specification.
[0239] The memory may include readable media in the form of volatile memory, such as random access memory (RAM) and / or cache memory, and may further include read-only memory (ROM).
[0240] The memory may also include programs / utilities having a set (at least one) of program modules, including but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of these examples may include an implementation of a network environment.
[0241] A bus can represent one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor bus, or a local bus that uses any of a variety of bus structures.
[0242] Electronic devices can also communicate with one or more external devices (e.g., keyboards, pointing devices, Bluetooth devices, etc.), one or more devices that enable user interaction with the electronic device, and / or any device that enables the electronic device to communicate with one or more other computing devices (e.g., routers, modems, etc.). This communication can be achieved through input / output (I / O) interfaces. Furthermore, electronic devices can communicate with one or more networks (e.g., local area networks (LANs), wide area networks (WANs), and / or public networks, such as the Internet) via network adapters. The network adapter communicates with other modules of the electronic device via a bus. It should be understood that other hardware and / or software modules can be used in conjunction with the electronic device, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
[0243] From the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein can be implemented by software or by combining software with necessary hardware. Therefore, the technical solutions according to the embodiments of this disclosure can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive, external hard drive, etc.) or on a network, including several instructions to cause a computing device (such as a personal computer, server, terminal device, or network device, etc.) to execute the methods according to the embodiments of this disclosure.
[0244] Embodiments of the present invention also provide a computer program product including program code, which, when the program product is run on an electronic device, causes the electronic device to perform the steps of the methods described above in various exemplary embodiments of the present invention.
[0245] While specific embodiments of the invention have been described in detail by way of examples, those skilled in the art should understand that the examples are for illustrative purposes only and are not intended to limit the scope of the invention. Those skilled in the art should also understand that various modifications can be made to the embodiments without departing from the scope and spirit of the invention.
Claims
1. A method for initializing a PXE network card compatible with 64-bit addressing, characterized in that the method includes the following steps: S100, in response to a PXE boot command issued by the user through the baseboard management controller, the system firmware scans the PCI bus during the power-on self-test process and discovers the network card device, the network card device having a preset PCI configuration space; S200, read the PCI configuration space of the network card device, obtain the expansion ROM base address register information, and map the network card Option ROM to the memory shadow area; S300, execute the initialization code in the Option ROM, so that the initialization code accesses the hardware registers of the network card device through memory-mapped input/output to complete hardware initialization; S400, when the memory-mapped input/output access fails, the initialization code performs read and write operations on the hardware registers through the custom capability structure in the PCI configuration space of the network card device; S500, based on the successful access to the hardware registers, complete the hardware initialization of the network card device and establish the PXE protocol stack structure in memory; S600, register with the system firmware as a PXE boot device and enter the protocol interaction phase with the PXE server; wherein the memory-mapped input/output access failure includes: S310, when memory mapping is performed on the base address register of the network card device and the pci_ioremap function returns a null value, it is determined that the memory-mapped input/output access method has failed; S320, when it is detected that the base address register of the network card device is working in compressed mode and cannot be accessed through standard register remapping, it is determined that the memory-mapped input/output access method has failed; S330, when it is detected that the base address register of the network card device is a 64-bit BAR while the currently executing PXE protocol stack is a 32-bit implementation that cannot directly access the 64-bit address space, it is determined that the memory-mapped input/output access method has failed; and wherein, when the base address register (BAR) of the network card device is detected to be 64-bit and the current PXE protocol stack is 32-bit, step S400 further includes the following preprocessing steps: S401, compress or remap the address space indicated by the 64-bit BAR to a 32-bit accessible memory shadow region; S402, verify whether the remapped address can be accessed normally by the 32-bit PXE protocol stack, to ensure that the Option ROM is loaded and executed correctly.
2. The PXE network card initialization method compatible with 64-bit addressing according to claim 1, characterized in that the custom capability structure includes an opcode field and an address/data field in the PCI configuration space; the opcode field is used to store predefined operation type identifiers, which include at least an address setting identifier, a length setting identifier, a sequence control identifier, a value identifier, and a reset identifier.
3. The PXE network card initialization method compatible with 64-bit addressing according to claim 2, characterized in that, in step S400, reading hardware registers using the custom capability structure includes the following steps: S411, write the address setting identifier to the opcode field, and write the internal address of the target hardware register to the address/data field; S412, write the length setting identifier to the opcode field, and write the length value of the data to be read to the address/data field; S413, write the sequence control identifier to the opcode field, and write the sequence number of this read operation to the address/data field; S414, read the returned data from the address/data field, the data being the register value corresponding to the sequence number; S415, repeat steps S413 to S414, incrementing the sequence number each time, until the number of values specified by the length value has been obtained; S416, write the reset identifier to the opcode field to clear the state of this read process.
4. The PXE network card initialization method compatible with 64-bit addressing according to claim 2, characterized in that, in step S400, writing hardware registers using the custom capability structure includes the following steps: S421, write the address setting identifier to the opcode field, and write the internal address of the target hardware register to the address/data field; S422, write the length setting identifier to the opcode field, and write the length value of the data to be written to the address/data field; S423, write the sequence control identifier to the opcode field, and write the sequence number of this write operation to the address/data field; S424, write the value identifier to the opcode field, and write the register configuration value to be written to the address/data field; S425, repeat steps S423 to S424, incrementing the sequence number each time, until the number of values specified by the length value has been written; S426, write the reset identifier to the opcode field to clear the state of this write process.
5. The PXE network card initialization method compatible with 64-bit addressing according to claim 3 or 4, characterized in that, when read and write operations are performed using the custom capability structure, the supported data widths include 1 byte, 2 bytes, and 4 bytes; and multi-step operations are guaranteed to be atomic through a locking mechanism, preventing interruption by other accesses.
6. The PXE network card initialization method compatible with 64-bit addressing according to claim 1, characterized in that, in step S400, when read/write operations are performed through the custom capability structure, the access request issued by the initialization code is transmitted to the system firmware in a predefined data format; the system firmware parses the data format, extracts at least one of the following items of information: opcode, register address, read/write type, sequence number, and data value, and calls the underlying hardware access interface to perform the actual operations on the hardware registers based on the extracted information.
7. A non-transitory computer-readable storage medium, wherein the storage medium stores at least one instruction or at least one program segment, characterized in that the at least one instruction or the at least one program segment is loaded and executed by a processor to implement the PXE network card initialization method compatible with 64-bit addressing as described in any one of claims 1-6.
8. An electronic device, characterized in that it includes a processor and the non-transitory computer-readable storage medium as described in claim 7.