A method, medium and device for determining the number of nodes in a multi-FPGA cache system for high-speed image acquisition

By applying a tree topology and a piecewise model to a high-speed image acquisition system, the output rate and propagation delay of the final-stage board are calculated accurately. This improves the accuracy of node-count determination and achieves stable system operation with optimized resources.

CN122240549A · Pending · Publication date: 2026-06-19 · 台州光电产业创新中心 (Taizhou Optoelectronic Industry Innovation Center)

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Application (China)
Current Assignee / Owner: 台州光电产业创新中心 (Taizhou Optoelectronic Industry Innovation Center)
Filing Date: 2026-03-30
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

In existing high-speed image acquisition systems, methods for determining the number of nodes do not accurately account for the dynamic changes in the output rate of the final-stage board or for data propagation delay. The result is either wasted node resources or data loss, and there is no refined mechanism for selecting the number of nodes.

Method used

A multi-FPGA buffer system with a tree topology is used. A piecewise model of the final-stage board output rate and a time-varying function of its input rate are established to calculate the fill time and the minimum number of nodes accurately, and the number of nodes is optimized by taking propagation delay and output rate degradation into account.

Benefits of technology

It enables accurate characterization of the system's dynamic operation process, avoids resource waste and data loss, ensures stable operation of the system under continuous working time requirements, and reduces hardware costs.

✦ Generated by Eureka AI based on patent content.

Figure CN122240549A_ABST (abstract drawing)

Abstract

This invention relates to the field of data acquisition and transmission, and in particular to a method, medium and device for determining the number of nodes in a multi-FPGA buffer system for high-speed image acquisition. The system adopts a tree topology consisting of a data input layer, a distributed buffer layer, a convergence layer and an output layer. The final-stage board is connected to an external hard drive via USB, and the other FPGA boards are interconnected via optical fiber. The method establishes a model of how the input rate of the final-stage board changes over time, taking propagation delay and rise time into account, and uses it to determine when the buffer is first filled; it then introduces an output rate degradation mechanism and calculates the minimum number of nodes from the system's required continuous operating time. This avoids the traditional estimation's neglect of dynamic rate changes, reducing hardware resources and cost while meeting operational requirements.

Description

Technical Field

[0001] This invention relates to the field of data acquisition and transmission, and in particular to a method, medium, and device for determining the number of nodes in a multi-FPGA buffer system for high-speed image acquisition.

Background Technology

[0002] In high-end applications such as industrial visual inspection, ultra-high-speed photography recording, and scientific research data acquisition, high-speed cameras typically require long-term continuous data acquisition, with large data volumes in a single continuous acquisition. In existing technologies, high-speed cameras are usually directly connected to a general-purpose computer system via an interface, where the computer receives the data in real time and writes it to local storage devices, forming a centralized real-time disk writing architecture based on a general-purpose computer. Besides centralized computer architectures, existing technologies also include distributed caching solutions based on single or multiple FPGA development boards. These solutions construct a tree-like topology using multiple FPGA development boards and utilize fiber optic interconnects to achieve data distribution and aggregation.

[0003] However, existing technologies still have the following technical problems in large-scale continuous high-speed acquisition scenarios: First, existing methods for determining the number of nodes are usually based on a simple ratio of total cache capacity to single-board capacity, without considering the dynamic degradation of the output rate of the final-level board (FPGA development board connected to the external hard drive) due to cache overflow during actual operation. This results in either redundant node configuration that wastes hardware resources or insufficient capacity that leads to data loss. Second, existing methods do not consider the propagation delay of data transmission in multi-level topologies and the input rate increase process. They incorrectly assume that the input rate of the final-level board reaches the total input rate at the moment of system startup, causing the calculation of fill time to deviate from the actual physical process, thus affecting the accuracy of node determination. Third, existing methods do not establish the concept of a range between the minimum and maximum number of nodes, lacking a refined node selection mechanism under the premise of meeting capacity requirements.

[0004] Therefore, it is necessary to provide a method that can accurately determine the number of nodes in a multi-FPGA cache system. This method should consider the dynamic changes in the output rate of the final stage board before and after the cache is filled, as well as the impact of data propagation delay on the rise of the input rate of the final stage board. In this way, under the premise of meeting the requirements of continuous system operation time, the minimum number of nodes required by the system can be accurately calculated, and together with the maximum number of nodes under extreme storage scenarios, they constitute the range of selectable number of nodes, providing accurate boundary constraints for subsequent topology optimization.

Summary of the Invention

[0005] To address one of the aforementioned technical problems, the present invention adopts the following technical solution:

[0006] According to one aspect of the present invention, a method for determining the number of nodes in a high-speed image acquisition multi-FPGA cache system is provided. The system adopts a tree topology composed of multiple FPGA development boards, including a data input layer, a distributed cache layer, a convergence layer and a final output layer. The FPGA development board connected to the external hard drive is the final-stage board, which is connected to the external hard drive via the USB 3.0 protocol. The remaining FPGA development boards are interconnected via optical fiber, and the transmission rate of the optical fiber is greater than that of USB 3.0.

[0007] The method includes the following steps:

[0008] Based on the maximum permissible rate $R_{max}$ of the final-stage board input interface and the maximum number of nodes $N_{max}$ corresponding to the system's maximum cache requirement, generate the time $t_1$ at which the final-stage board buffer is first filled. $t_1$ is determined from the function $r(t)$ describing how the input rate of the final-stage board varies with time, and satisfies:

[0009] $$\int_{t_d}^{t_1} r(t)\,\mathrm{d}t - R_{high}\,(t_1 - t_d) = C_{unit};$$

[0010] where $R_{high}$ is the highest continuous output rate of the final-stage board; $C_{unit}$ is the effective cache capacity of a single FPGA development board; and $t_d$ is the minimum propagation delay of the system, which characterizes the shortest time required for data to travel from the system input to the final-stage board and is positively correlated with the number of topology layers $L$;

[0011] r(t) satisfies the following condition:

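[0012] $$r(t) = \begin{cases} 0, & 0 \le t < t_d \\ \dfrac{R_{lim}}{t_r}\,(t - t_d), & t_d \le t < t_d + t_r \\ R_{lim}, & t \ge t_d + t_r \end{cases}$$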

[0013] where $R_{lim} = \min(R_{in}, R_{max})$; $R_{in}$ is the total system input rate; and $t_r$ is the rise time, which characterizes the time required for the input rate of the final-stage board to increase from zero to $R_{lim}$ and is positively correlated with the maximum number of nodes $N_{max}$ of the system;

[0014] Based on the target continuous working time $T$ of the system, the degraded output rate $R_{low}$ of the final-stage board, $R_{in}$, $R_{high}$, $t_1$ and $C_{unit}$, determine the minimum number of nodes $N_{min}$ of the system, where $N_{min}$ satisfies:

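[0015] $$N_{min} = \left\lceil \frac{R_{in}T - \left[R_{high}\,t_1 + R_{low}\,(T - t_1)\right]}{C_{unit}} \right\rceil;$$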

[0016] where $\lceil \cdot \rceil$ denotes rounding up to the nearest integer.

[0017] According to a second aspect of the present invention, a non-transitory computer-readable storage medium is provided, which stores a computer program that, when executed by a processor, implements the above-described method for determining the number of nodes in a multi-FPGA cache system for high-speed image acquisition.

[0018] According to a third aspect of the present invention, an electronic device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the above-described method for determining the number of nodes in a multi-FPGA cache system for high-speed image acquisition.

[0019] This invention has at least one of the following beneficial effects:

[0020] This invention achieves accurate characterization of the system's dynamic operation by establishing a piecewise model of the final-stage board output rate. Specifically, the output rate of the final-stage board is divided into the highest continuous output rate $R_{high}$ before the buffer is filled and the degraded output rate $R_{low}$ after it is filled; the fill time $t_1$ is solved precisely from the integral equation $\int_{t_d}^{t_1} r(t)\,\mathrm{d}t - R_{high}(t_1 - t_d) = C_{unit}$, from which the minimum number of nodes $N_{min}$ is obtained. Compared with the simple constant-output-rate estimate used in existing technologies, the output rate degradation mechanism reflects the dynamic behavior of the final-stage board in actual operation and avoids overestimating the total output data volume by ignoring degradation. The calculated minimum number of nodes is therefore more accurate, and the system can operate stably with minimal hardware resources while meeting its continuous working time requirement, effectively reducing hardware cost.

[0021] This invention accurately models data transmission in a multi-level tree topology by establishing a function $r(t)$ that describes the time-varying input rate of the final-stage board. Specifically, the input rate of the final-stage board is divided into three phases: the zero-input phase ($0 \le t < t_d$), the linear rising phase ($t_d \le t < t_d + t_r$) and the constant input phase ($t \ge t_d + t_r$). The minimum propagation delay $t_d$ characterizes the shortest time required for data to travel from the input to the final-stage board, and the rise time $t_r$ characterizes the time required for the input rate to increase from zero to $R_{lim}$. Existing technologies use a simplified model that assumes the input rate of the final-stage board reaches the total input rate $R_{in}$ at the moment of system startup. By introducing propagation delay and rise time, this invention reflects the actual physical process of data being forwarded and converged level by level, so that the calculated fill time $t_1$ is closer to the real physical scenario and the minimum number of nodes is calculated accurately.

Attached Figure Description

[0022] To more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0023] Figure 1 is a flowchart of a method for determining the number of nodes in a multi-FPGA buffer system for high-speed image acquisition according to an embodiment of the present invention.

[0024] Figure 2 is a topology diagram of a high-speed image acquisition multi-FPGA buffer system according to an embodiment of the present invention.

Detailed Implementation

[0025] The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0026] As one possible embodiment of the present invention, as shown in Figure 1, a method for determining the number of nodes in a multi-FPGA cache system for high-speed image acquisition is provided. The system adopts a tree topology composed of multiple FPGA development boards, including a data input layer, a distributed cache layer, a convergence layer and a final output layer. The FPGA development board connected to the external hard drive is the final-stage board and is connected to it via the USB 3.0 protocol. The remaining FPGA development boards are interconnected via optical fiber, whose transmission rate exceeds that of USB 3.0: specifically, the optical fiber transmission rate is 500 MB/s, while the USB 3.0 transmission rate is 300 MB/s.

[0027] Specifically, the data input layer is configured with four independent FPGA development boards, each connected via optical fiber to one of four high-speed cameras to bring data into the system, and there is one final-stage board. The data input layer, the convergence layer and the final output layer are each single-layer structures, while the distributed cache layer is a two-layer structure, giving a five-layer topology. It should be noted that the number of topology layers L = 5 in this embodiment is a typical design value, determined as a preliminary estimate from the number of system input terminals M = 4, the number of output terminals S = 1, the maximum number of available fiber ports on a single board Fmax = 8, and the total number of nodes (approximately 19 boards). In actual engineering, L is usually in the range of 3 to 6 layers and can be adjusted according to constraints such as the total number of nodes, port limits and link bandwidth. In this embodiment, a 5-layer structure meets the needs of the subsequent node-count calculations, and this value is generally representative of similar application scenarios.

[0028] By constructing a distributed cache space through multi-node collaboration, the capacity bottleneck of a single node is overcome, and multiple FPGA development boards can build a larger total cache capacity. A full-node cache architecture is constructed, so that all levels of FPGA development boards in the system participate in image data caching and processing, and there are no functional nodes that only undertake forwarding or aggregation functions without participating in caching. The high-speed acquisition stage and the final storage stage are physically decoupled, and data caching is performed using only the onboard memory of the FPGA development board during the acquisition process, without relying on real-time writing from external storage devices. A tree-like topology based on high-speed fiber optic interconnection is constructed, and multi-channel parallel transmission capability is formed through parallel fiber optic links between multiple levels of FPGA development boards, thereby improving the overall bandwidth carrying capacity of the system.

[0029] The method includes the following steps:

[0030] S100: Based on the maximum permissible rate $R_{max}$ of the final-stage board input interface and the maximum number of nodes $N_{max}$ corresponding to the system's maximum cache requirement, generate the time $t_1$ at which the final-stage board buffer is first filled.

[0031] This step focuses on the final-stage board because, in this system architecture, it is the only node connected to the external hard drive. Its output interface is USB 3.0, whose transfer rate (300 MB/s) is far lower than the aggregate front-end fiber rate (4 × 500 MB/s = 2000 MB/s). The large gap between its input and output rates makes the final-stage board the most likely bottleneck for data accumulation and buffer overflow. Furthermore, its input rate does not reach the total system input rate instantaneously; it is constrained by the propagation of data as it is forwarded and converged level by level in the multi-level tree topology. A function $r(t)$ describing the time-varying input rate of the final-stage board is therefore needed to describe this dynamic process accurately.

[0032] $t_1$ is determined from the function $r(t)$ describing the time-varying input rate of the final-stage board, and satisfies:

[0033] $$\int_{t_d}^{t_1} r(t)\,\mathrm{d}t - R_{high}\,(t_1 - t_d) = C_{unit}.$$

[0034] where $R_{high}$ is the highest continuous output rate of the final-stage board; $C_{unit}$ is the effective cache capacity of a single FPGA development board; and $t_d$ is the minimum propagation delay of the system, which characterizes the shortest time required for data to travel from the system input to the final-stage board and is positively correlated with the number of topology layers $L$.

[0035] r(t) satisfies the following condition:

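[0036] $$r(t) = \begin{cases} 0, & 0 \le t < t_d \\ \dfrac{R_{lim}}{t_r}\,(t - t_d), & t_d \le t < t_d + t_r \\ R_{lim}, & t \ge t_d + t_r \end{cases}$$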

[0037] where $R_{lim} = \min(R_{in}, R_{max})$; $R_{in}$ is the total system input rate; and $t_r$ is the rise time, which characterizes the time required for the input rate of the final-stage board to increase from zero to $R_{lim}$ and is positively correlated with the maximum number of nodes $N_{max}$ of the system.

[0038] Specifically,

[0039] $t_d$ satisfies $t_d = \beta L$, where $\beta$ is the preset per-layer forwarding delay coefficient.

[0040] Each time data passes through a layer of the FPGA development board, it needs to undergo processing such as receiving, buffering, and forwarding, introducing fixed processing delays and fiber optic transmission delays. Therefore, the propagation delay is linearly positively correlated with the number of topology layers. The value of L can be predetermined based on engineering constraints such as the number of system inputs, the number of outputs, and the capacity of single-board ports. In this embodiment, L=5.

[0041] $t_r$ satisfies $t_r = \gamma N_{max}$, where $\gamma$ is the preset node-count influence coefficient.

[0042] The more nodes there are, the more dispersed the data distribution, the more gradual the convergence to the final-stage board and the longer the rise time; the two are linearly positively correlated. Both the number of topology layers $L$ and the maximum number of nodes $N_{max}$ are known parameters at the system design stage: $L$ is determined by the system architecture design (e.g., the 5-layer structure shown in Figure 2), and $N_{max}$ can be obtained from the subsequent step S300 or by iteratively correcting an empirical initial value.

[0043] Existing technologies assume that the input rate of the final-stage board reaches the total input rate at the moment of system startup. This simplification ignores the propagation delays and the gradual rate increase that occur as data is forwarded and converged at each level of a multi-level tree topology, so the calculated fill time $t_1$ deviates from the actual physical process and the accuracy of node-count determination suffers.

[0044] Therefore, this step characterizes the dynamic change of the final-stage board input rate by establishing the piecewise linear input rate function $r(t)$. The input process is divided into three phases. In the first phase ($0 \le t < t_d$), data has not yet reached the final-stage board and the input rate is 0. In the second phase ($t_d \le t < t_d + t_r$), data begins to reach the final-stage board and the input rate increases linearly with time, with slope $R_{lim}/t_r$. In the third phase ($t \ge t_d + t_r$), the input rate has reached its limit $R_{lim}$ and remains constant. Here, $t_d = \beta L$ reflects the impact of the number of topology layers on propagation delay: the more layers, the more nodes the data passes through and the greater the propagation delay. Likewise, $t_r = \gamma N_{max}$ reflects the impact of system size on rise time: the more nodes, the more dispersed the data distribution, the more gradual the convergence to the final-stage board and the longer the rise time.
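The three-phase input model of [0044], together with $t_d = \beta L$ and $t_r = \gamma N_{max}$, can be sketched in Python as a plain piecewise function. This is an illustrative sketch, not the applicant's implementation: the function names are hypothetical, rates are assumed to be in MB/s and times in seconds, and the example call assumes $R_{max} \ge 2000$ MB/s so that $R_{lim} = R_{in} = 2000$ MB/s.

```python
def propagation_delay(beta: float, layers: int) -> float:
    """Minimum propagation delay t_d = beta * L, in seconds."""
    return beta * layers


def rise_time(gamma: float, n_max: int) -> float:
    """Rise time t_r = gamma * N_max, in seconds."""
    return gamma * n_max


def input_rate(t: float, r_lim: float, t_d: float, t_r: float) -> float:
    """Piecewise-linear input rate r(t) of the final-stage board, in MB/s."""
    if t < t_d:
        return 0.0                      # phase 1: data has not reached the final stage
    if t < t_d + t_r:
        return r_lim * (t - t_d) / t_r  # phase 2: linear ramp with slope R_lim / t_r
    return r_lim                        # phase 3: constant at R_lim = min(R_in, R_max)


if __name__ == "__main__":
    t_d = propagation_delay(0.01, 5)    # beta and L from the embodiment
    t_r = rise_time(0.02, 19)           # gamma from the embodiment, N_max = 19
    # Assumption: R_max >= 2000 MB/s, so R_lim equals R_in = 2000 MB/s.
    for t in (0.0, 0.05, 0.24, 0.43, 1.0):
        print(f"t = {t:.2f} s -> r(t) = {input_rate(t, 2000.0, t_d, t_r):.1f} MB/s")
```

With the embodiment's coefficients this gives $t_d = 0.05$ s and $t_r = 0.38$ s, so the ramp spans 0.05 s to 0.43 s.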

[0045] Based on this model, the step incorporates two key physical parameters, propagation delay and rise time, into the fill-time calculation, so that the solution for $t_1$ is closer to real-world physics. In the integral equation $\int_{t_d}^{t_1} r(t)\,\mathrm{d}t - R_{high}(t_1 - t_d) = C_{unit}$, the first term on the left is the total amount of data received by the final-stage board between $t_d$ and $t_1$, and the second term is the total amount of data it outputs over the same period. Their difference equals the amount of data accumulated in the final-stage board's buffer, and the moment at which this accumulated amount equals the buffer capacity $C_{unit}$ is $t_1$, the time when the buffer is first filled.
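Because $r(t)$ is piecewise linear, the fill equation of [0045] can be solved in closed form: a quadratic if the buffer fills during the ramp, a linear equation if it fills during the constant phase. The sketch below assumes the equation takes the form $\int_{t_d}^{t_1} r(t)\,\mathrm{d}t - R_{high}(t_1 - t_d) = C_{unit}$ described in [0045] and applies it literally; `first_fill_time` is a hypothetical helper name, and the units (MB, MB/s, s) are assumptions.

```python
import math


def first_fill_time(r_lim: float, r_high: float, c_unit: float,
                    t_d: float, t_r: float) -> float:
    """Solve  integral_{t_d}^{t1} r(t) dt - R_high * (t1 - t_d) = C_unit  for t1.

    Output is counted at R_high from t_d onward, as in the fill equation.
    Returns math.inf when R_lim <= R_high, because the buffer then never fills.
    """
    if r_lim <= r_high:
        return math.inf
    # Ramp phase: with x = t1 - t_d the equation becomes
    #   (R_lim / (2 t_r)) x^2 - R_high x - C_unit = 0; take the positive root.
    a = r_lim / (2.0 * t_r)
    x = (r_high + math.sqrt(r_high ** 2 + 4.0 * a * c_unit)) / (2.0 * a)
    if x <= t_r:
        return t_d + x
    # Constant phase: the ramp delivers R_lim * t_r / 2, so
    #   (R_lim - R_high) (t1 - t_d) - R_lim * t_r / 2 = C_unit.
    return t_d + (c_unit + r_lim * t_r / 2.0) / (r_lim - r_high)
```

For a per-board capacity that is large compared with the data delivered during the ramp, the function takes the constant-phase branch, which is the case [0092] reports for the numerical example.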

[0046] Through this step, the system can accurately calculate the time $t_1$ at which the final-stage board buffer is first filled. Unlike the simplified model of existing technologies, which assumes the input rate peaks instantaneously, this step introduces the propagation delay $t_d$ and the rise time $t_r$ and thus reflects the actual physical process of data transmission in a multi-level topology, making $t_1$ more precise. For example, when the number of system nodes is large, $t_r$ is relatively large and the input rate rises slowly, which correspondingly delays $t_1$; existing technologies cannot reflect this characteristic. The accurate value of $t_1$ provides the boundary condition for the subsequent calculation of the minimum number of nodes.

[0047] S200: Based on the target continuous working time $T$ of the system, the degraded output rate $R_{low}$ of the final-stage board, $R_{in}$, $R_{high}$, $t_1$ and $C_{unit}$, determine the minimum number of nodes $N_{min}$ of the system, where $N_{min}$ satisfies:

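[0048] $$N_{min} = \left\lceil \frac{R_{in}T - \left[R_{high}\,t_1 + R_{low}\,(T - t_1)\right]}{C_{unit}} \right\rceil.$$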

[0049] where the highest continuous output rate $R_{high}$ of the final-stage board takes the actual continuous write speed of the USB 3.0 interface, and the degraded output rate $R_{low}$ is a preset rate threshold to which the output rate drops once the final-stage board buffer overflows; for example, $R_{low}$ can be $0.8R_{high}$.

[0050] Existing technologies ignore the dynamic degradation of the output rate of the final stage board due to buffer overflow, and simply assume that the output rate is constant, which leads to an overestimation of the total output data volume, resulting in an underestimation of the number of nodes and potentially causing insufficient capacity in actual operation.

[0051] Therefore, this step introduces a piecewise output model: before time $t_1$, the final-stage board outputs at its highest continuous rate $R_{high}$; after $t_1$, its buffer is full and the output rate drops to $R_{low}$. The total input data volume is $R_{in}T$, and the total output data volume is $R_{high}t_1 + R_{low}(T - t_1)$. Their difference is the total amount of data that must be held in the onboard memory of the FPGA development boards over the whole working period, i.e. the minimum total cache requirement of the system. Dividing this requirement by the effective cache capacity $C_{unit}$ of a single board and rounding up gives the minimum number of nodes $N_{min}$.
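Once $t_1$ is known, the minimum-node calculation of [0051] reduces to a few lines. The sketch assumes $N_{min} = \lceil (R_{in}T - [R_{high}t_1 + R_{low}(T - t_1)]) / C_{unit} \rceil$ as described there; `min_nodes` is a hypothetical name, and the handling of $t_1 \ge T$ (no degraded phase within the working period) is an added assumption that the source does not discuss.

```python
import math


def min_nodes(r_in: float, r_high: float, r_low: float,
              t1: float, T: float, c_unit: float) -> int:
    """N_min = ceil((R_in*T - [R_high*t1 + R_low*(T - t1)]) / C_unit)."""
    # Assumption: if the buffer does not fill within T (t1 >= T, or t1 = inf),
    # the board outputs at R_high for the whole period and never degrades.
    t_fill = min(t1, T)
    output = r_high * t_fill + r_low * (T - t_fill)  # total output data volume (MB)
    required = r_in * T - output                      # minimum total cache requirement (MB)
    return max(0, math.ceil(required / c_unit))
```

Passing `t1 = T` (or `math.inf`) reproduces a constant-output-rate estimate, which makes the effect of the degradation mechanism in [0052] easy to compare: with $R_{low} < R_{high}$ the total output can only shrink, so the degraded case never needs fewer nodes.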

[0052] Through this step, the system can accurately calculate the minimum number of nodes required to meet the continuous working time requirement. Compared with the simple constant-output-rate estimate used in existing technologies, this step introduces an output rate degradation mechanism that reflects the dynamic behavior of the final-stage board during actual operation. Because $R_{low} < R_{high}$, the output rate decreases after $t_1$ and the total output data volume decreases with it; the minimum total cache requirement is therefore larger than a constant-output-rate estimate would suggest, and the required number of nodes increases accordingly. This avoids under-configuring the number of nodes by ignoring the degradation effect and ensures that the system does not lose data to cache overflow during actual operation.

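[0053] S300: According to $R_{in}$, $T$ and $C_{unit}$, generate $N_{max}$, which satisfies: $$N_{max} = \left\lceil \frac{R_{in}T}{C_{unit}} \right\rceil.$$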

[0054] This step defines the maximum number of nodes under the extreme storage scenario. When the output rate of the final-stage board is zero, all input data is cached in the onboard memory of the FPGA development boards, so the total cache requirement equals the total input data volume $R_{in}T$ and the corresponding number of nodes is $\lceil R_{in}T / C_{unit} \rceil$. This value serves as the upper limit for the number of system nodes and, together with the minimum number of nodes $N_{min}$, constitutes the selectable range of the number of nodes.

[0055] This step establishes the upper bound of the number of system nodes. The upper bound $N_{max}$ corresponds to a scenario in which data acquisition and data output are decoupled: the system only receives data while the output rate is zero, and all input data is held in the onboard memory of the FPGA development boards, so the system must be able to accommodate the entire input. The lower bound $N_{min}$ corresponds to a dynamic scenario in which data acquisition and data output occur simultaneously. The system then acts like a reservoir, with continuous data input at the front end and continuous data output at the back end; even if the output degrades because the final-stage board buffer overflows, the cache capacity is still sufficient for the data accumulated during the dynamic process, so the whole process stays stable and controllable. The interval jointly defined by $N_{min}$ and $N_{max}$ ensures that the system can operate normally under both dynamic transmission and fully buffered conditions, providing clear constraints for subsequent node-count optimization.
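The upper bound of [0054] and the resulting interval of candidate totals can be sketched as follows. The names `max_nodes` and `candidate_totals` are hypothetical, and the units (MB, MB/s, s) are assumptions.

```python
import math


def max_nodes(r_in: float, T: float, c_unit: float) -> int:
    """N_max = ceil(R_in * T / C_unit): all input stays on board (zero output)."""
    return math.ceil(r_in * T / c_unit)


def candidate_totals(n_min: int, n_max: int) -> range:
    """All selectable totals N in the closed interval [N_min, N_max]."""
    if n_min > n_max:
        # Cannot happen when the output volume is non-negative; guard anyway.
        raise ValueError("N_min exceeds N_max; check the input parameters")
    return range(n_min, n_max + 1)
```

Because the zero-output volume $R_{in}T$ is never smaller than the dynamic-case requirement, $N_{min} \le N_{max}$ holds whenever both are computed from the same inputs.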

[0056] As another possible embodiment of the present invention, a method for determining the optimal number of nodes within the node range determined in the above embodiments is provided. This method specifically includes the following steps:

[0057] S400: Select a total number of candidate nodes $N$ within $[N_{min}, N_{max}]$.

[0058] This step takes $N_{min}$ and $N_{max}$ calculated in Embodiment 1 as the lower and upper bounds of the number of nodes and selects the total number of candidate nodes $N$ within the interval $[N_{min}, N_{max}]$. This interval provides a clear search scope for subsequent topology optimization and ensures that every candidate total meets the continuous working time requirement (not less than $N_{min}$) without exceeding the hardware requirement of the extreme scenario (not greater than $N_{max}$).

[0059] This step limits the selection of the number of nodes to a feasible range, avoiding resource waste or insufficient capacity caused by blind selection. This range considers both normal operation scenarios with degraded output and extreme storage scenarios with zero output, and has clear physical meaning and engineering boundaries, providing a foundation for subsequent fine-tuning.

[0060] S500: Based on the number of system inputs M and outputs S, construct a tree topology with L layers, where the numbers of nodes in the layers are $N_1, N_2, \ldots, N_L$ and satisfy $N_1 = M$, $N_L = S$ and $\sum_{i=1}^{L} N_i = N$.

[0061] This step decomposes the total number of candidate nodes N into the number of nodes in each layer N1, N2, ..., NL. The number of nodes in the first layer N1 equals the number of input terminals M (e.g., 4 high-speed cameras), and the number of nodes in the last layer NL equals the number of output terminals S (e.g., 1 external hard drive). The sum of the number of nodes in each layer equals the total number of candidate nodes N. This decomposition provides the foundational data for subsequent engineering constraint verification.
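For the small node counts involved, the decomposition in [0060] and [0061] can be enumerated by brute force. The sketch below assumes every intermediate layer holds at least one board; `compositions` and `layer_splits` are hypothetical helper names.

```python
from typing import Iterator, Tuple


def compositions(total: int, parts: int) -> Iterator[Tuple[int, ...]]:
    """Yield every tuple of `parts` positive integers summing to `total`."""
    if parts == 0:
        if total == 0:
            yield ()
        return
    # Leave at least 1 for each of the remaining parts - 1 positions.
    for first in range(1, total - parts + 2):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def layer_splits(n_total: int, layers: int, m: int, s: int) -> Iterator[Tuple[int, ...]]:
    """Yield (N1, ..., NL) with N1 = M, NL = S, each Ni >= 1 and sum(Ni) = N."""
    for middle in compositions(n_total - m - s, layers - 2):
        yield (m,) + middle + (s,)
```

For N = 19, L = 5, M = 4 and S = 1 the three middle layers must share 14 boards, which gives C(13, 2) = 78 candidate splits, one of which is (4, 8, 4, 2, 1).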

[0062] S600: Verify whether the topology corresponding to the total number of candidate nodes N satisfies the following constraints:

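[0063] Capacity constraint: $\sum_{i=1}^{L} N_i\,C_{unit} \ge C_{req}$, where $C_{req} = R_{in}T - [R_{high}t_1 + R_{low}(T - t_1)]$ is the minimum total cache requirement of the system.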

[0064] Port constraint: for any node $j$, the number of input fibers $F_{in}(j)$ plus the number of output fibers $F_{out}(j)$ shall not exceed the maximum number of available fiber ports on a single board $F_{max}$, i.e. $F_{in}(j) + F_{out}(j) \le F_{max}$.

[0065] Link throughput constraint: for any physical link $e$, its actual carried data rate $R_e$ shall not exceed the effective throughput limit $R_{link}$ of a single link, i.e. $R_e \le R_{link}$.

[0066] This step introduces a triple constraint for verification. The capacity constraint ensures that the sum of the total cache capacity of all nodes at each layer is greater than or equal to the system's minimum total cache requirement. This is a fundamental prerequisite for selecting the number of nodes. Port constraints ensure that the number of fiber optic ports on each FPGA development board does not exceed the hardware limit, avoiding connection failures due to insufficient ports. Link throughput constraints ensure that the data transmission rate of each fiber optic link does not exceed the physical limit, avoiding data congestion or loss due to insufficient bandwidth.
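The three S600 checks of [0063] to [0065] can be expressed over a simple list of fiber links between boards. How each link's carried rate is derived from the topology is outside this sketch; the `FiberLink` record, the `satisfies_constraints` function and the single per-link limit are hypothetical modelling choices, and `c_req` is taken to be the minimum total cache requirement from [0051].

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class FiberLink:
    src: str      # hypothetical identifier of the sending board
    dst: str      # hypothetical identifier of the receiving board
    rate: float   # actual data rate carried by the link, MB/s


def satisfies_constraints(layer_sizes: Sequence[int], c_unit: float, c_req: float,
                          links: Iterable[FiberLink], f_max: int,
                          link_limit: float) -> bool:
    """Return True only if the capacity, port and link-throughput checks all pass."""
    links = list(links)
    # Capacity: total on-board cache must cover the minimum total cache requirement.
    capacity_ok = sum(layer_sizes) * c_unit >= c_req
    # Ports: each fiber occupies one port on the board at each end of the link.
    ports: Counter = Counter()
    for link in links:
        ports[link.src] += 1   # an output fiber of the sender
        ports[link.dst] += 1   # an input fiber of the receiver
    ports_ok = all(count <= f_max for count in ports.values())
    # Throughput: no link may carry more than the effective single-link limit.
    links_ok = all(link.rate <= link_limit for link in links)
    return capacity_ok and ports_ok and links_ok
```

Returning a single boolean keeps the sketch short; a practical tool would more likely report which of the three checks failed so the topology can be adjusted.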

[0067] Through this step, the system verifies the engineering feasibility of the total number of candidate nodes, selecting feasible configurations that simultaneously meet capacity requirements, hardware port limitations, and link bandwidth limitations. This verification and selection mechanism ensures that the selected number of nodes is not only theoretically feasible but also can run stably on actual hardware platforms.

[0068] S700: When multiple candidate totals satisfy all of the above constraints, select as the target number of nodes the configuration in which the ratios of node counts in adjacent layers are equal or reciprocals of each other and the total number of nodes is smallest.

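[0069] The ratio of the numbers of nodes in adjacent layers is $N_{i+1}/N_i = k$ or $N_{i+1}/N_i = 1/k$, where $k$ is a positive integer; that is, each ratio is a positive integer or the reciprocal of a positive integer.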

[0070] Preferably, the ratio of the numbers of nodes in adjacent layers is 2 or 1/2.

[0071] This step uses the regularity of inter-layer ratios as an optimization criterion. When the ratios of node counts in adjacent layers are equal or reciprocal to each other, the topology follows a regular expansion and convergence pattern. For example, a ratio of 2 means the next layer has twice as many nodes as the previous one (expansion), and a ratio of 1/2 means it has half as many (convergence). When all inter-layer ratios are the same small integer or its reciprocal, the topology is most regular and the connections between nodes are highly uniform. Among the candidate totals that satisfy the constraints, the configuration with the smallest total number of nodes is selected to minimize hardware cost.

[0072] Through this step, the system can select the optimal configuration for engineering implementation within the feasible range of node counts. The core advantage of choosing a configuration whose adjacent-layer ratios are equal or reciprocal to each other is that it avoids the uneven cache utilization caused by inconsistent ratios within the same layer. Specifically, if multiple ratios exist between two adjacent layers (e.g., some upper-level nodes are mapped 2:1 to lower-level nodes while others are mapped 3:1), the data output by the upper-level nodes is unevenly distributed among the lower-level nodes: under a 2:1 mapping each lower-level node handles more data, and under a 3:1 mapping each handles less. This imbalance can cause some FPGA development boards to fill their caches prematurely while others still have a large amount of idle cache, lowering overall cache utilization. Inconsistent mappings also introduce complex routing logic and address-mapping rules into data scheduling and management, increasing hardware design complexity and the difficulty of timing analysis. Conversely, when the adjacent-layer ratio is uniformly 2 or 1/2, the data transfer relationship between nodes in each layer is fully consistent, the data flow is clear and the load is balanced. This makes it easier to unify the data splitting, merging, channel arbitration and cache management logic on the FPGA side and regularizes the address mapping, reducing the complexity of system debugging and verification. At the same time, choosing the configuration with the minimum total number of nodes minimizes hardware cost while satisfying all constraints.
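The S700 rule of [0068] to [0071] can be sketched with exact rational arithmetic, so ratios such as 1/2 are compared without floating-point error. `is_regular` accepts a split when every adjacent-layer ratio equals $k$ or $1/k$ for a single positive integer $k$; `select_target` keeps the regular, feasible splits and returns the one with the fewest boards. Both names are hypothetical, and the `feasible` callback stands in for the S600 checks.

```python
from fractions import Fraction
from typing import Callable, Iterable, Optional, Sequence, Tuple


def is_regular(layer_sizes: Sequence[int]) -> bool:
    """True if every ratio N_{i+1}/N_i equals k or 1/k for one positive integer k."""
    ratios = [Fraction(b, a) for a, b in zip(layer_sizes, layer_sizes[1:])]
    if not ratios:
        return True
    k = ratios[0] if ratios[0] >= 1 else 1 / ratios[0]
    if k.denominator != 1:
        return False  # first ratio is neither an integer nor an integer's reciprocal
    return all(r == k or r == 1 / k for r in ratios)


def select_target(candidates: Iterable[Tuple[int, ...]],
                  feasible: Callable[[Tuple[int, ...]], bool]) -> Optional[Tuple[int, ...]]:
    """Among feasible, regular layer splits, return the one with the fewest nodes."""
    regular = [c for c in candidates if feasible(c) and is_regular(c)]
    return min(regular, key=sum) if regular else None
```

The preference for a ratio of 2 in [0070] could be added as a secondary sort key when several regular splits share the same total.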

[0073] The following set of specific numerical values fully illustrates the two embodiments described above and demonstrates the practical effect of the method of the present invention.

[0074] Configure system parameters:

[0075] Total system input rate: 4 cameras × 500 MB/s per camera = 2000 MB/s;

[0076] Maximum allowable rate of the final-stage board input interface, R_max (maximum input fiber bandwidth);

[0077] Maximum continuous output rate of the final stage board (Actual write speed of USB 3.0);

[0078] Final stage board degraded output rate

[0079] Effective cache capacity of a single FPGA development board ;

[0080] Target continuous working time T = 75 s;

[0081] Number of system topology layers L = 5;

[0082] Forwarding delay coefficient of a single topology layer β = 0.01 s/layer;

[0083] Node-number influence coefficient γ = 0.02 s/node;

[0084] Step 1: Calculate the range of the number of nodes

[0085] (1) Calculate the minimum number of nodes

[0086] First, determine the upper limit of the input rate of the final stage board:

[0087] ;

[0088] Calculate the minimum propagation delay:

[0089] ;

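[0090] Calculate the rise time. Because the rise time is positively correlated with the maximum number of nodes, which has not yet been determined, a preliminary estimate of the maximum number of nodes is substituted first and then refined iteratively, giving: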

[0091] ;
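The two delay terms can be sketched in Python. Claim 2 states only that the minimum propagation delay depends on β and the rise time on γ; the linear forms t_d = β·L and t_r = γ·N_max used below are assumptions made for illustration, and the function names are hypothetical.

```python
def propagation_delay(beta, num_layers):
    """Minimum propagation delay t_d in seconds, ASSUMING t_d = beta * L."""
    return beta * num_layers


def rise_time(gamma, n_max):
    """Rise time t_r in seconds, ASSUMING t_r = gamma * N_max."""
    return gamma * n_max
```

With β = 0.01 s/layer, L = 5, γ = 0.02 s/node and N_max = 19, these assumed forms give t_d = 0.05 s and t_r = 0.38 s.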

[0092] Solve for the time t1 at which the final-stage board buffer first fills. Filling is found to occur during the constant-rate phase, so t1 is obtained by substituting into the constant-rate-phase formula:

[0093] ;
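One way to model the fill time is sketched below. The model is an assumption for illustration, not the claimed formula: the input rate r(t) is taken to be zero before t_d, to rise linearly to R_f = min(R_in, R_max) over the rise time t_r, and then to stay constant; the buffer drains at up to the highest continuous output rate R_out and is never negative. The symbols R_in, R_out and R_f and the helper name `fill_time` are our own labels; rates are in MB/s, capacities in MB, and times in seconds.

```python
import math


def fill_time(r_in, r_max, r_out, capacity, t_d, t_r):
    """First time (s) at which the final-stage buffer holds `capacity` MB.

    ASSUMED model (illustrative): r(t) = 0 for t < t_d, rises linearly to
    r_f = min(r_in, r_max) over t_r seconds, then stays at r_f. Output drains
    up to r_out MB/s and the buffer level never goes negative.
    """
    r_f = min(r_in, r_max)  # assumed upper limit of the final-stage input rate
    if r_f <= r_out:
        return math.inf  # input never outpaces output, so the buffer never fills
    # Instant at which the ramping input first exceeds the output rate.
    t_star = t_d + t_r * r_out / r_f
    # Buffer level reached at the end of the ramp (triangle area above r_out).
    ramp_fill = t_r * (r_f - r_out) ** 2 / (2 * r_f)
    if t_r > 0 and capacity <= ramp_fill:
        # Fills during the ramp: (r_f / (2 * t_r)) * (t - t_star) ** 2 = capacity.
        return t_star + math.sqrt(2 * capacity * t_r / r_f)
    # Fills during the constant-rate phase.
    return t_d + t_r + (capacity - ramp_fill) / (r_f - r_out)
```

The function first checks whether the buffer could already fill during the ramp and otherwise falls through to the constant-rate branch, mirroring the phase check described in [0092].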

[0094] Calculate the minimum number of nodes:

[0095] ;

[0096] Rounding up gives a minimum of 17 nodes.
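The minimum-node step can be illustrated with a simple storage balance, which is an assumption rather than the claimed formula: the final-stage board writes to disk at R_out until its buffer first fills at t1 and at the degraded rate R_deg afterwards, and whatever the system has acquired but not yet written by time T must fit in N boards of capacity C. `min_nodes` is a hypothetical name, and because the balance counts output at R_out from t = 0 it slightly overstates early disk writes.

```python
import math


def min_nodes(r_in, r_out, r_deg, t1, T, capacity):
    """Lower bound on node count under an ASSUMED storage balance (MB, MB/s, s)."""
    t1 = min(t1, T)  # the buffer may never fill within the working time
    # Data written to disk: r_out until the buffer first fills, r_deg afterwards.
    written = r_out * t1 + r_deg * (T - t1)
    # Everything acquired but not yet written must be held in cache.
    backlog = max(r_in * T - written, 0.0)
    return max(1, math.ceil(backlog / capacity))
```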

[0097] (2) Calculate the maximum number of nodes

[0098] The maximum number of nodes corresponds to the extreme storage condition where the output rate is zero:

[0099] ;

[0100] Rounding up gives a maximum of 19 nodes.
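Under the zero-output extreme of [0098], every byte acquired during T has to stay in cache. The sketch below assumes that the stored volume is simply R_in·T (ignoring any delay terms the claimed formula may include) and also lists the candidate node counts between the two bounds; both function names are hypothetical.

```python
import math


def max_nodes(r_in, T, capacity):
    """Upper bound on node count, ASSUMING zero output: all input over T is cached."""
    return math.ceil(r_in * T / capacity)


def candidate_counts(n_min, n_max):
    """All candidate node totals between the two bounds, inclusive."""
    return list(range(n_min, n_max + 1))
```

Applied to the bounds obtained in this example, `candidate_counts(17, 19)` yields [17, 18, 19], the candidates examined in Step 2.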

[0101] Therefore, the number of system nodes can be selected in the range of [17, 19].

[0102] Step 2: Optimize the number of nodes

[0103] For each of the candidate node counts 17, 18, and 19, the capacity, port, and link-throughput constraints are verified and the regularity of the topology is examined.

[0104] Analysis shows that neither N = 17 nor N = 18 admits a regular topology in which the node ratios of adjacent layers are equal or reciprocal. For N = 19, the five-layer topology (4, 8, 4, 2, 1) can be constructed, with adjacent-layer ratios of 2 or 1/2, and it satisfies all engineering constraints.
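The regularity argument in [0104] can be checked by brute force. The sketch below enumerates every layer-size sequence with N1 = 4 (one board per camera), NL = 1 (one final-stage board), L = 5, and every adjacent ratio equal to 2 or 1/2 (claim 7). `regular_topologies` is a hypothetical helper; the capacity, port, and link constraints are not modelled.

```python
from fractions import Fraction
from itertools import product


def regular_topologies(first, last, num_layers, ratio=2):
    """Enumerate layer sizes N1..NL with N1 = first, NL = last and every
    adjacent ratio equal to `ratio` or 1/`ratio`."""
    steps = (Fraction(ratio), Fraction(1, ratio))
    found = []
    for choice in product(steps, repeat=num_layers - 1):
        sizes = [Fraction(first)]
        for step in choice:
            sizes.append(sizes[-1] * step)
        # Every layer must hold a whole, positive number of boards.
        if sizes[-1] == last and all(s.denominator == 1 for s in sizes):
            found.append(tuple(int(s) for s in sizes))
    return found
```

For `regular_topologies(4, 1, 5)` it returns exactly (4, 8, 4, 2, 1), (4, 2, 4, 2, 1) and (4, 2, 1, 2, 1), whose totals are 19, 13 and 10, so 19 is the only candidate in [17, 19] with a regular topology of this kind.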

[0105] Therefore, N=19 is selected as the target number of nodes.

[0106] Ultimately, the system uses 19 FPGA development boards deployed in a five-layer topology (4, 8, 4, 2, 1). The data input layer consists of four boards connected to four high-speed cameras; the distributed buffer layer comprises eight boards in the second layer and four boards in the third layer; the aggregation layer consists of two boards in the fourth layer; and the final output layer consists of one board connected to a portable hard drive via USB 3.0. All FPGA development boards are interconnected via optical fiber, forming the tree-like topology shown in Figure 2.

[0107] Furthermore, although the steps of the method in this disclosure are described in a specific order in the accompanying drawings, this does not require or imply that the steps must be performed in that order, or that all of the steps shown must be performed to achieve the desired result. Additionally or alternatively, some steps may be omitted, multiple steps may be combined into one step, and/or a step may be broken down into multiple steps.

[0108] From the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein can be implemented by software or by combining software with necessary hardware. Therefore, the technical solutions according to the embodiments of this disclosure can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive, external hard drive, etc.) or on a network, including several instructions to cause a computing device (such as a personal computer, server, mobile terminal, or network device, etc.) to execute the methods according to the embodiments of this disclosure.

[0109] In an exemplary embodiment of this disclosure, an electronic device capable of implementing the above-described method is also provided.

[0110] Those skilled in the art will understand that various aspects of the present invention can be implemented as systems, methods, or program products. Therefore, various aspects of the present invention can be specifically implemented in the following forms: entirely in hardware, entirely in software (including firmware, microcode, etc.), or in a combination of hardware and software, collectively referred to herein as “circuit,” “module,” or “system.”

[0111] An electronic device according to this embodiment of the invention is described below. The electronic device is merely an example and should not be construed as limiting the functionality or scope of the embodiments of the invention.

[0112] The electronic device takes the form of a general-purpose computing device. Its components may include, but are not limited to, at least one processor, at least one memory, and a bus connecting the different system components (including the memory and the processor).

[0113] The memory stores program code that can be executed by a processor, causing the processor to perform the steps described in the "Exemplary Methods" section above, according to various exemplary embodiments of the present invention.

[0114] The storage may include readable media in the form of volatile storage, such as random access memory (RAM) and/or cache memory, and may further include read-only memory (ROM).

[0115] The storage may also include a program/utility having a set (at least one) of program modules, including but not limited to an operating system, one or more applications, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.

[0116] The bus can represent one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, or a processor or local bus using any of a variety of bus architectures.

[0117] The electronic device can also communicate with one or more external devices (e.g., keyboards, pointing devices, Bluetooth devices, etc.), one or more devices that enable a user to interact with the electronic device, and/or any device that enables the electronic device to communicate with one or more other computing devices (e.g., routers, modems, etc.). This communication can be performed via input/output (I/O) interfaces. Furthermore, the electronic device can communicate with one or more networks (e.g., local area networks (LANs), wide area networks (WANs), and/or public networks, such as the Internet) via a network adapter. The network adapter communicates with other modules of the electronic device via a bus. It should be understood that, although not shown in the figures, other hardware and/or software modules can be used in conjunction with the electronic device, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

[0118] In exemplary embodiments of this disclosure, a computer-readable storage medium is also provided, on which a program product capable of implementing the methods described above is stored. In some possible embodiments, various aspects of the present invention may also be implemented as a program product comprising program code that, when the program product is run on a terminal device, causes the terminal device to perform the steps of the various exemplary embodiments of the present invention described in the "Exemplary Methods" section above.

[0119] The program product may employ any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of readable storage media (a non-exhaustive list) include: electrical connections having one or more wires, portable disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof.

[0120] Computer-readable signal media may include data signals propagated in baseband or as part of a carrier wave, carrying readable program code. Such propagated data signals may take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A readable signal medium may also be any readable medium other than a readable storage medium, capable of sending, propagating, or transmitting programs for use by or in conjunction with an instruction execution system, apparatus, or device.

[0121] The program code contained on the readable medium may be transmitted using any suitable medium, including but not limited to wireless, wired, optical fiber, RF, etc., or any suitable combination thereof.

[0122] Program code for performing the operations of this invention can be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as C or similar languages. The program code can execute entirely on the user's computing device, partially on the user's device, as a standalone software package, partially on the user's computing device and partially on a remote computing device, or entirely on a remote computing device or server. In cases involving remote computing devices, the remote computing device can be connected to the user's computing device via any type of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computing device (e.g., via the Internet using an Internet service provider).

[0123] Furthermore, the accompanying drawings are merely illustrative of the processes included in the method according to exemplary embodiments of the present invention and are not intended to be limiting. It is readily understood that the processes shown in the above drawings do not indicate or limit the temporal order of these processes. Additionally, it is readily understood that these processes may be executed synchronously or asynchronously, for example, in multiple modules.

[0124] It should be noted that although several modules or units for the device used to perform actions have been mentioned in the detailed description above, this division is not mandatory. In fact, according to embodiments of this disclosure, the features and functions of two or more modules or units described above can be embodied in one module or unit. Conversely, the features and functions of one module or unit described above can be further divided and embodied by multiple modules or units.

[0125] The above are merely specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in the present invention should be included within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.

Claims

1. A method for determining the number of nodes in a multi-FPGA buffer system for high-speed image acquisition, characterized in that the system adopts a tree-like topology consisting of multiple FPGA development boards, including a data input layer, a distributed cache layer, a convergence layer, and a final output layer; the FPGA development board connected to the mobile hard drive is the final-stage board and is connected to the mobile hard drive via the USB 3.0 protocol; the other FPGA development boards are interconnected via optical fiber, and the transmission rate of the optical fiber is greater than that of USB 3.0.

The method includes the following steps: based on the maximum permissible rate R_max of the final-stage board input interface, …, generating the maximum number of nodes corresponding to the system's maximum cache requirement; generating the time t1 at which the final-stage board buffer is first filled, where t1 is determined from the function r(t) describing how the final-stage board input rate varies with time and satisfies: ; where … represents the highest continuous output rate of the final-stage board, … the effective cache capacity of a single FPGA development board, and … the minimum propagation delay of the system, which characterizes the shortest time required for data to travel from the system input to the final-stage board and is positively correlated with the number of system topology layers L; r(t) satisfies: ; where … is the total system input rate, and the rise time characterizes the time required for the final-stage board input rate to increase from zero to … and is positively correlated with the maximum number of nodes of the system; and determining the minimum number of nodes of the system based on the system target continuous working time, the degraded output rate of the final-stage board, …, and …, the minimum number of nodes satisfying: ; where .

2. The method according to claim 1, characterized in that the minimum propagation delay satisfies: ; where β is the preset single-topology-layer forwarding delay coefficient; and the rise time satisfies: ; where γ is the preset node-number influence coefficient.

3. The method according to claim 1, characterized in that, The data input layer is equipped with four independent FPGA development boards, which are connected to four high-speed cameras via optical fibers to transmit data to the system. There is one final stage board. The data input layer, aggregation layer and final output layer are all composed of a single layer, while the distributed cache layer is composed of two layers.

4. The method according to claim 1, characterized in that the highest continuous output rate of the final-stage board is the actual continuous write rate of the USB 3.0 interface, and the degraded output rate is the preset speed threshold to which the output rate is reduced after the final-stage board buffer overflows.

5. The method according to claim 1, characterized in that, The method further includes: according to and ,generate ; satisfy: .

6. The method according to claim 1, characterized in that a candidate total number of nodes N is selected within the range between the minimum and maximum numbers of nodes; based on the number of system inputs M and outputs S, a tree topology with L layers is constructed, in which the numbers of nodes in the layers are N1, N2, ..., NL, with N1 = M and NL = S; it is verified whether the topology corresponding to the candidate total N satisfies the following constraints: capacity constraint: , where ; port constraint: for any node j, the sum of its number of input fibers and its number of output fibers shall not exceed the maximum number of fiber ports available on a single board; link throughput constraint: for any physical link e, its actual data-carrying rate shall not exceed the effective throughput limit of a single link; when multiple candidate totals satisfy all of the above constraints, the configuration in which the node ratios of adjacent layers are equal or reciprocal and the total number of nodes is smallest is selected as the target number of nodes, where the ratio of the numbers of nodes in adjacent layers is … or …, and the ratio is a positive integer or the reciprocal of a positive integer.

7. The method according to claim 6, characterized in that the ratio of the numbers of nodes in adjacent layers is either 2 or 1/2.

8. The method according to claim 6, characterized in that the optical fiber has a transmission rate of 500 MB/s, while USB 3.0 has a transmission rate of 300 MB/s.

9. A non-transitory computer-readable storage medium storing a computer program, characterized in that, When the computer program is executed by the processor, it implements the method for determining the number of nodes in a multi-FPGA cache system for high-speed image acquisition as described in any one of claims 1 to 8.

10. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, When the processor executes the computer program, it implements the method for determining the number of nodes in a multi-FPGA cache system for high-speed image acquisition as described in any one of claims 1 to 8.