A hierarchical storage method and system for intermediate data in optical detection image processing
The method uses a hierarchical storage architecture and a data dependency topology graph to identify intermediate data types, monitors storage layer status in real time, calculates priority scores, and stores data in the appropriate layer. This addresses the wasteful occupation of storage resources and the access latency found in optical inspection systems, thereby improving system performance and stability.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents(China)
- Current Assignee / Owner: GUANGDONG SANENSHI TECH CO LTD
- Filing Date: 2026-04-02
- Publication Date: 2026-06-30
Abstract
Description
Technical Field
[0001] This application relates to the technical field of data storage, and in particular to a hierarchical storage method and system for intermediate data of optical detection image processing.
Background Technology
[0002] With the accelerating pace of modern industrial automation and the development of the semiconductor, precision electronic component, and high-precision optical component manufacturing industries, automated optical inspection (AOI) technology has become a key link in ensuring product quality and production yield. To meet detection accuracy requirements at the micron and even nanometer level, existing optical inspection systems generally adopt high-resolution, high-frame-rate imaging modules, which continuously generate massive streams of raw image data during operation. In the image processing pipeline, the raw data is processed serially by multiple operators, which generates a large amount of intermediate data. This intermediate data exhibits significant heterogeneity in scale, access frequency, and lifecycle: some data is accessed frequently but only within a very short time window, some data must maintain cross-frame correlation to ensure contextual continuity, and other data exists only momentarily and is accessed very infrequently.
[0003] For the above scenarios, existing technologies generally adopt a flat storage architecture based on general-purpose memory or video memory to manage such intermediate data. However, this architecture has fundamental defects. First, it cannot readily identify the heterogeneity of intermediate data, so high-speed storage resources are wastefully occupied by data with long lifecycles but low access frequency, while frequently accessed critical data suffers access delays due to resource competition. Second, it lacks the ability to dynamically perceive the data dependencies in the processing flow and cannot implement differentiated storage scheduling strategies based on the real-time value and urgency of the data, so data placement decisions are untargeted. This blind placement is very likely to cause cache jitter and bus congestion, leaving computing units idle for long periods because the data supply is interrupted. Ultimately, this seriously restricts the throughput, energy efficiency, and long-term operational stability of the detection system, making it difficult to meet the urgent needs of modern high-precision manufacturing environments for real-time processing and reliability.
Summary of the Invention
[0004] To address the aforementioned shortcomings, this application provides a hierarchical storage method and system for intermediate data in optical detection image processing.
[0005] The above-mentioned objective of this application is achieved through the following technical solution:
[0006] A method for hierarchical storage of intermediate data in optical detection image processing includes the following steps:
[0007] In response to an image processing task, a corresponding processing flow is obtained, the processing flow including a number of image processing operators executed in sequence and the data dependencies between the image processing operators;
[0008] A data dependency topology graph is generated based on the processing flow, and intermediate data and its data types are identified. The data types include transient local data, window reuse data, cross-frame associated data, and final state result data.
[0009] During the execution of image processing tasks, the remaining capacity and access frequency of each layer in the pre-built hierarchical architecture are monitored in real time. The hierarchical architecture includes a register storage layer, a cache layer, a dynamic storage layer, and a persistent storage layer.
[0010] Based on the monitored remaining capacity and access frequency, the hierarchical priority score of intermediate data is calculated by combining the data dependency topology graph.
[0011] Based on the hierarchical priority score, intermediate data is mapped and stored to the corresponding target storage layer. When the hierarchical priority is higher than the preset first threshold, the corresponding intermediate data is stored in the register storage layer or the cache layer and is prohibited from being replaced during its lifetime.
[0012] The second objective of this invention is achieved through the following technical solution:
[0013] A hierarchical storage system for intermediate data in optical detection image processing includes:
[0014] The process acquisition module is used to acquire the corresponding processing flow in response to the image processing task. The processing flow includes a number of image processing operators executed in sequence and the data dependencies between the image processing operators.
[0015] The topology graph generation module is used to generate a data dependency topology graph based on the processing flow and to identify intermediate data and its data types. The data types include transient local data, window reuse data, cross-frame correlation data, and final state result data.
[0016] The real-time monitoring module is used to monitor the remaining capacity and access frequency of each layer in the pre-built hierarchical architecture during the execution of image processing tasks. The hierarchical architecture includes a register storage layer, a cache layer, a dynamic storage layer, and a persistent storage layer.
[0017] The score calculation module is used to calculate the hierarchical priority score of intermediate data based on the monitored remaining capacity and access popularity, combined with the data dependency topology graph.
[0018] The execution module is used to map and store intermediate data to the corresponding target storage layer according to the hierarchical priority score. When the hierarchical priority is higher than the preset first threshold, the corresponding intermediate data is stored in the register storage layer or the cache layer and replacement is prohibited during its lifetime.
[0019] This application also relates to a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the above-described method for hierarchical storage of intermediate data for optical detection image processing.
[0020] This application also relates to a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the above-described method for hierarchical storage of intermediate data in optical detection image processing.
[0021] In summary, the hierarchical storage method and system for intermediate data in optical detection image processing provided in this application obtains the processing flow in response to the image processing task, generates a data dependency topology graph and identifies data types, monitors the hierarchical architecture status in real time, calculates the hierarchical priority score, and maps and stores the data to the target layer. This can effectively solve the problems of invalid storage resource occupation and high access latency in the prior art, and has the effects of improving processing efficiency, reducing data access latency, optimizing storage resource utilization, and improving system throughput and stability.
Attached Figure Description
[0022] Figure 1 is a flowchart of the steps of an embodiment of the hierarchical storage method for intermediate data in optical detection image processing according to this application;
[0023] Figure 2 is an example diagram of the processing flow in an embodiment of the hierarchical storage method for intermediate data in optical detection image processing according to this application;
[0024] Figure 3 is a schematic diagram of the high-resolution locking storage process of intermediate data in an embodiment of the hierarchical storage method for intermediate data in optical detection image processing according to this application.
Detailed Implementation
[0025] This application is described in further detail below with reference to Figures 1-3.
[0026] In traditional optical inspection image processing systems, intermediate data exhibits significant heterogeneity in terms of scale, access frequency, and lifecycle. Existing flat storage architectures fail to effectively differentiate these characteristics, leading to high-speed storage resources being ineffectively occupied by infrequently accessed, long-cycle data, while frequently accessed, short-cycle data faces access latency bottlenecks. Furthermore, existing system architectures lack the ability to perceive the inherent data dependencies within the processing flow, and cannot implement differentiated storage scheduling based on the real-time value and urgency of the data. This results in blind data placement decisions, easily causing frequent cache jitter and bus congestion. Consequently, computing units remain idle due to data supply stagnation, negatively impacting system throughput, energy efficiency, and continuous operational stability.
[0027] For example, in the scenario of semiconductor wafer surface defect detection, the image processing pipeline sequentially executes operators such as noise suppression, edge enhancement, feature matching, and defect determination. Among them, the intermediate data output by the noise suppression operator is transient local data, which has a short lifespan and low access frequency; the sliding window feature data generated by the edge enhancement operator is window reuse data, which needs to be accessed frequently in multiple subsequent operators; the cross-frame motion compensation data generated by the feature matching operator is cross-frame correlated data, which depends on the correlation information of adjacent frames. Under the existing storage architecture, transient local data may be incorrectly retained in the cache layer, occupying important resources, while window reuse data may be frequently replaced due to insufficient cache space, resulting in repeated calculation overhead. The transmission of cross-frame correlated data may be delayed due to bus congestion, causing the defect determination operator to wait for data input, resulting in a decrease in the utilization of computing units and interruption of the detection process.
[0028] If the above problems are not addressed, the data supply stagnation will intensify as the optical inspection image processing system runs longer, the idle time of computing resources will continue to increase, and the system throughput will show a downward trend. At the same time, the cumulative effect of cache jitter and bus congestion may cause the storage subsystem to time out, leading to task execution failure. In continuous high-precision inspection scenarios, these problems will continue to reduce the reliability of the system, making it impossible to guarantee the integrity and timeliness of the inspection results, ultimately affecting the effectiveness of product quality control.
[0029] In response, this application discloses a hierarchical storage method for intermediate data in optical detection image processing. In one embodiment, as shown in Figure 1, the method includes the following steps:
[0030] S10: In response to an image processing task, obtain the corresponding processing flow, which includes a number of image processing operators executed in sequence and the data dependencies between the image processing operators.
[0031] In this embodiment, intermediate data in optical inspection image processing refers to results that image processing operators generate from the original image data in the optical inspection system and that are not final results, but are needed by subsequent processing. Intermediate data may include filtering results, feature point sets, region segmentation masks, etc. An image processing task refers to a set of image processing operations performed in the field of optical inspection to achieve a specific detection target, such as defect detection, dimensional measurement, and character recognition. The processing flow refers to the predefined or dynamically generated execution order of image processing operators and the data transmission paths between them to complete the image processing task. An image processing operator refers to a basic functional unit in the field of image processing that performs specific image operations, such as image filtering, edge detection, feature extraction, and image transformation. Data dependency refers to the logical relationship that arises between image processing operators when the output data of one operator is used as the input data of one or more other operators in the processing flow.
[0032] Specifically, in response to an image processing task, the corresponding processing flow is obtained. This processing flow includes several image processing operators executed in sequence and the data dependencies between the image processing operators. In practical applications, the processing flow can be obtained in various ways. For example, the system administrator can manually configure and input the sequence of image processing operators and their data flow according to specific detection requirements. Another way is to load a standard processing flow that matches the current image processing task type from a predefined template library.
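As an illustration of the template-library approach described above, the following is a minimal Python sketch rather than the implementation of this application. The class names `Operator` and `ProcessingFlow`, the `TEMPLATE_LIBRARY` dictionary, and the data item names are all hypothetical. The operator sequence reuses the Gaussian filtering, Canny, HOG, and SVM example given later in this description. Data dependencies are derived by matching each operator's inputs against the outputs of earlier operators.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Operator:
    """One image processing operator and the named data it reads and writes."""
    name: str
    inputs: tuple
    outputs: tuple


@dataclass
class ProcessingFlow:
    """Operators in execution order; dependencies follow from shared data names."""
    task_type: str
    operators: list = field(default_factory=list)

    def dependencies(self):
        """Return (producer, consumer) operator pairs in execution order."""
        producer_of = {out: op.name for op in self.operators for out in op.outputs}
        return [
            (producer_of[item], op.name)
            for op in self.operators
            for item in op.inputs
            if item in producer_of
        ]


# Hypothetical template library keyed by task type.
TEMPLATE_LIBRARY = {
    "defect_detection": ProcessingFlow(
        "defect_detection",
        [
            Operator("gaussian_filter", ("raw_image",), ("filtered_image",)),
            Operator("canny_edge", ("filtered_image",), ("edge_map",)),
            Operator("hog_features", ("edge_map",), ("hog_vector",)),
            Operator("svm_classify", ("hog_vector",), ("defect_label",)),
        ],
    ),
}


def get_processing_flow(task_type):
    """Load the predefined flow that matches the task type."""
    return TEMPLATE_LIBRARY[task_type]
```

Calling `get_processing_flow("defect_detection").dependencies()` yields the operator-to-operator pairs that a later step can turn into directed edges.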
[0033] S20: Generate a data dependency topology graph based on the processing flow, and identify intermediate data and its data types, including transient local data, window reuse data, cross-frame associated data, and final state result data;
[0034] In this embodiment, the data dependency topology graph refers to a graphical representation used to describe the various image processing operators and their data dependencies in an image processing task. Nodes represent operators or intermediate data, and edges represent data flow directions. Intermediate data specifically refers to data generated by one image processing operator and consumed by subsequent image processing operators in the image processing flow, excluding original input data and final output data. Data types include transient local data, window reused data, cross-frame associated data, and final result data. Transient local data refers to temporary data with a very short lifespan and low access frequency. Window reused data refers to data that is frequently accessed and reused in sliding window or local region processing. Cross-frame associated data refers to data that needs to be associated or accumulated across different frames in multi-frame image processing. Final result data refers to data that serves as a stage result or final output after a certain sub-processing flow is completed.
[0035] Specifically, based on the acquired processing flow, a data dependency topology graph is generated, and intermediate data and its data types are identified. These data types include transient local data, window reuse data, cross-frame correlated data, and final state result data. When generating the topology graph, a static analysis of the processing flow can be performed, treating each image processing operator as a node and representing the input-output relationship between image processing operators as directed edges, thus constructing a preliminary topology structure. For intermediate data identification, all data that is neither the original input nor the final output can be labeled as intermediate data. Regarding data type identification, intermediate data can be initially classified as general temporary data or final output data based on its position in the topology graph and its relationship with adjacent image processing operators. For example, if the output of an image processing operator is only consumed by the next image processing operator and not reused by other image processing operators, it can be initially labeled as temporary data; if the output of an image processing operator is the final result of the entire process, it can be labeled as final state result data.
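The static analysis described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the `FLOW` list, the function names, and the label strings are hypothetical, and "consumed only by the next operator" is approximated as "has exactly one consumer". Outputs with several consumers are left as `"unclassified"` because the preliminary classification in this paragraph does not assign them a type.

```python
from collections import defaultdict

# Hypothetical flow: (operator name, input data names, output data names).
FLOW = [
    ("gaussian_filter", ["raw_image"], ["filtered_image"]),
    ("canny_edge", ["filtered_image"], ["edge_map"]),
    ("hog_features", ["edge_map"], ["hog_vector"]),
    ("svm_classify", ["hog_vector"], ["defect_label"]),
]


def build_topology(flow):
    """Return producer map, consumer map, and directed operator edges."""
    producer = {}
    consumers = defaultdict(list)
    for op, inputs, outputs in flow:
        for item in inputs:
            consumers[item].append(op)
        for item in outputs:
            producer[item] = op
    edges = [
        (producer[item], consumer)
        for item, users in consumers.items()
        if item in producer  # original inputs have no producer and add no edge
        for consumer in users
    ]
    return producer, dict(consumers), edges


def label_outputs(flow):
    """Preliminary label for every operator output (original inputs are skipped)."""
    producer, consumers, _ = build_topology(flow)
    labels = {}
    for item in producer:
        users = consumers.get(item, [])
        if not users:
            labels[item] = "final_state_result"  # nothing consumes it
        elif len(users) == 1:
            labels[item] = "temporary"  # single consumer, not reused
        else:
            labels[item] = "unclassified"  # left for a later classification step
    return labels
```

In this sketch every item labelled `"temporary"` or `"unclassified"` is intermediate data, since it is neither an original input nor a final output.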
[0036] S30: During the execution of the image processing task, the remaining capacity and access frequency of each layer in the pre-built hierarchical architecture are monitored in real time. The hierarchical architecture includes a register storage layer, a cache layer, a dynamic storage layer, and a persistent storage layer.
[0037] In this embodiment, the pre-built hierarchical architecture refers to a pre-constructed multi-tiered storage system structure, designed to store data in storage media with different performance characteristics based on data access speed, capacity, cost, and other characteristics. The hierarchical architecture includes a register storage layer, a cache layer, a dynamic storage layer, and a persistent storage layer. Moving from the persistent storage layer up to the register storage layer, access latency decreases, capacity shrinks, and cost per unit of capacity rises. The register storage layer refers to the highest-speed storage unit located inside the processor, which has extremely low access latency but typically has very limited capacity. The cache layer refers to a small-capacity, high-speed memory located between the processor and main memory, used to store data frequently accessed by the processor to reduce access time to main memory. The dynamic storage layer typically refers to main memory, which has relatively large capacity and moderate access speed. The persistent storage layer typically refers to hard disk drives or solid-state drives, which have the largest capacity and the lowest access speed and are used for long-term data storage.
[0038] Specifically, during the execution of image processing tasks, the remaining capacity and access frequency of each layer in the pre-built hierarchical architecture are monitored in real time. This hierarchical architecture includes a register storage layer, a cache layer, a dynamic storage layer, and a persistent storage layer. To achieve real-time monitoring, a periodic query method can be adopted, with a pre-set monitoring module periodically querying each storage layer for its current available storage space. For monitoring access frequency, a preset counter can be incremented when data is accessed. This counter records the number of times the data is accessed within a certain time window.
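The periodic query and the windowed counter described above might look like the following Python sketch. The class names `AccessCounter` and `LayerMonitor` are hypothetical, and timestamps are passed in explicitly, which is an assumption made so the behaviour is deterministic. A real monitor would read hardware or runtime counters instead of the in-memory bookkeeping used here.

```python
from collections import deque


class AccessCounter:
    """Counts accesses to one data item within a sliding time window."""

    def __init__(self, window_s):
        self.window_s = window_s
        self._stamps = deque()

    def record(self, now):
        """Register one access at time `now` (seconds)."""
        self._stamps.append(now)

    def count(self, now):
        """Number of recorded accesses in the interval (now - window_s, now]."""
        while self._stamps and self._stamps[0] <= now - self.window_s:
            self._stamps.popleft()
        return len(self._stamps)


class LayerMonitor:
    """Tracks used bytes per storage layer and reports remaining capacity."""

    def __init__(self, capacities):
        self.capacities = dict(capacities)
        self.used = {name: 0 for name in capacities}

    def allocate(self, layer, nbytes):
        """Reserve space in a layer, failing if it does not fit."""
        if nbytes > self.capacities[layer] - self.used[layer]:
            raise MemoryError(f"{layer} cannot hold {nbytes} more bytes")
        self.used[layer] += nbytes

    def poll(self):
        """One periodic query: remaining capacity of every layer."""
        return {name: self.capacities[name] - self.used[name] for name in self.capacities}
```

A scheduler could call `poll()` on a timer and `record()` on every read, then feed both values into the scoring step.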
[0039] S40: Based on the monitored remaining capacity and access popularity, calculate the hierarchical priority score of intermediate data by combining the data dependency topology graph;
[0040] In this embodiment, remaining capacity refers to the amount of storage space currently available in each storage layer of the hierarchical architecture; access popularity, used here interchangeably with access frequency, refers to the frequency or intensity of access to a given piece of data or storage area within a specific time period; and hierarchical priority score refers to a value calculated based on multiple attributes of intermediate data and the real-time status of the storage system, used to guide the decision on the storage location of intermediate data in the hierarchical architecture.
[0041] Specifically, based on the monitored remaining capacity and access frequency, and combined with the data dependency topology graph, the hierarchical priority score of intermediate data is calculated. The calculation of the hierarchical priority score can adopt a weighted summation model. For example, fixed weight coefficients can be set for the estimated access frequency, remaining lifespan, and data size of the intermediate data. Then, these attribute values are multiplied by the corresponding weight coefficients and summed to obtain the preliminary priority score. The calculation of the hierarchical priority score can realize the allocation of quantified priority to each piece of intermediate data according to the inherent attributes of the data and the real-time changes of the storage environment, so as to determine its storage location.
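The weighted summation model mentioned above can be illustrated with a short Python sketch. The weight values, the sign of each weight, and the assumption that all three attributes are normalized to the range 0 to 1 are illustrative choices, not values given in this application. In particular, the negative weight on data size reflects the assumption that larger items are less suited to small, fast layers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    """Fixed weight coefficients (hypothetical values)."""
    access_frequency: float = 0.5
    remaining_lifespan: float = 0.3
    data_size: float = -0.2  # assumption: larger data lowers the score


def priority_score(access_frequency, remaining_lifespan, data_size, weights=Weights()):
    """Weighted sum of attributes that are assumed to be normalized to [0, 1]."""
    return (
        weights.access_frequency * access_frequency
        + weights.remaining_lifespan * remaining_lifespan
        + weights.data_size * data_size
    )
```

With these example weights, a frequently accessed, medium-lived, small item scores higher than a rarely accessed large one, which is the ordering the placement step relies on.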
[0042] S50: Based on the hierarchical priority score, the intermediate data is mapped and stored to the corresponding target storage layer. When the hierarchical priority is higher than the preset first threshold, the corresponding intermediate data is stored in the register storage layer or the cache layer and is prohibited from being replaced during its lifetime.
[0043] In this embodiment, lifecycle refers to the time period from when intermediate data is created until it is no longer needed by any subsequent processing operator and can be destroyed or replaced; replacement refers to the operation of moving some data out of the storage layer when the storage layer capacity is insufficient, so as to make room for new data.
[0044] Specifically, several priority score ranges can be preset, each corresponding to a target storage layer. When the priority score of an intermediate data is calculated, the priority score range to which the priority score belongs is found, thereby determining its initial target storage layer. For intermediate data with a priority score higher than the first threshold, it is attempted to store it in the register storage layer or the cache layer. Once the intermediate data is placed in the register storage layer or the cache layer, a locking flag is set for it, indicating that the intermediate data should not be replaced by other data during its data lifecycle, in order to ensure its continuous high-speed access. For example, when an intermediate data is determined to be of high priority, it is directly attempted to write it to the cache layer and marked as non-replaceable until the intermediate data is no longer needed.
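A minimal Python sketch of the range lookup and lock flag described above follows. The range boundaries in `BOUNDS`, the value of `FIRST_THRESHOLD`, and the `TieredStore` class are hypothetical. The sketch only records placement decisions and does not move any data.

```python
import bisect

# Layers ordered from slowest to fastest.
LAYERS = ["persistent", "dynamic", "cache", "register"]
# Hypothetical upper bounds (inclusive) of the first three score ranges.
BOUNDS = [0.25, 0.5, 0.9]
# Hypothetical first threshold; scores above it go to cache or register.
FIRST_THRESHOLD = 0.5


def target_layer(score):
    """Find the score range the score falls into and return its layer."""
    return LAYERS[bisect.bisect_left(BOUNDS, score)]


class TieredStore:
    """Records where each data item is placed and which items are locked."""

    def __init__(self):
        self.placement = {}
        self.locked = set()

    def place(self, data_id, score):
        layer = target_layer(score)
        self.placement[data_id] = layer
        if score > FIRST_THRESHOLD:
            self.locked.add(data_id)  # lock flag: no replacement during lifecycle
        return layer

    def can_evict(self, data_id):
        return data_id not in self.locked

    def release(self, data_id):
        """Called when no later operator needs the item: lifecycle has ended."""
        self.locked.discard(data_id)
        self.placement.pop(data_id, None)
```

Because `BOUNDS[1]` equals `FIRST_THRESHOLD` and `bisect_left` treats each bound as inclusive, every score above the threshold lands in the cache or register layer, so the lock flag is only ever set on items in those two layers.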
[0045] For example, suppose an optical inspection system needs to perform defect detection on product images on a production line. This inspection task includes a series of image processing operators such as image preprocessing, edge detection, feature extraction, and defect classification.
[0046] First, as shown in Figure 2, when a defect detection image processing task is received, a predefined processing flow is obtained in response to the image processing task. This processing flow includes a Gaussian filtering operator, a Canny edge detection operator, a HOG feature extraction operator, an SVM classification operator, and so on, and the data dependencies between these image processing operators are made explicit. For example, the output of Gaussian filtering is the input of Canny edge detection.
[0047] Furthermore, a data dependency topology graph is generated based on the acquired processing flow. In this graph, each image processing operator and its intermediate data are represented as nodes, such as Gaussian-filtered images, Canny edge maps, and HOG feature vectors. The data flow direction is represented as edges. At the same time, the data type of the intermediate data is identified. For example, Gaussian-filtered images may be initially identified as general temporary data, while HOG feature vectors may be initially identified as final result data.
[0048] Furthermore, during the execution of image processing tasks, the status of each storage layer in the hierarchical architecture is continuously monitored. Specifically, this includes monitoring the current remaining capacity of the register storage layer, cache layer, dynamic storage layer, and persistent storage layer, as well as the access frequency of data in each storage layer.
[0049] Furthermore, based on real-time monitoring data and information from the data dependency topology graph, a corresponding hierarchical priority score is calculated for each intermediate data. For example, for the edge map generated by the Canny edge detection operator, since the subsequent feature extraction operator will immediately use the edge map, its estimated access frequency may be very high. However, since the edge map will be consumed soon, its remaining lifespan may be short. At the same time, the weight coefficients used to calculate the priority score are dynamically adjusted based on its data scale, the remaining capacity of the current cache layer, and the overall access heat of the system.
[0050] Finally, based on the calculated hierarchical priority score, the intermediate data is mapped and stored in the corresponding target storage layer. For example, if the hierarchical priority score of the Canny edge map is very high and exceeds a preset first threshold, it is attempted to store it in the register storage layer or the cache layer. Once successfully stored, a lock record is created to indicate that it will not be evicted during the entire lifecycle of the Canny edge map, even if the cache layer is short of space. When the HOG feature extraction operator needs to access the Canny edge map, it can directly obtain it from the high-speed, low-latency storage layer, avoiding the latency caused by loading data from the slower dynamic storage layer or even the persistent storage layer, thereby improving the execution efficiency and throughput of the entire defect detection task.
[0051] In this way, the solution of this embodiment can place intermediate data in the most suitable storage layer according to the real-time value of intermediate data and the dynamic changes of the storage environment. This can effectively solve the problem of high-speed resources being ineffectively occupied by low-frequency data and high-frequency data facing latency bottlenecks in traditional flat storage architectures. It also avoids cache jitter and bus congestion caused by blind data placement, ensuring the data supply of computing units, thereby improving the overall performance and stability of the system.
[0052] Furthermore, the solution in this embodiment can further guarantee the access performance of critical intermediate data by prohibiting the replacement of high-priority data during the data lifecycle. This enables the system to perform differentiated storage scheduling based on the real-time value and urgency of the data, thereby overcoming the problem of low storage efficiency caused by the lack of perception of the inherent data dependencies in the processing flow in the prior art.
[0053] In one embodiment, step S20 includes:
[0054] S21: Match historical execution records according to the processing flow and extract historical data features, including data size, life cycle, access frequency and data type tags;
[0055] In this embodiment, step S21 aims to obtain the behavioral patterns and attribute information of intermediate data by analyzing historical data of past executions of the same or similar processing flows. Step S21 can be achieved by recording the execution process of each image processing task in detail during system operation, including the generation, consumption, and destruction time points, size, and number of times each piece of intermediate data is accessed. These records can be stored in corresponding log files or databases and queried and matched when needed. Alternatively, a machine learning model can be used to learn the mapping relationship between processing flows and intermediate data features by training on a large amount of historical execution data. When a new processing flow is performed, the machine learning model can predict the historical characteristics of the intermediate data that may be generated, thereby reducing or even avoiding the overhead of real-time recording.
[0056] S22: Generate a preliminary topology graph based on the processing flow and historical execution records, and associate historical data features as attributes with the corresponding intermediate data nodes in the preliminary topology graph to generate a data dependency topology graph.
[0057] In this embodiment, step S22 aims to combine the static structural information of the processing flow with the dynamic behavioral features extracted from historical execution records to construct a data dependency topology graph. Historical data features are associated with intermediate data nodes as attributes, so that the data dependency topology graph not only represents the data flow direction but also includes the key runtime attributes of each data node. Step S22 can construct a preliminary topology graph structure based on the image processing operators and their data dependencies in the processing flow, where nodes represent intermediate data and edges represent data flow directions. Furthermore, historical execution records are traversed, and the historical data features of the matched intermediate data are appended as metadata to the corresponding intermediate data nodes in the preliminary topology graph. Alternatively, a graph database can be used to store and manage the data dependency topology graph. When generating the preliminary topology graph, intermediate data nodes and operator nodes are treated as entities in the preliminary topology graph, and data dependencies are treated as edges. Furthermore, historical data features extracted from historical execution records can be directly added as attributes to the corresponding intermediate data nodes, thereby enriching the information in the preliminary topology graph.
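Attaching historical data features to intermediate data nodes, as described above, could be sketched as follows in Python. The record layout in `HISTORY`, the field names, and the choice to average multiple historical runs are assumptions made for illustration. The application only states that the features are appended to the nodes as attributes or metadata.

```python
from statistics import mean

# Hypothetical historical execution records, one row per observed data item per run.
HISTORY = [
    {"data": "edge_map", "size_bytes": 4_000_000, "lifecycle_ms": 40.0, "accesses_per_s": 120.0},
    {"data": "edge_map", "size_bytes": 4_000_000, "lifecycle_ms": 60.0, "accesses_per_s": 80.0},
    {"data": "filtered_image", "size_bytes": 8_000_000, "lifecycle_ms": 20.0, "accesses_per_s": 2.0},
]

FEATURES = ("size_bytes", "lifecycle_ms", "accesses_per_s")


def attach_history(node_names, history):
    """Map each intermediate data node to its averaged historical features."""
    annotated = {}
    for name in node_names:
        rows = [row for row in history if row["data"] == name]
        if rows:
            annotated[name] = {key: mean(row[key] for row in rows) for key in FEATURES}
        else:
            annotated[name] = {}  # no matching history; node keeps no attributes
    return annotated
```

Nodes without matching history receive an empty attribute dictionary, so a later classification step can detect them and fall back to static rules.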
[0058] S23: Identify the data type of intermediate data nodes in the data dependency topology graph and associate the identified data type with the attributes of the corresponding intermediate data node.
[0059] In this embodiment, step S23 aims to classify intermediate data nodes based on the historical data characteristics associated with them, clarifying whether they belong to transient local data, window reuse data, cross-frame associated data, or final state result data. Step S23 can be implemented by pre-setting several rule-based classifiers. For example, if an intermediate data node has a very short lifespan and low access frequency, it may be identified as transient local data; if a data node is frequently accessed by multiple subsequent image processing operators, it may be identified as window reuse data. Furthermore, the rules in the classifier can be set based on experience or through statistical analysis of historical data. Alternatively, classification algorithms such as decision trees or support vector machines can be used to train a classification model by using historical data characteristics as input features and predefined data types as output labels. In practical applications, this classification model can predict the data type of intermediate data nodes based on their attributes and associate the results as new attributes with the nodes.
[0060] Specifically, historical data features are matched and extracted from historical execution records based on the current processing flow. These historical data features contain the actual behavioral patterns of intermediate data. Furthermore, these historical data features are used as attributes and associated with intermediate data nodes in the preliminary topology graph generated based on the processing flow, thereby constructing a data dependency topology graph with runtime behavioral information. On this basis, the data types associated with the intermediate data nodes are identified by combining the historical data features associated with them, and the identified data types are associated with the attributes of the corresponding intermediate data nodes, thereby constructing a more accurate and comprehensive data dependency topology graph.
[0061] By using the above technical solution, dynamic historical behavioral characteristics are associated as attributes with intermediate data nodes in the data dependency topology graph. This enriches the information dimensions of the data dependency topology graph, enabling it to not only reflect the data flow but also include the actual runtime attributes of the intermediate data. Based on this, when identifying the data type of intermediate data nodes, a more accurate judgment can be made by combining their historical behavioral patterns. This helps to more accurately predict the storage requirements and access patterns of intermediate data, thereby avoiding improper allocation of storage resources due to insufficient information and improving the efficiency and accuracy of hierarchical storage of intermediate data in optical detection image processing tasks.
[0062] In one embodiment, step S23 includes:
[0063] If the lifecycle of the historical data associated with the intermediate data node is lower than the first lifecycle threshold and its access frequency is lower than the first frequency threshold, then its data type is determined to be transient local data.
[0064] In this embodiment, the lifecycle of the historical data associated with the intermediate data node being lower than the first lifecycle threshold, and its access frequency being lower than the first frequency threshold, means that the time span from the generation of the intermediate data to its obsolescence is short and that it is read or modified infrequently within a specific time period. The first lifecycle threshold and the first frequency threshold are preset judgment criteria, which can be set according to specific system performance requirements, data characteristics, or empirical values. For example, the first lifecycle threshold can be set to 100 milliseconds, and the first frequency threshold can be set to 5 times per second. Transient local data refers to local data generated in a short period of time that is no longer needed after being used once or a few times. Identifying transient local data helps to process it quickly and release storage resources, avoiding the occupation of valuable high-speed storage space. Furthermore, transient local data can be identified by analyzing local variables and temporary calculation results in the data flow graph, or by the distance between the data producer and its consumers, for example, when the data is consumed only by the next image processing operator.
[0065] If the access frequency of the historical data characteristics associated with the intermediate data node is higher than the second frequency threshold and its data size is greater than the first size threshold, then its data type is determined as window reuse data, wherein the second frequency threshold is higher than the first frequency threshold.
[0066] In this embodiment, if the access frequency of the historical data associated with the intermediate data node is higher than the second frequency threshold and its data size is greater than the first size threshold, it means that the intermediate data is frequently accessed and has a large data volume. The second frequency threshold being higher than the first frequency threshold indicates that it has a higher access popularity than transient local data. The first size threshold is a preset judgment standard, for example, it can be set to 1MB. Window reuse data refers to data that is accessed multiple times by multiple image processing operators within a certain time window. It usually has high reuse value and a large data volume. Identifying window reuse data helps to store it in a storage layer that is easy to access quickly but has a relatively large capacity to support efficient reuse. Furthermore, it can be implemented by analyzing the data block identification in sliding window operations and batch processing operations; or by judging the pattern of data sharing among multiple consumer image processing operators.
[0067] If an intermediate data node has a cross-frame reference edge in the data dependency topology graph, or if the data type label in the historical data feature it is associated with is a cross-frame association type, then its data type is determined to be cross-frame association data.
[0068] In this embodiment, intermediate data nodes have cross-frame reference edges in the data dependency topology graph, or the data type label in the associated historical data features is cross-frame association type. This means that there are edges in the data dependency topology graph pointing from the data node of the current frame to the data node of the subsequent frame, indicating that the data needs to be transferred between different frames; or the data in the historical execution record has been clearly marked as cross-frame association. Cross-frame association data refers to data that needs to be associated or transferred between consecutive image frames, such as the state of the tracking target, feature points of historical frames, etc. Identifying cross-frame association data helps to ensure that it can be correctly accessed and maintained in the multi-frame processing process, avoiding data loss or inconsistency. Furthermore, it can be identified by analyzing time series data, state-preserving variables, etc.; or by judging the role of data in loop or iterative processing.
[0069] If an intermediate data node has no successor consumer node in the data dependency topology graph, and the data type label in its associated historical data feature is the final result type, then its data type is determined to be the final result data.
[0070] In this embodiment, the intermediate data node having no successor consumer node in the data dependency topology graph, together with the data type label in its associated historical data features being the final result type, means that the data is the final output of the processing flow, will no longer be consumed by other image processing operators, and has been explicitly marked as a final result in the historical execution record. Final result data refers to the final output of the image processing task, which needs to be stored for a long time or provided to external systems. Identifying final result data helps to store it in the persistent storage layer to ensure the integrity and accessibility of the data. Furthermore, final result data can be identified by analyzing the final output interface of the task, report generation data, and the like, or by the final position of the data in the entire processing flow.
[0071] Specifically, intermediate data with short lifecycles and low access frequency are classified as transient local data to indicate their temporary nature; intermediate data with high access frequency and large data size are identified as window reuse data to support efficient reuse; intermediate data with cross-frame reference edges in the data dependency topology graph or marked as cross-frame associations are identified as cross-frame associated data to ensure consistency in multi-frame processing; and intermediate data without successor consumer nodes and marked as final results are classified as final result data to achieve persistent storage.
[0072] For example, in an optical inspection task, suppose it is necessary to process consecutive image frames to identify product defects.
[0073] When an image processing operator (e.g., a local contrast enhancement operator) generates intermediate data, and its historical execution record shows that the data is accessed only once by the next threshold segmentation operator within 50 milliseconds after its generation, and its access frequency is lower than a preset first frequency threshold, for example, 5 times per second, then the intermediate data is determined to be transient local data.
[0074] When an image processing operator (e.g., a feature extraction operator) generates intermediate data that is frequently accessed by multiple subsequent classification operators, for example, 20 times per second, which is higher than the second frequency threshold, and its data size is 2 MB, which is greater than the first size threshold of 1 MB, then the intermediate data is identified as window reuse data.
[0075] When an image processing operator (e.g., a target tracking operator) generates the target's position information in the current frame, and this position information has a reference edge in the data dependency topology graph pointing to the target prediction operator in the next frame, or its historical data features have a data type label that explicitly indicates "cross-frame association", then the intermediate data is identified as cross-frame associated data.
[0076] When the last operator in the defect detection process (e.g., a result report generation operator) outputs the final defect list, which has no successor consumer nodes in the data dependency topology graph and whose data type label in its historical data features is "final result", then the intermediate data is identified as final result data.
[0077] Through the above technical solution, this application can identify the data type of intermediate data based on multi-dimensional information such as the life cycle, access frequency, data scale, topology, and historical tags of the intermediate data, thereby avoiding the generalization of data with different usage modes. For example, it can distinguish between transient local data, window reuse data, cross-frame associated data, and final result data, and select the most suitable storage strategy for different types of data, thereby avoiding unnecessary data migration and storage resource waste. This has the effect of improving the efficiency and accuracy of intermediate data storage in optical detection image processing tasks, and thus optimizing the overall system performance.
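The four classification rules of step S23 can be sketched as a small rule-based classifier. The following is a minimal illustration only, not the implementation of this application. The first lifecycle, first frequency, and first size thresholds reuse the example values given above; the second frequency threshold of 10 times per second, the field names, the label strings, and the order in which the rules are checked are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Optional

# Example values from the text; real values are deployment-specific.
FIRST_LIFECYCLE_THRESHOLD_MS = 100.0
FIRST_FREQ_THRESHOLD_HZ = 5.0
FIRST_SIZE_THRESHOLD_BYTES = 1 * 1024 * 1024
# Assumption: the text only requires this to exceed the first frequency threshold.
SECOND_FREQ_THRESHOLD_HZ = 10.0


@dataclass
class NodeFeatures:
    """Historical features attached to one intermediate data node (hypothetical fields)."""
    lifecycle_ms: float
    access_freq_hz: float
    size_bytes: int
    has_cross_frame_edge: bool = False
    has_successor: bool = True
    history_label: Optional[str] = None  # hypothetical values: "cross_frame", "final_result"


def classify_node(f: NodeFeatures) -> str:
    """Return the data type of a node; topology and label rules are checked first (assumed order)."""
    if f.has_cross_frame_edge or f.history_label == "cross_frame":
        return "cross_frame_associated"
    if not f.has_successor and f.history_label == "final_result":
        return "final_result"
    if (f.access_freq_hz > SECOND_FREQ_THRESHOLD_HZ
            and f.size_bytes > FIRST_SIZE_THRESHOLD_BYTES):
        return "window_reuse"
    if (f.lifecycle_ms < FIRST_LIFECYCLE_THRESHOLD_MS
            and f.access_freq_hz < FIRST_FREQ_THRESHOLD_HZ):
        return "transient_local"
    # Nodes matching no rule are left for a fallback policy not described here.
    return "unclassified"
```

With these assumed values, a node with a 50 ms lifecycle that is read once per second is classified as transient local data, and a 2 MB node read 20 times per second is classified as window reuse data, matching the examples above.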
[0078] In one embodiment, step S40 includes:
[0079] S41: Based on the monitored remaining capacity and access popularity, dynamically adjust the frequency weight coefficient, timeliness weight coefficient, and spatial weight coefficient.
[0080] In this embodiment, step S41 aims to adjust the weight parameters affecting the hierarchical priority score based on the actual operating status of the current storage system. Remaining capacity refers to the amount of available storage space in each storage layer of the hierarchical architecture, reflecting the scarcity of storage resources. Access popularity refers to how frequently data in each storage layer is accessed, reflecting the current system's demand for the performance of different storage layers. The frequency weight coefficient, timeliness weight coefficient, and spatial weight coefficient are parameters used to balance the relative importance of the access frequency, lifespan, and data size of intermediate data in the hierarchical priority score calculation. One implementation is to appropriately increase the timeliness weight coefficient when the remaining capacity of the register storage layer and cache layer is low, so that short-lifespan intermediate data that will be used soon enters the high-speed layer first; at the same time, the spatial weight coefficient can be reduced to prevent large-size data from excessively occupying scarce high-speed storage resources. Another implementation is to increase the frequency weight coefficient when the overall access popularity of the system is high, making it easier for frequently accessed intermediate data to obtain high priority and thereby improving overall processing efficiency. Conversely, when the access popularity is low, the frequency weight coefficient can be appropriately reduced so that other factors carry more weight.
[0081] S42: Obtain the estimated access frequency, remaining lifespan, and data size of the target intermediate data from the data dependency topology graph;
[0082] In this embodiment, step S42 aims to provide key attribute information of the intermediate data itself for the calculation of the hierarchical priority score. The data dependency topology graph is a graph structure describing the data flow and dependencies between image processing operators, and contains relevant metadata of the intermediate data. The estimated access frequency refers to the expected number of times, or the expected rate, at which the intermediate data will be read in subsequent processing; it can be predicted by analyzing how many subsequent operators consume the intermediate data in the data dependency topology graph, or by combining historical execution records. The remaining lifespan refers to the duration, or the number of processing steps, from the current moment until the intermediate data is used up by its last consumer operator; it can be determined by tracing all of its consumer operators in the data dependency topology graph and combining the execution order of the processing flow. The data size refers to the storage space occupied by the intermediate data, such as the number of bytes or elements, which can usually be determined when the data is generated or from the data type definition. One implementation is to pre-calculate and store the above attributes for each intermediate data node when generating the data dependency topology graph, for example, by traversing the topology graph to calculate the out-degree of the node as the estimated access frequency and the longest path as the remaining lifespan, and obtaining the data size from the data type definition. Another implementation is to query the data dependency topology graph in real time when the hierarchical priority score needs to be calculated, and to dynamically calculate the remaining lifespan in combination with the current processing progress.
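As an illustration of the real-time query variant, the sketch below derives an estimated access count and a remaining lifespan (in processing steps) for each intermediate data node from its consumer operators and the operator execution order. The dictionary layout, the function name `estimate_attributes`, and the inclusive step-counting convention are assumptions made for the sketch.

```python
from typing import Dict, List


def estimate_attributes(
    consumers: Dict[str, List[str]],
    exec_order: List[str],
    current_step: int,
) -> Dict[str, Dict[str, int]]:
    """Derive per-node attributes from consumer operators and execution order.

    consumers:    data node id -> names of the operators that read it
    exec_order:   operator names in the order the processing flow runs them
    current_step: index in exec_order of the operator about to run
    """
    step_of = {op: i for i, op in enumerate(exec_order)}
    attrs = {}
    for data_id, ops in consumers.items():
        # Only consumers that have not run yet still need the data.
        pending = [step_of[op] for op in ops if step_of[op] >= current_step]
        attrs[data_id] = {
            # Number of outstanding reads, used as the estimated access frequency.
            "est_access_count": len(pending),
            # Steps the data must stay alive, counting the last consumer itself.
            "remaining_steps": (max(pending) - current_step + 1) if pending else 0,
        }
    return attrs


# Hypothetical five-operator flow for a defect inspection task.
order = ["enhance", "segment", "extract", "classify_a", "classify_b"]
deps = {"enhanced": ["segment"], "features": ["classify_a", "classify_b"]}
result = estimate_attributes(deps, order, current_step=1)
```

Here the enhanced image has a single pending consumer and must survive one more step, whereas the feature data has two pending consumers and must survive until the last classifier has run.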
[0083] S43: Calculate the hierarchical priority score of the target intermediate data. The calculation formula is as follows:
[0084] [formula omitted in source] where the formula is expressed in terms of the frequency weight coefficient, the timeliness weight coefficient, the spatial weight coefficient, the estimated access frequency, the remaining lifespan, and the data size.
[0085] In this embodiment, a higher estimated access frequency generally indicates that the intermediate data is more important and should receive a higher priority; a shorter remaining lifetime indicates that the intermediate data is about to be used and should also receive a higher priority; the larger the data size, the more likely its priority may need to be adjusted under limited high-speed storage resources. For example, in some cases, large-size data may require a lower priority to avoid monopolizing resources, or in other cases, if its access frequency and timeliness are extremely high, it still needs to be given a high priority. One implementation is to normalize the estimated access frequency, remaining lifetime, and data size to make them fall within the same numerical range, and then substitute them into the hierarchical priority score calculation formula to avoid the unbalanced impact of attributes with different dimensions on the results. Another implementation is to preset the weight coefficients based on actual application scenarios and experience, and adjust them during system operation based on the monitored remaining capacity and access popularity to optimize the hierarchical storage effect.
[0086] The units for estimated access frequency, remaining lifespan, and data size can be determined based on the actual scenario and the actual detection task.
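The normalization-then-weighting approach described above can be sketched as follows. The exact formula is not reproduced here, so the functional form below is an assumption: the three attributes are min-max normalized, the timeliness term is taken as one minus the normalized remaining lifespan (so that a shorter remaining lifespan raises the score), and the size term enters with a positive sign (so that reducing the spatial weight coefficient reduces the preference for large data). The names `Weights`, `normalize`, and `priority_score` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Weights:
    freq: float        # frequency weight coefficient
    timeliness: float  # timeliness weight coefficient
    space: float       # spatial weight coefficient


def normalize(value: float, lo: float, hi: float) -> float:
    """Min-max normalize into [0, 1], clamping values outside the range."""
    if hi <= lo:
        return 0.0
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))


def priority_score(
    w: Weights,
    est_freq: float,
    remaining_life: float,
    size: float,
    ranges: Dict[str, Tuple[float, float]],
) -> float:
    """Assumed weighted sum; not the formula of the application itself."""
    f = normalize(est_freq, *ranges["freq"])
    # Shorter remaining lifespan means the data is needed sooner.
    urgency = 1.0 - normalize(remaining_life, *ranges["life"])
    s = normalize(size, *ranges["size"])
    return w.freq * f + w.timeliness * urgency + w.space * s
```

Normalizing first keeps an attribute measured in bytes from dominating one measured in accesses per second, which is the imbalance the text warns about.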
[0087] Furthermore, before the step of calculating the hierarchical priority score of the intermediate data based on the monitored remaining capacity and access popularity, combined with the data dependency topology graph, the method further includes:
[0088] Spatial feature pre-extraction is performed on the image regions corresponding to each intermediate data node in the data dependency topology graph, specifically including:
[0089] A two-dimensional relative coordinate system is established with a preset reference point in the image area as the origin, and the two-dimensional relative coordinate system is divided into a uniform rectangular grid along the coordinate axis direction with a preset step size.
[0090] For each intermediate data node, determine the rectangular grid region covered by its image data, and calculate at least one spatial visual feature value for the rectangular grid region, the spatial visual feature value being at least one of texture density, edge intensity, and motion vector amplitude.
[0091] The calculated spatial visual feature values are used as attributes of intermediate data nodes and associated with the data dependency topology graph.
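The grid-based pre-extraction can be illustrated with a small pure-Python sketch. It finds the grid cells covered by a node's image region and computes a simple edge-intensity value per cell. The edge measure (mean absolute difference between horizontally adjacent pixels), the aggregation by maximum over the covered cells, and all function names are assumptions chosen for brevity; texture density and motion vector amplitude are not shown.

```python
from typing import List, Tuple

Cell = Tuple[int, int]  # (column, row) index in the relative grid


def covered_cells(bbox: Tuple[int, int, int, int],
                  origin: Tuple[int, int], step: int) -> List[Cell]:
    """Grid cells touched by a half-open pixel box [x0, x1) x [y0, y1)."""
    x0, y0, x1, y1 = bbox
    ox, oy = origin
    c0, c1 = (x0 - ox) // step, (x1 - 1 - ox) // step
    r0, r1 = (y0 - oy) // step, (y1 - 1 - oy) // step
    return [(c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]


def edge_intensity(image: List[List[int]], origin: Tuple[int, int],
                   step: int, cell: Cell) -> float:
    """Mean absolute horizontal pixel difference inside one grid cell."""
    h, w = len(image), len(image[0])
    ox, oy = origin
    c, r = cell
    x_start, y_start = ox + c * step, oy + r * step
    total, count = 0, 0
    for y in range(max(y_start, 0), min(y_start + step, h)):
        # Stop one pixel early so that x + 1 stays inside the same cell.
        for x in range(max(x_start, 0), min(x_start + step, w) - 1):
            total += abs(image[y][x + 1] - image[y][x])
            count += 1
    return total / count if count else 0.0


def node_edge_feature(image: List[List[int]], bbox: Tuple[int, int, int, int],
                      origin: Tuple[int, int] = (0, 0), step: int = 2) -> float:
    """Assumed aggregation: the strongest edge response over the covered cells."""
    cells = covered_cells(bbox, origin, step)
    return max(edge_intensity(image, origin, step, cell) for cell in cells)
```

The returned value would then be stored as an attribute of the corresponding intermediate data node in the data dependency topology graph.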
[0092] Furthermore, calculating the hierarchical priority score of the intermediate data based on the monitored remaining capacity and access popularity, combined with the data dependency topology graph, includes:
[0093] The estimated access frequency, remaining lifespan, data size, and associated spatial visual feature values of the target intermediate data are obtained from the data dependency topology graph;
[0094] The spatial weight coefficient of the intermediate data is calculated based on the spatial visual feature values, wherein the higher the spatial visual feature value, the larger the spatial weight coefficient;
[0095] The hierarchical priority score is calculated by combining the dynamically adjusted frequency weight coefficient, the timeliness weight coefficient, and the spatial weight coefficient.
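The only stated requirement on the mapping from spatial visual feature value to spatial weight coefficient is that it be increasing. One possible choice, given purely as an assumption, is a clamped linear interpolation between a minimum and a maximum weight; the bounds 0.1 and 0.4 and the name `spatial_weight` are hypothetical.

```python
def spatial_weight(feature_value: float, feature_max: float,
                   w_min: float = 0.1, w_max: float = 0.4) -> float:
    """Map a spatial visual feature value to a spatial weight coefficient.

    Assumed form: linear in the feature value, clamped to [w_min, w_max],
    so that a higher feature value never yields a smaller weight.
    """
    if feature_max <= 0:
        return w_min
    ratio = min(1.0, max(0.0, feature_value / feature_max))
    return w_min + (w_max - w_min) * ratio
```

Any other monotonically increasing mapping, such as a saturating curve, would satisfy the same requirement.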
[0096] Specifically, during the execution of image processing tasks, the remaining capacity and overall access popularity of each storage layer in the pre-built hierarchical architecture are continuously monitored and used to dynamically adjust the frequency weight coefficient, timeliness weight coefficient, and spatial weight coefficient. For example, when the remaining capacity of the cache layer is tight, the timeliness weight coefficient may be increased to give higher priority to intermediate data that is about to be consumed. When the overall access popularity of the system is high, the frequency weight coefficient may be increased to prioritize the storage of frequently accessed data. At the same time, for each target intermediate data, its inherent attributes are extracted from the pre-generated data dependency topology graph, including the estimated access frequency, remaining lifespan, and data size. The dynamically adjusted weight coefficients and the inherent attributes of the intermediate data are then substituted into the preset hierarchical priority score calculation formula to obtain a comprehensive hierarchical priority score, which takes into account the pressure of the current storage environment, the system's demand for different types of data, and the data's own access characteristics, timeliness, and storage consumption.
[0097] Through the above technical solution, this application can achieve accurate and dynamic calculation of hierarchical priority scores for intermediate data in optical detection image processing. Specifically, by monitoring the remaining capacity and access popularity of the storage system in real time, and dynamically adjusting the frequency weight coefficient, timeliness weight coefficient, and spatial weight coefficient accordingly, the priority assessment can fully reflect the current system resource constraints and the demand for data performance. At the same time, by combining the estimated access frequency, remaining lifespan, and data size of the intermediate data obtained from the data dependency topology graph, and using a weighted summation calculation formula, the value of the intermediate data itself and environmental factors can be comprehensively considered, thereby obtaining a more accurate and adaptive hierarchical priority score. This avoids the storage resource waste or performance bottlenecks that may be caused by static priority assessment, ensuring that high-value and time-sensitive intermediate data is stored preferentially in the high-performance storage layer, while low-value intermediate data is stored in the lower-cost storage layer, thereby effectively improving the utilization efficiency of storage resources and ensuring the smooth execution and overall performance of optical detection image processing tasks.
[0098] In one embodiment, step S41 includes:
[0099] S411: Obtain the preset total remaining capacity threshold and global access popularity threshold;
[0100] In this embodiment, step S411 aims to provide a benchmark for the subsequent assessment of storage resource status. The total remaining capacity threshold can be a preset fixed value, such as a certain percentage of the total system storage capacity, or an empirical value learned dynamically from historical load conditions. The global access popularity threshold can be an indicator representing the overall data access activity of the system, such as the average number of accesses per unit time or the reciprocal of the cache hit rate. It can be set based on the performance goals defined during system design, or obtained by analyzing historical access patterns with a machine learning model.
[0101] S412: Compare the remaining capacity with the total remaining capacity threshold to obtain the first comparison result, and compare the access popularity with the global access popularity threshold to obtain the second comparison result;
[0102] In this embodiment, step S412 aims to quantify the current resource stress and data activity of the storage system. The comparison between the remaining capacity and the total remaining capacity threshold can determine whether the current storage resources are sufficient or strained. For example, if the remaining capacity is lower than the total remaining capacity threshold, it indicates that the storage resources are becoming strained; if it is higher than the total remaining capacity threshold, it indicates that the resources are relatively abundant. The comparison between the access popularity and the global access popularity threshold can assess the current data access pressure of the system. For example, if the access popularity is higher than the global access popularity threshold, it indicates that the system is under high access pressure; if it is lower than the global access popularity threshold, it indicates that the access pressure is low. Furthermore, the comparison result can be a Boolean value, an enumerated value, or a quantitative indicator.
[0103] S413: Based on the first and second comparison results, dynamically adjust the frequency weight coefficient, timeliness weight coefficient, and spatial weight coefficient.
[0104] In this embodiment, step S413 aims to achieve adaptive adjustment of the weight coefficients. Based on the comparison results, each weight coefficient can be adjusted to optimize the hierarchical storage strategy. For example, when the remaining capacity is low and the access popularity is high, the frequency weight coefficient and the timeliness weight coefficient may be increased to prioritize storing frequently accessed or soon-to-be-accessed intermediate data in a faster storage layer, while the spatial weight coefficient is reduced to lower the preference for large-size data. Conversely, when the remaining capacity is sufficient and the access popularity is low, the frequency weight coefficient may be reduced and the spatial weight coefficient may be increased to allow more large-size data to be stored in the high-speed layer and make full use of resources. Furthermore, the adjustment strategy can be implemented according to actual needs and conditions, using a preset rule table, a fuzzy logic system, or a reinforcement learning model.
[0105] Specifically, the solution in this application introduces a mechanism for sensing the current state of the storage system to achieve adaptive adjustment of the weight coefficients used in calculating the hierarchical priority score. Before the hierarchical priority score of intermediate data is calculated, a preset total remaining capacity threshold and a global access popularity threshold are first obtained as benchmarks for measuring storage resource status and data access pressure. Further, the remaining capacity of the current storage layer, monitored in real time, is compared with the total remaining capacity threshold to generate a first comparison result, which assesses the degree of storage resource strain. At the same time, the access popularity of the current system, monitored in real time, is compared with the global access popularity threshold to generate a second comparison result, which assesses the overall data access activity. Based on the first and second comparison results, the frequency weight coefficient, timeliness weight coefficient, and spatial weight coefficient used to calculate the hierarchical priority score are dynamically adjusted. For example, when storage resources are scarce and access pressure is high, the frequency weight coefficient and timeliness weight coefficient may be increased to give higher priority to frequently accessed or soon-to-be-accessed intermediate data, allowing it to be stored in a faster storage layer so that critical data can be accessed quickly. Conversely, when storage resources are abundant and access pressure is low, the sensitivity to access frequency and timeliness may be reduced, and the spatial weight coefficient may be increased to allow larger data to enter the high-speed storage layer and make full use of storage resources.
[0106] Through the above technical solution, this application can intelligently adjust the calculation parameters of the intermediate data tier priority score according to the real-time status of the storage system. Specifically, by acquiring and comparing preset capacity and popularity thresholds, it can accurately determine the current storage resource shortage and data access activity level, and dynamically adjust the frequency weight coefficient, timeliness weight coefficient, and space weight coefficient based on the comparison results, so that the tiered storage strategy can adaptively respond to system load and resource changes. For example, when storage resources are scarce or access pressure is high, priority is given to the access frequency and timeliness of data to ensure that critical data can be accessed quickly, thereby effectively avoiding performance bottlenecks caused by improper allocation of storage resources. When resources are abundant, storage space can be utilized more flexibly to optimize the storage efficiency of large-size data. Through this dynamic adjustment mechanism, the flexibility and efficiency of tiered storage of intermediate data for optical detection image processing are improved.
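Steps S411 to S413 can be sketched as a simple rule table. The example below uses Boolean comparison results; the fixed adjustment step of 0.1, the renormalization of the three coefficients so that they sum to 1, and the names `Weights` and `adjust_weights` are assumptions made for illustration, and a fuzzy logic system or learned policy could replace the rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    freq: float        # frequency weight coefficient
    timeliness: float  # timeliness weight coefficient
    space: float       # spatial weight coefficient


def adjust_weights(base: Weights,
                   remaining_capacity: float, capacity_threshold: float,
                   access_popularity: float, popularity_threshold: float,
                   step: float = 0.1) -> Weights:
    """Rule-table sketch of S412/S413 with an assumed fixed adjustment step."""
    capacity_tight = remaining_capacity < capacity_threshold  # first comparison result
    access_hot = access_popularity > popularity_threshold     # second comparison result

    f, t, s = base.freq, base.timeliness, base.space
    if capacity_tight:
        t += step  # favor data that will be consumed soon
        s -= step  # reduce the preference for large data
    if access_hot:
        f += step  # favor frequently accessed data
    if not capacity_tight and not access_hot:
        f -= step  # relax the sensitivity to access frequency
        s += step  # let larger data into the high-speed layer

    f, t, s = (max(x, 0.0) for x in (f, t, s))
    total = f + t + s
    if total == 0:
        return base
    # Assumption: coefficients are renormalized so that they sum to 1.
    return Weights(f / total, t / total, s / total)
```

For a base setting of (0.4, 0.3, 0.3), a tight and busy system yields a lower spatial weight coefficient and higher frequency and timeliness weight coefficients, while an idle system with ample capacity shifts weight toward the spatial term.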
[0107] In one embodiment, as shown in Figure 3, step S50 includes:
[0108] S51: Obtain the preset capacity reservation ratio between the register storage layer and the cache layer;
[0109] In this embodiment, the preset capacity reservation ratio between the register storage layer and the cache layer can be configured by the system administrator or developer based on experience, system performance requirements, and data characteristics, or it can be dynamically calculated and adjusted by the system based on historical operating data through a machine learning model.
[0110] S52: When the hierarchical priority score of intermediate data is higher than the preset first threshold, verify whether the data type of the intermediate data is window reuse data or cross-frame associated data, and whether its remaining lifetime is less than the preset maximum lock lifetime.
[0111] In this embodiment, after calculating the hierarchical priority score of the intermediate data, it is first compared with the first threshold. If it is higher than the first threshold, the data type identified in the data dependency topology graph and its estimated remaining lifespan are further checked. This can be achieved through a preset strategy engine, which performs secondary filtering on the data that meets the priority conditions according to preset rules (such as data type whitelist and lifespan limit).
[0112] S53: If the verification passes, the intermediate data is directly stored in the register storage layer and a lock record is created for it. The lock record is used to indicate that the corresponding intermediate data is skipped when performing the data replacement operation, so that the intermediate data is prohibited from being replaced during its lifetime.
[0113] In this embodiment, if the verification in step S52 passes, the intermediate data can be directly written to the register storage layer and an internal locking table or flag bit can be updated. The locking table or flag bit contains data identification and lifecycle information for the replacement algorithm to query; or, the operating system or runtime environment can provide specific memory management APIs that allow the application to mark specific data areas as non-replaceable and enforce protection by hardware or microcode layers.
[0114] S54: If the verification fails, compare the layer priority score with the preset second threshold. If the layer priority score is higher than the second threshold, the preferred target layer is the register storage layer. If the layer priority score is lower than the second threshold, the preferred target layer is the cache layer.
[0115] In this embodiment, if the verification in step S52 fails, the hierarchical priority score is compared with a preset second threshold. If the hierarchical priority score is higher than the second threshold, the register storage layer is the preferred target layer; if the hierarchical priority score is lower than the second threshold, the cache layer is the preferred target layer. This provides a fallback storage strategy for high-priority data that fails verification: faster storage layers are still given priority, but the register storage layer and the cache layer are distinguished by a finer-grained priority threshold. Furthermore, the corresponding target storage layer indicator can be dynamically set based on the comparison result between the hierarchical priority score and the second threshold, wherein the second threshold is greater than the first threshold.
[0116] S55: Identify the current remaining capacity of the preferred target layer and send a reservation request for a first storage amount to the storage resource manager associated with the preferred target layer, wherein the first storage amount is determined based on the preset capacity reservation ratio of the preferred target layer and the data size;
[0117] In this embodiment, step S55 aims to proactively check whether the preferred high-speed storage layer has sufficient space before actual data transmission, and attempt to reserve the space, thereby ensuring efficient resource allocation and preventing storage failure due to insufficient capacity; wherein, the storage resource manager can maintain real-time capacity information of its corresponding associated layer, and when a reservation request is received, it calculates based on the requested data size and preset capacity reservation ratio, and checks whether there is sufficient available space.
[0118] S56: When the storage resource manager's reservation response indicates that the reservation was successful, the intermediate data is stored in the current preferred target layer and a lock record is created for it.
[0119] In this embodiment, after successful reservation, intermediate data is written to the selected preferred target layer and a detailed lock record is created, which is particularly important during the replacement operation and for debugging and monitoring.
[0120] Furthermore, the lock record includes the data identifier, the storage layer in which the data resides, the lifecycle expiration timestamp, and references to its producer and consumer operators.
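A lock record with the four fields listed above, and a replacement routine that skips locked entries, might look like the following sketch. The class and function names are hypothetical, and the least-recently-used ordering of replacement candidates is an assumption; the text does not prescribe a particular replacement algorithm.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LockRecord:
    data_id: str       # data identifier
    layer: str         # storage layer in which the data resides
    expires_at: float  # lifecycle expiration timestamp
    producer: str      # reference to the producer operator
    consumers: List[str] = field(default_factory=list)  # consumer operator references


class LockTable:
    """In-memory lock table queried by the replacement algorithm."""

    def __init__(self) -> None:
        self._records: Dict[str, LockRecord] = {}

    def lock(self, record: LockRecord) -> None:
        self._records[record.data_id] = record

    def is_locked(self, data_id: str, now: float) -> bool:
        rec = self._records.get(data_id)
        if rec is None:
            return False
        if now >= rec.expires_at:
            # The lifecycle has ended, so the lock no longer applies.
            del self._records[data_id]
            return False
        return True


def pick_victim(lru_order: List[str], locks: LockTable, now: float) -> Optional[str]:
    """Return the first replaceable entry, skipping locked data (assumed LRU order)."""
    for data_id in lru_order:
        if not locks.is_locked(data_id, now):
            return data_id
    return None
```

Because the lock expires with the lifecycle timestamp, locked data becomes an ordinary replacement candidate again without any explicit unlock call.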
[0121] Specifically, a preset capacity reservation ratio is obtained so that a certain amount of space is reserved in the critical storage layer. When the hierarchical priority score of intermediate data is higher than the preset first threshold, the data is not locked directly; instead, its data type and remaining lifetime are further verified. If the intermediate data is confirmed to be window reuse data or cross-frame associated data, and its remaining lifetime is less than the preset maximum lock lifetime, it is treated as critical data with a limited lifetime. In this case, it is stored directly in the register storage layer, and a detailed lock record is created for it to ensure that it is not replaced within its lifetime. This avoids locking all high-priority data indiscriminately and allocates scarce register resources to the intermediate data that most needs, and is best suited to, locking. If intermediate data fails the above verification but its hierarchical priority score is still higher than the first threshold, the score is further compared with the preset second threshold to determine whether the register storage layer or the cache layer is the preferred target layer. The current remaining capacity of the preferred target layer is then identified, and a reservation request is sent to the corresponding storage resource manager based on the preset capacity reservation ratio and the data size, which ensures that the target storage layer has enough space to accommodate the data before the actual write and avoids storage failure due to insufficient capacity. Once the reservation succeeds, the intermediate data is stored in the selected high-speed layer, and a lock record containing the data identifier, the storage layer, the lifecycle expiration timestamp, and the producer and consumer operator references is created.
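The verification that decides whether data is locked directly in the register storage layer can be written as a single predicate. The sketch below is a hedged illustration: the enum `DataType`, the function name, and the parameter names are hypothetical, and units for the lifetime values are left to the caller.

```python
from enum import Enum


class DataType(Enum):
    TRANSIENT_LOCAL = "transient_local"
    WINDOW_REUSE = "window_reuse"
    CROSS_FRAME = "cross_frame_associated"
    FINAL_RESULT = "final_state_result"


# Data types that the text allows to be locked directly.
LOCKABLE_TYPES = {DataType.WINDOW_REUSE, DataType.CROSS_FRAME}


def qualifies_for_direct_lock(score: float,
                              data_type: DataType,
                              remaining_lifetime: float,
                              first_threshold: float,
                              max_lock_lifetime: float) -> bool:
    """True if the data goes straight to the register layer with a lock record."""
    return (score > first_threshold
            and data_type in LOCKABLE_TYPES
            and remaining_lifetime < max_lock_lifetime)
```

Data for which this predicate is false, but whose score still exceeds the first threshold, continues to the second-threshold comparison and the reservation step.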
[0122] Through the above technical solutions, this application enables finer-grained storage management of high-priority intermediate data in optical inspection image processing. Specifically, by introducing a verification mechanism based on data type and remaining lifetime, it ensures that only truly critical intermediate data with a limited lifetime is locked directly in the fastest register storage layer, thereby avoiding excessive occupation and improper locking of important high-speed storage resources. At the same time, the tiered priority thresholds and the capacity reservation mechanism ensure that high-priority data that fails this verification can still be allocated to the register storage layer or the cache layer according to its priority and the actual capacity of the storage layer, and then reserved and locked. This improves the utilization efficiency and reliability of high-speed storage resources, reduces performance fluctuations caused by data replacement or storage failures, and thus supports the stability and real-time performance of optical inspection image processing tasks.
[0123] In one embodiment, after step S55, the following steps are included:
[0124] S551: If the reservation response from the storage resource manager indicates that the reservation has failed, and the corresponding preferred target layer is the register storage layer, then the preferred target layer is downgraded to the cache layer, and a reservation request for the first storage amount is sent to the storage resource manager associated with the cache layer.
[0125] In this embodiment, when the storage resource manager's reservation response indicates failure and the corresponding preferred target layer is the register storage layer, a degradation operation is performed: the preferred target layer is downgraded to the cache layer, and a reservation request for the first storage amount is sent to the storage resource manager associated with the cache layer. In other words, when the highest-priority register storage layer cannot satisfy the storage request, the next-highest-priority layer, i.e., the cache layer, is tried, so that the intermediate data can still be stored while the fastest available storage resources are used as far as possible. This embodiment serves as a fault-tolerance mechanism that improves the robustness of storage allocation. For example, after receiving a reservation request, the storage resource manager checks its associated storage layer; if the available capacity is insufficient for the first storage amount, it returns a reservation failure response. Upon receiving the failure response, the target storage layer is switched from the register storage layer to the cache layer according to the degradation rules, and a new reservation request is constructed and sent to the storage resource manager associated with the cache layer. Alternatively, multi-level degradation logic can be implemented within the storage resource managers: when reservation for the register storage layer fails, the storage resource manager of the register storage layer forwards the reservation request, together with a degradation instruction, directly to the storage resource manager of the cache layer, which then attempts to process it, thereby simplifying the upper-layer storage allocation logic.
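The alternative in which the managers themselves forward a failed reservation can be sketched as a simple chain. This is an assumption-laden illustration, not the design of this application: `ChainedManager`, its `fallback` link, and the convention of returning the accepting layer's name are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChainedManager:
    """A storage resource manager that forwards failed reservations downward."""
    layer_name: str
    free: int
    fallback: Optional["ChainedManager"] = None

    def reserve(self, amount: int) -> Optional[str]:
        """Return the name of the layer that accepted the request, or None."""
        if amount <= self.free:
            self.free -= amount
            return self.layer_name
        if self.fallback is not None:
            # Degradation: hand the same request to the next-fastest layer.
            return self.fallback.reserve(amount)
        return None
```

With this arrangement the caller issues one request to the register layer's manager and learns from the return value where the data should be written, or that no chained layer could accept it.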
[0126] S552: If the reservation response of the storage resource manager associated with the cache layer indicates that the reservation has failed, then according to the hierarchical priority score of the intermediate data, it is stored directly as unlocked data to the dynamic storage layer, and the failure event is recorded.
[0127] In this embodiment, when neither the register storage layer nor the cache layer can satisfy the storage request, the data is stored in the dynamic storage layer. The dynamic storage layer typically has a larger capacity but a slower access speed, and serves as the final storage guarantee. The intermediate data is also marked as unlocked data, meaning that it no longer enjoys the locking protection of the high-priority storage layers and can be replaced. In addition, the failure event is recorded for subsequent analysis and optimization. For example, when the storage resource manager of the cache layer returns a reservation failure response, the hierarchical priority score of the current intermediate data is determined, and the data is written directly to the dynamic storage layer according to a preset strategy, for example, that all data degraded from the cache layer enters the dynamic storage layer directly. After a successful write, an event record containing failure-related information is generated and stored in a log system or a dedicated event recording module. Alternatively, a storage layer priority list can be maintained: when reservation for the current preferred layer fails, the next available layer in the list is tried in turn, and if the dynamic storage layer is reached, the write operation is performed directly and the event recording mechanism is triggered.
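The priority-list variant described at the end of this paragraph can be sketched as a short traversal. The sketch makes several assumptions that are not in the text: layer capacities are tracked in a plain dictionary, the dynamic storage layer always accepts the write, and the failure log is a list of dictionaries. The names `LAYER_ORDER` and `place_with_fallback` are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed priority list, fastest layer first.
LAYER_ORDER = ["register", "cache", "dynamic"]


def place_with_fallback(amount: int,
                        preferred: str,
                        free: Dict[str, int],
                        failure_log: List[dict]) -> Tuple[str, bool]:
    """Return (layer, locked) for the data, walking down the priority list."""
    if preferred not in ("register", "cache"):
        raise ValueError("preferred layer must be 'register' or 'cache'")
    start = LAYER_ORDER.index(preferred)
    for layer in LAYER_ORDER[start:]:
        if layer == "dynamic":
            # Final guarantee: write directly, unlocked, and record the event.
            failure_log.append({"size": amount, "preferred": preferred})
            return layer, False
        if free[layer] >= amount:
            free[layer] -= amount
            return layer, True
    raise AssertionError("unreachable: 'dynamic' is the last layer in the list")
```

The boolean in the result distinguishes locked placements in the high-speed layers from the unlocked placement in the dynamic storage layer.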
[0128] Furthermore, a failure event record typically includes an event feature vector consisting of the failure time, the type of the triggering data, the data size, and the identifier of the producer operator, which defines the specific content of the record. Recording event feature vectors allows the reasons for storage reservation failures to be analyzed in depth, for example whether a failure is caused by high concurrency within a specific time period, by an excessively large data size of a specific type, or by a specific producer operator frequently generating large amounts of data. For example, when a failure event is recorded, the timestamp of the failure, the type of the intermediate data that caused it (e.g., transient local data or window reuse data), the data size of the intermediate data (e.g., the number of bytes), and the unique identifier of the image processing operator that generated the intermediate data are extracted from the current context and encapsulated into a structure or data object as the event feature vector. Alternatively, the event feature vector can be passed as a parameter to a log service or event management service through a unified event recording interface, which formats and persistently stores it, for example in a database or distributed log system, for querying and analysis.
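The event feature vector described above could be represented as follows. This is only a sketch: the class name `FailureEvent`, its field names, and the choice of JSON as the persisted format are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FailureEvent:
    failed_at: float   # failure time (e.g., a Unix timestamp)
    data_type: str     # type of the triggering intermediate data
    data_size: int     # size of the intermediate data in bytes
    producer_id: str   # identifier of the producer operator


def to_log_line(event: FailureEvent) -> str:
    """Serialize the feature vector for a log or event management service."""
    return json.dumps(asdict(event), sort_keys=True)
```

Sorting the keys keeps the serialized form stable, which makes later querying and comparison of log lines simpler.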
[0129] S553: When the number of failure events recorded within a preset period exceeds a preset threshold, a correlation analysis is performed on all failure events recorded within the preset period, and a load adjustment suggestion is sent to the task scheduler based on the correlation analysis results.
[0130] In this embodiment, when the number of failure events recorded within a preset period exceeds a preset threshold, a correlation analysis is performed on all failure events recorded within that preset period, and a load adjustment suggestion is sent to the task scheduler based on the correlation analysis results. Through correlation analysis, the underlying causes of failures can be identified, and optimization suggestions can be provided to the task scheduler, thereby achieving dynamic balancing of system resources and performance optimization. For example, a background monitoring service can be set up, which checks the number of newly added failure events every preset period. If the number exceeds the preset threshold, the background monitoring service will initiate a correlation analysis. The correlation analysis can identify failure patterns through statistical methods or machine learning algorithms. Finally, the analysis results are encapsulated into load adjustment suggestions and sent to the task scheduler through a message queue or API interface. Alternatively, the correlation analysis can also be implemented based on a rule engine. For example, if it is found that a certain producer operator causes multiple register storage layer reservation failures in a short period of time, a suggestion to reduce the execution frequency of the producer operator or prioritize the allocation of its output data to the dynamic storage layer is triggered and sent to the task scheduler.
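A rule-based version of this analysis might be sketched as below. It is a deliberately simple stand-in for the statistical or machine learning methods mentioned in the text: events are assumed to be dictionaries with `failed_at` and `producer_id` keys, a producer is flagged when it accounts for at least an assumed share of the recent failures, and the suggestion format is hypothetical.

```python
from collections import Counter
from typing import Dict, List


def load_adjustment_suggestions(events: List[Dict],
                                now: float,
                                period: float,
                                count_threshold: int,
                                min_share: float = 0.5) -> List[Dict]:
    """Suggest throttling producers that dominate recent reservation failures."""
    recent = [e for e in events if now - period <= e["failed_at"] <= now]
    if len(recent) <= count_threshold:
        return []  # too few failures in the period to trigger an analysis
    by_producer = Counter(e["producer_id"] for e in recent)
    return [
        {"producer_id": p, "failures": n, "action": "reduce_execution_frequency"}
        for p, n in by_producer.most_common()
        if n / len(recent) >= min_share
    ]
```

The returned list would then be delivered to the task scheduler, for example through a message queue, and the scheduler remains free to accept or ignore each suggestion.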
[0131] Specifically, the solution in this embodiment introduces a storage reservation failure handling mechanism to ensure that the optical detection image processing task can still run stably when faced with a shortage of high-priority storage resources. When attempting to store intermediate data in the register storage layer based on the hierarchical priority score, if the storage resource manager indicates that reservation has failed, the preferred target layer is downgraded to the cache layer, and reservation is attempted again. This degradation strategy reflects the flexible utilization of storage resources, prioritizing the storage of data in the second-best but still faster storage layer, thereby avoiding storage interruptions caused by insufficient resources at the highest level.
[0132] Furthermore, if the storage resource manager of the cache layer also indicates that reservation has failed, the system will degrade again, storing the intermediate data directly as unlocked data to the dynamic storage layer with larger capacity and higher stability. Through this multi-level degradation mechanism, it can be ensured that all intermediate data can be stored normally, thereby ensuring the continuity of image processing tasks.
[0133] More importantly, each storage reservation failure is recorded in detail, including the failure time, the type of triggering data, the data size, and its producer operator identifier, providing a data foundation for subsequent system optimization. Simultaneously, the system counts the number of failure events. When the number of failure events exceeds a preset threshold within a preset period, correlation analysis is initiated to identify the underlying causes of storage bottlenecks, such as resource contention for specific data types, specific operators, or specific time periods. Based on the correlation analysis results, load adjustment suggestions are sent to the task scheduler, such as adjusting task priorities, limiting the concurrency of certain operators, or optimizing data generation strategies. This prompts the task scheduler to dynamically adjust according to the load adjustment suggestions, thereby achieving intelligent scheduling and load balancing of system resources, improving the operating efficiency and stability of the optical inspection image processing system.
[0134] Through the above technical solution, this application effectively solves the storage failure problem that high-priority intermediate data may face when storage resources are scarce, and further improves the system's adaptive optimization capability. Specifically, when the register storage layer reservation fails, the preferred target layer can be downgraded to the cache layer to ensure that intermediate data can still be stored in a faster storage medium, avoiding a sharp drop in performance. If the cache layer also cannot meet the demand, the intermediate data is stored in the dynamic storage layer to ensure that the data is not lost, thereby maintaining the continuity of image processing tasks. More importantly, by recording the feature vectors of failure events in detail and performing periodic correlation analysis, this application can identify the deep-seated causes of storage bottlenecks, such as resource contention for specific data types, specific operators, or specific time periods, and provide load adjustment suggestions to the task scheduler based on the correlation analysis results. This allows the task scheduler to dynamically adjust task priorities, operator concurrency, or data storage strategies, thereby improving the overall operating efficiency, stability, and resource utilization of the optical detection image processing system.
[0135] It should be understood that the sequence number of each step in the above embodiments does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application.
[0136] In one embodiment, a hierarchical storage system for intermediate data of optical detection image processing is provided, which corresponds one-to-one with the hierarchical storage method for intermediate data of optical detection image processing described in the above embodiment. The hierarchical storage system for intermediate data of optical detection image processing includes:
[0137] The process acquisition module is used to acquire the corresponding processing flow in response to the image processing task. The processing flow includes a number of image processing operators executed in sequence and the data dependencies between the image processing operators.
[0138] The topology graph generation module is used to generate a data dependency topology graph based on the processing flow and to identify intermediate data and its data types. The data types include transient local data, window reuse data, cross-frame associated data, and final state result data.
[0139] The real-time monitoring module is used to monitor the remaining capacity and access popularity of each layer in the pre-built hierarchical architecture during the execution of image processing tasks. The hierarchical architecture includes a register storage layer, a cache layer, a dynamic storage layer, and a persistent storage layer.
[0140] The score calculation module is used to calculate the hierarchical priority score of intermediate data based on the monitored remaining capacity and access popularity, combined with the data dependency topology graph.
[0141] The execution module is used to map and store intermediate data to the corresponding target storage layer according to the hierarchical priority score. When the hierarchical priority score is higher than the preset first threshold, the corresponding intermediate data is stored in the register storage layer or the cache layer and replacement is prohibited during its lifetime.
[0142] Specific limitations regarding the hierarchical storage system for intermediate data in optical detection image processing can be found in the above description of the hierarchical storage method for intermediate data in optical detection image processing, and will not be repeated here. Each module in the aforementioned hierarchical storage system for intermediate data in optical detection image processing can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in or independent of the processor in a computer device, or stored in the memory of a computer device in software form, so that the processor can call and execute the operations corresponding to each module.
[0143] In one embodiment, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the hierarchical storage method for intermediate data of optical detection image processing described in the above embodiments.
[0144] In one embodiment, a computer-readable storage medium is provided having a computer program stored thereon, which, when executed by a processor, implements the hierarchical storage method for intermediate data of optical detection image processing described in the above embodiments.
[0145] The above-described embodiments are only used to illustrate the technical solutions of this application, and are not intended to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of this application, and should all be included within the protection scope of this application.
Claims
1. A method for hierarchical storage of intermediate data in optical detection image processing, characterized in that it includes the following steps:
In response to an image processing task, a corresponding processing flow is obtained, the processing flow including a number of image processing operators executed in sequence and the data dependencies between the image processing operators;
A data dependency topology graph is generated based on the processing flow, and intermediate data and its data types are identified, the data types including transient local data, window reuse data, cross-frame associated data, and final state result data;
During the execution of the image processing task, the remaining capacity and access popularity of each layer in the pre-built hierarchical architecture are monitored in real time, the hierarchical architecture including a register storage layer, a cache layer, a dynamic storage layer, and a persistent storage layer;
Based on the monitored remaining capacity and access popularity, the hierarchical priority score of intermediate data is calculated in combination with the data dependency topology graph;
Based on the hierarchical priority score, intermediate data is mapped and stored to the corresponding target storage layer, wherein when the hierarchical priority score is higher than the preset first threshold, the corresponding intermediate data is stored in the register storage layer or the cache layer and is prohibited from being replaced during its lifetime;
The step of calculating the hierarchical priority score of intermediate data based on the monitored remaining capacity and access popularity, combined with the data dependency topology graph, includes:
Based on the monitored remaining capacity and access popularity, the frequency weight coefficient, timeliness weight coefficient, and spatial weight coefficient are dynamically adjusted;
The estimated access frequency, remaining lifetime, and data size of the target intermediate data are obtained from the data dependency topology graph;
The hierarchical priority score of the target intermediate data is calculated by a formula (not reproduced in this text) whose terms are the frequency weight coefficient, the timeliness weight coefficient, the spatial weight coefficient, the estimated access frequency, the remaining lifetime, and the data size;
The step of mapping and storing intermediate data to the corresponding target storage layer according to the hierarchical priority score, wherein when the hierarchical priority score is higher than the preset first threshold, the corresponding intermediate data is stored in the register storage layer or the cache layer and replacement is prohibited during its lifetime, includes:
The preset capacity reservation ratio between the register storage layer and the cache layer is obtained;
When the hierarchical priority score of intermediate data is higher than the preset first threshold, it is verified whether the data type of the intermediate data is window reuse data or cross-frame associated data, and whether its remaining lifetime is less than the preset maximum lock lifetime;
If the verification passes, the intermediate data is stored directly in the register storage layer and a lock record is created for it, the lock record being used to indicate that the corresponding intermediate data is skipped when a data replacement operation is performed, so that the intermediate data is prohibited from being replaced during its lifetime;
If the verification fails, the hierarchical priority score is compared with the preset second threshold; if the hierarchical priority score is higher than the second threshold, the register storage layer is the preferred target layer; if the hierarchical priority score is lower than the second threshold, the cache layer is the preferred target layer;
The current remaining capacity of the preferred target layer is identified, and a reservation request for a first storage amount is sent to the storage resource manager associated with the preferred target layer, wherein the first storage amount is determined based on the preset capacity reservation ratio of the preferred target layer and the data size;
When the storage resource manager's reservation response indicates that the reservation was successful, the intermediate data is stored in the current preferred target layer, and a lock record is created for it.
2. The hierarchical storage method for intermediate data in optical detection image processing according to claim 1, characterized in that the step of generating a data dependency topology graph based on the processing flow and identifying intermediate data and its data types, the data types including transient local data, window reuse data, cross-frame associated data, and final state result data, includes:
According to the processing flow, historical execution records are matched and historical data features are extracted, the historical data features including data size, life cycle, access frequency, and data type tags;
A preliminary topology graph is generated based on the processing flow and the historical execution records;
The historical data features are associated, as attributes, with the corresponding intermediate data nodes in the preliminary topology graph to generate the data dependency topology graph;
The data type of each intermediate data node in the data dependency topology graph is identified, and the identified data type is associated with the attributes of the corresponding intermediate data node.
3. The method for hierarchical storage of intermediate data in optical detection image processing according to claim 2, characterized in that the step of identifying the data type of intermediate data nodes in the data dependency topology graph and associating the identified data type with the attributes of the corresponding intermediate data node includes:
If the life cycle in the historical data features associated with the intermediate data node is lower than the first lifecycle threshold and the access frequency is lower than the first frequency threshold, its data type is determined to be transient local data;
If the access frequency in the historical data features associated with the intermediate data node is higher than the second frequency threshold and the data size is greater than the first size threshold, its data type is determined to be window reuse data, wherein the second frequency threshold is higher than the first frequency threshold;
If the intermediate data node has a cross-frame reference edge in the data dependency topology graph, or the data type tag in its associated historical data features is the cross-frame association type, its data type is determined to be cross-frame associated data;
If the intermediate data node has no successor consumer node in the data dependency topology graph, and the data type tag in its associated historical data features is the final result type, its data type is determined to be final state result data.
4. The hierarchical storage method for intermediate data in optical detection image processing according to claim 1, characterized in that the step of dynamically adjusting the frequency weight coefficient, timeliness weight coefficient, and spatial weight coefficient based on the monitored remaining capacity and access popularity includes:
The preset total remaining capacity threshold and global access popularity threshold are obtained;
The remaining capacity is compared with the total remaining capacity threshold to obtain a first comparison result, and the access popularity is compared with the global access popularity threshold to obtain a second comparison result;
Based on the first and second comparison results, the frequency weight coefficient, timeliness weight coefficient, and spatial weight coefficient are dynamically adjusted.
5. The method for hierarchical storage of intermediate data in optical detection image processing according to claim 1, characterized in that, after the step of identifying the current remaining capacity of the preferred target layer and sending a reservation request for a first storage amount to the storage resource manager associated with the preferred target layer, wherein the first storage amount is determined based on the preset capacity reservation ratio of the preferred target layer and the data size, the following steps are included:
If the reservation response from the storage resource manager indicates that the reservation has failed, and the corresponding preferred target layer is the register storage layer, then the preferred target layer is downgraded to the cache layer, and a reservation request for the first storage amount is sent to the storage resource manager associated with the cache layer;
If the reservation response of the storage resource manager associated with the cache layer indicates that the reservation has failed, then the intermediate data is stored directly as unlocked data to the dynamic storage layer according to the hierarchical priority score of the intermediate data, and the failure event is recorded;
If the number of failure events recorded within a preset period exceeds a preset threshold, a correlation analysis is performed on all failure events recorded within that preset period, and a load adjustment suggestion is sent to the task scheduler based on the correlation analysis results.
6. A hierarchical storage system for intermediate data in optical detection image processing, characterized in that it includes:
A process acquisition module, used to acquire the corresponding processing flow in response to an image processing task, the processing flow including a number of image processing operators executed in sequence and the data dependencies between the image processing operators;
A topology graph generation module, used to generate a data dependency topology graph based on the processing flow and to identify intermediate data and its data types, the data types including transient local data, window reuse data, cross-frame associated data, and final state result data;
A real-time monitoring module, used to monitor the remaining capacity and access popularity of each layer in the pre-built hierarchical architecture during the execution of image processing tasks, the hierarchical architecture including a register storage layer, a cache layer, a dynamic storage layer, and a persistent storage layer;
A score calculation module, used to calculate the hierarchical priority score of intermediate data based on the monitored remaining capacity and access popularity, combined with the data dependency topology graph;
Specifically, the score calculation module is used for:
Dynamically adjusting the frequency weight coefficient, timeliness weight coefficient, and spatial weight coefficient based on the monitored remaining capacity and access popularity;
Obtaining the estimated access frequency, remaining lifetime, and data size of the target intermediate data from the data dependency topology graph;
Calculating the hierarchical priority score of the target intermediate data by a formula (not reproduced in this text) whose terms are the frequency weight coefficient, the timeliness weight coefficient, the spatial weight coefficient, the estimated access frequency, the remaining lifetime, and the data size;
An execution module, used to map and store intermediate data to the corresponding target storage layer according to the hierarchical priority score, wherein when the hierarchical priority score is higher than the preset first threshold, the corresponding intermediate data is stored in the register storage layer or the cache layer and replacement is prohibited during its lifetime;
The execution module is specifically used for:
Obtaining the preset capacity reservation ratio between the register storage layer and the cache layer;
When the hierarchical priority score of intermediate data is higher than the preset first threshold, verifying whether the data type of the intermediate data is window reuse data or cross-frame associated data, and whether its remaining lifetime is less than the preset maximum lock lifetime;
If the verification passes, storing the intermediate data directly in the register storage layer and creating a lock record for it, the lock record being used to indicate that the corresponding intermediate data is skipped when a data replacement operation is performed, so that the intermediate data is prohibited from being replaced during its lifetime;
If the verification fails, comparing the hierarchical priority score with the preset second threshold; if the hierarchical priority score is higher than the second threshold, the register storage layer is the preferred target layer; if the hierarchical priority score is lower than the second threshold, the cache layer is the preferred target layer;
Identifying the current remaining capacity of the preferred target layer and sending a reservation request for a first storage amount to the storage resource manager associated with the preferred target layer, wherein the first storage amount is determined based on the preset capacity reservation ratio of the preferred target layer and the data size;
When the storage resource manager's reservation response indicates that the reservation was successful, storing the intermediate data in the current preferred target layer and creating a lock record for it.
7. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, When the processor executes the computer program, it implements the steps of the hierarchical storage method for intermediate data of optical detection image processing as described in any one of claims 1-5.
8. A computer-readable storage medium storing a computer program, characterized in that, When the computer program is executed by the processor, it implements the steps of the hierarchical storage method for intermediate data of optical detection image processing as described in any one of claims 1-5.