A data processing method, device and computer medium

By analyzing memory load and instruction phase offset, mapping hash space, isolating nodes with load mutations, and merging data fragments, the problem of resource misalignment in traditional data processing is solved, and stable and efficient data flow in concurrent scenarios is achieved.

CN122309291A · Pending · Publication Date: 2026-06-30 · QUANZHOU QUANZHONG VOCATIONAL TECH SCHOOL (LICHENG DISTRICT VOCATIONAL & TECH EDUCATION CENT)

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Application (China)
Current Assignee / Owner
QUANZHOU QUANZHONG VOCATIONAL TECH SCHOOL (LICHENG DISTRICT VOCATIONAL & TECH EDUCATION CENT)
Filing Date
2026-05-30
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Traditional data processing methods, in large-scale concurrent scenarios, cause misalignment of computing resources due to static partitioning, leading to data skew and gaps in memory allocation, resulting in slowed task response and unnecessary resource consumption.

Method used

By analyzing memory load distribution patterns and instruction phase offsets, tracing cache transfer paths, defining node association strength, mapping the underlying hash space, isolating nodes with abrupt load changes, performing graph topology aggregation, merging data shards, and removing inefficient resident blocks, the method constructs a mechanism for classifying and discriminating data processing states.

Benefits of technology

It improves the accuracy of computing power allocation, strengthens thread synchronization capabilities, eliminates the risk of skew in concurrent scenarios, and ensures the continuous stability of massive data flow processing.



Abstract

This invention relates to the field of data processing technology, specifically to a data processing method, device, and computer medium. The data processing method includes the following steps: parsing memory load distribution patterns and instruction phase offsets; defining a node set to obtain a mapping layer; isolating nodes with sudden load changes to generate resource stability zone blocks; marking data skew and memory hole nodes to perform sharding and aggregation; removing low-throughput blocks to obtain monitoring results; comparing hash differences to match scheduling labels; and recording underlying attributes to form a classification dataset. By tracing load distribution paths to map the underlying physical space, the invention resolves the resource misalignment caused by static partitioning and improves the accuracy of computing power allocation. Isolating nodes with sudden load changes and aggregating the skew and hole nodes strengthens stability in concurrent scenarios, eliminates data skew risks, and bridges memory allocation gaps. Comparing hash distributions builds a state discrimination mechanism that optimizes distribution logic and improves scheduling accuracy, ensuring efficient, coherent, and stable data flow.

Description

Technical Field

[0001] This invention relates to the field of data processing technology, and in particular to a data processing method, apparatus, and computer medium.

Background Technology

[0002] The field of data processing technology involves the acquisition, storage, organization, management, and processing of data. Its core aspects include acquiring raw data through data acquisition devices, writing the data into distributed storage nodes to form datasets, defining and indexing the data in a structured manner, and performing operations such as filtering, aggregation, association, and statistics on the data through parallel computing frameworks to generate data results that can be used for subsequent analysis. This technology field revolves around the design of storage structures for massive datasets, data flow paths, and batch and stream processing mechanisms. Traditional data processing methods refer to methods for processing large-scale datasets. These methods divide the data into multiple data blocks according to time or source, write each data block into different storage nodes to form a distributed storage structure, and schedule computing nodes to perform read operations on the data blocks during processing. The read data is sorted, grouped, and associated according to preset fields, and then the grouped data is accumulated or filtered according to fixed rules. Finally, the processing results are written to a designated storage location for subsequent retrieval.

[0003] Traditional data processing mechanisms rely on static partitioning by time or source and on writing directly to nodes. This fixed allocation mode ignores both the actual load status of the distributed cluster and the coordination of concurrent instructions, so the underlying computing resources become seriously misaligned. In large-scale concurrent scenarios it frequently causes data skew and memory allocation gaps. Mechanical rule-based filtering and accumulation cope poorly with complex traffic fluctuations. Under multi-dimensional reads and writes, the load of a local computing domain can change abruptly and thread synchronization can break down, which increases unnecessary resource consumption, slows task response, and can lead to serious disorder or even deadlock in concurrent task scheduling.

Summary of the Invention

[0004] The purpose of this invention is to address the shortcomings of existing technologies by proposing a data processing method, device, and computer medium.

[0005] To achieve the above objectives, the present invention adopts the following technical solution: a data processing method comprising the following steps:

S1: Based on the concurrent task execution status in a computer distributed cluster and big data processing environment, analyze the memory load distribution pattern and instruction phase offset of data flow in the distributed cluster, track the continuous transfer path between multi-level caches, define the set of computing nodes whose data residency association strength exceeds the threshold, and obtain the big data feature flow spatiotemporal mapping layer.

S2: Based on the big data feature flow spatiotemporal mapping layer, parse the data replica consistency sharding within each node, evaluate the cluster's concurrent load stability and thread synchronization, map adjacent distributed hash address spaces, isolate nodes with load mutations, and generate the topology block of the computer resource load stability zone.

S3: Based on the topology block of the computer resource load stability zone, mark the index set of data skew and memory hole nodes and perform graph topology aggregation, merge adjacent fine-grained data fragments, remove micro resident blocks whose throughput is below the benchmark, and obtain the data flow morphology contour closure monitoring result.

S4: Based on the high-cohesion storage area characteristics in the data flow morphology contour closure monitoring result, compare the hash distribution differences of the key storage domains of distributed files to match concurrent scheduling strategy labels, record the physical sector location and multi-dimensional I/O attributes, and form the data processing status classification and discrimination dataset.

[0006] As a further aspect of the present invention, the big data feature flow spatiotemporal mapping layer includes resident distribution topology features, a thread phase offset vector, and node hash connectivity weights; the topology block of the computer resource load stability zone includes replica consistency sharding, a computing thread synchronization aggregation area, and abnormal concurrent node isolation markers; the data flow morphology contour closure monitoring result includes a closed-loop aggregation boundary network, a memory hole mapping path, and low-throughput shard stripping records; and the data processing status classification and discrimination dataset includes I/O load curvature parameters, an instruction phase offset benchmark, and a concurrent execution frequency stability index.

[0007] As a further aspect of the present invention, the step of obtaining the big data feature flow spatiotemporal mapping layer specifically includes:

S101: Based on the concurrent task execution status in a computer distributed cluster and big data processing environment, quantify the horizontal I/O load, vertical I/O load, and parallel computing load difference between adjacent cluster nodes in the logical memory dimension of the computing nodes, apply a gradient descent algorithm to analyze the load consumption gradient of multi-dimensional computing resources, define the consumption intensity of single-point computing resources, and generate the set of computing node flow energy consumption intensity values.

S102: Based on the set of computing node flow energy consumption intensity values, define the phase offset direction of concurrent computing instructions between the source computing node and neighboring computing nodes, identify continuous data flow links along which the cluster load increases or decreases monotonically, record the matching score between the flow link sequence and the instruction execution direction, and generate the instruction-direction-maintaining link strength set.

S103: Invoke the instruction-direction-maintaining link strength set, select continuous links with a consistent instruction execution direction among the resident nodes at the memory boundary of the computer cluster, map the continuous links to the high-dimensional computing vector space according to the computation flow continuity criteria, and obtain the big data feature flow spatiotemporal mapping layer.

[0008] As a further aspect of the present invention, the step of generating the topology block of the computer resource load stability zone specifically includes:

S201: Based on the big data feature flow spatiotemporal mapping layer, define the logical shards of continuous big data sets in the distributed storage pool, compute the difference in computing power consumption between the target computing node and adjacent computing nodes within a fixed instruction execution cycle, and obtain the node load balance index in the distributed storage domain.

S202: Based on the node load balance index, analyze the synchronization parameters of computing threads in the distributed logical domain, define the allowable range of thread concurrency fluctuations, select computing nodes whose concurrent thread synchronization differences meet the preset cluster convergence threshold, perform aggregation by hash logical region and detect the high cohesion of local threads, and generate the computing load expansion reference coordinate set.

S203: Based on the computing load expansion reference coordinate set, traverse the neighboring cluster computing nodes and compare the waveform evolution of the big data set throughput rate. After isolating extreme nodes whose resource consumption change rate exceeds the preset downtime warning threshold, reshape the topology of the cluster computing region to obtain the topology block of the computer resource load stability zone.

[0009] As a further aspect of the present invention, the step of obtaining the data flow morphology contour closure monitoring result specifically includes:

S301: Based on the topology block of the computer resource load stability zone, strip the underlying computing nodes whose node load difference is lower than the preset cluster balance threshold, mark the isolated memory coordinates that lack a hash connectivity mechanism, perform a topology closure aggregation operation on the set of computing domain points whose adjacent logical distance is less than the preset network routing hop threshold, and obtain the count value of memory allocation hole connectivity areas.

S302: Invoke the count value of memory allocation hole connectivity areas and filter the underlying logical storage capacity covered by each connectivity area. After removing micro fragment areas whose capacity is smaller than the preset cluster scheduling benchmark, aggregate the big data storage areas that are connected at the routing boundary, reconstruct the computing network topology based on the aggregated areas, and obtain the data flow morphology contour closure monitoring result.
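The aggregation in S301 and the fragment removal in S302 can be read as a connected-components problem. The following is a minimal sketch under that reading, not the patented procedure: it assumes each node has a one-dimensional logical position, that "logical distance" is the absolute difference of positions, and that region capacity is the sum of node capacities. The names `aggregate_regions` and `drop_micro_fragments` are hypothetical.

```python
def aggregate_regions(positions, hop_threshold):
    """Group node ids whose logical distance is below hop_threshold.

    positions: dict of node id -> logical position (assumed 1-D).
    Grouping is transitive, so chains of close nodes form one region.
    """
    parent = {n: n for n in positions}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    ids = list(positions)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if abs(positions[a] - positions[b]) < hop_threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in ids:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())


def drop_micro_fragments(regions, capacity, min_capacity):
    """Keep only regions whose total capacity reaches min_capacity."""
    return [r for r in regions if sum(capacity[n] for n in r) >= min_capacity]


positions = {"a": 0, "b": 1, "c": 2, "d": 10}      # illustrative values
capacity = {"a": 10, "b": 10, "c": 10, "d": 1}     # illustrative values
regions = aggregate_regions(positions, hop_threshold=2)
connectivity_count = len(regions)                   # count of connected areas
kept = drop_micro_fragments(regions, capacity, min_capacity=5)
```

Union-find keeps the grouping transitive: nodes `a` and `c` are two units apart, but both are within the threshold of `b`, so all three fall into one region.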

[0010] As a further aspect of the present invention, the step of forming the data processing status classification and discrimination dataset specifically includes:

S401: Invoke the high-cohesion storage area in the data flow morphology contour closure monitoring result, quantify the dynamic offset of the big data I/O throughput direction in the grid, define the concurrent I/O consumption curvature boundary of each distributed storage domain, and obtain the load curvature dispersion of the computing nodes.

S402: Based on the load curvature dispersion of the computing nodes, analyze the data dwell density of the computing boundary nodes and the instruction execution phase offset angle of the continuous computing area, measure the topological architecture difference between the main node domain and the computer's standard big data storage model, and integrate these into multi-dimensional cluster architecture difference items to obtain the target computing domain architecture difference parameters.

S403: Invoke the target computing domain architecture difference parameters, match and verify the multi-dimensional cluster architecture difference items against the preset concurrent scheduling classification benchmark threshold, record the underlying physical hardware coordinates and the corresponding computer instruction classification labels for the distributed regions that satisfy the match, and obtain the data processing status classification and discrimination dataset.

[0011] As a further aspect of the present invention, the data processing method further includes the following step:

S5: Based on the data processing status classification and discrimination dataset, extract the instruction scheduling encoding sequence of each concurrent processing state region, cluster the instruction scheduling encoding sequences by computation flow similarity and assign a unified cluster scheduling label number, analyze the correspondence between the task scheduling index and the physical computing region, and output the data processing status classification partition identification label set.

The data processing status classification partition identification label set includes a unified scheduling instruction code, a physical node mapping index, and a scheduling area identifier.

[0012] As a further aspect of the present invention, the step of outputting the data processing status classification partition identification label set specifically includes:

S501: Based on the data processing status classification and discrimination dataset, extract the node load identifier encoding of each concurrent scheduling state region, unify the bit width of the underlying calculation sequences, and arrange them by directed-graph topological normalization to obtain the computer data processing scheduling encoding sequence set.

S502: Based on the computer data processing scheduling encoding sequence set, compare the position offset of concurrent I/O consumption between calculation sequences, establish a concurrent computing power deviation matrix, and use a clustering algorithm to perform high-dimensional classification. Issue a unified task scheduling number for each computation flow cluster, analyze the mapping relationship between instruction type labels and the underlying physical computing node clusters, and obtain the computer processing status type clustering mapping table.

S503: Invoke the computer processing status type clustering mapping table, define the topological network boundary and cluster addressing index of each big data concurrent processing area according to the task scheduling number, construct the binding mapping between the underlying scheduling number and the hardware physical coordinates, register the exclusive identifier of the computing type, and obtain the data processing status classification partition identification label set.

[0013] The present invention also provides a data processing apparatus for performing the data processing method, comprising:

The big data flow mapping module, configured to analyze the memory load distribution pattern and instruction phase offset of data flow in the distributed cluster based on the concurrent task execution status in the computer distributed cluster and big data processing environment, track the continuous transfer path between multi-level caches, define the set of computing nodes whose data residency association strength exceeds the threshold, and obtain the big data feature flow spatiotemporal mapping layer.

The computing load stability determination module, configured to parse the data replica consistency sharding within each node based on the big data feature flow spatiotemporal mapping layer, evaluate the cluster's concurrent load stability and thread synchronization, map adjacent distributed hash address spaces, isolate nodes with load mutations, and generate the topology block of the computer resource load stability zone.

The cluster topology closed-loop aggregation module, configured to mark the index set of data skew and memory hole nodes based on the topology block of the computer resource load stability zone, perform graph topology aggregation, merge adjacent fine-grained data fragments, remove micro resident blocks whose throughput is below the benchmark, and obtain the data flow morphology contour closure monitoring result.

The architecture difference analysis and classification module, configured to compare the hash distribution differences of the key storage domains of distributed files based on the high-cohesion storage area characteristics in the data flow morphology contour closure monitoring result, match them to concurrent scheduling strategy labels, record the physical sector location and multi-dimensional I/O attributes, and form the data processing status classification and discrimination dataset.

[0014] The present invention also provides a computer medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the data processing method described above.

[0015] Compared with the prior art, the advantages and positive effects of the present invention are as follows. By analyzing memory load distribution patterns and instruction phase offsets to trace cache transfer paths, defining node association strength, and mapping the underlying hash space, the invention resolves the resource misalignment caused by static partitioning and improves the accuracy of computing power allocation. By isolating nodes with sudden load changes, marking data skew and memory holes for graph topology aggregation, merging data shards, and removing inefficient resident blocks, it strengthens thread synchronization stability in concurrent scenarios, eliminates skew risks, and bridges memory allocation gaps. Scheduling strategy labels are matched based on hash distribution differences, and a state classification and discrimination mechanism is constructed by recording underlying sector attributes. Together these optimize the task distribution logic and improve the accuracy of concurrent scheduling, ensuring continuous and stable processing of massive data flows.

Description of the Drawings

[0016] Figure 1 is a schematic diagram of the workflow of the data processing method of the present invention.

[0017] Figure 2 is a detailed flowchart of step S1 of the present invention.

[0018] Figure 3 is a detailed flowchart of step S2 of the present invention.

[0019] Figure 4 is a detailed flowchart of step S3 of the present invention.

[0020] Figure 5 is a detailed flowchart of step S4 of the present invention.

[0021] Figure 6 is a detailed flowchart of step S5 of the present invention.

[0022] Figure 7 is a flowchart of the data processing device of the present invention.

Detailed Implementation

[0023] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the invention.

[0024] In the description of this invention, it should be understood that the terms "length," "width," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," and "outer," etc., indicating orientation or positional relationships, are based on the orientation or positional relationships shown in the accompanying drawings and are only for the convenience of describing the invention and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation, and therefore should not be construed as a limitation of the invention. Furthermore, in the description of this invention, "a plurality of" means two or more, unless otherwise explicitly specified.

[0025] Example 1: Referring to Figure 1, the present invention provides a data processing method comprising the following steps:

S1: Based on the concurrent task execution status in a computer distributed cluster and big data processing environment, analyze the memory load distribution pattern of cross-node data flow and the instruction phase offset attribute of parallel computing, track the continuous transfer path of big data in multi-level caches, define the set of computing nodes whose data residency association strength exceeds the preset memory interaction threshold, and obtain the big data feature flow spatiotemporal mapping layer.

S2: Based on the big data feature flow spatiotemporal mapping layer, parse the data replica consistency sharding within physical server nodes, calculate the high-concurrency load stability and parallel computing thread synchronization of the computer cluster during operation, and generate the topology block of the computer resource load stability zone by mapping adjacent distributed hash address spaces and isolating computing nodes with abnormal load changes.

S3: Based on the topology block of the computer resource load stability zone, identify the index set of isolated computing nodes and memory allocation holes that cause data skew, apply a distributed graph topology aggregation operation to the index set, merge adjacent fine-grained data fragments, strip out micro resident blocks whose data throughput is lower than the preset throughput standard, and obtain the data flow morphology contour closure monitoring result.

S4: Based on the high-cohesion storage area characteristics in the data flow morphology contour closure monitoring result, compare the differences in hash distribution characteristics of the key storage domains of computer distributed files, match the preset concurrent task scheduling strategy labels according to those differences, record the physical sector location and multi-dimensional I/O attributes, and form the data processing status classification and discrimination dataset.

S5: Based on the data processing status classification and discrimination dataset, extract the instruction scheduling encoding sequence of each concurrent processing state region, cluster the instruction scheduling encoding sequences by computation flow similarity and assign a unified cluster scheduling label number, analyze the correspondence between the task scheduling index and the physical computing region, and output the data processing status classification partition identification label set.

The big data feature flow spatiotemporal mapping layer includes resident distribution topology features, thread phase offset vectors, and node hash connectivity weights. The topology block of the computer resource load stability zone includes replica consistency sharding, a computing thread synchronization aggregation area, and abnormal concurrent node isolation markers. The data flow morphology contour closure monitoring result includes a closed-loop aggregation boundary network, a memory hole mapping path, and low-throughput shard stripping records. The data processing status classification and discrimination dataset includes I/O load curvature parameters, instruction phase offset benchmarks, and concurrent execution frequency stability indicators. The data processing status classification partition identification label set includes unified scheduling instruction codes, physical node mapping indexes, and scheduling area identifiers.

[0026] Referring to Figure 2, the specific steps for obtaining the big data feature flow spatiotemporal mapping layer are as follows:

S101: Based on the concurrent task execution status in a computer distributed cluster and big data processing environment, quantify the horizontal I/O load, vertical I/O load, and parallel computing load difference between adjacent cluster nodes in the logical memory dimension of the computing nodes, apply a gradient descent algorithm to analyze the load consumption gradient of multi-dimensional computing resources, define the consumption intensity of single-point computing resources, and generate the set of computing node flow energy consumption intensity values.

Through the baseboard management bus interface of the distributed environment, the concurrent thread count and byte throughput of the memory controller are read at a fixed sampling frequency of 100 Hz. For each point in the time series, the absolute difference between the current point and the average of the previous 10 points is computed, and points affected by level fluctuations are replaced with the median. The total bytes received and transmitted by the target node's network card are summed to obtain the horizontal base byte volume, which is multiplied by the end-to-end latency in milliseconds to obtain the horizontal I/O load. The number of bytes transmitted through the direct memory access channel is multiplied by the seek time compensation coefficient to obtain the vertical I/O load. The absolute value of the difference between the total thread count of the target node and that of an adjacent node is taken as the parallel computing load difference. A multivariate linear cost function is constructed with the horizontal load, vertical load, and load difference as independent variables, and each feature weight is initialized to 0.5. The error is the difference between the actual total power consumption and the predicted power consumption, and gradient descent updates the feature weights with a learning rate of 0.01. After the iterations converge, the absolute values of the three feature weights are summed to obtain the load consumption gradient. Multiplying the load consumption gradient by the baseline resource constant of 150 watts gives the consumption intensity of single-point computing resources. Finally, through address mapping, the per-node consumption intensities are encapsulated into the set of computing node flow energy consumption intensity values.
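The S101 calculation can be sketched in a few lines of Python. This is an illustrative reading under stated assumptions, not the patented implementation: the outlier `tolerance`, the number of `epochs`, the assumption that features are pre-scaled to a small range, and all function names are hypothetical. The constants 0.5, 0.01, and 150 watts come from the text.

```python
from statistics import mean, median

BASELINE_WATTS = 150.0   # baseline resource constant (from the text)
LEARNING_RATE = 0.01     # gradient descent learning rate (from the text)
INITIAL_WEIGHT = 0.5     # initial feature weight (from the text)


def smooth(series, window=10, tolerance=3.0):
    """Replace a point with the median of the previous `window` points when
    it deviates from their mean by more than `tolerance` (assumed threshold)."""
    out = list(series)
    for i in range(window, len(out)):
        prev = out[i - window:i]
        if abs(out[i] - mean(prev)) > tolerance:
            out[i] = median(prev)
    return out


def io_loads(rx_bytes, tx_bytes, latency_ms, dma_bytes, seek_coeff,
             own_threads, neighbor_threads):
    """Return (horizontal I/O load, vertical I/O load, parallel load difference)."""
    horizontal = (rx_bytes + tx_bytes) * latency_ms
    vertical = dma_bytes * seek_coeff
    difference = abs(own_threads - neighbor_threads)
    return horizontal, vertical, difference


def fit_weights(samples, power, epochs=500):
    """Per-sample gradient descent on a linear power model.

    samples: (horizontal, vertical, difference) tuples, assumed pre-scaled.
    power:   measured total power for each sample.
    """
    w = [INITIAL_WEIGHT] * 3
    for _ in range(epochs):
        for x, y in zip(samples, power):
            err = sum(wi * xi for wi, xi in zip(w, x)) - y   # predicted - actual
            w = [wi - LEARNING_RATE * err * xi for wi, xi in zip(w, x)]
    return w


def consumption_intensity(weights):
    """Load consumption gradient (sum of |weights|) times the baseline constant."""
    return sum(abs(wi) for wi in weights) * BASELINE_WATTS
```

With the initial weights alone, `consumption_intensity([0.5, 0.5, 0.5])` evaluates to 225.0 watts; trained weights would replace them in practice.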

[0027] S102: Based on the set of computing node flow energy consumption intensity values, define the phase offset direction of concurrent computing instructions between the source computing node and neighboring computing nodes, identify continuous data flow links along which the cluster load increases or decreases monotonically, record the matching score between the flow link sequence and the instruction execution direction, and generate the instruction-direction-maintaining link strength set.

The system retrieves the set of energy consumption intensity values and extracts the intensity sequences of the source computing node and the neighboring computing nodes over 10 consecutive cycles. For each node, the intensity of the initial cycle is subtracted from the intensity of the last cycle, and the result is divided by the time span of 9 to obtain the slope of the intensity change. The slope proportionality constant is the source node slope divided by the neighboring node slope. This constant is converted into a concurrent computing instruction phase offset angle between 0 and 180 degrees using an inverse cosine function mapping table. An angle of 0 to 45 degrees is defined as a positive cooperative offset, and an angle of 135 to 180 degrees as a negative suppression offset. The energy consumption intensity values of three consecutive network hop nodes are extracted and the differences between adjacent values are compared; if the difference is greater than 0 for all three consecutive hops, the link is a continuous data flow link with monotonically increasing cluster load. A dot product is computed between the actual packet forwarding direction vector and the phase offset direction vector. The absolute value of the deviation between the actual forwarding angle and the offset angle is divided by 180 degrees to normalize it into the directional deviation degree. Subtracting the directional deviation from 1, multiplying by the base score of 100, and then multiplying by the link smoothing coefficient of 0.85 gives the score for the fit between the flow link sequence and the instruction execution direction.

In one measurement, a source node slope of 10 and a neighboring node slope of 5 give a proportionality constant of 2. After mapping, the offset angle is 63 degrees, which is defined as a neutral offset. The measured directional deviation is 0.07, giving a final fit score of 79. The link scores are then filtered, and low-scoring links are removed to generate the instruction-direction-maintaining link strength set.
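The arithmetic in S102 can be checked with a short Python sketch. One caveat is an assumption of this sketch rather than a statement of the source: the text names an inverse cosine mapping table, but arccosine is undefined for a ratio of 2, so the code uses arctangent folded into the 0 to 180 degree range, which gives about 63.4 degrees and matches the worked figure. The label for angles between 45 and 135 degrees and all function names are likewise hypothetical.

```python
import math

BASE_SCORE = 100.0        # base score (from the text)
LINK_SMOOTHING = 0.85     # link smoothing coefficient (from the text)


def slope(intensities):
    """Intensity slope over consecutive cycles: (last - first) / (n - 1)."""
    return (intensities[-1] - intensities[0]) / (len(intensities) - 1)


def offset_angle_deg(source_slope, neighbor_slope):
    """Map the slope ratio to an angle in [0, 180).

    Assumption: arctangent stands in for the source's mapping table.
    """
    if neighbor_slope == 0:
        return 90.0
    ratio = source_slope / neighbor_slope
    return math.degrees(math.atan(ratio)) % 180.0


def classify(angle_deg):
    """Offset classes from the text; the middle band is assumed neutral."""
    if angle_deg <= 45:
        return "positive cooperative"
    if angle_deg >= 135:
        return "negative suppression"
    return "neutral"


def is_monotonic_increasing(values):
    """True when every hop-to-hop difference is greater than 0."""
    return all(b - a > 0 for a, b in zip(values, values[1:]))


def direction_deviation(actual_deg, offset_deg):
    """Normalized angular deviation in [0, 1]."""
    return abs(actual_deg - offset_deg) / 180.0


def fit_score(deviation):
    """(1 - deviation) * base score * smoothing coefficient."""
    return (1.0 - deviation) * BASE_SCORE * LINK_SMOOTHING


angle = offset_angle_deg(10, 5)   # slopes from the worked example
score = fit_score(0.07)           # deviation from the worked example
```

Rounded to whole numbers, `angle` and `score` reproduce the 63 degrees and 79 points quoted in the measurement.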

[0028] S103: Invoke the instruction-direction-maintaining link strength set, select continuous links with a consistent instruction execution direction among the resident nodes at the memory boundary of the computer cluster, map the continuous links to the high-dimensional computing vector space according to the computation flow continuity criteria, and obtain the big data feature flow spatiotemporal mapping layer.

The system invokes the instruction-direction-maintaining link strength set and traverses the continuous data flow links it contains. It matches the Internet Protocol address of each link's terminal node against the list of external-access and resident nodes and retains the links that have an addressing correspondence. It computes the inner product of the instruction execution direction vectors of consecutive adjacent nodes; if the result is greater than the consistency threshold of 0.9, the link is selected as a continuous link with a consistent instruction execution direction. It extracts each link's continuous survival time and the average number of data frames forwarded per second, and performs high-dimensional vector mapping on links with a survival time greater than 500 milliseconds and a throughput greater than 1000 frames. The node's CPU core utilization, remaining available memory bytes, and network card transmit queue length form the input feature sequence, which is multiplied by the mapping transformation weight matrix and summed to obtain 128-dimensional feature values in the high-dimensional computing vector space. The transformed vectors are stacked in chronological order by timestamp to construct a three-dimensional data matrix, and two-dimensional slices of the matrix are extracted to obtain the big data feature flow spatiotemporal mapping layer.
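A minimal sketch of the S103 filtering and mapping follows, written in plain Python so that it has no dependencies. It assumes direction vectors are already unit length, that the weight matrix has 3 rows and 128 columns, and that each snapshot is a `(timestamp, per-link feature list)` pair; these shapes, the placeholder weights, and the function names are assumptions for illustration. The thresholds 0.9, 500 milliseconds, 1000 frames, and the 128 dimensions come from the text.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def direction_consistent(directions, threshold=0.9):
    """True when every pair of adjacent direction vectors exceeds the threshold."""
    return all(dot(a, b) > threshold for a, b in zip(directions, directions[1:]))


def eligible(survival_ms, frames_per_s):
    """Links must outlive 500 ms and forward more than 1000 frames per second."""
    return survival_ms > 500 and frames_per_s > 1000


def embed(features, weights):
    """Multiply a 3-element feature row by a 3 x D weight matrix."""
    dims = len(weights[0])
    return [sum(f * row[j] for f, row in zip(features, weights))
            for j in range(dims)]


def build_cube(snapshots, weights):
    """Stack embeddings by timestamp: result[time][link][dimension]."""
    ordered = sorted(snapshots, key=lambda s: s[0])
    return [[embed(f, weights) for f in links] for _, links in ordered]


# Placeholder weights; a real mapping matrix would be supplied by the system.
W = [[1.0] * 128 for _ in range(3)]
# Features: (CPU core utilization, free memory, transmit queue length), pre-scaled.
cube = build_cube([(2, [(1.0, 1.0, 1.0)]), (1, [(0.0, 0.0, 0.0)])], W)
layer = cube[0]   # one two-dimensional slice: links x 128 dimensions
```

Indexing the first axis of `cube` yields one time slice, which is one way to read the "two-dimensional slices" mentioned in the step.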

[0029] Please see Figure 3. The specific steps for generating the topology blocks of the stable area of computer resource load are as follows: S201: Based on the spatiotemporal mapping layer of big data feature flow, define the logical shards of continuous big data sets in the distributed storage pool, count the difference in computing power consumption between the target computing node and adjacent computing nodes within a fixed instruction execution cycle, and obtain the node load balance index in the distributed storage domain. Based on the spatiotemporal mapping layer of big data feature flow, the distribution area of nodes whose activity values remain above 80 for three consecutive time segments is extracted. The addresses of consecutive logical blocks in this area are read; if the difference between consecutive address values equals the data block capacity step size of 64 megabytes, the blocks are determined to be a continuous logical shard of the big data set. Triggered by a clock timer with a fixed instruction execution cycle of 200 milliseconds, the number of floating-point instructions executed per second by the target computing node's floating-point arithmetic unit is read as the target computing power consumption value. At the same time, probe commands are sent to obtain the computing power consumption values of adjacent computing nodes, and the absolute value of the difference between the target value and each adjacent value is taken as the computing power consumption difference. The differences between the target node and its directly connected adjacent nodes are summed, and the sum is divided by the total number of local nodes to obtain the local average computing power deviation. This deviation is divided by the baseline maximum computing power parameter of 150,000 million floating-point instructions per second, and the quotient is multiplied by 100 to generate the node load balance index as a percentage.
In a measured case, the computing power consumption value of the target node was 85,000, and the consumption values of three adjacent nodes were collected. The absolute differences were calculated and summed to 14,000. With a total of 4 local nodes, dividing the sum by 4 gives an average deviation of 3,500. Dividing this deviation by the maximum computing power parameter of 150,000 and multiplying by 100 gives a node load balance index of approximately 2.33 in the distributed storage domain.
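The index calculation in S201 can be checked with a short sketch. The function name is hypothetical, and the three neighbor values are invented for illustration; they were chosen only so that the absolute differences sum to 14,000 as in the measured case.

```python
MAX_COMPUTE = 150_000   # baseline maximum computing power parameter from the text


def load_balance_index(target, neighbors):
    """Node load balance index as a percentage.

    Sum of |target - neighbor| over directly connected neighbors, divided by
    the number of local nodes (neighbors plus the target), normalized by the
    baseline maximum and multiplied by 100.
    """
    total_diff = sum(abs(target - n) for n in neighbors)
    local_nodes = len(neighbors) + 1
    average_deviation = total_diff / local_nodes
    return average_deviation / MAX_COMPUTE * 100


# Hypothetical neighbor readings whose differences sum to 14,000.
index = load_balance_index(85_000, [80_000, 89_000, 90_000])
print(round(index, 2))
```

Note that the divisor is the total number of local nodes (4), not the number of neighbors (3), which follows the wording of the paragraph.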

[0030] S202: Based on the node load balance index, analyze the synchronization parameters of computing threads in the distributed logical domain, define the allowable range of thread concurrency fluctuations, select computing nodes whose concurrent thread synchronization differences meet the preset cluster convergence threshold, perform aggregation operations according to hash logical regions and detect the high cohesion of local threads, and generate a set of benchmark coordinates for computing load expansion. The list of completion timestamps of concurrent computing threads is retrieved for the target node and its neighboring nodes. The completion timestamp of each neighboring node's thread is subtracted from the completion timestamp of the corresponding thread on the target node, and the absolute value of the difference is recorded as the single-thread time deviation. The deviations of all matching thread pairs are accumulated to obtain the total time deviation, which is divided by the total number of matching thread pairs to obtain the computing thread synchronization parameter. The previously calculated node load balance index is multiplied by the load balance expansion coefficient of 4.5 to obtain the adjustment amount. Adding the adjustment amount to the basic floating constant of 15 gives the upper limit of the allowable range, and subtracting the adjustment amount from the constant gives the lower limit. The computing thread synchronization parameter is compared with the convergence threshold of 30 milliseconds. If the parameter falls between the lower and upper limits of the allowable range and is less than the convergence threshold, the node is stored in the candidate list. The Internet Protocol address strings of the nodes in the candidate list are extracted and hashed, and the result modulo 1024 is taken as the hash logical region number.
The cohesion percentage is obtained by dividing the number of nodes in the same numbered region by the total number of candidate nodes and expressing the quotient as a percentage. If this value is greater than the recognition benchmark value of 60, the local threads are confirmed to have a high degree of cohesion. The corresponding physical grid coordinates are extracted to generate the set of benchmark coordinates for computing load expansion.
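A minimal sketch of the candidate screening and region hashing in S202 follows. Function names are hypothetical, and SHA-256 is used only as a stand-in because the text does not name the hash function.

```python
import hashlib
from collections import Counter

EXPANSION_COEFF = 4.5       # load balance expansion coefficient from the text
BASE_CONSTANT = 15.0        # basic floating constant from the text
CONVERGENCE_MS = 30.0       # convergence threshold from the text
REGION_COUNT = 1024         # modulus for the hash logical region number
COHESION_BENCHMARK = 60.0   # recognition benchmark value from the text


def sync_parameter(target_ts, neighbor_ts):
    """Mean absolute completion-time deviation over matching thread pairs."""
    pairs = list(zip(target_ts, neighbor_ts))
    return sum(abs(t - n) for t, n in pairs) / len(pairs)


def allowable_range(balance_index):
    """(lower, upper) limits: 15 -/+ balance_index * 4.5."""
    adjustment = balance_index * EXPANSION_COEFF
    return BASE_CONSTANT - adjustment, BASE_CONSTANT + adjustment


def is_candidate(sync, balance_index):
    lower, upper = allowable_range(balance_index)
    return lower <= sync <= upper and sync < CONVERGENCE_MS


def hash_region(ip):
    """Hash the IP string and reduce modulo 1024 (SHA-256 is an assumption)."""
    digest = hashlib.sha256(ip.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % REGION_COUNT


def cohesion_percent(regions):
    """Percentage of candidate nodes falling in each region number."""
    counts = Counter(regions)
    return {region: count / len(regions) * 100 for region, count in counts.items()}
```

A region would then be treated as highly cohesive when its percentage exceeds `COHESION_BENCHMARK`.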

[0031] S203: Based on the benchmark coordinate set for computing load expansion, traverse the computing nodes of the neighboring cluster and compare the waveform evolution of the big data set throughput rate. After isolating extreme nodes whose resource consumption change rate exceeds the preset downtime warning threshold, perform topology reshaping of the cluster computing area to obtain the topology block of the stable area of computer resource load. Based on the constructed benchmark coordinate set for computing load expansion, the peripheral neighboring cluster computing nodes within a physical routing distance of 1 hop are traversed. The network throughput data (in megabytes per second) for the past 60 seconds is read, and adjacent differences are calculated on the time-series waveform to obtain a first-order difference vector of 59 values. The covariance between the difference vector of the target expansion node and the difference vector of a neighboring node is calculated and divided by the product of their standard deviations to obtain the waveform evolution similarity parameter. At the same time, the temperature rise in degrees Celsius and the percentage increase in cooling fan speed of the neighboring nodes over 5 minutes are read. The temperature rise is multiplied by a temperature impact weight of 0.7, and the fan speed percentage is multiplied by a speed impact weight of 0.3. The two products are added to obtain a comprehensive index of the resource consumption change rate. If the comprehensive index is greater than the preset downtime warning threshold of 15, the link is forcibly disconnected and the node is removed from the routing table. After the extreme nodes are removed, the connectivity edges among the remaining adjacent nodes are rebuilt using a triangular mesh algorithm, ensuring that no other node lies within the circumcircle of any triangle.
A topology block is then derived from the base distribution map. In a measured case, the resource consumption change rate is calculated by multiplying the temperature increase of 12 and the fan speed increase of 25 by their respective weights and summing the products, giving a comprehensive index of 15.9. Because this value exceeds the warning threshold of 15, the isolation and disconnection mechanism is triggered, the underlying connections of the remaining healthy nodes are reshaped, and the topology block of the stable area of computer resource load is exported.
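The waveform similarity and the consumption index from S203 can be sketched as below. The covariance divided by the product of standard deviations is the Pearson correlation coefficient; function names are hypothetical, and returning 0.0 for a flat waveform is a choice made here, not something the text specifies.

```python
import math

TEMP_WEIGHT = 0.7         # temperature impact weight from the text
FAN_WEIGHT = 0.3          # fan speed impact weight from the text
WARNING_THRESHOLD = 15.0  # downtime warning threshold from the text


def first_difference(series):
    """Adjacent differences; 60 samples yield 59 values."""
    return [b - a for a, b in zip(series, series[1:])]


def waveform_similarity(x, y):
    """Covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    if std_x == 0 or std_y == 0:
        return 0.0  # flat waveform: similarity undefined, treated as 0 here
    return cov / (std_x * std_y)


def consumption_index(temp_rise_c, fan_increase_pct):
    """Weighted sum of temperature rise and fan speed increase."""
    return temp_rise_c * TEMP_WEIGHT + fan_increase_pct * FAN_WEIGHT


def should_isolate(temp_rise_c, fan_increase_pct):
    return consumption_index(temp_rise_c, fan_increase_pct) > WARNING_THRESHOLD
```

For the measured case in the text (temperature rise 12, fan speed increase 25), `consumption_index` gives about 15.9, so `should_isolate` returns `True`.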

[0032] Please see Figure 4. The specific steps for obtaining the data flow shape contour closure monitoring results are as follows: S301: Based on the topology block of the stable area of computer resource load, strip the underlying computing nodes whose node load difference is lower than the preset cluster balance threshold, mark the isolated memory coordinates that lack a hash connectivity mechanism, perform a topology closure aggregation operation on the set of computing domain points whose adjacent logical distance is less than the preset network routing hop threshold, and obtain the count value of memory allocation hole connectivity areas. Based on the topology block of the stable area of computer resource load, the average memory usage of the interconnected computing nodes within the block is extracted as the baseline. The absolute value of the difference between the real-time memory usage of each individual node and this average memory usage is calculated as the node load difference. The load difference is compared with the cluster balance threshold of 450 megabytes. If it is lower than this threshold, the node's edge connections in the topology block are disconnected for physical isolation. A network traversal probing algorithm is used to scan the block, and network addressing coordinates that cannot be reached after disconnection are marked as isolated memory coordinates. The shortest path calculation rule is used to determine the logical hop distance between any two isolated nodes. If the logical hop distance is less than the network routing hop threshold of 3 hops, the two nodes are merged into the same connected region set. The total number of independent connected regions is counted, and the count value of memory allocation hole connectivity areas is output.
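The stripping and region-counting logic in S301 can be approximated with a union-find sketch. Function names are hypothetical, and the pairwise hop distances are assumed to be supplied as a precomputed dictionary, because the text does not say over which graph the shortest paths are measured once nodes are disconnected.

```python
BALANCE_THRESHOLD_MB = 450   # cluster balance threshold from the text
HOP_THRESHOLD = 3            # network routing hop threshold from the text


def nodes_to_strip(usage_mb):
    """Nodes whose |usage - mean usage| is below the balance threshold."""
    mean = sum(usage_mb.values()) / len(usage_mb)
    return {node for node, usage in usage_mb.items()
            if abs(usage - mean) < BALANCE_THRESHOLD_MB}


def count_hole_regions(isolated, hop_distance):
    """Merge isolated nodes closer than 3 hops and count the resulting regions.

    hop_distance maps (node_a, node_b) pairs to a hop count (assumed input).
    """
    parent = {node: node for node in isolated}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path compression
            node = parent[node]
        return node

    for (a, b), hops in hop_distance.items():
        if hops < HOP_THRESHOLD:
            parent[find(a)] = find(b)
    return len({find(node) for node in isolated})


regions = count_hole_regions(
    ["a", "b", "c", "d"],
    {("a", "b"): 2, ("b", "c"): 5, ("c", "d"): 1, ("a", "d"): 4},
)
print(regions)
```

In this hypothetical input, nodes a and b merge, nodes c and d merge, and the two groups stay separate, so two hole connectivity areas are counted.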

[0033] S302: Invoke the count value of memory allocation hole connectivity areas, screen the underlying logical storage capacity covered by each connectivity area, remove the micro-fragment areas whose capacity is smaller than the preset cluster scheduling benchmark, aggregate the big data storage areas that have a connectivity relationship at the routing boundary, perform computing network topology reconstruction based on the aggregated areas, and obtain the data flow shape contour closure monitoring results. The system invokes the count value of memory allocation hole connectivity areas to start a cyclic hierarchical scan. It issues a capacity query command to each connectivity area, reads the total remaining available SSD sector capacity and the total unallocated RRAM capacity within a single hole connectivity area, and adds the two values to obtain the total underlying logical storage capacity. This total capacity is compared with the cluster scheduling baseline parameter of 512 megabytes. If it is less than the baseline parameter, the area is determined to be a micro-fragment area and is discarded; if it is greater than or equal to the baseline, the connectivity area is retained. The routing table of the edge gateway device in each retained area is consulted to determine whether there are active physical links between the boundary gateways of adjacent areas. If such links exist, the logical identifiers of the associated areas are merged into one large aggregate area. The virtual LAN subnet mask configuration is reissued to the generated aggregate area, establishing a gateway-free communication mechanism within the same broadcast domain. A network distribution matrix of the established connections is generated and rendered, and the data flow shape contour closure monitoring results are exported.
In a measured case, the SSD sector capacity of a certain connectivity area (150 megabytes) plus the memory capacity (60 megabytes) gave a total storage capacity of 210 megabytes. Because 210 is less than the baseline of 512, the area is determined to be a micro-fragment area and is discarded as wasted space. The large-capacity areas are merged into aggregate areas, the subnet configuration is completed, and the data flow shape contour closure monitoring result matrix is exported.
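The capacity screening in S302 reduces to a sum and a threshold comparison, sketched below. The function names, zone identifiers, and the second zone's capacities are hypothetical; only the 150 + 60 case comes from the text.

```python
SCHEDULING_BASELINE_MB = 512   # cluster scheduling baseline parameter from the text


def zone_capacity_mb(ssd_free_mb, memory_free_mb):
    """Total underlying logical storage capacity of one connectivity area."""
    return ssd_free_mb + memory_free_mb


def retained_zones(zones):
    """Keep areas whose total capacity is at least the baseline.

    zones maps an area identifier to (ssd_free_mb, memory_free_mb).
    """
    kept = {}
    for zone_id, (ssd_mb, mem_mb) in zones.items():
        total = zone_capacity_mb(ssd_mb, mem_mb)
        if total >= SCHEDULING_BASELINE_MB:
            kept[zone_id] = total
    return kept


# "zone-1" mirrors the measured case; "zone-2" is an invented example.
result = retained_zones({"zone-1": (150, 60), "zone-2": (400, 200)})
print(result)
```

Here `zone-1` totals 210 megabytes and is discarded as a micro-fragment area, while `zone-2` totals 600 megabytes and is retained for aggregation.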

[0034] Please see Figure 5. The specific steps for forming the data processing state classification and discrimination dataset are as follows: S401: Invoke the high cohesion storage area in the data flow shape contour closure monitoring results, quantify the dynamic offset of the big data I / O throughput direction in the grid, define the concurrent I / O consumption curvature boundary of each distributed storage domain, and obtain the load curvature dispersion of the computing nodes. The coordinates of the high cohesion storage area within the data flow shape contour closure monitoring results are retrieved, and the network transmission vector sequences of the previous and current time windows are extracted. For each vector, the sum of the squares of its components is calculated and the square root is taken to obtain the vector magnitude. The corresponding components of the two transmission vectors are multiplied and summed to obtain the dot product. The dot product is divided by the product of the two vector magnitudes to obtain the cosine similarity. The cosine similarity is subtracted from the constant 1 to obtain the dynamic offset. A coordinate system is then constructed with the number of concurrent queues on the horizontal axis and the response latency in milliseconds on the vertical axis. The second derivative expression of the fitted curve is extracted, and the absolute value of the curvature is calculated by substituting the number of concurrent requests into it. The maximum peak value within the curvature sequence is extracted and fixed as the consumption curvature boundary. The arithmetic mean of the consumption curvature boundary values of the computing nodes within the region is calculated. The squared deviation of each node is obtained by subtracting the node's consumption curvature boundary value from the arithmetic mean and squaring the difference.
The sum of the squared deviations is divided by the total number of nodes, and the square root of the quotient is taken to obtain the load curvature dispersion of the computing nodes.
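The two numeric quantities in S401, the dynamic offset and the curvature dispersion, can be sketched as follows. Function names are hypothetical, and the curve fitting that produces the per-node curvature boundaries is omitted; the boundaries are taken as given inputs.

```python
import math


def dynamic_offset(previous, current):
    """1 minus the cosine similarity of two transmission vectors."""
    dot = sum(a * b for a, b in zip(previous, current))
    magnitude_prev = math.sqrt(sum(a * a for a in previous))
    magnitude_curr = math.sqrt(sum(b * b for b in current))
    return 1.0 - dot / (magnitude_prev * magnitude_curr)


def curvature_dispersion(boundaries):
    """Population standard deviation of per-node consumption curvature boundaries."""
    mean = sum(boundaries) / len(boundaries)
    squared_deviations = [(mean - b) ** 2 for b in boundaries]
    return math.sqrt(sum(squared_deviations) / len(boundaries))


# Orthogonal vectors give the maximum offset for non-negative traffic vectors.
print(dynamic_offset([1, 0], [0, 1]))
# Illustrative boundary values only; not taken from the text.
print(curvature_dispersion([2, 4, 4, 4, 5, 5, 7, 9]))
```

Because the sum of squared deviations is divided by the total number of nodes rather than by one less than that, the dispersion is a population standard deviation, not a sample one.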

[0035] S402: Based on the load curvature dispersion of the computing nodes, analyze the data resident density of the computing boundary nodes and the instruction execution phase offset angle of the continuous computing area, measure the topological architecture difference between the main node domain and the standard big data storage model of the computer, and integrate these into a multi-dimensional cluster architecture difference item to obtain the target computing domain architecture difference parameters. The structural correlation parameters are extracted starting from the load curvature dispersion of the computing nodes. The total volume of big data blocks in the resident state (in megabytes) is read and divided by the total nominal physical memory capacity (in megabytes) to obtain the data resident density. The average waiting cycle count of the central processing unit within the continuous computing area and the waiting cycle count of the boundary network nodes are extracted. The boundary cycle count is divided by the internal cycle count to obtain their ratio, and the instruction execution phase offset angle is derived from this ratio through function mapping. The eigenvalues of the actual network connectivity matrix of the area are read, and the maximum eigenvalue constant of the star-shaped standard connectivity benchmark matrix (31.62) is retrieved. The absolute value of the topological algebra deviation is obtained by subtracting the benchmark maximum eigenvalue constant from the actual maximum eigenvalue. The load curvature dispersion of the computing nodes, the data resident density, the instruction execution phase offset angle, and the topological algebra deviation are arranged in sequence and merged into a one-dimensional array to form the target computing domain architecture difference parameters.

[0036] S403: Invoke the target computing domain architecture difference parameters, match and verify the multi-dimensional cluster architecture difference items against the preset concurrent scheduling classification benchmark thresholds, record the underlying physical hardware coordinates and the corresponding computer instruction classification labels for distributed regions that satisfy the matching mechanism, and obtain the data processing state classification and discrimination dataset. The four-dimensional numerical vector of the target computing domain architecture difference parameters is obtained, and the baseline reference value sequence associated with each preset category is extracted from the concurrent scheduling classification lookup table. The corresponding baseline reference value is subtracted from the measured value of each of the four feature dimensions, and the four differences are squared. The four squared results are summed to obtain the total squared distance, and the square root of the sum is taken to obtain the Euclidean matching distance. The matching distances to all classification categories are compared, the category with the smallest distance is selected, and the computing region is determined to match that category's instruction mapping. The device string containing the coordinates of the underlying computing node's hardware network card within the matching region is extracted, and the hardware coordinate string is bound to the classification instruction label data field. A new record table is created, the hardware coordinate and corresponding instruction label fields are inserted, and the data processing state classification and discrimination dataset is constructed.
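The matching rule in S403 is a nearest-baseline classification under Euclidean distance. The sketch below assumes hypothetical category labels and baseline values, since the contents of the concurrent scheduling classification lookup table are not given in the text.

```python
import math


def matching_distance(params, baseline):
    """Euclidean distance between a 4-D parameter vector and a category baseline."""
    return math.dist(params, baseline)


def classify(params, lookup_table):
    """Return the category label whose baseline is closest to params."""
    return min(lookup_table, key=lambda label: matching_distance(params, lookup_table[label]))


# Hypothetical lookup table: label -> [dispersion, resident density,
# phase offset angle, topological deviation]. Values are invented.
LOOKUP = {
    "balanced": [0.1, 0.4, 28.0, 0.5],
    "skewed": [2.0, 0.9, 60.0, 5.0],
}

label = classify([0.2, 0.5, 30.0, 1.0], LOOKUP)
record = {"hardware_coordinate": "nic-0 (hypothetical)", "label": label}
print(record)
```

One design point worth noting: the four dimensions have very different scales (an angle in degrees next to a density between 0 and 1), so without rescaling the angle will dominate the distance. The text does not mention normalization, so none is applied here.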

[0037] Please see Figure 6. The specific steps for outputting the data processing state classification partition identification label set are as follows: S501: Based on the data processing state classification and discrimination dataset, extract the node load identifier codes of the concurrent scheduling state regions, unify the bit width of the underlying computation sequences, and perform a topologically normalized arrangement of the directed graph to obtain the computer data processing scheduling code sequence set. Based on the data processing state classification and discrimination dataset, the character set of node load identifier codes with attached state region identifiers is extracted. The 20-bit manufacturer characteristic characters containing hardware attributes at the front of each identifier code are filtered out, and the remaining core data character sequence is extracted. The length of the extracted sequence is compared with the standard 64-bit width. If it is shorter than the standard length, zero characters are appended to the end of the sequence until it is full; if it is longer, the excess portion at the end is truncated, giving an underlying computation sequence of uniform standard length. The connection attributes of the computation sequences are used to construct a directed graph representing their dependencies. The in-degree of every vertex in the directed graph is counted. Starting vertices with an in-degree of 0 are extracted, removed from the directed graph, and pushed in order into a linear one-dimensional array as the front of the sequence. The in-degrees of the remaining vertices in the directed graph are updated, and vertices whose in-degree drops to 0 are detected and appended to the array. The entire graph is traversed in this way to complete the linear rearrangement, and the computer data processing scheduling code sequence set is output.
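The in-degree procedure described in S501 corresponds to the standard Kahn's algorithm for topological sorting. The sketch below pairs it with the width normalization step; function names are hypothetical, and the "20-bit" manufacturer prefix is treated as 20 leading characters, which is an interpretation of the text rather than something it states.

```python
from collections import deque

VENDOR_PREFIX_LEN = 20   # manufacturer prefix length (interpreted as characters)
STANDARD_WIDTH = 64      # standard width from the text


def normalize_code(identifier):
    """Drop the vendor prefix, then zero-pad or truncate to the standard width."""
    core = identifier[VENDOR_PREFIX_LEN:]
    return core[:STANDARD_WIDTH].ljust(STANDARD_WIDTH, "0")


def topological_order(vertices, edges):
    """Kahn's algorithm: repeatedly emit vertices whose in-degree is 0."""
    indegree = {v: 0 for v in vertices}
    successors = {v: [] for v in vertices}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1
    queue = deque(v for v in vertices if indegree[v] == 0)
    order = []
    while queue:
        vertex = queue.popleft()
        order.append(vertex)
        for nxt in successors[vertex]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    if len(order) != len(vertices):
        raise ValueError("dependency graph contains a cycle")
    return order


order = topological_order(["a", "b", "c", "d"],
                          [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")])
print(order)
```

The cycle check is an addition made here: the text does not say what happens if the dependency graph is not acyclic, in which case no complete linear arrangement exists.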

[0038] S502: Based on the computer data processing scheduling code sequence set, compare the position offset of concurrent I / O consumption between computation sequences, establish a concurrent computing power deviation matrix, and use a clustering algorithm to perform high-dimensional classification. Issue a unified task scheduling number for each computing flow cluster group, analyze the mapping relationship between instruction type labels and the underlying physical computing node clusters, and obtain the computer processing state type clustering mapping table. For any two independent computation sequences within the scheduling code sequence set, the recent read / write consumption of their corresponding mapped nodes is retrieved, the two consumption values are subtracted, and the absolute value of the difference is taken. The absolute value of the difference between the two sequences' indices in the sorted one-dimensional array is also taken and multiplied by the topological distance penalty constant of 15 to obtain the position compensation amount. The absolute consumption difference is added to the position compensation amount to obtain the comprehensive position offset. All network nodes are traversed and every pairwise combination is calculated to fill the elements of the concurrent computing power deviation matrix. The core neighborhood search radius is set to 80 and the basic minimum sample size is set to 5. The deviation matrix is scanned row by row, and the total number of neighboring vectors within the search radius is counted. If the count is greater than 5, the vectors are merged into the same cluster group.
A unified task scheduling number is assigned to each resulting cluster group, the underlying device communication addresses are bound to the number records in the data structure, and the computer processing state type clustering mapping table is output.
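A rough sketch of the deviation matrix and the density-based grouping in S502 is given below. The radius and minimum-sample rule resembles DBSCAN, but the text does not name the clustering algorithm, so this is a simplified stand-in; function names are hypothetical, and excluding a point from its own neighbor count is an assumption.

```python
DISTANCE_PENALTY = 15   # topological distance penalty constant from the text
SEARCH_RADIUS = 80      # core neighborhood search radius from the text
MIN_SAMPLES = 5         # basic minimum sample size from the text


def position_offset(cons_a, cons_b, index_a, index_b):
    """|consumption difference| + 15 * |index difference|."""
    return abs(cons_a - cons_b) + abs(index_a - index_b) * DISTANCE_PENALTY


def deviation_matrix(consumption):
    n = len(consumption)
    return [[position_offset(consumption[i], consumption[j], i, j)
             for j in range(n)] for i in range(n)]


def cluster_groups(matrix):
    """Merge each row having more than MIN_SAMPLES neighbors with those neighbors."""
    n = len(matrix)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        neighbors = [j for j in range(n)
                     if j != i and matrix[i][j] <= SEARCH_RADIUS]
        if len(neighbors) > MIN_SAMPLES:
            for j in neighbors:
                parent[find(j)] = find(i)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return [members for members in groups.values() if len(members) > 1]


# Invented consumption readings: seven similar nodes and one outlier.
groups = cluster_groups(deviation_matrix([100, 101, 102, 103, 104, 105, 106, 5000]))
task_numbers = {number: members for number, members in enumerate(groups, start=1)}
print(task_numbers)
```

In this invented input the seven similar nodes form one cluster group and receive task scheduling number 1, while the outlier stays ungrouped.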

[0039] S503: Invoke the computer processing state type clustering mapping table, define the topology network boundary and cluster addressing index of each big data concurrent processing area based on the task scheduling number, construct the binding mapping structure between underlying scheduling numbers and hardware physical coordinates, register the computing type exclusive identifier, and obtain the data processing state classification partition identification label set. The system invokes the computer processing state type clustering mapping table to extract all physical communication access control addresses (PMACs) that match the network segment corresponding to a single task scheduling number. It reads the numerical values of the addressing sequence and extracts the two endpoints spanning the widest range, taking the value of the starting endpoint as the minimum addressing threshold of the topology boundary and the value of the ending endpoint as the maximum addressing threshold, which together delineate the addressing sandbox boundary. Using the task scheduling number as the root node, it arranges the pure binary strings converted from the physical communication addresses as leaf nodes to construct a top-down tree-shaped addressing index. It allocates a purely numeric sequence to the corresponding network group in the global memory resource pool to build the exclusive identifier. It creates the underlying logical table of the mapping database, setting the task scheduling number column as the primary key, placing the minimum and maximum addressing thresholds in the range limit columns, and associating them with the exclusive identifier field. After the entries are populated, it publishes a network-wide broadcast notification to carry out the final deployment, generating the data processing state classification partition identification label set.
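The boundary and index construction in S503 can be sketched as follows. This assumes the physical communication addresses are 48-bit MAC-style hexadecimal strings, which the text does not confirm; the function names, the sample addresses, and the way the exclusive identifier is formed are all hypothetical.

```python
def address_to_int(address):
    """Parse a colon- or dash-separated hexadecimal address (assumed 48-bit)."""
    return int(address.replace(":", "").replace("-", ""), 16)


def address_to_binary(address):
    """Pure binary string used as a leaf in the addressing index."""
    return format(address_to_int(address), "048b")


def build_partition_record(task_number, addresses):
    """One mapping-table row keyed by task scheduling number."""
    values = [address_to_int(a) for a in addresses]
    return {
        "task_number": task_number,                 # primary key
        "min_address": min(values),                 # minimum addressing threshold
        "max_address": max(values),                 # maximum addressing threshold
        "leaves": sorted(address_to_binary(a) for a in addresses),
        "exclusive_id": f"{task_number:06d}",       # hypothetical numeric identifier
    }


record = build_partition_record(
    7, ["00:00:00:00:00:ff", "00:00:00:00:00:10", "00:00:00:00:01:00"]
)
print(record["min_address"], record["max_address"], record["exclusive_id"])
```

The tree-shaped index is represented here simply as a root key with a sorted list of leaf strings; a real index over binary strings might use a trie, but the text does not specify the structure beyond root and leaves.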

[0040] This embodiment also provides a data processing device for performing the above-described data processing method, including: The big data flow mapping module is configured to analyze the memory load distribution pattern and instruction phase offset of data flow in the distributed cluster based on the concurrent task execution status in the computer distributed cluster and big data processing environment, track the continuous transfer path between multi-level caches, define the set of computing nodes whose data residency association strength exceeds the threshold, and obtain the spatiotemporal mapping layer of big data feature flow. The computing load stability determination module is configured to parse the data replica consistency sharding within the node based on the big data feature flow spatiotemporal mapping layer, evaluate the cluster concurrent load stability and thread synchronization, map adjacent distributed hash address spaces and isolate nodes with load mutations, and generate the topology blocks of the stable area of computer resource load. The cluster topology closed-loop aggregation module is configured to, based on the topology blocks of the stable area of computer resource load, mark the index set of data skew and memory hole nodes and perform graph topology aggregation, merge adjacent fine-grained data fragments, remove micro-resident blocks whose throughput is below the benchmark, and obtain the data flow shape contour closure monitoring results. The architecture difference analysis and classification module is configured to, based on the characteristics of the high cohesion storage area in the data flow shape contour closure monitoring results, compare the hash distribution differences of the key storage domains of distributed files to match concurrent scheduling strategy labels, record the physical sector location and multi-dimensional I / O attributes, and form the data processing state classification and discrimination dataset.
The computer concurrent scheduling identification module is configured to, based on the data processing state classification and discrimination dataset, extract the instruction scheduling code sequences of each type of concurrent processing state region, perform computing flow similarity clustering on the instruction scheduling code sequences and assign unified cluster scheduling label numbers, analyze the correspondence between task scheduling indexes and physical computing regions, and output the data processing state classification partition identification label set.

[0041] This embodiment also provides a computer medium on which a computer program is stored, and when the computer program is executed by a processor, it implements the steps of the above-described data processing method.

[0042] The above are merely preferred embodiments of the present invention and are not intended to limit the present invention in any other way. Any person skilled in the art may make changes or modifications to the above-disclosed technical content to create equivalent embodiments that can be applied to other fields. However, any simple modifications, equivalent changes, and modifications made to the above embodiments based on the technical essence of the present invention without departing from the scope of the present invention shall still fall within the protection scope of the present invention.

Claims

1. A data processing method, characterized in that it includes the following steps: S1: Based on the concurrent task execution status in a computer distributed cluster and big data processing environment, analyze the memory load distribution pattern and instruction phase offset of data flow in the distributed cluster, track the continuous transfer path between multi-level caches, define the set of computing nodes whose data residency association strength exceeds the threshold, and obtain the spatiotemporal mapping layer of big data feature flow. S2: Based on the big data feature flow spatiotemporal mapping layer, analyze the data replica consistency sharding within the node, evaluate the cluster concurrent load stability and thread synchronization, map adjacent distributed hash address spaces and isolate nodes with load mutations, and generate the topology blocks of the stable area of computer resource load. S3: Based on the topology blocks of the stable area of computer resource load, mark the index set of data skew and memory hole nodes and perform graph topology aggregation, merge adjacent fine-grained data fragments, remove micro-resident blocks whose throughput is lower than the benchmark, and obtain the data flow shape contour closure monitoring results. S4: Based on the characteristics of the high cohesion storage area in the data flow shape contour closure monitoring results, compare the hash distribution differences of the key storage domains of distributed files to match concurrent scheduling strategy labels, record the physical sector location and multi-dimensional I / O attributes, and form the data processing state classification and discrimination dataset.

2. The data processing method according to claim 1, characterized in that the big data feature flow spatiotemporal mapping layer includes resident distribution topology features, thread phase offset vectors, and node hash connectivity weights; the topology blocks of the stable area of computer resource load include replica consistency shards, computing thread synchronization aggregation areas, and abnormal concurrent node isolation markers; the data flow shape contour closure monitoring results include a closed-loop aggregation boundary network, memory hole mapping paths, and low-throughput shard stripping records; and the data processing state classification and discrimination dataset includes I / O load curvature parameters, instruction phase offset benchmarks, and concurrent execution frequency stability indicators.

3. The data processing method according to claim 1, characterized in that the specific steps for obtaining the spatiotemporal mapping layer of big data feature flow are as follows: S101: Based on the concurrent task execution status in a computer distributed cluster and big data processing environment, quantify the horizontal I / O load, vertical I / O load and parallel computing load difference between adjacent cluster nodes in the logical memory dimension of computing nodes, apply a gradient descent algorithm to analyze the load consumption gradient of multi-dimensional computing resources, define the consumption intensity of single-point computing resources, and generate a set of computing node flow energy consumption intensity values. S102: Based on the set of computing node flow energy consumption intensity values, define the phase offset direction of concurrent computing instructions between the source computing node and neighboring computing nodes, identify continuous data flow links along which the cluster load is monotonically increasing or decreasing, record the matching score between the flow link sequence and the instruction execution direction, and generate the instruction-direction-maintaining link strength set. S103: Invoke the instruction-direction-maintaining link strength set, select continuous links with a consistent instruction execution direction among the resident nodes at the memory boundary of the computer cluster, map the continuous links to the high-dimensional computing vector space according to the computing flow continuity judgment criteria, and obtain the spatiotemporal mapping layer of big data feature flow.

4. The data processing method according to claim 3, characterized in that the specific steps for generating the topology blocks of the stable area of computer resource load are as follows: S201: Based on the big data feature flow spatiotemporal mapping layer, define the logical shards of continuous big data sets in the distributed storage pool, count the difference in computing power consumption between the target computing node and adjacent computing nodes within a fixed instruction execution cycle, and obtain the node load balance index in the distributed storage domain. S202: Based on the node load balance index, analyze the synchronization parameters of computing threads in the distributed logical domain, define the allowable range of thread concurrency fluctuations, select computing nodes whose concurrent thread synchronization differences meet the preset cluster convergence threshold, perform aggregation operations according to hash logical regions and detect the high cohesion of local threads, and generate a set of benchmark coordinates for computing load expansion. S203: Based on the benchmark coordinate set for computing load expansion, traverse the neighboring cluster computing nodes and compare the waveform evolution of the big data set throughput rate. After isolating extreme nodes whose resource consumption change rate exceeds the preset downtime warning threshold, perform topology reshaping of the cluster computing region to obtain the topology block of the stable area of computer resource load.

5. The data processing method according to claim 4, characterized in that the specific steps for obtaining the data flow shape contour closure monitoring result are as follows:
S301: Based on the topology block of the stable area of computer resource load, strip the underlying computing nodes whose node load difference is lower than the preset cluster balance threshold; mark the isolated memory coordinates that lack a hash connectivity mechanism; perform a topology closure aggregation operation on the sets of computing domain points whose adjacent logical distance is less than the preset network routing hop threshold; and obtain the memory allocation hole connectivity area count value.
S302: Call the memory allocation hole connectivity area count value; filter the underlying logical storage capacity covered by each connectivity area; after removing the micro-fragment areas whose capacity is smaller than the preset cluster scheduling benchmark, aggregate the big data storage areas that are connected at the routing boundary; perform computing network topology reconstruction based on the aggregated areas; and obtain the data flow shape contour closure monitoring result.
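
As an illustration only, the aggregation in S301 and the fragment removal in S302 can be sketched in Python with a union-find structure. The sketch assumes that a caller supplies a pairwise logical distance function and a per-node capacity table. The names `connected_regions` and `drop_small_regions` are hypothetical, and the claim does not prescribe union-find.

```python
def connected_regions(nodes, distance, max_hops):
    """Group nodes whose pairwise logical distance is below max_hops."""
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            if distance(a, b) < max_hops:
                parent[find(a)] = find(b)

    regions = {}
    for n in nodes:
        regions.setdefault(find(n), []).append(n)
    return list(regions.values())

def drop_small_regions(regions, capacity, min_capacity):
    """Keep only regions whose total capacity reaches min_capacity."""
    return [r for r in regions if sum(capacity[n] for n in r) >= min_capacity]
```

Under these assumptions, `len(connected_regions(...))` plays the role of the connectivity area count value, and `drop_small_regions` removes the micro-fragment areas before any further aggregation.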

6. The data processing method according to claim 5, characterized in that the specific steps for forming the data processing state classification and discrimination dataset are as follows:
S401: Call the high cohesion storage area in the data flow shape contour closure monitoring result; quantify the dynamic offset of the big data I/O throughput direction in the grid; define the concurrent I/O consumption curvature boundary of each distributed storage domain; and obtain the load curvature dispersion of the computing nodes.
S402: Based on the load curvature dispersion of the computing nodes, analyze the data dwell density of the computing boundary nodes and the instruction execution phase offset angle of the continuous computing area; measure the topological architecture difference between the main node domain and the standard big data storage model of the computer; integrate these into multi-dimensional cluster architecture difference items; and obtain the target computing domain architecture difference parameters.
S403: Call the target computing domain architecture difference parameters; match and verify the multi-dimensional cluster architecture difference items against the preset concurrent scheduling classification benchmark threshold; for the distributed regions that satisfy the matching mechanism, record the underlying physical hardware coordinates and the corresponding computer instruction classification labels; and obtain the data processing state classification and discrimination dataset.

7. The data processing method according to claim 1, characterized in that the data processing method further includes the following step:
S5: Based on the data processing state classification and discrimination dataset, extract the instruction scheduling encoding sequence of each concurrent processing state region; perform computation flow similarity clustering on the instruction scheduling encoding sequences and assign unified cluster scheduling label numbers; analyze the correspondence between the task scheduling index and the physical computing region; and output the data processing state classification partition identification label set, wherein the data processing state classification partition identification label set includes a unified scheduling instruction code, a physical node mapping index, and a scheduling area identifier.

8. The data processing method according to claim 7, characterized in that the specific steps for outputting the data processing state classification partition identification label set are as follows:
S501: Based on the data processing state classification and discrimination dataset, extract the node load identifier encoding of each concurrent scheduling state region; unify the bit width of the underlying calculation sequences and perform topological normalization of the directed graph; and obtain the computer data processing scheduling encoding sequence set.
S502: Based on the computer data processing scheduling encoding sequence set, compare the position offsets of concurrent I/O consumption between calculation sequences; establish a concurrent computing power deviation matrix; use a clustering algorithm to perform high-dimensional classification; issue a unified task scheduling number to each computation flow cluster group; analyze the mapping relationship between instruction type labels and the underlying physical computing node clusters; and obtain the computer processing state type clustering mapping table.
S503: Call the computer processing state type clustering mapping table; define the topological network boundary and cluster addressing index of each big data concurrent processing area according to the task scheduling number; construct the binding mapping architecture between the underlying scheduling numbers and the hardware physical coordinates; register the exclusive identifier of each computing type; and obtain the data processing state classification partition identification label set.
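
As an illustration only, the bit width unification in S501 and the deviation matrix and clustering in S502 can be sketched in Python. The sketch assumes that each encoding sequence is a string of symbols, that deviation is measured as Hamming distance, and that clustering is a simple threshold-based linkage. These choices are assumptions, since the claim names no specific clustering algorithm, and the names `normalize`, `deviation_matrix`, and `assign_labels` are hypothetical.

```python
def normalize(seqs, pad="0"):
    """Left-pad every sequence to the width of the longest one."""
    width = max(len(s) for s in seqs)
    return [s.rjust(width, pad) for s in seqs]

def deviation_matrix(seqs):
    """Pairwise Hamming distance between equal-width sequences."""
    return [[sum(x != y for x, y in zip(a, b)) for b in seqs] for a in seqs]

def assign_labels(matrix, max_dev):
    """Give one scheduling number to each group linked by small deviation.

    Two sequences share a number when a chain of pairwise deviations,
    each no larger than max_dev, connects them.
    """
    labels = [None] * len(matrix)
    next_label = 0
    for i in range(len(matrix)):
        if labels[i] is not None:
            continue
        labels[i] = next_label
        stack = [i]
        while stack:
            cur = stack.pop()
            for j, d in enumerate(matrix[cur]):
                if labels[j] is None and d <= max_dev:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels
```

In this sketch the returned label list stands in for the unified task scheduling numbers; binding those numbers to hardware physical coordinates, as in S503, would require a node table that is not modeled here.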

9. A data processing apparatus for executing the data processing method according to any one of claims 1 to 8, characterized in that it includes:
a big data flow mapping module, configured to, based on the concurrent task execution status in the computer distributed cluster and big data processing environment, analyze the memory load distribution pattern and instruction phase offset of the data flow in the distributed cluster, track the continuous transfer paths between multi-level caches, define the set of computing nodes whose data residency association strength exceeds the threshold, and obtain the spatiotemporal mapping layer of big data feature flow;
a computing load stability determination module, configured to, based on the spatiotemporal mapping layer of big data feature flow, parse the data replica consistency sharding within the nodes, evaluate the cluster concurrent load stability and thread synchronization, map adjacent distributed hash address spaces and isolate the nodes with sudden load changes, and generate the topology block of the stable area of computer resource load;
a cluster topology closed-loop aggregation module, configured to, based on the topology block of the stable area of computer resource load, mark the index set of data skew nodes and memory hole nodes and perform graph topology aggregation, merge adjacent fine-grained data fragments, remove the micro-resident blocks whose throughput is below the benchmark, and obtain the data flow pattern contour closure monitoring result; and
an architecture difference analysis and classification module, configured to, based on the characteristics of the high cohesion storage area in the data flow pattern contour closure monitoring result, match the hash distribution differences of the key storage domains of distributed files with concurrent scheduling strategy labels, record the physical sector locations and multi-dimensional I/O attributes, and form the data processing state classification and discrimination dataset.

10. A computer medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the steps of the data processing method according to any one of claims 1 to 8.