A data filtering visualization system based on aggregate association computation

The data filtering and visualization system based on aggregation and correlation calculation solves the problems of insufficient multi-source data correlation processing capabilities and low performance of deep aggregation calculation, and realizes real-time dynamic linkage and efficient visualization of multi-source data.

CN122240643A · Pending · Publication Date: 2026-06-19 · SUZHOU HANMA INTELLIGENT TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
SUZHOU HANMA INTELLIGENT TECH CO LTD
Filing Date
2026-03-16
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing data filtering and visualization systems suffer from technical problems when dealing with large-scale, multi-source, heterogeneous data, such as insufficient multi-source data association processing capabilities, low performance in deep aggregation computing of massive data, and a lack of real-time dynamic linkage mechanism between the filtering engine and the visualization rendering component.

Method used

The invention provides a data filtering and visualization system based on aggregation and association calculation, including a data access module, an association rule engine, a multi-dimensional aggregation module, a filtering execution module, and a visualization rendering module. It processes multi-source heterogeneous data through standardized data protocols, performs association matching operations and multi-dimensional aggregation calculations, generates visualization charts, and dynamically redraws the charts when interactive trigger events occur.

Benefits of technology

It realizes cross-table association mapping of multi-source data, solves the performance bottleneck of deep computing of large-scale data, realizes end-to-end real-time dynamic update mechanism, and improves data processing efficiency and real-time linkage of visualization.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122240643A_ABST

Abstract

This invention relates to the field of data processing, specifically to a data filtering and visualization system based on aggregation and association calculations. The system includes: a data access module for converting multi-source heterogeneous data into intermediate data in a unified format using a standardized data protocol, and outputting structured data; an association rule engine for merging the structured data into a full merged dataset; a multi-dimensional aggregation module for generating an aggregation result set with business dimensions and performing consistency checks on the aggregation result set; a filtering execution module for converting natural language into structured query statements and using the structured query statements to extract matching target data subsets from the aggregation result set; and a visualization rendering module for generating visual charts based on the data objects. The system of this invention effectively overcomes the problems of weak cross-source association capabilities and sluggish processing of massive amounts of data in existing data filtering and visualization tools.

Description

Technical Field

[0001] This invention relates to the field of data processing. More specifically, this invention relates to a data filtering and visualization system based on aggregation and association calculations.

Background Technology

[0002] Currently, data filtering and visualization tools primarily filter a single data source using preset filtering conditions and display the results as charts. In existing architectures, the data connection module typically supports connections to only one or a limited number of data sources and reads data through a fixed-format interface. The filter module performs row-level filtering on simple user-input conditions, so the filtering logic is limited to simple conditional judgments. During visualization rendering, the system generates static or semi-dynamic charts from the filtered data. Although the interactive module lets users trigger local data updates by dragging and clicking chart elements, it does not support association calculations across data sources. As a result, existing systems show the following technical shortcomings in complex business scenarios. First, filtering dimensions are limited: only simple conditional filtering on a single data source is supported, associations between multi-source data cannot be handled, and cross-table data fusion and analysis is therefore not possible. Second, the underlying layer lacks deep data aggregation and analysis capabilities, so front-end filtering results rarely reflect the deeper patterns in the data. Third, data scale is severely limited: when processing large-scale data on the order of millions of records or more, filtering efficiency drops sharply and the system is prone to lag or timeouts. Finally, the linkage between front-end charts and the underlying data is poor: when filtering conditions are modified, chart regeneration often has to be triggered manually, because the system has no end-to-end real-time dynamic update mechanism. In summary, when handling large-scale, multi-source, heterogeneous data, existing data filtering and visualization tools suffer from insufficient multi-source data association processing capabilities, low performance in deep aggregation calculations on massive data, and the lack of a real-time dynamic linkage mechanism between the filtering engine and the visualization rendering components.

Summary of the Invention

[0003] To address the technical problems of existing data filtering and visualization tools in processing large-scale, multi-source heterogeneous data, such as insufficient multi-source data association processing capabilities, low performance of deep aggregation calculations of massive data, and lack of real-time dynamic linkage mechanism between filtering engines and visualization rendering components, this invention provides solutions in the following aspects.

[0004] This invention provides a data filtering and visualization system based on aggregation and association calculations, comprising: a data access module, used to acquire multi-source heterogeneous data, convert the multi-source heterogeneous data into intermediate data in a unified format through a standardized data protocol, perform cleaning operations on the intermediate data, and output structured data to the data cache pool; an association rule engine, used to receive association configuration instructions input by the user, perform association matching operations on the structured data based on the association fields and association logic in the association configuration instructions, generate a unified association model, and merge the structured data into a full merged dataset based on the association model; a multi-dimensional aggregation module, used to call the distributed computing framework to perform multi-dimensional aggregation calculations on the full merged dataset, generate an aggregation result set with business dimensions, and perform consistency verification on the aggregation result set; a filtering execution module, used to receive a filtering request containing natural language input by the user, convert the natural language into a structured query statement, use the structured query statement to extract a matching target data subset from the aggregation result set, and convert the target data subset into a data object in a preset format; and a visualization rendering module, used to generate visualization charts based on the data object and dynamically redraw the visualization charts when an interactive trigger event is detected.

[0005] The step of performing a cleaning operation on the intermediate data and outputting structured data to a data cache pool includes: extracting the preset duplicate detection field from the data records to be processed, and performing a filtering operation on the intermediate data based on the duplicate detection field; identifying empty fields containing missing values and abnormal fields containing non-compliant garbled characters in the intermediate data; and automatically filling the empty fields with preset null values and converting the characters contained in the abnormal fields into null values to obtain the cleaned structured data.

[0006] Preferably, the step of performing association matching operations on the structured data based on the association fields and association logic in the association configuration instruction to generate a unified association model includes: The association configuration instruction is parsed to determine the target data source set for which association analysis needs to be performed, and the association field for matching is extracted from the target data source set. Verify whether the data types of the associated fields in each of the target data source sets are consistent; When the data types are consistent, the association logic is applied to perform cross-data source mapping on the target data source set, automatically generating the association model that reflects the mapping relationship between each data source, and outputting a modeling success prompt signal.

[0007] Preferably, the step of calling the distributed computing framework to perform multi-dimensional aggregation calculations on the full merged dataset to generate an aggregation result set with business dimensions includes: obtaining the load status parameters of each computing node in the distributed cluster; Based on the size of the full merged dataset and the number of computing nodes, the full merged dataset is split into multiple data shards; Based on the load status parameters, the aggregation calculation subtasks corresponding to the multiple data shards are allocated to the corresponding computing nodes for parallel execution, so as to obtain the intermediate results returned by each computing node; Perform global aggregation logic on all the intermediate results to generate the aggregated result set, and release the computing resources of the computing node after execution.

[0008] Preferably, the step of splitting the full merged dataset into multiple data shards and performing parallel aggregation calculation subtasks includes: determining the target dimension for the full merged dataset, wherein the target dimension is at least one of the time dimension, spatial dimension, or business tag dimension; for numerical data shards, performing numerical aggregation operations, grouping and summing by the target dimension with grouping statements to obtain numerical grouping aggregation results; and for character-type data shards, blocking addition operations and executing deduplication and counting logic by the target dimension to obtain character-type grouping aggregation results.

[0009] Preferably, the consistency check of the aggregation result set includes: calculating the sum of all the numerical grouping aggregation results and using it as the numerical total value; directly summing the original numerical fields corresponding to the numerical data shards in the full merged dataset to obtain the direct summary value of the original data; determining whether the numerical total value is equal to the direct summary value of the original data; and, if they are not equal, deeming the data consistency check to have failed, automatically triggering a data rollback transaction to undo the current aggregation and association operation, and outputting a check error message containing the abnormal dimension.

[0010] Preferably, the step of converting the natural language into a structured query statement includes: preprocessing and segmenting the input natural language; extracting core requirement keywords for data filtering from the segmented text; and retrieving a preset query statement template, filling the core requirement keywords into the corresponding condition nodes of the template, and generating a structured query statement that conforms to standard syntax rules.

[0011] Preferably, the step of extracting a matching target data subset from the aggregated result set using the structured query statement and converting the target data subset into a data object in a preset format includes: Read the aggregation result set from the data cache pool; The structured query statement is executed to perform threshold matching and filtering on the aggregated result set to generate preliminary matching results; The preliminary matching results are sorted or paginated to obtain the target data subset; The target data subset is serialized into a unified, standardized, lightweight data exchange format and sent to the data bus as the data object.

[0012] Preferably, the unified, standardized, lightweight data exchange format is JSON.

[0013] Preferably, dynamically redrawing the visualization chart upon detecting an interactive trigger event includes: monitoring click events or selection events that act on specific graphic elements of the visualization chart on the terminal display interface; in response to the click event or the selection event, parsing the local dimension information corresponding to the triggered graphic element; and, based on the local dimension information, sending a drill-down analysis query instruction to the filtering execution module to obtain a drill-down data subset, and controlling the visualization rendering module to use the drill-down data subset to perform a local view update on the visualization chart.

[0014] The beneficial effects of this invention are as follows: This invention provides a full-link data filtering and visualization system. Through a data access module, it unifies and cleans multi-source heterogeneous data into a unified format. Utilizing an association rule engine, it breaks down the data silos of single data sources, achieving cross-table association mapping of multi-source data. Combined with a multi-dimensional aggregation module and a distributed computing framework, it solves the performance bottleneck problem in large-scale deep computing. Simultaneously, it uses a filtering execution module to extract precise data subsets and drives the visualization rendering module to dynamically redraw. This solution achieves a real-time workflow of "heterogeneous data cleaning - multi-table association calculation - distributed aggregation - automatic filtering - dynamic rendering," effectively overcoming the technical shortcomings of existing technologies, such as weak cross-source association capabilities, lag in massive data processing, and a lack of real-time front-end and back-end linkage response mechanisms.

Description of the Drawings

[0015] Figure 1 is a schematic diagram illustrating the structure of a data filtering and visualization system based on aggregation and association calculation according to an embodiment of the present invention; Figure 2 is a schematic flowchart illustrating the cleaning operation performed on the intermediate data according to an embodiment of the present invention.

Detailed Implementation

[0016] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0017] The specific embodiments of the present invention will now be described in detail with reference to the accompanying drawings.

[0018] Example of a data filtering and visualization system based on aggregation and association calculations: as shown in Figure 1, the data filtering and visualization system based on aggregation and association calculation of the present invention includes: a data access module, used to acquire multi-source heterogeneous data, convert the multi-source heterogeneous data into intermediate data in a unified format through a standardized data protocol, perform cleaning operations on the intermediate data, and output structured data to the data cache pool; an association rule engine, used to receive association configuration instructions input by the user, perform association matching operations on the structured data based on the association fields and association logic in the association configuration instructions, generate a unified association model, and merge the structured data into a full merged dataset based on the association model; a multi-dimensional aggregation module, used to call the distributed computing framework to perform multi-dimensional aggregation calculations on the full merged dataset, generate an aggregation result set with business dimensions, and perform consistency verification on the aggregation result set; a filtering execution module, used to receive a filtering request containing natural language input by the user, convert the natural language into a structured query statement, use the structured query statement to extract a matching target data subset from the aggregation result set, and convert the target data subset into a data object in a preset format; and a visualization rendering module, used to generate visualization charts based on the data object and dynamically redraw the visualization charts when an interactive trigger event is detected.

[0019] In this embodiment, the distributed computing framework can be the Apache Spark framework.

[0020] In this embodiment, the multi-source heterogeneous data includes data from relational databases, time-series databases, API interfaces, Excel / CSV files, and other sources.

[0021] The data access module includes a multi-source adaptation module and a data cleaning module. The multi-source adaptation module supports access from various data sources, including relational databases (MySQL, PostgreSQL), NoSQL databases (MongoDB), CSV files, and API interfaces. It converts heterogeneous data into intermediate data in a unified format using standardized data protocols (such as JDBC and ODBC). The data cleaning module performs deduplication, missing value imputation, and outlier detection on the multi-source heterogeneous data, outputting structured data to a data cache pool to improve the overall system's processing efficiency, ensure real-time performance, and reduce resource consumption by downstream modules.

[0022] The association configuration instructions also include user-defined association conditions (such as "User ID = Order ID"). The association logic can be any one of the following: inner join, left join, right join, and full join.

[0023] The filtering execution module is the central hub for querying and filtering, and its core function is to analyze user needs, extract precise target data from the aggregated results, and provide adapted data for visualization.

[0024] The purpose of converting the natural language into structured query statements is to transform the user's easily understandable natural language requirements into standardized query instructions that can be executed by a computer, thereby enabling accurate filtering of the aggregated result set.

[0025] The filtering request entered by the user, which contains natural language, is the natural language filtering condition (such as "sales revenue > 500,000 in the last 30 days and user level is VIP").

[0026] Structured query statements can be SQL statements or SQL-like statements.

[0027] In this embodiment, the filtering execution module includes a filtering history module, which is used to save user filtering conditions and corresponding results, and supports condition reuse and comparative analysis.

[0028] The visualization rendering module includes a chart template library, a dynamic linkage rendering module, an interaction enhancement module, a user interaction layer, a visual operation interface, and a natural language interaction module. Among these, the chart template library includes 20+ chart types such as line charts, bar charts, and pie charts, and lets users manually select and customize chart styles (colors, axes, legends). The dynamic linkage rendering module automatically triggers chart redrawing when filter conditions or data are updated, realizing end-to-end real-time linkage of "filtering-aggregation-visualization". The interaction enhancement module supports clicking / selecting chart elements (such as bars in a bar chart and points in a scatter plot) to trigger drill-down analysis (such as clicking "East China Region" to view data for each city in that region).
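The drill-down interaction can be illustrated with a minimal Python sketch. The event shape, the `CHILD_DIMENSION` hierarchy, and the function name `on_chart_click` are hypothetical and chosen only for illustration; the source does not specify how click events or dimension hierarchies are represented.

```python
# Hypothetical dimension hierarchy (an assumption): a region drills down to cities.
CHILD_DIMENSION = {"region": "city"}


def on_chart_click(event, rows, measure="sales"):
    """Turn a click on a chart element into a drill-down data subset."""
    dimension, value = event["dimension"], event["value"]
    child = CHILD_DIMENSION.get(dimension)
    if child is None:
        return None  # already at the lowest level; nothing to drill into
    series = {}
    for row in rows:
        # Keep only rows under the clicked element, regrouped by the child dimension.
        if row[dimension] == value:
            series[row[child]] = series.get(row[child], 0) + row[measure]
    return {"dimension": child, "parent": value, "series": series}


rows = [
    {"region": "East China", "city": "Suzhou", "sales": 120},
    {"region": "East China", "city": "Shanghai", "sales": 200},
    {"region": "East China", "city": "Suzhou", "sales": 30},
    {"region": "North China", "city": "Beijing", "sales": 90},
]
drill = on_chart_click({"dimension": "region", "value": "East China"}, rows)
```

In this sketch the returned dictionary stands in for the subset that a rendering component would use to update only the affected view.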

[0029] The user interaction layer is the operation and display interface for end users. Its core function is to provide user operation entry points and chart display windows, forming a closed loop of "user operation - data processing - result feedback".

[0030] The visual interface provides a drag-and-drop filter configuration area, a multi-source data association configuration panel, and a real-time preview window.

[0031] The natural language interaction module lets users enter filtering requests by voice or text (such as "display the profit margin ranking of each product in Q3 2023") and automatically parses them to generate filtering rules. Its core functions are: receiving user commands and passing them to downstream modules, displaying visual charts, and providing feedback on the results.

[0032] This embodiment provides a full-link data filtering and visualization system. Through a data access module, it unifies and cleans heterogeneous data from multiple sources, and utilizes an association rule engine to break down the data silos of single data sources, enabling cross-table association mapping of multi-source data. Combined with a multi-dimensional aggregation module and a distributed computing framework, it solves the performance bottleneck problem in large-scale deep computing. Simultaneously, a filtering execution module extracts precise data subsets and drives the visualization rendering module to dynamically redraw the charts. This solution achieves a real-time workflow of "heterogeneous data cleaning - multi-table association calculation - distributed aggregation - automatic filtering - dynamic rendering," effectively overcoming the technical shortcomings of existing technologies, such as weak cross-source association capabilities, lag in massive data processing, and a lack of real-time front-end and back-end linkage response mechanisms.

[0033] In one embodiment, as shown in Figure 2, the step of performing a cleaning operation on the intermediate data and outputting structured data to the data cache pool includes: S101. Perform a filtering operation on the intermediate data, specifically: extract a preset duplicate detection field from the data record to be processed, and perform a filtering operation on the intermediate data according to the duplicate detection field; S102. Identify empty fields containing missing values and abnormal fields containing non-compliant garbled characters in the intermediate data; S103. Obtain the cleaned structured data, specifically: automatically fill the empty fields with preset null values and convert the characters contained in the abnormal fields into null values.

[0034] By automatically filtering duplicate records, filling in missing values, and converting non-compliant and garbled characters, the structured quality and standardization of the underlying data are improved from the system source. This effectively avoids downstream aggregation calculation anomalies or system errors caused by dirty data injection, while reducing the additional consumption of system memory and computing resources by invalid processing operations.
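Steps S101 to S103 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the null marker `NULL_VALUE`, the `COMPLIANT` character whitelist used to decide what counts as "garbled", and the function name `clean_records` are all hypothetical, since the source only says these values are preset.

```python
import re

# Assumptions for illustration: the preset null value and the set of
# characters treated as compliant are not specified in the source.
NULL_VALUE = None
COMPLIANT = re.compile(r"^[\w\s.,:@/+-]*$")  # any other character counts as garbled


def clean_records(records, dedup_fields):
    """Apply S101-S103 to a list of dict records."""
    seen = set()
    cleaned = []
    for record in records:
        # S101: drop records whose duplicate-detection key was already seen.
        key = tuple(record.get(f) for f in dedup_fields)
        if key in seen:
            continue
        seen.add(key)

        row = {}
        for field, value in record.items():
            # S102/S103: empty fields and garbled strings both become null.
            if value is None or value == "":
                row[field] = NULL_VALUE
            elif isinstance(value, str) and not COMPLIANT.match(value):
                row[field] = NULL_VALUE
            else:
                row[field] = value
        cleaned.append(row)
    return cleaned


raw = [
    {"order_id": 1, "region": "East China", "amount": 120.0},
    {"order_id": 1, "region": "East China", "amount": 120.0},   # duplicate
    {"order_id": 2, "region": "", "amount": 80.0},               # missing value
    {"order_id": 3, "region": "\ufffd\ufffd#", "amount": 50.0},  # garbled text
]
structured = clean_records(raw, dedup_fields=["order_id"])
```

A whitelist is only one way to detect garbled text; a real deployment would presumably tune the rule per field and per source encoding.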

[0035] In one embodiment, the step of performing association matching operations on the structured data based on the association fields and association logic in the association configuration instruction to generate a unified association model includes: S201. Parse the association configuration instruction, determine the target data source set for which association analysis needs to be performed, and extract the association field for matching from the target data source set. In this embodiment, this step includes: (1) determining all data sources that need association analysis, clarifying the core data fields and data formats of each data source, and selecting the data sources that have association value and can be used to establish mapping relationships as the target data source set; (2) for each selected data source, selecting the key field (i.e. the association field) for association matching.

[0036] The selected fields must have consistent data types and semantic origins to achieve accurate matching between data sources; S202. Verify whether the data types of the association fields of the data sources in the target data source set are consistent; S203. When the data types are consistent, apply the association logic to perform cross-data source mapping on the target data source set, automatically generate the association model that reflects the mapping relationships between the data sources, and output a modeling success prompt signal.

[0037] Before performing cross-data source mapping merging, the system enforces a check on the data type consistency of the extracted related fields. This mechanism ensures the accuracy of join operations (such as JOIN operations) between heterogeneous data sources and the underlying stability of the system, preventing the failure of the association model construction or data misalignment caused by data type conflicts.
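The type check followed by a join can be sketched in plain Python. The function name `build_association`, the sample field names, and the restriction to inner and left joins are assumptions made to keep the example short; the source also lists right and full joins and does not prescribe any particular data structure.

```python
def build_association(left, right, left_key, right_key, how="inner"):
    """Type-check the association fields (S202), then merge the sources (S203).

    Only "inner" and "left" are sketched here.
    """
    if how not in ("inner", "left"):
        raise ValueError(f"unsupported association logic: {how}")

    left_types = {type(row[left_key]) for row in left}
    right_types = {type(row[right_key]) for row in right}
    # S202: refuse to map when the association fields disagree on type.
    if len(left_types | right_types) != 1:
        raise TypeError(
            f"association field types differ: {left_types} vs {right_types}"
        )

    # Index the right-hand source by its association field.
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)

    # S203: map every left row to its matching right rows.
    merged = []
    for row in left:
        matches = index.get(row[left_key], [])
        if matches:
            merged.extend({**row, **match} for match in matches)
        elif how == "left":
            merged.append(dict(row))
    return merged


users = [{"user_id": 1, "level": "VIP"}, {"user_id": 2, "level": "regular"}]
orders = [
    {"buyer_id": 1, "amount": 300},
    {"buyer_id": 1, "amount": 200},
    {"buyer_id": 3, "amount": 50},
]
inner_rows = build_association(users, orders, "user_id", "buyer_id")
left_rows = build_association(users, orders, "user_id", "buyer_id", how="left")
```

Failing early on a type mismatch, as in this sketch, avoids silently producing an empty or misaligned merge when, for example, one source stores IDs as integers and another as strings.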

[0038] In one embodiment, the step of invoking a distributed computing framework to perform multi-dimensional aggregation calculations on the full merged dataset to generate an aggregation result set with business dimensions includes: S301. Obtain the load status parameters of each computing node in the distributed cluster. In this embodiment, the load status parameters of a computing node can be CPU utilization and / or memory utilization.

[0039] S302. Based on the data size of the full merged dataset and the number of computing nodes, split the full merged dataset into multiple data shards; S303. Based on the load status parameters, allocate the aggregation calculation subtasks corresponding to the data shards to the corresponding computing nodes for parallel execution, so as to obtain the intermediate results returned by each computing node. The Executor of each computing node reads data shards from local storage and executes the aggregation logic of its subtask, using in-memory computing, intermediate result caching, and similar methods to improve execution speed.

[0040] S304. Execute global aggregation logic on all the intermediate results to generate the aggregated result set, and release the computing resources of the computing node after execution.

[0041] The computing resources of a computing node include the memory, CPU, and other resources it occupies.

[0042] By introducing a distributed computing framework and sharding and scheduling the entire merged dataset based on the real-time load status parameters of each computing node, concurrent execution of complex aggregation tasks is achieved. This solution significantly reduces the computation time for ultra-large-scale data and eliminates the performance bottleneck and crash risk of single-point servers through a load balancing mechanism.
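The shard-and-schedule flow of S301 to S304 can be approximated on a single machine. In the sketch below, threads stand in for cluster nodes, loads are given as integer percentages, and the helper names (`split_into_shards`, `assign_shards`, `run_aggregation`), the `shards_per_node` factor, and the per-shard load increment are all illustrative assumptions. It does not use or imitate the Apache Spark API.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_shards(rows, node_count, shards_per_node=2):
    """S302: derive the shard count from data size and node count."""
    shard_count = max(1, min(len(rows), node_count * shards_per_node))
    return [rows[i::shard_count] for i in range(shard_count)]


def assign_shards(shards, node_loads, load_per_shard=10):
    """S303: hand each shard to the currently least-loaded node.

    node_loads maps node name -> load in percent (e.g. CPU utilization).
    load_per_shard is an arbitrary illustrative estimate.
    """
    loads = dict(node_loads)
    plan = {name: [] for name in loads}
    for shard in shards:
        target = min(loads, key=loads.get)
        plan[target].append(shard)
        loads[target] += load_per_shard
    return plan


def partial_sum(shard):
    """Aggregation subtask for one shard: returns an intermediate result."""
    return sum(row["amount"] for row in shard)


def run_aggregation(rows, node_loads):
    shards = split_into_shards(rows, node_count=len(node_loads))
    plan = assign_shards(shards, node_loads)
    # Threads stand in for nodes; the pool is released on exit (S304).
    with ThreadPoolExecutor(max_workers=len(node_loads)) as pool:
        futures = [pool.submit(partial_sum, s) for ss in plan.values() for s in ss]
        intermediate = [f.result() for f in futures]
    # S304: global aggregation over the intermediate results.
    return sum(intermediate), plan


rows = [{"amount": i} for i in range(1, 11)]
total, plan = run_aggregation(rows, {"node-a": 20, "node-b": 35})
```

With these inputs the lightly loaded `node-a` receives three of the four shards, which shows the load-aware placement; the global sum is unaffected by how shards are placed.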

[0043] In one embodiment, splitting the full merged dataset into multiple data shards and performing parallel aggregation computation subtasks includes: S401. Determine the target dimension for the full merged dataset, wherein the target dimension is at least one of the time dimension, spatial dimension, or business tag dimension. In this embodiment, the time dimension includes year, quarter, and month.

[0044] S402. For numerical data shards, perform numerical aggregation operations, using a grouping statement to group and sum by the target dimension to obtain the numerical grouping aggregation results. Grouping statements include the GROUP BY statement. Numerical data can include production quantities, amounts, etc. Numerical aggregation can be performed using SUM(quantity) in SQL.

[0045] S403. For character-type data shards, block addition operations and use the direct statistical results of the corresponding text fields in the original data as the character-type grouping aggregation results.

[0046] The direct statistical result of the corresponding text field in the original data can be either the total count of that field in the original data or the total count after deduplication.

[0047] In this embodiment, the character-based grouping aggregation result can be a statistical count of the number of different device types in each month.

[0048] Different underlying computation strategies are adopted for data shards with different attributes: grouping and summing instructions are used for numerical data, while addition operations on character data are actively blocked and deduplication logic is triggered instead. This ensures that the aggregation instructions executed by the system are consistent with the storage attributes of the data itself, avoiding runtime errors caused by performing mathematical addition on characters.
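The type-dependent strategy can be shown with a small Python sketch. The function name `aggregate_by_dimension` and the sample fields are hypothetical; for text fields the sketch uses the deduplicated count, which is one of the two options the description allows.

```python
from collections import defaultdict
from numbers import Number


def aggregate_by_dimension(rows, dimension, field):
    """Group by `dimension`; sum numeric fields, distinct-count text fields."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[dimension]].append(row[field])

    result = {}
    for key, values in groups.items():
        numeric = all(
            isinstance(v, Number) and not isinstance(v, bool) for v in values
        )
        if numeric:
            # S402: same effect as SELECT dimension, SUM(field) ... GROUP BY dimension.
            result[key] = sum(values)
        else:
            # S403: never add text; count distinct values instead.
            result[key] = len(set(values))
    return result


rows = [
    {"month": "2023-07", "quantity": 10, "device": "pump"},
    {"month": "2023-07", "quantity": 5, "device": "valve"},
    {"month": "2023-07", "quantity": 7, "device": "pump"},
    {"month": "2023-08", "quantity": 3, "device": "pump"},
]
quantity_by_month = aggregate_by_dimension(rows, "month", "quantity")
devices_by_month = aggregate_by_dimension(rows, "month", "device")
```

Here `devices_by_month` corresponds to the example of counting the number of different device types in each month.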

[0049] In one embodiment, the consistency check of the aggregation result set includes: S501. Calculate the sum of all the numerical grouping aggregation results and use it as the numerical total value; S502. Perform a direct summation on the original numerical fields in the full merged dataset that correspond to the numerical data shards to obtain the direct summary value of the original data; S503. Determine whether the numerical total value is equal to the direct summary value of the original data; S504. If they are not equal, the data consistency check is deemed to have failed, a data rollback transaction is automatically triggered to undo the current aggregation and association operation, and a check error message containing the abnormal dimension is output.

[0050] In this embodiment, the verification error message is used to explain the specific dimensions and fields where the total value does not match, so that users can investigate and analyze the root cause of the problem and ensure that the business side obtains accurate and usable data.

[0051] A two-way comparison and verification mechanism was constructed between the "summated value after aggregation" and the "original summary value before aggregation," enabling absolute consistency monitoring throughout the data dimension transformation lifecycle. If a mismatch is detected, a transaction rollback is automatically triggered, effectively cutting off the propagation chain of erroneous data and ensuring that the final data input to the visualization rendering layer is absolutely accurate and reliable.
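A minimal Python sketch of the comparison in S501 to S504 follows. The exception class `ConsistencyError` and the function `check_consistency` are hypothetical names, and the floating-point tolerance is an added assumption: the source requires the two totals to be equal, which is rarely exact for floats. The rollback itself is left to the caller, who would discard the aggregation result when the exception is raised.

```python
import math


class ConsistencyError(Exception):
    """Raised when grouped totals disagree with the raw total."""


def check_consistency(rows, grouped, dimension, field):
    # S501: total of all grouped aggregation results.
    grouped_total = sum(grouped.values())
    # S502: direct sum of the original numeric field.
    direct_total = sum(row[field] for row in rows)
    # S503: compare the two totals (tolerance is an assumption, see above).
    if math.isclose(grouped_total, direct_total, rel_tol=1e-9, abs_tol=1e-9):
        return True

    # S504: recompute per-group sums to name the dimension values that are off.
    expected = {}
    for row in rows:
        expected[row[dimension]] = expected.get(row[dimension], 0) + row[field]
    bad = sorted(
        key
        for key in expected.keys() | grouped.keys()
        if not math.isclose(expected.get(key, 0), grouped.get(key, 0), abs_tol=1e-9)
    )
    raise ConsistencyError(f"{field} mismatch in {dimension}: {bad}")


rows = [
    {"month": "2023-07", "amount": 10.0},
    {"month": "2023-07", "amount": 5.0},
    {"month": "2023-08", "amount": 3.0},
]
ok = check_consistency(rows, {"2023-07": 15.0, "2023-08": 3.0}, "month", "amount")

try:
    check_consistency(rows, {"2023-07": 15.0, "2023-08": 4.0}, "month", "amount")
    error_message = None
except ConsistencyError as exc:
    error_message = str(exc)  # the caller would roll back here
```

Note that a totals-only comparison cannot detect two group errors that cancel each other out; the per-group recomputation in the failure branch is used here only to report the affected dimension values.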

[0052] In one embodiment, the consistency check of the aggregation result set further includes: S601. Compare whether the total count of each group of text data is equal to the direct total count of the text field in the original data; S602. If the total count of text data in each group is equal to the direct total count of the text field in the original data, the data consistency check is deemed to have passed; otherwise, the data consistency check is deemed to have failed.
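
One possible reading of S601 and S602 is sketched below in Python. It assumes that each group count is a plain record count of non-null text values, so that the group counts add up to the count over the original field; if the groups hold deduplicated counts instead, the raw side would need the same per-group deduplication before comparison. The function name and data layout are hypothetical.

```python
def check_text_count_consistency(raw_rows, grouped_counts, field):
    """Return True when the per-group text counts add up to the raw count."""
    # Direct count of non-null values of the text field in the original data.
    raw_count = sum(1 for row in raw_rows if row.get(field) is not None)
    # S601/S602: the check passes only when both totals are equal.
    return sum(grouped_counts.values()) == raw_count


# Hypothetical sample data.
text_rows = [
    {"region": "east", "customer": "A"},
    {"region": "east", "customer": "C"},
    {"region": "west", "customer": "B"},
]
passed = check_text_count_consistency(text_rows, {"east": 2, "west": 1}, "customer")
failed = check_text_count_consistency(text_rows, {"east": 1, "west": 1}, "customer")
```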

[0053] In one embodiment, converting the natural language into a structured query statement includes: S701. Perform preprocessing and word segmentation on the input natural language; S702. Extract core requirement keywords for data filtering from the segmented text; S703. Retrieve the system's preset query statement template, fill the core requirement keywords into the corresponding condition nodes of the query statement template, and generate the structured query statement that conforms to standard syntax rules.

[0054] The query statement template can be an SQL template, for example: "SELECT * FROM aggregate result set WHERE condition 1 AND condition 2".

[0055] By automatically segmenting the input natural language, extracting its core elements, and filling them into the underlying query template, the system automatically translates users' unstructured intents into machine-executable structured query instructions, greatly lowering the interaction threshold and operating complexity of complex logical retrieval.
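
A minimal Python sketch of S701 to S703 follows. The keyword table `KEYWORD_RULES`, the table name `aggregate_result`, and the column names are hypothetical, and the regular-expression tokenizer stands in for whatever word segmentation the system actually uses. Values are returned as bound parameters rather than spliced into the SQL text, which is a design choice of this sketch and not something the specification states.

```python
import re

# Hypothetical mapping from a recognized keyword to (column, operator).
KEYWORD_RULES = {
    "region": ("region", "="),
    "above": ("total_sales", ">"),
    "below": ("total_sales", "<"),
}

# Preset query statement template with one slot for the condition nodes.
TEMPLATE = "SELECT * FROM aggregate_result WHERE {conditions}"


def to_structured_query(text):
    # S701: preprocessing (lowercasing) and word segmentation.
    tokens = re.findall(r"[a-z0-9_.]+", text.lower())

    conditions, params = [], []
    # S702: a keyword followed by a value forms one filter condition.
    for i, token in enumerate(tokens[:-1]):
        if token in KEYWORD_RULES:
            column, op = KEYWORD_RULES[token]
            value = tokens[i + 1]
            try:
                value = float(value)
            except ValueError:
                pass  # keep non-numeric values as strings
            conditions.append(f"{column} {op} ?")
            params.append(value)

    if not conditions:
        raise ValueError("no filter keywords recognized")

    # S703: fill the condition nodes of the template.
    return TEMPLATE.format(conditions=" AND ".join(conditions)), params


sql, params = to_structured_query("Show region east with sales above 100")
```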

[0056] In one embodiment, the step of extracting a matching target data subset from the aggregation result set using the structured query statement and converting the target data subset into a data object in a preset format includes: S801. Read the aggregation result set from the data cache pool; S802. Execute the structured query statement to perform threshold matching filtering on the aggregation result set and generate preliminary matching results; S803. Perform sorting or pagination processing on the preliminary matching results to obtain the target data subset; S804. Serialize the target data subset into a unified, standardized, lightweight data exchange format, and send it to the data bus as the data object.

[0057] The unified, standardized, lightweight data exchange format can be JSON.

[0058] Structured query statements are used to perform precise threshold matching and pruning of data, and pagination / sorting is used to optimize data volume. The refined data is then serialized into a standardized, lightweight exchange format. This process significantly reduces network bandwidth usage during data object transmission on the system's internal bus, and substantially improves the throughput efficiency of cross-module communication and invocation.
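
Steps S801 to S804 can be sketched in Python with the standard `sqlite3` and `json` modules. The in-memory SQLite table is only a stand-in for the data cache pool, and the table name `aggregate_result`, its columns, and the function `fetch_data_object` are assumptions for illustration. Sending the result to the data bus is outside the scope of the sketch.

```python
import json
import sqlite3


def fetch_data_object(conn, min_sales, page, page_size):
    """Filter by threshold, sort, paginate, and serialize the page to JSON."""
    rows = conn.execute(
        # S802: threshold matching; S803: sorting and pagination.
        "SELECT region, total_sales FROM aggregate_result "
        "WHERE total_sales >= ? ORDER BY total_sales DESC LIMIT ? OFFSET ?",
        (min_sales, page_size, page * page_size),
    ).fetchall()
    # S804: serialize the target data subset into JSON.
    records = [{"region": region, "total_sales": sales} for region, sales in rows]
    return json.dumps({"page": page, "rows": records})


# S801: a hypothetical aggregation result set held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE aggregate_result (region TEXT, total_sales REAL)")
conn.executemany(
    "INSERT INTO aggregate_result VALUES (?, ?)",
    [("east", 15.0), ("west", 7.0), ("north", 22.0)],
)

first_page = fetch_data_object(conn, min_sales=10, page=0, page_size=1)
second_page = fetch_data_object(conn, min_sales=10, page=1, page_size=1)
```

Only the requested page is serialized, which is what keeps the data object small when it is passed between modules.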

[0059] In one embodiment, dynamically redrawing the visualization chart upon detecting an interactive trigger event includes: S901. Monitor the terminal display interface for click events or selection events that act on specific graphic elements of the visualization chart; S902. In response to the click event or the selection event, parse the local dimension information corresponding to the triggered graphic element; S903. Based on the local dimension information, send a drill-down analysis query instruction to the filtering execution module to obtain a subset of drill-down data, and control the visualization rendering module to use the subset of drill-down data to perform a local view update on the visualization chart.

[0060] By monitoring micro-interaction events (clicks / selections) on graphical elements of the terminal interface, the system automatically resolves the local dimension information of the triggered element and triggers drill-down analysis calculations, so that the front-end view is decoupled from the underlying data model at a fine granularity and only the affected part of the view is redrawn in real time. This eliminates the overhead of a full global page refresh and supports efficient in-depth data exploration and interaction.
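
The event flow of S901 to S903 can be sketched in Python as follows. The classes `ChartElement` and `ChartView`, the function `handle_interaction`, and the in-memory query function are all hypothetical stand-ins for the front-end event system, the visualization rendering module, and the filtering execution module.

```python
from dataclasses import dataclass, field


@dataclass
class ChartElement:
    element_id: str
    dimension: str  # e.g. "region"
    value: str      # e.g. "east"


@dataclass
class ChartView:
    panels: dict = field(default_factory=dict)
    redraws: list = field(default_factory=list)

    def update_panel(self, panel_id, rows):
        # Local view update: only the named panel is replaced and redrawn.
        self.panels[panel_id] = rows
        self.redraws.append(panel_id)


def handle_interaction(event_type, element, query_fn, view, panel_id="detail"):
    # S901: only click and selection events trigger a drill-down.
    if event_type not in {"click", "select"}:
        return False
    # S902: parse the local dimension information of the triggered element.
    local_dimension = {element.dimension: element.value}
    # S903: request the drill-down subset and redraw only the affected panel.
    view.update_panel(panel_id, query_fn(local_dimension))
    return True


# Hypothetical data and query function standing in for the filtering module.
DATA = [
    {"region": "east", "city": "X", "sales": 10},
    {"region": "east", "city": "Y", "sales": 5},
    {"region": "west", "city": "Z", "sales": 7},
]


def drill_down(filters):
    return [row for row in DATA if all(row[k] == v for k, v in filters.items())]


view = ChartView(panels={"overview": DATA})
handled = handle_interaction("click", ChartElement("bar-1", "region", "east"), drill_down, view)
ignored = handle_interaction("hover", ChartElement("bar-2", "region", "west"), drill_down, view)
```

The overview panel is left untouched in this sketch, which mirrors the local redraw described above.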

[0061] While this specification has shown and described numerous embodiments of the invention, it will be apparent to those skilled in the art that such embodiments are provided by way of example only. Many modifications, alterations, and alternatives will occur to those skilled in the art without departing from the spirit and essence of the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in the practice of this invention.

Claims

1. A data filtering and visualization system based on aggregation and association calculation, characterized in that it comprises: a data access module, used to acquire multi-source heterogeneous data, convert the multi-source heterogeneous data into intermediate data in a unified format through a standardized data protocol, perform cleaning operations on the intermediate data, and output structured data to a data cache pool; an association rule engine, used to receive association configuration instructions input by the user, perform association matching operations on the structured data based on the association fields and association logic in the association configuration instructions, generate a unified association model, and merge the structured data into a full merged dataset based on the association model; a multi-dimensional aggregation module, used to call a distributed computing framework to perform multi-dimensional aggregation calculations on the full merged dataset, generate an aggregation result set with business dimensions, and perform consistency verification on the aggregation result set; a filtering execution module, used to receive a filtering request containing natural language input by the user, convert the natural language into a structured query statement, use the structured query statement to extract a matching target data subset from the aggregation result set, and convert the target data subset into a data object in a preset format; and a visualization rendering module, used to generate visualization charts based on the data object and dynamically redraw the visualization charts when an interactive trigger event is detected.

2. The data filtering and visualization system based on aggregation and association calculation as described in claim 1, characterized in that, The step of performing a cleaning operation on the intermediate data and outputting structured data to a data cache pool includes: Extract the preset duplicate detection field from the data records to be processed, and perform a filtering operation on the intermediate data based on the duplicate detection field; Identify empty fields containing missing values and abnormal fields containing non-compliant garbled characters in the intermediate data; Automatically fill the empty fields with preset null values, and convert the characters contained in the abnormal fields into null values to obtain the cleaned structured data.

3. The data filtering and visualization system based on aggregation and association calculation as described in claim 1, characterized in that, The step of performing association matching operations on the structured data based on the association fields and association logic in the association configuration instructions to generate a unified association model includes: Parse the association configuration instruction to determine the target data source set for which association analysis needs to be performed, and extract the association field used for matching from the target data source set; Verify whether the data types of the association fields in each of the target data source sets are consistent; When the data types are consistent, apply the association logic to perform cross-data source mapping on the target data source set, automatically generate the association model that reflects the mapping relationship between each data source, and output a modeling success prompt signal.

4. The data filtering and visualization system based on aggregation and association calculation as described in claim 1, characterized in that, The step of calling the distributed computing framework to perform multi-dimensional aggregation calculations on the full merged dataset and generating an aggregation result set with business dimensions includes: Obtain the load status parameters of each computing node in the distributed cluster; Based on the size of the full merged dataset and the number of computing nodes, split the full merged dataset into multiple data shards; Based on the load status parameters, allocate the aggregation calculation subtasks corresponding to the multiple data shards to the corresponding computing nodes for parallel execution, so as to obtain the intermediate results returned by each computing node; Perform global aggregation logic on all the intermediate results to generate the aggregation result set, and release the computing resources of the computing nodes after execution.

5. The data filtering and visualization system based on aggregation and association calculation as described in claim 4, characterized in that, The step of splitting the full merged dataset into multiple data shards and performing parallel aggregation calculation subtasks includes: Determine the target dimension for the full merged dataset, wherein the target dimension is at least one of the time dimension, spatial dimension, or business tag dimension; For numerical data shards, perform numerical aggregation operations, grouping and summing based on the target dimension using grouping statements to obtain numerical grouping aggregation results; For character-type data shards, block addition operations and execute deduplication and counting logic according to the target dimension to obtain character-type grouping aggregation results.

6. The data filtering and visualization system based on aggregation and association calculation as described in claim 5, characterized in that, The consistency check of the aggregation result set includes: Calculate the sum of all the numerical grouping aggregation results and use it as the numerical total value; Directly sum the original numerical fields corresponding to the numerical data shards in the full merged dataset to obtain the direct summary value of the original data; Determine whether the numerical total value is equal to the direct summary value of the original data; If they are not equal, the data consistency check is deemed to have failed, a data rollback transaction is automatically triggered to undo the current aggregation and association operation, and a check exception message containing the abnormal dimension is output.

7. The data filtering and visualization system based on aggregation and association calculation as described in claim 1, characterized in that, The process of converting the natural language into a structured query statement includes: Perform preprocessing and word segmentation on the input natural language; Extract core requirement keywords for data filtering from the segmented text; Retrieve the system's preset query statement template, fill the core requirement keywords into the corresponding condition nodes of the query statement template, and generate the structured query statement that conforms to standard syntax rules.

8. The data filtering and visualization system based on aggregation and association calculation as described in claim 1, characterized in that, The step of extracting a matching target data subset from the aggregated result set using the structured query statement and converting the target data subset into a data object in a preset format includes: Read the aggregation result set from the data cache pool; The structured query statement is executed to perform threshold matching and filtering on the aggregated result set to generate preliminary matching results; The preliminary matching results are sorted or paginated to obtain the target data subset; The target data subset is serialized into a unified, standardized, lightweight data exchange format and sent to the data bus as the data object.

9. The data filtering and visualization system based on aggregation and association calculation as described in claim 8, characterized in that, The unified, standardized, lightweight data exchange format is JSON.

10. The data filtering and visualization system based on aggregation and association calculation as described in any one of claims 1 to 9, characterized in that, The step of dynamically redrawing the visualization chart upon detecting an interactive trigger event includes: Monitor the terminal display interface for click events or selection events that act on specific graphic elements of the visualization chart; In response to the click event or the selection event, parse the local dimension information corresponding to the triggered graphic element; Based on the local dimension information, send a drill-down analysis query instruction to the filtering execution module to obtain a subset of drill-down data, and control the visualization rendering module to use the subset of drill-down data to perform a local view update on the visualization chart.