An adaptive hierarchical multi-dimensional data insight analysis method based on swarm intelligence

CN122309510A · Pending · Publication Date: 2026-06-30 · CHINA ACAD OF SPACE SYST SCI & ENG

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
CHINA ACAD OF SPACE SYST SCI & ENG
Filing Date
2026-02-26
Publication Date
2026-06-30

Smart Images

  • Figure CN122309510A_ABST

Abstract

An adaptive hierarchical multidimensional data insight analysis method based on swarm intelligence, belonging to the field of artificial intelligence technology, utilizes recursive decomposition and synchronous data exploration mechanisms to construct a valid topic tree and automatically prune invalid branches. For leaf nodes, the swarm intelligence agent performs autonomous multi-round searches and multimodal heterogeneous data fusion, recording end-to-end evidence. The verification system verifies credibility through code replay and visual backtracking, quantifies the originality of node analysis using density clustering and truth constraint mechanisms, and scores based on relevance and depth. Finally, an interactive insight topology is constructed, supporting on-demand drill-down rendering from macro-level overview to micro-level evidence. This invention effectively solves the problems of ambiguous entry points and homogenized viewpoints in big data analysis. By lowering the threshold for data analysis through swarm intelligence, it leverages multi-angle analysis by the swarm to reveal non-explicit hidden correlations and high-value sparse insights, achieving full-process intelligentization from automatic planning to in-depth mining.

Description

Technical Field

[0001] This invention relates to an adaptive hierarchical multidimensional data insight and analysis method based on swarm intelligence, belonging to the fields of big data analysis, artificial intelligence and human-computer interaction technology.

Background Technology

[0002] With the rapid development of information technology, enterprises and organizations have accumulated massive amounts of multimodal data (including structured databases, text reports, statistical charts, etc.). How to quickly extract valuable insights from this vast amount of data is the core challenge of data-driven decision-making.

[0003] Currently, existing data analysis technologies fall mainly into two categories: traditional business intelligence (BI) analysis and the more recent generative analysis based on large language models (LLMs). In practical applications, however, both still have the following significant drawbacks:
1. Unclear entry point and a difficult "cold start." Faced with massive datasets, non-professional users are often unsure where to begin. Traditional BI tools rely on users pre-defining specific analytical dimensions and metrics; users who lack domain knowledge or are unfamiliar with the data distribution find it difficult to construct an effective analytical path. Existing AI analysis assistants support natural language queries, but they are typically reactive and cannot proactively plan a comprehensive, hierarchical analytical framework, which easily leads to fragmented and one-sided results.

[0004] 2. Frequent hallucinations and lack of data exploration mechanisms. Analysis agents based on large language models often hallucinate, fabricating data or conclusions, when supporting data is lacking. Existing retrieval-augmented generation (RAG) techniques alleviate this problem, but they are mostly static retrieval methods without data exploration mechanisms. Before executing a task, the agent therefore does not verify whether the database is rich enough to support analysis along that dimension, and significant computational resources are wasted on ineffective or empty analysis tasks.

[0005] 3. Homogeneous conclusions and lack of in-depth insights. Existing single-agent analysis tends to generate safe and mediocre conclusions, failing to uncover hidden, non-explicit relationships within the data. Due to the lack of group game theory and deduplication mechanisms, analysis reports are often monotonous and lack originality. Furthermore, existing evaluation systems are mostly based on text fluency, making it difficult to quantify the logical depth and uniqueness of the analysis.

[0006] 4. Lack of multimodal validation and disconnected user experience. Business reports often include key trend charts. Most current analytics tools only process text or structured tables, neglecting visual understanding and cross-validation of charts and graphs, leading to the risk of mismatch between text and visuals. Furthermore, the final reports are usually static PDFs or long text documents, preventing users from smoothly drilling down from macro-level conclusions to micro-level evidence (such as raw SQL or charts), making it difficult for decision-makers to trace and verify the conclusions.

[0007] In summary, there is an urgent need for an analytical system that can proactively plan hierarchical tasks, possess data exploration and pruning capabilities, and generate highly original insights through swarm intelligence, in order to address the current problems of high barriers to entry, low reliability, and insufficient depth in data analysis.

Summary of the Invention

[0008] The technical problem solved by this invention is to overcome the shortcomings of the prior art by providing an adaptive hierarchical multidimensional data insight analysis method based on swarm intelligence. This method addresses core problems in big data analysis: the difficulty of starting an analysis without automated task planning, the tendency of intelligent agents to produce hallucinated conclusions without data support, the homogeneity and lack of originality of single-agent insights, and the lack of traceable results and interactive drill-down verification.

[0009] The technical solution of this invention is as follows. In a first aspect, an adaptive hierarchical multidimensional data insight analysis method based on swarm intelligence comprises:
Receiving the target topic to be analyzed and the dataset to be analyzed, as input by the user;
Decomposing, based on the dataset to be analyzed, the target topic into a hierarchical topic tree using a preset large language model, the hierarchical topic tree including parent topics, child topics, and leaf node topics;
Constructing swarm analysis agents based on the preset large language model: the semantic attributes of each leaf node topic of the topic tree are analyzed to generate differentiated strategies, and several swarm analysis agents are instantiated according to the different strategies, each configured with analysis perspective prompts and autonomous decision parameters; each swarm analysis agent performs multiple rounds of search on the dataset based on its corresponding leaf node topic and records an analysis link index, yielding the analysis results of each agent;
Constructing a verification and scoring agent based on the preset large language model, and using it to score the analysis results of each swarm analysis agent quantitatively against preset evaluation indicators, which include credibility, originality, topic relevance, and insight depth;
Selecting the analysis results of the leaf node topics based on the quantitative scores, and encapsulating each selected leaf node topic together with its parent topic as an independent data insight node;
Building an index according to the hierarchical relationship of the topic tree, and constructing and visualizing an interactive insight knowledge topology from the data insight nodes;
In response to the user's drill-down interaction command, dynamically loading and displaying the analysis results of the child node topics level by level, starting from the root node of the topic tree, until the analysis results of the leaf node topics are expanded as needed.

[0010] Furthermore, the step of decomposing the target topic to be analyzed into a hierarchical topic tree includes:
S21. Construct a topic planning agent using the preset large language model, and use it to generate a set of candidate subtopic directions based on the topic at the current level;
S22. Generate a metadata query instruction for each candidate subtopic direction and execute it against the dataset to be analyzed;
S23. For structured data in the query results, calculate the normalized record coverage rate; for unstructured data, calculate a data score using a weighted combination of keyword matching frequency (or density) and vector semantic similarity; merge the record coverage rate and the data score into a comprehensive richness index. If the index is lower than the preset dynamic support threshold, the direction is deemed unanalyzable and the candidate subtopic is removed, completing the pruning. If the index is higher than the preset maximum value and the topic scope exceeds the preset range, the next round of recursive decomposition is triggered, with the subtopic direction serving as a new parent node. If, after recursive decomposition, all candidate subtopic directions generated by a parent topic are pruned as invalid, the backtracking retention mechanism is triggered: further decomposition of the parent topic is terminated and the parent topic is retained directly as a leaf node;
S24. Repeat S22 to S23 until the data richness of every leaf node topic is within the preset range and the granularity meets the preset requirements, or the preset maximum depth is reached; then lock the structure as the final topic tree.
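The control flow of S21 to S24 can be sketched as a short recursion. This is a minimal illustration, not the patented implementation: `propose` and `probe` are hypothetical callables standing in for the topic planning agent and the richness probe, the thresholds are placeholders, and the "topic scope exceeds the preset range" condition is simplified to a richness check alone.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TopicNode:
    name: str
    richness: float
    children: List["TopicNode"] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return not self.children


def build_topic_tree(
    topic: str,
    richness: float,
    propose: Callable[[str], List[str]],  # hypothetical wrapper around the planning agent (S21)
    probe: Callable[[str], float],        # hypothetical richness probe (S22 and S23)
    prune_below: float = 0.15,
    split_above: float = 0.8,
    max_depth: int = 3,
    depth: int = 0,
) -> TopicNode:
    node = TopicNode(topic, richness)
    if depth >= max_depth:
        return node  # depth cap reached (S24)
    for candidate in propose(topic):
        score = probe(candidate)
        if score < prune_below:
            continue  # pruning: not enough data to analyze this direction
        if score > split_above:
            # rich and broad topic: decompose one level further
            child = build_topic_tree(candidate, score, propose, probe,
                                     prune_below, split_above, max_depth, depth + 1)
        else:
            child = TopicNode(candidate, score)
        node.children.append(child)
    # Backtracking retention: if every candidate was pruned, the node has no
    # children and is therefore kept as a leaf.
    return node


# Illustrative inputs; only 0.02 and 0.85 come from the embodiment, the rest are made up.
PROPOSALS = {
    "commercial space": ["asteroid mining", "methalox engines", "space tourism"],
    "methalox engines": ["thrust parameter evolution", "capital vs technology output"],
    "space tourism": ["suborbital ticket prices", "space hotel construction"],
}
SCORES = {
    "asteroid mining": 0.02, "methalox engines": 0.85, "space tourism": 0.9,
    "thrust parameter evolution": 0.4, "capital vs technology output": 0.5,
    "suborbital ticket prices": 0.0, "space hotel construction": 0.0,
}
tree = build_topic_tree("commercial space", 1.0,
                        lambda t: PROPOSALS.get(t, []), lambda t: SCORES[t])
```

With these inputs the sparse "asteroid mining" branch is dropped, "methalox engines" is split into two leaves, and "space tourism" falls back to a leaf because both of its subtopics are pruned.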

[0011] Furthermore, the multi-round search includes: When the retrieval object is an image or chart, the swarm analysis agent calls a preset multimodal large language model to describe the chart semantically and convert it into text context. The swarm analysis agent iteratively performs a closed-loop task: generating search instructions based on the preset topic, executing the search, inspecting the search results, and determining whether they meet the requirements of the preset topic. If data is missing, the agent generates a new query to supplement it; if contradictory data is found, a comparative retrieval is triggered. Once the closed-loop task has run for the maximum number of rounds, structured statistical values and unstructured descriptions are obtained, and the analysis results of the swarm analysis agent are generated by semantically aligning and fusing the two.
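The closed loop described above might look like the following sketch. The three callables are hypothetical stand-ins for LLM and retrieval calls, the verdict strings are invented for illustration, and stopping early once the evidence is judged sufficient is an assumption (the text only specifies a maximum number of rounds).

```python
from typing import Callable, List, Tuple

Evidence = Tuple[int, str, str]  # (round number, query, result)


def multi_round_search(
    topic: str,
    make_query: Callable[[str, str], str],        # hypothetical: (topic, hint) -> query
    run_query: Callable[[str], str],              # hypothetical: executes the query
    assess: Callable[[str, List[Evidence]], str], # hypothetical: returns a verdict string
    max_rounds: int = 5,
) -> List[Evidence]:
    evidence: List[Evidence] = []
    hint = ""
    for round_no in range(1, max_rounds + 1):
        query = make_query(topic, hint)
        result = run_query(query)
        evidence.append((round_no, query, result))  # analysis link index
        verdict = assess(topic, evidence)
        if verdict == "sufficient":
            break  # assumed early exit
        # "missing" leads to a supplementary query, "contradiction" to a
        # comparative retrieval; both are passed on as a hint.
        hint = verdict
    return evidence


def demo_assess(topic: str, evidence: List[Evidence]) -> str:
    return "sufficient" if len(evidence) >= 2 else "missing"


log = multi_round_search("thrust trend",
                         lambda t, h: f"{t}|{h}",
                         lambda q: q.upper(),
                         demo_assess)
```

Keeping every `(round, query, result)` triple is what later makes code replay and drill-down to raw evidence possible.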

[0012] Furthermore, the method for calculating the topic relevance includes: extracting the core viewpoints from the analysis results using a large language model and generating a conclusion title; converting the conclusion title into a vector; converting the topic description of the leaf node that the agent being scored is responsible for into a vector; and calculating the cosine similarity between the two vectors. The cosine similarity is used as the topic relevance score; if it is lower than the preset topic relevance threshold, the analysis result is marked as invalid.
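As a small illustration of this scoring step, the sketch below computes cosine similarity with the standard library only. The vectors would come from an embedding model in practice; the threshold value of 0.5 is an assumed placeholder, since the text does not give one.

```python
import math
from typing import Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as unrelated
    return dot / (norm_a * norm_b)


def topic_relevance(title_vec: Sequence[float],
                    topic_vec: Sequence[float],
                    threshold: float = 0.5) -> Tuple[float, bool]:
    """Return (relevance score, is_valid); threshold is an assumed value."""
    score = cosine_similarity(title_vec, topic_vec)
    return score, score >= threshold
```

For example, `topic_relevance([1.0, 0.0], [0.0, 1.0])` yields a score of 0.0 and marks the conclusion as invalid.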

[0013] Furthermore, the method for calculating the credibility includes: For structured data, extract the numerical values from the analysis results and replay the historical query code for comparison. For unstructured chart data, perform visual verification: first, call a preset visual large language model to convert the chart into a visual factual description text; then construct a logical verification instruction that takes the visual factual description text as the premise and the analysis result of the swarm analysis agent as the hypothesis, and input it into an independent large language model for logical reasoning. The analysis result passes verification if and only if the large language model determines that the premise logically supports the hypothesis.
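The structured half of this check ("code replay") can be illustrated with Python's built-in `sqlite3` module. This is a sketch under assumptions: the table, columns, and tolerance are hypothetical, and a real system would replay whatever query the agent logged against the actual database inside a sandbox.

```python
import math
import sqlite3


def replay_matches(conn: sqlite3.Connection, sql: str,
                   claimed: float, rel_tol: float = 1e-6) -> bool:
    """Re-run a logged query and compare its first scalar with the claimed value."""
    row = conn.execute(sql).fetchone()
    if row is None or row[0] is None:
        return False
    return math.isclose(float(row[0]), claimed, rel_tol=rel_tol)


# Illustrative in-memory data; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE launches (year INTEGER, success INTEGER)")
conn.executemany("INSERT INTO launches VALUES (?, ?)",
                 [(2023, 1), (2023, 0), (2024, 1)])
logged_sql = "SELECT COUNT(*) FROM launches WHERE success = 1"
```

A conclusion claiming two successful launches passes `replay_matches(conn, logged_sql, 2)`, while a claim of three fails.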

[0014] Furthermore, the method for calculating originality includes: mapping the analysis results of the several swarm analysis agents under the same leaf node topic into high-dimensional semantic vectors using a vector embedding model; performing density clustering on these vectors to identify the conventional viewpoint cluster, which contains the majority of conclusions, and the outliers or sparse clusters, which contain the minority; and calculating the Euclidean distance between the conclusion vector under evaluation and the centroid of the conventional viewpoint cluster, where a greater distance gives a higher initial originality score. The originality score is retained only if the credibility score of the analysis result is higher than the preset safety threshold; if the credibility is low and the result lies far from the centroid, it is judged to be an anomalous hallucination and its originality score is set to zero.
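The interaction between distance from the mainstream and the credibility gate can be sketched as follows. To stay dependency-free, this uses a simplified density test (the core-point criterion used by DBSCAN, i.e. at least `min_pts` neighbors within `eps`) instead of full clustering, and it assumes a single conventional cluster. All parameter values are illustrative.

```python
import math
from typing import List, Sequence


def originality_scores(vectors: Sequence[Sequence[float]],
                       credibility: Sequence[float],
                       eps: float = 0.5,
                       min_pts: int = 3,
                       safety: float = 0.7) -> List[float]:
    # Density step: a point is "dense" if at least min_pts points
    # (itself included) lie within eps of it.
    dense = [sum(math.dist(v, u) <= eps for u in vectors) >= min_pts
             for v in vectors]
    core = [v for v, is_dense in zip(vectors, dense) if is_dense]
    if not core:
        return [0.0] * len(vectors)  # no conventional cluster to measure against
    dim = len(core[0])
    centroid = [sum(v[i] for v in core) / len(core) for i in range(dim)]
    scores = []
    for v, cred in zip(vectors, credibility):
        distance = math.dist(v, centroid)
        # Truth constraint: distance counts only for credible conclusions.
        scores.append(distance if cred > safety else 0.0)
    return scores


vectors = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (3.0, 4.0), (4.0, 3.0)]
credibility = [0.9, 0.9, 0.9, 0.95, 0.2]
scores = originality_scores(vectors, credibility)
```

Here the fourth conclusion is far from the mainstream and credible, so it receives the highest score; the fifth is equally far but fails the credibility gate and is zeroed as a likely hallucination.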

[0015] Furthermore, the insight depth calculation method is as follows: predefine the complexity weight of each operator in the analysis toolkit; count how many times the swarm analysis agent invokes each operator while generating the analysis result, and accumulate the corresponding operator weights; then multiply the cumulative operator weight by the credibility score, used as a coefficient, to obtain the final insight depth score.
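A worked example of this product, with an invented weight table (the operator names and weights below are assumptions for illustration, not values from the text):

```python
from typing import Dict, Mapping

# Hypothetical complexity weights for operators in the analysis toolkit.
OPERATOR_WEIGHTS: Dict[str, float] = {
    "filter": 1.0,
    "aggregate": 2.0,
    "join": 3.0,
    "regression": 5.0,
}


def insight_depth(operator_calls: Mapping[str, int],
                  credibility: float,
                  weights: Mapping[str, float] = OPERATOR_WEIGHTS) -> float:
    """Cumulative operator weight scaled by the credibility score."""
    cumulative = sum(weights.get(op, 0.0) * count
                     for op, count in operator_calls.items())
    return credibility * cumulative
```

An agent that used two filters and one join with credibility 0.5 would score `0.5 * (2 * 1.0 + 1 * 3.0) = 2.5`. Scaling by credibility means a long but unverified tool chain does not earn a high depth score.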

[0016] Furthermore, the visualization includes: For each leaf node topic, the top k high-scoring analysis conclusions are selected using the maximal marginal relevance (MMR) algorithm, and the original evidence chain associated with them, including chart screenshots and SQL code, is packaged to generate a leaf-level insight view. Using bottom-up recursion, a preset large language model takes the selected leaf node conclusions as context and generates a summary for the current parent node, which serves as that node's overview view. Each view is mapped to an ID in the topic tree structure and stored in a graph database or as a hierarchical JSON structure. The default view displays a panoramic overview of the root node; when the user clicks a branch, the overview views of the nodes under that branch are retrieved in real time; when the user locates a specific leaf node topic, its detailed content and multimodal evidence are fully rendered. During analysis task execution, the task status of each agent is monitored in real time; once a branch completes verification and scoring, it is dynamically mounted to the topic tree and pushed to the front end. Users can view the analysis progress percentage in real time and drill down into already generated insight nodes without waiting for the entire task to complete.
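The top-k selection step can be illustrated with a compact greedy MMR routine. This is a generic sketch of maximal marginal relevance: the trade-off parameter `lam` and the precomputed similarity matrix are assumptions, and the text does not specify how the four indicator scores are combined into the single `scores` value used here.

```python
from typing import List, Sequence


def mmr_select(scores: Sequence[float],
               sim: Sequence[Sequence[float]],
               k: int,
               lam: float = 0.7) -> List[int]:
    """Greedily pick k indices, trading off score against redundancy."""
    selected: List[int] = []
    remaining = list(range(len(scores)))

    def mmr_value(i: int) -> float:
        # Redundancy is the highest similarity to anything already selected.
        redundancy = max((sim[i][j] for j in selected), default=0.0)
        return lam * scores[i] - (1.0 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=mmr_value)
        selected.append(best)
        remaining.remove(best)
    return selected


# Conclusions 0 and 1 are near-duplicates; conclusion 2 is distinct.
demo_scores = [0.9, 0.88, 0.6]
demo_sim = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
picked = mmr_select(demo_scores, demo_sim, k=2)
```

Although conclusion 1 scores higher than conclusion 2, it is nearly identical to the already selected conclusion 0, so the routine picks indices 0 and 2.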

[0017] In a second aspect, a computer-readable storage medium stores a computer program that, when executed by a processor, implements the steps of the adaptive hierarchical multidimensional data insight analysis method based on swarm intelligence.

[0018] In a third aspect, an adaptive hierarchical multidimensional data insight analysis device based on swarm intelligence includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the steps of the adaptive hierarchical multidimensional data insight analysis method based on swarm intelligence.

[0019] The advantages of this invention compared to the prior art are:
1. Data adaptation and hallucination resistance in analysis tasks. By introducing data exploration and pruning mechanisms, the invention perceives the richness and accessibility of data before an analysis task is executed and automatically eliminates invalid branches that lack data support. This avoids wasted computing power and removes, at the source, the tendency of intelligent agents to "make things up" when data is lacking, keeping the analysis framework practical.

[0020] 2. Emerging highly original and scarce insights. This invention utilizes swarm intelligence combined with density clustering evaluation algorithms to break through the limitations of traditional single-entity analysis. The system can automatically identify and reward minority viewpoints that are "highly credible but located in low-density areas," thereby uncovering non-explicit, hidden correlations and high-value sparse insights in the data, significantly improving the originality of analytical conclusions and their decision-making reference value.

[0021] 3. A closed-loop credible verification system for multimodal data was established. For chart data commonly used in business analysis, this invention proposes a visual backtracking verification method. Utilizing a multimodal model as an independent checker, it cross-verifies the consistency between chart visual features and textual conclusions. Combined with code replay of structured data, this ensures that all insights and conclusions are verifiable and credible.

[0022] 4. Enhanced interactive analysis experience from macro to micro perspectives. This invention abandons the traditional static reporting model and constructs an interactive insight topology. Users can freely drill down and switch between "macro panoramic overview" and "micro evidence details" based on a dynamic topic tree, satisfying both management's need for overall control and the execution layer's need for data source tracing.

Brief Description of the Drawings

[0023] Various other advantages and benefits will become apparent to those skilled in the art upon reading the following detailed description of preferred embodiments. The accompanying drawings are for illustrative purposes only and are not intended to limit the invention. Furthermore, the same reference numerals denote the same parts throughout the drawings. In the drawings:
Figure 1 is a system architecture diagram of the present invention;
Figure 2 is a flowchart of the method of the present invention.

Detailed Implementation

[0024] To better understand the above technical solutions, the technical solutions of the present invention will be described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the embodiments of the present invention and the specific features in the embodiments are detailed descriptions of the technical solutions of the present invention, rather than limitations on the technical solutions of the present invention. In the absence of conflict, the embodiments of the present invention and the technical features in the embodiments can be combined with each other.

[0025] The following description, in conjunction with the accompanying drawings, provides a more detailed explanation of an adaptive hierarchical multidimensional data insight analysis method based on swarm intelligence provided by an embodiment of the present invention.

[0026] Example 1: An Adaptive Hierarchical Multidimensional Data Insight and Analysis Method Based on Swarm Intelligence
The method comprises:
Receiving the target topic to be analyzed and the dataset to be analyzed, as input by the user;
Decomposing, based on the dataset to be analyzed, the target topic into a hierarchical topic tree using a preset large language model, the hierarchical topic tree including parent topics, child topics, and leaf node topics;
Constructing swarm analysis agents based on the preset large language model: the semantic attributes of each leaf node topic of the topic tree are analyzed to generate differentiated strategies, and several swarm analysis agents are instantiated according to the different strategies, each configured with analysis perspective prompts and autonomous decision parameters; each swarm analysis agent performs multiple rounds of search on the dataset based on its corresponding leaf node topic and records an analysis link index, yielding the analysis results of each agent;
Constructing a verification and scoring agent based on the preset large language model, and using it to score the analysis results of each swarm analysis agent quantitatively against preset evaluation indicators, which include credibility, originality, topic relevance, and insight depth;
Selecting the analysis results of the leaf node topics based on the quantitative scores, and encapsulating each selected leaf node topic together with its parent topic as an independent data insight node;
Building an index according to the hierarchical relationship of the topic tree, and constructing and visualizing an interactive insight knowledge topology from the data insight nodes;
In response to the user's drill-down interaction command, dynamically loading and displaying the analysis results of the child node topics level by level, starting from the root node of the topic tree, until the analysis results of the leaf node topics are expanded as needed.

[0027] Example 2: An Adaptive Hierarchical Multidimensional Data Insight and Analysis System Based on Swarm Intelligence
As shown in Figure 1, this embodiment provides an adaptive hierarchical multidimensional data insight and analysis system based on swarm intelligence. The system is deployed on a cloud server cluster or high-performance computing center and is logically divided into three core modules: a topic tree construction module 101, a swarm analysis and verification module 102, and an interactive visualization engine 103.

[0028] 1. Topic Tree Construction Module 101: This module is used to initialize the analysis task and plan the analysis path, specifically covering data access, recursive decomposition and validity verification functions.

[0029] Data access unit: Used to receive natural language target topics (such as "commercial space trends") input by users and multimodal datasets to be analyzed. The datasets include structured databases (such as launch record sheets), unstructured documents (PDF / Word reports), and image data.

[0030] Recursive decomposition and exploration unit: Driven by a built-in large language model, this unit uses recursive logic to decompose the target topic into parent topics, child topics, and leaf nodes. During decomposition, it simultaneously schedules lightweight data probes to perform pre-retrieval (SQL COUNT or keyword matching) on each candidate subtopic within the dataset, calculating data richness metrics.

[0031] Adaptive pruning logic: Based on the exploration results, this module automatically prunes branches with richness below a preset threshold, or triggers the next round of recursion for overly broad topics, ultimately outputting a hierarchical topic tree structure that has been validated by data, ensuring that subsequent analysis is "based on evidence".

[0032] 2. Group Analysis and Verification Module 102: This module is the core computing unit of the system, used to schedule the intelligent agent cluster in parallel to perform deep analysis and conduct rigorous quality audits on the conclusions. Specifically, it includes: The strategy planning and distribution unit (executed by the strategy planning agent) dynamically generates differentiated analysis strategies (such as conservative and aggressive) for the leaf nodes of the topic tree and instantiates n (n>2) group analysis agents.

[0033] Multimodal Autonomous Exploration Unit: Drives each agent to execute the "perception-planning-action-evaluation" cycle. This unit has heterogeneous data processing capabilities, can query structured data through an SQL generator, parse charts and images using a multimodal model, and perform semantic alignment and fusion of data features from different modalities, while also fully recording the link index of the analysis process.

[0034] Four-dimensional evaluation and auditing unit: Has a built-in code execution sandbox and vector database for calculating four indicators for each analysis conclusion:
Credibility: verifies numerical accuracy through "code replay" and consistency between text and images through "visual backtracking";
Originality: constructs a local semantic space using a density clustering algorithm (such as DBSCAN) to identify and reward highly credible sparse viewpoints (outliers);
Topic relevance: determines whether the conclusion is off-topic based on vector similarity;
Insight depth: measures the logical complexity of the tool chain by accumulating operator weights.

[0035] 3. Interactive Visualization Engine 103: This module is used to transform complex analysis results into user-friendly visualizations, supporting dynamic interaction throughout the entire process.

[0036] Topology generation unit: Uses the maximal marginal relevance (MMR) algorithm to select high-scoring conclusions and, following the topic tree hierarchy, generates summaries at each level from the bottom up to construct an interactive insight knowledge topology.

[0037] Asynchronous rendering interface: Supports incremental rendering mechanism and can monitor the analysis progress of module 102 in real time. Once a branch has completed verification and scoring, the topology node is immediately highlighted for users to view, without waiting for the entire task to finish.

[0038] Source Tracing Interactive View: Responds to user drill-down commands, providing the ability to smoothly switch from a panoramic overview of the root node to a detailed report of the leaf node, and supports displaying the underlying raw evidence (such as SQL statements and original chart images), enabling the interpretation and traceability of analysis results.

[0039] Example 3: Exploratory Insight Analysis Method Based on Commercial Space Data
Based on the aforementioned system, this embodiment performs the following processing steps for the unstructured objective of "potential trends and risks in the commercial space industry" (refer to the flowchart in Figure 2):
Step S1: Multimodal task initialization and data access
The system receives a natural language command from the user: "Gain insights into potential technological risks and investment opportunities in the current commercial space industry." The system then loads a hybrid dataset via the data access interface of module 101, containing approximately 50,000 satellite launch records from the past decade, 3,000 technical patent documents, and related financing data.

[0040] Step S2: Data accessibility verification and adaptive topic tree construction.

[0041] Data richness calculation model: The system uses a dual-factor model weighted by quantity and quality to calculate the data richness of candidate subtopics. The calculation logic is as follows:
1. Structured data scoring. This score primarily measures the coverage of data records and can be written as:

$$S_{struct} = \frac{\log(1 + N_{hit})}{\log(1 + N_{total})}$$

[0042] where $N_{hit}$ is the number of rows matched by the SQL COUNT query and $N_{total}$ is the total number of rows in the database, or the total amount of data under the current parent node. Logarithmic scaling is used to avoid excessive numerical deviation on massive datasets.

[0043] 2. Unstructured data scoring. This score primarily measures the relevance and density of text and multimodal data. To ensure real-time performance, the system computes it on metadata or summaries only, without scanning the full text. It can be written as:

$$S_{unstruct} = \alpha \cdot \min\left(1, \frac{N_{doc}}{N_{base}}\right) + \beta \cdot Sim_{avg}$$

[0044] where $N_{doc}$ is the number of documents or images containing the keywords; $N_{base}$ is a preset baseline number (e.g., 50 documents, based on historical experience); $Sim_{avg}$ is the average cosine similarity between the candidate subtopic vector and the top-K retrieved document summary vectors, used to filter out "noisy data" that contains the keywords but is semantically irrelevant; and α and β are weighting coefficients, usually set to α=0.4 and β=0.6, prioritizing data quality.

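[0045] 3. Overall richness index. The comprehensive richness index is the weighted sum of the structured and unstructured data scores:

$$R = w_1 \cdot S_{struct} + w_2 \cdot S_{unstruct}$$

[0046] where $S_{struct}$ and $S_{unstruct}$ are the structured and unstructured data scores, and $w_1$ and $w_2$ are weights that the system adjusts dynamically based on the task type. For example, this embodiment focuses on "trend analysis" and emphasizes unstructured reports, so w1=0.3 and w2=0.7 are set.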

[0047] Dynamic threshold determination logic: The system uses a dynamic relative threshold:

[0048] $$T_{dynamic} = 0.1 \times R_{parent}$$

[0049] In other words, if the richness score of a candidate subtopic is less than 10% of the score $R_{parent}$ of its parent topic, the branch is too niche or its data is extremely sparse, and the system prunes it.
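The richness model and the relative threshold can be put together in a few lines. This is a sketch under stated assumptions: the log-ratio form of the coverage score and the cap of the keyword ratio at 1 are one reading of the definitions given in the text, not confirmed formulas; the weights α=0.4, β=0.6, w1=0.3, w2=0.7 and the 10% ratio are the values stated in the embodiment. Function names are hypothetical.

```python
import math


def structured_score(n_hit: int, n_total: int) -> float:
    """Log-scaled record coverage (assumed form)."""
    if n_total <= 0:
        return 0.0
    return math.log(1 + n_hit) / math.log(1 + n_total)


def unstructured_score(n_doc: int, n_base: int, sim_avg: float,
                       alpha: float = 0.4, beta: float = 0.6) -> float:
    """Keyword hit ratio (capped at 1, an assumption) blended with similarity."""
    return alpha * min(1.0, n_doc / n_base) + beta * sim_avg


def richness(s_struct: float, s_unstruct: float,
             w1: float = 0.3, w2: float = 0.7) -> float:
    return w1 * s_struct + w2 * s_unstruct


def should_prune(child_richness: float, parent_richness: float,
                 ratio: float = 0.1) -> bool:
    """Prune when the child falls below 10% of its parent's richness."""
    return child_richness < ratio * parent_richness
```

For instance, a candidate with richness 0.02 under a parent with richness 0.85 falls below the relative threshold of 0.085 and is pruned, while a candidate with richness 0.5 is kept.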

[0050] The topic tree construction module 101 first decomposes the target topic into a first-level set of candidate subtopics, such as: {"Low-Earth Orbit Constellation Network", "Asteroid Mineral Development", "Liquid Oxygen Methane Engine Technology", "Space Tourism Economy"}.

[0051] In the recursive decomposition process described above, the topic planning agent performs the decomposition task based on a preset prompt template (PromptTemplate) to ensure that subtopics meet the principles of mutual independence and complete exhaustiveness. The specific prompt structure used is as follows: "You are a senior data analysis architect, skilled at breaking down complex business or technical problems into structured analysis trees. Please perform the next level of logical decomposition for the target topic {Target_Topic}. Requirements: 1. Decomposition dimensions must meet the principles of mutual independence and complete exhaustiveness; 2. Identify the 'entity class' keywords for subsequent database queries; 3. Output in JSON format. Constraints: The number of subtopics should be controlled between 3 and 5; it must include at least one financial / capital perspective and one technical / product perspective." The system embeds the user-input target topic into the {Target_Topic} slot of the above template and sends it to the large language model, which then outputs the candidate subtopic list in JSON format.

[0052] Subsequently, the module performs synchronous data probing on the candidate subtopics: For the keyword "asteroid mineral development," a keyword probe was generated and searched in patent documents and financing databases. The returned data richness index was 0.02 (below the preset threshold of 0.15). The system determined that this direction lacked effective data support and performed a pruning operation, removing this branch. Furthermore, the system performed a backtracking retention check: Suppose the parent topic "space tourism economy" was broken down into two subtopics: "suborbital flight ticket prices" and "space hotel construction." Probe searches revealed that although there were abundant research reports related to "space tourism economy" overall, the data hit rate for both of the subtopics was 0 (data was too sparse). At this point, the system determined that the breakdown was excessive, automatically canceled the breakdown operation for "space tourism economy," and reverted the parent topic to a leaf node, directly performing subsequent analysis on its entirety.

[0053] For the "liquid oxygen methane engine technology," the structured launch records returned by the probe showed high correlation and rich technical documentation (richness index 0.85). The system retains this branch and triggers the next round of recursive decomposition to generate a topic tree containing leaf nodes such as "thrust parameter evolution" and "capital input and technology output ratio."

[0054] Ultimately, the system constructs a hierarchical topic tree for analysis in which every retained branch has been validated against the data.
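The pruning and backtracking retention behavior of [0052] and [0053] can be sketched as below. The 0.15 threshold comes from this embodiment; the `TopicNode` class and the function name are hypothetical, and the richness values are assumed to have been produced by the probing step.

```python
from dataclasses import dataclass, field

RICHNESS_THRESHOLD = 0.15  # preset threshold used in this embodiment


@dataclass
class TopicNode:
    name: str
    children: list = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return not self.children


def attach_supported_children(parent: TopicNode, candidates: list, richness: dict,
                              threshold: float = RICHNESS_THRESHOLD) -> TopicNode:
    """Keep only the candidate subtopics whose richness reaches the threshold.

    `richness` maps a candidate name to its probed richness index. If every
    candidate is pruned, the parent ends up with no children and therefore
    remains a leaf node, which is the backtracking retention case.
    """
    parent.children = [
        TopicNode(name) for name in candidates
        if richness.get(name, 0.0) >= threshold
    ]
    return parent
```

Modeling backtracking as "no surviving children" means the parent needs no separate flag: a node with an empty child list is analyzed as a whole.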

[0055] Step S3: Instantiation and policy configuration of the group analysis agents. For the leaf node "Technology and Capital Relationship of Liquid Oxygen-Methane Engines" in the topic tree, the system instantiates 10 (n=10) group analysis agents. To avoid homogenization of analytical conclusions and to promote the emergence of sparse insights, the system applies a differentiated configuration strategy to these 10 agents: 1. Role heterogeneity: 4 agents are configured as "Conservative Data Analysts" (focusing on statistical significance), 3 as "Aggressive Venture Capitalists" (focusing on weak signals and potential high returns), and 3 as "Technology Skeptics" (focusing on uncovering data contradictions). 2. Parameter perturbation: for the "Routine Analysis" agents, a lower generation temperature (Temperature=0.2) is set to ensure accuracy; for the "Exploratory" agents, a higher temperature (Temperature=0.8) is set to stimulate creative associations.

[0056] To achieve automated heterogeneous configuration of the aforementioned roles, the policy planning agent executes the following "policy generation prompt" instruction: "Current leaf node topic: {Leaf_Topic}; Number of agents to be configured: {N}. Please assign different analysis roles and parameter configurations to these N agents based on topic attributes (deterministic vs. exploratory). The output format is JSON." Example output: { "agents": [ {"id": 1, "role_prompt": "You are a conservative data analyst who only believes in statistical significance...", "temperature": 0.2}, {"id": 2, "role_prompt": "You are an aggressive venture capitalist who makes bold predictions...", "temperature": 0.8} ... ]}. The system parses this JSON output and automatically initializes and instantiates the agent cluster based on the role_prompt and temperature fields.
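Parsing the policy JSON of [0056] into agent configurations might look like the following sketch. The `AgentConfig` class, the count check, and the non-negative temperature check are illustrative assumptions; only the `agents`, `id`, `role_prompt`, and `temperature` field names come from the example output above.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConfig:
    agent_id: int
    role_prompt: str
    temperature: float


def parse_agent_configs(policy_json: str, expected_n: int) -> list:
    """Turn the policy planning agent's JSON reply into AgentConfig objects."""
    agents = json.loads(policy_json)["agents"]
    if len(agents) != expected_n:
        raise ValueError(f"expected {expected_n} agents, got {len(agents)}")
    configs = []
    for entry in agents:
        temperature = float(entry["temperature"])
        if temperature < 0.0:  # assumed sanity check, not stated in the source
            raise ValueError(f"invalid temperature for agent {entry['id']}")
        configs.append(AgentConfig(int(entry["id"]), entry["role_prompt"], temperature))
    return configs
```

Each resulting `AgentConfig` carries exactly the two fields the system uses to initialize an agent: its role prompt and its generation temperature.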

[0057] Step S4: Autonomous multi-round exploration and multimodal evidence retention. Each agent executes the analysis task in parallel. Taking agent A as an example, its execution path is as follows. First round of exploration (structured statistics): agent A generates an SQL statement to query the financing database and identifies an exponential growth trend in capital inflows into this sector over the past three years. To generate the query, agent A invokes the built-in SQL generation module, which converts the agent's natural-language reasoning into executable code using the following prompt: "You are an SQL database expert. Table structure information is as follows: {Schema_Info}; Query intent: {Agent_Thought_Process}. Rules: Output only SQL statements, excluding Markdown formatting. Standard aggregate functions must be used." Second round of exploration (unstructured parsing): agent A retrieves multiple technical test reports (PDFs) and uses the built-in multimodal large model to identify the "engine specific impulse (Isp)-time characteristic curve" in the reports.

[0058] Semantic Alignment and Fusion: Agent A extracts curve slope features from the image and discovers that the specific impulse parameter in the actual test did not increase in sync with capital investment, but instead showed a technological bottleneck plateau. Agent A semantically aligns "high capital enthusiasm" with "stagnation of technological parameters," generating the preliminary conclusion that "there is a risk of capital-technology inversion."

[0059] Evidence Index: The system records the SQL statements called by agent A, the IDs of the referenced PDF documents, and the coordinate range of the charts to construct an analysis link index.
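One possible shape for the analysis link index of [0059] is sketched below. The class and field names are hypothetical; the source only specifies that SQL statements, referenced PDF document IDs, and chart coordinate ranges are recorded.

```python
from dataclasses import dataclass, field


@dataclass
class ChartRegion:
    document_id: str
    bbox: tuple  # (x0, y0, x1, y1) coordinate range of the chart in the document


@dataclass
class EvidenceIndex:
    agent_id: str
    sql_statements: list = field(default_factory=list)
    document_ids: list = field(default_factory=list)
    chart_regions: list = field(default_factory=list)

    def record_sql(self, statement: str) -> None:
        self.sql_statements.append(statement)

    def record_chart(self, document_id: str, bbox: tuple) -> None:
        # Register the source document once, then the chart's location in it.
        if document_id not in self.document_ids:
            self.document_ids.append(document_id)
        self.chart_regions.append(ChartRegion(document_id, bbox))
```

Keeping the raw SQL text and chart coordinates, rather than only the conclusions drawn from them, is what makes the later code replay and visual backtracking checks possible.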

[0060] Step S5: Multi-dimensional indicator verification and quantitative evaluation. The audit unit in the group analysis and verification module 102 evaluates the conclusions of all agents, specifically using the following quantitative calculation model: 1. Calculation of topic relevance. The system first extracts the core vector $V_c$ of the conclusion of agent A and the leaf node topic vector $V_t$. The relevance score $S_{rel}$ is calculated using cosine similarity:

$$S_{rel} = \cos(V_c, V_t) = \frac{V_c \cdot V_t}{\lVert V_c \rVert \, \lVert V_t \rVert}$$

[0061] If $S_{rel}$ is below a preset threshold (e.g., 0.75), the conclusion is considered to be off-topic.
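The relevance check of [0060] and [0061] reduces to a cosine similarity followed by a threshold comparison. A minimal sketch in plain Python, with hypothetical function names and the example threshold of 0.75, could be:

```python
import math

RELEVANCE_THRESHOLD = 0.75  # example threshold given in this embodiment


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumed convention: a zero vector is treated as unrelated
    return dot / (norm_a * norm_b)


def is_on_topic(conclusion_vec: list, topic_vec: list,
                threshold: float = RELEVANCE_THRESHOLD) -> bool:
    """A conclusion below the threshold is considered off-topic."""
    return cosine_similarity(conclusion_vec, topic_vec) >= threshold
```

In practice the two vectors would come from an embedding model; that step is outside the scope of this sketch.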

[0062] 2. Calculation of insight depth. The system calculates a depth score $S_{depth}$ using a weighted scoring method based on the agent's behavior.

[0063] The system pre-defines an analysis operator weight set, categorizing the tools available to the agent into three levels: Basic operators (Weight=1): Single table query (SQL Select), simple counting, text extraction.

[0064] Advanced operators (Weight=2): multi-table join, trend comparison, outlier detection, and chart recognition.

[0065] Higher-order operators (Weight=3): Attribution analysis, predictive model invocation, cross-modal logic alignment.

[0066] The evidence chain (trace) generated by the agent for this conclusion is traversed, the weights of all operations are summed, and the sum is then multiplied by a credibility coefficient. The formula is:

$$S_{depth} = C \cdot \sum_{i} w_i$$

[0067] where $C$ is the credibility score (0~1) obtained from the credibility verification; if the conclusion is not credible, the depth score is directly reduced to zero. $\sum_{i} w_i$ denotes the accumulated weights of the operators called in all steps $i$ on this link.

[0068] For example, in this embodiment, the execution path of agent A is: SQL query (1 point) -> chart recognition (2 points) -> cross-modal data alignment (3 points). Its base depth score is 1 + 2 + 3 = 6 points. If the credibility is 0.9, the final depth score is 0.9 × 6 = 5.4.
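The depth calculation of [0063] to [0068] can be reproduced with a small lookup table. The operator key names below are hypothetical labels for the operators listed in the three weight levels; the weights themselves are those given in this embodiment.

```python
# Operator weights from [0063]-[0065]; the key names are illustrative labels.
OPERATOR_WEIGHTS = {
    "sql_select": 1, "simple_count": 1, "text_extraction": 1,
    "multi_table_join": 2, "trend_comparison": 2,
    "outlier_detection": 2, "chart_recognition": 2,
    "attribution_analysis": 3, "predictive_model": 3,
    "cross_modal_alignment": 3,
}


def depth_score(trace: list, credibility: float) -> float:
    """Sum the operator weights along the trace and scale by credibility."""
    if not 0.0 <= credibility <= 1.0:
        raise ValueError("credibility must lie in [0, 1]")
    base = sum(OPERATOR_WEIGHTS[op] for op in trace)
    return credibility * base


# Agent A's path in [0068]: SQL query -> chart recognition -> alignment.
agent_a = depth_score(["sql_select", "chart_recognition", "cross_modal_alignment"], 0.9)
```

Because credibility acts as a multiplier, a conclusion that fails verification (credibility 0) receives a depth score of zero regardless of how many operators it invoked.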

[0069] 3. Credibility verification. Structured verification: the SQL statements generated by agent A are re-executed in the sandbox to confirm that the financing data trend is correct.
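As a rough sketch of the structured verification in [0069], the recorded SQL can be replayed and its rows compared with the figures the agent reported. An in-memory SQLite database stands in for the sandbox here, and the `financing` table, its rows, and the function names are made-up illustrations rather than data from the source.

```python
import sqlite3


def replay_query(conn: sqlite3.Connection, sql: str) -> list:
    """Re-execute a recorded SQL statement and return all rows."""
    return conn.execute(sql).fetchall()


def structured_check(conn: sqlite3.Connection, sql: str, reported_rows: list) -> bool:
    """True only if the replayed result equals what the agent reported."""
    return replay_query(conn, sql) == list(reported_rows)


# Toy sandbox with invented numbers, for illustration only.
sandbox = sqlite3.connect(":memory:")
sandbox.execute("CREATE TABLE financing (year INTEGER, amount REAL)")
sandbox.executemany(
    "INSERT INTO financing VALUES (?, ?)",
    [(2021, 1.0), (2022, 2.0), (2022, 1.5), (2023, 8.0)],
)
RECORDED_SQL = "SELECT year, SUM(amount) FROM financing GROUP BY year ORDER BY year"
```

A mismatch between the replayed rows and the reported rows indicates that the agent's stated figures cannot be reproduced from its own query.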

[0070] Visual backtracking verification: The system calls a multimodal large language model to objectively identify the original charts and convert them into plain-text visual fact descriptions. The system introduces an independent verification large language model to act as a judge and constructs logical verification instructions for it. These instructions adopt a "premise-hypothesis-verification" chain-of-thought structure, with the specific prompt as follows: "You are an impartial logical verification judge. Your task is to determine whether the visual facts support the analysis conclusion. Visual fact description: {Visual_Description_Text}, Agent analysis conclusion: {Agent_Conclusion}. Please reason whether there is a logical contradiction between the two. If the visual facts support the conclusion, output: PASS; if the visual facts contradict the conclusion, output: CONFLICT; if the visual facts are irrelevant to the conclusion, output: IRRELEVANT." The system uses the visual fact description as the supporting evidence and the analysis conclusion generated by the agent as the object of review, and requires the judge to determine whether the evidence supports the conclusion. The model performs language inference to identify logical contradictions such as data conflicts, opposite trends, or misattribution. Finally, the system makes a judgment based on the judge's output. If the judge outputs PASS ("logically supported"), the conclusion is considered true and credible, the verification passes, and the credibility is marked as [value missing]. If the output is CONFLICT ("logical conflict") or IRRELEVANT ("irrelevant"), the conclusion is judged to be a hallucination, and the credibility is marked as [value missing].

[0071] In this embodiment, an independent visual model is invoked to perform secondary recognition on the characteristic curve referenced by agent A, outputting a semantic description of the chart: "The curve tends to flatten after 2023." This description is consistent with agent A's conclusion, and the verification is passed; therefore, the credibility is marked as [value missing].
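The judge step of [0070] can be sketched as a prompt builder plus a verdict parser. The function names are hypothetical, the call to the judge model itself is omitted, and because the credibility values assigned to each verdict are not recoverable from the text, the sketch returns only a pass/fail flag.

```python
import re

# Judge prompt as quoted in [0070].
JUDGE_TEMPLATE = (
    "You are an impartial logical verification judge. Your task is to determine "
    "whether the visual facts support the analysis conclusion. "
    "Visual fact description: {Visual_Description_Text}, "
    "Agent analysis conclusion: {Agent_Conclusion}. "
    "Please reason whether there is a logical contradiction between the two. "
    "If the visual facts support the conclusion, output: PASS; "
    "if the visual facts contradict the conclusion, output: CONFLICT; "
    "if the visual facts are irrelevant to the conclusion, output: IRRELEVANT."
)
VERDICTS = ("PASS", "CONFLICT", "IRRELEVANT")


def build_judge_prompt(visual_description: str, agent_conclusion: str) -> str:
    values = {
        "Visual_Description_Text": visual_description,
        "Agent_Conclusion": agent_conclusion,
    }
    # One pass over the template, so braces inside the inserted text are
    # never re-interpreted as slots.
    return re.sub(r"\{(\w+)\}", lambda m: values.get(m.group(1), m.group(0)), JUDGE_TEMPLATE)


def parse_verdict(judge_output: str) -> str:
    """Normalize the judge's reply; anything unrecognized is an error."""
    token = judge_output.strip().upper()
    if token not in VERDICTS:
        raise ValueError(f"unexpected judge output: {judge_output!r}")
    return token


def verification_passed(judge_output: str) -> bool:
    # Only PASS counts as support; CONFLICT and IRRELEVANT both fail.
    return parse_verdict(judge_output) == "PASS"
```

Treating IRRELEVANT as a failure mirrors the text: evidence that does not bear on the conclusion cannot support it.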

[0072] 4. Originality calculation. The system maps the conclusions of the 10 agents into a set of high-dimensional semantic vectors, which are clustered using the DBSCAN algorithm. The results show that the conclusion vectors of eight agents form a high-density conventional cluster, "rapidly developing industry," with centroid $\mu$. The conclusion vector $V_A$ of agent A is located in a low-density area (an outlier). The originality score $S_{orig}$ is calculated from the Euclidean distance, with "truth constraint" logic introduced:

$$S_{orig} = \begin{cases} \lVert V_A - \mu \rVert_2, & C > \theta \\ 0, & \text{otherwise} \end{cases}$$

where $C$ is the credibility of the conclusion and $\theta$ is the preset safety threshold.

[0073] Because agent A passed the credibility check (its credibility exceeds the safety threshold) and deviates significantly from conventional viewpoints, the system identifies its conclusion as a high-value sparse insight and assigns it a high originality score.
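The truth-constrained originality score can be sketched as follows. The density clustering itself is assumed to have been done already (for example with a DBSCAN implementation), so the function simply receives the vectors of the conventional cluster; the function names and the threshold parameter are illustrative.

```python
import math


def centroid(vectors: list) -> list:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def originality_score(vec: list, mainstream_vectors: list,
                      credibility: float, safety_threshold: float) -> float:
    """Euclidean distance to the conventional cluster centroid, gated by credibility.

    A conclusion whose credibility does not exceed the safety threshold is
    treated as a possible hallucination and scores zero, however far it lies
    from the centroid.
    """
    if credibility <= safety_threshold:
        return 0.0
    return math.dist(vec, centroid(mainstream_vectors))
```

The gate is what separates a high-value sparse insight from an abnormal hallucination: both are far from the centroid, but only the former survives credibility verification.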

[0074] Step S6: Interactive insight topology construction. The interactive visualization engine 103 generates a "Commercial Aerospace Industry Insight Topology Map" based on the optimized results.

[0075] When constructing the interactive topology, in order to generate an intermediate-level overview view of the parent node (the recursive summary of the parent node in step S6), the system calls the following recursive summary prompt: "Parent Topic: {Parent_Topic}. List of high-scoring insights for subordinate child nodes: 1. {Child_Insight_1} (Confidence: 0.9); 2. {Child_Insight_2} (Confidence: 0.85)... Based on the above child node evidence, please generate a summary of the parent node. Requirements: retain key data indicators; indicate the logical relationship between different sub-viewpoints (whether they corroborate each other or are contradictory); do not supplement information not mentioned in the list through association." Topology rendering and incremental feedback: Because the deep inference involves 50,000 records and 3,000 documents, the full analysis is expected to take a long time (e.g., several hours). Therefore, the system enters a background asynchronous analysis mode after startup. A task progress bar is displayed in real time at the top of the interface (e.g., "Current progress: 15%, analyzing: 'Liquid oxygen methane engine' branch"). As the analysis progresses, completed branches such as "Capital Trends" automatically "light up" in the topology graph, changing from gray to solid. Users do not need to wait for the entire task to finish; they can click on the lit nodes to view the intermediate conclusion of "Capital-Technology Inversion" produced by agent A. The system supports full drill-down interaction on completed parts while the analysis is still in progress, achieving a balance between deep analysis and interactive experience.
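The bottom-up recursive summary can be sketched as a post-order traversal in which the large language model call is replaced by a pluggable `summarize` function. The dictionary layout of a node (`topic`, `insight`, `children`, `summary`) is a hypothetical choice for this illustration.

```python
from typing import Callable


def summarize_tree(node: dict, summarize: Callable[[str, list], str]) -> str:
    """Post-order traversal: leaves return their insight, parents a summary.

    `summarize(parent_topic, child_texts)` stands in for the call that fills
    the recursive summary prompt and queries the large language model.
    """
    children = node.get("children", [])
    if not children:
        return node["insight"]
    child_texts = [summarize_tree(child, summarize) for child in children]
    node["summary"] = summarize(node["topic"], child_texts)
    return node["summary"]
```

Because each parent summary is built only from its children's text, the constraint in the prompt (do not add information absent from the list) applies at every level of the tree.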

[0076] Example 4: Electronic Equipment. This invention provides a computer-readable storage medium storing computer instructions that, when executed on a computer, cause the computer to perform the method described in Figure 2.

[0077] Those skilled in the art will understand that embodiments of the present invention can be provided as methods, systems, or computer program products. Therefore, the present invention can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage and optical storage) containing computer-usable program code.

[0078] This invention is described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0079] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0080] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram. Obviously, those skilled in the art can make various modifications and variations to this invention without departing from the spirit and scope of the invention. Therefore, if these modifications and variations of the invention fall within the scope of the claims of the invention and their equivalents, the invention is also intended to include these modifications and variations.

[0081] The contents not described in detail in this specification are common knowledge to those skilled in the art.

Claims

1. An adaptive hierarchical multidimensional data insight analysis method based on swarm intelligence, characterized in that it comprises: receiving a target topic to be analyzed and a dataset to be analyzed, as input by a user; based on the dataset to be analyzed, decomposing the target topic to be analyzed into a hierarchical topic tree using a pre-defined large language model, the hierarchical topic tree including a parent topic, child topics, and leaf node topics; constructing group analysis agents based on the pre-defined large language model, analyzing the semantic attributes of the topic at each leaf node of the topic tree to generate a differentiated strategy for each group analysis agent, and instantiating several group analysis agents according to the different strategies, each group analysis agent being configured with analysis perspective prompts and autonomous decision parameters; each group analysis agent performing multiple rounds of search on the dataset to be analyzed based on its corresponding leaf node topic and recording an analysis link index, to obtain the analysis results of each group analysis agent; constructing a verification and scoring agent based on the pre-defined large language model, and using the verification and scoring agent to quantitatively score the analysis results of each group analysis agent against pre-defined evaluation indicators, the pre-defined evaluation indicators including credibility, originality, topic relevance, and insight depth; selecting the analysis results of the leaf node topics based on the quantitative scoring results, and encapsulating each selected leaf node topic and its parent topic as an independent data insight node; building an index according to the hierarchical relationship of the topic tree, and constructing and visualizing an interactive insight knowledge topology from the data insight nodes;
in response to a drill-down interaction command from the user, dynamically loading and displaying the analysis results of the child node topics at each level, starting from the root node of the topic tree, until the analysis results of the leaf node topics are expanded as needed.

2. The adaptive hierarchical multidimensional data insight analysis method based on swarm intelligence according to claim 1, characterized in that the step of decomposing the target topic to be analyzed into a hierarchical topic tree includes: S21. Construct a topic planning agent using a preset large language model, and use the topic planning agent to generate a set of candidate subtopic directions based on the topic at the current level; S22. Generate a metadata query instruction for each candidate subtopic direction and execute the query in the dataset to be analyzed; S23. For structured data in the query results, calculate the normalized record coverage rate; for unstructured data in the query results, calculate a data score using a weighted algorithm based on keyword matching frequency or density and vector semantic similarity; merge the record coverage rate and the data score to generate a comprehensive richness index; if the comprehensive richness index is lower than the preset dynamic support threshold, determine that the direction is unanalyzable and remove the candidate subtopic, completing the pruning; if the comprehensive richness index is higher than the preset maximum value and the topic range exceeds the preset range, trigger the next round of recursive decomposition, using the subtopic direction as a new parent node to continue the decomposition; if, after recursive decomposition, all candidate subtopic directions generated by a parent topic are pruned as invalid, trigger the backtracking retention mechanism, terminate further decomposition of the parent topic, and directly mark it as a leaf node for retention; S24. Repeat S22~S23 until the data richness of all leaf node topics is within the preset range and the granularity meets the preset requirements, or until the set upper limit on levels is reached, and then lock the structure as the final topic tree.

3. The adaptive hierarchical multidimensional data insight analysis method based on swarm intelligence according to claim 1, characterized in that the multi-round search includes: when the retrieval object is an image or chart, the group analysis agent calls a preset multimodal large language model to semantically describe the chart and convert it into text context; the group analysis agent iteratively completes a closed-loop task: generating search instructions based on a preset topic, executing the search, viewing the search results, and determining whether the search results meet the preset topic requirements; if data is missing, the group analysis agent generates a new query to supplement it; if data contradictions are found, a comparative retrieval is triggered; after the closed-loop task has been executed for the maximum number of rounds, structured statistical values and unstructured descriptions are obtained; the structured statistical values and the unstructured descriptions are semantically aligned and fused to generate the analysis results of the group analysis agent.

4. The adaptive hierarchical multidimensional data insight analysis method based on swarm intelligence according to claim 1, characterized in that the method for calculating the topic relevance includes: extracting the core viewpoint from the analysis results using a large language model and generating a conclusion title; converting the conclusion title into a vector $V_c$; converting the topic description of the leaf node for which the scoring agent is responsible into a vector $V_t$; calculating the cosine similarity between $V_c$ and $V_t$; using the cosine similarity as the topic relevance score; and marking the analysis result as invalid if the score is lower than the preset topic relevance threshold.

5. The adaptive hierarchical multidimensional data insight analysis method based on swarm intelligence according to claim 1, characterized in that the method for calculating the credibility includes: for structured data, extracting the numerical values from the analysis results and replaying the historical query code for comparison; for unstructured chart data, performing visual verification: first, a preset visual large language model is called to convert the chart into visual factual description text; then, a logical verification instruction is constructed, taking the visual factual description text as the premise and the analysis result of the group analysis agent as the hypothesis, and is input into an independent large language model for logical reasoning; the analysis result is confirmed to pass the verification if and only if the large language model determines that there is a logical support relationship between the two.

6. The adaptive hierarchical multidimensional data insight analysis method based on swarm intelligence according to claim 1, characterized in that the method for calculating the originality includes: for the analysis results of several group analysis agents under the same leaf node topic, using a vector embedding model to map them into high-dimensional semantic vectors; performing density clustering on the high-dimensional semantic vectors to identify the conventional viewpoint cluster containing the majority of conclusions and the outliers or sparse clusters containing the minority of conclusions; calculating the Euclidean distance between the conclusion vector to be evaluated and the centroid of the conventional viewpoint cluster, where the greater the distance, the higher the initial originality score; the originality score is retained only if the credibility score of the analysis result is higher than the preset safety threshold; otherwise, if the credibility is low and the result is far from the centroid, it is judged to be an abnormal hallucination, and the originality score is directly reduced to zero.

7. The adaptive hierarchical multidimensional data insight analysis method based on swarm intelligence according to claim 1, characterized in that the method for calculating the insight depth is as follows: predefine the complexity weights of the various operators in the analysis toolkit; count the operators of each type invoked by the group analysis agent while generating the analysis results and calculate the cumulative operator weight; use the credibility score as a coefficient and multiply it by the cumulative operator weight to obtain the final insight depth score.

8. The adaptive hierarchical multidimensional data insight analysis method based on swarm intelligence according to claim 1, characterized in that the visualization includes: for each leaf node topic, filtering the top k high-scoring analysis conclusions using the maximal marginal relevance (MMR) algorithm, and packaging the original evidence chain associated with them, including chart screenshots and SQL code, to generate a leaf-level insight view; using bottom-up recursive logic, a pre-defined large language model takes the analysis conclusions of the selected leaf nodes as context and generates a summary of the current parent node, which serves as the overview view of that node; mapping each view to an ID in the topic tree structure and storing it in a graph database or as a hierarchical JSON structure; the default view displays a panoramic overview of the root node; when a user clicks on a branch, the overview view of the nodes under that branch is retrieved in real time; when the user locates a specific leaf node topic, the detailed content and multimodal evidence of that leaf node topic are fully rendered; during the execution of the analysis task, the task status of each agent is monitored in real time; once a branch completes verification and scoring, it is dynamically mounted to the topic tree and pushed to the front end; users can view the analysis progress percentage in real time and drill down into the already generated insight nodes in advance, without waiting for the entire task to be completed.

9. A computer-readable storage medium storing a computer program, characterized in that when the computer program is executed by a processor, it implements the steps of the method as described in any one of claims 1 to 8.

10. An adaptive hierarchical multidimensional data insight analysis device based on swarm intelligence, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that when the processor executes the computer program, it implements the steps of the method as described in any one of claims 1 to 8.
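Claim 8 selects the top k conclusions per leaf node with a filter that trades relevance against redundancy, a technique commonly known as maximal marginal relevance (MMR). The following is a minimal greedy sketch of that idea, not the claimed implementation: it assumes each conclusion already has a quality score and an embedding vector, and the trade-off parameter `lam` and all names are illustrative.

```python
import math


def _cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mmr_select(scores: list, vectors: list, k: int, lam: float = 0.7) -> list:
    """Greedily pick up to k item indices, balancing score against redundancy.

    At each step the item maximizing
        lam * score - (1 - lam) * (max similarity to already selected items)
    is chosen, so near-duplicate conclusions are pushed down the ranking.
    """
    selected = []
    remaining = list(range(len(scores)))
    while remaining and len(selected) < k:
        def marginal(i: int) -> float:
            redundancy = max(
                (_cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            return lam * scores[i] - (1.0 - lam) * redundancy
        best = max(remaining, key=marginal)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `lam = 1.0` the selection degenerates to a plain top-k by score; lower values favor diversity among the displayed conclusions.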