Artificial intelligence-based data analysis method and device, computer device and medium
By employing AI-based data analysis methods, combined with low-quality content cluster centers, multi-path parallel retrieval, and root cause analysis of target large models, the accuracy and efficiency issues of rule engines in content quality identification were resolved, achieving efficient content correction.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- CHINA PING AN PROPERTY INSURANCE CO LTD
- Filing Date
- 2026-03-06
- Publication Date
- 2026-06-12
AI Technical Summary
Existing keyword-based rule engines have low accuracy in content quality identification and lack root cause analysis capabilities, resulting in low efficiency in content correction.
Using an AI-based data analysis method, distance calculation is performed through the center of low-quality content clusters to generate quality identification results. Multi-path parallel retrieval and root cause analysis are then performed, and a correction strategy is generated using a target large model to automatically execute content correction.
It improves the accuracy and efficiency of content quality identification and correction, and enables automated root cause analysis and content correction.
Figure CN122196155A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of artificial intelligence technology and can be applied to the financial technology field, particularly to data analysis methods, devices, computer equipment and storage media based on artificial intelligence.
Background Technology
[0002] In existing insurance knowledge bases and content platforms, content quality management primarily relies on keyword-based rule engines for quality identification. This method, in its operational mechanism, can only identify pre-defined, superficial issues; for example, when certain sensitive words appear in content, the rule engine will determine that the content has quality problems. However, this method has significant drawbacks. It cannot deeply understand the semantic quality of the content, greatly reducing the accuracy of quality identification. Furthermore, due to the lack of root cause analysis capabilities, it is difficult to accurately pinpoint the root cause of content quality problems, thus failing to automatically correct the content, resulting in extremely low efficiency in the content correction process.
[0003] For example, in the context of interpreting and reviewing insurance policy terms in the financial and insurance sector, traditional keyword-based rule engines may only be able to identify whether the terms contain words marked as high-risk, such as "exclusion" or "limitation." However, the meanings of these words may differ in different contexts, and the rule engine cannot determine whether they truly affect the reasonableness and accuracy of the terms based on the semantic context. If the "exclusion" section of a term is a reasonable explanation of a specific force majeure event, the rule engine may mistakenly judge the term as problematic simply because it detected the word. Moreover, since it cannot analyze whether the problem stems from unclear wording or flawed term design, it cannot automatically correct the content, requiring significant manual effort for investigation and processing, severely impacting the efficiency of content review and management.
[0004] Therefore, there is an urgent need to provide an intelligent content quality identification and management method to improve the accuracy of quality identification, enhance the efficiency of content correction, and thus optimize the content management performance of insurance knowledge bases and content platforms.
Summary of the Invention
[0005] The purpose of this application is to propose a data analysis method, apparatus, computer device, and storage medium based on artificial intelligence, in order to solve the technical problems that existing keyword-based rule engines have low accuracy in content quality identification and that the lack of root cause analysis capabilities leads to low efficiency in the content correction process.
[0006] Firstly, an artificial intelligence-based data analysis method is provided, including: obtaining the target content data to be processed; performing distance calculation processing on the target content data based on a preset low-quality content cluster center to obtain corresponding distance data; generating a quality identification result of the target content data based on the distance data; if the quality identification result is suspected low-quality content, performing multi-path parallel retrieval processing on the target content data to obtain corresponding multi-source retrieval information; performing root cause analysis on the target content data and the multi-source retrieval information based on a preset target large model to obtain corresponding root cause analysis results; generating a target correction method corresponding to the root cause analysis results based on a preset correction strategy engine; and performing content correction processing on the target content data based on the target correction method.
[0007] Secondly, an artificial intelligence-based data analysis device is provided, including: an acquisition module, used to acquire the target content data to be processed; a calculation module, used to perform distance calculation processing on the target content data based on a preset low-quality content cluster center to obtain corresponding distance data; a first generation module, used to generate a quality identification result of the target content data based on the distance data; a retrieval module, used to perform multi-path parallel retrieval processing on the target content data to obtain corresponding multi-source retrieval information if the quality identification result is suspected low-quality content; an analysis module, used to perform root cause analysis on the target content data and the multi-source retrieval information based on a preset target large model to obtain corresponding root cause analysis results; a second generation module, used to generate a target correction method corresponding to the root cause analysis results based on a preset correction strategy engine; and a processing module, used to perform content correction processing on the target content data based on the target correction method.
[0008] Thirdly, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the above-described artificial intelligence-based data analysis method.
[0009] Fourthly, a computer-readable storage medium is provided, which stores a computer program that, when executed by a processor, implements the steps of the aforementioned artificial intelligence-based data analysis method.
[0010] In the scheme implemented by the aforementioned artificial intelligence-based data analysis method, device, computer equipment, and storage medium, the target content data to be processed is first acquired; then, distance calculation processing is performed on the target content data based on a preset low-quality content cluster center to obtain corresponding distance data; then, a quality identification result of the target content data is generated based on the distance data; if the quality identification result is suspected low-quality content, multi-path parallel retrieval processing is performed on the target content data to obtain corresponding multi-source retrieval information; subsequently, root cause analysis processing is performed on the target content data and the multi-source retrieval information based on a preset target large model to obtain corresponding root cause analysis results; subsequently, a target correction method corresponding to the root cause analysis results is generated based on a preset correction strategy engine; finally, content correction processing for the target content data is performed based on the target correction method. Based on the above automated processing flow, this application calculates distance data for the target content data to be processed using a low-quality content cluster center. Then, based on the distance data, it generates quality identification results for the target content data. If the quality identification result indicates suspected low-quality content, it performs multi-path parallel retrieval processing on the target content data to obtain multi-source retrieval information. Subsequently, based on a target large model, it performs automated root cause analysis on the target content data and multi-source retrieval information to obtain root cause analysis results. 
Following this, based on a correction strategy engine, it automatically generates a target correction method corresponding to the root cause analysis results, and then automatically executes content correction processing for the target content data based on the obtained target correction method. Thus, this application, by employing an automated processing flow combining distance-based low-quality content identification, multi-source root cause analysis based on a target large model, and strategy-based automated correction based on a correction strategy engine, can effectively improve the accuracy of content quality identification and achieve automated root cause analysis of low-quality content. Furthermore, it can automatically perform corresponding content correction processing based on the obtained root cause analysis, effectively improving the processing efficiency of content correction.
Attached Figure Description
[0011] To more clearly illustrate the solutions in this application, the accompanying drawings used in the description of the embodiments of this application will be briefly introduced below. Obviously, the accompanying drawings described below are some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0012] Figure 1 is an exemplary system architecture diagram to which this application can be applied; Figure 2 is a flowchart of an embodiment of the artificial intelligence-based data analysis method according to this application; Figure 3 is a schematic structural diagram of an embodiment of the artificial intelligence-based data analysis device according to this application; Figure 4 is a schematic structural diagram of an embodiment of the computer device according to this application.
Detailed Implementation
[0013] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application pertains; the terminology used herein in the specification of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "comprising" and "having," and any variations thereof, in the specification, claims, and foregoing drawings of this application, are intended to cover non-exclusive inclusion. The terms "first," "second," etc., in the specification, claims, or foregoing drawings of this application are used to distinguish different objects, not to describe a particular order.
[0014] In this document, the term "embodiment" means that a particular feature, structure, or characteristic described in connection with an embodiment may be included in at least one embodiment of this application. The appearance of this phrase in various places throughout the specification does not necessarily refer to the same embodiment, nor is it a separate or alternative embodiment mutually exclusive with other embodiments. It will be explicitly and implicitly understood by those skilled in the art that the embodiments described herein can be combined with other embodiments.
[0015] To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
[0016] As shown in Figure 1, system architecture 100 may include terminal device 101, network 102, and server 103. Terminal device 101 may be a laptop 1011, tablet 1012, or mobile phone 1013. Network 102 serves as the medium that provides a communication link between terminal device 101 and server 103. Network 102 may include various connection types, such as wired links, wireless communication links, or fiber optic cables.
[0017] Users can use terminal device 101 to interact with server 103 via network 102 to receive or send messages, etc. Various communication client applications can be installed on terminal device 101, such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, social media platform software, etc.
[0018] Terminal device 101 can be any electronic device that has a display screen and supports web browsing. In addition to a laptop 1011, tablet 1012, or mobile phone 1013, terminal device 101 can also be an e-book reader, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (MPEG-4 Part 14), a desktop computer, etc.
[0019] Server 103 can be a server that provides various services, such as a backend server that provides support for the pages displayed on terminal device 101.
[0020] It should be noted that the artificial intelligence-based data analysis method provided in this application embodiment is generally executed by a server / terminal device, and correspondingly, the artificial intelligence-based data analysis device is generally set in the server / terminal device.
[0021] It should be understood that the number of terminal devices, networks, and servers shown in Figure 1 is merely illustrative. Depending on implementation needs, any number of terminal devices, networks, and servers can be included.
[0022] With continued reference to Figure 2, a flowchart of an embodiment of the AI-based data analysis method according to this application is shown. The order of steps in the flowchart can be changed, and some steps can be omitted, depending on different needs. The AI-based data analysis method provided in this application can be applied to any scenario requiring content analysis, and therefore to products in these scenarios, such as content analysis products in the financial and insurance fields. The AI-based data analysis method includes the following steps:
Step S201: Obtain the target content data to be processed.
[0023] In this embodiment, the electronic device on which the artificial intelligence-based data analysis method runs (e.g., the server / terminal device shown in Figure 1) can acquire the target content data to be processed via a wired or wireless connection. It should be noted that the aforementioned wireless connection methods may include, but are not limited to, 3G / 4G / 5G connections, WiFi connections, Bluetooth connections, WiMAX connections, Zigbee connections, UWB (ultra-wideband) connections, and other currently known or future-developed wireless connection methods. The specific implementing entity of this application is a data analysis system, or quality management system, applied to financial enterprises, which can be simply referred to as the system. This application can be applied to content analysis scenarios in the financial and insurance fields, such as content quality management scenarios in insurance knowledge bases, content platforms, or intelligent customer service systems. The aforementioned target content data to be processed can be content newly entered into the system, such as insurance knowledge content newly entered into the system's insurance knowledge base.
[0024] Step S202: Perform distance calculation processing on the target content data based on the preset low-quality content cluster center to obtain the corresponding distance data.
[0025] In this embodiment, the specific implementation process of performing distance calculation on the target content data based on the preset low-quality content cluster center to obtain the corresponding distance data will be further described in detail in subsequent specific embodiments of this application, and will not be elaborated on here.
[0026] Step S203: Generate a quality identification result of the target content data based on the distance data.
[0027] In this embodiment, the judgment can be made based on the calculated distance data. If the vector data of the target content data is close to the center of the aforementioned low-quality content cluster, i.e., the distance value is less than a certain preset threshold (this threshold can be adjusted and optimized according to the actual situation), it is marked as "suspected low-quality content". These marked "suspected low-quality content" will provide data to be processed for subsequent root cause analysis, so as to further analyze the reasons for its low quality.
[0028] Step S204: If the quality identification result is suspected low-quality content, then perform multi-path parallel retrieval processing on the target content data to obtain the corresponding multi-source retrieval information.
[0029] In this embodiment, the specific implementation process of performing multi-path parallel retrieval processing on the target content data to obtain the corresponding multi-source retrieval information will be further described in detail in subsequent specific embodiments of this application, and will not be elaborated on here.
[0030] Step S205: Based on the preset target large model, perform root cause analysis on the target content data and the multi-source retrieval information to obtain the corresponding root cause analysis results.
[0031] In this embodiment, the specific implementation process of performing root cause analysis on the target content data and the multi-source retrieval information based on the preset target large model to obtain the corresponding root cause analysis results will be further described in detail in subsequent specific embodiments of this application, and will not be elaborated on here.
[0032] Step S206: Generate a target correction method corresponding to the root cause analysis results based on a preset correction strategy engine.
[0033] In this embodiment, the specific implementation process of generating the target correction method corresponding to the root cause analysis result based on the preset correction strategy engine will be further described in detail in subsequent specific embodiments of this application, and will not be elaborated on here.
[0034] Step S207: Perform content correction processing on the target content data based on the target correction method.
[0035] In this embodiment, the content correction process corresponding to the target content data can be automatically executed according to the correction implementation steps of the target correction method corresponding to the root cause analysis results generated by the correction strategy engine.
[0036] This application first acquires the target content data to be processed; then, based on a preset low-quality content cluster center, it performs distance calculation processing on the target content data to obtain corresponding distance data; then, based on the distance data, it generates a quality identification result for the target content data; if the quality identification result is suspected low-quality content, it performs multi-path parallel retrieval processing on the target content data to obtain corresponding multi-source retrieval information; then, based on a preset target large model, it performs root cause analysis processing on the target content data and the multi-source retrieval information to obtain corresponding root cause analysis results; subsequently, based on a preset correction strategy engine, it generates a target correction method corresponding to the root cause analysis results; finally, it performs content correction processing on the target content data based on the target correction method. Based on the above automated processing flow, this application calculates distance data for the target content data to be processed using a low-quality content cluster center. Then, based on the distance data, it generates quality identification results for the target content data. If the quality identification result indicates suspected low-quality content, it performs multi-path parallel retrieval processing on the target content data to obtain multi-source retrieval information. Subsequently, based on a target large model, it performs automated root cause analysis on the target content data and multi-source retrieval information to obtain root cause analysis results. 
Following this, based on a correction strategy engine, it automatically generates a target correction method corresponding to the root cause analysis results, and then automatically executes content correction processing for the target content data based on the obtained target correction method. Thus, this application, by employing an automated processing flow combining distance-based low-quality content identification, multi-source root cause analysis based on a target large model, and strategy-based automated correction based on a correction strategy engine, can effectively improve the accuracy of content quality identification and achieve automated root cause analysis of low-quality content. Furthermore, it can automatically perform corresponding content correction processing based on the obtained root cause analysis, effectively improving the processing efficiency of content correction.
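The flow of steps S201 to S207 can be sketched as a small orchestration function. This is only an illustrative sketch, not the implementation described in the application: every name here (`run_pipeline`, `PipelineResult`, and the callables passed in) is hypothetical, and each callable merely stands in for the corresponding component (distance calculation, retrieval paths, target large model, correction strategy engine).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class PipelineResult:
    flagged: bool               # True if marked "suspected low-quality content"
    distance: float             # distance data from step S202
    root_cause: Optional[str]   # root cause analysis result from step S205
    corrected: Optional[str]    # corrected content from step S207


def run_pipeline(
    content: str,                                          # S201: target content data
    distance_fn: Callable[[str], float],                   # S202 (hypothetical)
    threshold: float,                                      # S203: preset threshold
    retrievers: Sequence[Callable[[str], str]],            # S204 (hypothetical)
    analyze_fn: Callable[[str, List[str]], str],           # S205 (hypothetical)
    strategy_fn: Callable[[str], Callable[[str], str]],    # S206 (hypothetical)
) -> PipelineResult:
    distance = distance_fn(content)
    # S203: content far from the low-quality cluster center is not flagged.
    if distance >= threshold:
        return PipelineResult(False, distance, None, None)
    # S204: retrieval paths are called one after another here for brevity;
    # the application describes running them in parallel.
    evidence = [retrieve(content) for retrieve in retrievers]
    root_cause = analyze_fn(content, evidence)             # S205
    correct = strategy_fn(root_cause)                      # S206
    return PipelineResult(True, distance, root_cause, correct(content))  # S207
```

Passing the components in as callables keeps the sketch independent of any particular embedding model, retrieval backend, or large model.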
[0037] In some alternative implementations, step S202 includes the following steps: The target content data is transformed based on a preset retrieval model to obtain corresponding vector data.
[0038] In this embodiment, the selection of the above-mentioned retrieval model is not specifically limited and can be determined according to actual business needs. Preferably, the BGE dense vector retrieval model can be used, as this model has powerful text vector representation capabilities and can map text to a high-dimensional vector space. The model file is obtained from official channels and loaded into the computing environment according to the corresponding model loading specifications to ensure that the model can run normally.
[0039] Furthermore, by using a retrieval model to extract and encode features from the target content data, the target content data is converted into a corresponding dense vector of fixed dimensions, i.e., the corresponding vector data, which contains the semantic information of the target content data.
[0040] Obtain the low-quality content cluster center corresponding to the retrieval model.
[0041] In this embodiment, the generation process of the aforementioned low-quality content cluster center includes base model transformation and contrastive learning training. The specific implementation process of base model transformation includes: 1) Data preparation: collecting insurance knowledge content, covering various texts such as insurance product introductions and insurance clause interpretations. These text data come from a wide range of sources, possibly including product description documents from insurance company official websites, frequently asked questions in customer service systems, and user posts on insurance forums. The collected text data is initially cleaned to remove irrelevant characters, special symbols, duplicate content, etc., to ensure the standardization and consistency of the obtained insurance knowledge text. 2) Model selection and loading: the BGE dense vector retrieval model is selected as the base model (i.e., the retrieval model). 3) Text vectorization transformation: the cleaned insurance knowledge texts are input into the BGE model one by one. The model extracts and encodes features for each text, transforming it into a dense vector of fixed dimensions. This vector contains the semantic information of the text and represents the position of the text in the vector space. For example, a text introducing a critical illness insurance product is transformed by the BGE model into a vector that reflects the semantic features of that text.
[0042] Base model transformation is a fundamental step in low-quality content identification. Its core purpose is to convert natural language text data into vector form that computers can understand and process. In this way, computers can more easily perform various mathematical operations and similarity comparisons on the text, providing the necessary data foundation for subsequent contrastive learning training and recognition mechanisms. Only by converting text into vectors can concepts such as distance in vector space be used to measure the similarity and differences between texts, thereby achieving effective identification of low-quality content.
[0043] The specific implementation process of contrastive learning training includes: 1) Determining positive and negative samples. Positive sample collection: selecting, from user behavior data, content that users clicked and interacted with positively. For example, by analyzing the user rating system for insurance knowledge content, content with high ratings (e.g., 4 points and above) is marked as a positive sample; at the same time, considering the time users spend on the content page, content with a dwell time exceeding a certain threshold (e.g., 3 minutes) is also included in the positive sample category. These behaviors indicate that users recognize the quality of this content and consider it valuable. Negative sample collection: collecting content that was retrieved but not clicked, or that received negative feedback. For example, content displayed in search results but not clicked by users may indicate that this content is not attractive to users; in addition, content that users gave low ratings (e.g., 2 points and below) or explicitly expressed dissatisfaction with in their feedback is also used as a negative sample. This content is of relatively low quality and may have problems such as inaccurate information or unclear expression. 2) Model optimization. Loss function selection: the InfoNCE contrastive loss function is used to optimize the specified model (a vector space model serving low-quality content identification, which can be simply referred to as the model). The core idea of the InfoNCE loss function is to learn the feature representation of data by maximizing the similarity between positive samples and minimizing the similarity between positive and negative samples. In contrastive learning training, it guides the model to distinguish between high-quality and low-quality content, enabling the separation of high-quality and low-quality content clusters in the vector space. 
The training process involves inputting positive and negative sample data into the model for training. In each training round, the model calculates the similarity between positive and negative samples and calculates the loss value according to the InfoNCE loss function. Then, the model parameters are adjusted through backpropagation to gradually reduce the loss value. As training progresses, the model is continuously optimized, causing vectors of high-quality content to cluster together in the vector space to form high-quality content clusters, and vectors of low-quality content to cluster together to form low-quality content clusters, so that the model learns the features that separate the two.
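The positive and negative sample rules described above can be illustrated with a short labeling function. This is a sketch under stated assumptions: the `Interaction` record, its field names, the 1 to 5 rating scale, and the choice to let negative signals take precedence over positive ones are all hypothetical; only the example thresholds (rating of 4 and above, rating of 2 and below, dwell time over 3 minutes) come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Interaction:
    """One user's interaction with one content item (hypothetical schema)."""
    retrieved: bool                  # content was shown in search results
    clicked: bool                    # user clicked the content
    rating: Optional[int] = None     # assumed 1-5 scale; None if not rated
    dwell_seconds: float = 0.0       # time spent on the content page
    negative_feedback: bool = False  # user explicitly expressed dissatisfaction


def label_sample(
    x: Interaction,
    min_pos_rating: int = 4,           # "4 points and above"
    max_neg_rating: int = 2,           # "2 points and below"
    min_dwell_seconds: float = 180.0,  # "3 minutes"
) -> Optional[str]:
    """Return "positive", "negative", or None when no rule applies."""
    # Assumption: negative signals are checked first.
    if x.negative_feedback:
        return "negative"
    if x.rating is not None and x.rating <= max_neg_rating:
        return "negative"
    if x.retrieved and not x.clicked:
        return "negative"
    if x.rating is not None and x.rating >= min_pos_rating:
        return "positive"
    if x.clicked and x.dwell_seconds > min_dwell_seconds:
        return "positive"
    return None
```

Items for which no rule fires return `None` and would simply be left out of the training set in this sketch.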
[0044] Contrastive learning training is a crucial step in low-quality content identification. By identifying positive and negative samples and optimizing the model using a contrastive loss function, the model learns the features that distinguish between high-quality and low-quality content. In practical applications, high-quality and low-quality content often have different characteristics and manifestations. Through contrastive learning, the model can automatically uncover these features and effectively separate them in the vector space. This allows the subsequent identification mechanism to judge the quality of new content based on the distance between the vector and the centers of different content clusters, improving the accuracy and reliability of content quality identification.
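For reference, the InfoNCE loss for a single anchor can be written in a few lines of NumPy. This is a generic sketch of the loss, not the application's training code: the function name, the use of cosine similarity, the temperature value of 0.07, and the pairing of one anchor with one positive and several negatives are assumptions made for illustration.

```python
import numpy as np


def info_nce_loss(anchor, positive, negatives, temperature=0.07):
    """InfoNCE loss for one anchor vector, one positive, and K negatives.

    loss = -log( exp(s_pos / t) / (exp(s_pos / t) + sum_k exp(s_neg_k / t)) )
    where s is cosine similarity and t is the temperature.
    """
    def normalize(v):
        v = np.asarray(v, dtype=float)
        return v / np.linalg.norm(v, axis=-1, keepdims=True)

    a = normalize(anchor)              # shape (d,)
    p = normalize(positive)            # shape (d,)
    n = normalize(negatives)           # shape (K, d)

    pos_logit = np.dot(a, p) / temperature
    neg_logits = (n @ a) / temperature
    logits = np.concatenate(([pos_logit], neg_logits))

    # Numerically stable log-sum-exp over all logits.
    m = logits.max()
    log_sum_exp = m + np.log(np.exp(logits - m).sum())
    return float(log_sum_exp - pos_logit)
```

The loss is small when the anchor is much more similar to the positive than to any negative, and large in the opposite case, which is what pushes high-quality and low-quality content apart in the vector space during training.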
[0045] Furthermore, the vector distance-based identification mechanism is a direct application of low-quality content identification. By calculating the distance between the vector of new content and the centers of different content clusters, the quality of new content can be quickly determined. This method is based on a vector space model trained through contrastive learning, using the distance relationships between vectors to measure the similarity between content and high-quality and low-quality content. Compared to traditional manual judgment methods, this vector distance-based identification mechanism is more efficient and objective, quickly filtering out content that may have quality problems, providing a basis for subsequent root cause analysis and processing.
[0046] Get the preset vector distance metric.
[0047] In this embodiment, the selection of the above-mentioned vector distance metric is not specifically limited, and can be determined according to actual business needs. For example, Euclidean distance, cosine similarity, etc. can be used.
[0048] The distance between the vector data and the low-quality content cluster center is calculated based on the vector distance metric to obtain the corresponding distance calculation result.
[0049] In this embodiment, the distance between the vector data and the low-quality content cluster center can be calculated according to the selected vector distance metric, and the resulting distance calculation result can be used as the corresponding distance data. For example, when using Euclidean distance, the shorter the distance, the closer the vector is to the cluster center; when using cosine similarity, the closer the similarity is to 1, the closer the vector is to the cluster center. Note that with cosine similarity a larger value indicates closeness, so the comparison against the preset threshold runs in the opposite direction to that for Euclidean distance.
[0050] The distance calculation result is used as the distance data.
[0051] Based on the above processing flow, this application transforms the target content data into vector data using a retrieval model. Then, it calculates the distance between the vector data and the center of the low-quality content clusters based on the vector distance metric, and uses the calculated distance as the corresponding distance data. This allows for efficient and accurate distance calculation of the target content data, ensuring the accuracy of the obtained distance data. This facilitates subsequent analysis of the generated distance data to automatically identify the quality of the target content data and quickly filter out content that may have quality problems.
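The distance calculation and the threshold check of step S203 can be sketched as follows. Several points are assumptions rather than details from the application: the cluster center is taken to be the mean of known low-quality vectors, cosine similarity is converted to a distance (1 minus similarity) so that "smaller means closer" holds for both metrics, and short toy vectors stand in for the embeddings a retrieval model such as BGE would produce. All function names are hypothetical.

```python
import numpy as np


def cluster_center(vectors):
    """Assumed definition: the center is the mean of the cluster's vectors."""
    return np.asarray(vectors, dtype=float).mean(axis=0)


def euclidean_distance(v, center):
    return float(np.linalg.norm(np.asarray(v, dtype=float) - center))


def cosine_distance(v, center):
    """1 - cosine similarity, so that 0 means the same direction."""
    v = np.asarray(v, dtype=float)
    cos = np.dot(v, center) / (np.linalg.norm(v) * np.linalg.norm(center))
    return float(1.0 - cos)


def is_suspected_low_quality(v, center, threshold, metric=euclidean_distance):
    """Flag content whose vector lies within `threshold` of the center."""
    return metric(v, center) < threshold


# Toy 2-dimensional vectors in place of real embeddings.
low_quality_vectors = [[1.0, 0.0], [0.8, 0.2]]
center = cluster_center(low_quality_vectors)

near = [0.9, 0.1]    # close to the low-quality cluster center
far = [-1.0, 0.0]    # far from it
print(is_suspected_low_quality(near, center, threshold=0.5))
print(is_suspected_low_quality(far, center, threshold=0.5))
```

The threshold of 0.5 is arbitrary here; as the text notes, the preset threshold would be tuned for the actual data and metric.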
[0052] In some optional implementations of this embodiment, step S204 includes the following steps: The target content data is retrieved and processed based on a preset knowledge base to obtain the corresponding first retrieval information.
[0053] In this embodiment, the aforementioned retrieval process refers to insurance knowledge retrieval processing. The corresponding implementation process includes: first, identifying the sources of insurance knowledge to be retrieved, including the latest insurance product documents, technical white papers, and regulatory databases. These sources contain abundant authoritative information in the insurance field, providing accurate foundational data for root cause analysis. Then, based on the topic of the content marked as "suspected low-quality content", the target content data is used to query the corresponding knowledge base. For example, if the suspected low-quality content is an introduction to a critical illness insurance policy, the detailed terms and coverage information of the critical illness insurance are retrieved from relevant product documents, and the relevant regulatory requirements for critical illness insurance are retrieved from the regulatory database. Through precise topic matching, authoritative information related to the suspected low-quality content (the first retrieval information) can be quickly obtained, providing support for subsequent analysis.
[0054] The target content data is retrieved based on preset user feedback logs to obtain the corresponding second retrieval information.
[0055] In this embodiment, the aforementioned retrieval process refers to user feedback retrieval. The specific implementation process includes: first, collecting user feedback logs, which record various evaluations, questions, and complaints from users regarding insurance knowledge content. The logs are then organized and categorized for subsequent retrieval and analysis. Next, from the organized user feedback logs, historical evaluations, questions, and complaints related to similar content corresponding to the target content are retrieved to obtain corresponding second retrieval information. For example, for the aforementioned critical illness insurance introduction content, previous user feedback on the introduction content is retrieved, such as whether the description was unclear or whether there were any questions. By analyzing this historical feedback, common problems can be identified, and users' concerns and dissatisfaction with the content can be understood.
[0056] The target content data is subjected to a change history retrieval process to obtain the corresponding third retrieval information.
[0057] In this embodiment, the above-mentioned change history retrieval process includes: Record management: establishing a change history management system for content items (i.e., target content data), recording information such as the modification time, modified content, and modification personnel for each content item. Significant change judgment: retrieving the historical modification records of the content item to determine if there are any recent significant changes. For example, checking whether the critical illness insurance description has been modified recently, and whether the logic before and after the modification is consistent. If a significant change is found that may lead to logical inconsistencies, it is used as an important clue for root cause analysis.
[0058] External knowledge retrieval processing is performed on the target content data to obtain the corresponding fourth retrieval information.
[0059] In this embodiment, the aforementioned external knowledge retrieval process includes: API access: Accessing authoritative external databases or real-time news sources, such as medical databases and industry standard databases, via API interfaces. These external resources can provide the latest and most authoritative information, helping to verify the timeliness of facts. Fact verification: For relevant medical data, industry standards, and other information cited in suspected low-quality content (target content data), verification is performed through the accessed external resources. For example, checking whether the relevant medical data cited in the critical illness insurance introduction has changed over time, and whether industry standards have been updated. If outdated facts are found, this is considered an important factor in root cause analysis.
[0060] The first retrieval information, the second retrieval information, the third retrieval information, and the fourth retrieval information are integrated to obtain the corresponding integrated retrieval information.
[0061] In this embodiment, the first, second, third, and fourth retrieval information obtained above are integrated into a single set of integrated retrieval information.
[0062] The integrated retrieval information is used as the multi-source retrieval information.
[0063] Based on the above processing flow, this application, by retrieving multi-source information from different channels, can comprehensively and deeply understand the relevant circumstances of suspected low-quality content. Insurance knowledge retrieval provides authoritative knowledge in the insurance field, helping to determine the accuracy of the content; user feedback retrieval reflects users' actual feelings and needs regarding the content, helping to identify unclear expressions or issues that do not meet user expectations; change history retrieval can track the content's modification process and investigate logical problems caused by changes; external knowledge retrieval ensures the timeliness and accuracy of factual information cited in the content. Integrating this multi-source information provides rich material for subsequent structured reasoning, thereby improving the accuracy and comprehensiveness of root cause analysis.
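The four retrieval paths and their integration can be sketched as follows. This is an illustrative sketch under stated assumptions: the retriever callables are hypothetical stand-ins for the knowledge base, user feedback log, change history, and external knowledge lookups, and the use of a thread pool and exact-match deduplication are choices made for the example rather than details given in the text.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel_retrieval(content, retrievers):
    """Run every retriever concurrently and return {path_name: [snippet, ...]}."""
    with ThreadPoolExecutor(max_workers=len(retrievers)) as pool:
        futures = {name: pool.submit(fn, content) for name, fn in retrievers.items()}
        # result() blocks until that path has finished and re-raises its errors.
        return {name: future.result() for name, future in futures.items()}


def integrate_results(results):
    """Merge the per-path results, keeping the first occurrence of each snippet."""
    seen = set()
    merged = []
    for source, snippets in results.items():
        for snippet in snippets:
            if snippet not in seen:
                seen.add(snippet)
                merged.append({"source": source, "text": snippet})
    return merged
```

Each entry of the merged list keeps the name of the path it came from, so that later reasoning can cite which source supports a given hypothesis.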
[0064] In some alternative implementations, step S205 includes the following steps: Call the preset initial large model.
[0065] In this embodiment, the selection of the initial large model is not specifically limited and can be determined according to actual business needs. Preferably, the qwen3-235b large model can be used, which has powerful language understanding and generation capabilities and can handle complex insurance domain knowledge.
[0066] The initial large model is fine-tuned and optimized based on preset professional knowledge data to obtain the corresponding target large model.
[0067] In this embodiment, the specific implementation process of fine-tuning and optimizing the initial large model based on preset professional knowledge data to obtain the corresponding target large model will be further described in detail in subsequent specific embodiments of this application, and will not be elaborated on here.
[0068] Based on the target content data and the multi-source retrieval information, corresponding target prompt words are constructed.
[0069] In this embodiment, the specific implementation process of constructing corresponding target prompts based on the target content data and the multi-source retrieval information will be further described in detail in subsequent specific embodiments of this application, and will not be elaborated on here.
[0070] Based on the target large model, reasoning is performed on the target prompt words to generate the corresponding reasoning results.
[0071] In this embodiment, once the optimized target large model has been obtained, it can be applied to actual reasoning tasks. The target prompt words, constructed from the retrieved multi-source retrieval information and the target content data to be processed, are input into the target large model. Drawing on its learned insurance domain knowledge and optimized reasoning capabilities, the target large model then performs reasoning according to a predetermined reasoning flow, and the generated reasoning results are used as the corresponding root cause analysis results.
[0072] Specifically, the process of reasoning over the target prompt words based on the target large model includes: 1. Problem Restatement and Localization. Core Viewpoint Extraction: The target large model first understands the core viewpoints of the content to be diagnosed (i.e., the target content data) to clarify the scope of the problem. For example, for critical illness insurance information, it determines whether the core viewpoints concern coverage, claim conditions, or other aspects. Using natural language processing techniques, the target large model can extract key information from the text and summarize the main content. Problem Scope Definition: Based on the core viewpoints, the scope of the problem is further defined. For example, if the core viewpoint concerns coverage, the problem scope might include the types of covered diseases, the coverage amount, the coverage period, etc. Clearly defining the problem scope makes the subsequent evidence comparison and root cause analysis more accurate.
[0073] 2. Evidence Comparison. Content vs. Regulations: Compare the relevant clauses and regulations in the content to be diagnosed with the provided legal documents to identify inconsistencies in facts, logic, or expression. For example, compare the coverage scope in the critical illness insurance description with legal requirements to see whether it exceeds the scope of the regulations or fails to meet legal requirements. Content vs. User Feedback: Compare the questions and dissatisfaction mentioned in user feedback with the content description to find unclear expressions. For example, if user feedback mentions difficulty understanding a certain claim condition, and the content's description of that condition is not clear enough, then there is a problem of unclear expression. Content vs. Previous Version: For content with a history of changes, compare the logic of the content before and after the modification to determine whether there are any logical contradictions. For example, if the modified content conflicts with the previous content in terms of coverage conditions, then there is a problem of logical contradiction.
[0074] 3. Root Cause Hypothesis Generation. Possible Cause Analysis: Based on the comparison results, analyze the possible root causes leading to these inconsistencies. Causes may include outdated facts (e.g., cited medical data has been updated), obscure wording (e.g., too many unexplained technical terms), logical contradictions (e.g., conflicting coverage conditions), scenario mismatch (e.g., the content does not match the actual insurance scenario), and incomplete information (e.g., missing important claims conditions). Hypothesis Formulation: Based on the possible cause analysis, propose specific root cause hypotheses. For example, if it is found that a certain medical indicator cited in the content has been updated in an authoritative external database, and the update time is recent, then the root cause hypothesis "outdated facts affect the accuracy of the content" can be proposed.
[0075] 4. Hypothesis Testing and Scoring. Confidence Assessment: Assign a confidence score (0-1) to each root cause hypothesis. The scoring criteria can be formulated based on factors such as the sufficiency and reliability of the evidence and its relevance to the problem. For example, if a root cause hypothesis is supported by multiple authoritative pieces of evidence and is highly relevant to the problem, a higher confidence score is given; conversely, if the evidence is insufficient or only weakly relevant to the problem, a lower score is given. Reasoning: Cite specific evidence to explain why each root cause hypothesis received its confidence score. For example, for the hypothesis that "the facts are outdated," if a certain medical indicator cited in the content has been updated in an authoritative external database, and the update is recent, then the explanation can describe the update record of the external database in detail, including the update time, the updated content, etc., to support the confidence score of this hypothesis.
[0076] The reasoning result is used as the root cause analysis result.
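One way to represent the scored hypotheses produced by the four reasoning steps is sketched below. The class name, field names, and cause-type identifiers are hypothetical; only the five cause categories and the 0-1 confidence range come from the description above.

```python
from dataclasses import dataclass, field

# Identifiers for the five root cause categories named in the description.
ROOT_CAUSE_TYPES = {
    "outdated_fact",
    "obscure_wording",
    "logical_contradiction",
    "scenario_mismatch",
    "incomplete_information",
}


@dataclass
class RootCauseHypothesis:
    cause_type: str
    description: str
    confidence: float  # score in [0, 1]
    evidence: list = field(default_factory=list)

    def __post_init__(self):
        if self.cause_type not in ROOT_CAUSE_TYPES:
            raise ValueError(f"unknown cause type: {self.cause_type}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")


def rank_hypotheses(hypotheses):
    """Return the hypotheses ordered from highest to lowest confidence."""
    return sorted(hypotheses, key=lambda h: h.confidence, reverse=True)
```

Validating the confidence range at construction time means that a malformed model output is rejected before it reaches the correction stage.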
[0077] Based on the above automated processing flow, this application fine-tunes and optimizes the initial large model using professional knowledge data to obtain the corresponding target large model. Then, target prompts are constructed based on target content data and multi-source retrieval information. Furthermore, the target prompts are used to perform inference processing based on the target large model, and the generated inference results are used as the corresponding root cause analysis results. Thus, inference processing based on the target large model can improve the logic and accuracy of root cause analysis and provide a scientific basis for subsequent correction strategies.
[0078] In some optional implementations, the fine-tuning and optimization of the initial large model based on preset professional knowledge data to obtain the corresponding target large model includes the following steps: Acquire pre-collected professional knowledge data.
[0079] In this embodiment, the aforementioned professional knowledge data refers to data such as specific terms and logical relationships collected from insurance regulations, product terms, and user feedback.
[0080] Based on a preset target fine-tuning strategy, the initial large model is fine-tuned using the professional knowledge data to obtain the corresponding first large model.
[0081] In this embodiment, the aforementioned target fine-tuning strategy specifically employs the LoRA (Low-Rank Adaptation) technique to fine-tune the initial large model. The core idea of LoRA is to adapt to specific domain knowledge and tasks by introducing low-rank matrices without changing most of the model's parameters. Specifically, low-rank matrices are inserted into certain key layers of the large model. These low-rank matrices have relatively few parameters but can significantly influence the output of the large model.
[0082] The collected professional knowledge data is used as training samples for fine-tuning. By adjusting the parameters of the low-rank matrix, the initial large model is made to better understand knowledge in the insurance field, such as the provisions on claims processing time limits in insurance regulations and special agreements in product terms. This allows for fine-tuning of the initial large model and the resulting fine-tuned first large model.
[0083] Furthermore, LoRA fine-tuning aims to improve the adaptability and accuracy of the initial large model in the insurance domain. Since general-purpose large models may lack a deep understanding of insurance-specific expertise, LoRA fine-tuning allows for the infusion of insurance-specific knowledge without significantly altering the model structure, enabling the initial large model to better handle insurance-related tasks. This approach retains the powerful capabilities of the initial large model while optimizing for a specific domain, improving the performance of the fine-tuned first large model in identifying low-quality content and performing root cause analysis in the insurance field.
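The low-rank update at the heart of LoRA can be illustrated with a small numeric sketch. It follows the commonly described formulation, in which a frozen weight matrix W is augmented by a scaled product of two small matrices B and A; the helper names and the pure-Python matrix representation are assumptions made for illustration and are not taken from the application.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_effective_weight(w, b, a, alpha, rank):
    """Return W + (alpha / rank) * B @ A; W stays frozen, only B and A are trained."""
    scale = alpha / rank
    delta = matmul(b, a)
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


def lora_param_count(d_out, d_in, rank):
    """Trainable parameters of the adapter: B is d_out x rank, A is rank x d_in."""
    return rank * (d_out + d_in)
```

For a 4096 x 4096 layer with rank 8, the adapter has 65,536 trainable parameters against 16,777,216 in the full matrix, which is why most of the model's parameters can be left unchanged.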
[0084] Obtain the preset target reward function and invoke the target reinforcement learning algorithm.
[0085] In this embodiment, the definition of the above-mentioned target reward function includes: 1) Format reward: A high reward is given for the correctness of the output results conforming to the JSON format, and a penalty is given for format errors. For example, if the root cause analysis results output by the model can be accurately organized according to the JSON format, including necessary fields such as problem type, root cause description, confidence level, and cited evidence, a high format reward is given; conversely, if there are format errors, such as missing fields or disordered format, a corresponding penalty is given. 2) Content reward: A reward is given for the accuracy and depth of the root cause analysis, and the sufficiency of the cited evidence is evaluated. For example, if the root cause hypothesis proposed by the model is accurate and reasonable and can provide sufficient evidence support, a high content reward is given; if the root cause analysis is inaccurate or the cited evidence is insufficient, a lower reward is given.
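A reward function with these two components might look like the sketch below. The field names, the reward magnitudes, and in particular the content score (a match against a labelled reference problem type plus a count of cited evidence) are hypothetical proxies; the application does not specify how accuracy, depth, or sufficiency of evidence are measured.

```python
import json

REQUIRED_FIELDS = ("problem_type", "root_cause", "confidence", "evidence")


def _parse(output_text):
    """Return the parsed JSON object, or None if the text is not a JSON object."""
    try:
        obj = json.loads(output_text)
    except json.JSONDecodeError:
        return None
    return obj if isinstance(obj, dict) else None


def format_reward(output_text):
    """+1 for a JSON object with all required fields, penalties otherwise."""
    obj = _parse(output_text)
    if obj is None:
        return -1.0
    return 1.0 if all(f in obj for f in REQUIRED_FIELDS) else -0.5


def content_reward(output_text, reference_type, min_evidence=2):
    """Hypothetical proxy: half for the right problem type, half for enough evidence."""
    obj = _parse(output_text)
    if obj is None:
        return 0.0
    type_score = 1.0 if obj.get("problem_type") == reference_type else 0.0
    evidence = obj.get("evidence")
    count = len(evidence) if isinstance(evidence, list) else 0
    evidence_score = min(count, min_evidence) / min_evidence
    return 0.5 * type_score + 0.5 * evidence_score


def total_reward(output_text, reference_type):
    return format_reward(output_text) + content_reward(output_text, reference_type)
```

Keeping the two components separate makes it possible to weight them differently or to inspect which one a poorly scoring output failed.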
[0086] The aforementioned target reinforcement learning algorithm specifically employs the GRPO (Group Relative Policy Optimization) algorithm. The GRPO algorithm is applied at the output stage of the inference chain to train the large model that has been fine-tuned with LoRA. GRPO is a reinforcement learning algorithm that continuously optimizes the model's policy through interaction with the environment, here in the form of reward feedback on the outputs the model generates, thereby improving model performance.
[0087] Based on the target reward function, the first large model is trained and optimized using the target reinforcement learning algorithm to obtain the corresponding second large model.
[0088] In this embodiment, the first large model, fine-tuned by LoRA, is trained and optimized using a target reinforcement learning algorithm (GRPO algorithm) based on a defined target reward function. During training, the first large model continuously tries different output strategies, adjusting its parameters based on feedback from the target reward function, so that the model's output gradually meets the requirements. Finally, the trained second large model can output a structured JSON object containing detailed information such as problem type, root cause description, confidence level, and cited evidence.
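The application does not detail the GRPO update itself. In the commonly described form of the algorithm, several candidate outputs are sampled for the same prompt, each is scored by the reward function, and each reward is normalized against the mean and standard deviation of its group to obtain an advantage. The sketch below shows only that normalization step; the function name, the use of the population standard deviation, and the epsilon value are assumptions.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group.

    Outputs that score above the group mean get a positive advantage and are
    reinforced; outputs below the mean get a negative advantage.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

Because the baseline is the group mean, no separate value model is needed to estimate how good an output is relative to its alternatives.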
[0089] The second large model is taken as the target large model.
[0090] Based on the above processing flow, this application employs a target-based fine-tuning strategy. It uses collected professional knowledge data to fine-tune the initial large model to obtain a corresponding first large model. Then, it trains and optimizes the first large model using a combination of a target reward function and a target reinforcement learning algorithm. The optimized second large model is then used as the final target large model. This allows for efficient and accurate fine-tuning and optimization of the initial large model, improving the performance of the resulting target large model. Specifically, LoRA fine-tuning aims to enhance the adaptability and accuracy of the initial large model in the insurance domain. LoRA fine-tuning allows for the injection of insurance-specific knowledge without significantly altering the model structure, enabling the initial large model to better handle insurance-related tasks. This method retains the powerful capabilities of the initial large model while optimizing for a specific domain, improving the application performance of the fine-tuned first large model in insurance low-quality content identification and root cause analysis. Furthermore, reinforcement learning, by defining a reward function for training and optimizing the first large model, ensures that the model output conforms to structured format requirements and improves the accuracy and depth of root cause analysis. Formatting rewards ensure the standardization and consistency of output results, facilitating subsequent processing and use; content rewards, on the other hand, guide the model to conduct more accurate and in-depth root cause analysis, uncover the true core factors behind the problem, and provide more valuable and persuasive analytical conclusions, thereby improving the practicality and reliability of the target large model in real-world application scenarios and better meeting business needs and user expectations.
[0091] In some optional implementations of this embodiment, constructing corresponding target prompts based on the target content data and the multi-source retrieval information includes the following steps: The target content data and the multi-source retrieval information are integrated to obtain the corresponding integrated information.
[0092] In this embodiment, the above-mentioned information integration processing includes: integrating the retrieved multi-source search information (including insurance knowledge, user feedback, change history, external knowledge, etc.) with the target content data to be processed, removing duplicate information, and classifying and summarizing the relevant information to ensure the integrity and accuracy of the information.
[0093] Obtain the preset prompt word construction strategy.
[0094] In this embodiment, the strategy for constructing the aforementioned prompt word includes: constructing a highly structured prompt word based on the integrated information. The prompt word contains the core points of the content to be diagnosed, relevant evidence retrieved, and questions that need to be answered by the large model. For example, the prompt word could be designed as: "The following is a suspected low-quality content about a critical illness insurance product [content to be diagnosed], which also provides relevant insurance product terms [insurance knowledge evidence], user feedback [user feedback evidence], change history [change history evidence], and external medical data [external knowledge evidence]. Please analyze what problems exist in this content and provide possible root causes."
[0095] Based on the aforementioned prompt word construction strategy, the integrated information is processed to obtain the corresponding generated data.
[0096] In this embodiment, the integrated information is filled into the structure defined by the above-mentioned prompt word construction strategy, and the generated data obtained in this way is used as the corresponding target prompt word.
[0097] The generated data is used as the target prompt word.
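The prompt construction steps above can be sketched as a simple template filler. The section keys, the placeholder for empty sections, and the function name are hypothetical; the bracketed section titles and the closing question follow the example prompt given in the description.

```python
# (dictionary key, section title) pairs for the four evidence sources.
EVIDENCE_SECTIONS = (
    ("insurance_knowledge", "Insurance knowledge evidence"),
    ("user_feedback", "User feedback evidence"),
    ("change_history", "Change history evidence"),
    ("external_knowledge", "External knowledge evidence"),
)


def build_target_prompt(content, evidence):
    """Assemble a structured prompt from the content and per-source evidence lists."""
    lines = [
        "The following is suspected low-quality content, together with retrieved evidence.",
        "",
        "[Content to be diagnosed]",
        content.strip(),
        "",
    ]
    for key, title in EVIDENCE_SECTIONS:
        lines.append(f"[{title}]")
        items = evidence.get(key) or ["(none retrieved)"]
        lines.extend(f"- {item}" for item in items)
        lines.append("")
    lines.append(
        "Please analyze what problems exist in this content and provide possible root causes."
    )
    return "\n".join(lines)
```

Emitting every section even when it is empty keeps the prompt layout fixed, which may make the model's structured output easier to keep consistent.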
[0098] Based on the above processing flow, this application integrates target content data with multi-source retrieval information to obtain integrated information. Then, using a prompt word construction strategy, the integrated information is processed to construct data, and the generated data is used as the corresponding target prompt words. This allows for the efficient and accurate construction of target prompt words that can guide the target large model in chain-like reasoning, improving the generation efficiency and intelligence of target prompt words. Furthermore, the use of target prompt words helps the target large model systematically and accurately analyze the problems and root causes of suspected low-quality content.
[0099] In some optional implementations of this embodiment, step S206 includes the following steps: The root cause analysis results are evaluated and processed based on the correction strategy engine to identify the corresponding problem types.
[0100] In this embodiment, the correction strategy engine makes decisions based on the root cause analysis results (type, confidence level) and preset business rules, automatically determining how to process the knowledge base content after the root cause analysis is completed. This process is mainly divided into two branches: fully automated correction and human-machine collaborative correction. Different processing strategies are adopted for simple and complex problems respectively, aiming to efficiently and accurately update the knowledge base content and improve its quality and usability.
[0101] The root cause analysis results are evaluated using a correction strategy engine (referred to as the engine) to determine whether a problem is simple or complex. Specifically, simple problems typically have clear characteristics and fixed handling methods, such as outdated facts (e.g., outdated regulatory references, updated data metrics), typos, etc. The engine also checks the confidence level in the root cause analysis results. Only when the confidence level reaches a preset high threshold (e.g., 90% or higher) is the problem considered a high-confidence simple problem, and it then enters the fully automated correction process. For example, if the root cause analysis determines that a certain insurance regulation reference is outdated and has a confidence level of 95%, it meets the criteria for a high-confidence simple problem.
[0102] In addition, complex problems typically have the following characteristics: they require subjective judgment from professionals, and there is no single solution. For example, the wording may be obscure (requiring professionals to determine how to optimize the wording to make it easier to understand based on the target audience's knowledge level and comprehension ability), and logical restructuring may be necessary (requiring a re-examination of the content's logic to ensure its coherence and rationality). Similar to fully automated correction, the engine also examines the confidence level in the root cause analysis results. For complex problems, even if the confidence level is not particularly high (e.g., between 60% and 80%), due to the complexity and subjectivity of the processing, they will still enter a human-machine collaborative correction process.
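The routing decision described in the two paragraphs above can be sketched as follows. The type identifiers and return values are hypothetical names; the 0.9 default mirrors the "90% or higher" example threshold, and sending low-confidence simple problems to human review is an interpretation of the rule that only high-confidence simple problems are corrected automatically.

```python
# Problem types the description gives as examples of simple problems.
SIMPLE_TYPES = {"outdated_fact", "typo"}

AUTOMATIC = "fully_automated_correction"
COLLABORATIVE = "human_machine_collaborative_correction"


def choose_correction_method(problem_type, confidence, auto_threshold=0.9):
    """Route a root cause analysis result to a correction branch.

    Only simple problems whose confidence reaches the threshold are corrected
    automatically; everything else goes to human-machine collaboration.
    """
    if problem_type in SIMPLE_TYPES and confidence >= auto_threshold:
        return AUTOMATIC
    return COLLABORATIVE
```

For example, an outdated regulation reference with a confidence of 0.95 would be corrected automatically, while obscure wording with a confidence of 0.7 would be pushed to the review interface.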
[0103] Determine whether the problem type is a simple problem type.
[0104] In this embodiment, the problem type can be determined to be a simple problem type or a complex problem type by performing content analysis on the above-mentioned problem types.
[0105] If the problem type is a simple problem type, then obtain the first correction processing method corresponding to the simple problem type, and use the first correction processing method as the target correction method.
[0106] In this embodiment, the first correction method described above is a fully automatic correction method corresponding to the simple problem type. Specifically, once a simple problem is determined to have high confidence, the system automatically approves the correction suggestions given in the root cause analysis. For outdated facts, the correction suggestion is usually to update to the latest version; for typos, the correction suggestion is to replace the typos with the correct words. Then, the system directly updates the corresponding content (i.e., the target content data) in the knowledge base according to the correction suggestions. For example, updating outdated legal references in the knowledge base to the latest version includes modifying information such as the legal name, clause content, and publication date.
[0107] In addition, after updating the knowledge base content, the system records detailed information about the correction. This includes the correction time (accurate to the specific date and time), the corrected content (clearly indicating which parts were updated, such as the specific clauses cited in regulations), the content before correction (retaining the original content for future reference and comparison), and the content after correction. This recorded information is stored in a dedicated database for subsequent effectiveness evaluation. For example, by analyzing correction records over a period of time, common problem types and update frequencies in the knowledge base can be understood, providing data support for further optimizing knowledge base management and correction strategies.
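The automatic update and its correction record can be sketched as follows. The in-memory dictionary standing in for the knowledge base, the list standing in for the dedicated database, and all names are assumptions made for illustration; the recorded fields follow the four items listed above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CorrectionRecord:
    item_id: str
    corrected_at: datetime  # correction time, to the second or finer
    changed_part: str       # which part of the content was updated
    before: str             # content before correction
    after: str              # content after correction


def apply_automatic_correction(knowledge_base, item_id, old_text, new_text, audit_log):
    """Replace old_text with new_text in one content item and log the change."""
    before = knowledge_base[item_id]
    if old_text not in before:
        raise ValueError(f"{old_text!r} not found in item {item_id!r}")
    after = before.replace(old_text, new_text)
    knowledge_base[item_id] = after
    record = CorrectionRecord(item_id, datetime.now(timezone.utc), old_text, before, after)
    audit_log.append(record)
    return record
```

Refusing to write when the text to be replaced is absent avoids logging a correction that changed nothing.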
[0108] Furthermore, the automated correction process automates the handling of simple issues with high confidence levels. This mechanism's advantage lies in its ability to quickly and accurately resolve common problems in the knowledge base without manual intervention, significantly improving the efficiency of knowledge base updates. By setting a high confidence threshold, the accuracy and reliability of corrections are ensured, preventing errors in knowledge base content due to misjudgments. Simultaneously, detailed correction behavior records provide rich data for subsequent performance evaluation, contributing to the continuous optimization of knowledge base quality and management strategies.
[0109] If the problem type is a complex problem type, then obtain the second correction processing method corresponding to the complex problem type, and use the second correction processing method as the target correction method.
[0110] In this embodiment, the second correction method is a human-machine collaborative correction method corresponding to complex problem types, specifically including: 1) Information push to the review interface. When a complex problem is identified, the system will push the original content, the root cause analysis report, and the draft correction suggestions generated by the initial large model to the rich text review interface. The original content is the original text in the knowledge base that needs to be corrected. The root cause analysis report details the cause, type, and confidence level of the problem. The draft correction suggestions generated by the initial large model are preliminary correction schemes based on the root cause analysis results. For example, for obscure critical illness insurance descriptions, the original content may contain a large number of professional terms without corresponding explanations; the root cause analysis report will point out that the problem lies in the obscure expression, which may affect user understanding; the draft correction suggestions generated by the initial large model may provide simple and easy-to-understand explanations of certain professional terms, such as adding "(critical illness insurance, providing the insured with protection against critical illnesses)" after "critical illness insurance".
[0111] 2) Administrator Review and Operation. After logging into the rich text review interface, administrators can view the original content, root cause analysis report, and draft revision suggestions. Based on their professional knowledge and experience, administrators can quickly assess the rationality and feasibility of the draft revision suggestions. Administrators have three options: adopt, edit, or reject the suggestion. If the draft revision suggestion is deemed fully compliant, it can be adopted directly, and the system will update the revised content in the knowledge base. If the draft suggestion needs further improvement, the administrator can choose to edit it, modify and optimize the content, and then approve the revision. If the draft suggestion is deemed unreasonable, the administrator can choose to reject it and provide reasons for rejection and their own revision ideas. For example, when reviewing a draft revision suggestion for a critical illness insurance description that is obscure, if the administrator feels that the explanation of professional terminology is not detailed enough, they can edit it to add more detailed explanations, such as explaining "malignant tumor" as "(malignant tumor, i.e., cancer, is a disease formed by abnormal cell proliferation, with invasiveness and metastasis)," and then approve the revision.
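The three administrator options can be sketched as a small decision handler. The names and the returned dictionary layout are hypothetical, and keeping the original content in place after a rejection is an assumption; the description only states that the administrator supplies the reasons for rejection and their own revision ideas.

```python
from enum import Enum


class ReviewDecision(Enum):
    ADOPT = "adopt"
    EDIT = "edit"
    REJECT = "reject"


def resolve_review(original, draft, decision, edited_text=None, rejection_reason=None):
    """Return the text to keep in the knowledge base and the review outcome."""
    if decision is ReviewDecision.ADOPT:
        return {"content": draft, "status": "adopted"}
    if decision is ReviewDecision.EDIT:
        if not edited_text:
            raise ValueError("an edit decision requires the edited text")
        return {"content": edited_text, "status": "edited"}
    if decision is ReviewDecision.REJECT:
        if not rejection_reason:
            raise ValueError("a reject decision requires a reason")
        # Assumption: the original content stays in place until a new draft exists.
        return {"content": original, "status": "rejected", "reason": rejection_reason}
    raise ValueError(f"unsupported decision: {decision!r}")
```

Requiring a reason on rejection keeps the feedback that the description says administrators provide, which could later serve as correction feedback data.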
[0112] After obtaining the root cause analysis results corresponding to the target content data, a further prompt is designed to generate specific and actionable draft remediation recommendations using the initial large model. For example: Input: {Root Cause: "Fact is outdated", Evidence: "Content cites Regulation A v1.0, but the latest version is v2.0"} Output: Remediation Recommendation: "Update all instances of 'Regulation A v1.0' in the text to 'Regulation A v2.0' and notify the user that this is a significant change."
[0113] Furthermore, the human-machine collaborative correction process fully leverages the advantages of artificial intelligence and human expertise to address complex issues. The initial draft correction suggestions generated by the large model provide administrators with preliminary reference directions, saving them time and effort from starting from scratch. Administrators then review and optimize the draft correction suggestions using their own expertise and experience, ensuring the accuracy and rationality of the corrected results. This human-machine collaborative approach improves the efficiency of manual review while guaranteeing the quality of the knowledge base content, enabling it to better handle complex and ever-changing knowledge base correction needs.
[0114] Based on the above processing flow, this application evaluates the root cause analysis results using a correction strategy engine to identify the corresponding problem types. Then, based on the obtained problem types, it selects the corresponding correction methods and uses them as the corresponding target correction methods. This enables the use of the correction strategy engine to automatically and intelligently generate target correction methods corresponding to the root cause analysis results, ensuring the intelligence and accuracy of the target correction method generation.
[0115] In some alternative implementations, the user information obtained is subject to user consent and complies with relevant laws and policies.
[0116] Furthermore, any software tools or components not belonging to our company that appear in the embodiments of this application are merely illustrative examples and do not represent actual use.
[0117] Furthermore, the main innovative points of this application are as follows: 1. Closed-Loop Autonomous System Architecture: A fully automated closed-loop management system was created, integrating "low-quality content identification → multi-dimensional root cause analysis → intelligent correction suggestion generation → strategic automatic / collaborative correction". This achieves end-to-end automation of content quality management, significantly reducing manual intervention. The system can continuously optimize its performance using corrected feedback data, possessing the ability to "self-evolve" and fundamentally solving the problem of stagnant performance in existing systems.
[0118] 2. Multi-path parallel retrieval and information fusion mechanism: For identified low-quality content, a synchronous retrieval process is initiated, acquiring multi-source evidence from four dimensions: insurance knowledge base, user feedback logs, content change history, and external authoritative databases. This provides a comprehensive information foundation for in-depth analysis. It breaks the limitations of a single information source, ensuring that root cause analysis covers various possibilities such as outdated facts, user misunderstanding, editing errors, and changes in the external environment, providing solid data support for accurate diagnosis.
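As a minimal sketch of the multi-path parallel retrieval described above, the four retrieval paths can be submitted to a thread pool and their results merged by source. The four retriever functions, their names, and their return values are hypothetical placeholders; this application does not specify how each source is queried.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


# Hypothetical stand-ins for the four retrieval paths. Real implementations
# would query the knowledge base, feedback logs, change history, and
# external authoritative databases.
def search_knowledge_base(content: str) -> List[str]:
    return [f"kb entry related to: {content}"]


def search_feedback_logs(content: str) -> List[str]:
    return [f"user feedback mentioning: {content}"]


def search_change_history(content: str) -> List[str]:
    return [f"edit record for: {content}"]


def search_external_sources(content: str) -> List[str]:
    return [f"external reference for: {content}"]


RETRIEVERS: Dict[str, Callable[[str], List[str]]] = {
    "knowledge_base": search_knowledge_base,
    "user_feedback": search_feedback_logs,
    "change_history": search_change_history,
    "external": search_external_sources,
}


def multi_path_retrieve(content: str) -> Dict[str, List[str]]:
    """Run every retrieval path concurrently and merge results by source."""
    with ThreadPoolExecutor(max_workers=len(RETRIEVERS)) as pool:
        futures = {name: pool.submit(fn, content) for name, fn in RETRIEVERS.items()}
        # result() blocks until each path finishes; an exception raised in
        # any path propagates here.
        return {name: future.result() for name, future in futures.items()}
```

Keying the merged result by source name keeps each piece of evidence traceable to the path it came from, which may be useful when the evidence is later compared during root cause analysis.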
[0119] 3. A five-step chain-like reasoning prompt structure was designed to guide the large-scale model through a standardized and transparent thinking process of "problem localization → evidence comparison → root cause hypothesis → verification scoring." Combined with LoRA fine-tuning technology, insurance vertical domain knowledge was injected into the large-scale model, improving its accuracy in specific domain analyses. The GRPO reinforcement learning algorithm was introduced, using "format reward" and "content reward" functions to train the large-scale model to output strictly compliant, structured analysis results. The general capabilities of the large-scale model were transformed into a reliable, professional diagnostic tool, ensuring the traceability of the analysis process and the credibility of the results. The output is in a standardized JSON format, providing directly parsable input for subsequent automated corrections.
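The "format reward" and "content reward" functions are not defined in detail here. One possible reading, shown as a sketch only, is that the format reward checks whether the model output is a well-formed JSON object with the expected fields, and the content reward checks the diagnosis against a labeled reference. The field names `problem_type`, `root_cause`, and `confidence`, and the 0/1 scoring, are assumptions made for illustration.

```python
import json
from typing import Any, Dict, Optional

# Assumed output schema; the actual field names are not given in this application.
REQUIRED_KEYS = {"problem_type", "root_cause", "confidence"}


def parse_output(text: str) -> Optional[Dict[str, Any]]:
    """Return the parsed JSON object, or None if the text is not a JSON object."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, dict) else None


def format_reward(text: str) -> float:
    """1.0 if the output is a JSON object with all required keys and a
    numeric confidence in [0, 1]; otherwise 0.0."""
    data = parse_output(text)
    if data is None or not REQUIRED_KEYS.issubset(data):
        return 0.0
    confidence = data["confidence"]
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return 0.0
    return 1.0 if 0.0 <= confidence <= 1.0 else 0.0


def content_reward(text: str, reference_problem_type: str) -> float:
    """1.0 if the predicted problem type matches the labeled reference."""
    data = parse_output(text)
    if data is None:
        return 0.0
    return 1.0 if data.get("problem_type") == reference_problem_type else 0.0
```

In a GRPO-style training loop, scores of this kind would be computed for each sampled output in a group and combined into a single reward; how the two rewards are weighted is left open here.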
[0120] 4. A correction strategy engine was designed. Based on the "problem type" and "confidence score" output by the root cause analysis module, it dynamically decides the processing method: fully automatic correction is performed for simple problems with high confidence; for low-confidence or complex problems, human-machine collaborative correction is initiated, providing the administrator with a rich text interface containing the original content, root cause report, and draft correction suggestions. This achieves intelligent workflow routing, maximizing processing efficiency while ensuring security and quality, and realizing optimal resource allocation of "fully automatic for simple problems, and efficient human-machine collaboration for complex problems."
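The routing decision described above can be sketched as a small function over the problem type and confidence score. The set of "simple" problem types and the 0.8 confidence threshold below are illustrative assumptions; this application does not state concrete values.

```python
from dataclasses import dataclass

# Assumed values for illustration only.
SIMPLE_PROBLEM_TYPES = {"outdated_fact", "editing_error"}
AUTO_CONFIDENCE_THRESHOLD = 0.8


@dataclass
class RootCauseResult:
    problem_type: str
    confidence: float


def select_correction_method(result: RootCauseResult) -> str:
    """Route simple, high-confidence problems to automatic correction and
    everything else to human-machine collaborative correction."""
    is_simple = result.problem_type in SIMPLE_PROBLEM_TYPES
    is_confident = result.confidence >= AUTO_CONFIDENCE_THRESHOLD
    if is_simple and is_confident:
        return "automatic"
    return "human_machine_collaborative"
```

Defaulting to the collaborative path whenever either condition fails keeps uncertain or unfamiliar cases in front of an administrator, consistent with the stated goal of balancing efficiency against security and quality.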
[0121] It should be understood that the sequence number of each step in the above embodiments does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
[0122] It should be emphasized that, to further ensure the privacy and security of the root cause analysis results, these results can also be stored in a blockchain node.
[0123] The blockchain referred to in this application is a novel application model of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. Essentially, a blockchain is a decentralized database, a chain of data blocks linked together using cryptographic methods. Each data block contains information about a batch of network transactions, used to verify the validity of the information (anti-counterfeiting) and generate the next block. A blockchain can include an underlying blockchain platform, a platform product service layer, and an application service layer.
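To illustrate only the "chain of data blocks linked together using cryptographic methods", the following toy sketch links records by SHA-256 hashes so that tampering with a stored result can be detected. It is not a blockchain platform: there is no distribution, peer-to-peer transmission, or consensus, and the block layout is an assumption.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS_PREV_HASH = "0" * 64  # placeholder predecessor for the first block


def block_hash(index: int, payload: Dict[str, Any], prev_hash: str) -> str:
    """Hash the block contents together with the previous block's hash."""
    body = json.dumps(
        {"index": index, "payload": payload, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain: List[Dict[str, Any]], payload: Dict[str, Any]) -> None:
    """Append a new block that commits to the current tail of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    index = len(chain)
    chain.append({
        "index": index,
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": block_hash(index, payload, prev_hash),
    })


def verify_chain(chain: List[Dict[str, Any]]) -> bool:
    """Recompute every hash and check each link to its predecessor."""
    prev_hash = GENESIS_PREV_HASH
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(i, block["payload"], prev_hash):
            return False
        prev_hash = block["hash"]
    return True
```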
[0124] The embodiments of this application can acquire and process relevant data based on artificial intelligence technology. Artificial intelligence (AI) refers to the theories, methods, technologies, and application systems that use digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. Foundational technologies for artificial intelligence generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operating / interactive systems, and mechatronics. AI software technologies mainly encompass computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning / deep learning.
[0125] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing related hardware through computer-readable instructions. These computer-readable instructions can be stored in a computer-readable storage medium. When the program is executed, it can include the processes of the embodiments of the above methods. The aforementioned storage medium can be a non-volatile storage medium such as a magnetic disk, optical disk, or read-only memory (ROM), or random access memory (RAM).
[0126] It should be understood that although the steps in the flowcharts of the accompanying figures are shown sequentially as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowcharts of the accompanying figures may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily completed at the same time, but can be executed at different times, and their execution order is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the sub-steps or stages of other steps.
[0127] With further reference to Figure 3, as an implementation of the method shown in Figure 2, this application provides an embodiment of an artificial intelligence-based data analysis device. This device embodiment corresponds to the method embodiment shown in Figure 2, and the device can be specifically applied to various electronic devices.
[0128] As shown in Figure 3, the artificial intelligence-based data analysis device 300 described in this embodiment includes: an acquisition module 301, a calculation module 302, a first generation module 303, a retrieval module 304, an analysis module 305, a second generation module 306, and a processing module 307. Wherein: the acquisition module 301 is used to acquire the target content data to be processed; the calculation module 302 is used to perform distance calculation processing on the target content data based on the preset low-quality content cluster center to obtain the corresponding distance data; the first generation module 303 is used to generate a quality identification result of the target content data based on the distance data; the retrieval module 304 is used to perform multi-path parallel retrieval processing on the target content data to obtain corresponding multi-source retrieval information if the quality identification result is suspected low-quality content; the analysis module 305 is used to perform root cause analysis on the target content data and the multi-source retrieval information based on a preset target large model to obtain the corresponding root cause analysis results; the second generation module 306 is used to generate a target correction method corresponding to the root cause analysis results based on a preset correction strategy engine; and the processing module 307 is used to perform content correction processing on the target content data based on the target correction method.
[0129] In this embodiment, the operations performed by the above modules or units correspond one-to-one with the steps of the artificial intelligence-based data analysis method in the aforementioned implementation method, and will not be repeated here.
[0130] In some optional implementations of this embodiment, the calculation module 302 includes: The conversion submodule is used to convert the target content data based on a preset retrieval model to obtain the corresponding vector data. The first acquisition submodule is used to acquire the low-quality content cluster center corresponding to the retrieval model; The second acquisition submodule is used to acquire the preset vector distance metric. The calculation submodule is used to perform calculations on the vector data and the center of the low-quality content cluster based on the vector distance metric to obtain the corresponding distance calculation results. The first determining submodule is used to use the distance calculation result as the distance data.
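The distance calculation performed by these submodules, together with the quality identification that follows it, can be sketched as below. Cosine distance is used as one possible vector distance metric, the vectors are assumed to have already been produced by the retrieval model, and the 0.2 threshold is an illustrative assumption rather than a value taken from this application.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance, 1 - cos(a, b); 0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def min_distance_to_centers(
    vector: Sequence[float], centers: Sequence[Sequence[float]]
) -> float:
    """Distance from the content vector to its nearest low-quality cluster center."""
    return min(cosine_distance(vector, center) for center in centers)


def identify_quality(
    vector: Sequence[float],
    centers: Sequence[Sequence[float]],
    threshold: float = 0.2,  # assumed cutoff, not specified in the source
) -> str:
    """Flag content whose vector lies close to any low-quality cluster center."""
    distance = min_distance_to_centers(vector, centers)
    return "suspected_low_quality" if distance <= threshold else "normal"
```

A smaller distance means the content resembles known low-quality examples more closely, so only content falling at or below the threshold would proceed to multi-path retrieval and root cause analysis.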
[0131] In some optional implementations of this embodiment, the retrieval module 304 includes: a first retrieval submodule, used to retrieve and process the target content data based on a preset knowledge base to obtain the corresponding first retrieval information; a second retrieval submodule, used to retrieve and process the target content data based on a preset user feedback log to obtain the corresponding second retrieval information; a third retrieval submodule, used to perform change history retrieval processing on the target content data to obtain the corresponding third retrieval information; a fourth retrieval submodule, used to perform external knowledge retrieval processing on the target content data to obtain the corresponding fourth retrieval information; an integration submodule, used to integrate the first retrieval information, the second retrieval information, the third retrieval information, and the fourth retrieval information to obtain the corresponding integrated retrieval information; and a second determining submodule, used to use the integrated retrieval information as the multi-source retrieval information.
[0132] In some optional implementations of this embodiment, the analysis module 305 includes: a calling submodule, used to invoke the preset initial large model; a first processing submodule, used to fine-tune and optimize the initial large model based on preset professional knowledge data to obtain the corresponding target large model; a construction submodule, used to construct corresponding target prompt words based on the target content data and the multi-source retrieval information; a reasoning submodule, used to perform reasoning processing on the target prompt words based on the target large model and generate corresponding reasoning results; and a third determining submodule, used to use the reasoning result as the root cause analysis result.
[0133] In some optional implementations of this embodiment, the first processing submodule includes: a first acquisition unit, used to acquire pre-collected professional knowledge data; a fine-tuning unit, used to fine-tune the initial large model based on a preset target fine-tuning strategy, using the professional knowledge data, to obtain the corresponding first large model; a first processing unit, used to obtain the preset target reward function and to call the target reinforcement learning algorithm; a second processing unit, used to train and optimize the first large model based on the target reward function and the target reinforcement learning algorithm to obtain the corresponding second large model; and a first determining unit, used to take the second large model as the target large model.
[0134] In some optional implementations of this embodiment, the construction submodule includes: An integration unit is used to perform information integration processing on the target content data and the multi-source retrieval information to obtain corresponding integrated information; The second acquisition unit is used to acquire the preset prompt word construction strategy; A construction unit is used to perform data construction processing on the integrated information based on the prompt word construction strategy to obtain corresponding generated data; The second determining unit is used to use the generated data as the target prompt word.
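The prompt construction performed by these units can be sketched as two steps: integrating the content and the multi-source evidence into one text block, then wrapping it in a template. The template wording and section labels below are hypothetical; the reasoning stages listed are the ones named in paragraph [0119].

```python
from typing import Dict, List

# Reasoning stages as named in paragraph [0119].
REASONING_STEPS = [
    "problem localization",
    "evidence comparison",
    "root cause hypothesis",
    "verification scoring",
]


def integrate(content: str, evidence: Dict[str, List[str]]) -> str:
    """Merge the target content and per-source evidence into labeled sections."""
    lines = ["[content]", content]
    for source, items in evidence.items():
        lines.append(f"[{source}]")
        lines.extend(f"- {item}" for item in items)
    return "\n".join(lines)


def build_prompt(content: str, evidence: Dict[str, List[str]]) -> str:
    """Apply an assumed prompt template to the integrated information."""
    steps = "\n".join(f"{i}. {step}" for i, step in enumerate(REASONING_STEPS, 1))
    return (
        "Analyze the content below using the supplied evidence. "
        "Work through these stages in order:\n"
        f"{steps}\n\n"
        f"{integrate(content, evidence)}\n\n"
        "Reply with a single JSON object."
    )
```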
[0135] In some optional implementations of this embodiment, the second generation module 306 includes: an evaluation submodule, used to evaluate the root cause analysis results based on the correction strategy engine in order to identify the corresponding problem type; a judgment submodule, used to determine whether the problem type is a simple problem type; a second processing submodule, used to obtain a first correction processing method corresponding to the simple problem type if the problem type is a simple problem type, and to use the first correction processing method as the target correction method; and a third processing submodule, used to obtain a second correction processing method corresponding to the complex problem type if the problem type is a complex problem type, and to use the second correction processing method as the target correction method.
[0136] To address the aforementioned technical problems, embodiments of this application also provide a computer device. Please refer to Figure 4, which is a basic structural block diagram of the computer device in this embodiment.
[0137] The computer device 4 includes a memory 41, a processor 42, and a network interface 43 that are interconnected via a system bus. It should be noted that only the computer device 4 with components 41-43 is shown in the figure; however, it should be understood that it is not required to implement all the shown components, and more or fewer components can be implemented alternatively. Those skilled in the art will understand that the computer device described here is a device capable of automatically performing numerical calculations and / or information processing according to pre-set or stored instructions, and its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), embedded devices, etc.
[0138] The computer device can be a desktop computer, laptop, handheld computer, or cloud server, etc. The computer device can interact with the user via a keyboard, mouse, remote control, touchpad, or voice control.
[0139] The memory 41 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as the hard disk or memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, smart media card (SMC), secure digital (SD) card, flash card, etc., equipped on the computer device 4. Of course, the memory 41 may also include both the internal storage unit and its external storage device of the computer device 4. In this embodiment, the memory 41 is typically used to store the operating system and various application software installed on the computer device 4, such as computer-readable instructions for data analysis methods based on artificial intelligence. In addition, the memory 41 can also be used to temporarily store various types of data that have been output or will be output.
[0140] In some embodiments, the processor 42 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or other data processing chip. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is used to execute computer-readable instructions stored in the memory 41 or to process data, for example, to execute computer-readable instructions of the artificial intelligence-based data analysis method.
[0141] The network interface 43 may include a wireless network interface or a wired network interface, which is typically used to establish communication connections between the computer device 4 and other electronic devices.
[0142] This application also provides another embodiment, namely, providing a computer-readable storage medium storing computer-readable instructions that can be executed by at least one processor to cause the at least one processor to perform the steps of the artificial intelligence-based data analysis method described above.
[0143] Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus necessary general-purpose hardware platforms. Of course, they can also be implemented by hardware, but in many cases the former is a better implementation method. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product is stored in a storage medium (such as ROM / RAM, magnetic disk, optical disk) and includes several instructions to cause a terminal device (which may be a mobile phone, computer, server, air conditioner, or network device, etc.) to execute the methods described in the various embodiments of this application.
[0144] Obviously, the embodiments described above are only some embodiments of this application, not all embodiments. The accompanying drawings show preferred embodiments of this application, but do not limit the patent scope of this application. This application can be implemented in many different forms; these embodiments are provided so that the disclosure of this application can be understood more thoroughly and comprehensively. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing specific embodiments, or make equivalent substitutions for some of the technical features. Any equivalent structures made using the content of this application's specification and drawings, whether applied directly or indirectly in other related technical fields, likewise fall within the scope of patent protection of this application.
Claims
1. A data analysis method based on artificial intelligence, characterized in that it includes the following steps: Obtain the target content data to be processed; The target content data is processed by distance calculation based on the preset low-quality content cluster center to obtain the corresponding distance data; Based on the distance data, a quality identification result of the target content data is generated; If the quality identification result is suspected low-quality content, then the target content data is subjected to multi-path parallel retrieval processing to obtain the corresponding multi-source retrieval information; Based on a preset target large model, root cause analysis is performed on the target content data and the multi-source retrieval information to obtain the corresponding root cause analysis results; Based on a preset correction strategy engine, a target correction method corresponding to the root cause analysis results is generated; Based on the target correction method, content correction processing is performed on the target content data.
2. The data analysis method based on artificial intelligence according to claim 1, characterized in that, The step of performing distance calculation on the target content data based on a preset low-quality content cluster center to obtain the corresponding distance data specifically includes: The target content data is transformed based on a preset retrieval model to obtain corresponding vector data; Obtain the cluster center of low-quality content corresponding to the retrieval model; Obtain the preset vector distance metric; Based on the vector distance metric, the vector data and the center of the low-quality content cluster are calculated to obtain the corresponding distance calculation result. The distance calculation result is used as the distance data.
3. The data analysis method based on artificial intelligence according to claim 1, characterized in that, The step of performing multi-path parallel retrieval processing on the target content data to obtain corresponding multi-source retrieval information specifically includes: Based on a preset knowledge base, the target content data is retrieved and processed to obtain the corresponding first retrieval information; Based on the preset user feedback logs, the target content data is retrieved and processed to obtain the corresponding second retrieval information; The target content data is subjected to a change history retrieval process to obtain the corresponding third retrieval information; External knowledge retrieval processing is performed on the target content data to obtain the corresponding fourth retrieval information; The first retrieval information, the second retrieval information, the third retrieval information, and the fourth retrieval information are integrated to obtain the corresponding integrated retrieval information; The integrated retrieval information is used as the multi-source retrieval information.
4. The data analysis method based on artificial intelligence according to claim 1, characterized in that, The step of performing root cause analysis on the target content data and the multi-source retrieval information based on a preset target large model to obtain the corresponding root cause analysis results specifically includes: Call the preset initial large model; The initial large model is fine-tuned and optimized based on preset professional knowledge data to obtain the corresponding target large model; Based on the target content data and the multi-source retrieval information, corresponding target prompt words are constructed; Based on the target large model, reasoning processing is performed on the target prompt words to generate corresponding reasoning results; The reasoning result is used as the root cause analysis result.
5. The data analysis method based on artificial intelligence according to claim 4, characterized in that, The step of fine-tuning and optimizing the initial large model based on preset professional knowledge data to obtain the corresponding target large model specifically includes: Acquire pre-collected professional knowledge data; Based on the preset target fine-tuning strategy, the initial large model is fine-tuned using the professional knowledge data to obtain the corresponding first large model; Obtain the preset target reward function and invoke the target reinforcement learning algorithm; Based on the target reward function, the first large model is trained and optimized using the target reinforcement learning algorithm to obtain the corresponding second large model; The second large model is taken as the target large model.
6. The data analysis method based on artificial intelligence according to claim 4, characterized in that, The step of constructing corresponding target prompt words based on the target content data and the multi-source retrieval information specifically includes: The target content data and the multi-source retrieval information are integrated to obtain corresponding integrated information. Obtain the preset prompt word construction strategy; Based on the prompt word construction strategy, the integrated information is processed to obtain the corresponding generated data; The generated data is used as the target prompt word.
7. The data analysis method based on artificial intelligence according to claim 1, characterized in that, The step of generating a target correction method corresponding to the root cause analysis results based on a preset correction strategy engine specifically includes: The root cause analysis results are evaluated and processed based on the correction strategy engine to identify the corresponding problem types; Determine whether the problem type is a simple problem type; If the problem type is a simple problem type, then obtain the first correction processing method corresponding to the simple problem type, and use the first correction processing method as the target correction method; If the problem type is a complex problem type, then obtain the second correction processing method corresponding to the complex problem type, and use the second correction processing method as the target correction method.
8. A data analysis device based on artificial intelligence, characterized in that it includes: The acquisition module is used to acquire the target content data to be processed; The calculation module is used to perform distance calculation processing on the target content data based on the preset low-quality content cluster center to obtain the corresponding distance data; The first generation module is used to generate a quality identification result of the target content data based on the distance data; The retrieval module is used to perform multi-path parallel retrieval processing on the target content data to obtain corresponding multi-source retrieval information if the quality identification result is suspected low-quality content; The analysis module is used to perform root cause analysis on the target content data and the multi-source retrieval information based on a preset target large model, and obtain the corresponding root cause analysis results; The second generation module is used to generate a target correction method corresponding to the root cause analysis results based on a preset correction strategy engine; The processing module is used to perform content correction processing on the target content data based on the target correction method.
9. A computer device, characterized in that, It includes a memory and a processor, wherein the memory stores computer-readable instructions, and the processor executes the computer-readable instructions to implement the steps of the data analysis method based on artificial intelligence as described in any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores computer-readable instructions, which, when executed by a processor, implement the steps of the data analysis method based on artificial intelligence as described in any one of claims 1 to 7.