A method and apparatus for retrieving RAG knowledge based on a wind farm knowledge base
By using the RAG knowledge retrieval method based on the wind farm knowledge base, combined with intent parsing, vector database and graph database retrieval, a structured knowledge report is generated, which solves the problems of semantic understanding and technical terminology processing in knowledge management in wind power scenarios, and improves operation and maintenance efficiency and the accuracy of knowledge retrieval.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- 东方电气风电股份有限公司
- Filing Date
- 2026-03-19
- Publication Date
- 2026-06-19
AI Technical Summary
Traditional knowledge management methods suffer from weak semantic understanding, lack of contextual association, and poor dynamic adaptability in wind power scenarios, making it difficult to meet the needs for real-time and accurate knowledge services. Existing RAG systems also face challenges such as heterogeneous knowledge sources and difficulties in handling technical terms in wind power scenarios.
The RAG knowledge retrieval method based on wind farm knowledge base is adopted. Through intent parsing, vector database retrieval and graph database retrieval, combined with the pre-set knowledge generation model in the wind power field, a structured knowledge report is generated.
It has achieved a closed-loop transformation from natural language queries to accurate and professional answers in the wind power field, improved the accuracy, professionalism and practicality of wind farm operation and maintenance knowledge retrieval, reduced the knowledge search cost for operation and maintenance personnel, and provided standardized and implementable knowledge support for fault diagnosis and operation and maintenance decision-making.
Figure CN122240674A_ABST
Description
Technical Field
[0001] This invention relates to the field of wind power generation knowledge management technology, and more specifically, to a RAG knowledge retrieval method and device based on a wind farm knowledge base.
Background Technology
[0002] With the global energy transition and the advancement of "dual carbon" goals, the installed capacity of wind power continues to expand, with single offshore wind farms reaching hundreds of megawatts. The remote locations and complex environments of these turbines have led to a dramatic increase in the complexity of wind farm data and knowledge, encompassing multi-dimensional information resources. Traditional knowledge management methods rely on keyword searches and simple database queries, which suffer from weak semantic understanding, lack of contextual understanding, and poor dynamic adaptability. This makes it difficult to meet the needs for real-time and accurate knowledge services, thereby impacting wind farm operation and maintenance efficiency and equipment utilization.
[0003] While large language models possess powerful knowledge integration capabilities, applying them directly in industrial scenarios exposes problems such as knowledge gaps, hallucinations, and insufficient domain specialization. Retrieval-Augmented Generation (RAG) combines the generative ability of large models with the factual accuracy of external knowledge bases, providing a new approach to industrial knowledge management. However, existing RAG systems still face challenges in wind power scenarios, such as heterogeneous knowledge sources and difficulty handling specialized terminology. Therefore, this application presents a RAG knowledge retrieval method designed for large-scale wind farms, to support the construction of smart wind farms and their efficient operation and maintenance management.
Summary of the Invention
[0004] The purpose of this invention is to address the shortcomings of the prior art by providing a RAG knowledge retrieval method and device based on a wind farm knowledge base. This method achieves a closed-loop transformation from natural language queries in the wind power field to accurate and professional answers by first parsing the query intent, then performing database-specific retrieval, and finally integrating the results to generate a structured report.
[0005] To achieve the above objectives, the technical solutions adopted in the embodiments of this application are as follows:
In a first aspect, embodiments of this application provide a RAG knowledge retrieval method based on a wind farm knowledge base, including:
The input query statement is parsed to obtain query intent information, which includes: query semantic information, corresponding query entity information, and an answer type label;
Based on the query semantic information, a vector retrieval is performed on the vector database in the preset wind farm knowledge base to obtain target knowledge fragments similar to the query semantic information;
Based on the query entity information and the answer type label, a knowledge graph retrieval is performed on the graph database in the preset wind farm knowledge base to obtain the target graph path information corresponding to the query entity information;
Based on the target knowledge fragments and the target graph path information, a structured knowledge report is generated using a preset knowledge generation model in the wind power field.
[0006] In an optional implementation, the query semantic information includes: query text; the step of performing vector retrieval on a vector database in a preset wind field knowledge base based on the query semantic information to obtain target knowledge fragments similar to the query semantic information includes: The query text is semantically encoded to obtain the corresponding query semantic feature vector; Based on the query semantic feature vector, a vector retrieval is performed on the first vector database in the preset wind field knowledge base to obtain a first target knowledge fragment similar to the query semantic information.
[0007] In an optional implementation, the query semantic information further includes: query time-series data; the step of performing vector retrieval on a vector database in a preset wind field knowledge base based on the query semantic information to obtain target knowledge fragments similar to the query semantic information includes: Feature extraction is performed on the query time series data to obtain the corresponding query time series feature vector; Based on the query time-series feature vector, a vector retrieval is performed on the second vector database in the preset wind field knowledge base to obtain a second target knowledge fragment similar to the query semantic information.
[0008] In an optional implementation, before performing vector retrieval on a first vector database in a preset wind field knowledge base based on the query semantic feature vector to obtain a first target knowledge fragment similar to the query semantic information, the method further includes: Obtain unstructured knowledge text and corresponding metadata in the wind power field; The unstructured knowledge text is split into knowledge fragments to obtain multiple knowledge fragments; Semantic encoding is performed on each knowledge fragment to obtain the text vector corresponding to the knowledge fragment; Based on the metadata and the text vector corresponding to the knowledge fragment, construct the knowledge vector corresponding to the knowledge fragment; The first vector database is constructed based on the knowledge vectors corresponding to the multiple knowledge fragments.
[0009] In an optional implementation, before performing vector retrieval on a second vector database in a preset wind field knowledge base based on the query time-series feature vector to obtain a second target knowledge fragment similar to the query semantic information, the method further includes: Acquire multiple historical time-series data in the wind power sector; Feature extraction is performed on each of the historical time series data to obtain the time series feature vector corresponding to the historical time series data; A second vector database is constructed based on the time-series feature vectors corresponding to multiple historical time-series data.
[0010] In an optional implementation, the step of performing a knowledge graph retrieval on the graph database in the preset wind field knowledge base based on the query entity information and the answer type tag to obtain the target graph path information corresponding to the query entity information includes: Starting from the target entity corresponding to the query entity information in the graph database, the path traversal is performed on the node relationships corresponding to the answer type label to obtain the target graph path information.
[0011] In an optional implementation, before performing a knowledge graph retrieval on the graph database in the preset wind field knowledge base based on the query entity information and the answer type tag to obtain the target graph path information corresponding to the query entity information, the method further includes: Wind power knowledge entities are extracted from the unstructured knowledge text in the wind power field to obtain multiple knowledge entities; Based on the knowledge entities, perform syntactic analysis on the unstructured knowledge text to obtain the entity relationships between the knowledge entities and other knowledge entities; The graph database is constructed based on the multiple knowledge entities and the entity relationships between the knowledge entities and other knowledge entities.
[0012] In an optional implementation, the step of generating a structured knowledge report based on the target knowledge fragment and the target graph path information, using a preset knowledge generation model in the wind power field, includes: Based on the target graph path information, a fault tree analysis model is constructed; Based on the target knowledge fragment, a processing step flowchart and standard clause reference information are obtained; Based on the fault tree analysis model, the processing step flowchart, and the standard clause reference information, the structured knowledge report is generated using the preset knowledge generation model.
[0013] In an optional implementation, before generating a structured knowledge report using a pre-defined knowledge generation model in the wind power field based on the target knowledge fragment and the target graph path information, the method further includes: Conflict detection is performed on the target knowledge fragment and the target graph path information to obtain conflict detection results; The step of generating a structured knowledge report based on the target knowledge fragment and the target graph path information, using a pre-set knowledge generation model in the wind power field, includes: Based on the target knowledge fragment, the target graph path information, and the conflict detection results, the structured knowledge report is generated using the preset knowledge generation model.
[0014] In a second aspect, embodiments of this application also provide a RAG knowledge retrieval device based on a wind farm knowledge base, comprising: a parsing module, used to parse the input query statement to obtain query intent information, which includes: query semantic information, corresponding query entity information, and an answer type label; a retrieval module, used to perform vector retrieval on the vector database in the preset wind farm knowledge base based on the query semantic information to obtain target knowledge fragments similar to the query semantic information, and also used to perform knowledge graph retrieval on the graph database in the preset wind farm knowledge base based on the query entity information and the answer type label to obtain the target graph path information corresponding to the query entity information; and a generation module, used to generate a structured knowledge report with a preset knowledge generation model in the wind power field, based on the target knowledge fragments and the target graph path information.
[0015] In a third aspect, embodiments of this application also provide a computer device, including: a processor, a storage medium, and a bus, wherein the storage medium stores program instructions executable by the processor; when the computer device is running, the processor communicates with the storage medium via the bus and executes the program instructions to perform the steps of the RAG knowledge retrieval method based on the wind farm knowledge base as described in any implementation of the first aspect.
[0016] In a fourth aspect, embodiments of this application also provide a computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of the RAG knowledge retrieval method based on a wind farm knowledge base as described in any implementation of the first aspect.
[0017] The beneficial effects of this application are: This application provides a method and device for RAG knowledge retrieval based on a wind farm knowledge base. The method includes: parsing the input query statement to obtain query intent information, which includes query semantic information, corresponding query entity information, and answer type tags; performing vector retrieval on a vector database in a preset wind farm knowledge base based on the query semantic information to obtain target knowledge fragments similar to the query semantic information; performing knowledge graph retrieval on a graph database in the preset wind farm knowledge base based on the query entity information and answer type tags to obtain target graph path information corresponding to the query entity information; and generating a structured knowledge report based on the target knowledge fragments and target graph path information using a preset knowledge generation model in the wind power field.
[0018] The method presented in this application achieves a closed-loop transformation from natural language queries to accurate and professional answers in the wind power field by first parsing the query intent, then retrieving from separate databases, and finally fusing the results to generate a structured report. This not only solves the problem of insufficient semantic understanding in traditional keyword retrieval, but also accounts for both the semantic similarity and the entity association logic of knowledge through collaborative retrieval over a dual-database system of vector and graph databases. Furthermore, by using a wind power-specific large model to generate traceable structured reports, the method significantly improves the accuracy, professionalism, and practicality of wind farm operation and maintenance knowledge retrieval, effectively reduces the knowledge search cost for operation and maintenance personnel, and provides standardized and implementable knowledge support for fault diagnosis and operation and maintenance decision-making.
Description of the Drawings
[0019] To more clearly illustrate the technical solutions of the embodiments of the present invention, the accompanying drawings used in the embodiments will be briefly introduced below. It should be understood that the following drawings only show some embodiments of the present invention and should not be regarded as a limitation on the scope. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.
[0020] Figure 1 is the first flowchart of a RAG knowledge retrieval method based on a wind farm knowledge base provided in an embodiment of this application;
Figure 2 is the second flowchart of the RAG knowledge retrieval method based on a wind farm knowledge base provided in an embodiment of this application;
Figure 3 is the third flowchart of the RAG knowledge retrieval method based on a wind farm knowledge base provided in an embodiment of this application;
Figure 4 is the fourth flowchart of the RAG knowledge retrieval method based on a wind farm knowledge base provided in an embodiment of this application;
Figure 5 is the fifth flowchart of the RAG knowledge retrieval method based on a wind farm knowledge base provided in an embodiment of this application;
Figure 6 is the sixth flowchart of the RAG knowledge retrieval method based on a wind farm knowledge base provided in an embodiment of this application;
Figure 7 is the seventh flowchart of the RAG knowledge retrieval method based on a wind farm knowledge base provided in an embodiment of this application;
Figure 8 is the eighth flowchart of the RAG knowledge retrieval method based on a wind farm knowledge base provided in an embodiment of this application;
Figure 9 is a schematic diagram of the functional modules of a RAG knowledge retrieval device based on a wind farm knowledge base provided in an embodiment of this application;
Figure 10 is a schematic diagram of a computer device provided in an embodiment of this application.
Detailed Implementation
[0021] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are some embodiments of the present invention, but not all embodiments.
[0022] Therefore, the following detailed description of the embodiments of this application provided in the accompanying drawings is not intended to limit the scope of the claimed application, but merely to illustrate selected embodiments of the application. All other embodiments obtained by those skilled in the art based on the embodiments of this application without inventive effort are within the scope of protection of this application.
[0023] In the description of this application, it should be noted that if the terms "upper", "lower", etc. appear to indicate the orientation or positional relationship based on the orientation or positional relationship shown in the accompanying drawings, or the orientation or positional relationship that the product of this application is usually placed in, it is only for the convenience of describing this application and simplifying the description, and does not indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation, and therefore should not be construed as a limitation of this application.
[0024] Furthermore, the terms "first," "second," etc., used in the specification, claims, and accompanying drawings of this invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. Additionally, the terms "comprising" and "having," and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.
[0025] It should be noted that, where there is no conflict, the features in the embodiments of this application can be combined with each other.
[0026] Retrieval-Augmented Generation (RAG) is an artificial intelligence architecture that combines information retrieval with large model generation. Its core idea is to have the large model retrieve relevant and accurate information from an external dedicated knowledge base before generating an answer, and then generate the answer from the retrieved information rather than relying solely on the model's own pre-trained knowledge.
[0027] The RAG knowledge retrieval method based on a wind farm knowledge base provided in this application is explained in detail below with reference to the accompanying drawings and specific examples. The method can be implemented by a computer device running an algorithm or software. The computer device can be, for example, a server or a terminal, and the terminal can be a user computer. Figure 1 is the first flowchart of the RAG knowledge retrieval method based on a wind farm knowledge base provided in this application. As shown in Figure 1, the method includes:
S101. Perform intent parsing on the input query statement to obtain query intent information.
[0028] The query intent information includes: query semantic information, corresponding query entity information, and answer type labels.
[0029] In this embodiment, an intent recognition model fine-tuned for wind power is used as the core engine for intent parsing. For example, the intent recognition model is a large wind power model, WindBERT-X1 (Wind Bidirectional Encoder Representations from Transformers-X1). This model is built on the Transformer architecture and optimized through domain adaptation, and it can perform multi-granularity semantic parsing of the natural language query statements input by users.
[0030] For example, if the input query is "How to handle a high-temperature alarm in a gearbox", the intent recognition model uses a self-attention mechanism to capture the contextual dependencies among keywords such as "gearbox", "high-temperature alarm", and "handle". Combined with the built-in wind power domain rule matching engine, it identifies the core intent of the query as fault diagnosis. At the same time, it calls a preset entity recognition model for the wind power domain to label the component entity "gearbox" in the query (this query contains no model or parameter entities). It also matches the query to the answer type label "fault solution". These results are integrated to form the query intent information: the query semantic information is the fault-handling semantics of a gearbox high-temperature alarm; the query entity information is component: gearbox; and the answer type label is fault solution.
[0031] For complex queries that include time-series data associations, the intent recognition model can also simultaneously identify the secondary intents of the time-series data query, providing a basis for subsequent multi-type searches.
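The output of step S101 can be pictured as a small record with three fields. The sketch below is only an illustration under stated assumptions: the keyword tables `COMPONENT_TERMS` and `ANSWER_TYPE_RULES` and the function `parse_intent` are hypothetical stand-ins for the fine-tuned intent and entity recognition models, which the application does not specify at code level.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical keyword tables standing in for the fine-tuned intent and
# entity recognition models described in the text.
COMPONENT_TERMS = {"gearbox": "gearbox", "generator": "generator", "blade": "blade"}
ANSWER_TYPE_RULES = [
    (("how to handle", "how to fix", "troubleshoot"), "fault solution"),
    (("why", "cause"), "fault cause"),
]


@dataclass
class QueryIntent:
    semantic_text: str                                        # query semantic information
    entities: Dict[str, str] = field(default_factory=dict)    # query entity information
    answer_type: Optional[str] = None                         # answer type label


def parse_intent(query: str) -> QueryIntent:
    """Toy rule-based parser that fills the three fields of the query intent."""
    lowered = query.lower()
    entities: Dict[str, str] = {}
    for term, canonical in COMPONENT_TERMS.items():
        if term in lowered:
            entities["component"] = canonical
            break
    answer_type = None
    for cues, label in ANSWER_TYPE_RULES:
        if any(cue in lowered for cue in cues):
            answer_type = label
            break
    return QueryIntent(semantic_text=query, entities=entities, answer_type=answer_type)
```

For the example query about a gearbox high-temperature alarm, this toy parser yields the entity `component: gearbox` and the answer type label `fault solution`, which are the inputs the later graph retrieval step relies on.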
[0032] S102. Based on the query semantic information, perform vector retrieval on the vector database in the preset wind field knowledge base to obtain target knowledge fragments similar to the query semantic information.
[0033] Specifically, based on the query semantic information obtained in step S101, a retrieval operation is performed on the vector database in the preset wind farm knowledge base. Target knowledge fragments highly similar to the query semantic information are retrieved from the vector database to provide text-based knowledge support for subsequent fault diagnosis.
[0034] S103. Based on the query entity information and answer type tags, perform knowledge graph retrieval on the graph database in the preset wind field knowledge base to obtain the target graph path information corresponding to the query entity information.
[0035] Optionally, starting from the target entity corresponding to the query entity information in the graph database, a path traversal is performed on the node relationships corresponding to the answer type labels to obtain the target graph path information.
[0036] For example, based on the query entity information obtained in step S101: component: gearbox and answer type tag: fault solution, a knowledge graph retrieval is performed in the graph database of the preset wind farm knowledge base.
[0037] The top-level core concepts of this graph database are built on the IEC 61400 wind power industry standard. It stores entities and their relationships, centered on wind turbines, components, faults, and solutions. Taking the gearbox entity as the starting point, and guided by the answer type label "fault solution", the retrieval matches the three types of core relationships predefined in the graph database (contain-component, associated-fault, and corresponding-solution) and performs a path traversal of at most 3 hops.
[0038] During the traversal, the PageRank algorithm is used to rank the nodes by weight, so that frequently occurring fault nodes and solution nodes are selected first. For example, the final result is target graph path information consisting of multiple associated paths starting from the gearbox, including: gearbox -> (associated-fault) -> lubricating oil cooling system fault; lubricating oil cooling system fault -> (corresponding-solution) -> cooling pump repair; gearbox -> (associated-fault) -> heat dissipation device abnormality; and heat dissipation device abnormality -> (corresponding-solution) -> cleaning the heat sink. Together these present the entity association logic and the solution chain related to the gearbox high-temperature alarm.
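The bounded traversal in this step can be sketched as a breadth-first search that follows only the relationship types matched to the answer type label. This is a minimal illustration, not the application's implementation: the in-memory triple list `TRIPLES`, the extra `manufactured-by` relation, and the function `traverse_paths` are assumptions, and a real deployment would query a graph database instead.

```python
from collections import deque

# Toy graph as (source, relation, target) triples, based on the example
# in the text. The last triple is a hypothetical relation added only to
# show that non-matching relationship types are ignored.
TRIPLES = [
    ("gearbox", "associated-fault", "lubricating oil cooling system fault"),
    ("lubricating oil cooling system fault", "corresponding-solution", "cooling pump repair"),
    ("gearbox", "associated-fault", "heat dissipation device abnormality"),
    ("heat dissipation device abnormality", "corresponding-solution", "clean heat sink"),
    ("gearbox", "manufactured-by", "some vendor"),
]


def traverse_paths(triples, start, allowed_relations, max_hops=3):
    """Return every path of 1..max_hops edges from `start` over allowed relations."""
    adjacency = {}
    for src, rel, dst in triples:
        if rel in allowed_relations:
            adjacency.setdefault(src, []).append((rel, dst))

    paths = []
    queue = deque([(start, [])])  # (current node, list of edges walked so far)
    while queue:
        node, path = queue.popleft()
        if len(path) >= max_hops:
            continue  # hop limit reached, do not expand further
        visited = {start} | {edge[2] for edge in path}
        for rel, dst in adjacency.get(node, []):
            if dst in visited:
                continue  # avoid walking in cycles
            new_path = path + [(node, rel, dst)]
            paths.append(new_path)
            queue.append((dst, new_path))
    return paths
```

Calling `traverse_paths(TRIPLES, "gearbox", {"contain-component", "associated-fault", "corresponding-solution"})` returns the two fault edges and the two fault-to-solution chains, while the unrelated `manufactured-by` edge is skipped.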
[0039] In knowledge graph retrieval, the PageRank algorithm is used to evaluate node importance. The PageRank value PR(v_i) of node v_i is calculated iteratively as follows:
$$PR(v_i) = \frac{1 - d}{N} + d \sum_{v_j \in In(v_i)} \frac{PR(v_j)}{|Out(v_j)|}$$
[0040] Where N is the total number of nodes in the graph database, In(v_i) is the set of nodes pointing to node v_i, Out(v_j) is the set of nodes pointed to by node v_j, and d is the damping factor (usually set to 0.85). Important nodes (such as high-frequency faults) receive higher weights.
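As an illustration of the iterative calculation, the following sketch assumes the standard PageRank update, in which each node receives (1 - d)/N plus d times the sum of PR(v_j)/|Out(v_j)| over its incoming neighbours. The function name `pagerank`, the fixed iteration count, and the even redistribution of rank from nodes without outgoing edges are assumptions added for the example and are not stated in the application.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Iteratively compute PageRank for a directed graph given as (src, dst) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out_links = {node: [] for node in nodes}
    in_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
        in_links[dst].append(src)

    rank = {node: 1.0 / n for node in nodes}  # uniform starting values
    for _ in range(iterations):
        # Assumption: rank held by nodes with no outgoing edges is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {}
        for node in nodes:
            incoming = sum(rank[src] / len(out_links[src]) for src in in_links[node])
            new_rank[node] = (1 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank
```

In a graph where a fault node is pointed to by several component nodes, that fault node ends up with a larger value than nodes with few or no incoming edges, which matches the stated goal of favouring high-frequency faults.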
[0041] S104. Based on the target knowledge fragments and target graph path information, a structured knowledge report is generated using a pre-set knowledge generation model in the wind power field.
[0042] Specifically, the target knowledge fragments and target graph path information are fused, and a preset knowledge generation model in the wind power field is used to generate a structured knowledge report. For example, the target knowledge fragments related to gearbox high temperature are integrated with the target graph path information and fed into the preset knowledge generation model as input.
[0043] The preset knowledge generation model is a model fine-tuned specifically for the wind power sector, WindGPT-3.5 (Wind Generative Pre-trained Transformer-3.5). Through prompt engineering, the model is pre-loaded with role instructions for a wind farm fault diagnosis expert. It has long-context processing capabilities and can accurately parse the standard clauses, historical case features, and entity relationships in the input. The model first works out the causal logic of the fault from the target graph path information, and then combines it with the specific operating methods, industry standard requirements, and historical case experience in the target knowledge fragments. Following the standardized format of wind farm operation and maintenance reports, it generates a structured knowledge report containing fault cause analysis, a step-by-step handling solution, standard clause references, historical case references, and precautions. The report adds knowledge traceability annotations to key content, such as [Source: Chapter 8 of IEC 61400-25 standard] and [Source: 202X wind farm gearbox high temperature fault case], and also supports one-click export to PDF, directly providing users with a standardized and traceable basis for on-site operational decisions.
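One way to picture how the two retrieval results are fused before generation is as prompt assembly. The sketch below is hypothetical throughout: `REPORT_SECTIONS`, `build_generation_prompt`, and the prompt wording are illustrative assumptions, and the call to the generation model itself is deliberately left out because the application does not describe its interface.

```python
# Section names taken from the report structure described in the text.
REPORT_SECTIONS = [
    "fault cause analysis",
    "step-by-step handling solution",
    "standard clause references",
    "historical case references",
    "precautions",
]


def build_generation_prompt(query, fragments, graph_paths):
    """Assemble a single prompt string from graph paths and knowledge fragments.

    `fragments` is a list of dicts with "text" and "source" keys;
    `graph_paths` is a list of node-name lists. Both shapes are assumptions.
    """
    lines = [
        "Role: wind farm fault diagnosis expert.",
        f"Query: {query}",
        "",
        "Graph paths (causal logic):",
    ]
    lines += [f"- {' -> '.join(path)}" for path in graph_paths]
    lines += ["", "Knowledge fragments:"]
    for index, fragment in enumerate(fragments, start=1):
        # Keep the source next to each fragment so the report can cite it.
        lines.append(f"[{index}] {fragment['text']} [Source: {fragment['source']}]")
    lines += [
        "",
        "Write a structured report with these sections: " + "; ".join(REPORT_SECTIONS) + ".",
        "Annotate every key statement with its [Source: ...] tag.",
    ]
    return "\n".join(lines)
```

Carrying the source string alongside each fragment is what makes the traceability annotations possible: the model can only cite a source that was present in its input.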
[0044] In summary, this application provides a RAG knowledge retrieval method based on a wind farm knowledge base. The method includes: parsing the input query statement to obtain query intent information, which includes query semantic information, corresponding query entity information, and answer type tags; performing vector retrieval on a vector database in a preset wind farm knowledge base based on the query semantic information to obtain target knowledge fragments similar to the query semantic information; performing knowledge graph retrieval on a graph database in the preset wind farm knowledge base based on the query entity information and answer type tags to obtain target graph path information corresponding to the query entity information; and generating a structured knowledge report using a preset knowledge generation model in the wind power field based on the target knowledge fragments and target graph path information.
[0045] The method presented in this application achieves a closed-loop transformation from natural language to precise and professional answers in the wind power field by first parsing the query intent, then searching through different databases, and finally merging them to generate a structured report. This not only solves the problem of insufficient semantic understanding in traditional keyword retrieval, but also takes into account the semantic similarity and entity association logic of knowledge through the collaborative retrieval of a dual-database system of vector and graph databases. Furthermore, by combining a wind power-specific large-scale model to generate traceable structured reports, the method significantly improves the accuracy, professionalism, and practicality of wind farm operation and maintenance knowledge retrieval, effectively reduces the knowledge search cost for operation and maintenance personnel, and provides standardized and implementable knowledge support for fault diagnosis and operation and maintenance decision-making.
[0046] This application also provides another possible implementation of the RAG knowledge retrieval method based on a wind farm knowledge base, in which the query semantic information includes query text. Figure 2 is the second flowchart of the RAG knowledge retrieval method based on a wind farm knowledge base provided in this application embodiment. As shown in Figure 2, performing vector retrieval on the vector database in the preset wind farm knowledge base based on the query semantic information, to obtain target knowledge fragments similar to the query semantic information, includes:
S201. Perform semantic encoding on the query text to obtain the corresponding query semantic feature vector.
[0047] In this embodiment, for the plain-text portion of the query semantic information, such as the query text "How to handle a high temperature alarm in a gearbox", the WindBERT-Vec model, fine-tuned specifically for the wind power field, is used for semantic encoding. This model is a wind-power-optimized version of Sentence-BERT (Sentence Embeddings using Siamese BERT-Networks), which is itself an improvement on the BERT architecture. After fine-tuning on large volumes of wind power operation and maintenance manuals, fault logs, and industry standard texts, it can accurately capture the semantic features of wind power terminology.
[0048] The WindBERT-Vec model first performs preprocessing on the query text, such as word segmentation, stop word removal, and normalization of technical terms. It maps technical terms such as gearbox high temperature alarm into domain-specific word vectors. Then, it performs deep semantic representation of the text through a bidirectional encoding layer and optimizes the semantic representation effect by combining contrastive learning. Finally, it generates a query semantic feature vector with a dimension of 1024.
[0049] S202. Based on the query semantic feature vector, perform vector retrieval on the first vector database in the preset wind field knowledge base to obtain the first target knowledge fragment similar to the query semantic information.
[0050] Specifically, the first vector database is a text-based knowledge vector database built on Elasticsearch-KNN. KNN stands for K-Nearest Neighbor, a method that selects the k nearest neighbors of a given sample point. All knowledge vectors stored in the first vector database are 1024-dimensional, consistent with the dimension of the query semantic feature vector, and are all generated by the WindBERT-Vec model. Each knowledge vector also carries metadata such as document source, publication time, and knowledge type.
[0051] After the query semantic feature vector is input into the first vector database, the similarity between it and each knowledge vector in the first vector database is calculated using the L2-norm (Euclidean) distance. The specific formula is as follows:
$$d(\mathbf{q}, \mathbf{k}_i) = \lVert \mathbf{q} - \mathbf{k}_i \rVert_2 = \sqrt{\sum_{j=1}^{1024} \left(q_j - k_{i,j}\right)^2}$$
[0052] where $\mathbf{q}$ is the query semantic feature vector and $\mathbf{k}_i$ is the i-th knowledge vector in the first vector database. The vectors are sorted in ascending order of the calculated distance; a smaller distance indicates higher semantic similarity. For example, the top 20 knowledge vectors are selected first. A second filtering pass is then performed using the metadata of the knowledge vectors (such as prioritizing the latest industry standards and historical cases from similar wind farms). Finally, the first target knowledge fragment, which is highly similar to the query semantic information, is obtained. This fragment consists primarily of textual knowledge in the wind power field, such as excerpts from operation and maintenance manuals, standard clauses, and textual historical cases. There may be multiple first target knowledge fragments; their number is not limited here.
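The two-stage selection described above (distance ranking followed by metadata filtering) can be sketched as follows. This is a minimal illustration under stated assumptions, not the Elasticsearch-KNN implementation: the record keys `id`, `vector`, and `year`, the function names, and the "newest first" rule used for the second pass are all hypothetical, and 3-dimensional vectors stand in for the 1024-dimensional ones.

```python
import math

def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(query, store, k=20, final_n=5):
    """Return up to final_n fragment ids.

    Stage 1: keep the k records with the smallest distance to the query.
    Stage 2: re-rank those candidates by metadata (assumed rule: newest
    publication year first, ties broken by smaller distance).
    """
    scored = [(euclidean(query, item["vector"]), item) for item in store]
    scored.sort(key=lambda pair: pair[0])      # ascending distance
    candidates = scored[:k]                    # initial top-k selection
    candidates.sort(key=lambda pair: (-pair[1]["year"], pair[0]))
    return [item["id"] for _, item in candidates[:final_n]]

# Toy store; ids, vectors, and years are illustrative only.
store = [
    {"id": "manual-A", "vector": [1.0, 0.0, 0.0], "year": 2019},
    {"id": "standard-B", "vector": [0.9, 0.1, 0.0], "year": 2024},
    {"id": "case-C", "vector": [0.0, 1.0, 0.0], "year": 2025},
]
result = retrieve([1.0, 0.05, 0.0], store, k=2, final_n=2)
print(result)
```

In this toy run, `case-C` is the newest record but is dropped in stage 1 because it is far from the query, so the metadata pass only reorders candidates that are already semantically close.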
[0053] In the method provided in this application embodiment, the query text is semantically encoded to obtain the corresponding query semantic feature vector. The retrieval is carried out by combining the adapted first vector database and the similarity calculation method. This not only ensures the professionalism of the text semantic encoding and can accurately capture the semantic association of wind power professional terms, but also improves the retrieval efficiency and matching accuracy of text-based knowledge fragments through the standardized vector retrieval process, effectively filtering out the first target knowledge fragments that are highly similar to the query semantics.
[0054] This application also provides another possible implementation of the RAG knowledge retrieval method based on a wind farm knowledge base, in which the query semantic information further includes query time-series data. Figure 3 is the third flowchart of the RAG knowledge retrieval method based on a wind farm knowledge base provided in the embodiments of this application. As shown in Figure 3, performing vector retrieval on the vector database in the preset wind farm knowledge base based on the query semantic information, to obtain target knowledge fragments similar to the query semantic information, includes: S301. Extract features from the query time-series data to obtain the corresponding query time-series feature vector.
[0055] In this embodiment, if the query semantic information includes query time-series data, such as gearbox vibration data, temperature change curves, and power curves collected synchronously when the gearbox is under high temperature alarm, a deep generative neural network WaveNet encoder is used for feature extraction. The WaveNet encoder expands the receptive field exponentially through a dilated causal convolution structure, effectively capturing local pattern features in the query time-series data, such as abnormal peaks in the vibration data and abrupt changes in the temperature curve. Combined with a Long Short Term Memory (LSTM) layer, the long-term dependency features of the time-series data are further learned, such as the continuous change trend of temperature over time and the correlation between power and temperature.
[0056] The WaveNet encoder first normalizes and denoises the original time-series data to eliminate noise interference during data acquisition. Then, it extracts local features through the convolutional layer of the WaveNet encoder and inputs them into the LSTM layer for long-term dependency feature learning. Finally, it compresses the high-dimensional original time-series data into a 512-dimensional query time-series feature vector.
[0057] The core of the WaveNet encoder is the dilated causal convolution. The output of its l-th layer, $h^{(l)}$, is obtained by convolving the output of the previous layer, $h^{(l-1)}$, and is specifically represented as:
$$h^{(l)} = f\left(W^{(l)} *_{d} h^{(l-1)} + b^{(l)}\right)$$
[0058] where d is the dilation rate, $*_{d}$ is the convolution operation with dilation rate d, $W^{(l)}$ and $b^{(l)}$ are learnable parameters, and f is the activation function. This structure expands the receptive field exponentially and effectively captures long-term temporal dependencies.
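To make the dilated causal convolution concrete, the following is a minimal pure-Python sketch. It assumes a single channel, ReLU as the activation function f, and zero padding before the start of the sequence; the function names and the toy weights are illustrative and are not taken from the WaveNet encoder of this application.

```python
def dilated_causal_conv(x, weights, bias, dilation):
    """Single-channel dilated causal convolution followed by ReLU.

    y[t] = relu(bias + sum_k weights[k] * x[t - k * dilation]);
    samples before the start of the sequence are treated as zero, so
    y[t] never depends on future inputs (causality).
    """
    out = []
    for t in range(len(x)):
        acc = bias
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(max(0.0, acc))
    return out

def receptive_field(kernel_size, dilations):
    """Number of input samples seen by one output of the stacked layers."""
    return 1 + (kernel_size - 1) * sum(dilations)

signal = [1.0, 2.0, 3.0, 4.0, 5.0]
layer1 = dilated_causal_conv(signal, weights=[1.0, 1.0], bias=0.0, dilation=1)
layer2 = dilated_causal_conv(layer1, weights=[1.0, 1.0], bias=0.0, dilation=2)
print(layer1, layer2, receptive_field(2, [1, 2, 4, 8]))
```

With kernel size 2 and dilations 1, 2, 4, 8, four layers already cover 16 input samples, which is the exponential growth of the receptive field that the paragraph refers to.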
[0059] S302. Based on the query time-series feature vector, perform vector retrieval on the second vector database in the preset wind farm knowledge base to obtain a second target knowledge fragment similar to the query semantic information.
[0060] The second vector database is a time-series knowledge vector database built on the Facebook AI Similarity Search (Faiss) index library. It is specifically used to store time-series data feature vectors in the wind power field. All vectors are 512-dimensional and are extracted and generated by the WaveNet encoder, covering time-series knowledge such as historical vibration data, temperature change data, and power curve data of various wind turbine components.
[0061] After inputting the query time-series feature vector into the second vector database, the efficient similarity calculation capability of the Faiss index is utilized. Cosine similarity is used as the metric to quickly calculate the similarity between the query vector and all time-series feature vectors in the database. For example, the top 10 time-series feature vectors with the highest similarity are selected, along with their corresponding time-series knowledge fragments. These knowledge fragments include historical fault time-series data similar to the current query time-series data features, trends in equipment status changes corresponding to faults, and time-series data recovery patterns after maintenance and repair. These serve as the second target knowledge fragment, complementing the first target knowledge fragment and providing time-series data-based knowledge support for fault diagnosis.
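The cosine-similarity ranking can be illustrated with a small brute-force sketch. It does not use the Faiss API; it only shows the metric and the top-k selection on toy data. The dictionary keys, the 3-dimensional vectors (standing in for 512-dimensional ones), and the function names are assumptions made for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k_cosine(query, vectors, k=10):
    """Return the k (key, similarity) pairs with the highest similarity."""
    scored = [(cosine(query, vec), key) for key, vec in vectors.items()]
    scored.sort(reverse=True)                  # highest similarity first
    return [(key, score) for score, key in scored[:k]]

# Toy time-series feature vectors; names and values are illustrative only.
history = {
    "fault-2021": [1.0, 2.0, 3.0],
    "normal-2022": [3.0, 2.0, 1.0],
    "fault-2023": [2.0, 4.0, 6.5],
}
top = top_k_cosine([1.0, 2.0, 3.0], history, k=2)
print(top)
```

Because cosine similarity ignores vector length, `fault-2023` ranks close to `fault-2021` even though its magnitude is roughly twice as large; only the direction of the feature vector matters.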
[0062] In the method provided in this application embodiment, feature extraction is performed on the query time-series data to obtain the corresponding query time-series feature vector, and then retrieval is achieved through a dedicated time-series vector database. This fills the gap in the processing of time-series operation and maintenance data in traditional knowledge retrieval, realizes the collaborative retrieval of text semantics and equipment runtime time-series data, provides knowledge support at the time-series data level for wind turbine fault diagnosis, makes fault analysis more in line with the actual operating status of equipment, and improves the comprehensiveness of retrieval results.
[0063] This application also provides another possible implementation of the RAG knowledge retrieval method based on a wind farm knowledge base. Figure 4 is the fourth flowchart of the RAG knowledge retrieval method based on a wind farm knowledge base provided in the embodiments of this application. As shown in Figure 4, before performing vector retrieval on the first vector database in the preset wind farm knowledge base based on the query semantic feature vector to obtain the first target knowledge fragment similar to the query semantic information, the method further includes: S401. Obtain unstructured knowledge text and corresponding metadata in the wind power field.
[0064] In this embodiment, a multimodal document parsing engine is used to comprehensively collect various unstructured knowledge texts in the wind power field. The collection scope covers operation and maintenance manuals and technical manuals provided by wind turbine manufacturers, international and domestic wind power industry standards (such as IEC 61400 series), historical operation and maintenance reports, fault logs, accident cases of wind farms, patent texts and academic papers in the wind power field, as well as owner feedback on problems and solutions documents, etc.
[0065] For image and scanned documents, OCR technology is used to convert them into editable text. For table-type documents, professional table extraction technology is used to extract the structured content. At the same time, each piece of unstructured knowledge text collected is labeled with corresponding metadata, which includes information such as document source, publication time, knowledge type, applicable wind turbine model, and core components involved.
[0066] S402. Decompose unstructured knowledge text into knowledge fragments to obtain multiple knowledge fragments.
[0067] Specifically, the original text is first segmented into sentences and paragraphs. Then, based on the professional knowledge boundaries in the wind power field, such as core modules like fault type, handling steps, standard clauses, and case background, the initially segmented text is further refined. For example, a gearbox maintenance manual can be split into multiple independent knowledge segments, such as gearbox daily inspection knowledge segments, gearbox temperature abnormality fault handling knowledge segments, and gearbox lubricant replacement knowledge segments. The length of each knowledge segment is controlled within a reasonable range to ensure semantic integrity while facilitating subsequent encoding and retrieval.
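A simplified version of this two-level segmentation is sketched below. It splits on blank lines and sentence-ending punctuation and uses a character budget as a stand-in for the domain knowledge boundaries described above, which would need a wind power rule set or model; `max_chars` and the function name are hypothetical.

```python
import re

def split_into_fragments(text, max_chars=200):
    """Split text into fragments, first by paragraph, then by sentence.

    Paragraphs within the max_chars budget are kept whole. Longer
    paragraphs are split at sentence boundaries and sentences are packed
    greedily. A single sentence longer than max_chars is kept intact.
    """
    fragments = []
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        paragraph = " ".join(paragraph.split())    # collapse whitespace
        if not paragraph:
            continue
        if len(paragraph) <= max_chars:
            fragments.append(paragraph)
            continue
        current = ""
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            if current and len(current) + 1 + len(sentence) > max_chars:
                fragments.append(current)
                current = sentence
            else:
                current = f"{current} {sentence}".strip()
        if current:
            fragments.append(current)
    return fragments

manual = (
    "Daily inspection. Check oil level.\n\n"
    "Temperature fault handling. Stop the turbine. Inspect the cooler."
)
fragments = split_into_fragments(manual, max_chars=40)
print(fragments)
```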
[0068] S403. Semantically encode each knowledge segment to obtain the text vector corresponding to the knowledge segment.
[0069] Specifically, the same WindBERT-Vec model that encodes the query text is used to perform unified semantic encoding on each knowledge fragment in the wind power field, so that queries and knowledge fragments are represented in the same vector space.
[0070] The WindBERT-Vec model first performs preprocessing on each knowledge fragment, including wind power terminology normalization, stop word removal, and part-of-speech tagging, mapping the professional terms in the fragment to domain-specific word vectors. Then, it uses a bidirectional encoding layer to perform deep semantic representation of the knowledge fragments. Combined with a contrastive learning strategy, it optimizes the model's semantic representation capabilities using text data from the wind power domain as training samples, ensuring that the generated text vectors accurately reflect the core semantics of the knowledge fragments and their professional relevance to the wind power field. Finally, it generates a 1024-dimensional text vector for each knowledge fragment.
[0071] S404. Construct the knowledge vector corresponding to the knowledge fragment based on the metadata and the text vector corresponding to the knowledge fragment.
[0072] Specifically, metadata and text vectors are fused to construct composite knowledge vectors corresponding to knowledge fragments. First, the metadata is digitized, transforming it into a fixed-dimensional metadata vector. Then, the 1024-dimensional text vector and metadata vector are concatenated and deeply fused through a feature fusion layer to eliminate feature redundancy. This ultimately generates a composite knowledge vector containing both semantic and metadata features. This knowledge vector retains 1024 dimensions, preserving the core semantic information of the knowledge fragment while incorporating attribute information such as document source and applicable scenarios, enabling subsequent retrieval to be precisely filtered based on metadata.
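One way to picture the fusion step is concatenation followed by a linear projection back to the text-vector dimension. The sketch below assumes exactly that; the application does not specify the structure of the feature fusion layer, so the one-hot metadata encoding, the category list, and the fixed placeholder weights (which would be learned in practice) are all assumptions, and a 4-dimensional vector stands in for the 1024-dimensional one.

```python
KNOWLEDGE_TYPES = ["manual", "standard", "case"]   # hypothetical category list

def encode_metadata(knowledge_type):
    """One-hot encode a metadata field into a fixed-dimensional vector."""
    return [1.0 if t == knowledge_type else 0.0 for t in KNOWLEDGE_TYPES]

def fuse(text_vec, meta_vec, weight):
    """Concatenate the two vectors and apply a linear projection.

    weight has one row per output dimension and
    len(text_vec) + len(meta_vec) columns.
    """
    joined = text_vec + meta_vec
    return [sum(w * x for w, x in zip(row, joined)) for row in weight]

def demo_weight(text_dim, meta_dim, meta_scale=0.1):
    """Placeholder for learned weights: identity on the text part plus a
    small contribution from one metadata dimension per output row."""
    rows = []
    for i in range(text_dim):
        row = [1.0 if j == i else 0.0 for j in range(text_dim)]
        row += [meta_scale if j == i % meta_dim else 0.0 for j in range(meta_dim)]
        rows.append(row)
    return rows

text_vec = [0.2, -0.5, 0.7, 0.1]
meta_vec = encode_metadata("standard")
fused = fuse(text_vec, meta_vec, demo_weight(len(text_vec), len(meta_vec)))
print(fused)
```

The output has the same length as the text vector, which mirrors the statement that the composite knowledge vector retains the original dimension.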
[0073] S405. Construct a first vector database based on the knowledge vectors corresponding to multiple knowledge fragments.
[0074] Specifically, the first vector database, dedicated to the wind power field, is built on the Elasticsearch-KNN vector retrieval engine. All composite knowledge vectors corresponding to the knowledge fragments are imported into Elasticsearch-KNN in batches. A unique index identifier is created for each knowledge vector and associated with the corresponding knowledge fragment, so that the original knowledge fragment can be located quickly once its knowledge vector has been retrieved. At the same time, the retrieval rules of the database are configured: L2 Euclidean distance is set as the default similarity metric, the index structure is optimized to improve retrieval efficiency, and Approximate Nearest Neighbor (ANN) search is supported, so that the vectors most similar to the query vector can be found quickly among a massive number of knowledge vectors.
[0075] In addition, an incremental update mechanism is configured for the first vector database. When new wind power text-based knowledge is added, knowledge vectors can be generated through the same process and imported into the database in batches, thereby realizing dynamic expansion of the database and ensuring the timeliness of knowledge.
[0076] The method provided in this application involves the collection and refined segmentation of multi-source unstructured knowledge text, professional semantic encoding, and metadata fusion to construct composite knowledge vectors. Ultimately, it builds an adapted vector database, achieving systematic and digital storage of dispersed and heterogeneous text-based knowledge related to wind power. This not only solves the problem of fragmented wind power knowledge but also allows for precise filtering of retrieval based on knowledge attributes through the fusion of metadata and semantic vectors. Meanwhile, the incremental update mechanism ensures the timeliness of the knowledge base, laying a high-quality and structured knowledge foundation for text vector retrieval and improving the stability and scalability of subsequent retrieval.
[0077] This application also provides another possible implementation of the RAG knowledge retrieval method based on a wind farm knowledge base. Figure 5 is the fifth flowchart of the RAG knowledge retrieval method based on a wind farm knowledge base provided in this application embodiment. As shown in Figure 5, before performing vector retrieval on the second vector database in the preset wind farm knowledge base based on the query time-series feature vector to obtain a second target knowledge fragment similar to the query semantic information, the method further includes: S501. Obtain multiple pieces of historical time-series data in the wind power field.
[0078] In this embodiment, the wind power field data acquisition system is used to acquire various historical time-series data in the wind power field in batches. The acquisition scope covers the operating time-series data of each core component of the wind turbine (wind rotor, nacelle, drive train, gearbox, generator, etc.), including vibration data, temperature data, speed data, power curve data, and voltage and current data. It also covers the time-series data recorded when various wind turbine faults occur, during fault handling, and after recovery from the fault.
[0079] During data collection, the integrity and accuracy of the time-series data are ensured. The collection time, wind turbine number, component name, operating condition, and other attribute information corresponding to each piece of time-series data are recorded. Missing or abnormal values are filled in, and noise is removed in preprocessing. At the same time, the historical time-series data are classified and organized by wind turbine model, component type, and data type to provide a standardized data source for subsequent feature extraction.
[0080] S502. Extract features from each historical time series data to obtain the time series feature vector corresponding to the historical time series data.
[0081] Specifically, the same WaveNet encoder used for feature extraction from the query time-series data is used to perform unified feature extraction on the various types of historical wind power time-series data. For historical time-series data of different types and dimensions, normalization is first performed to map the data to the same numerical range and eliminate differences in units and scale. The preprocessed time-series data is then input into the WaveNet encoder, which captures local features through dilated causal convolutional layers, such as the frequency characteristics of vibration data and abrupt changes in temperature data. The result is then input into an LSTM layer to learn the long-term dependency features of the time-series data, such as the temperature trend over time and the correlation between power and vibration. Finally, a fully connected layer compresses the high-dimensional feature representation into a 512-dimensional time-series feature vector.
[0082] S503. Construct a second vector database based on the time series feature vectors corresponding to multiple historical time series data.
[0083] Specifically, based on the Faiss high-performance index library, a second vector database dedicated to the wind power field is constructed to store feature vectors of wind power time-series data. All 512-dimensional time-series feature vectors corresponding to historical time-series data are imported into the Faiss index library in batches. A unique index identifier is created for each time-series feature vector, and it is associated and mapped with the corresponding original historical time-series data and data attribute information.
[0084] The retrieval rules for the second vector database are configured, using cosine similarity as the default similarity metric. The index structure is optimized to support millisecond-level approximate nearest neighbor search, meeting the retrieval speed requirements for real-time wind farm fault diagnosis. Simultaneously, a real-time incremental update mechanism is configured for the database, allowing newly collected time-series data to be extracted and imported into the second vector database in real time, ensuring the real-time nature of time-series knowledge.
[0085] The method provided in this application collects the various types of historical time-series data of wind farms in a standardized manner, extracts core feature vectors, and builds a dedicated time-series vector database. This enables feature-based, efficient storage of the operating time-series data of wind power equipment and ensures millisecond-level response for time-series data retrieval, meeting the needs of real-time fault diagnosis in wind farms. It can also accurately match historical data whose features are similar to those of the queried time-series data, providing time-series knowledge support for fault tracing and status analysis and enriching the knowledge types of the knowledge base.
[0086] This application also provides another possible implementation of the RAG knowledge retrieval method based on a wind farm knowledge base. Figure 6 is the sixth flowchart of the RAG knowledge retrieval method based on a wind farm knowledge base provided in this application embodiment. As shown in Figure 6, before performing knowledge graph retrieval on the graph database in the preset wind farm knowledge base based on the query entity information and answer type tags to obtain the target graph path information corresponding to the query entity information, the method further includes: S601. Extract wind power knowledge entities from unstructured knowledge text in the wind power field to obtain multiple knowledge entities.
[0087] In this embodiment, a BERT-BiLSTM-CRF fusion model is used to accurately extract knowledge entities from unstructured knowledge text in the wind power field. The model uses a Bidirectional Encoder Representations from Transformers (BERT) as the basic feature extraction layer, inputting fine-tuned word vectors from the wind power field to generate deep semantic features for each character in the unstructured knowledge text. The output of BERT is then input into a Bidirectional Long Short-Term Memory (BiLSTM) layer to capture the contextual semantic relationships of the text and further optimize the feature representation. Finally, a Conditional Random Field (CRF) layer is used to jointly decode the feature sequence, solving for the optimal entity label sequence to accurately identify and extract wind power knowledge entities from the text.
[0088] In entity recognition, the Conditional Random Field (CRF) layer jointly decodes the feature sequence h = (h1, h2, ..., hT) extracted by BERT-BiLSTM to predict the optimal label sequence y = (y1, y2, ..., yT), whose conditional probability is defined as:
$$P(y \mid h) = \frac{1}{Z(h)} \exp\left(\sum_{t=1}^{T} \sum_{k} \lambda_k f_k(y_{t-1}, y_t, h, t)\right)$$
[0089] where $f_k$ is a feature function, $\lambda_k$ is its weight, and $Z(h)$ is the normalization factor. The label sequence with the highest probability is found with the Viterbi algorithm.
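Viterbi decoding over a linear-chain CRF can be sketched as follows. The sketch assumes that the weighted feature functions have already been collapsed into per-position emission scores and label-to-label transition scores (a common simplification); the BIO label set, the tokens, and all scores are invented for illustration.

```python
def viterbi(emissions, transitions, labels):
    """Return the highest-scoring label sequence.

    emissions[t][label] is the score of a label at position t;
    transitions[(prev, cur)] is the score of moving from prev to cur.
    Scores are additive (log-space), so the best path maximizes the sum.
    """
    best = [{lab: emissions[0][lab] for lab in labels}]
    back = []
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transitions[(p, cur)])
            scores[cur] = best[-1][prev] + transitions[(prev, cur)] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Backtrack from the best final label.
    path = [max(labels, key=lambda lab: best[-1][lab])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path))

labels = ["O", "B", "I"]
transitions = {(p, c): 0.0 for p in labels for c in labels}
transitions[("O", "I")] = -10.0    # an entity cannot continue from outside
transitions[("B", "I")] = 1.0      # begin -> inside is encouraged
emissions = [                      # toy scores for tokens: "the", "gear", "box"
    {"O": 2.0, "B": 0.0, "I": 0.0},
    {"O": 0.0, "B": 2.0, "I": 1.0},
    {"O": 1.0, "B": 0.5, "I": 1.0},
]
decoded = viterbi(emissions, transitions, labels)
print(decoded)
```

The third token is ambiguous on emission scores alone (O and I tie), and the transition score from B to I is what resolves it, which is the benefit of joint decoding over labelling each position independently.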
[0090] The extracted knowledge entities cover the core concepts of the wind power field: equipment entities such as wind turbines, gearboxes, generators, and heat dissipation devices; fault entities such as high temperature alarms, abnormal vibrations, and power curve deviations; and standard entities such as IEC 61400-25 and GB/T 1907. The result is multiple wind power knowledge entities.
[0091] S602. Perform syntactic analysis on unstructured knowledge text based on knowledge entities to obtain entity relationships between knowledge entities and other knowledge entities.
[0092] Specifically, based on the knowledge entities obtained, a method combining dependency parsing and rule matching in the wind power field is used to perform syntactic analysis on unstructured knowledge texts and extract the relationships between knowledge entities.
[0093] First, dependency parsing is used to identify the syntactic relationships between words in the text, such as subject-predicate, verb-object, modifier-head, and causal relationships. Then, using the predefined entity relationship system for the wind power field (for example "located in", "contains", "causes", "has parameter", "has solution", and "follows standard"), the syntactic relationships are mapped to domain-specific entity relationships between knowledge entities. For example, from the text "A failure in the gearbox's lubricating oil cooling system will cause a high-temperature alarm in the gearbox," syntactic analysis identifies the "contains" relationship between the gearbox and the lubricating oil cooling system, and the "causes" relationship between the lubricating oil cooling system failure and the gearbox high-temperature alarm. From the text "The processing of the high-temperature alarm in the gearbox follows the IEC 61400-25 standard," the "follows standard" relationship between the gearbox high-temperature alarm and the IEC 61400-25 standard is extracted. The result is multiple entity-relationship-entity triples that fully represent the association logic between knowledge entities.
[0094] S603. Construct a graph database based on multiple knowledge entities and the entity relationships between knowledge entities and other knowledge entities.
[0095] Specifically, based on the Neo4j graph database engine, a knowledge graph database dedicated to the wind power field is constructed, with knowledge entities as nodes and entity relationships as edges between nodes, to achieve structured storage of knowledge in the wind power field.
[0096] First, the various entity nodes are created in the Neo4j graph database, and attribute information is added to each node. For example, attributes such as model, applicable wind turbine, and manufacturer are added to the gearbox node, and attributes such as fault level and occurrence frequency are added to the fault node. Then, relation edges are created between the corresponding entity nodes based on the entity-relationship-entity triples.
[0097] Meanwhile, a top-level ontology framework for the graph database is constructed based on the IEC 61400 wind power industry standard, classifying and managing entity nodes and relation edges to ensure a clear hierarchical structure and rigorous logic. Furthermore, a query language retrieval interface is configured for the graph database, supporting various retrieval methods such as multi-hop reasoning, node traversal, and path query. An incremental update mechanism is also implemented, allowing newly extracted entities and relations to be integrated into the graph database in real time, enabling dynamic expansion of the knowledge graph.
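The node-and-edge structure and the multi-hop path query can be illustrated without a graph database. The sketch below is an in-memory stand-in, not Neo4j or its query language: the relation names are hypothetical labels, and the triples are adapted from the examples in this description.

```python
from collections import defaultdict, deque

# Entity-relationship-entity triples adapted from the examples in the text.
TRIPLES = [
    ("gearbox", "contains", "lubricating oil cooling system"),
    ("coolant pump damage", "causes", "lubricating oil cooling system failure"),
    ("lubricating oil cooling system failure", "causes", "gearbox high temperature alarm"),
    ("gearbox high temperature alarm", "follows_standard", "IEC 61400-25"),
]

def build_graph(triples):
    """Store triples as an adjacency list: head -> [(relation, tail), ...]."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
    return graph

def multi_hop_paths(graph, start, max_hops=3):
    """Breadth-first enumeration of all paths from start, up to max_hops edges."""
    paths = []
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if len(path) == max_hops:
            continue
        for relation, tail in graph.get(node, []):
            if tail == start or any(tail == step[0] for step in path):
                continue                       # skip cycles
            new_path = path + [(node, relation, tail)]
            paths.append(new_path)
            queue.append((tail, new_path))
    return paths

graph = build_graph(TRIPLES)
paths = multi_hop_paths(graph, "coolant pump damage", max_hops=3)
for path in paths:
    print(" -> ".join([path[0][0]] + [f"[{rel}] {tail}" for _, rel, tail in path]))
```

Starting from a bottom-level cause, three hops are enough to reach the governing standard, which is the kind of chained association that a single text fragment rarely states in one place.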
[0098] The method provided in this application extracts wind power knowledge entities and their relationships to build a graph database with entities as nodes and relationships as edges. This achieves structured and visualized storage of wind power knowledge, breaks the linear association limitation of traditional text knowledge, accurately mines the deep association knowledge of query entities, and clearly presents the logical links between wind power equipment, faults, and solutions, providing structured relational knowledge support for fault diagnosis.
[0099] This application also provides another possible implementation of the RAG knowledge retrieval method based on a wind farm knowledge base. Figure 7 is the seventh flowchart of the RAG knowledge retrieval method based on a wind farm knowledge base provided in the embodiments of this application. As shown in Figure 7, generating a structured knowledge report from the target knowledge fragments and target graph path information using the preset knowledge generation model for the wind power field includes: S701. Construct a fault tree analysis model based on the target graph path information.
[0100] In this embodiment, a targeted fault tree analysis model is constructed from the target graph path information, in combination with the fault tree analysis method used in the wind power field.
[0101] First, the corresponding core fault event, such as the gearbox high temperature alarm, is identified as the top event in the fault tree and placed at the top level. Then, based on the entity relationships in the target graph path information, the direct cause events of the top event, such as lubricating oil cooling system failure, heat dissipation device malfunction, and low lubricating oil level, are identified as intermediate events and placed at the next level below the top event. The direct causes of the intermediate events are further broken down, such as the lubricating oil cooling system failure being broken down into bottom events like coolant pump damage and coolant pipe blockage, and placed at the bottom level of the fault tree.
[0102] By identifying the causes and inclusion relationships in the target graph path information, the logical relationships between events in the fault tree are clarified. At the same time, by combining historical fault data in the wind power field, the occurrence probability is added to each basic event, and finally a fault tree analysis model with clear hierarchy, rigorous logic and probability information is constructed, which intuitively presents the causal relationship chain and root cause of the core fault.
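A fault tree with probabilities on its basic events can be evaluated bottom-up. The sketch below assumes that every gate is an OR gate and that basic events are independent, neither of which is stated in this description; the event names follow the gearbox example, and the probabilities are made-up illustrative values, not historical data.

```python
# Cause relationships taken from the graph paths: event -> direct causes.
CAUSES = {
    "gearbox high temperature alarm": [
        "lubricating oil cooling system failure",
        "low lubricating oil level",
    ],
    "lubricating oil cooling system failure": [
        "coolant pump damage",
        "coolant pipe blockage",
    ],
}

# Illustrative basic-event probabilities (assumed, not measured).
BASIC_PROB = {
    "coolant pump damage": 0.1,
    "coolant pipe blockage": 0.2,
    "low lubricating oil level": 0.5,
}

def event_probability(event, causes, basic_prob):
    """Probability of an event, assuming OR gates and independent children.

    For an OR gate: P(event) = 1 - product over children of (1 - P(child)).
    """
    children = causes.get(event)
    if not children:
        return basic_prob[event]               # bottom (basic) event
    p_none = 1.0
    for child in children:
        p_none *= 1.0 - event_probability(child, causes, basic_prob)
    return 1.0 - p_none

p_mid = event_probability("lubricating oil cooling system failure", CAUSES, BASIC_PROB)
p_top = event_probability("gearbox high temperature alarm", CAUSES, BASIC_PROB)
print(round(p_mid, 4), round(p_top, 4))
```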
[0103] For example, a rule engine is used to match the causal chain between high gearbox temperature and a cooling system abnormality, and the result is aligned with the maintenance records of a historical case dated [Date]. A Bayesian network is then used to quantify the probability of each cause, generating a logical inference chain such as "possible cause 1: lubricating oil cooling system failure (according to standard IEC 61400-25)". In fault diagnosis, the Bayesian network calculates the posterior probability of a root cause Hi (e.g., lubricating oil cooling system failure) when evidence E (e.g., high gearbox temperature) is observed:
$$P(H_i \mid E) = \frac{P(E \mid H_i)\, P(H_i)}{\sum_{j} P(E \mid H_j)\, P(H_j)}$$
[0104] where P(Hi) is the prior probability (based on historical frequency), P(E|Hi) is the likelihood (the probability of observing the evidence given the cause), and the posterior probability P(Hi|E) is used to rank the possible causes.
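The posterior ranking can be worked through with a few lines of Python. This sketch applies Bayes' rule directly and assumes the candidate causes are mutually exclusive and exhaustive, which is a simplification of a full Bayesian network; the priors and likelihoods are invented numbers for illustration.

```python
def rank_causes(priors, likelihoods):
    """Rank candidate causes by posterior probability P(H_i | E).

    priors[h] is P(H_i); likelihoods[h] is P(E | H_i). The denominator
    is the total probability of the evidence over all candidate causes.
    """
    evidence = sum(priors[h] * likelihoods[h] for h in priors)
    posterior = {h: priors[h] * likelihoods[h] / evidence for h in priors}
    return sorted(posterior.items(), key=lambda kv: kv[1], reverse=True)

# Illustrative values only (assumed, not derived from real fault records).
priors = {
    "lubricating oil cooling system failure": 0.5,
    "heat dissipation device malfunction": 0.3,
    "low lubricating oil level": 0.2,
}
likelihoods = {
    "lubricating oil cooling system failure": 0.8,
    "heat dissipation device malfunction": 0.5,
    "low lubricating oil level": 0.25,
}
ranked = rank_causes(priors, likelihoods)
for cause, prob in ranked:
    print(f"{cause}: {prob:.3f}")
```

With these numbers the joint terms are 0.40, 0.15, and 0.05, so the evidence probability is 0.60 and the posteriors are 2/3, 1/4, and 1/12.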
[0105] S702. Based on the target knowledge fragment, obtain the processing step flowchart and standard clause reference information.
[0106] Specifically, the target knowledge fragments are structurally analyzed, and the processing step flowchart and standard clause reference information related to the queried fault are extracted from them.
[0107] First, using text structure extraction technology, the specific operational steps for fault handling are extracted from the maintenance manuals and fault handling cases in the target knowledge fragments. These steps are organized into a standardized processing step flowchart that defines the operational content, tools, standards, and precautions for each step. Next, standard clauses related to fault diagnosis and handling are extracted from the industry standard documents and technical specifications in the target knowledge fragments. Information such as the standard number, standard name, clause content, and applicable scenarios is then compiled to form the standard clause reference information.
[0108] S703. Based on the fault tree analysis model, the processing step flowchart, and the standard clause reference information, generate a structured knowledge report using the preset large knowledge generation model.
[0109] Specifically, the fault tree analysis model, processing step flowchart, and standard clause reference information are integrated as core inputs and fed into WindGPT-3.5, a large knowledge generation model fine-tuned for the wind power field, to generate a structured knowledge report.
[0110] A preloaded standardized template for wind farm operation and maintenance reports is used, and prompt engineering guides the model to generate the report according to a fixed structure. For example, during generation the model integrates the causal logic of the fault tree analysis into the fault cause analysis section, presenting the relationships between top, intermediate, and bottom events as well as the occurrence probability of each bottom event. The processing step flowchart is integrated in full into the fault handling plan section, preserving the logic and operability of the steps. Standard clause references are marked next to the corresponding processing steps and fault analysis content, adding knowledge traceability tags. At the same time, similar historical cases are selected from the target knowledge fragments and integrated into the case reference section of the report, providing practical evidence for fault handling. After the report content is generated, a template engine standardizes the format, producing the final structured knowledge report.
[0111] The method provided in this application clearly presents the causal logic of faults by constructing a fault tree analysis model, extracts standardized processing procedures and standard clauses from knowledge fragments, and then generates a structured report according to the operation and maintenance specifications by combining a large-scale model specifically for the wind power industry. This not only transforms fragmented search results into a systematic and practical professional report, but also incorporates fault trees, standard references, knowledge tracing, and other content, making the structured knowledge report logical, standardized, and auditable. Users do not need to reorganize the search results, and it directly provides standardized decision-making basis for on-site operation and maintenance, greatly improving the efficiency of operation and maintenance work.
[0112] This application also provides another possible implementation of the RAG knowledge retrieval method based on a wind farm knowledge base. Figure 8 is the eighth flowchart of the RAG knowledge retrieval method based on a wind farm knowledge base provided in an embodiment of this application. As shown in Figure 8, before the structured knowledge report is generated from the target knowledge fragments and the target graph path information using the preset knowledge generation model for the wind power field, the method further includes: S801. Perform conflict detection on the target knowledge fragments and the target graph path information to obtain conflict detection results.
[0113] In this embodiment, a multi-source evidence consistency verification mechanism is introduced to detect conflicts between target knowledge fragments and target graph path information, ensuring the consistency of information between the two types of knowledge sources.
[0114] For example: check whether the cause of the gearbox high-temperature alarm given in the target knowledge fragments is consistent with the fault association in the target graph path information; check whether the fault parameter threshold in the knowledge fragments matches the parameter threshold of the associated standard in the graph; compare the publication times of the different knowledge sources to determine whether there is a timeliness conflict; and check whether the target knowledge fragments and the target graph path information contain duplicate information. The detected conflicts are classified and labeled, and the location of each conflict, the knowledge content involved, and the confidence score of each knowledge source are recorded, ultimately forming the conflict detection result.
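The checks listed above can be sketched as a small rule function. This is only an illustration under stated assumptions: the `Evidence` schema, the `detect_conflicts` name, and the `max_age_gap_days` cutoff are hypothetical and are not specified in this application.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Evidence:
    """One statement from a knowledge source (hypothetical schema)."""
    source: str                  # e.g. "vector" or "graph"
    cause: str                   # fault cause asserted by this source
    threshold: Optional[float]   # parameter threshold, if the source states one
    published: date              # publication time of the source
    confidence: float            # confidence score of the source, 0..1


def detect_conflicts(fragment: Evidence, path: Evidence,
                     max_age_gap_days: int = 365) -> list[dict]:
    """Compare a knowledge fragment with a graph path and label each conflict."""
    conflicts: list[dict] = []

    def record(kind: str, detail: str) -> None:
        conflicts.append({
            "type": kind,
            "detail": detail,
            "confidence": {fragment.source: fragment.confidence,
                           path.source: path.confidence},
        })

    if fragment.cause != path.cause:
        record("cause", f"{fragment.cause!r} vs {path.cause!r}")
    if (fragment.threshold is not None and path.threshold is not None
            and fragment.threshold != path.threshold):
        record("threshold", f"{fragment.threshold} vs {path.threshold}")
    if abs((fragment.published - path.published).days) > max_age_gap_days:
        record("timeliness", f"{fragment.published} vs {path.published}")
    if (fragment.cause, fragment.threshold) == (path.cause, path.threshold):
        record("duplicate", "both sources state the same content")
    return conflicts
```

Each returned record carries the conflict type, the content involved, and the confidence score of both sources, which mirrors the fields the paragraph above says are recorded.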
[0115] On this basis, generating a structured knowledge report from the target knowledge fragments and the target graph path information using the preset knowledge generation model for the wind power field includes: S802. Based on the target knowledge fragments, the target graph path information, and the conflict detection results, generate the structured knowledge report using the preset knowledge generation model.
[0116] The target knowledge fragments, target graph path information, and conflict detection results are input into the preset knowledge generation model WindGPT-3.5. Before generating the structured knowledge report, the preset knowledge generation model performs knowledge fusion and conflict resolution based on the conflict detection results. Dempster-Shafer evidence theory (the theory of belief functions) is used to weight and fuse conflicting knowledge information: evidence weights are set according to the confidence score of each knowledge source, and conflicting content is selected, corrected, or synthesized. For conflicts with clear data source priorities, information from the high-priority knowledge source is used directly; for example, when an industry standard conflicts with a historical case, the industry standard prevails. For timeliness conflicts, the most recently published knowledge is used. For logical conflicts, a comprehensive judgment is made by combining professional knowledge of the wind power field with historical fault data, and the information that best fits the actual operation and maintenance scenario is selected. After conflict resolution and knowledge fusion are complete, the model generates the structured knowledge report, ensuring that core content such as the fault analysis and handling solutions is free of conflicts and provides reliable knowledge support for wind farm operation and maintenance decisions.
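As an illustration of the weighted fusion step, the following minimal sketch implements Dempster's rule of combination together with Shafer discounting, which is one common way to apply a per-source confidence score as an evidence weight. It is not the internal logic of WindGPT-3.5; the function names, the `Mass` type alias, and any hypothesis labels used with it are assumptions made for this example.

```python
from itertools import product

# A mass function maps sets of hypotheses to belief mass (values sum to 1).
Mass = dict[frozenset, float]


def discount(mass: Mass, alpha: float, frame: frozenset) -> Mass:
    """Shafer discounting: keep a share alpha of each mass, move the rest to the frame."""
    out = {s: alpha * v for s, v in mass.items() if s != frame}
    out[frame] = 1.0 - alpha + alpha * mass.get(frame, 0.0)
    return out


def combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule of combination for two mass functions."""
    joint: dict[frozenset, float] = {}
    conflict = 0.0
    for (a, va), (b, vb) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            joint[intersection] = joint.get(intersection, 0.0) + va * vb
        else:
            # Mass assigned to contradictory hypotheses.
            conflict += va * vb
    if 1.0 - conflict < 1e-12:
        raise ValueError("sources are in total conflict and cannot be combined")
    return {s: v / (1.0 - conflict) for s, v in joint.items()}
```

In use, each source's mass function would first be discounted by its confidence score and the results combined; the hypothesis with the largest fused mass is then a candidate for the report, subject to the priority and timeliness rules described above.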
[0117] It's worth noting that after the structured knowledge report is generated, a concise summary is produced using the BERT-Summarization model. The report is then visualized using a LaTeX template-based typesetting system, supporting the automatic insertion of charts such as fault tree diagrams and step-by-step tables. A comprehensive knowledge sourcing module is also included, adding "[Source: XXX]" annotations to key decision points, core clauses, and solutions in the report to clearly identify the knowledge source (e.g., industry standards, historical cases, maintenance manuals), ensuring the auditability and traceability of the report content. Furthermore, the large model output can be compressed using knowledge distillation technology to generate a more concise version, adapting to the display needs of mobile maintenance terminals and improving the convenience of on-site maintenance.
[0118] The method provided in this application embodiment performs consistency verification and fusion of the results of vector retrieval and graph retrieval through a conflict detection and resolution mechanism. Based on a wind power-specific rule base, it detects conflict points between the two types of knowledge sources from multiple dimensions, and then resolves conflicts through weighted fusion, priority selection, and other methods. This ensures that the knowledge information input into the large model is free from logical contradictions and parameter conflicts, making the final generated structured knowledge report consistent and logically rigorous. This avoids decision-making errors caused by multi-source knowledge conflicts, improves the reliability and accuracy of the knowledge report, and provides unambiguous and reliable knowledge support for wind farm operation and maintenance decisions.
[0119] It should be noted that after generating the structured knowledge report, a distributed log collection system is used to capture new materials in real time, such as human-computer interaction question-and-answer pairs, user feedback on search results, and operation and maintenance application results data, and store them in the structured database PostgreSQL and the unstructured knowledge lake MinIO. A wind power named entity recognition model (WindNER) is used to incrementally extract information from the new materials, identify out-of-vocabulary terms (such as novel composite faults), and extract fault-phenomenon-solution triples. New knowledge is integrated into the graph database through entity alignment and conflict detection. Simultaneously, the vector retrieval model parameters are optimized based on the Proximal Policy Optimization (PPO) reinforcement learning algorithm, and the retrieval ranking algorithm is adjusted using A / B testing. This achieves a closed-loop iteration of data collection, knowledge extraction, graph fusion, and model optimization, allowing the wind farm knowledge base to continuously evolve with the accumulation of wind farm operation and maintenance experience.
[0120] Furthermore, the entire RAG wind farm knowledge base architecture does not operate independently. Instead, it is deeply integrated with existing wind farm systems through standardized API interfaces. Based on a Representational State Transfer (RESTful) architecture, the interfaces are designed with standardized endpoint definitions and secure authentication. This allows for seamless integration with Supervisory Control and Data Acquisition (SCADA) systems and work order management systems. When the SCADA system detects a wind turbine anomaly, it can automatically invoke the diagnostic interface to generate an early warning work order, achieving a closed-loop system from fault detection to diagnosis and handling. Simultaneously, it supports modular deployment with center-edge collaboration. On the edge side, lightweight models are deployed on edge nodes using Docker containerization, TensorFlow Lite model compression, and model distillation technologies, enabling millisecond-level local inference. In the cloud center, a Kubernetes cluster is deployed, using a distributed framework for global training of large models. Federated learning is employed to protect privacy by keeping the data in place while the model moves. Over-the-Air (OTA) technology is used for differential upgrades to synchronize knowledge updates to edge nodes. Combined with Redis caching and a Flink stream processing engine to optimize real-time performance, the system ensures stable operation in the complex environment of the wind farm.
[0121] The following describes the RAG knowledge retrieval device based on a wind farm knowledge base and the computer device that are used to execute the method provided in any of the above embodiments of this application. Their specific implementation process and technical effects are the same as those of the corresponding method embodiments. For brevity, for anything not mentioned in this embodiment, refer to the corresponding content in the method embodiments.
[0122] Figure 9 is a schematic diagram of the functional modules of a RAG knowledge retrieval device based on a wind farm knowledge base provided in an embodiment of this application. As shown in Figure 9, the RAG knowledge retrieval device 100 based on the wind farm knowledge base includes: a parsing module 110, used to parse the input query statement to obtain query intent information, which includes query semantic information, corresponding query entity information, and an answer type label; a retrieval module 120, used to perform vector retrieval on the vector database in the preset wind farm knowledge base based on the query semantic information to obtain target knowledge fragments similar to the query semantic information, and further used to perform knowledge graph retrieval on the graph database in the preset wind farm knowledge base based on the query entity information and the answer type label to obtain the target graph path information corresponding to the query entity information; and a generation module 130, used to generate a structured knowledge report based on the target knowledge fragments and the target graph path information, using the preset knowledge generation model for the wind power field.
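The division into parsing, retrieval, and generation modules can be sketched as a thin orchestration class. This is an assumption-laden illustration: the class and field names are hypothetical, and the four callables stand in for the intent parser, the vector database, the graph database, and the knowledge generation model, none of whose interfaces are defined in this application.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class QueryIntent:
    semantic_info: str   # query semantic information
    entity: str          # query entity information
    answer_type: str     # answer type label


@dataclass
class RagRetrievalDevice:
    """Wires together the parsing (110), retrieval (120), and generation (130) modules."""
    parse: Callable[[str], QueryIntent]
    vector_search: Callable[[str], list[str]]
    graph_search: Callable[[str, str], list[str]]
    generate: Callable[[list[str], list[str]], str]

    def answer(self, query: str) -> str:
        intent = self.parse(query)                                    # module 110
        fragments = self.vector_search(intent.semantic_info)          # module 120
        paths = self.graph_search(intent.entity, intent.answer_type)  # module 120
        return self.generate(fragments, paths)                        # module 130
```

Injecting the modules as callables keeps the sketch independent of any particular database or model, so each part can be replaced by a stub during testing.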
[0123] Optionally, the query semantic information includes query text; the retrieval module 120 is further used to perform semantic encoding on the query text to obtain the corresponding query semantic feature vector, and, based on the query semantic feature vector, to perform vector retrieval on the first vector database in the preset wind farm knowledge base to obtain a first target knowledge fragment similar to the query semantic information.
[0124] Optionally, the query semantic information also includes query time-series data; the retrieval module 120 is further used to extract features from the query time-series data to obtain the corresponding query time-series feature vector, and, based on the query time-series feature vector, to perform vector retrieval on the second vector database in the preset wind farm knowledge base to obtain a second target knowledge fragment similar to the query semantic information.
[0125] Optionally, the device further includes: The acquisition module is used to acquire unstructured knowledge text and corresponding metadata in the wind power field. The splitting module is used to split unstructured knowledge text into knowledge fragments, resulting in multiple knowledge fragments; The encoding module is used to perform semantic encoding on each knowledge fragment to obtain the text vector corresponding to the knowledge fragment; The construction module is used to construct knowledge vectors corresponding to knowledge fragments based on metadata and text vectors corresponding to knowledge fragments; and to construct the first vector database based on the knowledge vectors corresponding to multiple knowledge fragments.
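The construction of the first vector database (split, encode, attach metadata) can be sketched as follows. The hashed bag-of-words encoder is only a placeholder for a real semantic encoder, which this paragraph does not name; `KnowledgeVector`, `split_fragments`, `toy_encode`, `build_first_vector_db`, and the `max_chars` limit are all hypothetical.

```python
import hashlib
import math
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeVector:
    text: str                   # the knowledge fragment
    vector: tuple[float, ...]   # text vector of the fragment
    metadata: dict              # e.g. source document, publication date


def split_fragments(text: str, max_chars: int = 200) -> list[str]:
    """Pack whole sentences greedily into fragments of at most max_chars."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    fragments, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            fragments.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        fragments.append(current)
    return fragments


def toy_encode(text: str, dim: int = 64) -> tuple[float, ...]:
    """Hashed bag-of-words, L2-normalized. Placeholder for a semantic encoder."""
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        index = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[index] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return tuple(v / norm for v in vec)


def build_first_vector_db(text: str, metadata: dict,
                          max_chars: int = 200) -> list[KnowledgeVector]:
    """Split the text, encode each fragment, and attach the document metadata."""
    return [KnowledgeVector(frag, toy_encode(frag), dict(metadata))
            for frag in split_fragments(text, max_chars)]
```

Keeping the metadata next to each vector allows a later retrieval step to filter or cite by source, which is what the traceability requirements elsewhere in the description rely on.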
[0126] Optionally, the device further includes: the acquisition module, also used to acquire multiple historical time-series data in the wind power field; an extraction module, used to extract features from each historical time-series data to obtain the time-series feature vector corresponding to that historical time-series data; and the construction module, also used to construct a second vector database based on the time-series feature vectors corresponding to the multiple historical time-series data.
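A minimal sketch of the second vector database follows, assuming simple statistical features (mean, standard deviation, minimum, maximum, and least-squares slope) and Euclidean nearest-neighbor lookup. The feature set, the function names, and the dictionary-based store are illustrative assumptions; the paragraph above does not prescribe a particular feature extractor or index.

```python
import math
from statistics import fmean, pstdev


def extract_features(series: list[float]) -> tuple[float, ...]:
    """Mean, population std, min, max, and least-squares slope of one series."""
    n = len(series)
    mean = fmean(series)
    t_mean = (n - 1) / 2
    denom = sum((t - t_mean) ** 2 for t in range(n))
    slope = (sum((t - t_mean) * (x - mean) for t, x in enumerate(series)) / denom
             if denom else 0.0)
    return (mean, pstdev(series), min(series), max(series), slope)


def build_second_vector_db(history: dict[str, list[float]]) -> dict[str, tuple[float, ...]]:
    """Map each historical series name to its time-series feature vector."""
    return {name: extract_features(series) for name, series in history.items()}


def nearest(db: dict[str, tuple[float, ...]], query: list[float]) -> str:
    """Return the name of the historical series closest to the query series."""
    q = extract_features(query)
    return min(db, key=lambda name: math.dist(db[name], q))
```

In practice the features would usually be scaled before distances are compared, since the raw mean and maximum can dominate the slope.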
[0127] Optionally, the retrieval module 120 is also used to perform path traversal on the node relationships corresponding to the answer type labels in the graph database, starting from the target entity corresponding to the query entity information, to obtain the target graph path information.
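The path traversal described above can be sketched as a depth-limited walk that starts at the target entity and follows only the relation types associated with the answer type label. The adjacency-list graph, the `RELATIONS_BY_ANSWER_TYPE` mapping, the relation names, and the `max_depth` limit are hypothetical; a production system would issue an equivalent query to the graph database.

```python
# entity -> list of (relation, neighbor) edges
Graph = dict[str, list[tuple[str, str]]]

# Hypothetical mapping from answer type label to the relations worth following.
RELATIONS_BY_ANSWER_TYPE = {
    "cause": {"caused_by"},
    "solution": {"caused_by", "resolved_by"},
}


def traverse(graph: Graph, start: str, answer_type: str,
             max_depth: int = 3) -> list[list[str]]:
    """Return every path from start that uses only relations allowed for answer_type."""
    allowed = RELATIONS_BY_ANSWER_TYPE.get(answer_type, set())
    paths: list[list[str]] = []

    def walk(node: str, path: list[str], visited: frozenset, depth: int) -> None:
        if depth == max_depth:
            return
        for relation, neighbor in graph.get(node, []):
            if relation not in allowed or neighbor in visited:
                continue  # skip irrelevant relations and avoid cycles
            new_path = path + [relation, neighbor]
            paths.append(new_path)
            walk(neighbor, new_path, visited | {neighbor}, depth + 1)

    walk(start, [start], frozenset({start}), 0)
    return paths
```

Each returned path alternates entities and relations, so it can be rendered directly as a chain such as alarm, cause, root cause in the report.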
[0128] Optionally, the device further includes: an extraction module, used to extract wind power knowledge entities from unstructured knowledge text in the wind power field to obtain multiple knowledge entities; an analysis module, used to perform syntactic analysis on the unstructured knowledge text based on the knowledge entities to obtain the entity relationships between each knowledge entity and the other knowledge entities; and the construction module, also used to construct the graph database based on the multiple knowledge entities and the entity relationships between the knowledge entities.
[0129] Optionally, the generation module 130 is further used to construct a fault tree analysis model based on the target graph path information; obtain a processing step flowchart and standard clause reference information based on the target knowledge fragments; and generate a structured knowledge report using the preset knowledge generation model based on the fault tree analysis model, the processing step flowchart, and the standard clause reference information.
[0130] Optionally, the device further includes: The detection module is used to perform conflict detection on target knowledge fragments and target graph path information, and obtain conflict detection results; The generation module 130 is also used to generate a structured knowledge report based on the target knowledge fragments, target graph path information and conflict detection results, using a preset knowledge generation model.
[0131] The above-described device is used to execute the method provided in the foregoing embodiments, and its implementation principle and technical effect are similar, so they will not be described again here.
[0132] These modules can be one or more integrated circuits configured to implement the above methods, such as one or more Application Specific Integrated Circuits (ASICs), one or more microprocessors, or one or more Field Programmable Gate Arrays (FPGAs). Alternatively, when a module is implemented as program code scheduled by a processing element, the processing element can be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor capable of calling program code. Furthermore, these modules can be integrated together and implemented as a system-on-a-chip (SoC).
[0133] Figure 10 is a schematic diagram of a computer device provided in an embodiment of this application. This computer device can be used for RAG knowledge retrieval based on a wind farm knowledge base. As shown in Figure 10, the computer device includes: a processor 210, a storage medium 220, and a bus 230.
[0134] Storage medium 220 stores machine-readable instructions executable by processor 210. When the computer device is running, processor 210 communicates with storage medium 220 via bus 230, and processor 210 executes the machine-readable instructions to perform the steps of the above method embodiment. The specific implementation and technical effects are similar, and will not be described again here.
[0135] Optionally, this application also provides a storage medium 220, on which a computer program is stored. When the computer program is run by a processor, it executes the steps of the above-described method embodiments. The specific implementation and technical effects are similar, and will not be repeated here.
[0136] In the several embodiments provided by this invention, it should be understood that the disclosed apparatus and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection between apparatuses or units may be electrical, mechanical, or other forms.
[0137] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0138] Furthermore, the functional units in the various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or in the form of hardware plus software functional units.
[0139] The integrated units implemented as software functional units described above can be stored in a computer-readable storage medium. These software functional units, stored in a storage medium, include several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) or processor to execute some steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0140] The above are merely specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in the present invention should be included within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.
Claims
1. A RAG knowledge retrieval method based on a wind farm knowledge base, characterized in that it comprises: parsing an input query statement to obtain query intent information, the query intent information including query semantic information, corresponding query entity information, and an answer type label; performing, based on the query semantic information, vector retrieval on a vector database in a preset wind farm knowledge base to obtain target knowledge fragments similar to the query semantic information; performing, based on the query entity information and the answer type label, knowledge graph retrieval on a graph database in the preset wind farm knowledge base to obtain target graph path information corresponding to the query entity information; and generating, based on the target knowledge fragments and the target graph path information, a structured knowledge report using a preset knowledge generation model for the wind power field.
2. The method according to claim 1, characterized in that the query semantic information includes query text, and the step of performing vector retrieval on the vector database in the preset wind farm knowledge base based on the query semantic information to obtain target knowledge fragments similar to the query semantic information includes: performing semantic encoding on the query text to obtain a corresponding query semantic feature vector; and performing, based on the query semantic feature vector, vector retrieval on a first vector database in the preset wind farm knowledge base to obtain a first target knowledge fragment similar to the query semantic information.
3. The method according to claim 1, characterized in that the query semantic information further includes query time-series data, and the step of performing vector retrieval on the vector database in the preset wind farm knowledge base based on the query semantic information to obtain target knowledge fragments similar to the query semantic information includes: performing feature extraction on the query time-series data to obtain a corresponding query time-series feature vector; and performing, based on the query time-series feature vector, vector retrieval on a second vector database in the preset wind farm knowledge base to obtain a second target knowledge fragment similar to the query semantic information.
4. The method according to claim 2, characterized in that, before performing vector retrieval on the first vector database in the preset wind farm knowledge base based on the query semantic feature vector to obtain the first target knowledge fragment similar to the query semantic information, the method further includes: obtaining unstructured knowledge text and corresponding metadata in the wind power field; splitting the unstructured knowledge text into knowledge fragments to obtain multiple knowledge fragments; performing semantic encoding on each knowledge fragment to obtain the text vector corresponding to the knowledge fragment; constructing, based on the metadata and the text vector corresponding to the knowledge fragment, the knowledge vector corresponding to the knowledge fragment; and constructing the first vector database based on the knowledge vectors corresponding to the multiple knowledge fragments.
5. The method according to claim 3, characterized in that, before performing vector retrieval on the second vector database in the preset wind farm knowledge base based on the query time-series feature vector to obtain the second target knowledge fragment similar to the query semantic information, the method further includes: acquiring multiple historical time-series data in the wind power field; performing feature extraction on each of the historical time-series data to obtain the time-series feature vector corresponding to that historical time-series data; and constructing the second vector database based on the time-series feature vectors corresponding to the multiple historical time-series data.
6. The method according to claim 1, characterized in that the step of performing knowledge graph retrieval on the graph database in the preset wind farm knowledge base based on the query entity information and the answer type label to obtain the target graph path information corresponding to the query entity information includes: starting from the target entity corresponding to the query entity information in the graph database, performing path traversal on the node relationships corresponding to the answer type label to obtain the target graph path information.
7. The method according to claim 1, characterized in that, before performing knowledge graph retrieval on the graph database in the preset wind farm knowledge base based on the query entity information and the answer type label to obtain the target graph path information corresponding to the query entity information, the method further includes: extracting wind power knowledge entities from unstructured knowledge text in the wind power field to obtain multiple knowledge entities; performing, based on the knowledge entities, syntactic analysis on the unstructured knowledge text to obtain the entity relationships between each knowledge entity and the other knowledge entities; and constructing the graph database based on the multiple knowledge entities and the entity relationships between the knowledge entities.
8. The method according to claim 1, characterized in that the step of generating a structured knowledge report based on the target knowledge fragments and the target graph path information, using the preset knowledge generation model for the wind power field, includes: constructing a fault tree analysis model based on the target graph path information; obtaining a processing step flowchart and standard clause reference information based on the target knowledge fragments; and generating the structured knowledge report using the preset knowledge generation model, based on the fault tree analysis model, the processing step flowchart, and the standard clause reference information.
9. The method according to claim 1, characterized in that, before generating a structured knowledge report based on the target knowledge fragments and the target graph path information using the preset knowledge generation model for the wind power field, the method further includes: performing conflict detection on the target knowledge fragments and the target graph path information to obtain conflict detection results; and the step of generating a structured knowledge report based on the target knowledge fragments and the target graph path information, using the preset knowledge generation model for the wind power field, includes: generating the structured knowledge report using the preset knowledge generation model, based on the target knowledge fragments, the target graph path information, and the conflict detection results.
10. A computer device, characterized in that it comprises a processor, a storage medium, and a bus, wherein the storage medium stores program instructions executable by the processor; when the computer device is running, the processor communicates with the storage medium via the bus; and the processor executes the program instructions to perform the steps of the RAG knowledge retrieval method based on a wind farm knowledge base according to any one of claims 1 to 9.