Power small sample entity extraction method and system based on composite annotation joint training
By employing composite annotation and adversarial training methods, the problems of error propagation and small-sample overfitting in entity recognition in the power field are solved, enabling high-precision extraction of nested entities and construction of a power knowledge graph.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- STATE GRID JIANGXI ELECTRIC POWER CO LTD RES INST
- Filing Date
- 2026-05-19
- Publication Date
- 2026-06-16
AI Technical Summary
Existing entity recognition technologies suffer from error propagation problems in the power vertical field, cannot effectively handle the names of power equipment with multi-layered nested structures, and deep learning models are prone to overfitting under small sample conditions. General pre-trained models lack semantic understanding of power regulation procedures, leading to safety hazards.
A composite annotation joint training method is adopted: entity types and relation types are fused into composite labels, semantic adversarial hard negative samples are constructed, and a domain-adaptive feature encoder together with an adversarially trained dual-branch joint decoding architecture is used to extract entities and relations synchronously.
The method eliminates error propagation, improves nested-entity identification and boundary determination, enhances the robustness and accuracy of the model under small-sample conditions, and supports construction of a power knowledge graph.
Figure CN122221189A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of power artificial intelligence and natural language processing technology, and particularly relates to a method and system for extracting small-sample entities in the power industry based on composite annotation joint training.
Background Technology
[0002] With the digital transformation of new power systems, power grid dispatching and operation and maintenance have accumulated massive amounts of text data. Accurately extracting equipment names, operation actions, voltage levels, and their interrelationships is fundamental to building a power grid knowledge graph and implementing intelligent verification to prevent misoperation. However, existing entity recognition technologies face the following challenges when applied to the power vertical domain. Traditional pipelined approaches identify entities first and extract relations second, which causes a severe error propagation problem: if the entity boundaries identified in the first stage are wrong, the relation extraction in the second stage will inevitably fail. Furthermore, power equipment names often have multi-level nested structures, and traditional BIO sequence labeling, which assumes that each character belongs to only one label, cannot effectively handle such nesting.
[0003] Under small sample conditions, deep learning models are prone to overfitting to specific noise in the training set, resulting in poor generalization ability. Existing data augmentation methods can alleviate this problem at the sample level, but lack effective adversarial perturbation mechanisms at the model parameter level.
[0004] General pre-trained models lack a deep semantic understanding of power regulation procedures and professional terminology, and are prone to misclassifying "prohibition instructions" as "execution instructions", which poses a significant safety hazard.
Summary of the Invention
[0005] The purpose of this invention is to overcome the shortcomings of the prior art and provide a method and system for extracting small-sample entities in the power industry based on composite annotation joint training. Through a composite annotation strategy, a domain-adaptive feature encoder, and an adversarially trained dual-branch joint decoding architecture, entities and relations are extracted synchronously in small-sample power industry scenarios.
[0006] In a first aspect, the present invention provides a method for extracting small-sample entities in the power sector based on joint training with composite annotation, comprising:
The entity types and relation types to be extracted from the power text are merged into composite labels, and the composite labels are used to annotate the entity spans in the original power text to obtain positive samples;
Based on the non-triggering statement patterns in the power safety regulations, text fragments that are literally similar to the positive samples but semantically opposite are constructed, and the text fragments are given an empty label to serve as semantic adversarial hard negative samples;
The positive samples and the semantic adversarial hard negative samples are mixed to obtain an adversarial dataset;
Each sample in the adversarial dataset is input into a preset domain-adaptive feature encoder, which outputs a fused feature vector sequence. The domain-adaptive feature encoder is obtained by cascading a bidirectional gated recurrent unit onto a base BERT model pre-trained on a power regulation corpus;
The fused feature vector sequence is input into a dual-branch joint decoding architecture based on adversarial training to train an entity extraction model. During training, the multi-label classification loss is first calculated through the main task branch and the contrastive loss is calculated through the auxiliary task branch. Then, after an adversarial perturbation is added, the adversarial loss of the main task branch is calculated. The model parameters are updated using the weighted sum of the multi-label classification loss, the contrastive loss, and the adversarial loss;
The acquired real-time power text is input into the entity extraction model, which outputs a power knowledge graph corresponding to the real-time power text.
[0007] In a second aspect, the present invention provides a small-sample entity extraction system for the power sector based on composite annotation joint training, comprising:
The annotation module is configured to merge the entity types and relation types to be extracted from the power text into composite labels, and to use the composite labels to annotate the entity spans in the original power text to obtain positive samples;
The construction module is configured to construct, based on the non-triggering statement patterns in the power safety regulations, text fragments that are literally similar to the positive samples but semantically opposite, and to give the text fragments an empty label so that they serve as semantic adversarial hard negative samples;
The mixing module is configured to mix the positive samples and the semantic adversarial hard negative samples to obtain an adversarial dataset;
The extraction module is configured to input each sample in the adversarial dataset into a preset domain-adaptive feature encoder, which outputs a fused feature vector sequence. The domain-adaptive feature encoder is obtained by cascading a bidirectional gated recurrent unit onto a base BERT model pre-trained on a power regulation corpus;
The training module is configured to input the fused feature vector sequence into a dual-branch joint decoding architecture based on adversarial training to train an entity extraction model. During training, the multi-label classification loss is first calculated through the main task branch and the contrastive loss is calculated through the auxiliary task branch. Then, after an adversarial perturbation is added, the adversarial loss of the main task branch is calculated. The model parameters are updated using the weighted sum of the multi-label classification loss, the contrastive loss, and the adversarial loss;
The output module is configured to input the acquired real-time power text into the entity extraction model, which outputs a power knowledge graph corresponding to the real-time power text.
[0008] In a third aspect, an electronic device is provided, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the steps of the power small-sample entity extraction method based on composite annotation joint training according to any embodiment of the present invention.
[0009] In a fourth aspect, the present invention also provides a computer-readable storage medium having a computer program stored thereon, wherein when the program instructions are executed by a processor, the processor performs the steps of the power small-sample entity extraction method based on composite annotation joint training according to any embodiment of the present invention.
[0010] This application presents a method and system for small-sample entity extraction in the power industry based on joint training with composite annotations. It fuses entity types and relation types into composite labels to annotate positive samples; constructs semantic adversarial hard negative samples based on non-triggering statement patterns in power safety regulations, and mixes them with the positive samples to obtain an adversarial dataset; cascades a BERT model pre-trained on a power corpus with a bidirectional gated recurrent unit to form a domain-adaptive feature encoder that extracts fused feature vectors; constructs a dual-branch joint decoding architecture based on adversarial training, in which the main task branch uses a global pointer network with rotary position encoding to decode nested entities in parallel, the auxiliary task branch uses span contrastive learning to improve boundary accuracy, and fast gradient adversarial perturbations are introduced; and inputs real-time power text into the trained model, outputs entities and relations, and stores them in a graph database to construct a power knowledge graph. The method eliminates error propagation, improves the ability to identify nested entities and distinguish boundaries, and is robust under small-sample conditions.
Attached Figure Description
[0011] To more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the following description of the embodiments will be briefly introduced. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0012] Figure 1 is a flowchart of a method for extracting small-sample entities in the power industry based on joint training with composite annotations, provided in an embodiment of the present invention;
Figure 2 is a structural block diagram of a small-sample entity extraction system for the power industry based on joint training with composite annotations, provided in an embodiment of the present invention;
Figure 3 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present invention.
Detailed Implementation
[0013] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0014] Refer to Figure 1, which shows a flowchart of the power small-sample entity extraction method based on composite annotation joint training presented in this application.
[0015] As shown in Figure 1, the power small-sample entity extraction method based on composite annotation joint training specifically includes the following steps:
Step S101: The entity types and relation types to be extracted from the power text are merged into composite labels. The composite labels are used to annotate the entity spans in the original power text to obtain positive samples.
[0016] In this step, an entity type set E and a relation type set R are defined, and a composite label set C={(e,r0)|e∈E,r0∈R} is constructed, where each composite label c∈C indicates that the target text fragment has both entity type e and relation type r0. For each entity span in the original power text, the corresponding composite label is selected from the composite label set C according to the entity type of the span and the relational role the span plays in its context, and the span is annotated with that label, thus obtaining positive samples.
[0017] In one specific embodiment, based on the business characteristics of power dispatch instructions and operation ticket texts, a set of entity types E and a set of relation types R are pre-defined.
[0018] For example, the entity type set E may include, but is not limited to, the following three categories:
Equipment names: such as "disconnecting switch", "circuit breaker", "10kV busbar", "121 switch", etc.
[0019] Operational actions: such as "close", "open", "check", "put in", "exit", etc.
[0020] Voltage levels: such as "10kV", "35kV", "110kV", etc.
[0021] The set of relation types R can include, but is not limited to, the following three categories:
Operation association: Indicates the execution relationship between an operation action and the equipment being operated, such as the "close" operation being associated with "isolating switch".
[0022] Loop connection: This refers to the connection relationship between two lines or buses involved in a loop closing operation.
[0023] Decoupling association: Indicates the device separation relationship involved in the decoupling operation.
[0024] Construct the composite label set: take the Cartesian product of the entity type set E and the relation type set R to generate the composite label set C. Each composite label represents a text fragment that simultaneously has entity type e and relation type r0. For example:
(Equipment Name, Operation Association) → composite label EQ-OP;
(Operation Action, Operation Association) → composite label AC-OP;
(Voltage Level, Loop Connection) → composite label VOL-HH.
With these composite labels, the model only needs to identify a span and its composite label to obtain both the entity type of that span and its relational role in the knowledge graph, with no separate relation classification step afterwards.
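As a minimal sketch of the Cartesian-product construction described above, the following Python builds the composite label set from the three entity types and three relation types named in the text. The short codes EQ, AC, VOL, OP and HH come from the examples; the code `JH` for the decoupling association and the function names are hypothetical, introduced only for illustration.

```python
from itertools import product

# Short codes for entity and relation types. "JH" is a hypothetical code;
# only EQ-OP, AC-OP and VOL-HH are given in the text.
ENTITY_CODES = {
    "equipment name": "EQ",
    "operation action": "AC",
    "voltage level": "VOL",
}
RELATION_CODES = {
    "operation association": "OP",
    "loop connection": "HH",
    "decoupling association": "JH",
}


def build_composite_labels(entity_codes, relation_codes):
    """Map every (entity type, relation type) pair in E x R to one tag."""
    return {
        (e, r): f"{e_code}-{r_code}"
        for (e, e_code), (r, r_code) in product(
            entity_codes.items(), relation_codes.items()
        )
    }


def split_tag(tag, labels):
    """Recover (entity type, relation type) from a composite tag."""
    inverse = {value: key for key, value in labels.items()}
    return inverse[tag]


COMPOSITE_LABELS = build_composite_labels(ENTITY_CODES, RELATION_CODES)
```

Because each tag maps back to exactly one pair, predicting the tag for a span yields the entity type and the relational role in a single step, which is the property the composite annotation relies on.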
[0025] For each entity span in the original power text, the annotator performs the following operations based on the actual semantics of that span:
(1) Determine the entity type e∈E for this span. For example, the entity type of the text fragment "10kV bus" is "equipment name".
[0026] (2) Analyze the relationship type r0∈R that the span undertakes in the current context. For example, in the sentence “close the 10kV bus disconnect switch”, “10kV bus” and the operation “close” constitute an “operation association” relationship, so its relationship role is “operation association”.
[0027] (3) Select the corresponding composite label c=(e,r0) from the composite label set C and assign the label to the span.
[0028] Example of annotation: Original power dispatch text: "Dispatch instruction: Close the 10kV Line A 121 switch". Annotation process:
Locate the entity span "Close": the entity type is "operation action" and the relational role is "operation association" (because it is the operation being performed), so the composite label (operation action, operation association) is selected, abbreviated as AC-OP.
[0029] Locate the entity span "10kV Line A 121 switch": the entity type is "equipment name" and the relational role is "operation association" (because it is the object being operated on), so the composite label (equipment name, operation association) is selected, abbreviated as EQ-OP.
[0030] Locate the entity span "10kV": the entity type is "voltage level" and the relational role is "loop connection" (if the context involves a loop connection; otherwise it can be empty). Here it is assumed to be (voltage level, loop connection), abbreviated as VOL-HH.
[0031] After the above annotations are completed, each annotated text sample constitutes a positive sample. A positive sample contains the text character sequence, the start and end positions of each entity span, and the corresponding composite label. These positive samples will serve as the positive example portion of the adversarial dataset in step S103.
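One possible in-memory layout for such a positive sample is sketched below. The class and field names are hypothetical, the English sentence is only a rendering of the dispatch instruction (offsets in the original Chinese text would differ), and inclusive end indices are an assumption.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Span:
    start: int  # index of the first character of the span (inclusive)
    end: int    # index of the last character of the span (inclusive)
    label: str  # composite tag such as "EQ-OP"


@dataclass
class Sample:
    text: str
    spans: list = field(default_factory=list)  # an empty list means the empty label

    def add(self, fragment, label):
        """Annotate the first occurrence of `fragment` with a composite tag."""
        start = self.text.index(fragment)
        self.spans.append(Span(start, start + len(fragment) - 1, label))
        return self


text = "Close the 10kV Line A 121 switch"
sample = (
    Sample(text)
    .add("Close", "AC-OP")
    .add("10kV Line A 121 switch", "EQ-OP")
    .add("10kV", "VOL-HH")  # nested inside the equipment-name span
)
```

Storing spans as (start, end, label) triples lets the nested "10kV" span coexist with the longer equipment-name span, which a one-label-per-character BIO scheme cannot express.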
[0032] Step S102: Based on the non-triggering statement pattern in the power safety regulations, construct a text fragment that is literally similar to the positive sample but semantically opposite, and label the text fragment with an empty label as a semantic adversarial difficult negative sample.
[0033] In this step, three common non-trigger statement patterns are summarized from the "Electric Power Safety Work Regulations" and related dispatching regulations. These templates are characterized by the fact that although they contain high-frequency operational verbs and equipment nouns from the positive samples, the overall semantics represent prohibition, planning, or status descriptions, rather than actual operational instructions.
[0034] The first type is the prohibition template, with the sentence structure: "It is strictly forbidden to perform [Action] operation on [Entity] in violation of regulations." Here, [Entity] is a placeholder for the equipment name, and [Action] is a placeholder for the operation action. This template expresses prohibited operations, such as "It is strictly forbidden to open the disconnect switch in violation of regulations."
The second type is the planning template, with the sentence structure: "It is planned to conduct an [Entity][Action] anti-accident drill next week." This template expresses simulated operations in future plans rather than actual operational instructions, such as "It is planned to conduct a 10kV busbar loop-closing anti-accident drill next week."
The third type is the status description template, with the sentence structure: "The monitoring backend shows that [Entity] is in the [Action] position." This template expresses the current operating status of the equipment rather than a command, such as "The monitoring backend shows that the circuit breaker is in the closed position."
From the positive samples annotated in step S101, the most frequently occurring operation actions and typical equipment names are counted. Typical high-frequency operation actions include "close", "open", "loop close", and "disconnect"; typical equipment names include "disconnect switch", "circuit breaker", "10kV busbar", and "121 switch".
[0035] The selected entity words are filled into the corresponding placeholders in the templates to generate multiple text fragments. Each fragment literally contains the keywords of an actual operation, but because of the framing imposed by the template (prohibition, plan, or status description), its overall semantics do not constitute an actual operation instruction.
[0036] Construction examples are as follows:
Based on the prohibition template, replacing [Entity] with "disconnect switch" and [Action] with "open" yields the text: "It is strictly forbidden to open the disconnect switch in violation of regulations." Replacing [Entity] with "10kV busbar" and [Action] with "loop close" yields the text: "It is strictly forbidden to close the loop on the 10kV busbar in violation of regulations."
Based on the planning template, replacing [Entity] with "circuit breaker" and [Action] with "close" yields the text: "It is planned to conduct a circuit breaker closing anti-accident drill next week." Replacing [Entity] with "121 switch" and [Action] with "open" yields the text: "It is planned to conduct a 121 switch opening anti-accident drill next week."
Based on the status description template, replacing [Entity] with "circuit breaker" and [Action] with "close" yields the text: "The monitoring backend shows that the circuit breaker is in the closed position." Replacing [Entity] with "disconnect switch" and [Action] with "open" yields the text: "The monitoring backend shows that the disconnect switch is in the open position."
All of the text fragments generated above are given the empty label (usually denoted as "O" or the empty set), indicating that the text does not contain any entities or relations that need to be extracted. When the model sees these samples during training, it should output "no entity span".
[0037] These text fragments with the empty label are called "semantic adversarial hard negative samples." They are "hard" because they share keywords with the positive samples (such as "disconnect switch" and "open"), so a model that relies solely on word-level features will misclassify them as entities. Only by combining the syntactic structure and context of the entire sentence (for example, leading signals such as "it is strictly forbidden ... in violation of regulations," "it is planned to," and "the monitoring backend shows") can these fragments be correctly identified as not constituting operational instructions. Adding these negative samples to the training set forces the model to learn to distinguish "operational instructions" from "non-triggering descriptions," which makes its judgments on real-world power texts safer.
[0038] Step S103: Mix the positive samples and the semantic adversarial difficult negative samples to obtain the adversarial dataset.
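Steps S102 and S103 can be sketched as template filling followed by shuffling. The English templates below are renderings of the three patterns; because English needs inflected forms, each action carries hypothetical `verb`, `noun`, and `state` fields that the original placeholders do not require. All names and the single positive sample are illustrative assumptions.

```python
import random

# English renderings of the three non-triggering patterns.
TEMPLATES = {
    "prohibition": "It is strictly forbidden to {verb} the {entity} in violation of regulations.",
    "plan": "It is planned to conduct a {entity} {noun} anti-accident drill next week.",
    "status": "The monitoring backend shows that the {entity} is in the {state} position.",
}

# High-frequency words that would be counted from the positive samples.
ENTITIES = ["disconnect switch", "circuit breaker"]
ACTIONS = [
    {"verb": "open", "noun": "opening", "state": "open"},
    {"verb": "close", "noun": "closing", "state": "closed"},
]


def build_hard_negatives(entities, actions, templates):
    """Fill every template with every (entity, action) pair; spans stay empty."""
    negatives = []
    for entity in entities:
        for action in actions:
            for pattern, template in templates.items():
                text = template.format(entity=entity, **action)
                negatives.append({"text": text, "spans": [], "pattern": pattern})
    return negatives


def mix(positives, negatives, seed=0):
    """Combine positives and hard negatives and shuffle them reproducibly."""
    dataset = list(positives) + list(negatives)
    random.Random(seed).shuffle(dataset)
    return dataset


positives = [{"text": "Close the 10kV Line A 121 switch", "spans": [(0, 4, "AC-OP")]}]
negatives = build_hard_negatives(ENTITIES, ACTIONS, TEMPLATES)
dataset = mix(positives, negatives)
```

The ratio of negatives to positives is not specified in the text; in practice it would be a tunable choice rather than the full cross product used here.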
[0039] Step S104: Input each sample in the adversarial dataset into a preset domain adaptive feature encoder. The domain adaptive feature encoder outputs a fused feature vector sequence. The domain adaptive feature encoder is an encoder obtained by cascading bidirectional gated recurrent units using a BERT model pre-trained on a power regulation corpus as a base.
[0040] In this step, the power grid dispatch control management regulations, power system safety work regulations, substation operation management specifications, and desensitized historical dispatch instruction tickets and maintenance logs are collected to build a dedicated corpus for the power industry. Starting from a general pre-trained BERT model, the masked language model task is further trained on this power-specific corpus to obtain the BERT model pre-trained on the power regulation corpus. The output of this BERT model is cascaded with a bidirectional gated recurrent unit to form the domain-adaptive feature encoder. The text in the adversarial dataset is input into the domain-adaptive feature encoder, which first extracts the global context semantic vectors through the BERT model and then extracts the local temporal feature vectors through the bidirectional gated recurrent unit. The global context semantic vectors and the local temporal feature vectors are fused to output a fused feature vector sequence.
[0041] In one specific embodiment, professional technical documents related to power regulation, operation and maintenance, and safety procedures, as well as text data from actual production, are first collected. Specifically, these include:
The "Distribution Network Dispatch and Control Management Regulations", which specifies the standard terminology and procedures for distribution network dispatch operations.
[0042] The "Safety Work Regulations for Power Systems" includes various safety constraints, prohibitions, and operating procedures.
[0043] The "Substation Operation and Management Standard" covers substation equipment naming, status switching, and operation records.
[0044] Desensitized historical dispatch instruction tickets: the actual dispatch instruction text, such as "Close the 10kV A line 121 switch".
[0045] Desensitized maintenance log: Equipment maintenance, operation descriptions, etc. recorded by maintenance personnel.
[0046] These documents and logs are then cleaned (irrelevant symbols are removed, and a unified encoding format is applied), and merged to form a corpus specifically for the power industry. The size of this corpus can be determined according to actual needs, but it should typically contain millions to tens of millions of words to ensure that the model can fully learn the vocabulary distribution and syntactic patterns in the power industry.
[0047] Using the trained parameters of a general pre-trained BERT model (such as BERT-base-Chinese) as initial weights, the Masked Language Model (MLM) task is then performed on a pre-constructed corpus specifically for the power industry.
[0048] The MLM task operates as follows: For each sentence in the corpus, 15% of character positions are randomly selected. There is an 80% probability that these positions will be replaced with the special marker [MASK], a 10% probability that they will be replaced with a random character, and a 10% probability that they will remain unchanged. The model needs to predict the original character at the occluded position based on the context. The training objective is to minimize the cross-entropy loss between the predicted and actual characters.
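The masking rule can be sketched as follows. This is an illustrative stand-alone function, not the training pipeline itself: the function name, the rounding of 15% to a whole number of positions, and the toy vocabulary are assumptions.

```python
import random

MASK = "[MASK]"


def mask_characters(chars, vocab, rng, select_ratio=0.15):
    """Apply the 15% / 80-10-10 masking rule to one sentence.

    Returns (masked, targets): targets[i] holds the original character at
    each selected position and None elsewhere, so the loss is computed only
    on selected positions.
    """
    count = max(1, round(len(chars) * select_ratio))
    selected = rng.sample(range(len(chars)), count)
    masked, targets = list(chars), [None] * len(chars)
    for i in selected:
        targets[i] = chars[i]
        roll = rng.random()
        if roll < 0.8:
            masked[i] = MASK               # 80%: replace with [MASK]
        elif roll < 0.9:
            masked[i] = rng.choice(vocab)  # 10%: replace with a random character
        # remaining 10%: leave the character unchanged
    return masked, targets


chars = list("check121switchisopen")
vocab = list("abcdefghijklmnopqrstuvwxyz0123456789")
masked, targets = mask_characters(chars, vocab, random.Random(0))
```

Keeping 10% of the selected positions unchanged means the model cannot assume that every visible character is correct, which is the usual motivation for the 80/10/10 split.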
[0049] By iteratively training the model on the aforementioned power corpus for several rounds (e.g., 3-5 epochs), the model's attention weights and word embeddings are gradually adjusted. This internalizes prior knowledge from the power sector into the parameters of each Transformer layer, such as equipment state transition logic (the correspondence between "open position" and "closed position"), operational safety constraints (the co-occurrence pattern of "prohibited" and operational verbs), and contextual patterns of technical terms ("loop closure" usually co-occurs with "10kV bus"). The model after this adaptive pre-training is called the distribution network-specific BERT model, abbreviated as DN-BERT.
[0050] Compared to the general BERT, DN-BERT has significant advantages in small-sample power tasks: the model no longer needs to learn terms such as "circuit breaker" and "closing" from scratch, but already carries domain knowledge, which greatly reduces its dependence on the amount of labeled data.
[0051] Although BERT can capture global context semantics, power dispatch instructions are often short imperative sentences (such as "check that the 10kV A line 121 switch is in the open position") with strong local sequential dependencies. To further strengthen the logical relationship between the actions in an instruction, this invention cascades a bidirectional gated recurrent unit (BiGRU) onto the output of DN-BERT.
[0052] Specifically, the sequence of context semantic vectors output by DN-BERT for each character position of the input text is denoted as E=[e1,e2,…,eN], where en is the context semantic vector at the n-th character position and N is the sequence length. This sequence is used as input to the BiGRU. The BiGRU consists of a forward GRU and a backward GRU:
The forward GRU processes the sequence from left to right, capturing the temporal logic of the operation flow (e.g., "check" before "close"), and generates a forward hidden state sequence.
[0053] The backward GRU processes the sequence from right to left, capturing how a trailing modifier relates to the term it modifies (e.g., "in the correct position" modifies "121 switch"), and generates a backward hidden state sequence.
[0054] Then, the forward and backward hidden states at the same position are concatenated (or summed, averaged) to obtain the final fused feature vector for that position. The fused feature vectors at all positions constitute a sequence of fused feature vectors.
[0055] The BiGRU is equivalent to a "timing filter" that can enhance the sequential characteristics of actions in instructions, making the encoder more robust to instructions with reversed word order or complex modifications.
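To make the forward/backward concatenation concrete, here is a deliberately small pure-Python BiGRU. It is a sketch under assumptions: biases are omitted, weights are random rather than trained, the gate equations follow one standard GRU formulation, and the toy 3-dimensional inputs stand in for DN-BERT outputs. A real implementation would use a deep learning framework.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class GRUCell:
    """Minimal GRU cell without biases (illustrative, untrained weights)."""

    def __init__(self, input_size, hidden_size, rng):
        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        # One input matrix W and one recurrent matrix U per gate:
        # z = update gate, r = reset gate, n = candidate state.
        self.W = {g: mat(hidden_size, input_size) for g in "zrn"}
        self.U = {g: mat(hidden_size, hidden_size) for g in "zrn"}

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        gated = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in zip(matvec(self.W["n"], x), matvec(self.U["n"], gated))]
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def run(cell, seq):
    """Run a cell over a sequence and return the hidden state at every step."""
    h, states = [0.0] * cell.hidden_size, []
    for x in seq:
        h = cell.step(x, h)
        states.append(h)
    return states


def bigru(seq, fwd, bwd):
    """Concatenate forward and backward hidden states position by position."""
    forward = run(fwd, seq)
    backward = run(bwd, seq[::-1])[::-1]  # re-align to the original order
    return [f + b for f, b in zip(forward, backward)]


rng = random.Random(0)
fwd, bwd = GRUCell(3, 2, rng), GRUCell(3, 2, rng)
seq = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4], [0.5, 0.5, -0.2], [0.3, 0.0, 0.1]]
fused = bigru(seq, fwd, bwd)
```

With a hidden size of 2 per direction, each fused vector has 4 dimensions. The backward half at the first position already depends on the last input, which is how a trailing modifier can influence the representation of an earlier equipment name.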
[0056] Each text sample in the adversarial dataset is input into the cascaded encoder described above. First, the text is tokenized (or split directly into characters), and each character is mapped to its character embedding vector; combined with the position encoding and segment encoding, these form the input to DN-BERT.
[0057] DN-BERT passes the input through multiple Transformer encoder layers and outputs a contextual semantic vector for each character position.
[0058] The BiGRU then receives the contextual semantic vectors, computes the forward and backward hidden states in parallel, and outputs their concatenation as the fused feature vector for each position.
[0059] Finally, the encoder outputs a fused feature vector matrix of shape N×d (where N is the sequence length and d is the feature dimension, typically 768 or higher), which is the fused feature vector sequence of the text. This sequence contains both global semantic information (from DN-BERT) and local temporal information (from the BiGRU), providing rich and accurate feature representations for the subsequent dual-branch joint decoding architecture.
[0060] Through the above steps, the domain-adaptive feature encoder successfully transforms the original power text into a vector form that can be efficiently processed by computers, and incorporates professional knowledge in the power field, laying a solid foundation for entity extraction in small sample scenarios.
[0061] Step S105: Input the fused feature vector sequence into the adversarial training-based dual-branch joint decoding architecture to train the entity extraction model. During training, the multi-label classification loss is calculated through the main task branch and the contrastive loss is calculated through the auxiliary task branch; an adversarial perturbation is then added and the adversarial loss of the main task branch is calculated. The multi-label classification loss, the contrastive loss, and the adversarial loss are weighted and summed to update the model parameters.
[0062] In this step, the fused feature vector sequence is input into the main task branch and the auxiliary task branch respectively. The global pointer network in the main task branch calculates the start and end boundary scoring matrix of the composite labels, from which the multi-label classification loss is calculated. The auxiliary task branch constructs anchor spans, positive sample spans, and negative sample spans to calculate the contrastive loss. The total loss is calculated from the multi-label classification loss and the contrastive loss, and the total loss is backpropagated to obtain its gradient with respect to the character embedding sequence matrix of the power text. The character embedding sequence matrix is formed by stacking the embedding vector of each character in the power text in order of character position, and it serves as the input layer of the domain-adaptive feature encoder. The adversarial perturbation is calculated from this gradient and added to the character embedding sequence matrix to generate an adversarial sample. The adversarial sample is then input again into the domain-adaptive feature encoder and the dual-branch joint decoding architecture, and the adversarial loss in the perturbed state is calculated through the main task branch only. The final joint training loss is calculated from the adversarial loss and the total loss, and an optimizer then updates the trainable parameters of the domain-adaptive feature encoder and the dual-branch joint decoding architecture to obtain the entity extraction model.
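The perturb-and-recompute step can be illustrated with a toy stand-in for the encoder and main branch. Several assumptions are made here: the perturbation uses the common fast-gradient form, the gradient rescaled to L2 norm epsilon; the toy loss is a softplus of a linear score whose gradient is written out by hand; and the loss weights in `joint_loss` are hypothetical placeholders, since no values are given in the text.

```python
import math


def fgm_perturbation(grad, epsilon=1.0):
    """Rescale the gradient matrix to L2 norm epsilon (zero gradient gives zero)."""
    norm = math.sqrt(sum(g * g for row in grad for g in row))
    if norm == 0.0:
        return [[0.0] * len(row) for row in grad]
    return [[epsilon * g / norm for g in row] for row in grad]


def add_matrices(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def joint_loss(l_cls, l_con, l_adv, w_cls=1.0, w_con=0.5, w_adv=1.0):
    """Weighted sum of the three losses; the weights are illustrative only."""
    return w_cls * l_cls + w_con * l_con + w_adv * l_adv


def toy_loss_and_grad(emb, w):
    """Toy model: loss = log(1 + exp(-score)), score = sum of w * emb."""
    score = sum(wi * xi for rw, rx in zip(w, emb) for wi, xi in zip(rw, rx))
    loss = math.log1p(math.exp(-score))
    coeff = -1.0 / (1.0 + math.exp(score))  # derivative of the loss w.r.t. score
    grad = [[coeff * wi for wi in rw] for rw in w]
    return loss, grad


emb = [[0.2, -0.1], [0.4, 0.3]]  # character embedding matrix (2 chars x 2 dims)
w = [[1.0, -2.0], [0.5, 0.0]]    # fixed weights of the toy model
clean_loss, grad = toy_loss_and_grad(emb, w)
delta = fgm_perturbation(grad, epsilon=0.5)
adv_loss, _ = toy_loss_and_grad(add_matrices(emb, delta), w)
```

In the described architecture the gradient would come from backpropagating the total loss through the encoder, and only the main task branch would be re-evaluated on the perturbed embeddings; the toy model collapses both into one function to keep the arithmetic visible.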
[0063] It should be noted that calculating the start and end boundary scoring matrix of the composite labels through the global pointer network in the main task branch, and then calculating the multi-label classification loss, includes: Let the fused feature vector sequence be $H = [h_1, h_2, \dots, h_N] \in \mathbb{R}^{N \times d}$, where $N$ is the sequence length, $d$ is the feature dimension, $\mathbb{R}$ is the set of real numbers, and $h_n$ is the fused feature vector at the $n$-th character position in the fused feature vector sequence. For the $c$-th class of composite label, $H$ is mapped through a fully connected layer to a query vector sequence $Q_c = [q_{1,c}, \dots, q_{N,c}]$ and a key vector sequence $K_c = [k_{1,c}, \dots, k_{N,c}]$, where $q_{n,c}, k_{n,c} \in \mathbb{R}^{d_h}$, $q_{n,c}$ is the query vector of the $c$-th class of composite label at the $n$-th position, and $k_{n,c}$ is the key vector of the $c$-th class of composite label at the $n$-th position.
The query vectors and key vectors are rotated using rotational position encoding. Specifically, for the vector at position index $m$, each pair of adjacent dimensions is treated as a group, and the $i$-th group corresponds to dimensions $2i$ and $2i+1$. The rotation transformation formulas are:
$$\tilde{q}^{(2i)} = q^{(2i)}\cos(m\theta_i) - q^{(2i+1)}\sin(m\theta_i), \qquad \tilde{q}^{(2i+1)} = q^{(2i)}\sin(m\theta_i) + q^{(2i+1)}\cos(m\theta_i),$$
$$\tilde{k}^{(2i)} = k^{(2i)}\cos(m\theta_i) - k^{(2i+1)}\sin(m\theta_i), \qquad \tilde{k}^{(2i+1)} = k^{(2i)}\sin(m\theta_i) + k^{(2i+1)}\cos(m\theta_i).$$
In the formulas, $q^{(2i)}$ and $q^{(2i+1)}$ are the values of the $2i$-th and $(2i+1)$-th dimensions of the query vector before the rotation transformation; $\tilde{q}^{(2i)}$ and $\tilde{q}^{(2i+1)}$ are the values of the same dimensions of the query vector after the rotation transformation; $k^{(2i)}$ and $k^{(2i+1)}$ are the values of the $2i$-th and $(2i+1)$-th dimensions of the key vector before the rotation transformation; $\tilde{k}^{(2i)}$ and $\tilde{k}^{(2i+1)}$ are the values of the same dimensions of the key vector after the rotation transformation; and $\theta_i$ is the rotation frequency of the $i$-th subspace, $\theta_i = 10000^{-2i/d_h}$.
Based on the query vectors and key vectors after the rotation transformation, the score with which the candidate span from any start position $s$ to end position $e$ in the text sequence constitutes the $c$-th class of composite label is calculated as:
$$S_c(s, e) = \tilde{q}_{s,c}^{\top}\,\tilde{k}_{e,c}.$$
In the formula, $\tilde{q}_{s,c}$ is the query vector at start position $s$ after the rotation transformation, $\tilde{k}_{e,c}$ is the key vector at end position $e$ after the rotation transformation, and $\top$ is the transpose symbol.
The multi-label classification loss is calculated using the multi-label cross-entropy loss function:
$$L_{cls} = \sum_{c=1}^{C}\left[\log\Big(1 + \sum_{(s,e)\in\mathcal{P}_c}\exp\big(-S_c(s,e)\big)\Big) + \log\Big(1 + \sum_{(s,e)\in\mathcal{N}_c}\exp\big(S_c(s,e)\big)\Big)\right].$$
In the formula, $C$ is the total number of composite labels in the composite label set; $\mathcal{P}_c$ is the set of real entity spans of the $c$-th class of composite label, i.e., the positive sample spans that actually exist in the sample; and $\mathcal{N}_c$ is the set of negative sample spans of the $c$-th class of composite label, i.e., spans that do not exist as entities in the sample.
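As a non-limiting illustration, the scoring and loss computation of the main task branch can be sketched in plain Python. The sketch assumes an even query/key dimension, the common rotation frequency schedule `base ** (-2 * i / d)`, and that only spans whose start does not exceed their end are scored; the function names `rope`, `span_scores`, and `multilabel_ce` are hypothetical and are not taken from this application.

```python
import math


def rope(vec, pos, base=10000.0):
    """Rotate each dimension pair (2i, 2i+1) of vec by the angle pos * theta_i."""
    d = len(vec)  # assumed to be even
    out = [0.0] * d
    for i in range(d // 2):
        theta = base ** (-2.0 * i / d)  # assumed frequency schedule
        c, s = math.cos(pos * theta), math.sin(pos * theta)
        x0, x1 = vec[2 * i], vec[2 * i + 1]
        out[2 * i] = x0 * c - x1 * s
        out[2 * i + 1] = x0 * s + x1 * c
    return out


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def span_scores(queries, keys):
    """scores[s][e] = rope(q_s, s) . rope(k_e, e) for one composite label class."""
    rq = [rope(q, m) for m, q in enumerate(queries)]
    rk = [rope(k, m) for m, k in enumerate(keys)]
    return [[dot(q, k) for k in rk] for q in rq]


def multilabel_ce(scores, positives):
    """log(1 + sum_pos exp(-S)) + log(1 + sum_neg exp(S)) over spans with s <= e."""
    pos_sum = neg_sum = 0.0
    n = len(scores)
    for s in range(n):
        for e in range(s, n):
            if (s, e) in positives:
                pos_sum += math.exp(-scores[s][e])
            else:
                neg_sum += math.exp(scores[s][e])
    return math.log1p(pos_sum) + math.log1p(neg_sum)


# Toy query/key vectors for a three-character text and one label class.
queries = [[0.5, -1.0, 0.2, 0.3], [1.0, 0.0, -0.4, 0.8], [0.1, 0.9, 0.7, -0.2]]
keys = [[0.3, 0.3, -0.5, 1.0], [-0.6, 0.2, 0.9, 0.1], [0.4, -0.7, 0.0, 0.5]]
scores = span_scores(queries, keys)
loss = multilabel_ce(scores, positives={(0, 1)})
print(f"loss for one label class: {loss:.4f}")
```

Because the rotation is applied to both the query and the key, the score of a span depends on the two positions only through their offset, which is what lets the pointer network stay sensitive to span length. The full loss would sum `multilabel_ce` over all composite label classes.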
[0064] Furthermore, constructing the anchor span, positive sample spans, and negative sample spans through the auxiliary task branch and calculating the contrastive loss includes: The real entity spans are obtained from the positive samples of the adversarial dataset, and each real entity span is denoted as $(s, e)$, where $s$ and $e$ are the start and end indexes of the real entity span in the text sequence, satisfying $s \le e$. The real entity span is defined as the anchor span, and its feature vector is denoted as the anchor feature vector $z_a$. The anchor feature vector is obtained by average pooling the fused feature vectors within the anchor span:
$$z_a = \frac{1}{e - s + 1}\sum_{t=s}^{e} h_t.$$
In the formula, $h_t$ is the fused feature vector at the $t$-th character position in the text sequence, and $e - s + 1$ is the length of the anchor span.
Construct a positive sample set $\mathcal{Z}^{+}$, which includes the feature vectors of other real entity spans in the current training batch that have the same composite label category as the anchor span, and the enhanced feature vectors obtained by forward propagating the anchor span through two different random deactivation (dropout) masks. A random deactivation mask is a mask matrix that randomly sets the output of some neurons to zero with a certain probability during training.
Construct a negative sample set $\mathcal{Z}^{-}$, which includes the feature vectors of entity word fragments extracted from the semantic adversarial hard negative samples and of boundary-offset negative samples. The boundary-offset negative samples are constructed as follows: set the sliding window radius $r$ and enumerate all candidate spans $(s', e')$ that satisfy $|s' - s| \le r$ and $|e' - e| \le r$, where $s'$ and $e'$ are the start and end positions of the candidate span. From the candidate spans, those whose intersection-over-union ratio with the anchor span is greater than 0 and less than a preset threshold are selected as boundary-offset negative samples, where the intersection-over-union ratio is the ratio of the intersection length to the union length of the two spans.
The contrastive loss is calculated using the information noise-contrastive estimation (InfoNCE) loss function:
$$L_{con} = -\log\frac{\sum_{z_p\in\mathcal{Z}^{+}}\exp\big(\mathrm{sim}(z_a, z_p)/\tau\big)}{\sum_{z_p\in\mathcal{Z}^{+}}\exp\big(\mathrm{sim}(z_a, z_p)/\tau\big) + \sum_{z_n\in\mathcal{Z}^{-}}\exp\big(\mathrm{sim}(z_a, z_n)/\tau\big)}.$$
In the formula, $\exp(\cdot)$ is the exponential function with the natural constant e as its base; $\mathrm{sim}(\cdot,\cdot)$ is the cosine similarity function; $\tau$ is a temperature coefficient used to adjust the model's focus on difficult samples, with a value ranging from 0.05 to 0.1; $z_p$ is a positive sample feature vector; and $z_n$ is a negative sample feature vector.
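As a non-limiting illustration, the span pooling, boundary-offset negative sampling, and contrastive loss of the auxiliary task branch can be sketched in plain Python. The sketch assumes inclusive character indexes, a window applied independently to the start and end positions, and an InfoNCE form in which all positives are summed in the numerator; these choices and all function names are assumptions made for illustration only.

```python
import math


def mean_pool(features, start, end):
    """Average the fused feature vectors over the inclusive span [start, end]."""
    length = end - start + 1
    dim = len(features[0])
    return [sum(features[t][j] for t in range(start, end + 1)) / length
            for j in range(dim)]


def span_iou(a, b):
    """Intersection length divided by union length of two inclusive spans."""
    inter = min(a[1], b[1]) - max(a[0], b[0]) + 1
    if inter <= 0:
        return 0.0
    union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - inter
    return inter / union


def boundary_offset_negatives(anchor, seq_len, radius, iou_max):
    """Spans whose boundaries lie within `radius` of the anchor and 0 < IoU < iou_max."""
    s0, e0 = anchor
    out = []
    for s in range(max(0, s0 - radius), min(seq_len - 1, s0 + radius) + 1):
        for e in range(max(s, e0 - radius), min(seq_len - 1, e0 + radius) + 1):
            if 0.0 < span_iou(anchor, (s, e)) < iou_max:
                out.append((s, e))
    return out


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den


def info_nce(anchor, positives, negatives, tau=0.07):
    """-log(pos / (pos + neg)) with temperature-scaled cosine similarities."""
    pos = sum(math.exp(cosine(anchor, p) / tau) for p in positives)
    neg = sum(math.exp(cosine(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy example: an anchor entity at characters 2..4 of an 8-character text.
negatives = boundary_offset_negatives((2, 4), seq_len=8, radius=1, iou_max=0.8)
print(negatives)
```

The exact anchor span has an intersection-over-union ratio of 1 and is therefore never returned as its own negative, while slightly shifted spans such as `(1, 4)` or `(2, 5)` are, which is what forces the encoder to separate correct boundaries from near misses.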
[0065] To improve the model's generalization ability and robustness under small-sample conditions, this invention introduces a fast gradient adversarial perturbation mechanism during training. The specific implementation process is as follows: Forward propagation of original samples: input the fused feature vector sequence of the current batch into the main task branch and the auxiliary task branch, and calculate the multi-label classification loss $L_{cls}$ and the contrastive loss $L_{con}$ respectively.
[0066] Calculate the original total loss: $L_{total} = L_{cls} + \lambda L_{con}$, where $\lambda$ is a balancing coefficient used to adjust the weight of the auxiliary task.
[0067] Backpropagation and gradient calculation: perform backpropagation and calculate the gradient of the loss function with respect to the character embedding sequence matrix $E$, $g = \nabla_{E} L_{total}$. The character embedding sequence matrix is formed by mapping each character in the original power text to a vector and stacking the vectors according to their positions. It serves as the input layer of the domain adaptive feature encoder.
[0068] Generate the adversarial perturbation: calculate the adversarial perturbation from the gradient, $r_{adv} = \epsilon \cdot g / \lVert g \rVert_2$, where $\epsilon$ is the preset perturbation coefficient (usually between 0.1 and 1.0) used to control the perturbation amplitude, and the denominator is the L2 norm of the gradient, used for normalization.
[0069] Generate adversarial examples: the perturbation is superimposed on the original character embedding sequence matrix to obtain the adversarial example $E_{adv} = E + r_{adv}$.
[0070] Adversarial example forward propagation: the adversarial example $E_{adv}$ is input again into the domain adaptive feature encoder and the dual-branch joint decoding architecture, but the adversarial loss $L_{adv}$ is calculated only through the main task branch (without going through the auxiliary task branch). $L_{adv}$ is calculated in exactly the same way as $L_{cls}$, except that the input is replaced with the adversarial example.
[0071] Joint loss and parameter update: calculate the final joint training loss $L_{final} = L_{total} + \mu L_{adv}$, where $\mu$ is the adversarial loss weight coefficient. An optimizer (such as AdamW) is used to update all trainable parameters of the domain adaptive feature encoder and the dual-branch joint decoding architecture by gradient descent. Before the parameters are updated, the character embedding layer parameters need to be restored to their original values $E$ to avoid the accumulation of perturbations.
[0072] Iterative training: Repeat the above process, performing one original forward propagation and one adversarial forward propagation in each training batch, until the model's performance on the validation set no longer improves. The final saved model is the trained entity extraction model.
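As a non-limiting illustration, the loss computation of one training batch under fast gradient adversarial perturbation can be sketched in plain Python. The sketch assumes that the clean pass returns both the total loss and its gradient with respect to the character embedding matrix; the callables `total_loss_and_grad` and `main_branch_loss`, and the toy quadratic loss used to exercise them, are hypothetical stand-ins for the encoder and decoding branches.

```python
import math


def fgm_perturbation(grad, epsilon):
    """Return epsilon * g / ||g||_2, with the L2 norm taken over the whole matrix."""
    norm = math.sqrt(sum(g * g for row in grad for g in row))
    if norm == 0.0:  # no gradient signal: leave the embeddings unperturbed
        return [[0.0 for _ in row] for row in grad]
    return [[epsilon * g / norm for g in row] for row in grad]


def joint_training_loss(embeddings, total_loss_and_grad, main_branch_loss,
                        epsilon=0.5, mu=1.0):
    """One clean pass plus one adversarial pass; returns (final loss, adversarial copy)."""
    l_total, grad = total_loss_and_grad(embeddings)  # clean forward and backward pass
    r_adv = fgm_perturbation(grad, epsilon)
    # Build a perturbed copy so the original embeddings stay untouched.
    adv = [[x + r for x, r in zip(row, r_row)]
           for row, r_row in zip(embeddings, r_adv)]
    l_adv = main_branch_loss(adv)  # adversarial pass through the main task branch only
    return l_total + mu * l_adv, adv


# Toy stand-ins (hypothetical): loss 0.5 * ||E||^2, whose gradient w.r.t. E is E itself.
def toy_loss(e):
    return 0.5 * sum(x * x for row in e for x in row)


def toy_total_loss_and_grad(e):
    return toy_loss(e), [row[:] for row in e]


E = [[3.0, 4.0]]
final_loss, E_adv = joint_training_loss(E, toy_total_loss_and_grad, toy_loss)
print(final_loss, E_adv)
```

Working on a perturbed copy plays the same role as restoring the embedding layer before the optimizer step: the perturbation influences the adversarial loss but never accumulates in the stored embeddings.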
[0073] Through the aforementioned joint training mechanism, fast gradient adversarial perturbation ensures the robustness of the parameters, the global pointer network guarantees the decoding capability of nested entities, and span contrastive learning ensures the accuracy of boundary features. These three elements complement each other, jointly achieving the goal of high-precision entity extraction in the small-sample scenario of the power industry.
[0074] During the model inference phase, the auxiliary task branch (span contrastive learning) and the fast gradient adversarial perturbation mechanism are removed from the trained entity extraction model, retaining only the trained domain adaptive feature encoder and the main task branch (global pointer network). Power text acquired in real time is input into this simplified model, and the scoring matrices of all composite labels are calculated in a single forward propagation. Entities and their composite labels are parsed directly from the scoring matrices based on preset thresholds, eliminating the need for complex post-processing logic. The extraction results are stored in the Neo4j graph database, with entities as nodes and the relation types carried in the composite labels as edges between nodes, ultimately constructing a time-series power knowledge graph.
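As a non-limiting illustration, threshold-based parsing of the scoring matrices at inference time can be sketched in plain Python. The sketch assumes one square scoring matrix per composite label, inclusive character indexes, and a default threshold of 0; the function name `decode_spans` and the example label names are hypothetical, and writing the results to the graph database is left out.

```python
def decode_spans(score_matrices, labels, text, threshold=0.0):
    """Return (fragment, start, end, entity_type, relation_type) for every span above threshold."""
    results = []
    for (entity_type, relation_type), scores in zip(labels, score_matrices):
        n = len(scores)
        for s in range(n):
            for e in range(s, n):  # only spans whose start does not exceed their end
                if scores[s][e] > threshold:
                    results.append((text[s:e + 1], s, e, entity_type, relation_type))
    return results


def blank_scores(n, fill=-1.0):
    """An n x n scoring matrix in which no span passes the threshold."""
    return [[fill] * n for _ in range(n)]


# Toy example: a five-character text with a nested pair of entities.
text = "ABCDE"
labels = [("Equipment", "operated_object"), ("VoltageLevel", "attribute_of")]
equipment = blank_scores(5)
equipment[0][4] = 2.0  # the whole text is an equipment name
voltage = blank_scores(5)
voltage[0][1] = 0.5    # its first two characters are a voltage level
decoded = decode_spans([equipment, voltage], labels, text)
print(decoded)
```

Because each composite label has its own scoring matrix, the outer span and the span nested inside it are both returned, which a single BIO label sequence could not express.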
[0075] Step S106: Input the acquired real-time power text into the entity extraction model, and the entity extraction model outputs a power knowledge graph corresponding to the real-time power text.
[0076] In summary, the method in this application fuses entity types and relation types into composite labels to annotate positive samples; constructs semantic adversarial hard negative samples based on the non-triggering statement patterns in the power safety regulations, and mixes them with the positive samples to obtain an adversarial dataset; constructs a domain adaptive feature encoder by cascading a bidirectional gated recurrent unit onto a BERT model pre-trained on a power corpus, so as to extract fused feature vectors; constructs a dual-branch joint decoding architecture based on adversarial training, in which the main task branch uses a global pointer network with rotational position encoding to decode nested entities in parallel, the auxiliary task branch uses span contrastive learning to improve boundary accuracy, and fast gradient adversarial perturbation is introduced; and inputs real-time power text into the trained model, outputs entities and relations, and stores them in a graph database to construct a power knowledge graph. The method thereby eliminates error propagation, improves the ability to identify nested entities and distinguish boundaries, and is robust under small-sample conditions.
[0077] Please refer to Figure 2, which shows a structural block diagram of a power small-sample entity extraction system based on composite annotation joint training, as proposed in this application.
[0078] As shown in Figure 2, the power small-sample entity extraction system 200 includes an annotation module 210, a construction module 220, a mixing module 230, an extraction module 240, a training module 250, and an output module 260.
[0079] The annotation module 210 is configured to fuse the entity types and relation types to be extracted from the power text into composite labels, and use the composite labels to annotate the entity spans in the original power text to obtain positive samples; the construction module 220 is configured to construct text fragments that are literally similar to the positive samples but semantically opposite, based on the non-triggering statement patterns in the power safety regulations, and annotate the text fragments with empty labels as semantic adversarial hard negative samples; the mixing module 230 is configured to mix the positive samples and the semantic adversarial hard negative samples to obtain an adversarial dataset; the extraction module 240 is configured to input each sample in the adversarial dataset into a preset domain adaptive feature encoder, and the domain adaptive feature encoder outputs a fused feature vector sequence, wherein the domain adaptive feature encoder is obtained by cascading bidirectional gated recurrent units onto a base BERT model pre-trained on a power regulation procedure corpus; the training module 250 is configured to input the fused feature vector sequence into a dual-branch joint decoding architecture based on adversarial training to train an entity extraction model, wherein during training the multi-label classification loss is first calculated through the main task branch and the contrastive loss is calculated through the auxiliary task branch, the adversarial loss of the main task branch is then calculated after adding an adversarial perturbation, and the multi-label classification loss, contrastive loss, and adversarial loss are weighted and summed to update the model parameters; and the output module 260 is configured to input the acquired real-time power text into the entity extraction model, and the entity extraction model outputs a power knowledge graph corresponding to the real-time power text.
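As a non-limiting illustration, the data flow between modules 210 to 260 can be sketched as a chain of callables in plain Python. The class name `PowerEntityExtractionSystem`, its field names, and the stub functions used to exercise it are hypothetical; the sketch shows only the order in which the modules hand data to each other, not how any module is implemented.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class PowerEntityExtractionSystem:
    annotate: Callable[[List[str]], List[Any]]               # annotation module 210
    build_hard_negatives: Callable[[List[Any]], List[Any]]   # construction module 220
    mix: Callable[[List[Any], List[Any]], List[Any]]         # mixing module 230
    encode: Callable[[List[Any]], Any]                       # extraction module 240
    train: Callable[[Any], Callable[[str], Any]]             # training module 250

    def fit(self, raw_texts: List[str]) -> Callable[[str], Any]:
        """Run modules 210 to 250 in order and return the trained extractor."""
        positives = self.annotate(raw_texts)
        negatives = self.build_hard_negatives(positives)
        dataset = self.mix(positives, negatives)
        features = self.encode(dataset)
        return self.train(features)

    def output(self, extractor: Callable[[str], Any], realtime_text: str) -> Any:
        """Output module 260: apply the trained extractor to real-time text."""
        return extractor(realtime_text)


# Stub modules (hypothetical) that only record how the data flows.
system = PowerEntityExtractionSystem(
    annotate=lambda texts: [("pos", t) for t in texts],
    build_hard_negatives=lambda pos: [("neg", t) for _, t in pos],
    mix=lambda pos, neg: pos + neg,
    encode=lambda dataset: len(dataset),
    train=lambda features: (lambda text: {"text": text, "trained_on": features}),
)
extractor = system.fit(["text one", "text two"])
graph = system.output(extractor, "real-time text")
print(graph)
```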
[0080] It should be understood that the modules described in Figure 2 correspond to the steps of the method described with reference to Figure 1. Therefore, the operations, features, and corresponding technical effects described above for the method also apply to the modules in Figure 2, and will not be described in detail here.
[0081] In other embodiments, the present invention also provides a computer-readable storage medium having a computer program stored thereon, wherein when the program instructions are executed by a processor, the processor performs the power small-sample entity extraction method based on composite annotation joint training in any of the above method embodiments. In one embodiment, the computer-readable storage medium of the present invention stores computer-executable instructions, which are configured to: fuse the entity types and relation types to be extracted from the power text into composite labels, and use the composite labels to annotate the entity spans in the original power text to obtain positive samples; construct, based on the non-triggering statement patterns in the power safety regulations, text fragments that are literally similar to the positive samples but semantically opposite, and annotate the text fragments with empty labels as semantic adversarial hard negative samples; mix the positive samples and the semantic adversarial hard negative samples to obtain an adversarial dataset; input each sample in the adversarial dataset into a preset domain adaptive feature encoder, which outputs a fused feature vector sequence, wherein the domain adaptive feature encoder is obtained by cascading bidirectional gated recurrent units onto a base BERT model pre-trained on a power regulation procedure corpus; input the fused feature vector sequence into a dual-branch joint decoding architecture based on adversarial training to train an entity extraction model, wherein during training the multi-label classification loss is first calculated through the main task branch and the contrastive loss is calculated through the auxiliary task branch, the adversarial loss of the main task branch is then calculated after adding an adversarial perturbation, and the model parameters are updated after weighted summation of the multi-label classification loss, contrastive loss, and adversarial loss; and input the acquired real-time power text into the entity extraction model, which outputs a power knowledge graph corresponding to the real-time power text.
[0082] The computer-readable storage medium may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created through the use of the power small-sample entity extraction system based on composite annotation joint training, etc. Furthermore, the computer-readable storage medium may include high-speed random access memory, and may also include non-volatile memory, such as at least one disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the computer-readable storage medium may optionally include memory remotely located relative to the processor, which can be connected via a network to the power small-sample entity extraction system based on composite annotation joint training. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
[0083] Figure 3 is a schematic diagram of the structure of the electronic device provided in an embodiment of the present invention. As shown in Figure 3, the device includes a processor 310 and a memory 320. The electronic device may also include an input device 330 and an output device 340. The processor 310, memory 320, input device 330, and output device 340 can be connected via a bus or other means; Figure 3 takes a bus connection as an example. The memory 320 is the computer-readable storage medium described above. The processor 310 executes various server functions and data processing by running non-volatile software programs, instructions, and modules stored in the memory 320, thereby implementing the power small-sample entity extraction method based on composite annotation joint training described in the above method embodiment. The input device 330 can receive input digital or character information and generate key signal inputs related to user settings and function control of the power small-sample entity extraction system based on composite annotation joint training. The output device 340 may include a display screen or other display device.
[0084] The aforementioned electronic device can execute the method provided in the embodiments of the present invention, and has the corresponding functional modules and beneficial effects for executing the method. Technical details not described in detail in this embodiment can be found in the method provided in the embodiments of the present invention.
[0085] In one implementation, the above-described electronic device is applied to a power small-sample entity extraction system based on composite annotation joint training and is used for a client, and includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to: fuse the entity types and relation types to be extracted from the power text into composite labels, and use the composite labels to annotate the entity spans in the original power text to obtain positive samples; construct, based on the non-triggering statement patterns in the power safety regulations, text fragments that are literally similar to the positive samples but semantically opposite, and annotate the text fragments with empty labels as semantic adversarial hard negative samples; mix the positive samples and the semantic adversarial hard negative samples to obtain an adversarial dataset; input each sample in the adversarial dataset into a preset domain adaptive feature encoder, which outputs a fused feature vector sequence, wherein the domain adaptive feature encoder is obtained by cascading bidirectional gated recurrent units onto a base BERT model pre-trained on a power regulation procedure corpus; input the fused feature vector sequence into a dual-branch joint decoding architecture based on adversarial training to train an entity extraction model, wherein during training the multi-label classification loss is first calculated through the main task branch and the contrastive loss is calculated through the auxiliary task branch, the adversarial loss of the main task branch is then calculated after adding an adversarial perturbation, and the model parameters are updated after weighted summation of the multi-label classification loss, contrastive loss, and adversarial loss; and input the acquired real-time power text into the entity extraction model, which outputs a power knowledge graph corresponding to the real-time power text.
[0086] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., including several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods of various embodiments or some parts of embodiments.
[0087] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims
1. A method for extracting small-sample entities in the power industry based on composite annotation joint training, characterized in that it comprises: fusing the entity types and relation types to be extracted from the power text into composite labels, and using the composite labels to annotate the entity spans in the original power text to obtain positive samples; constructing, based on the non-triggering statement patterns in the power safety regulations, text fragments that are literally similar to the positive samples but semantically opposite, and annotating the text fragments with empty labels as semantic adversarial hard negative samples; mixing the positive samples and the semantic adversarial hard negative samples to obtain an adversarial dataset; inputting each sample in the adversarial dataset into a preset domain adaptive feature encoder, the domain adaptive feature encoder outputting a fused feature vector sequence, wherein the domain adaptive feature encoder is obtained by cascading bidirectional gated recurrent units onto a base BERT model pre-trained on a power regulation procedure corpus; inputting the fused feature vector sequence into a dual-branch joint decoding architecture based on adversarial training to train an entity extraction model, wherein during training the multi-label classification loss is first calculated through the main task branch and the contrastive loss is calculated through the auxiliary task branch, the adversarial loss of the main task branch is then calculated after adding an adversarial perturbation, and the model parameters are updated after weighted summation of the multi-label classification loss, contrastive loss, and adversarial loss; and inputting the acquired real-time power text into the entity extraction model, the entity extraction model outputting a power knowledge graph corresponding to the real-time power text.
2. The method for extracting small-sample entities in the power industry based on joint training with composite annotations as described in claim 1, characterized in that, The step of using the composite tag to annotate the entity span in the original power text to obtain positive samples includes: Define a set of entity types E and a set of relation types R, and construct a set of composite labels C={(e,r0)|e∈E,r∈R}, where each composite label c∈C represents that the target text fragment has both an entity type e and a relation type r0. For each entity span in the original power text, based on the entity type to which the entity span belongs and the relational role that the entity span plays in the context, the corresponding composite label is selected from the composite label set C to label the entity span, thus obtaining positive samples.
3. The method for extracting small-sample entities in the power industry based on composite annotation joint training as described in claim 1, characterized in that the step of inputting each sample in the adversarial dataset into a preset domain adaptive feature encoder, and the domain adaptive feature encoder outputting a fused feature vector sequence, includes: collecting distribution network dispatch control management regulations, power system safety work regulations, substation operation management specifications, and de-sensitized historical dispatch order tickets and maintenance logs to construct a dedicated corpus for the power industry; continuing to train a general pre-trained BERT model on the masked language model task using the dedicated power-industry corpus to obtain the BERT model pre-trained on the power regulation procedure corpus; cascading the output of the BERT model pre-trained on the power regulation procedure corpus with a bidirectional gated recurrent unit to form the domain adaptive feature encoder; and inputting the text in the adversarial dataset into the domain adaptive feature encoder, which extracts the global context semantic vector through the BERT model, then extracts the local temporal feature vector through the bidirectional gated recurrent unit, and fuses the global context semantic vector and the local temporal feature vector to output the fused feature vector sequence.
4. The method for extracting small-sample entities in the power industry based on composite annotation joint training as described in claim 1, characterized in that the step of training the entity extraction model by inputting the fused feature vector sequence into a dual-branch joint decoding architecture based on adversarial training includes: inputting the fused feature vector sequence into the main task branch and the auxiliary task branch respectively; calculating the start and end boundary scoring matrix of the composite labels through the global pointer network in the main task branch, and then calculating the multi-label classification loss; constructing the anchor span, positive sample spans, and negative sample spans through the auxiliary task branch to calculate the contrastive loss; calculating the total loss based on the multi-label classification loss and the contrastive loss, and backpropagating the total loss to calculate the gradient of the total loss with respect to the character embedding sequence matrix corresponding to the power text; calculating the adversarial perturbation based on the gradient, and superimposing it on the character embedding sequence matrix to generate adversarial samples; inputting the adversarial samples again into the domain adaptive feature encoder and the dual-branch joint decoding architecture, and calculating the adversarial loss only through the main task branch; and calculating the final joint training loss based on the adversarial loss and the total loss, and updating the trainable parameters of the domain adaptive feature encoder and the dual-branch joint decoding architecture using an optimizer to obtain the entity extraction model.
5. The method for extracting small-sample entities in the power industry based on joint training with composite annotations according to claim 4, characterized in that, The character embedding sequence matrix is a matrix formed by mapping each character in the power text to a vector and stacking them in order of character position. The character embedding sequence matrix serves as the input layer of the domain adaptive feature encoder.
6. The method for extracting small-sample entities in the power industry based on composite annotation joint training according to claim 4, characterized in that calculating the start and end boundary scoring matrix of the composite labels through the global pointer network in the main task branch, and then calculating the multi-label classification loss, includes: Let the fused feature vector sequence be $H = [h_1, h_2, \dots, h_N] \in \mathbb{R}^{N \times d}$, where $N$ is the sequence length, $d$ is the feature dimension, $\mathbb{R}$ is the set of real numbers, and $h_n$ is the fused feature vector at the $n$-th character position in the fused feature vector sequence; for the $c$-th class of composite label, $H$ is mapped through a fully connected layer to a query vector sequence $Q_c = [q_{1,c}, \dots, q_{N,c}]$ and a key vector sequence $K_c = [k_{1,c}, \dots, k_{N,c}]$, where $q_{n,c}, k_{n,c} \in \mathbb{R}^{d_h}$, $q_{n,c}$ is the query vector of the $c$-th class of composite label at the $n$-th position, and $k_{n,c}$ is the key vector of the $c$-th class of composite label at the $n$-th position;
the query vectors and key vectors are rotated using rotational position encoding, specifically including: for the vector at position index $m$, each pair of adjacent dimensions is treated as a group, the $i$-th group corresponds to dimensions $2i$ and $2i+1$, and the rotation transformation formulas are:
$$\tilde{q}^{(2i)} = q^{(2i)}\cos(m\theta_i) - q^{(2i+1)}\sin(m\theta_i), \qquad \tilde{q}^{(2i+1)} = q^{(2i)}\sin(m\theta_i) + q^{(2i+1)}\cos(m\theta_i),$$
$$\tilde{k}^{(2i)} = k^{(2i)}\cos(m\theta_i) - k^{(2i+1)}\sin(m\theta_i), \qquad \tilde{k}^{(2i+1)} = k^{(2i)}\sin(m\theta_i) + k^{(2i+1)}\cos(m\theta_i),$$
where $q^{(2i)}$ and $q^{(2i+1)}$ are the values of the $2i$-th and $(2i+1)$-th dimensions of the query vector before the rotation transformation; $\tilde{q}^{(2i)}$ and $\tilde{q}^{(2i+1)}$ are the values of the same dimensions of the query vector after the rotation transformation; $k^{(2i)}$ and $k^{(2i+1)}$ are the values of the $2i$-th and $(2i+1)$-th dimensions of the key vector before the rotation transformation; $\tilde{k}^{(2i)}$ and $\tilde{k}^{(2i+1)}$ are the values of the same dimensions of the key vector after the rotation transformation; and $\theta_i$ is the rotation frequency of the $i$-th subspace, $\theta_i = 10000^{-2i/d_h}$;
based on the query vectors and key vectors after the rotation transformation, the score with which the candidate span from any start position $s$ to end position $e$ in the text sequence constitutes the $c$-th class of composite label is calculated as:
$$S_c(s, e) = \tilde{q}_{s,c}^{\top}\,\tilde{k}_{e,c},$$
where $\tilde{q}_{s,c}$ is the query vector at start position $s$ after the rotation transformation, $\tilde{k}_{e,c}$ is the key vector at end position $e$ after the rotation transformation, and $\top$ is the transpose symbol;
the classification loss of the main task is calculated using the multi-label cross-entropy loss function:
$$L_{cls} = \sum_{c=1}^{C}\left[\log\Big(1 + \sum_{(s,e)\in\mathcal{P}_c}\exp\big(-S_c(s,e)\big)\Big) + \log\Big(1 + \sum_{(s,e)\in\mathcal{N}_c}\exp\big(S_c(s,e)\big)\Big)\right],$$
where $C$ is the total number of composite labels in the composite label set; $\mathcal{P}_c$ is the set of real entity spans of the $c$-th class of composite label, i.e., the positive sample spans that actually exist in the sample; and $\mathcal{N}_c$ is the set of negative sample spans of the $c$-th class of composite label, i.e., spans that do not exist as entities in the sample.
7. The method for extracting small-sample entities in the power industry based on joint training with composite annotations according to claim 4, characterized in that constructing the anchor span, positive sample spans, and negative sample spans through the auxiliary task branch and calculating the contrastive loss comprises: obtaining the real entity spans from the positive samples of the adversarial dataset, each real entity span being denoted as $(s, e)$, where $s$ and $e$ are the start and end indexes of the real entity span in the text sequence, satisfying $s \le e$; defining the real entity span as the anchor span and denoting its feature vector as the anchor feature vector $h_a$, which is obtained by average pooling the fused feature vectors within the anchor span: $$h_a = \frac{1}{e - s + 1} \sum_{t=s}^{e} h_t,$$ where $h_t$ is the fused feature vector at the $t$-th character position in the text sequence and $e - s + 1$ is the length of the anchor span; constructing a positive sample set $P$, which includes the feature vectors of other real entity spans in the current training batch that have the same composite label category as the anchor span, and enhanced feature vectors of the anchor span obtained by forward propagation through two different random deactivation (dropout) masks, where a random deactivation mask is a mask matrix that randomly sets the outputs of some neurons to zero with a certain probability during training; and constructing a negative sample set $N$, which includes the feature vectors of entity word fragments extracted from the semantic adversarial hard negative samples and the feature vectors of boundary-offset negative samples.
The boundary-offset negative samples are constructed as follows: setting a sliding window radius $r$; enumerating all candidate spans $(s', e')$ that satisfy $|s' - s| \le r$ and $|e' - e| \le r$, where $s'$ and $e'$ are the start and end positions of the candidate span, respectively; and selecting, from the candidate spans, those whose intersection-over-union ratio with the anchor span is greater than 0 and less than a preset threshold as boundary-offset negative samples, where the intersection-over-union ratio is the ratio of the intersection length to the union length of the two spans. The contrastive loss is calculated using the information noise-contrastive estimation (InfoNCE) loss function, expressed as: $$\mathcal{L}_{con} = -\log \frac{\sum_{h_p \in P} \exp\left(\mathrm{sim}(h_a, h_p)/\tau\right)}{\sum_{h_p \in P} \exp\left(\mathrm{sim}(h_a, h_p)/\tau\right) + \sum_{h_n \in N} \exp\left(\mathrm{sim}(h_a, h_n)/\tau\right)},$$ where $\exp(\cdot)$ is the exponential function with the natural constant $e$ as its base, $\mathrm{sim}(\cdot, \cdot)$ is the cosine similarity function, $\tau$ is a temperature coefficient used to adjust the model's focus on difficult samples, with a value ranging from 0.05 to 0.1, $h_p$ is a positive sample feature vector, and $h_n$ is a negative sample feature vector.
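The anchor pooling, boundary-offset negative construction, and contrastive loss in this claim can be illustrated with a short sketch. Several details are assumptions made for the example and are not confirmed by the claim text: spans use inclusive character indexes, the window radius is applied independently to the start and end positions, and the positives are aggregated inside a single logarithm (one common InfoNCE form). All function names are hypothetical.

```python
import math

def mean_pool(features, s, e):
    """Average the fused feature vectors over the inclusive span [s, e]."""
    span = features[s:e + 1]
    return [sum(col) / len(span) for col in zip(*span)]

def span_iou(a, b):
    """Intersection length over union length of two inclusive spans."""
    inter = min(a[1], b[1]) - max(a[0], b[0]) + 1
    if inter <= 0:
        return 0.0
    union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - inter
    return inter / union

def boundary_offset_negatives(anchor, radius, iou_threshold, seq_len):
    """Enumerate spans near the anchor whose IoU with it lies in (0, iou_threshold)."""
    s, e = anchor
    result = []
    for s2 in range(max(0, s - radius), min(seq_len - 1, s + radius) + 1):
        for e2 in range(max(s2, e - radius), min(seq_len - 1, e + radius) + 1):
            if 0.0 < span_iou(anchor, (s2, e2)) < iou_threshold:
                result.append((s2, e2))
    return result

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def info_nce(anchor_vec, positives, negatives, tau=0.07):
    """InfoNCE loss; tau=0.07 is an illustrative value inside the 0.05 to 0.1 range."""
    pos = sum(math.exp(cosine(anchor_vec, p) / tau) for p in positives)
    neg = sum(math.exp(cosine(anchor_vec, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))

# Usage: spans that slightly overshoot or undershoot the anchor become hard negatives.
hard_negatives = boundary_offset_negatives((3, 6), radius=1, iou_threshold=0.7, seq_len=10)
```

The upper IoU bound keeps near-identical spans out of the negative set, while the lower bound of 0 excludes spans that do not overlap the anchor at all and would be trivially easy negatives.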
8. A system for extracting small-sample entities in the power industry based on joint training with composite annotations, characterized in that it comprises: an annotation module, configured to merge the entity type and relation type to be extracted from the power text into a composite label, and to annotate the entity spans in the original power text with the composite label to obtain positive samples; a construction module, configured to construct, based on the non-triggering statement patterns in the power safety regulations, text fragments that are literally similar to the positive samples but semantically opposite, and to annotate the text fragments with empty labels as semantic adversarial hard negative samples; a mixing module, configured to mix the positive samples and the semantic adversarial hard negative samples to obtain an adversarial dataset; an extraction module, configured to input each sample in the adversarial dataset into a preset domain adaptive feature encoder, which outputs a fused feature vector sequence, wherein the domain adaptive feature encoder is obtained by cascading bidirectional gated recurrent units onto a base BERT model pre-trained on a power regulation corpus; a training module, configured to input the fused feature vector sequence into a dual-branch joint decoding architecture based on adversarial training to train an entity extraction model, wherein, during training, the multi-label classification loss is first calculated through the main task branch and the contrastive loss is calculated through the auxiliary task branch, the adversarial loss of the main task branch is then calculated after adding an adversarial perturbation, and the model parameters are updated according to the weighted sum of the multi-label classification loss, the contrastive loss, and the adversarial loss; and an output module, configured to input acquired real-time power text into the entity extraction model, which outputs a power knowledge graph corresponding to the real-time power text.
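The loss combination performed by the training module can be sketched on a toy model. The claim does not state how the adversarial perturbation is generated or what the loss weights are, so this example assumes a fast-gradient-style perturbation (the input gradient scaled to a fixed norm `epsilon`) and hypothetical weights; the logistic loss merely stands in for the main task branch.

```python
import math

def logistic_loss(w, x, y):
    """Stand-in main-task loss: log(1 + exp(-y * w.x)) with label y in {-1, +1}."""
    z = y * sum(a * b for a, b in zip(w, x))
    return math.log1p(math.exp(-z))

def grad_wrt_input(w, x, y):
    """Analytic gradient of logistic_loss with respect to the input features x."""
    z = y * sum(a * b for a, b in zip(w, x))
    coeff = -y / (1.0 + math.exp(z))
    return [coeff * a for a in w]

def perturbation(grad, epsilon):
    """Assumed perturbation rule: the gradient direction rescaled to norm epsilon."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0] * len(grad)
    return [epsilon * g / norm for g in grad]

def total_loss(cls_loss, con_loss, adv_loss, weights=(1.0, 0.5, 0.5)):
    """Weighted sum of the three losses; the weights are hypothetical."""
    a, b, c = weights
    return a * cls_loss + b * con_loss + c * adv_loss

# One illustrative step: clean loss, perturbed loss, then the weighted sum.
w, x, y = [0.5, -1.0], [1.0, 2.0], 1
cls = logistic_loss(w, x, y)
r = perturbation(grad_wrt_input(w, x, y), epsilon=0.1)
adv = logistic_loss(w, [xi + ri for xi, ri in zip(x, r)], y)
loss = total_loss(cls, con_loss=0.3, adv_loss=adv)  # con_loss is a placeholder value
```

In a real training loop the combined value would be backpropagated once per batch; computing the adversarial term on perturbed inputs is what pushes the model to keep its predictions stable under small input changes.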
9. An electronic device, characterized in that it comprises: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method according to any one of claims 1 to 7.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the program is executed by a processor, it implements the method according to any one of claims 1 to 7.