Method, device and computer storage medium for determining contract performance entity category
By employing multi-dimensional semantic enhancement and global label decoding, the problem of low accuracy in identifying performance entity categories caused by professional terminology and nested clauses in civil aviation contracts was solved, achieving efficient identification of contract performance entity categories.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- TRAVELSKY TECHNOLOGY LIMITED
- Filing Date
- 2026-03-09
- Publication Date
- 2026-06-19
AI Technical Summary
In the context of civil aviation contract performance management, due to the semantic ambiguity of professional terms and the nested clause structure in the contract text, existing technologies have difficulty accurately identifying the types of contract performance entities, resulting in low identification accuracy.
A multi-dimensional semantic enhancement method is adopted. By using a pre-built domain knowledge base, the semantic vector sequence is modeled in a targeted manner at the character level boundary, term level semantics and clause level logical structure. Combined with bidirectional sequence encoding, global feature enhancement, residual connection and random deactivation processing, a highly discriminative fusion feature sequence is generated, and the contract performance entity category is determined by global label decoding.
It improves the accuracy of identifying contract performance entity categories, accurately distinguishes the boundaries of technical terms and the internal logical connections of clauses, and enhances semantic consistency representation while maintaining fine-grained character-level perception.
Abstract
Description
Technical Field
[0001] This application relates to the field of natural language processing technology, and more specifically, to a method, apparatus, and computer storage medium for determining the type of contract performance entity.
Background Technology
[0002] In civil aviation contract performance management scenarios, contract texts typically contain numerous technical terms, nested clause structures, and cross-sentence semantic dependencies. Existing technologies often employ general sequence labeling models for entity recognition, but due to the lack of targeted modeling for the semantic ambiguity of aviation terminology, they are prone to misidentifying technical expressions as general vocabulary. Furthermore, traditional models, when processing clauses containing nested condition-responsibility structures, often suffer from broken entity boundaries or category mismatches due to a lack of awareness of local logical boundaries. In addition, the cross-sentence information associations in contracts make it difficult for unidirectional encoding models to fully capture the long-distance dependencies between performance responsibilities and supporting elements, thus significantly reducing the accuracy of identifying core entities such as the performance responsibility subject, the subject matter, and supporting conditions.
[0003] There is currently no effective solution to the above problems.
Summary of the Invention
[0004] This application provides a method, apparatus, and computer storage medium for determining the category of contract performance entities, in order to at least solve the technical problem in the prior art that the accuracy of identifying the category of contract performance entities is low due to semantic shifts in professional terminology in contract texts, complex nested clause structures, and significant cross-sentence dependencies.
[0005] According to one aspect of the embodiments of this application, a method for determining the category of a contract performance entity is provided, comprising: obtaining a semantic vector sequence of contract text; performing multi-dimensional semantic enhancement on the semantic vector sequence based on a pre-built domain knowledge base to generate a fused feature sequence of the contract text, wherein the multi-dimensional semantic enhancement includes directional modeling of character-level boundaries, term-level semantics, and clause-level logical structure of the semantic vector sequence; performing target processing operations on the fused feature sequence to obtain a target feature sequence, wherein the target processing operations include bidirectional sequence encoding, global feature enhancement, residual connections, and random deactivation processing, wherein bidirectional sequence encoding represents character-level features with bidirectional contextual dependencies generated by forward and inverse neural networks, global feature enhancement is used to highlight key feature sequences and suppress redundant feature sequences, and random deactivation processing is used to set some feature dimensions in the feature sequence to zero; performing global label decoding on the target feature sequence to obtain a target label sequence that conforms to a label transition matrix, wherein the label transition matrix is used to prohibit label transitions in the target feature sequence that violate the semantic logic of the contract; and determining the category to which the contract performance entity in the contract text belongs based on the target label sequence that conforms to the label transition matrix.
[0006] Optionally, obtaining the semantic vector sequence of the contract text includes: preprocessing the contract text to generate a contract character sequence, wherein the preprocessing includes data cleaning, structured parsing and format standardization of the contract text; and encoding the contract character sequence based on a pre-trained model to obtain the semantic vector sequence.
[0007] Optionally, before performing multi-dimensional semantic enhancement on the semantic vector sequence based on a pre-built domain knowledge base to generate a fusion feature sequence of the contract text, the method further includes: sorting out historical contract texts and related industry standards, extracting organizations, objects, resources, and regulatory entities related to performance obligations, forming multiple contract performance entity categories; constructing a domain dictionary, an entity relation library, and a terminology mapping table based on multiple contract performance entity categories, wherein the domain dictionary is used to represent the standardized semantic units of professional terms and their respective contract performance entity categories, the entity relation library is used to represent the business semantic association rules between contract performance entities, and the terminology mapping table is used to represent the semantic equivalence relationships between synonyms, abbreviations, and full names; and constructing a domain knowledge base based on the domain dictionary, entity relation library, and terminology mapping table.
[0008] Optionally, based on a pre-built domain knowledge base, multi-dimensional semantic enhancement is performed on the semantic vector sequence to generate a fused feature sequence of the contract text. This includes: using structured parameters in the domain dictionary as constraints, performing character-level boundary-aware mapping on the semantic vector sequence to obtain a character-level feature sequence, where the character-level boundary-aware mapping is used to capture the boundary starting position of the contract performance entity; using professional terms in the domain dictionary as semantic anchors, performing terminology-level semantic orientation enhancement on the semantic vector sequence to obtain a terminology-level feature sequence, where the terminology-level semantic orientation enhancement is used to strengthen the distinction between terms and general vocabulary in the semantic space; using business semantic associations in the entity relation database as logical guidance, performing clause-level structural modeling on the semantic vector sequence to obtain a clause-level feature sequence, where the clause-level structural modeling is used to improve the contextual consistency of the contract performance entity under complex semantic structures; and fusing the character-level feature sequence, terminology-level feature sequence, and clause-level feature sequence to obtain a fused feature sequence.
[0009] Optionally, target processing operations are performed on the fused feature sequence to obtain a target feature sequence, including: encoding the fused feature sequence through a bidirectional temporal neural network to obtain a temporal feature sequence; performing global feature enhancement on the temporal feature sequence to obtain the global dependencies of each position within the temporal feature sequence, generating a weighted feature sequence; applying a residual connection between the weighted feature sequence and the temporal feature sequence to obtain a residual output sequence; and performing random deactivation processing on the residual output sequence to obtain the target feature sequence.
[0010] Optionally, global label decoding is performed on the target feature sequence to obtain a target labeled sequence that conforms to the label transition matrix. This includes: mapping the target feature sequence to a target score matrix of each sequence label through a linear transformation, wherein the sequence label is used to indicate the starting position, intermediate position, and non-entity position of the contract performance entity; constructing a label transition matrix based on sequence label constraint rules and domain semantic constraints, wherein the sequence label constraint rules are used to define the compliant transition path of the sequence label, and the domain semantic constraints are used to prohibit label combinations that conflict with contract semantics; and determining the target labeled sequence using a dynamic programming algorithm based on the target score matrix and the label transition matrix.
[0011] Optionally, based on the target annotation sequence that conforms to the label transition matrix, the category to which the contract performance entity in the contract text belongs is determined, including: extracting continuous sequence label tags from the target annotation sequence and concatenating the extracted sequence label tags to obtain the entity string; mapping the entity string to a standard string based on a terminology mapping table; and determining the contract performance entity category to which the standard string belongs as the category to which the contract performance entity in the contract text belongs based on a domain dictionary.
[0012] According to another aspect of the embodiments of this application, an apparatus for determining the category of a contract performance entity is also provided, comprising: an acquisition unit for acquiring a semantic vector sequence of contract text; a first processing unit for performing multi-dimensional semantic enhancement on the semantic vector sequence based on a pre-built domain knowledge base to generate a fused feature sequence of the contract text, wherein the multi-dimensional semantic enhancement includes directional modeling of character-level boundaries, term-level semantics, and clause-level logical structure of the semantic vector sequence; a second processing unit for performing target processing operations on the fused feature sequence to obtain a target feature sequence, wherein the target processing operations include bidirectional sequence encoding, global feature enhancement, residual connections, and random deactivation processing, wherein bidirectional sequence encoding represents character-level features with bidirectional contextual dependencies generated by forward and backward neural networks, global feature enhancement is used to highlight key feature sequences and suppress redundant feature sequences, and random deactivation processing is used to set some feature dimensions in the feature sequence to zero; a third processing unit for performing global label decoding on the target feature sequence to obtain a target label sequence that conforms to a label transition matrix, wherein the label transition matrix is used to prohibit label transitions in the target feature sequence that violate the semantic logic of the contract; and a determination unit for determining the category to which the contract performance entity in the contract text belongs based on the target label sequence that conforms to the label transition matrix.
[0013] According to another aspect of the embodiments of this application, a computer-readable storage medium is also provided, which stores a computer program, wherein when the computer program is executed, it causes the device where the computer-readable storage medium is located to perform the above-described method for determining the type of contract performance entity.
[0014] According to another aspect of the embodiments of this application, an electronic device is also provided, including one or more processors and a memory, the memory being used to store one or more programs, wherein when the one or more programs are executed by the one or more processors, the one or more processors are caused to perform the above-described method for determining the type of contract performance entity.
[0015] According to another aspect of the embodiments of this application, a computer program product is also provided, including a computer program or instructions, which, when executed by a processor, implement the above-described method for determining the type of contract performance entity.
[0016] In this application, the method for determining the category of contract performance entities first obtains the semantic vector sequence of the contract text; based on a pre-built domain knowledge base, the semantic vector sequence is enhanced with multi-dimensional semantics to generate a fused feature sequence of the contract text, wherein the multi-dimensional semantic enhancement includes targeted modeling of character-level boundaries, term-level semantics, and clause-level logical structure of the semantic vector sequence; the fused feature sequence is subjected to target processing operations to obtain a target feature sequence, wherein the target processing operations include bidirectional sequence encoding, global feature enhancement, residual connections, and random deactivation processing, wherein bidirectional sequence encoding represents character-level features with bidirectional contextual dependencies generated by forward and inverse neural networks, global feature enhancement is used to highlight key feature sequences and suppress redundant feature sequences, and random deactivation processing is used to set some feature dimensions in the feature sequence to zero; the target feature sequence is globally labeled and decoded to obtain a target labeled sequence that conforms to the label transition matrix, wherein the label transition matrix is used to prohibit label transitions in the target feature sequence that violate the semantic logic of the contract; and the category to which the contract performance entity in the contract text belongs is determined based on the target labeled sequence that conforms to the label transition matrix.
[0017] In this embodiment, a targeted modeling approach is adopted to perform character-level boundary, term-level semantics, and clause-level logical structure modeling on the semantic vector sequence of contract text. By fusing feature mapping under the constraints of multi-granularity domain knowledge, a highly discriminative fusion feature sequence is generated. This achieves the goal of accurately distinguishing the boundaries of professional terms and identifying the logical connections and contextual dependencies within clauses. This achieves the technical effect of enhancing the semantic consistency representation of terms and structural semantics while maintaining fine-grained character-level perception. In turn, it solves the technical problem in the prior art that the accuracy of identifying contract performance entity categories is low due to semantic shifts in professional terms, complex nested clause structures, and significant cross-sentence dependencies in contract text.
Attached Figure Description
[0018] The accompanying drawings, which are included to provide a further understanding of this application and form part of this application, illustrate exemplary embodiments of this application and are used to explain this application, but do not constitute an undue limitation of this application. In the drawings:
[0019] Figure 1 is a flowchart of an optional method for determining the type of contract performance entity according to an embodiment of this application;
[0020] Figure 2 is an optional overall method flowchart according to an embodiment of this application;
[0021] Figure 3 is a schematic diagram of an optional residual self-attention module according to an embodiment of this application;
[0022] Figure 4 is a schematic diagram of an optional contract performance entity extraction model architecture according to an embodiment of this application;
[0023] Figure 5 is a flowchart of an optional specific implementation method according to an embodiment of this application;
[0024] Figure 6 is a schematic diagram of an optional device for determining the type of contract performance entity according to an embodiment of this application.
Detailed Implementation
[0025] To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present application, and not all embodiments. Based on the embodiments in the present application, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present application.
[0026] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.
[0027] According to an embodiment of this application, a method embodiment for determining the category of a contract performance entity is provided. It should be noted that the steps shown in the flowchart in the accompanying drawings can be executed in a computer system such as a set of computer-executable instructions. Furthermore, although a logical order is shown in the flowchart, in some cases, the steps shown or described may be executed in a different order than that shown here.
[0028] According to the embodiments of this application, a system for determining the category of contract performance entities (hereinafter referred to as the system) can be used as the execution subject of the method for determining the category of contract performance entities in the embodiments of this application. The system for determining the category of contract performance entities can be a software system or an embedded system combining software and hardware. Of course, the execution subject of the method in the embodiments of this application can also be other forms of execution subject, such as devices or equipment. Those skilled in the art should know that this application does not particularly limit the specific form of the execution subject.
[0029] Figure 1 is a flowchart of an optional method for determining the type of contract performance entity according to an embodiment of this application. As shown in Figure 1, the method includes the following steps:
[0030] Step S101: Obtain the semantic vector sequence of the contract text.
[0031] Optionally, the preprocessed contract text is input into a pre-trained language model fine-tuned on civil aviation corpora, and the model outputs a contextual semantic vector for each character, forming a sequence. Position t in the sequence holds the semantic representation of the t-th character in the contract text, and each vector incorporates the semantic role of that character within the entire sentence. This achieves a non-linear, context-aware mapping of the contract text from raw characters to a semantic space, solving the problem that traditional bag-of-words or rule-matching methods cannot capture semantic context.
[0032] Step S102: Based on the pre-built domain knowledge base, perform multi-dimensional semantic enhancement on the semantic vector sequence to generate a fusion feature sequence of the contract text. The multi-dimensional semantic enhancement includes targeted modeling of character-level boundaries, term-level semantics, and clause-level logical structure of the semantic vector sequence.
[0033] Optionally, the pre-built domain knowledge base refers to pre-organized structured knowledge resources directly related to the performance of civil aviation contracts, including professional terminology definitions, terminology mapping rules, clause logic templates, and format specifications.
[0034] Optionally, multi-dimensional semantic enhancement refers to the targeted enhancement of the original semantic vector sequence from three dimensions: character level, term level, and clause level, through knowledge-guided feature transformation, rather than generalized processing.
[0035] Optionally, the fused feature sequence is a unified representation sequence formed by splicing and nonlinearly integrating features of different semantic granularities in the vector dimension after the above three-dimensional enhancement, which has richer discriminative power.
[0036] Optionally, using the domain knowledge base as a constraint, a three-way linear mapping is performed on the semantic vector sequence:
[0037] Character-level boundary modeling: By using rules in the knowledge base, the linear transformation matrix is guided to learn boundary-sensitive features such as numbers, units, and time nodes, thereby enhancing the ability to recognize the boundaries of structured segments;
[0038] Terminology-level semantic modeling: By using specialized terms in the knowledge base, a linear transformation is trained to distinguish between specialized terms and ordinary number combinations, and to accurately extract domain-specific semantics;
[0039] Clause-level logical structure modeling: Based on clause logical patterns such as "if...then..." and "requires...to..." in the knowledge base, semantic vector sequences are trained so that the model can recognize condition-responsibility structures and thus establish semantic associations.
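The three-way mapping and fusion described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the projection weights are random placeholders (in the application they would be learned under knowledge-base guidance, which is not modeled here), the `tanh` nonlinearity and the dimensions are arbitrary choices, and the names `linear`, `make_projection`, and `fuse` are hypothetical.

```python
import math
import random

def linear(vec, weights, bias):
    """Apply y = W x + b to a single vector; weights is a list of rows."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weights, bias)]

def make_projection(in_dim, out_dim, rng):
    """Hypothetical randomly initialised projection; a real one would be trained."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim

def fuse(semantic_seq, projections):
    """Project each position once per granularity, apply tanh, and concatenate."""
    fused = []
    for vec in semantic_seq:
        parts = []
        for weights, bias in projections:
            parts.extend(math.tanh(v) for v in linear(vec, weights, bias))
        fused.append(parts)
    return fused

rng = random.Random(0)
in_dim, branch_dim = 8, 4  # assumed sizes, chosen only for the example
# One projection per granularity: character boundary, term semantics, clause structure.
projections = [make_projection(in_dim, branch_dim, rng) for _ in range(3)]
semantic_seq = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(5)]
fused_seq = fuse(semantic_seq, projections)
```

Each position keeps its place in the sequence, and its fused vector is three times the branch width because the three branches are concatenated along the feature dimension.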
[0040] Step S103: Perform target processing operations on the fused feature sequence to obtain the target feature sequence. The target processing operations include bidirectional sequence encoding, global feature enhancement, residual connection, and random deactivation. Bidirectional sequence encoding produces character-level features with bidirectional contextual dependencies using forward and backward neural networks. Global feature enhancement is used to highlight key feature sequences and suppress redundant feature sequences. Random deactivation is used to set some feature dimensions in the feature sequence to zero.
[0041] Optionally, bidirectional sequence encoding refers to traversing the sequence from left to right and from right to left using neural networks in two directions (forward and backward), so that the features of each character not only include the preceding context but also the semantic constraints of the following context.
[0042] Optionally, global feature enhancement refers to dynamically adjusting the importance weights of features at each position in the sequence through attention mechanisms and other means, so that key contract performance entities are more activated and non-keywords are suppressed.
[0043] Optionally, residual connection refers to adding the original fused feature sequence to the processed features to reduce the loss of original semantic information in deep networks during transformation.
[0044] Optionally, random deactivation refers to forcibly setting the values of some dimensions in the feature vector to zero with a fixed probability during the training phase, simulating a feature loss scenario and improving the robustness of the model.
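The last three operations can be illustrated with a minimal sketch. It assumes the bidirectional encoder has already produced a temporal feature sequence, uses scaled dot-product self-attention with identity query/key/value projections as a stand-in for the global feature enhancement, and uses inverted dropout (rescaling the kept values) for random deactivation. The rescaling and all function names are assumptions, not details given in the application.

```python
import math
import random

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(seq):
    """Scaled dot-product self-attention with identity Q/K/V (a simplification)."""
    dim = len(seq[0])
    out = []
    for query in seq:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in seq]
        weights = softmax(scores)
        out.append([sum(w * value[d] for w, value in zip(weights, seq)) for d in range(dim)])
    return out

def residual_add(a, b):
    """Element-wise sum of two sequences of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def dropout(seq, p, rng, training=True):
    """Zero each dimension with probability p during training and rescale the rest."""
    if not training or p == 0.0:
        return [row[:] for row in seq]
    keep = 1.0 - p
    return [[(x / keep if rng.random() < keep else 0.0) for x in row] for row in seq]

rng = random.Random(0)
temporal_seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # stand-in for encoder output
weighted_seq = self_attention(temporal_seq)             # global feature enhancement
residual_seq = residual_add(weighted_seq, temporal_seq) # residual connection
target_seq = dropout(residual_seq, 0.5, rng)            # random deactivation (training)
```

At inference time `dropout(..., training=False)` returns the residual output unchanged, which matches the statement that deactivation applies during the training phase.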
[0045] Step S104: Global label decoding is performed on the target feature sequence to obtain the target labeled sequence that conforms to the label transition matrix. The label transition matrix is used to prohibit label transitions in the target feature sequence that violate contract semantic logic.
[0046] Optionally, global label decoding refers to inferring the optimal label sequence at the whole sequence level by combining the prediction scores of all characters with the label transition rules, rather than predicting each character independently.
[0047] Optionally, the label transition matrix is a predefined constraint matrix in which each element represents the compliance score of a transition from label i to label j. For example, transitions that violate the semantic logic of civil aviation contracts, such as "B-Responsible Entity" followed by "I-Amount" or "O" followed by "I-Subject," are prohibited. Labeling follows the BIO tagging scheme, which fixes the labeling rules for each entity type (B marks the beginning of an entity, I marks the middle or end of an entity, and O marks a non-entity position).
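One way such a constraint matrix could be built is sketched below. It assumes that a prohibited transition is encoded as negative infinity and a permitted one as 0.0; the application does not state the actual score values. The function name and the `forbidden_pairs` parameter (for extra domain-specific prohibitions) are hypothetical.

```python
NEG_INF = float("-inf")

def build_transition_matrix(entity_types, forbidden_pairs=()):
    """Return (labels, matrix) where matrix[i][j] scores the transition i -> j."""
    labels = ["O"] + [f"{prefix}-{t}" for t in entity_types for prefix in ("B", "I")]
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0.0] * len(labels) for _ in labels]
    for prev in labels:
        for nxt in labels:
            # BIO rule: I-X may only follow B-X or I-X of the same type.
            if nxt.startswith("I-") and prev not in (f"B-{nxt[2:]}", nxt):
                matrix[index[prev]][index[nxt]] = NEG_INF
    # Domain semantic constraints: additional label pairs ruled out by contract logic.
    for prev, nxt in forbidden_pairs:
        matrix[index[prev]][index[nxt]] = NEG_INF
    return labels, matrix

labels, transitions = build_transition_matrix(["Responsible Entity", "Amount", "Subject"])
idx = {label: i for i, label in enumerate(labels)}
```

With this construction, both examples from the text ("B-Responsible Entity" to "I-Amount", and "O" to "I-Subject") receive a score of negative infinity, so no decoded path can contain them.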
[0048] Optionally, the target feature sequence is input into a linear layer to predict the original score of each character belonging to each label, forming an emission matrix. Combined with a pre-defined label transition matrix, a dynamic programming algorithm is used to search for the path with the highest total score among all compliant label sequences, outputting a unique optimal label sequence.
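The dynamic programming search is commonly implemented as Viterbi decoding; the sketch below assumes that is what is meant. The emission scores and the three-label set are made-up values for illustration, and start-of-sequence constraints are omitted for brevity.

```python
NEG_INF = float("-inf")

def viterbi(emissions, transitions):
    """Return the highest-scoring label index path.

    emissions[t][j]   -- score of label j at position t
    transitions[i][j] -- score of moving from label i to label j
    """
    n_labels = len(transitions)
    score = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_labels):
            best_i = max(range(n_labels), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final label.
    path = [max(range(n_labels), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

labels = ["O", "B-Subject", "I-Subject"]
transitions = [
    [0.0, 0.0, NEG_INF],  # O -> I-Subject is prohibited
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
emissions = [
    [2.0, 1.5, 0.0],
    [0.0, 0.0, 3.0],
    [1.0, 0.0, 0.5],
]
best_path = [labels[i] for i in viterbi(emissions, transitions)]
# best_path == ["B-Subject", "I-Subject", "O"]
```

Choosing the best label independently at each position would give O, I-Subject, O, which contains a prohibited transition. Decoding over the whole sequence instead selects the compliant path with the highest total score.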
[0049] Step S105: Determine the category to which the contract performance entity belongs in the contract text based on the target label sequence that conforms to the label transition matrix.
[0050] Optionally, the target label sequence is traversed to identify all consecutive label blocks that begin with "B-" followed by "I-", and the corresponding original character fragments are extracted as entity words and mapped to the corresponding contract performance entity category according to the "B-" label prefix.
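A minimal sketch of this traversal is shown below. The sample text and the label names are hypothetical, and the function assumes one label per character.

```python
def extract_entities(chars, labels):
    """Collect (entity_text, entity_type) pairs from a BIO-labelled character sequence."""
    entities = []
    current_chars, current_type = [], None

    def flush():
        if current_type is not None:
            entities.append(("".join(current_chars), current_type))

    for ch, label in zip(chars, labels):
        if label.startswith("B-"):
            flush()  # close any entity that was still open
            current_chars, current_type = [ch], label[2:]
        elif label.startswith("I-") and current_type == label[2:]:
            current_chars.append(ch)
        else:
            flush()  # "O" or an inconsistent "I-" ends the current entity
            current_chars, current_type = [], None
    flush()
    return entities

chars = list("Airline pays 500")
labels = (["B-ResponsibleEntity"] + ["I-ResponsibleEntity"] * 6
          + ["O"] * 6
          + ["B-Amount"] + ["I-Amount"] * 2)
entities = extract_entities(chars, labels)
# entities == [("Airline", "ResponsibleEntity"), ("500", "Amount")]
```

The entity type is read from the suffix of the "B-" label, so each extracted fragment arrives already paired with its category.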
[0051] Figure 2 is an optional overall method flowchart according to an embodiment of this application. As shown in Figure 2, the core entities for contract performance in the civil aviation field are first defined and a knowledge base is constructed. Then, contract data is collected, the text is preprocessed, and a civil aviation contract performance entity extraction model based on deep semantic enhancement is constructed. Finally, the model is validated and optimized, and entities are extracted from new contract samples.
[0052] In one optional embodiment, obtaining the semantic vector sequence of the contract text includes: preprocessing the contract text to generate a contract character sequence, wherein the preprocessing includes data cleaning, structured parsing and format standardization of the contract text; and encoding the contract character sequence based on a pre-trained model to obtain the semantic vector sequence.
[0053] Optionally, contract text refers to formal legal documents in the civil aviation field involving aircraft procurement, leasing, maintenance support, ground services, etc. They usually contain a large number of professional terms, nested clauses, numerical amounts, time limits and organization names, and are diverse in format and prone to noise.
[0054] Optionally, preprocessing refers to the standardization operations performed to adapt the original contract text to the subsequent model input, including three sub-steps: data cleaning, structured parsing, and format standardization, with the aim of eliminating interference, unifying expression, and clarifying structure.
[0055] Optionally, the contract character sequence refers to the ordered character sequence formed by disassembling the contract text into individual Chinese characters, punctuation marks, or numeric characters after preprocessing, which serves as the input basis for the pre-trained model.
[0056] Optionally, the system automatically identifies and removes headers, footers, page numbers, page breaks, tabs, redundant blank lines, and meaningless symbols from the contract, retaining only the plain text content. It also merges and corrects mixed Chinese and English text, abnormal spaces, and isolated punctuation to keep the text continuous. Then, a preset rule engine identifies the contract title, contracting parties, signing date, clause numbers, chapter structure, and table areas, dividing the complete contract into logical modules and providing a structured context for subsequent feature modeling. Finally, the system standardizes the text encoding format by converting full-width numbers and symbols to half-width, normalizing monetary expressions, and unifying the abbreviated and full forms of technical terms, which reduces semantic confusion caused by diverse expressions.
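A small sketch of the cleaning and format standardization steps follows; structured parsing by the rule engine is not covered. It assumes Unicode NFKC normalization is an acceptable way to fold full-width characters to half-width (NFKC also rewrites other compatibility characters), treats lines consisting only of a page number as noise, and uses a hypothetical abbreviation table. None of these specifics come from the application.

```python
import re
import unicodedata

PAGE_NUMBER = re.compile(r"(page\s*)?-?\s*\d+\s*-?", flags=re.IGNORECASE)

def preprocess(raw_text, abbreviations):
    """Clean a contract text and return it as a character sequence."""
    # NFKC folds full-width digits, letters and symbols to their half-width forms.
    text = unicodedata.normalize("NFKC", raw_text)
    lines = []
    for line in text.splitlines():
        line = line.replace("\t", " ").strip()
        # Heuristic: drop blank lines and lines that hold only a page number.
        if not line or PAGE_NUMBER.fullmatch(line):
            continue
        lines.append(re.sub(r" {2,}", " ", line))
    text = "\n".join(lines)
    # Expand abbreviations so each term has a single surface form.
    for short, full in abbreviations.items():
        pattern = rf"(?<![A-Za-z0-9]){re.escape(short)}(?![A-Za-z0-9])"
        text = re.sub(pattern, full, text)
    return list(text)

abbreviations = {"MRO": "maintenance, repair and overhaul"}  # hypothetical table
raw = "ＭＲＯ　service fee：１２００　ＵＳＤ\n\n- 3 -\nPayment\twithin  30 days"
chars = preprocess(raw, abbreviations)
cleaned = "".join(chars)
```

The function returns a list of single characters, which corresponds to the contract character sequence that serves as input to the pre-trained model.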
[0057] Optionally, by systematically eliminating noise, ambiguity, and format confusion in the original contract text, the contract character sequence input to the model can be consistent, readable, and semantically pure, providing stable and standardized input for subsequent encoding stages and reducing semantic vector distortion caused by variant spellings of the same word or structural confusion.
[0058] Optionally, the pre-trained model refers to a deep neural network model that has been pre-trained on a large-scale general corpus and has language understanding capabilities. The specific structure of the pre-trained model used in this embodiment is as follows:
[0059] Input layer: Receives the preprocessed contract character sequence. Each character is mapped to an initial vector at the character level, and positional encoding is superimposed to preserve character order information.
[0060] Embedding layer: Employs character-level or sub-word-level embedding methods to generate initial semantic representations, which helps to fully encode compound terms rather than perform incorrect segmentation;
[0061] Encoding layer: Consists of 12 (or 24) identical Transformer encoder layers stacked together, each layer containing two sub-modules:
[0062] Multi-head self-attention mechanism: Calculates the relevance weight of each character with all other characters in the sequence, and automatically identifies semantic associations across sentences;
[0063] Feedforward neural networks: perform nonlinear transformations and feature reorganization on the attention output to enhance semantic expressive power;
[0064] Output layer: Each encoder layer outputs a sequence of high-dimensional semantic vectors, and the final output is the sequence of character-level semantic vectors from the last layer (the 12th layer in the 12-layer configuration). Each vector contains the deep semantic information of the corresponding character in the complete contract context.
[0065] Optionally, the semantic vector sequence refers to mapping each character in the contract character sequence into a high-dimensional real number vector through a pre-trained model. Moreover, the vector not only represents the character itself, but also integrates the semantic role of the character in the context of the whole sentence, forming a context-aware semantic representation.
[0066] In an optional embodiment, before performing multi-dimensional semantic enhancement on the semantic vector sequence based on a pre-built domain knowledge base to generate a fusion feature sequence of contract text, the method further includes: sorting out historical contract texts and associated industry standards, extracting organizations, objects, resources, and regulatory entities related to performance obligations, forming multiple contract performance entity categories; constructing a domain dictionary, an entity relation library, and a terminology mapping table based on multiple contract performance entity categories, wherein the domain dictionary is used to represent the standardized semantic units of professional terms and their respective contract performance entity categories, the entity relation library is used to represent the business semantic association rules between contract performance entities, and the terminology mapping table is used to represent the semantic equivalence relationships between synonyms, abbreviations, and full names; and constructing a domain knowledge base based on the domain dictionary, entity relation library, and terminology mapping table.
[0067] Optionally, the contract performance entity category refers to the semantic category abstracted from the contract semantic level to represent the core elements of the performance obligation. This application divides them into four categories: performance responsibility entity, performance subject entity, performance support entity, and performance supervision entity. Each category corresponds to a set of entity types with clear business semantic boundaries, as shown in Table 1.
[0068] Table 1
[0069]
[0070] Optionally, the domain dictionary is a structured set of terms, each entry containing standardized terminology and the corresponding contract performance entity category, used to provide the model with strong term-category mapping priors.
[0071] Optionally, the entity relation database is a set of triples used to represent fixed association rules between different performance entities in the business context.
[0072] Optionally, the terminology mapping table is a set of semantic equivalence mappings used to express diverse but semantically identical terms in a unified form.
[0073] Optionally, the domain knowledge base is an integration of the above three types of structured resources. With "entity-relationship-term" as the core framework, it forms a domain knowledge hub with the ability to perform semantic classification, associative reasoning and unified expression, serving as an external constraint source for subsequent model enhancement.
[0074] Optionally, the domain dictionary, entity relation database, and terminology mapping table can be uniformly stored in a structured graph database or a JSON-formatted knowledge graph, and an index and query interface can be established. For example, when the model encounters "abbreviation X", it can automatically query the mapping table to find that it is equivalent to "a certain aviation administration authority", then use the dictionary to find that its category is "compliance supervision entity", and use the relation database to find that it often co-occurs with words such as "airworthiness standards", "approval", and "regulation", thereby providing multi-dimensional semantic clues for feature fusion.
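The three-step lookup in the example above (mapping table, then dictionary, then relation database) can be sketched with plain in-memory structures. The class name `DomainKnowledgeBase`, its method names, and the relation label "co-occurs with" are hypothetical; the entries are the placeholder values from the paragraph above, not real data.

```python
from dataclasses import dataclass, field

@dataclass
class DomainKnowledgeBase:
    """Toy container for the three structured resources (names are assumed)."""
    term_map: dict = field(default_factory=dict)    # variant -> standard term
    dictionary: dict = field(default_factory=dict)  # standard term -> category
    relations: list = field(default_factory=list)   # (head, relation, tail)

    def standardize(self, term: str) -> str:
        # Unknown terms are returned unchanged.
        return self.term_map.get(term, term)

    def category_of(self, term: str):
        # Returns None when the standardized term is not in the dictionary.
        return self.dictionary.get(self.standardize(term))

    def related_terms(self, term: str) -> list:
        std = self.standardize(term)
        return [tail for head, _, tail in self.relations if head == std]

authority = "a certain aviation administration authority"
kb = DomainKnowledgeBase(
    term_map={"abbreviation X": authority},
    dictionary={authority: "compliance supervision entity"},
    relations=[
        (authority, "co-occurs with", "airworthiness standards"),
        (authority, "co-occurs with", "approval"),
        (authority, "co-occurs with", "regulation"),
    ],
)

print(kb.standardize("abbreviation X"))
print(kb.category_of("abbreviation X"))
print(kb.related_terms("abbreviation X"))
```

A graph database or JSON knowledge graph, as mentioned above, would expose the same three queries behind an index; the dictionaries here only stand in for that interface.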
[0075] In one optional embodiment, based on a pre-built domain knowledge base, multi-dimensional semantic enhancement is performed on the semantic vector sequence to generate a fused feature sequence of the contract text. This includes: performing character-level boundary-aware mapping on the semantic vector sequence using structured parameters in the domain dictionary as constraints to obtain a character-level feature sequence, wherein the character-level boundary-aware mapping is used to capture the boundary starting position of the contract performance entity; performing terminology-level semantic orientation enhancement on the semantic vector sequence using professional terms in the domain dictionary as semantic anchors to obtain a terminology-level feature sequence, wherein the terminology-level semantic orientation enhancement is used to strengthen the distinction between terms and general vocabulary in the semantic space; performing clause-level structural modeling on the semantic vector sequence using business semantic associations in the entity relation database as logical guidance to obtain a clause-level feature sequence, wherein the clause-level structural modeling is used to improve the contextual consistency of the contract performance entity under complex semantic structures; and fusing the character-level feature sequence, the terminology-level feature sequence, and the clause-level feature sequence to obtain a fused feature sequence.
[0076] Optionally, based on the format specifications of structured parameters such as amount, time, number, and model in the domain dictionary, the original semantic vector sequence is transformed in a targeted manner so that the model can learn the distribution pattern at the character level.
[0077] Optionally, the semantic vector sequence can be semantically corrected using professional terms in the domain dictionary as supervision signals, thereby solving the problem of semantic shift of terms caused by the lack of domain semantics in general pre-trained models and improving the model's recognition accuracy and recall rate of industry-specific terms.
[0078] Optionally, based on the business rules in the entity relation database, the model dynamically learns which entity combinations have strong semantic relationships during the encoding process, thereby solving the problem that traditional models cannot understand the implicit logical relationships in the clauses. This enables the model to understand the "responsibility-object-basis" ternary structure and maintain the contextual consistency of entity semantics when facing nested sentences, conditional sentences, and cross-sentence references, thus improving the recognition stability under complex clauses.
[0079] Optionally, the character-level feature sequence, term-level feature sequence, and clause-level feature sequence are concatenated along the feature dimension to form a joint representation with expanded dimensions. Subsequently, a lightweight non-linear transformation layer is used for feature compression and fusion, outputting the final fused feature sequence, where each vector simultaneously contains:
[0080] Character level: Whether it is the beginning of a number / unit / encoding;
[0081] Terminology level: Whether it is a technical term and its category;
[0082] Clause level: Whether it is on the semantic association path of responsibility-condition-object.
[0083] Optionally, the system achieves multi-scale semantic integration from characters to semantics and from local to global through a fusion mechanism, enabling the model to have comprehensive judgment capabilities at each position, providing an information-rich, semantically clear, and discriminative input foundation for subsequent bidirectional sequence encoding and global label decoding.
[0084] This application first receives a preprocessed contract character sequence $X = (x_1, x_2, \dots, x_T)$, where $x_t$ denotes a single Chinese character or sub-word in the contract text. The character sequence is input directly into a pre-trained model that has been fine-tuned on a civil aviation corpus to obtain a sequence of context-dependent deep semantic vectors $H = (h_1, h_2, \dots, h_T)$, where $H$ is the contextual semantic representation output by the pre-trained model for the entire character sequence and $h_t$ is the context-dependent semantic vector of the $t$-th character.
[0085] The mapping formula for the term / lexical level is formula (1):
[0086] $h_t^{w} = W_w h_t + b_w$ (1)
[0087] where $W_w$ represents a linear transformation matrix and $b_w$ represents the bias term.
[0088] The mapping formula for the syntax / clause level is formula (2):
[0089] $h_t^{s} = W_s h_t + b_s$ (2)
[0090] where $W_s$ represents a linear transformation matrix and $b_s$ represents the bias term.
[0091] The mapping formula for the character level is formula (3):
[0092] $h_t^{c} = W_c h_t + b_c$ (3)
[0093] where $W_c$ represents a linear transformation matrix and $b_c$ represents the bias term.
[0094] The three features are then concatenated along the feature dimension to form a joint representation, given by formula (4):
[0095] $h_t^{cat} = [h_t^{c};\, h_t^{w};\, h_t^{s}]$ (4)
[0096] The final fused representation is obtained through a nonlinear mapping and is expressed as formula (5):
[0097] $f_t = \sigma(W_f h_t^{cat} + b_f)$ (5)
[0098] where $W_f$ represents the linear transformation matrix of the final fusion, $b_f$ represents the bias term of the final fusion, and $\sigma(\cdot)$ represents a non-linear activation function.
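The projection, concatenation, and fusion steps of formulas (1) to (5) can be sketched in NumPy as follows. All dimensions, the random initialization, the variable names, and the choice of tanh as the non-linear activation are assumptions made for illustration; in the described method the matrices would be learned and the knowledge base would constrain each view.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d, d_view, d_fused = 6, 16, 8, 12  # hypothetical sizes

# Stand-in for the semantic vector sequence output by the pre-trained model.
H = rng.normal(size=(T, d))

def linear_params(in_dim, out_dim):
    """Randomly initialized weight matrix and zero bias for one linear map."""
    return rng.normal(scale=0.1, size=(in_dim, out_dim)), np.zeros(out_dim)

W_w, b_w = linear_params(d, d_view)            # term level, formula (1)
W_s, b_s = linear_params(d, d_view)            # clause level, formula (2)
W_c, b_c = linear_params(d, d_view)            # character level, formula (3)
W_f, b_f = linear_params(3 * d_view, d_fused)  # fusion layer, formula (5)

H_w = H @ W_w + b_w
H_s = H @ W_s + b_s
H_c = H @ W_c + b_c

# Formula (4): concatenate the three views along the feature dimension.
H_cat = np.concatenate([H_c, H_w, H_s], axis=-1)

# Formula (5): non-linear fusion; tanh is an assumed activation.
F = np.tanh(H_cat @ W_f + b_f)

print(H_cat.shape, F.shape)
```

Concatenation triples the feature width, which is why the fusion layer also acts as a compression step back to a smaller dimension.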
[0099] In one optional embodiment, target processing is performed on the fused feature sequence to obtain a target feature sequence, including: encoding the fused feature sequence through a bidirectional temporal neural network to obtain a temporal feature sequence; performing global feature enhancement on the temporal feature sequence to obtain the global dependencies of each position within the temporal feature sequence, generating a weighted feature sequence; performing a residual connection between the weighted feature sequence and the temporal feature sequence to obtain a residual output sequence; and performing random deactivation processing on the residual output sequence to obtain the target feature sequence.
[0100] Optionally, a bidirectional temporal neural network refers to a recurrent neural network structure that can encode sequences from both forward and backward directions simultaneously. Its core capability is to capture complete contextual information for each character in the sequence, including preceding dependencies and following constraints.
[0101] Optionally, after bidirectional encoding, the temporal feature sequence is a high-dimensional vector sequence corresponding to each character position, which integrates the semantics of the preceding and following context. Each vector not only expresses the semantics of the current character, but also carries the semantic role of the current character in the entire contract terms.
[0102] Optionally, after inputting the fused feature sequence into a bidirectional temporal neural network, the state is passed forward character by character, starting from the first character, encoding the preceding context. Then, starting from the last character, the state is passed backward, encoding the following semantics. Finally, the forward and backward hidden states are concatenated to form a temporal feature sequence, where each position contains the complete contextual semantics of the current character within the entire contract. Through bidirectional encoding, the features of each character become context-aware semantic nodes, laying a temporal foundation for subsequent global modeling.
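[0103] Specifically, the formula for generating the forward hidden state is formula (6):
[0104] $\overrightarrow{h}_t = \overrightarrow{\mathrm{LSTM}}(f_t, \overrightarrow{h}_{t-1})$ (6)
[0105] where $f_t$ is the fused feature at time step $t$, $\overrightarrow{h}_{t-1}$ is the forward hidden state of the previous time step, and $\overrightarrow{h}_t$ is the forward hidden state of the current time step.
[0106] The formula for generating the backward hidden state is formula (7):
[0107] $\overleftarrow{h}_t = \overleftarrow{\mathrm{LSTM}}(f_t, \overleftarrow{h}_{t+1})$ (7)
[0108] where $\overleftarrow{h}_{t+1}$ is the backward hidden state of the next time step and $\overleftarrow{h}_t$ is the backward hidden state of the current time step.
[0109] By concatenating the two hidden states, a feature representation containing the complete context is generated, as given by formula (8):
[0110] $z_t = [\overrightarrow{h}_t;\, \overleftarrow{h}_t]$ (8)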
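The forward pass, backward pass, and concatenation of formulas (6) to (8) can be illustrated with a small NumPy sketch. To keep it short, a plain tanh recurrent cell is used here as a stand-in for the long short-term memory cell; the function names, sizes, and random weights are assumptions.

```python
import numpy as np

def run_direction(F, W_x, W_h, b, reverse=False):
    """Run a simple tanh recurrent cell over F in one direction."""
    T, d_h = F.shape[0], W_h.shape[0]
    h = np.zeros(d_h)
    states = np.zeros((T, d_h))
    steps = range(T - 1, -1, -1) if reverse else range(T)
    for t in steps:
        # The new state depends on the current input and the previous state
        # in the chosen direction.
        h = np.tanh(F[t] @ W_x + h @ W_h + b)
        states[t] = h
    return states

rng = np.random.default_rng(0)
T, d_in, d_h = 5, 12, 4  # hypothetical sizes

def make_params():
    return (rng.normal(scale=0.1, size=(d_in, d_h)),
            rng.normal(scale=0.1, size=(d_h, d_h)),
            np.zeros(d_h))

params_fwd, params_bwd = make_params(), make_params()

def encode_bidirectional(F):
    fwd = run_direction(F, *params_fwd)                # analogue of formula (6)
    bwd = run_direction(F, *params_bwd, reverse=True)  # analogue of formula (7)
    return np.concatenate([fwd, bwd], axis=-1)         # formula (8)

F = rng.normal(size=(T, d_in))  # stand-in for the fused feature sequence
Z = encode_bidirectional(F)
print(Z.shape)
```

Changing only the last input leaves the forward half of the first position untouched but alters its backward half, which is the sense in which each position sees both preceding and following context.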
[0111] Optionally, the goal of global feature enhancement is to automatically identify and strengthen the semantic positions that are strongly related to the contract performance entity while suppressing irrelevant or redundant information, so that attention is focused where it matters.
[0112] Optionally, the weighted feature sequence is an output sequence after global enhancement, in which each positional feature is assigned a different weight, which is determined by the semantic contribution of the feature in the global context.
[0113] Optionally, a global feature enhancement mechanism is applied to the temporal feature sequence to calculate the semantic correlation between any two positions in the sequence. Based on the correlation matrix, an attention weight is assigned to the feature at each position. The original temporal features are aggregated according to the weights to generate a weighted feature sequence, in which the features of key performance elements are amplified. This solves the problem that the model cannot focus on the core performance elements in long texts, enabling the model to automatically identify which words are key performance points and which are background noise, thereby improving the accuracy and anti-interference ability of entity recognition.
[0114] Optionally, residual connections can directly add the input features to the processed output features, preserving the integrity of the original information and reducing semantic loss caused by excessive transformation in deep networks. This alleviates the gradient vanishing and semantic degradation problems in deep networks and helps the model retain the underlying structural information of the original semantics after global enhancement. It also reduces the loss of context due to over-focusing, making the feature representation both focused and stable.
[0115] Figure 4 is a schematic diagram of an optional residual self-attention module according to an embodiment of this application. As shown in Figure 4, the query $Q$, key $K$, and value $V$ matrices are first constructed from the temporal feature sequence $Z$ using formula (9):
[0116] $Q = Z W_Q,\quad K = Z W_K,\quad V = Z W_V$ (9)
[0117] where $W_Q$, $W_K$, and $W_V$ are learnable transformation matrices. The self-attention calculation is given by formula (10):
[0118] $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V$ (10)
[0119] where $d_k$ indicates the dimension of $Q$ and $K$, and $\sqrt{d_k}$ is a scaling factor that keeps the inner product of $Q$ and $K$ from becoming too large.
[0120] Figure 3 is a schematic diagram of an optional residual self-attention module according to an embodiment of this application. As shown in Figure 3, after feature A is input into the SA and RA modules respectively, feature fusion and residual connection are performed to obtain feature B.
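Formulas (9) and (10) followed by the residual connection can be sketched in NumPy as below. This covers a single-head scaled dot-product attention plus the residual addition only; the RA branch shown in the figure is not modeled, and the function name, sizes, and random weights are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def residual_self_attention(Z, W_q, W_k, W_v):
    """Scaled dot-product self-attention with a residual connection."""
    Q, K, V = Z @ W_q, Z @ W_k, Z @ W_v        # formula (9)
    d_k = K.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))  # pairwise relevance, rows sum to 1
    weighted = weights @ V                     # formula (10)
    # Residual connection: add the input back to the weighted features.
    return weighted + Z, weights

rng = np.random.default_rng(0)
T, d = 5, 8  # hypothetical sizes; W_v keeps dimension d so the sum is defined
Z = rng.normal(size=(T, d))
W_q, W_k, W_v = (rng.normal(scale=0.1, size=(d, d)) for _ in range(3))

R, weights = residual_self_attention(Z, W_q, W_k, W_v)
print(R.shape, weights.sum(axis=-1))
```

Each row of `weights` is the attention distribution of one position over the whole sequence, so large entries mark the positions that contribute most to that position's weighted feature.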
[0121] Optionally, random deactivation refers to setting the feature values of certain dimensions in the residual output sequence to zero with a fixed probability during the model training phase, simulating a scenario where some information is lost, thus forcing the model to not rely on a single feature path.
[0122] Optionally, random deactivation is a regularization technique that improves the model's generalization ability by simulating feature loss. In civil aviation contracts, the same entity may have multiple expressions. Random deactivation forces the model to learn robust feature representations across expressions, rather than memorizing specific word forms, reducing overfitting and enhancing the model's adaptability to new contract texts.
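[0123] Specifically, with $R$ denoting the residual output sequence, $p$ the deactivation probability, and $\tilde{R}$ the target feature sequence, the formula for random deactivation is formula (11):
[0124] $\tilde{R} = \mathrm{Dropout}(R;\, p)$ (11)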
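Random deactivation can be sketched in NumPy as follows. The rescaling of the surviving values by 1 / (1 - p), often called inverted dropout, is an assumption based on common practice and is not stated in the text; the function name and the probability 0.5 are also illustrative.

```python
import numpy as np

def dropout(R, p, rng, training=True):
    """Zero each feature value with probability p during training only."""
    if not training or p == 0.0:
        return R
    keep_mask = rng.random(R.shape) >= p
    # Rescale survivors so the expected value of each feature is unchanged.
    return R * keep_mask / (1.0 - p)

rng = np.random.default_rng(0)
R = np.ones((4, 8))  # stand-in for the residual output sequence
R_train = dropout(R, p=0.5, rng=rng, training=True)
R_eval = dropout(R, p=0.5, rng=rng, training=False)
print(R_train)
```

At inference time the function returns its input unchanged, so the deactivation only affects training.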
[0125] In one optional embodiment, global label decoding is performed on the target feature sequence to obtain a target labeled sequence conforming to the label transition matrix. This includes: mapping the target feature sequence to a target score matrix over the sequence labels through a linear transformation, wherein the sequence labels are used to indicate the starting position, intermediate position, and non-entity position of the contract performance entity; constructing a label transition matrix based on sequence label constraint rules and domain semantic constraints, wherein the sequence label constraint rules are used to define compliant transition paths for sequence labels, and the domain semantic constraints are used to prohibit label combinations that conflict with contract semantics; and determining the target labeled sequence using a dynamic programming algorithm based on the target score matrix and the label transition matrix.
[0126] Optionally, the sequence labels are a discrete set of tags used to identify the semantic role of each character in the contract. This application uses three basic types of tags:
[0127] B-Category: Indicates the starting position of a certain type of contract performance entity, such as B-RE-RP indicating the starting position of "performance responsibility entity - right holder";
[0128] I-Category: Indicates the middle or end position of a certain type of entity, such as I-TA-LA indicating the internal characters of "Entity of Performance - Leased Entity";
[0129] O: Represents non-entity locations, such as punctuation marks, conjunctions, and irrelevant semantic blocks.
[0130] This application defines more than 20 tags, covering four types of performance entities (liability, subject matter, support, and supervision) and their subcategories.
[0131] Optionally, sequence labeling constraints refer to the syntactic transition restrictions defined by the BIO labeling system itself. For example, an I-Category tag cannot directly follow "O"; a B-Category tag must be followed by an I-Category tag of the same category or by "O", and cannot jump to another category.
[0132] Optionally, domain semantic constraints refer to semantic rationality restrictions defined by the business logic of civil aviation contracts, used to prohibit tag combinations that violate industry common sense.
[0133] Optionally, the label transition matrix represents the allowed score for transitioning from label i to label j. Valid transitions are assigned positive values, and invalid transitions are assigned extremely low values to automatically exclude illegal paths during decoding.
[0134] Optionally, the system first encodes all syntactic constraints required by the BIO specification. Then, based on the 156 entity-relationship and contract-clause logic rules sorted out by domain experts, it constructs semantic rationality constraint rules and finally encodes all rules into the transition matrix.
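Encoding the constraints into a transition matrix can be sketched as below. The BIO rule follows the literal wording above (a B-Category tag is followed only by the matching I-Category tag or by "O"). The score values, the function names, and the single domain rule are hypothetical placeholders; the actual 156 expert rules are not reproduced here.

```python
FORBIDDEN = -1e4  # "extremely low" score that rules a transition out
ALLOWED = 1.0     # placeholder positive score; a trained model would learn it

def is_bio_valid(prev: str, nxt: str) -> bool:
    """Syntactic BIO rule as worded in the text above."""
    if nxt.startswith("I-"):
        # I-X may only continue an entity of the same category X.
        return prev in ("B-" + nxt[2:], "I-" + nxt[2:])
    if prev.startswith("B-"):
        # Literal reading: B-X is followed by I-X or "O" only.
        return nxt == "O"
    return True

def build_transition_matrix(labels, forbidden_pairs=()):
    """Rows are the previous label, columns are the next label."""
    banned = set(forbidden_pairs)
    return [
        [ALLOWED if is_bio_valid(p, n) and (p, n) not in banned else FORBIDDEN
         for n in labels]
        for p in labels
    ]

labels = ["O", "B-RE-RP", "I-RE-RP", "B-TA-LA", "I-TA-LA"]
# Hypothetical domain semantic rule, for illustration only.
domain_rules = [("I-RE-RP", "B-RE-RP")]
A = build_transition_matrix(labels, domain_rules)
idx = {label: i for i, label in enumerate(labels)}

print(A[idx["O"]][idx["I-RE-RP"]], A[idx["B-RE-RP"]][idx["I-RE-RP"]])
```

Because forbidden transitions carry a very low score rather than being removed, the same matrix can be used unchanged by any decoder that sums transition and emission scores.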
[0135] Specifically, this application introduces a conditional random field on top of the target feature sequence $\tilde{R} = (\tilde{r}_1, \dots, \tilde{r}_T)$ and, combined with label transition constraints in the civil aviation field, performs globally optimal label sequence inference. First, a linear transformation maps the feature vector at each position to the emission score of each candidate label. Let the label set be $\mathcal{L} = \{l_1, \dots, l_K\}$ and let the label at position $t$ be $y_t \in \mathcal{L}$. The emission scores can be expressed as formula (12):
[0136] $P_t = W_o \tilde{r}_t + b_o,\quad P \in \mathbb{R}^{T \times K}$ (12)
[0137] where $T$ represents the total number of characters, $K$ indicates the number of labels, $W_o$ represents a linear transformation matrix, and $b_o$ indicates the bias.
[0138] Based on this, the label transition matrix $A \in \mathbb{R}^{K \times K}$ is constructed, where $A_{i,j}$ indicates the score of transitioning from label $l_i$ to label $l_j$.
[0139] Optionally, dynamic programming is an optimization method that finds the globally optimal path in the state space. In this application, the goal is to find the label sequence with the highest total score, which should maximize the local score of each character while ensuring that adjacent label transitions are legal.
[0140] Specifically, given the input sequence $X$ and a label sequence $y = (y_1, \dots, y_T)$, the total score of the entire labeled path is defined by formula (13):
[0141] $s(X, y) = \sum_{t=2}^{T} A_{y_{t-1}, y_t} + \sum_{t=1}^{T} P_{t, y_t}$ (13)
[0142] where $T$ represents the length of the label sequence, $A_{y_{t-1}, y_t}$ indicates the score of transitioning from label $y_{t-1}$ to label $y_t$, and $P_{t, y_t}$ indicates the emission score of label $y_t$ at the $t$-th position.
[0143] The conditional probability that the conditional random field layer assigns to the label sequence $y$ given the input $X$ is given by formula (14):
[0144] $p(y \mid X) = \frac{\exp(s(X, y))}{\sum_{y' \in \mathcal{Y}_X} \exp(s(X, y'))}$ (14)
[0145] where $\mathcal{Y}_X$ represents the set of all legal label sequences that conform to the civil aviation contract labeling rules and domain constraints.
[0146] The model parameters are estimated by maximizing the log-likelihood of the true label sequence $y^{*}$, i.e., formula (15):
[0147] $\log p(y^{*} \mid X) = s(X, y^{*}) - \log \sum_{y' \in \mathcal{Y}_X} \exp(s(X, y'))$ (15)
[0148] Finally, the normalization factor is calculated, and the sequence with the highest score among all legal label paths is found using the Viterbi algorithm, as shown in formula (16):
[0149] $\hat{y} = \arg\max_{y' \in \mathcal{Y}_X} s(X, y')$ (16)
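The dynamic programming search for the highest-scoring label path can be sketched in plain Python as a Viterbi decoder over an emission score table and a transition score table. The function name, the three-label set, and all score values are illustrative assumptions, and no start or end transitions are modeled.

```python
def viterbi(emissions, transitions):
    """Return the best label index path and its total score.

    emissions[t][j]: score of label j at position t.
    transitions[i][j]: score of moving from label i to label j.
    """
    n_labels = len(transitions)
    score = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_labels):
            # Best previous label for arriving at label j.
            best_i = max(range(n_labels),
                         key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j]
                             + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    last = max(range(n_labels), key=lambda i: score[i])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]

labels = ["O", "B-RE-RP", "I-RE-RP"]
# Only the illegal O -> I transition is penalized in this toy matrix.
transitions = [[0.0, 0.0, -1e4],
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
# Position 0 prefers "O" locally, position 1 prefers "I-RE-RP".
emissions = [[2.0, 1.0, 0.0],
             [0.0, 0.0, 3.0],
             [1.0, 0.0, 0.5]]

path, total = viterbi(emissions, transitions)
print([labels[i] for i in path], total)
```

Picking the best label at each position independently would give O, I-RE-RP, O, which contains the illegal O to I transition; the global search instead returns B-RE-RP, I-RE-RP, O with a total score of 5.0.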
[0150] In one optional embodiment, determining the category to which the contract performance entity in the contract text belongs based on the target annotation sequence conforming to the label transition matrix includes: extracting consecutive sequence label tags from the target annotation sequence and concatenating the extracted sequence label tags to obtain an entity string; mapping the entity string to a standard string based on a terminology mapping table; and determining the contract performance entity category to which the standard string belongs as the category to which the contract performance entity in the contract text belongs based on a domain dictionary.
[0151] Optionally, the target label sequence that conforms to the label transition matrix refers to the complete label sequence output after global label decoding that satisfies the BIO specification and domain semantic constraints, where each label corresponds to the semantic role of a character in the text.
[0152] Optionally, sequence label refers to a label used to identify entity boundaries and categories, with the prefix indicating position and the suffix indicating entity category.
[0153] Optionally, an entity string refers to the original text fragment formed by concatenating the original character corresponding to a "B-" tag with the original characters corresponding to the consecutive "I-" tags of the same entity category that follow it; it has not yet been standardized.
[0154] Optionally, the target label sequence is traversed to identify all label subsequences that begin with "B-", are followed by consecutive "I-", and end with "O" or the end of the sequence. For each identified subsequence, its start position, end position, entity category label, and the concatenated original string are recorded as input for subsequent processing.
[0155] Optionally, based on a domain dictionary, the final mapping from standardized text to business semantic categories is achieved, so that the string output by the model is truly transformed into a contract performance entity with clear business meaning.
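The traversal, standardization, and category lookup described above can be sketched in plain Python. The function names, the abbreviation "AAX", and the tag suffix "SU" are hypothetical; the mapping and dictionary entries reuse the placeholder example given earlier in this description.

```python
def extract_entities(chars, tags):
    """Return (start, end, category, text) for each B-/I- span."""
    entities, start, category = [], None, None
    # A trailing "O" sentinel closes an entity that runs to the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and category == tag[2:]
        if start is not None and not continues:
            entities.append((start, i - 1, category, "".join(chars[start:i])))
            start, category = None, None
        if tag.startswith("B-"):
            start, category = i, tag[2:]
    return entities

def resolve_category(entity_string, term_map, dictionary):
    """Map an entity string to its standard form, then to its category."""
    standard = term_map.get(entity_string, entity_string)
    return standard, dictionary.get(standard)

chars = list("AAX approves")
tags = ["B-SU", "I-SU", "I-SU"] + ["O"] * 9

term_map = {"AAX": "a certain aviation administration authority"}
dictionary = {
    "a certain aviation administration authority": "compliance supervision entity",
}

entities = extract_entities(chars, tags)
resolved = [resolve_category(text, term_map, dictionary)
            for _, _, _, text in entities]
print(entities, resolved)
```

Stray "I-" tags with no open entity are ignored by this sketch; after global label decoding with a constrained transition matrix they should not occur.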
[0156] Figure 4 is a schematic diagram of an optional contract performance entity extraction model architecture according to an embodiment of this application. As shown in Figure 4, the input character sequence X, for example, "A certain airline leases three aircraft of a certain type," is segmented and then input into the deep semantic understanding and feature fusion module. This module obtains a sequence of vectors through a pre-trained model and passes it to a multi-feature fusion module to obtain a fused representation. The fused features are then input into the context sequence modeling and feature optimization module, which includes multiple forward and backward long short-term memory networks. The result then enters the global label decoding and output module, which includes a residual self-attention module and a conditional random field module. Finally, the optimal label sequence is output, indicating whether each character of the input X is the beginning of an entity, the middle or end of an entity, or a non-entity position.
[0157] Figure 5 is a flowchart of an optional specific implementation method according to an embodiment of this application. As shown in Figure 5, the process begins by defining contract performance entities and building a knowledge base, followed by collecting and preprocessing contract data, establishing a performance entity identification model, and evaluating the model. After evaluation, the model parameters are adjusted and updated, and finally the model is saved for use in identifying performance entities in new contracts.
[0158] Figure 6 is a schematic diagram of an optional device for determining the type of contract performance entity according to an embodiment of this application. According to another aspect of an embodiment of this application, a device for determining the type of contract performance entity is also provided, including: an acquisition unit 601, a first processing unit 602, a second processing unit 603, a third processing unit 604, and a determination unit 605.
[0159] The device includes: an acquisition unit 601 for acquiring a semantic vector sequence of the contract text; a first processing unit 602 for performing multi-dimensional semantic enhancement on the semantic vector sequence based on a pre-built domain knowledge base to generate a fused feature sequence of the contract text, wherein the multi-dimensional semantic enhancement includes targeted modeling of character-level boundaries, term-level semantics, and clause-level logical structure of the semantic vector sequence; a second processing unit 603 for performing target processing operations on the fused feature sequence to obtain a target feature sequence, wherein the target processing operations include bidirectional sequence encoding, global feature enhancement, residual connections, and random deactivation processing, the bidirectional sequence encoding generates character-level features with bidirectional contextual dependencies through forward and backward neural networks, the global feature enhancement is used to highlight key feature sequences and suppress redundant feature sequences, and the random deactivation is used to set some feature dimensions in the feature sequence to zero; a third processing unit 604 for performing global label decoding on the target feature sequence to obtain a target labeled sequence that conforms to the label transition matrix, wherein the label transition matrix is used to prohibit label transitions in the target feature sequence that violate the semantic logic of the contract; and a determination unit 605 for determining the category to which the contract performance entity in the contract text belongs based on the target labeled sequence that conforms to the label transition matrix.
[0160] Optionally, the acquisition unit 601 includes: a preprocessing subunit for preprocessing the contract text to generate a contract character sequence, wherein the preprocessing includes data cleaning, structured parsing and format standardization of the contract text; and a pretraining subunit for encoding the contract character sequence based on a pretrained model to obtain a semantic vector sequence.
[0161] Optionally, the device for determining the contract performance entity category further includes: an extraction unit, used to sort through historical contract texts and related industry standards, extract organizations, objects, resources and regulatory entities related to the performance obligations, and form multiple contract performance entity categories; a first construction unit, used to construct a domain dictionary, an entity relation library and a terminology mapping table based on multiple contract performance entity categories, wherein the domain dictionary is used to represent the standardized semantic units of professional terms and their respective contract performance entity categories, the entity relation library is used to represent the business semantic association rules between contract performance entities, and the terminology mapping table is used to represent the semantic equivalence relationships between synonyms, abbreviations and full names; and a second construction unit, used to construct a domain knowledge base based on the domain dictionary, entity relation library and terminology mapping table.
[0162] Optionally, the first processing unit 602 includes: a first processing subunit, configured to perform character-level boundary-aware mapping on the semantic vector sequence using structured parameters in the domain dictionary as constraints to obtain a character-level feature sequence, wherein the character-level boundary-aware mapping is used to capture the boundary starting position of the contract performance entity; a second processing subunit, configured to perform terminology-level semantic orientation enhancement on the semantic vector sequence using professional terms in the domain dictionary as semantic anchors to obtain a terminology-level feature sequence, wherein the terminology-level semantic orientation enhancement is used to strengthen the distinction between terms and general vocabulary in the semantic space; a third processing subunit, configured to perform clause-level structural modeling on the semantic vector sequence using business semantic associations in the entity relation database as logical guidance to obtain a clause-level feature sequence, wherein the clause-level structural modeling is used to improve the contextual consistency of the contract performance entity under complex semantic structures; and a fusion subunit, configured to fuse the character-level feature sequence, the terminology-level feature sequence, and the clause-level feature sequence to obtain a fused feature sequence.
[0163] Optionally, the second processing unit 603 includes: a first processing subunit, used to encode the fused feature sequence through a bidirectional temporal neural network to obtain a temporal feature sequence; a second processing subunit, used to perform global feature enhancement on the temporal feature sequence to obtain the global dependencies of each position within the temporal feature sequence and generate a weighted feature sequence; a third processing subunit, used to perform a residual connection between the weighted feature sequence and the temporal feature sequence to obtain a residual output sequence; and a fourth processing subunit, used to perform random deactivation processing on the residual output sequence to obtain a target feature sequence.
[0164] Optionally, the third processing unit 604 includes: a processing subunit, used to map the target feature sequence to a target score matrix over the sequence labels through a linear transformation, wherein the sequence labels are used to indicate the starting position, intermediate position, and non-entity position of the contract performance entity; a construction subunit, used to construct a label transition matrix based on sequence label constraint rules and domain semantic constraints, wherein the sequence label constraint rules are used to define compliant transition paths of sequence labels, and the domain semantic constraints are used to prohibit label combinations that conflict with contract semantics; and a determination subunit, used to determine the target label sequence through a dynamic programming algorithm based on the target score matrix and the label transition matrix.
[0165] Optionally, the determining unit 605 includes: an extraction subunit, used to extract continuous sequence label tags from the target label sequence and concatenate the extracted sequence label tags to obtain an entity string; a mapping subunit, used to map the entity string to a standard string based on a terminology mapping table; and a determining subunit, used to determine the contract performance entity category to which the standard string belongs as the category to which the contract performance entity belongs in the contract text, based on a domain dictionary.
[0166] According to another aspect of the embodiments of this application, a computer-readable storage medium is also provided, which stores a computer program, wherein when the computer program is executed, it causes the device where the computer-readable storage medium is located to perform the above-described method for determining the type of contract performance entity.
[0167] According to another aspect of the embodiments of this application, an electronic device is also provided, including one or more processors and a memory, the memory being used to store one or more programs, wherein when the one or more programs are executed by the one or more processors, the one or more programs cause the one or more processors to perform the above-described method for determining the type of contract performance entity.
[0168] According to another aspect of the embodiments of this application, a computer program product is also provided, including a computer program or instructions, which, when executed by a processor, implement the above-described method for determining the type of contract performance entity.
[0169] The sequence numbers of the embodiments in this application are for descriptive purposes only and do not represent the superiority or inferiority of the embodiments.
[0170] In the above embodiments of this application, the descriptions of each embodiment have different focuses. For parts not described in detail in a certain embodiment, please refer to the relevant descriptions of other embodiments.
[0171] In the several embodiments provided in this application, it should be understood that the disclosed technical content can be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division of units can be a logical functional division, and in actual implementation, there may be other division methods. For instance, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between units or modules may be electrical or take other forms.
[0172] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0173] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.
[0174] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), portable hard drive, magnetic disk, or optical disk.
[0175] The above description is only a preferred embodiment of this application. It should be noted that for those skilled in the art, several improvements and modifications can be made without departing from the principle of this application, and these improvements and modifications should also be considered within the scope of protection of this application.
Claims
1. A method for determining the category of a contract performance entity, characterized in that it comprises: obtaining a semantic vector sequence of a contract text; performing multi-dimensional semantic enhancement on the semantic vector sequence based on a pre-built domain knowledge base to generate a fused feature sequence of the contract text, wherein the multi-dimensional semantic enhancement includes targeted modeling of character-level boundaries, term-level semantics, and clause-level logical structure of the semantic vector sequence; performing target processing operations on the fused feature sequence to obtain a target feature sequence, wherein the target processing operations include bidirectional sequence encoding, global feature enhancement, residual connection, and random deactivation, the bidirectional sequence encoding represents character-level features with bidirectional contextual dependencies generated by forward and backward neural networks, the global feature enhancement is used to highlight key feature sequences and suppress redundant feature sequences, and the random deactivation is used to set some feature dimensions in the feature sequence to zero; performing global label decoding on the target feature sequence to obtain a target label sequence that conforms to a label transition matrix, wherein the label transition matrix is used to prohibit label transitions in the target feature sequence that violate contract semantic logic; and determining, based on the target label sequence that conforms to the label transition matrix, the category to which the contract performance entity in the contract text belongs.
2. The method according to claim 1, characterized in that obtaining the semantic vector sequence of the contract text includes: preprocessing the contract text to generate a contract character sequence, wherein the preprocessing includes data cleaning, structured parsing, and format standardization of the contract text; and encoding the contract character sequence based on a pre-trained model to obtain the semantic vector sequence.
3. The method according to claim 1, characterized in that, before performing multi-dimensional semantic enhancement on the semantic vector sequence based on the pre-built domain knowledge base to generate the fused feature sequence of the contract text, the method further includes: extracting organizations, subjects, resources, and regulatory entities related to performance obligations by reviewing historical contract texts and related industry standards, so as to form multiple contract performance entity categories; constructing a domain dictionary, an entity relation library, and a terminology mapping table based on the multiple contract performance entity categories, wherein the domain dictionary is used to represent the standardized semantic units of professional terms and their respective contract performance entity categories, the entity relation library is used to represent the business semantic association rules between contract performance entities, and the terminology mapping table is used to represent the semantic equivalence relationships between synonyms, abbreviations, and full names; and constructing the domain knowledge base based on the domain dictionary, the entity relation library, and the terminology mapping table.
4. The method according to claim 3, characterized in that performing multi-dimensional semantic enhancement on the semantic vector sequence based on the pre-built domain knowledge base to generate the fused feature sequence of the contract text includes: performing character-level boundary-aware mapping on the semantic vector sequence, using the structured parameters in the domain dictionary as constraints, to obtain a character-level feature sequence, wherein the character-level boundary-aware mapping is used to capture the boundary start position of the contract performance entity; performing term-level semantic orientation enhancement on the semantic vector sequence, using the specialized terms in the domain dictionary as semantic anchors, to obtain a term-level feature sequence, wherein the term-level semantic orientation enhancement is used to strengthen the distinction between terms and general vocabulary in the semantic space; performing clause-level structure modeling on the semantic vector sequence, guided by the business semantic associations in the entity relation library, to obtain a clause-level feature sequence, wherein the clause-level structure modeling is used to improve the contextual consistency of the contract performance entity under complex semantic structures; and fusing the character-level feature sequence, the term-level feature sequence, and the clause-level feature sequence to obtain the fused feature sequence.
5. The method according to claim 1, characterized in that performing target processing operations on the fused feature sequence to obtain the target feature sequence includes: encoding the fused feature sequence through a bidirectional temporal neural network to obtain a temporal feature sequence; performing global feature enhancement on the temporal feature sequence to obtain the global dependencies of each position within the temporal feature sequence and generate a weighted feature sequence; applying a residual connection between the weighted feature sequence and the temporal feature sequence to obtain a residual output sequence; and performing random deactivation on the residual output sequence to obtain the target feature sequence.
6. The method according to claim 1, characterized in that performing global label decoding on the target feature sequence to obtain the target label sequence that conforms to the label transition matrix includes: mapping the target feature sequence to a target score matrix of each sequence label through a linear transformation, wherein the sequence label is used to indicate the starting position, intermediate position, and non-entity position of the contract performance entity; constructing the label transition matrix based on sequence label constraint rules and domain semantic constraints, wherein the sequence label constraint rules are used to define compliant transition paths between sequence labels, and the domain semantic constraints are used to prohibit label combinations that conflict with contract semantics; and determining the target label sequence through a dynamic programming algorithm based on the target score matrix and the label transition matrix.
7. The method according to claim 3, characterized in that determining, based on the target label sequence that conforms to the label transition matrix, the category to which the contract performance entity in the contract text belongs includes: extracting consecutive sequence labels from the target label sequence and concatenating the extracted sequence labels to obtain an entity string; mapping the entity string to a standard string based on the terminology mapping table; and determining, based on the domain dictionary, the contract performance entity category to which the standard string belongs as the category to which the contract performance entity in the contract text belongs.
8. A device for determining the type of contract performance entity, characterized in that it comprises: an acquisition unit, used to acquire a semantic vector sequence of a contract text; a first processing unit, used to perform multi-dimensional semantic enhancement on the semantic vector sequence based on a pre-built domain knowledge base to generate a fused feature sequence of the contract text, wherein the multi-dimensional semantic enhancement includes targeted modeling of character-level boundaries, term-level semantics, and clause-level logical structure of the semantic vector sequence; a second processing unit, used to perform target processing operations on the fused feature sequence to obtain a target feature sequence, wherein the target processing operations include bidirectional sequence encoding, global feature enhancement, residual connection, and random deactivation, the bidirectional sequence encoding represents character-level features with bidirectional contextual dependencies generated by forward and backward neural networks, the global feature enhancement is used to highlight key feature sequences and suppress redundant feature sequences, and the random deactivation is used to set some feature dimensions in the feature sequence to zero; a third processing unit, used to perform global label decoding on the target feature sequence to obtain a target label sequence that conforms to a label transition matrix, wherein the label transition matrix is used to prohibit label transitions in the target feature sequence that violate contract semantic logic; and a determining unit, used to determine, based on the target label sequence that conforms to the label transition matrix, the category to which the contract performance entity in the contract text belongs.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, wherein when the computer program is executed, the device in which the computer-readable storage medium is located performs the method for determining the type of contract performance entity as described in any one of claims 1 to 7.
10. An electronic device, characterized in that it includes one or more processors and a memory for storing one or more programs, wherein when the one or more programs are executed by the one or more processors, the one or more programs cause the one or more processors to perform the method for determining the type of contract performance entity as described in any one of claims 1 to 7.
11. A computer program product, characterized in that it includes a computer program or instructions that, when executed by a processor, implement the method for determining the type of contract performance entity as described in any one of claims 1 to 7.