Civil aviation knowledge graph construction method and device and electronic equipment

By constructing a civil aviation knowledge graph and combining entity recognition, relation extraction, and knowledge fusion technologies, the problem of integrating multi-source data in the civil aviation field has been solved, enabling comprehensive organization and query services of civil aviation knowledge and improving the knowledge management capabilities of civil aviation airport construction.

CN116720582B · Active · Publication Date: 2026-06-23 · CHINA INT ENG CONSULTING CORP

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
CHINA INT ENG CONSULTING CORP
Filing Date
2023-05-04
Publication Date
2026-06-23


Abstract

The application provides a civil aviation knowledge graph construction method and device and electronic equipment. The method provides a data basis for civil aviation field multi-source knowledge fusion by acquiring civil aviation related data and civil aviation standard specification data. Entity recognition and relationship extraction are performed on the civil aviation related data to construct a first knowledge graph, and a second knowledge graph is constructed according to the entry information of the civil aviation standard specification data, thereby constructing knowledge graphs of different data sources in the civil aviation field. Different sources of knowledge graphs are fused to obtain a civil aviation knowledge graph, and the same entity description is processed to make different knowledge graphs interrelated.

Description

Technical Field

[0001] This application relates to the field of knowledge graph technology, and in particular to a method, apparatus and electronic device for constructing a civil aviation knowledge graph.

Background Technology

[0002] With the increasing demand for intelligentization across various industries, the need to apply artificial intelligence to process and understand data is growing. Simultaneously, knowledge graph technology has also experienced rapid development. Domain knowledge graphs are used to organize, analyze, and mine knowledge within a specific domain and are already widely used in fields such as medicine and finance, but their application in the construction of civil aviation airports is relatively rare.

Summary of the Invention

[0003] In view of this, the purpose of this application is to propose a method, apparatus and electronic device for constructing a civil aviation knowledge graph.

[0004] The first aspect of this application provides a method for constructing a civil aviation knowledge graph, including:

[0005] Obtain civil aviation-related data and civil aviation standard and specification data;

[0006] Entity recognition and relationship extraction are performed on the civil aviation-related data, and a first knowledge graph is constructed based on the obtained entities and relationships;

[0007] A second knowledge graph is constructed based on the term information in the aforementioned civil aviation standards and specifications data;

[0008] The first knowledge graph and the second knowledge graph are fused to obtain a civil aviation knowledge graph.

[0009] A second aspect of this application provides an apparatus for constructing a civil aviation knowledge graph, comprising:

[0010] The data acquisition module is configured to acquire civil aviation-related data and civil aviation standard and specification data;

[0011] The first construction module is configured to perform entity recognition and relation extraction on the civil aviation-related data, and construct a first knowledge graph based on the obtained entities and relations.

[0012] The second construction module is configured to construct a second knowledge graph based on the term information in the civil aviation standard specification data.

[0013] The knowledge fusion module is configured to fuse the first knowledge graph and the second knowledge graph to obtain a civil aviation knowledge graph.

[0014] A third aspect of this application also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable by the processor, wherein the processor implements the method described above when executing the computer program.

[0015] A fourth aspect of this application also provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the methods described above.

[0016] As described above, the civil aviation knowledge graph construction method, apparatus, electronic device, and storage medium provided in this application provide a data foundation for multi-source knowledge fusion in the civil aviation field by acquiring civil aviation-related data and civil aviation standard and specification data. Entity recognition and relationship extraction are performed on the civil aviation-related data to construct a first knowledge graph, and term information from the civil aviation standard and specification data is used to construct a second knowledge graph, thus constructing knowledge graphs from different data sources in the civil aviation field. Entity fusion is then performed on the knowledge graphs from different sources to obtain the civil aviation knowledge graph. Identical entity descriptions are standardized to enable connections between different knowledge graphs. Based on the civil aviation knowledge graph, services such as visualization and relational query can be provided, as well as graph-based knowledge question answering and causal tracing of civil aviation airport construction events.

Attached Figure Description

[0017] To more clearly illustrate the technical solutions in this application or related technologies, the drawings used in the description of the embodiments or related technologies will be briefly introduced below. Obviously, the drawings described below are only embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0018] Figure 1 is a flowchart of the method for constructing a civil aviation knowledge graph according to an embodiment of this application;

[0019] Figure 2 is a schematic diagram of civil aviation-related data and civil aviation standard and specification data in an embodiment of this application;

[0020] Figure 3 is a schematic diagram of the structure of the civil aviation knowledge graph construction device according to an embodiment of this application;

[0021] Figure 4 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of this application.

Detailed Implementation

[0022] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with specific embodiments and the accompanying drawings.

[0023] It should be noted that, unless otherwise defined, the technical or scientific terms used in the embodiments of this application should have the ordinary meaning understood by one of ordinary skill in the art to which this application pertains. The terms "first," "second," and similar terms used in the embodiments of this application do not indicate any order, quantity, or importance, but are merely used to distinguish different components. Terms such as "comprising" or "including" mean that the element or object preceding the word encompasses the elements or objects listed after the word and their equivalents, without excluding other elements or objects. Terms such as "connected" or "linked" are not limited to physical or mechanical connections, but can include electrical connections, whether direct or indirect. Terms such as "upper," "lower," "left," and "right" are only used to indicate relative positional relationships; when the absolute position of the described object changes, the relative positional relationship may also change accordingly.

[0024] Civil aviation knowledge encompasses a wide range of fields, including geography, humanities, architecture, and aviation. Its sources include both structured data, such as a comparison table of current and revised airport master plans, and semi-structured and unstructured data, such as airport construction master plan review reports. Existing civil aviation knowledge graphs only extract and construct knowledge graphs from a subset of data sources, making it difficult to effectively organize knowledge across the entire civil aviation industry. Civil aviation knowledge includes both factual industry knowledge such as airport construction and route planning, as well as cognitive knowledge formed through scientific research in the field. Simply extracting and summarizing one aspect is insufficient to form a comprehensive understanding of civil aviation knowledge. To address this issue, the fusion of multi-source knowledge is necessary. However, the fusion of heterogeneous civil aviation data from multiple sources still faces technical bottlenecks, objectively hindering the integration of civil aviation knowledge. Therefore, this application proposes a method for constructing a civil aviation knowledge graph, and the embodiments of this application are described in detail below with reference to the accompanying drawings.

[0025] Figure 1 is a flowchart of a method for constructing a civil aviation knowledge graph. As shown in Figure 1, the method for constructing a civil aviation knowledge graph includes the following steps:

[0026] Step 102: Obtain civil aviation-related data and civil aviation standard and specification data.

[0027] Specifically, civil aviation-related data mainly comes from the site selection consultation and review reports of various airports. These reports contain a series of factual entities related to airport construction, such as terminals, runways, and engineering costs, along with their attributes and relationships. They possess characteristics of both physicality and dynamism, and the types of entities described can be categorized into five dimensions: natural environment, engineering construction, engineering economics, airport structure, and social entities. The natural environment includes natural entities such as meteorological conditions, engineering geological conditions, and seismic conditions of the airport's location; engineering construction includes factors required for airport construction, such as air traffic volume and earthwork volume; engineering economics includes various costs required for airport construction, such as terminal area engineering fees; airport structure includes various man-made engineering projects, such as runways, terminals, and parking lots; and social entities include civil aviation airport management agencies and related individuals.

[0028] Civil aviation standard and specification data include concepts, norms, and industry standards in the field of civil aviation airport construction. These data can also be regarded as the disciplinary knowledge of civil aviation airport construction. They are cognitive knowledge, characterized by abstractness and universality, and are mainly drawn from the site selection specifications and general construction guidelines in the field of civil aviation airport construction, such as the "MH5001 Civil Airport Flight Area Technical Standards", "General Airport Construction Specifications", and "Civil Transport Airport Site Selection Specifications".

[0029] For example, Table 1 provides specific examples of civil aviation-related data and civil aviation standard and specification data.

[0030] Table 1. Examples of Civil Aviation-Related Data and Civil Aviation Standards and Specifications

[0031]

[0032] By acquiring data from these different sources and integrating civil aviation business data with subject knowledge data, knowledge related to civil aviation airport construction is associated, providing a multi-source data foundation for the subsequent construction of a comprehensive knowledge graph.

[0033] Step 104: Perform entity recognition and relation extraction on the civil aviation-related data, and construct a first knowledge graph based on the obtained entities and relations.

[0034] Specifically, in a knowledge graph, actual things, objects, or concepts are called entities, and they are the basic building blocks of the knowledge graph. The "entity-relationship-entity" triple is the basic pattern used to represent entities and the relationships between them, and it is also the smallest unit of a knowledge graph. Structured data tables can be directly converted into triples. For unstructured data, entities and relations need to be extracted. The common approach is to first extract entities, then classify relations based on the extracted entities, and finally construct the first knowledge graph using these entities and relations.
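The direct conversion of a structured table into triples can be sketched as follows. This is a minimal illustration rather than the method of the application: the `Triple` type, the `rows_to_triples` helper, the column names, and the row values are all hypothetical placeholders, and the sketch assumes that one column identifies the head entity while every other non-empty cell becomes one triple.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


def rows_to_triples(rows, key_column):
    """Turn each non-empty, non-key cell of a row into a (key, column, value) triple."""
    triples = []
    for row in rows:
        head = row[key_column]
        for column, value in row.items():
            if column == key_column or value in (None, ""):
                continue  # skip the key itself and empty cells
            triples.append(Triple(head, column, str(value)))
    return triples


# Hypothetical rows; the values are placeholders, not data from the application.
rows = [
    {"airport": "Airport A", "located_in": "Region X", "runway_count": 2},
    {"airport": "Airport B", "located_in": "Region Y", "runway_count": ""},
]
triples = rows_to_triples(rows, key_column="airport")
```

Whether a column such as `runway_count` should be stored as a relation or as an attribute is a modelling choice that this sketch leaves open.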

[0035] Step 106: Construct a second knowledge graph based on the term information in the civil aviation standard and specification data.

[0036] Because civil aviation standard specifications contain structured information, a second knowledge graph can be constructed directly from the hierarchical relationships, hypernym-hyponym (superordinate-subordinate) relationships, and reference relationships between terms, without complex entity and relationship extraction.

[0037] Step 108: Perform knowledge fusion on the first knowledge graph and the second knowledge graph to obtain a civil aviation knowledge graph.

[0038] Specifically, due to the different information sources of the first and second knowledge graphs, their knowledge description systems also differ slightly. Entities with the same semantics may have different expressions in different knowledge graphs, and entities with the same name may represent different objects. Therefore, it is necessary to perform knowledge fusion on the first and second knowledge graphs, unifying the descriptions of the same entity or concept from different data sources, enabling different knowledge graphs to communicate with each other, and ultimately integrating them into a single knowledge graph. Knowledge fusion can also be called ontology matching, ontology alignment, entity disambiguation, etc., but its essential purpose is the same. In this embodiment, knowledge fusion is not simply about splicing the two knowledge graphs together, but rather about further discovering entities with equivalent relationships in the two knowledge graphs, unifying and aligning them, merging them into one entity, and obtaining the civil aviation knowledge graph after fusion.
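The difference between splicing two graphs and fusing them can be illustrated with a small sketch. It assumes that the equivalent entity pairs have already been discovered by some alignment step and only shows the final merge. The `merge_equivalent_entities` function, the union-find bookkeeping, and the example entity and relation names are illustrative assumptions, not details from the application.

```python
def merge_equivalent_entities(triples, equivalent_pairs):
    """Rewrite triples so that each group of equivalent entities shares one name."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in equivalent_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # the first name of the pair becomes canonical

    fused, seen = [], set()
    for head, relation, tail in triples:
        triple = (find(head), relation, find(tail))
        if triple not in seen:  # drop triples that become duplicates after merging
            seen.add(triple)
            fused.append(triple)
    return fused


# Hypothetical triples from the two graphs.
graph_1 = [
    ("Airport A", "has", "meteorological conditions"),
    ("meteorological conditions", "includes", "precipitation"),
]
graph_2 = [
    ("weather conditions", "includes", "precipitation"),
    ("weather conditions", "includes", "temperature"),
]
pairs = [("meteorological conditions", "weather conditions")]
fused = merge_equivalent_entities(graph_1 + graph_2, pairs)
```

After the merge, the triple about "temperature" is attached to the same node as the triples from the first graph, which plain concatenation would not achieve.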

[0039] Based on steps 102 to 108 above, this embodiment provides a method for constructing a civil aviation knowledge graph. By acquiring civil aviation-related data and civil aviation standard and specification data, it provides a data foundation for multi-source knowledge fusion in the civil aviation field. Entity recognition and relationship extraction are performed on the civil aviation-related data to construct a first knowledge graph, and term information from the civil aviation standard and specification data is used to construct a second knowledge graph, thus constructing knowledge graphs from different data sources in the civil aviation field. The knowledge graphs from different sources are then fused to obtain the civil aviation knowledge graph. Identical entity descriptions are standardized to allow different knowledge graphs to be interconnected. Based on the civil aviation knowledge graph, services such as visualization and related queries can be provided, as well as graph-based question answering and causal tracing services for civil aviation airport construction events.

[0040] In some embodiments, the entity recognition and relationship extraction performed on the civil aviation-related data includes:

[0041] A trained bidirectional long short-term memory neural network (BiLSTM) combined with a conditional random field (CRF) is used to perform entity recognition on the civil aviation-related data to obtain multiple entities;

[0042] Template matching and co-occurrence analysis are used to extract relationships between multiple entities to obtain the relationships between them.

[0043] In this embodiment, since the civil aviation-related data mainly comes from unstructured descriptive texts such as "Airport Construction Master Plan Review Reports," the following extraction method is required to accurately extract entities and relationships. Entity extraction employs a combination of a Bi-directional Long Short-Term Memory (BiLSTM) neural network and a Conditional Random Field (CRF). BiLSTM is a bidirectional recurrent neural network capable of capturing long-range contextual information from text, thus aiding in entity location. However, it lacks sentence-level feature analysis capabilities, requiring the assistance of CRF to improve the final annotation accuracy. Combining BiLSTM with CRF leverages the advantages of both, enhancing the accuracy of entity extraction. Entity types are predefined during entity recognition. After labeling a portion of the data according to these predefined entity types, the BiLSTM is trained. The trained BiLSTM can then identify entities in the input civil aviation-related data. After obtaining multiple entities through entity recognition, template matching and co-occurrence analysis are used to extract relationships between entities.

[0044] In some embodiments, the step of using template matching and co-occurrence analysis to extract relationships between multiple entities to obtain the relationships between the entities includes:

[0045] The co-occurrence analysis method is used to identify associated entities whose co-occurrence frequency exceeds a preset frequency threshold; the template matching method is used to add relationships to the associated entities.

[0046] Specifically, co-occurrence analysis determines whether entities are related by constructing a co-occurrence matrix. Its basic assumption is that "closely related entities will appear together in multiple segments of the text." First, statistical methods are used to count how often each entity appears in the text. Then, the co-occurrence frequency of each pair of entities is analyzed. When the co-occurrence frequency of two entities exceeds a preset frequency threshold, a relationship is considered to exist between them. An example co-occurrence matrix is shown in Table 2: the entities "Shanghai" and "Hongqiao Airport" co-occur twice, and the entities "Tianjin" and "Binhai Airport" co-occur three times. If the preset frequency threshold is 1, then "Shanghai" and "Hongqiao Airport" are associated entities, and "Tianjin" and "Binhai Airport" are also associated entities. In this way, co-occurrence analysis identifies all associated entities in the civil aviation-related data.

[0047] Table 2. Entity Matrix

[0048]
                    Hubei   Shanghai   Guangzhou   Tianjin
Hongqiao Airport    0       2          0           0
Binhai Airport      0       0          0           3
Yancheng Airport    0       0          0           0
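The counting and thresholding described for Table 2 can be sketched as follows. The segment list is a hypothetical input chosen so that the counts match Table 2; the function names are illustrative, and the sketch counts every unordered entity pair per segment, whereas a real pipeline might restrict the pairs to particular entity types.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(segments):
    """Count, for each unordered entity pair, the segments that mention both."""
    counts = Counter()
    for entities in segments:
        for pair in combinations(sorted(set(entities)), 2):
            counts[pair] += 1
    return counts


def associated_pairs(counts, threshold):
    """Keep the pairs whose co-occurrence count exceeds the threshold."""
    return {pair for pair, n in counts.items() if n > threshold}


# Entities recognized in each text segment (hypothetical segments).
segments = [
    ["Shanghai", "Hongqiao Airport"],
    ["Shanghai", "Hongqiao Airport"],
    ["Tianjin", "Binhai Airport"],
    ["Tianjin", "Binhai Airport"],
    ["Tianjin", "Binhai Airport"],
    ["Yancheng Airport"],
    ["Guangzhou"],
]
counts = cooccurrence_counts(segments)
related = associated_pairs(counts, threshold=1)
```

With a threshold of 1, both pairs from Table 2 are kept; raising the threshold to 2 would keep only "Tianjin" and "Binhai Airport".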

[0049] Template matching refers to constructing entity relationship templates based on contextual semantics, syntax, and parts of speech, according to the characteristics of entities and the relationships between them. Then, two given entities are matched against the templates to complete relationship extraction. In the extraction process, relationship templates for relationships between entities are first constructed based on the textual descriptions of the corpus. Constructing these templates requires detailed descriptions of different civil aviation airport construction objects, defining a descriptive system of knowledge entities and relationships for civil aviation airport construction. For example, in the knowledge entity-relationship system for civil aviation airport construction, relationships between natural objects include the hierarchical relationships between lakes, rivers, and oceans, and the hierarchical relationships between different meteorological conditions such as temperature and precipitation. Relationships between engineering construction and engineering economics include the cost attribute relationship between earthwork engineering and earthwork engineering costs, and the same object relationship between water supply in public facilities conditions and water supply in public facilities engineering costs. Relationships between social objects include the employment relationship between people and institutions, and the jurisdictional relationship between institutions and regions. Regarding the relationships between civil aviation airport construction objects, this includes the management relationships between institutions and natural objects and airport components.

[0050] Based on the knowledge entity-relationship system for civil aviation airport construction, relationship templates can be constructed. For example, to express the "location" relationship between an airport and a region, a relationship template of the form "[Airport Name] is located in [Region Name]" can be built. For instance, if the associated entities are "Daxing Airport" and "Beijing," then when relationships are extracted from texts in which these entities co-occur, this template matches "Daxing Airport is located in Beijing," and the "location" relationship between the airport name "Daxing Airport" and the region name "Beijing" is extracted. Examples of entity relationship templates are shown in Table 3, which lists four types of relationships between entities, "located," "jurisdiction," "appointment," and "reference," together with a template form for each. The text is matched against the templates; if a match succeeds, the corresponding entity relationship is determined, completing the extraction of entity relationships. It should be noted that Table 3 is only illustrative and not limiting. The number and type of templates can be set according to actual matching needs.

[0051] Table 3. Example template of relationships between entities

[0052]
Relation       Template
Located        [Airport Name] is located in [Region Name]
Jurisdiction   [Organization Name] has jurisdiction over [Airport Name]
Appointment    [Individual Name] is employed at / works in [Organization Name]
Reference      See [Referenced Object Name] for details.
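Template matching for a pair of associated entities can be sketched as below. This is a simplified illustration under stated assumptions: the templates are plain strings with two slots, matching is an exact substring test, and the `TEMPLATES` dictionary and `extract_relations` function are hypothetical names. The single-slot "reference" template is omitted, and a practical system would also need to handle word order, inflection, and punctuation.

```python
# Two-slot templates adapted from Table 3; {a} is the head slot, {b} the tail slot.
TEMPLATES = {
    "located in": ["{a} is located in {b}"],
    "jurisdiction": ["{a} has jurisdiction over {b}"],
    "appointment": ["{a} is employed at {b}", "{a} works in {b}"],
}


def extract_relations(text, entity_a, entity_b):
    """Return (head, relation, tail) triples whose filled-in template occurs in text."""
    found = []
    # Try both orientations, since co-occurrence does not say which entity is the head.
    for head, tail in ((entity_a, entity_b), (entity_b, entity_a)):
        for relation, patterns in TEMPLATES.items():
            for pattern in patterns:
                if pattern.format(a=head, b=tail) in text:
                    found.append((head, relation, tail))
                    break  # one matching pattern per relation is enough
    return found


text = "Daxing Airport is located in Beijing."
triples = extract_relations(text, "Beijing", "Daxing Airport")
```

Because the pair is tried in both orders, the triple comes out with "Daxing Airport" as the head even though "Beijing" was passed first.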

[0053] It should be noted that the relation extraction method used in this embodiment is not necessarily applicable to other domains or corpora, but it has particular advantages in this embodiment. These advantages mainly stem from the fact that most of the relationships between entities extracted from civil aviation-related data can be distinguished directly by entity type. For example, the entity types "airport name" and "region name" have an obvious "location" relationship in the airport construction scenario. In other scenarios, however, two such types of entities may have various complex relationships, requiring more complex deep learning-based relation extraction algorithms. Therefore, given the characteristics of civil aviation-related data, adopting the above relation extraction method reduces the computational cost of relation extraction and also saves time.

[0054] In some embodiments, the training method of the bidirectional long short-term memory neural network BiLSTM includes:

[0055] Obtain civil aviation-related data;

[0056] The civil aviation-related data is analyzed and processed to obtain entity types for constructing the first knowledge graph. The entity types include at least natural environment, engineering construction, engineering economics, airport structure, and social objects.

[0057] A portion of the civil aviation-related data is selected as sample data for civil aviation airport construction;

[0058] The civil aviation airport construction sample data is labeled based on the entity type to obtain labeled sample data;

[0059] The bidirectional long short-term memory neural network BiLSTM is trained based on the labeled sample data.

[0060] Specifically, the civil aviation-related data in this embodiment mainly comes from the site selection consultation and review reports of various airports. These reports contain a series of factual entities related to airport construction, such as terminals, runways, and engineering costs, along with their attributes and relationships, exhibiting characteristics of both entity and dynamic nature. The civil aviation-related data is analyzed and processed, resulting in five types of entities: natural environment, engineering construction, engineering economics, airport structure, and social objects. A portion of the civil aviation-related data is used as sample data for civil aviation airport construction, and the sample data is labeled according to the five preset entity types. The labeled sample data is then used to train a Bidirectional Long Short-Term Memory (BiLSTM) neural network to obtain a trained BiLSTM.
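The labeling step can be illustrated with a short sketch that converts annotated entity spans into per-token tags, the form that sequence labeling models such as BiLSTM-CRF are usually trained on. The BIO tagging scheme, the type abbreviations, and the `spans_to_bio` helper are assumptions made for illustration; the application does not state which labeling scheme is used.

```python
# Abbreviations for the five entity types (the abbreviations are hypothetical).
ENTITY_TYPES = {
    "NAT": "natural environment",
    "CON": "engineering construction",
    "ECO": "engineering economics",
    "STR": "airport structure",
    "SOC": "social objects",
}


def spans_to_bio(tokens, spans):
    """Convert (start, end, type) token spans, end exclusive, into BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end, entity_type in spans:
        if entity_type not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {entity_type}")
        tags[start] = f"B-{entity_type}"  # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{entity_type}"  # remaining tokens of the entity
    return tags


# Hypothetical annotated sentence.
tokens = ["The", "runway", "lies", "east", "of", "the", "terminal", "building"]
spans = [(1, 2, "STR"), (6, 8, "STR")]
tags = spans_to_bio(tokens, spans)
```

For Chinese source text the same conversion would typically be applied per character rather than per whitespace token.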

[0061] In some embodiments, constructing a second knowledge graph based on the term information of the civil aviation standard specification data includes:

[0062] Based on the term information, determine the subordinate relationships, association relationships, and reference relationships between different entity words, and construct the second knowledge graph based on the entity words, the subordinate relationships, the association relationships, and the reference relationships.

[0063] Specifically, before constructing the second knowledge graph, it is necessary to first construct a knowledge system of civil aviation airport construction standards and specifications. This knowledge system includes hierarchical relationships between terms, citation relationships between different standards and specifications, and citation relationships between different chapters within the same standard document. The relationships between terms are organized according to the chapter distribution of keywords. For example, in the "Civil Transport Airport Site Selection Specification," pre-selection can be divided into pre-selection site determination and pre-selection site analysis sub-items. Terms can also be associated through the same descriptive object. For instance, the meteorological condition factor in site determination in the "General Aviation Airport Construction Specification" and the meteorological condition analysis in pre-selection site analysis in the "Civil Transport Airport Site Selection Specification" can be associated through the same descriptive object, "meteorological conditions." Elements can also be associated through definitions, rules, and conditions. For example, based on the definitions of different types of airports, concepts such as commercial passenger flight missions and monthly takeoffs and landings can be linked. By constructing a knowledge system of civil aviation airport construction standards and specifications, the subordinate and related relationships between different entity words, as well as the reference relationships between different standards and specifications, can be determined, and a second knowledge graph can then be constructed.
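How a chapter hierarchy and shared descriptive objects might be turned into triples for the second knowledge graph can be sketched as follows. The nested-dictionary representation of the outline, the relation names "includes" and "describes", and the `outline_to_triples` function are illustrative assumptions; only the term names are taken from the examples in the text.

```python
def outline_to_triples(outline, parent=None):
    """Walk a nested {term: sub-outline} dict and emit (parent, 'includes', child)."""
    triples = []
    for term, children in outline.items():
        if parent is not None:
            triples.append((parent, "includes", term))
        triples.extend(outline_to_triples(children, parent=term))
    return triples


# Chapter hierarchy following the example in the text.
outline = {
    "Civil Transport Airport Site Selection Specification": {
        "pre-selection": {
            "pre-selection site determination": {},
            "pre-selection site analysis": {
                "meteorological condition analysis": {},
            },
        },
    },
}

# Terms from different specifications linked through a shared descriptive object.
associations = [
    ("meteorological condition analysis", "describes", "meteorological conditions"),
    ("meteorological condition factor", "describes", "meteorological conditions"),
]

second_graph = outline_to_triples(outline) + associations
```

Reference relationships between standards or chapters could be appended in the same way as the association triples.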

[0064] In some embodiments, the knowledge fusion of the first knowledge graph and the second knowledge graph includes:

[0065] All triples in the first knowledge graph and the second knowledge graph are divided into three categories: first-class triples, second-class triples, and third-class triples. The first-class triples include head entity-inclusion relation-tail entity triples, the second-class triples include entity-attribute-attribute value triples, and the third-class triples include all triples except the first-class triples and the second-class triples.

[0066] Embedding learning is performed on the first-class triples, the second-class triples, and the third-class triples, respectively.

[0067] Joint learning is performed based on the first-class, second-class, and third-class triples that have undergone embedding learning, to obtain the embedding vectors corresponding to all triples in the first knowledge graph and the second knowledge graph.

[0068] Entity alignment is performed based on the embedding vectors corresponding to all the triples to complete the knowledge fusion.

[0069] Specifically, a knowledge graph can be viewed as a knowledge network composed of multiple triples of the form "entity-relationship-entity" or "entity-attribute-attribute value". Compared with "entity-relationship-entity" triples, attributes and attribute values better characterize the intrinsic features of entities and can therefore be used indirectly to identify entities. For example, consider two entities A and B, where A has longitude 11.258 and latitude 69.342, and B has longitude 11.258 and latitude 69.342. Since entities A and B have the same attributes and attribute values, they are highly likely to be the same entity. In addition to attributes and attribute values, the first and second knowledge graphs in the civil aviation field both contain a large number of "head entity-relationship-tail entity" triples whose relation is "inclusion", such as "public facility conditions-inclusion-water supply", "public facility conditions-inclusion-electricity supply", and "public facility conditions-inclusion-heating supply". Entities that include similar entities are often correlated or are the same entity. Therefore, this embodiment aligns entities based on the embeddings of entity attribute values and of entities linked by special relationships (such as the "inclusion" relationship). The first and second knowledge graphs are treated as a single, not yet fused knowledge graph. First, the triples are divided: entity-inclusion-entity triples and entity-attribute-attribute value triples are classified as first-class triples and second-class triples, respectively. All other triples, that is, the entity-relationship-entity triples other than the entity-inclusion-entity triples, are classified as third-class triples.
Then, embedding learning is performed on the first-class, second-class, and third-class triples to obtain the first embedding vector of entities in the first-class triples, the second embedding vector of entities in the second-class triples, and the third embedding vector of entities in the third-class triples. Since we divide the concatenated first and second knowledge graphs into three parts for triple embedding learning, some entities or attributes may appear in multiple types of triples simultaneously. Therefore, for the same entity or attribute, we may obtain up to three different embeddings, requiring joint learning of multiple embeddings to obtain a unified embedding vector. In this embodiment, joint learning is performed based on the first, second, and third types of triples to unify the vector space, obtaining the embedding vectors corresponding to all triples in the first and second knowledge graphs. Finally, entity alignment is performed based on the embedding vectors corresponding to all triples, unifying entities with the same semantics and completing the fusion of knowledge graphs.
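The division of triples into the three classes can be sketched as below. The sketch assumes that the set of relation names meaning "inclusion" and the set of attribute names are known in advance; how the application distinguishes attributes from relations is not stated, so `inclusion_relations`, `attribute_names`, and the example triples are hypothetical.

```python
def partition_triples(triples, inclusion_relations, attribute_names):
    """Split triples into first-class (inclusion), second-class (attribute), third-class (other)."""
    first, second, third = [], [], []
    for triple in triples:
        _, predicate, _ = triple
        if predicate in inclusion_relations:
            first.append(triple)
        elif predicate in attribute_names:
            second.append(triple)
        else:
            third.append(triple)
    return first, second, third


# The two graphs concatenated into one unfused triple list (hypothetical content).
all_triples = [
    ("public facility conditions", "includes", "water supply"),
    ("public facility conditions", "includes", "electricity supply"),
    ("Airport A", "longitude", "11.258"),
    ("Airport A", "latitude", "69.342"),
    ("Daxing Airport", "located in", "Beijing"),
]
first, second, third = partition_triples(
    all_triples,
    inclusion_relations={"includes"},
    attribute_names={"longitude", "latitude"},
)
```

An entity such as "Airport A" can appear in more than one class, which is why the separately learned embeddings have to be unified afterwards.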

[0070] In some embodiments, the embedding learning based on the first-class triples, the second-class triples, and the third-class triples, respectively, includes:

[0071] For the first-class triples, the different tail entities corresponding to the same head entity are combined to obtain a combined tail entity, and the combined tail entity is encoded by a neural network model. Based on the first-class triples, the TransE algorithm is used to determine a first loss function, and the first embedding vector is determined by minimizing the first loss function;

[0072] For the second-class triples, the attribute values are encoded by a neural network model. Based on the second-class triples, the TransE algorithm is used to determine a second loss function, and the second embedding vector is determined by minimizing the second loss function;

[0073] For the third-class triples, the TransE algorithm is used to determine a third loss function based on the third-class triples and the frequency of each relation in them, and the third embedding vector is determined by minimizing the third loss function.

[0074] Specifically, for the first type of triples, a given head entity often contains multiple tail entities. The traditional TransE algorithm is only suitable for handling one-to-one relationships and cannot handle one-to-many relationships. If the traditional TransE algorithm is used to process the relationship between head and tail entities, the tail entity corresponding to the head entity "Public Facility Conditions" might only include one of the tail entities "Water Supply / Electricity / Heating," failing to map "Water Supply / Electricity / Heating" to a single head entity "Public Facility Conditions." To address this issue, this embodiment constructs a corresponding combined tail entity for each head entity with an "inclusion" relationship. This combined tail entity is the set of all "included" entities corresponding to that head entity. For example, for the following triple set: "Public Facility Conditions - Includes - Water Supply," "Public Facility Conditions - Includes - Electricity," and "Public Facility Conditions - Includes - Heating," a combined tail entity "Water Supply / Electricity / Heating" is constructed, and then the three triples are combined into a single triple "Public Facility Conditions - Includes - Water Supply / Electricity / Heating."
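The construction of combined tail entities can be sketched as below. This is a minimal sketch that assumes triples are plain `(head, relation, tail)` tuples; the function name and the `" / "` separator are illustrative choices, not part of the application.

```python
def combine_inclusion_tails(triples, separator=" / "):
    """Merge all tails that share the same head and relation into one tail."""
    grouped = {}
    for head, relation, tail in triples:
        # dicts keep insertion order, so tails stay in their original order
        grouped.setdefault((head, relation), []).append(tail)
    return [
        (head, relation, separator.join(tails))
        for (head, relation), tails in grouped.items()
    ]


inclusion_triples = [
    ("Public Facility Conditions", "Includes", "Water Supply"),
    ("Public Facility Conditions", "Includes", "Electricity"),
    ("Public Facility Conditions", "Includes", "Heating"),
]
combined = combine_inclusion_tails(inclusion_triples)
```

Here `combined` holds the single triple "Public Facility Conditions - Includes - Water Supply / Electricity / Heating", so the one-to-many relationship becomes one-to-one before TransE is applied.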

[0075] Then, based on the TransE algorithm, the "inclusion" relationship is treated as a path from a given head entity to the included tail entity. All combined tail entities pointed to by "inclusion" relationships are encoded using a BERT neural network model. Compared with simple one-hot encoding, encoding with the BERT neural network model captures the relationships between entities more effectively and thus reduces the distance between similar entities (such as water supply / electricity supply / heat supply) in the vector space. The first loss function, Loss1, is defined as follows:

[0076] $$\mathrm{Loss1} = \sum_{t_r \in T_r} \sum_{t_r' \in T_r'} \max\left(0,\ \gamma + f(t_r) - f(t_r')\right) \quad (1)$$

[0077] $$f(t_r) = \left\lVert h_1 + r_1 - \mathrm{BERT}(s) \right\rVert \quad (2)$$

[0078] where $s$ is the string formed by joining the entity words contained in a combined tail entity with commas, such as "water supply, power supply, heating", and $s'$ is the combined tail entity string of a negative example. $T_r$ represents the set of all positive triples, that is, all triples that actually exist after the first and second knowledge graphs are concatenated. $T_r'$ represents the set of all negative triples, which can be obtained by arbitrarily replacing the head or tail entity with another entity. $\gamma$ represents the margin between positive and negative samples, and $f()$ is the distance function, which can use either the first norm or the second norm. $h_1$ is the embedding vector of the head entity in the first type of triple, and $r_1$ is the embedding vector of the inclusion relation. Minimizing the first loss function yields the first embedding vector $h_1$.
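A TransE-style margin loss of the kind described here can be sketched as follows. This is a sketch under stated assumptions rather than the application's implementation: vectors are plain Python lists, the encoded tail vector is passed in directly (the BERT encoding step is outside the scope of the example), and the function names are hypothetical.

```python
import math


def distance(head, relation, tail, norm=2):
    """f(): first or second norm of head + relation - tail."""
    diff = [h + r - t for h, r, t in zip(head, relation, tail)]
    if norm == 1:
        return sum(abs(d) for d in diff)
    return math.sqrt(sum(d * d for d in diff))


def margin_loss(positives, negatives, gamma=1.0, norm=2):
    """Sum of max(0, gamma + f(positive) - f(negative)) over all pairs.

    Each element of `positives` and `negatives` is a
    (head_vector, relation_vector, tail_vector) tuple.
    """
    total = 0.0
    for pos in positives:
        for neg in negatives:
            gap = distance(*pos, norm=norm) - distance(*neg, norm=norm)
            total += max(0.0, gamma + gap)
    return total
```

The loss is zero once every positive triple is closer than every negative triple by at least the margin `gamma`, which is what minimization drives the embeddings toward.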

[0079] For the second type of triples, the TransE algorithm is directly applied, treating the attribute as a path from the given head entity to the attribute value. However, since there are cases in reality where the attribute values ​​have the same semantics but different textual descriptions, the BERT neural network model is still used to encode the attribute values. The second loss function, Loss2, is defined as follows:

[0080] $$\mathrm{Loss2} = \sum_{t_r \in T_r} \sum_{t_r' \in T_r'} \max\left(0,\ \gamma + f(t_r) - f(t_r')\right) \quad (3)$$

[0081] $$f(t_r) = \left\lVert h_2 + r_2 - \mathrm{BERT}(a) \right\rVert \quad (4)$$

[0082] where $T_r$ represents the set of all positive triples, that is, all triples that actually exist after the first and second knowledge graphs are concatenated. $T_r'$ represents the set of all negative triples, which can be obtained by arbitrarily replacing the head or tail entity with another entity. $a$ denotes the attribute value and $a'$ the attribute value of a negative example. $\gamma$ represents the margin between positive and negative examples, and $f()$ is the distance function, which can use either the first norm or the second norm. $h_2$ is the embedding vector of the head entity in the second type of triple, and $r_2$ is the embedding vector of the attribute. Minimizing the second loss function yields the second embedding vector $h_2$.
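Negative triples obtained by replacing the head or tail entity can be generated as in the sketch below. Filtering out candidates that are themselves positive triples is an assumption added for this example (a common practice, not stated in the application), and the function name is hypothetical.

```python
def negative_triples(triple, entities, positives):
    """Corrupt a positive triple by swapping its head or its tail.

    `positives` is the set of all triples that actually exist; candidates
    found in it are skipped (an assumption of this sketch).
    """
    head, relation, tail = triple
    negatives = []
    for entity in entities:
        for candidate in ((entity, relation, tail), (head, relation, entity)):
            if candidate in positives or candidate in negatives:
                continue
            negatives.append(candidate)
    return negatives
```

In practice a random subset of these candidates would usually be sampled for each positive triple rather than enumerating all of them.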

[0083] It should be noted that the above BERT neural network model is only illustrative and has no limiting effect. Those skilled in the art can replace the BERT neural network model with other types of neural network models according to actual coding needs.

[0084] For the third type of triples, TransE is also used to learn the embeddings of the third type of triples. To ensure the accuracy of the learned embeddings, triples that appear in both the first and second knowledge graphs are assigned higher weights. The third loss function, Loss3, is defined as follows:

[0085]

[0086]

[0087] where T_r represents the set of all positive triples, that is, all triples that actually exist after the first and second knowledge graphs are concatenated. T_r′ represents the set of all negative triples, which can be obtained by arbitrarily replacing the head or tail entity with another entity. t3 denotes the tail entity in a third-class triple and t3′ the negative tail entity. γ represents the margin between positive and negative samples, and f() is the distance function, which can use either the first norm or the second norm. h3 is the embedding vector of the head entity in the third-class triple, r3 is the embedding vector of the relation in the third-class triple, and the tail entity t3 is likewise represented by its embedding vector. Minimizing the third loss function yields the third embedding vector h3.
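The relation frequencies on which the third loss function depends can be computed as in the sketch below. How the frequency enters the loss is not recoverable from the text above, so the example stops at computing the ratio; defining it as count(r) divided by the total number of triples is an assumption of this sketch, and the function name is hypothetical.

```python
from collections import Counter


def relation_frequencies(third_class_triples, total_triple_count):
    """Share of all triples taken up by each third-class relation.

    Assumes frequency = count(relation) / total number of triples.
    """
    counts = Counter(relation for _, relation, _ in third_class_triples)
    return {
        relation: count / total_triple_count
        for relation, count in counts.items()
    }
```

A relation that occurs often, for example because it appears in both knowledge graphs, receives a larger value and can therefore be given more weight during training.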

[0088] In some embodiments, the joint learning based on the first type of triples, the second type of triples, and the third type of triples that have undergone embedding learning to obtain the embedding vectors corresponding to all triples in the first knowledge graph and the second knowledge graph includes:

[0089] A fourth loss function is determined based on the first embedding vector, the second embedding vector, and the third embedding vector;

[0090] The total loss function is determined based on the first loss function, the second loss function, the third loss function, and the fourth loss function. The total loss function is minimized to obtain the embedding vectors corresponding to all triples in the first knowledge graph and the second knowledge graph.

[0091] Specifically, the fourth loss function, Loss4, is defined as follows:

[0092]

[0093] Where G1 represents the first knowledge graph, G2 represents the second knowledge graph, h1 represents the first embedding vector, h2 represents the second embedding vector, and h3 represents the third embedding vector; the total loss function Loss is defined as follows:

[0094] Loss=Loss1+Loss2+Loss3+Loss4 (8)

[0095] After minimizing the total loss function, we obtain the embedding vectors corresponding to all triples in the first knowledge graph and the second knowledge graph.

[0096] In some embodiments, the entity alignment based on the embedding vectors corresponding to all triples to complete the knowledge fusion includes:

[0097] For each first entity in the first knowledge graph, the second entity in the second knowledge graph whose embedding vector has the highest similarity to that of the first entity, where the similarity exceeds a preset similarity threshold, is taken as the entity to be fused for the first entity.

[0098] The first entity is aligned with the entity to be fused to complete the knowledge fusion.

[0099] Specifically, through the aforementioned embedding learning on the three types of triples and the joint learning, semantically identical entities in the first and second knowledge graphs are embedded as highly similar vectors. Therefore, the resulting embedding vectors can be used for entity alignment. For a first entity $h_1$ in the first knowledge graph with embedding vector $\mathbf{h}_1$, the similarity between $\mathbf{h}_1$ and the embedding vector $\mathbf{h}_2$ of each second entity $h_2$ in the second knowledge graph is calculated, and the second entity whose vector is most similar to $\mathbf{h}_1$ is found:

[0100] $$h_{candidate} = \underset{h_2 \in G_2}{\arg\max}\ \cos(\mathbf{h}_1, \mathbf{h}_2) \quad (9)$$

[0101] where $h_{candidate}$ represents the entity to be fused and $G_2$ represents the second knowledge graph. However, the entity to be fused $h_{candidate}$ is not necessarily equivalent to the first entity $h_1$; the similarity $\cos(\mathbf{h}_1, \mathbf{h}_{candidate})$ between the entity to be fused and the first entity must further be compared with a similarity threshold, which is set to 0.9 in this embodiment. If the similarity between the entity to be fused and the first entity is greater than 0.9, the entity to be fused $h_{candidate}$ in the second knowledge graph $G_2$ is considered equivalent to the first entity $h_1$ in the first knowledge graph $G_1$, and the two can be merged into a single entity.
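The alignment step can be sketched as below: for each entity of the first graph, the most similar entity of the second graph is selected by cosine similarity and accepted only if the similarity exceeds the threshold. The dictionary layout, the function names, and the example vectors are illustrative assumptions; only the 0.9 threshold comes from this embodiment.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def align_entities(graph1, graph2, threshold=0.9):
    """Map each entity of graph1 to its best match in graph2, if any.

    Both arguments map entity names to embedding vectors.
    """
    alignments = {}
    for name1, vec1 in graph1.items():
        best_name, best_sim = None, -1.0
        for name2, vec2 in graph2.items():
            sim = cosine(vec1, vec2)
            if sim > best_sim:
                best_name, best_sim = name2, sim
        # keep the candidate only if it clears the similarity threshold
        if best_name is not None and best_sim > threshold:
            alignments[name1] = best_name
    return alignments


g1 = {"water supply": [1.0, 0.0], "runway": [0.0, 1.0]}
g2 = {"water-supply": [0.99, 0.05], "terminal": [0.7, 0.7]}
aligned = align_entities(g1, g2)
```

With these made-up vectors only "water supply" is aligned; the best match for "runway" has a similarity of about 0.71, which is below the threshold, so it is left unmerged.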

[0102] Figure 2 illustrates civil aviation-related data and civil aviation standards and specifications. In some embodiments, in addition to the entity alignment described above, relationships can be established between civil aviation airport construction objects and standards and specifications, providing a basis for constructing the civil aviation knowledge graph. Based on the hierarchical classification of the civil aviation standards and specifications knowledge system, relationships between civil aviation objects and their related standards and specifications can be established in the civil aviation knowledge entity-relationship system. For example, as shown in Figure 2, an instance of public facility conditions in engineering construction (such as "water supply") can be associated with the term "5.3.10 The analysis of site public facilities conditions should explain the source of power supply, water supply, heating, gas supply, communication and other conditions and the preliminary construction plan" in the standards and specifications knowledge system.

[0103] In some embodiments, if multiple data sources in the civil aviation-related data describe the same information, such as flight volume information for the same flight reported by multiple airports, the credibility of each data source must be determined. Typically, the credibility of a data source is judged against the typical attribute values of that information. For example, flight volume is typically in the tens of thousands; if a data source reports values only in the thousands, that data source is considered unreliable and is not used for subsequent knowledge graph construction.
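The credibility check can be sketched as a simple range filter. The source names, the reported values, and the bounds of 10,000 to 99,999 used for "tens of thousands" are hypothetical values chosen for this illustration.

```python
def filter_credible_sources(reports, lower, upper):
    """Keep only sources whose reported value lies in the typical range.

    `reports` maps a data source name to the value it reports.
    """
    return {
        source: value
        for source, value in reports.items()
        if lower <= value <= upper
    }


flight_volume_reports = {
    "airport_a": 42000,
    "airport_b": 3800,   # only in the thousands, treated as unreliable
    "airport_c": 51000,
}
credible = filter_credible_sources(flight_volume_reports, 10000, 99999)
```

Only the sources kept in `credible` would be passed on to knowledge graph construction.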

[0104] It should be noted that the method in this embodiment can be executed by a single device, such as a computer or server. The method can also be applied in a distributed scenario, where multiple devices cooperate to complete the task. In such a distributed scenario, one of these devices may execute only one or more steps of the method in this embodiment, and the multiple devices will interact with each other to complete the method described.

[0105] It should be noted that the above description describes some embodiments of this application. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recorded in the claims can be performed in a different order than that shown in the above embodiments and still achieve the desired result. Furthermore, the processes depicted in the drawings do not necessarily require a specific or sequential order to achieve the desired result. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

[0106] This application also provides a device for constructing a civil aviation knowledge graph.

[0107] Referring to Figure 3, the apparatus for constructing the civil aviation knowledge graph includes:

[0108] Data acquisition module 302 is configured to acquire civil aviation-related data and civil aviation standard and specification data;

[0109] The first construction module 304 is configured to perform entity recognition and relation extraction on the civil aviation-related data, and construct a first knowledge graph based on the obtained entities and relations.

[0110] The second construction module 306 is configured to construct a second knowledge graph based on the term information of the civil aviation standard specification data.

[0111] The knowledge fusion module 308 is configured to perform knowledge fusion on the first knowledge graph and the second knowledge graph to obtain a civil aviation knowledge graph.

[0112] In some embodiments, the first construction module 304 is further configured to use a trained bidirectional long short-term memory neural network (BiLSTM) combined with a conditional random field (CRF) to perform entity recognition on the civil aviation-related data to obtain a plurality of entities; and to use template matching and co-occurrence analysis to extract relationships from the plurality of entities to obtain the relationships.

[0113] In some embodiments, the first construction module 304 is further configured to use the co-occurrence analysis method to determine the associated entities whose co-occurrence frequency exceeds a preset frequency threshold among the plurality of entities; and to use the template matching method to add relationships to the associated entities.
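The co-occurrence analysis can be sketched as counting how often two entities appear in the same text unit and keeping the pairs whose count exceeds the threshold. Treating a sentence as the co-occurrence window, as well as the function name and sample data, are assumptions of this example.

```python
from collections import Counter
from itertools import combinations


def cooccurring_pairs(entities_per_sentence, frequency_threshold):
    """Return entity pairs whose co-occurrence count exceeds the threshold."""
    counts = Counter()
    for entities in entities_per_sentence:
        # sort so that each unordered pair is always counted under one key
        for pair in combinations(sorted(set(entities)), 2):
            counts[pair] += 1
    return {
        pair: count
        for pair, count in counts.items()
        if count > frequency_threshold
    }


sentences = [
    ["runway", "apron"],
    ["runway", "apron", "terminal"],
    ["terminal", "apron"],
]
associated = cooccurring_pairs(sentences, frequency_threshold=1)
```

The pairs kept in `associated` are the candidates to which the template matching method would then assign a concrete relationship.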

[0114] In some embodiments, the system further includes a model training module 310, which is configured to: acquire civil aviation-related data; analyze and process the civil aviation-related data to obtain entity types for constructing a first knowledge graph, wherein the entity types include at least natural environment, engineering construction, engineering economics, airport composition, and social objects; determine a portion of the civil aviation-related data as civil aviation airport construction sample data; annotate the civil aviation airport construction sample data based on the entity types to obtain annotated sample data; and train the bidirectional long short-term memory neural network BiLSTM based on the annotated sample data.

[0115] In some embodiments, the second construction module 306 is further configured to determine the subordinate relationships, association relationships, and reference relationships between different entity words based on the term information, and to construct the second knowledge graph based on the subordinate relationships, the association relationships, and the reference relationships.

[0116] In some embodiments, the knowledge fusion module 308 is further configured to divide all triples in the first knowledge graph and the second knowledge graph to obtain a first type of triple, a second type of triple, and a third type of triple. The first type of triple includes a head entity-inclusion relation-tail entity triple, the second type of triple includes an entity-attribute-attribute value triple, and the third type of triple includes all triples except the first type of triple and the second type of triple.

[0117] Embedding learning is performed based on the first type of triplet, the second type of triplet, and the third type of triplet, respectively.

[0118] Joint learning is performed based on the first type of triples, the second type of triples, and the third type of triples that have undergone embedding learning, to obtain the embedding vectors corresponding to all triples in the first knowledge graph and the second knowledge graph.

[0119] Entity alignment is performed based on the embedding vectors corresponding to all the triples to complete the knowledge fusion.

[0120] In some embodiments, the knowledge fusion module 308 is further configured to combine different tail entities corresponding to the same head entity in the first type of triplet to obtain a combined tail entity, determine a first loss function based on the first type of triplet using the TransE algorithm, and determine a first embedding vector by minimizing the first loss function, wherein the combined tail entity is encoded by a neural network model.

[0121] In the second type of triplet, the TransE algorithm is used to determine the second loss function based on the second type of triplet, and the second embedding vector is determined by minimizing the second loss function, wherein the attribute value is encoded by a neural network model;

[0122] In the third type of triplet, based on the third type of triplet and the frequency of each relation in the third type of triplet, the TransE algorithm is used to determine the third loss function, and the third embedding vector is determined by minimizing the third loss function.

[0123] In some embodiments, the knowledge fusion module 308 is further configured to determine a fourth loss function based on the first embedding vector, the second embedding vector, and the third embedding vector;

[0124] The total loss function is determined based on the first loss function, the second loss function, the third loss function, and the fourth loss function. The total loss function is minimized to obtain the embedding vectors corresponding to all triples in the first knowledge graph and the second knowledge graph.

[0125] In some embodiments, the knowledge fusion module 308 is further configured to, for each first entity in the first knowledge graph, take the second entity in the second knowledge graph whose embedding vector has the highest similarity to that of the first entity, where the similarity exceeds a preset similarity threshold, as the entity to be fused for the first entity.

[0126] The first entity is aligned with the entity to be fused to complete the knowledge fusion.

[0127] For ease of description, the above apparatus is described in terms of functions divided into various modules. Of course, when implementing this application, the functions of the modules can be implemented in one or more pieces of software and/or hardware.

[0128] The apparatus described above is used to implement the corresponding civil aviation knowledge graph construction method in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiments, which will not be repeated here.

[0129] This application also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the method for constructing a civil aviation knowledge graph as described in any of the preceding embodiments.

[0130] Figure 4 illustrates a more specific hardware structure of the electronic device provided in this embodiment, which may include a processor 1010, a memory 1020, an input / output interface 1030, a communication interface 1040, and a bus 1050. The processor 1010, memory 1020, input / output interface 1030, and communication interface 1040 are communicatively connected to one another within the device via the bus 1050.

[0131] The processor 1010 can be implemented using a general-purpose CPU (Central Processing Unit), microprocessor, application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute relevant programs to implement the technical solutions provided in the embodiments of this specification.

[0132] The memory 1020 can be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory), static storage device, dynamic storage device, etc. The memory 1020 can store the operating system and other applications. When the technical solutions provided in the embodiments of this specification are implemented by software or firmware, the relevant program code is stored in the memory 1020 and is called and executed by the processor 1010.

[0133] The input / output interface 1030 is used to connect input / output modules to realize information input and output. Input / output modules can be configured as components within the device (not shown in the figure) or externally connected to the device to provide corresponding functions. Input devices may include keyboards, mice, touchscreens, microphones, various sensors, etc., while output devices may include displays, speakers, vibrators, indicator lights, etc.

[0134] The communication interface 1040 is used to connect a communication module (not shown in the figure) to enable communication between this device and other devices. The communication module can communicate via wired means (such as USB, Ethernet cable, etc.) or wireless means (such as mobile network, WIFI, Bluetooth, etc.).

[0135] Bus 1050 includes a pathway for transmitting information between various components of the device, such as processor 1010, memory 1020, input / output interface 1030, and communication interface 1040.

[0136] It should be noted that although the above-described device only shows the processor 1010, memory 1020, input / output interface 1030, communication interface 1040, and bus 1050, in specific implementations, the device may also include other components necessary for normal operation. Furthermore, those skilled in the art will understand that the above-described device may only include the components necessary for implementing the embodiments of this specification, and not necessarily all the components shown in the figures.

[0137] The electronic devices described above are used to implement the corresponding civil aviation knowledge graph construction method in any of the foregoing embodiments, and have the beneficial effects of the corresponding method embodiments, which will not be repeated here.

[0138] This application also provides a non-transitory computer-readable storage medium that stores computer instructions for causing the computer to execute the civil aviation knowledge graph construction method as described in any of the above embodiments.

[0139] The computer-readable medium of this embodiment includes permanent and non-permanent, removable and non-removable media, and information storage can be implemented by any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.

[0140] The computer instructions stored in the storage medium of the above embodiments are used to cause the computer to execute the civil aviation knowledge graph construction method as described in any of the above embodiments, and have the beneficial effects of the corresponding method embodiments, which will not be repeated here.

[0141] Those skilled in the art should understand that the discussion of any of the above embodiments is merely exemplary and is not intended to imply that the scope of this application (including the claims) is limited to these examples; within the framework of this application, the technical features of the above embodiments or different embodiments can also be combined, the steps can be implemented in any order, and there are many other variations of different aspects of the embodiments of this application as described above, which are not provided in the details for the sake of brevity.

[0142] Additionally, to simplify the description and discussion, and to avoid obscuring the embodiments of this application, the well-known power / ground connections to integrated circuit (IC) chips and other components may or may not be shown in the provided drawings. Furthermore, the apparatus may be shown in block diagram form to avoid obscuring the embodiments of this application, and this also takes into account the fact that the details of the implementation of these block diagram apparatuses are highly dependent on the platform on which the embodiments of this application will be implemented (i.e., these details should be fully understood by those skilled in the art). While specific details (e.g., circuits) have been set forth to describe exemplary embodiments of this application, it will be apparent to those skilled in the art that the embodiments of this application can be implemented without these specific details or with variations thereof. Therefore, these descriptions should be considered illustrative rather than restrictive.

[0143] Although this application has been described in conjunction with specific embodiments thereof, many substitutions, modifications, and variations of these embodiments will be apparent to those skilled in the art from the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may be used with the embodiments discussed.

[0144] The embodiments of this application are intended to cover all such substitutions, modifications, and variations that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the embodiments of this application should be included within the protection scope of this application.

Claims

1. A method for constructing a civil aviation knowledge graph, characterized by comprising: acquiring civil aviation-related data and civil aviation standard and specification data; performing entity recognition and relationship extraction on the civil aviation-related data, and constructing a first knowledge graph based on the obtained entities and relationships; constructing a second knowledge graph based on the term information in the civil aviation standard and specification data; and performing knowledge fusion on the first knowledge graph and the second knowledge graph to obtain a civil aviation knowledge graph; wherein the knowledge fusion of the first knowledge graph and the second knowledge graph includes: dividing all triples in the first knowledge graph and the second knowledge graph into first-class triples, second-class triples, and third-class triples, wherein the first-class triples include head entity-inclusion relation-tail entity triples, the second-class triples include entity-attribute-attribute value triples, and the third-class triples include all triples except the first-class triples and the second-class triples; performing embedding learning based on the first-class triples, the second-class triples, and the third-class triples, respectively; performing joint learning based on the first-class triples, the second-class triples, and the third-class triples that have undergone embedding learning, to obtain the embedding vectors corresponding to all triples in the first knowledge graph and the second knowledge graph; and performing entity alignment based on the embedding vectors corresponding to all the triples to complete the knowledge fusion; wherein the embedding learning based on the first-class triples, the second-class triples, and the third-class triples includes: for the first-class triples, combining different tail entities corresponding to the same head entity to obtain a combined tail entity, determining a first loss function based on the first-class triples using the TransE algorithm, and determining a first embedding vector by minimizing the first loss function, wherein the combined tail entity is encoded by a neural network model; for the second-class triples, determining a second loss function based on the second-class triples using the TransE algorithm, and determining a second embedding vector by minimizing the second loss function, wherein the attribute values in the second-class triples are encoded by a neural network model; and for the third-class triples, determining a third loss function using the TransE algorithm based on the third-class triples and the frequency, among all triples, of each relation in the third-class triples, and determining a third embedding vector by minimizing the third loss function.

2. The method according to claim 1, characterized in that the entity recognition and relationship extraction performed on the civil aviation-related data includes: performing entity recognition on the civil aviation-related data using a trained bidirectional long short-term memory neural network (BiLSTM) combined with a conditional random field (CRF) to obtain a plurality of entities; and performing relationship extraction on the plurality of entities using template matching and co-occurrence analysis to obtain the relationships.

3. The method according to claim 2, characterized in that performing relationship extraction on the plurality of entities using template matching and co-occurrence analysis to obtain the relationships includes: determining, using the co-occurrence analysis method, associated entities among the plurality of entities whose co-occurrence frequency exceeds a preset frequency threshold; and adding relationships to the associated entities using the template matching method.

4. The method according to claim 1, characterized in that constructing the second knowledge graph based on the term information in the civil aviation standard and specification data includes: determining, based on the term information, entity words and the subordinate relationships, association relationships, and reference relationships between different entity words; and constructing the second knowledge graph based on the entity words, the subordinate relationships, the association relationships, and the reference relationships.

5. The method according to claim 1, characterized in that performing joint learning based on the first-class triples, the second-class triples, and the third-class triples that have undergone embedding learning, to obtain the embedding vectors corresponding to all triples in the first knowledge graph and the second knowledge graph, includes: determining a fourth loss function based on the first embedding vector, the second embedding vector, and the third embedding vector; determining a total loss function based on the first loss function, the second loss function, the third loss function, and the fourth loss function; and minimizing the total loss function to obtain the embedding vectors corresponding to all triples in the first knowledge graph and the second knowledge graph.

6. The method according to claim 1, characterized in that performing entity alignment based on the embedding vectors corresponding to all the triples to complete the knowledge fusion includes: for each first entity in the first knowledge graph, taking the second entity in the second knowledge graph whose embedding vector has the highest similarity to that of the first entity, where the similarity exceeds a preset similarity threshold, as the entity to be fused for the first entity; and aligning the first entity with the entity to be fused to complete the knowledge fusion.

7. A device for constructing a civil aviation knowledge graph, characterized by comprising: a data acquisition module configured to acquire civil aviation-related data and civil aviation standard and specification data; a first construction module configured to perform entity recognition and relationship extraction on the civil aviation-related data, and to construct a first knowledge graph based on the obtained entities and relationships; a second construction module configured to construct a second knowledge graph based on the term information in the civil aviation standard and specification data; and a knowledge fusion module configured to perform knowledge fusion on the first knowledge graph and the second knowledge graph to obtain a civil aviation knowledge graph; wherein the knowledge fusion of the first knowledge graph and the second knowledge graph includes: dividing all triples in the first knowledge graph and the second knowledge graph into first-class triples, second-class triples, and third-class triples, wherein the first-class triples include head entity-inclusion relation-tail entity triples, the second-class triples include entity-attribute-attribute value triples, and the third-class triples include all triples except the first-class triples and the second-class triples; performing embedding learning based on the first-class triples, the second-class triples, and the third-class triples, respectively; performing joint learning based on the first-class triples, the second-class triples, and the third-class triples that have undergone embedding learning, to obtain the embedding vectors corresponding to all triples in the first knowledge graph and the second knowledge graph; and performing entity alignment based on the embedding vectors corresponding to all the triples to complete the knowledge fusion; wherein the embedding learning based on the first-class triples, the second-class triples, and the third-class triples includes: for the first-class triples, combining different tail entities corresponding to the same head entity to obtain a combined tail entity, determining a first loss function based on the first-class triples using the TransE algorithm, and determining a first embedding vector by minimizing the first loss function, wherein the combined tail entity is encoded by a neural network model; for the second-class triples, determining a second loss function based on the second-class triples using the TransE algorithm, and determining a second embedding vector by minimizing the second loss function, wherein the attribute values in the second-class triples are encoded by a neural network model; and for the third-class triples, determining a third loss function using the TransE algorithm based on the third-class triples and the frequency, among all triples, of each relation in the third-class triples, and determining a third embedding vector by minimizing the third loss function.

8. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the method according to any one of claims 1 to 6.