A method and system for extracting knowledge graph from software project data and answering questions

Relates to software projects and knowledge graph technology in the field of computer software. Solves the problems that software project data are multi-source and heterogeneous, lack associations, and are difficult to analyze and mine, and achieves good query results.

Active Publication Date: 2022-05-03
PEKING UNIV
Cites: 5; Cited by: 0

AI Technical Summary

Problems solved by technology

[0004] Aiming at the problems that current software project data are multi-source and heterogeneous, lack associations, and are difficult to analyze and mine, the purpose of the present invention is to provide a method and system for extracting a knowledge graph from software project data and answering questions. The method and system provided by the present invention can automatically extract entities from the multi-source heterogeneous data related to a software project, identify the extensive associations between these entities, form a knowledge graph, and provide automatic question answering over it.

Examples

Embodiment

[0033] In this embodiment, the user needs to extract a knowledge graph from the data of the open-source software project Apache Lucene. The data are of several different types:

[0034] 82.4 MB source code data;

[0035] 368 MB git repository data;

[0036] 1.98 GB defect report data;

[0037] 1.08 GB mail data;

[0038] 171 MB StackOverflow Q&A document data.

[0039] Through module 1 and module 2, the present invention automatically extracts the corresponding entities and association relationships from these data and stores them in a Neo4j graph database. Some examples of the extracted entities and relationships follow:
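
As an illustration of the storage step only, the Python sketch below turns one extracted (entity, relation, entity) triple into a parameterized Cypher MERGE statement for Neo4j. The node labels (Class, Method), the relationship types DECLARES_METHOD and INHERITS, the name property, and the helpers Entity and to_cypher are hypothetical choices, not the schema used by the invention. MERGE is used so that re-importing the same data does not create duplicate nodes or edges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    label: str  # hypothetical node label, e.g. "Class" or "Method"
    name: str   # value stored in the hypothetical "name" property


def to_cypher(src: Entity, rel_type: str, dst: Entity) -> tuple[str, dict]:
    """Build an idempotent Cypher statement plus its parameters for one triple.

    Labels and relationship types cannot be passed as Cypher parameters, so
    they are interpolated into the query text and must be plain identifiers.
    """
    for ident in (src.label, rel_type, dst.label):
        if not ident.isidentifier():
            raise ValueError(f"unsafe label or relationship type: {ident!r}")
    query = (
        f"MERGE (a:{src.label} {{name: $src}}) "
        f"MERGE (b:{dst.label} {{name: $dst}}) "
        f"MERGE (a)-[:{rel_type}]->(b)"
    )
    # Entity names travel as parameters, so no manual escaping is needed.
    return query, {"src": src.name, "dst": dst.name}


query, params = to_cypher(
    Entity("Class", "IndexReader"), "DECLARES_METHOD", Entity("Method", "maxDoc")
)
print(query)
print(params)
```

With the official Neo4j Python driver, such a pair could then be executed through a session's run method, e.g. session.run(query, params); connection setup is omitted here.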

[0040] The class IndexReader is an entity, and the method maxDoc is also an entity. The former has an edge of type "declaration method" pointing to the latter;

[0041] The class AutomaticReader is an entity, and there is an edge of type "inheritance" pointing to the class IndexReader;

[0042] A developer entity named Alex can be...
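
The "declaration method" and "inheritance" edges in [0040] and [0041] come from source code. The Python sketch below shows, with regular expressions, the kind of triples such an extractor produces from a single Java file. It is a deliberately crude approximation, not the invention's module 1, whose internals are not described here: it assumes one top-level type per file, recognizes only methods declared with an explicit access modifier, and ignores nested classes and multiple extends targets. A real extractor would use a proper Java parser. The function name extract_code_triples is hypothetical.

```python
import re

# Remove /* ... */ and // comments so words such as "class" in Javadoc are ignored.
COMMENT_RE = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)

# "class X" or "interface X", optional type parameters, optional "extends Y".
CLASS_RE = re.compile(
    r"\b(?:class|interface)\s+(\w+)(?:<[^>]*>)?(?:\s+extends\s+([\w.]+))?"
)

# Method declarations that begin with an explicit access modifier (assumption).
METHOD_RE = re.compile(
    r"^\s*(?:public|protected|private)\s+"
    r"(?:(?:static|final|abstract|synchronized)\s+)*"
    r"[\w.<>\[\], ]+\s+"  # return type
    r"(\w+)\s*\(",  # method name, followed by "("
    re.MULTILINE,
)


def extract_code_triples(java_source: str) -> list[tuple[str, str, str]]:
    """Return (subject, edge type, object) triples found in one Java file."""
    src = COMMENT_RE.sub("", java_source)
    types = CLASS_RE.findall(src)  # list of (name, parent or "")
    if not types:
        return []
    triples = [(name, "inheritance", parent) for name, parent in types if parent]
    owner = types[0][0]  # assumption: every method belongs to the first type
    triples += [(owner, "declaration method", m) for m in METHOD_RE.findall(src)]
    return triples


samples = [
    """
/** Base class; this class is abstract. */
public abstract class IndexReader implements Closeable {
  public abstract int maxDoc();
  protected void ensureOpen() throws IOException {
    if (closed) { throw new IOException("closed"); }
  }
}
""",
    "public abstract class AutomaticReader extends IndexReader {}",
]
for s in samples:
    print(extract_code_triples(s))
```

The first sample yields two "declaration method" triples owned by IndexReader (maxDoc and ensureOpen); the second yields one "inheritance" triple from AutomaticReader to IndexReader.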

Abstract

The invention discloses a method and system for extracting a knowledge graph from software project data and answering questions. The method is as follows: for each type of software project data in a software project database, extract the entities and the association relationships between them from that type of data, and store them in a corresponding graph database; use traceability association techniques to associate the data across the graph databases, obtaining the association relationships between entities from different types of software project data; according to these relationships, add corresponding edges in each graph database to connect entities from different sources, generating a knowledge graph of the software project data; and, for an input natural-language query sentence, retrieve a matching connected subgraph from the knowledge graph as the answer. The invention solves the problems that software project data lack associations, suffer from severe information isolation, and are difficult to query and analyze jointly.
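
The last steps of this method (associating entities across sources, adding cross-source edges, and answering a query with a connected subgraph) can be illustrated with a small in-memory Python sketch. It swaps in two simple stand-ins that the abstract does not specify: a document is linked to every code entity whose name appears in it as a whole token (a naive substitute for traceability association), and a query is answered by joining the shortest paths from the first mentioned entity to each of the others (a simple approximation of connected-subgraph matching, not necessarily the invention's algorithm). The class and function names, the "mentions" edge type, and the document ID issue_42 are hypothetical.

```python
import re
from collections import defaultdict, deque

TOKEN_RE = re.compile(r"[A-Za-z_]\w*")


class KnowledgeGraph:
    """Tiny stand-in for a graph database: typed edges plus an undirected index."""

    def __init__(self) -> None:
        self.triples = []  # (subject, edge type, object)
        self.adj = defaultdict(set)  # node -> neighbours, ignoring direction

    def add_edge(self, src: str, rel: str, dst: str) -> None:
        self.triples.append((src, rel, dst))
        self.adj[src].add(dst)  # answer paths may follow edges either way
        self.adj[dst].add(src)


def mentioned(text, names):
    """Names that occur in text as whole identifier tokens."""
    tokens = set(TOKEN_RE.findall(text))
    return [n for n in names if n in tokens]


def link_document(kg, doc_id, text, code_entities):
    """Naive traceability: connect a document to each code entity it names."""
    for name in mentioned(text, code_entities):
        kg.add_edge(doc_id, "mentions", name)


def shortest_path(kg, src, dst):
    """Breadth-first search over the undirected index; None if unreachable."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in sorted(kg.adj[node]):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


def answer(kg, query):
    """Return (nodes, edges) of a connected subgraph covering the query's entities."""
    targets = mentioned(query, sorted(kg.adj))
    if not targets:
        return set(), set()
    root = targets[0]
    nodes, edges = {root}, set()
    for target in targets[1:]:
        path = shortest_path(kg, root, target)
        if path is None:
            continue  # entity in another component: left out of the answer
        nodes.update(path)
        edges.update(frozenset(pair) for pair in zip(path, path[1:]))
    return nodes, edges


kg = KnowledgeGraph()
kg.add_edge("IndexReader", "declaration method", "maxDoc")
kg.add_edge("AutomaticReader", "inheritance", "IndexReader")
link_document(kg, "issue_42", "maxDoc returns a stale count after deletes",
              ["IndexReader", "maxDoc", "AutomaticReader"])
print(answer(kg, "Which issue concerns maxDoc in AutomaticReader?"))
```

Here the query names AutomaticReader and maxDoc, so the answer is the path AutomaticReader, IndexReader, maxDoc. Because only exact node names are matched, the word "issue" is not resolved to the issue_42 node; a fuller system would need entity recognition or type-aware matching for that.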

Description

Technical field

[0001] The invention relates to a method and system for extracting a knowledge graph from software project data and answering questions, and belongs to the technical field of computer software.

Background art

[0002] Reusing existing large-scale software projects is an important way for software enterprises to improve software productivity and software quality. The premise of successful software reuse is that the reuser can quickly and correctly learn and understand a large amount of relevant knowledge in software projects, such as domain concepts, system architecture, interface design, and change history. This knowledge is contained in the multi-source heterogeneous data generated throughout the life cycle of a software project, such as source code, requirements documents, design documents, version repositories, defect repositories, email records, forum discussions, and technical blogs.

[0003] At present, a large number of researchers in t...

Application Information

Patent type & authority: Patents (China)
IPC(8): G06F16/36; G06F16/332; G06F16/33; G06F16/31; G06F16/901; G06F8/75
CPC: G06F8/75
Inventors: Xie Bing (谢冰), Lin Zeqi (林泽琦), Zou Yanzhen (邹艳珍), Zhao Junfeng (赵俊峰)
Owner: PEKING UNIV