Implementation method of automatic construction of software engineering knowledge base based on semi-supervised learning

A technology combining semi-supervised learning and software engineering, applied to automatic knowledge base construction. It addresses problems such as sparse relations between concepts, the difficulty of scaling up the number of concepts, and the difficulty of achieving high accuracy.

Active Publication Date: 2021-06-15
SHANGHAI JIAOTONG UNIV

AI Technical Summary

Problems solved by technology

[0008] In the prior art, relation extraction struggles to achieve both high accuracy and large scale: the number of concepts is hard to scale up, the relations between concepts are relatively sparse, and building training samples by manual labeling requires a large investment of human effort. To address these problems, the present invention proposes a method for automatically constructing a software engineering knowledge base based on semi-supervised learning. The semi-supervised construction method reduces the human effort and time cost of building a software engineering knowledge base, and the resulting domain knowledge base is larger in scale and better in quality, which addresses the scarcity and incompleteness of current knowledge bases in the software domain.



Embodiment Construction

[0030] As shown in Figure 1, this embodiment includes the following steps:

[0031] Step 1. Use the software engineering tags provided on StackOverflow as the seed vocabulary, and obtain the concept data set provided by Wikipedia. Through iterative propagation of the seed vocabulary tags, expand to all software engineering concepts on Wikipedia and obtain a collection of software engineering domain knowledge that includes the wiki structure.
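The iterative expansion in Step 1 could be sketched roughly as follows. This is a minimal illustration, not the patented procedure: the graph representation (`neighbours`, mapping each Wikipedia concept to related concepts), the class name `SeedPropagation`, and the acceptance rule (a concept is added once at least `minShare` of its neighbours are already labeled) are all assumptions introduced here.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class SeedPropagation {
    /**
     * Expands the seed set over a concept graph until no new concept is added.
     * Assumed rule (not taken from the source): an unlabeled concept joins the
     * domain set when at least minShare of its neighbours are already labeled.
     */
    static Set<String> propagate(Map<String, Set<String>> neighbours,
                                 Set<String> seeds, double minShare) {
        Set<String> labeled = new HashSet<>(seeds);
        boolean changed = true;
        while (changed) {
            changed = false;
            // Collect this round's additions first, so every concept in a
            // round is judged against the same labeled set.
            Set<String> added = new HashSet<>();
            for (Map.Entry<String, Set<String>> e : neighbours.entrySet()) {
                String concept = e.getKey();
                Set<String> adj = e.getValue();
                if (labeled.contains(concept) || adj.isEmpty()) {
                    continue;
                }
                long hits = adj.stream().filter(labeled::contains).count();
                if ((double) hits / adj.size() >= minShare) {
                    added.add(concept);
                }
            }
            if (!added.isEmpty()) {
                labeled.addAll(added);
                changed = true;
            }
        }
        return labeled;
    }

    public static void main(String[] args) {
        // Hypothetical toy graph; real input would come from the Wikipedia dump.
        Map<String, Set<String>> graph = Map.of(
            "Unit testing", Set.of("java", "junit"),
            "Test fixture", Set.of("Unit testing"),
            "Baroque music", Set.of("Opera", "Orchestra"));
        System.out.println(propagate(graph, Set.of("java", "junit"), 0.5));
    }
}
```

In the toy graph, "Test fixture" is only reachable in the second round, after "Unit testing" has been labeled, which is what makes the propagation iterative; "Baroque music" has no labeled neighbours and is never added.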

[0032] The concept data set refers to the original StackOverflow tags and Wikipedia concepts, both of which exist as XML data sources. In this embodiment, Java is used as the programming language, and SAX is used to parse the respective XML files to obtain the seed vocabulary for the software engineering field and the Wikipedia concept data set.
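As a rough illustration of the SAX parsing described above, the sketch below streams a tag file and collects tag names. The XML layout (`<row>` elements carrying a `TagName` attribute) and the class name `SeedTagReader` are assumptions made for this example; the source does not specify the file schema.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SeedTagReader {
    /**
     * Streams the XML and collects the TagName attribute of every <row>
     * element (assumed layout), preserving first-seen order.
     */
    static Set<String> readTags(InputStream xml) throws Exception {
        Set<String> tags = new LinkedHashSet<>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(xml, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if ("row".equals(qName)) {
                    String name = attrs.getValue("TagName");
                    if (name != null) {
                        tags.add(name);
                    }
                }
            }
        });
        return tags;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical inline sample standing in for the real tag file.
        String sample = "<tags><row Id=\"1\" TagName=\"java\"/>"
                      + "<row Id=\"2\" TagName=\"sax\"/></tags>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        System.out.println(readTags(in));
    }
}
```

SAX is a reasonable fit here because it processes the file as a stream of events rather than loading the whole document into memory, which matters for inputs the size of a Wikipedia dump.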

[0033] The iterative label propagation refers to: starting from the constructed seed vocabulary in the field of software engineering, it is propag...

Abstract

A method for automatically constructing a software engineering knowledge base based on semi-supervised learning. It addresses the problems that knowledge bases in the software engineering field are currently scarce, the number of concepts is hard to scale up, the relations between concepts are relatively sparse, and a large amount of manual effort is required. The invention proceeds as follows: 1. using label propagation, expand the set of software engineering concepts from the Wikipedia and StackOverflow data sources; ... automatically label the positive and negative training data for relation extraction by ... matching and rule matching; 4. extract the relations between concepts with an iterative semi-supervised learning method, applying evaluation rules to optimize the extraction results of each iteration; 5. use the RDF language to build the knowledge base in a standardized form.

Description

Technical field

[0001] The present invention relates to a technology in the field of software engineering, in particular to a method for automatically constructing a software engineering knowledge base based on semi-supervised learning.

Background

[0002] The Semantic Web is a major direction of future development, and constructing Web information that computers can understand and process has become an important task at this stage. A knowledge base, as a collection of knowledge composed of concepts, entities, and relations, has growing application and industrial value in information retrieval and knowledge question answering. As an important branch, the knowledge base for the field of software engineering also plays an irreplaceable role, especially in the fields of defect prediction, semantic correlation calculation, text correctness analysis...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06N5/02, G06F16/21
CPC: G06N5/022, G06F16/21
Inventor: 董翔, 沈备军, 陈凯
Owner: SHANGHAI JIAOTONG UNIV