Automatic construction implementation method of software engineering knowledge base based on semi-supervised learning

A semi-supervised learning technique for software engineering, applied to the automatic construction of knowledge bases. It addresses the relatively sparse relationships between concepts, the large amount of manual effort required, and the difficulty of achieving high accuracy, thereby improving the scale and quality of the knowledge and reducing labor cost.

Active Publication Date: 2017-06-20
SHANGHAI JIAO TONG UNIV

AI Technical Summary

Problems solved by technology

[0008] The present invention addresses several shortcomings of the prior art: relationship extraction rarely achieves both high accuracy and large scale, the number of concepts is difficult to grow to a large scale, the relationships between concepts are relatively sparse, and building training samples through manual labeling requires a large investment of human effort. To solve these problems, an automatic construction method for a software engineering knowledge base based on semi-supervised learning is proposed. The semi-supervised automatic construction method reduces the human effort and time cost of building a software engineering knowledge base. The resulting domain knowledge base is larger in scale and better in quality, which addresses the scarcity and inadequacy of current software-domain knowledge bases.



Examples


Embodiment Construction

[0030] As shown in Figure 1, this embodiment includes the following steps:

[0031] Step 1. Use the software engineering tags provided by StackOverflow as the seed vocabulary and obtain the concept data set provided by Wikipedia. Iteratively propagate the seed vocabulary tags to expand to all software engineering concepts on Wikipedia, obtaining a software engineering domain knowledge collection that contains the wiki structure.

[0032] The concept data set is built from the original StackOverflow tags and the Wikipedia concepts, both of which are available as XML data sources. This embodiment uses Java as the programming language and a SAX parser to parse the XML files, obtaining the seed vocabulary of the software engineering field and the Wikipedia concept data set.
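The SAX parsing in [0032] can be illustrated with a minimal sketch. It assumes the tag file consists of `<row>` elements carrying a `TagName` attribute; that layout, the class name `SeedTagParser`, and the lower-casing of tags are assumptions made for illustration and are not taken from the patent text.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class SeedTagParser {

    /** Collects the TagName attribute of every <row> element (assumed file layout). */
    static final class TagHandler extends DefaultHandler {
        final Set<String> tags = new LinkedHashSet<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("row".equals(qName)) {
                String name = attrs.getValue("TagName");
                if (name != null && !name.isEmpty()) {
                    // Normalizing case is an illustrative choice, not part of the source.
                    tags.add(name.toLowerCase(Locale.ROOT));
                }
            }
        }
    }

    /** Streams the XML through SAX so the whole file never has to be held in memory. */
    public static Set<String> parseSeedTags(InputStream in) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TagHandler handler = new TagHandler();
        parser.parse(in, handler);
        return handler.tags;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<tags>"
                + "<row Id=\"1\" TagName=\"java\"/>"
                + "<row Id=\"2\" TagName=\"unit-testing\"/>"
                + "</tags>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(parseSeedTags(in));
    }
}
```

A streaming parser suits this step because the Wikipedia and StackOverflow XML sources are large; the handler keeps only the extracted names rather than a document tree.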

[0033] Iterative tag propagation means: starting from the constructed seed vocabulary of the software engineering field, propagation is carried out over multiple iterations, and e...
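The propagation loop in [0033] can be sketched as follows. Because the paragraph is truncated in this copy, the acceptance rule is an assumption: a Wikipedia concept joins the set once at least `minSupport` already-accepted concepts link to it. The names `TagPropagation`, `links`, `minSupport`, and `maxIterations` are hypothetical, and the patent's actual rule may differ.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class TagPropagation {

    /**
     * Expands the seed set over a link graph.
     * links maps a concept to the concepts it points to (assumed to come from the wiki structure).
     */
    public static Set<String> propagate(Set<String> seeds,
                                        Map<String, Set<String>> links,
                                        int minSupport,
                                        int maxIterations) {
        Set<String> accepted = new HashSet<>(seeds);
        for (int i = 0; i < maxIterations; i++) {
            // Count how many accepted concepts link to each candidate.
            Map<String, Integer> support = new HashMap<>();
            for (String concept : accepted) {
                for (String neighbor : links.getOrDefault(concept, Collections.<String>emptySet())) {
                    if (!accepted.contains(neighbor)) {
                        support.merge(neighbor, 1, Integer::sum);
                    }
                }
            }
            // Accept candidates that reach the (assumed) support threshold.
            Set<String> added = new HashSet<>();
            for (Map.Entry<String, Integer> e : support.entrySet()) {
                if (e.getValue() >= minSupport) {
                    added.add(e.getKey());
                }
            }
            if (added.isEmpty()) {
                break; // fixed point: nothing new was accepted in this iteration
            }
            accepted.addAll(added);
        }
        return accepted;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> links = new HashMap<>();
        links.put("java", new HashSet<>(Arrays.asList("jvm", "coffee")));
        links.put("python", new HashSet<>(Arrays.asList("jvm")));
        Set<String> seeds = new HashSet<>(Arrays.asList("java", "python"));
        System.out.println(propagate(seeds, links, 2, 10));
    }
}
```

In the toy graph, "jvm" is supported by both seeds and is accepted, while "coffee" has a single supporter and is not. A threshold above 1 is one simple way to limit drift toward concepts outside the domain.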


Abstract

The invention provides a method for automatically constructing a software engineering knowledge base based on semi-supervised learning. It addresses the problems that knowledge bases for the software engineering field are currently rare, that the number of concepts is difficult to grow to a large scale, that the relationships between concepts are sparse, and that a large investment of human effort is required. The method comprises five steps. First, the concept set of the software engineering field is expanded by tag propagation over the Wikipedia and StackOverflow data sources. Second, machine learning features are constructed for extracting hyponymy relations between software engineering concepts. Third, positive and negative training examples for relation extraction are labeled automatically by template matching and rule matching. Fourth, relations between concepts are extracted with an iterative semi-supervised learning method, and the extraction result of each iteration is refined using evaluation rules. Fifth, the knowledge base is constructed in a standardized form using the RDF language.
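The fifth step, standardized construction in RDF, can be illustrated by serializing one extracted hyponymy pair as an N-Triples line. This is a sketch under assumptions: the namespace `http://example.org/se-kb/` and the class `HyponymyRdfWriter` are hypothetical, and the choice of the SKOS `broader` property is illustrative, since the source does not say which RDF vocabulary the patent uses.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Locale;

final class HyponymyRdfWriter {

    // Hypothetical namespace for knowledge base concepts.
    static final String BASE = "http://example.org/se-kb/";
    // SKOS property linking a narrower concept to a broader one (assumed vocabulary choice).
    static final String BROADER = "http://www.w3.org/2004/02/skos/core#broader";

    /** Turns a concept label into an IRI reference, percent-encoding unsafe characters. */
    static String iri(String concept) {
        String local = concept.trim().toLowerCase(Locale.ROOT).replace(' ', '_');
        try {
            return "<" + BASE + URLEncoder.encode(local, "UTF-8") + ">";
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Emits one N-Triples line stating that hyponym has the broader concept hypernym. */
    public static String toTriple(String hyponym, String hypernym) {
        return iri(hyponym) + " <" + BROADER + "> " + iri(hypernym) + " .";
    }

    public static void main(String[] args) {
        System.out.println(toTriple("JUnit", "unit testing"));
    }
}
```

N-Triples is used here only because each relation maps to one self-contained line; an RDF library could write the same statements in other serializations.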

Description

Technical field

[0001] The present invention relates to a technology in the field of software engineering, in particular to a method for automatically constructing a software engineering knowledge base based on semi-supervised learning.

Background art

[0002] The Semantic Web is a main direction of future development, and constructing Web information that computers can understand and process has become a very important task at this stage. A knowledge base, as a collection of knowledge composed of concepts, entities, and relationships, has growing application value and industrial value as information retrieval, knowledge question answering, and related areas flourish. The software engineering domain knowledge base, an important branch of knowledge bases, also plays an irreplaceable role, especially in the fields of defect prediction, semantic relevance calculation, text co...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06N5/02, G06F17/30
CPC: G06N5/022, G06F16/21
Inventor: 董翔, 沈备军, 陈凯
Owner: SHANGHAI JIAO TONG UNIV