Automatic thesaurus construction method for specific software historical code library

A technology for automatically constructing a thesaurus from a code base, applied in the field of automatic thesaurus construction, which addresses the problems that existing knowledge bases lack pertinence and have low accuracy.

Active Publication Date: 2015-10-21
YANGZHOU UNIV
2 Cites, 5 Cited by

AI Technical Summary

Problems solved by technology

Developers and maintainers frequently need to understand past versions of a system, and they often face questions such as: which elements did earlier developers define in those versions, and what relationships existed among those elements?
The two better-performing approaches above recommend related phrases horizontally, that is, they build a knowledge base for the entire software domain. However, when such domain-wide knowledge bases are applied to code search or code maintenance for a specific software system, they lack pertinence, so the results remain inaccurate.




Embodiment Construction

[0034] The technical solution of the present invention is described in detail below with reference to the accompanying drawings:

[0035] Step 1). Extract the code and comments from the historical version library of the software system (this example uses a software system developed in Java) to generate an independent document corpus, and divide the corpus into a pure code document library and a pure annotation (comment) document library.
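Step 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `split_code_and_comments` is hypothetical, the input is assumed to be plain Java source text, and only `//` and `/* */` (including Javadoc) comments are recognized, with comment markers inside string and character literals skipped. Java text blocks are not handled. Applying it to every file of every historical version would yield one pure code document and one pure comment document per file.

```python
def split_code_and_comments(source: str) -> tuple[str, str]:
    """Split one Java source file into (pure code, pure comment text).

    Hypothetical helper: handles // and /* */ comments and skips
    comment markers inside string/char literals; text blocks are ignored.
    """
    code, comments = [], []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        nxt = source[i + 1] if i + 1 < n else ""
        if ch == "/" and nxt == "/":
            # Line comment: runs to the end of the line; the newline stays in code.
            end = source.find("\n", i)
            end = n if end == -1 else end
            comments.append(source[i + 2:end].strip())
            i = end
        elif ch == "/" and nxt == "*":
            # Block or Javadoc comment: runs to the closing */.
            end = source.find("*/", i + 2)
            end = n if end == -1 else end
            lines = (ln.strip().lstrip("*").strip()
                     for ln in source[i + 2:end].splitlines())
            comments.append(" ".join(ln for ln in lines if ln))
            code.append(" ")  # keep the tokens on either side apart
            i = end + 2
        elif ch in "\"'":
            # String or char literal: copy verbatim so "//" inside it is not a comment.
            j = i + 1
            while j < n and source[j] != ch:
                j += 2 if source[j] == "\\" else 1
            code.append(source[i:j + 1])
            i = j + 1
        else:
            code.append(ch)
            i += 1
    return "".join(code), "\n".join(c for c in comments if c)
```

Scanning character by character, rather than using a single regular expression, is what lets the sketch ignore `//` sequences that appear inside string literals.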

[0036] Step 2). Preprocess the pure code documents in the corpus: tokenize them, remove stop words, and extract elements (see Figure 2; these include identifiers, class names, method names, and variable names) to obtain words and phrases together with their support in the code (Code-TF). In addition, the inheritance relationship between classes (kind-of) is extracted during tokenization. Using the Java "+implements+" construct and keying on the middle word "implements", the relationship between classes and interfaces (realize-of) is analyzed, and W\WG-...
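The preprocessing in Step 2 can be sketched as below, operating on the pure code text produced by Step 1. Several details are assumptions, not taken from the source: the stop-word list (a subset of Java keywords), splitting camelCase/snake_case identifiers into words, counting each multi-word identifier as a phrase, and detecting kind-of through the `extends` keyword (the source's description of how kind-of is found is cut short). The regular expressions cover only simple, non-generic class headers. All names here are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list: a subset of Java keywords and literals.
JAVA_STOP_WORDS = {
    "abstract", "boolean", "break", "case", "catch", "char", "class", "else",
    "extends", "final", "for", "if", "implements", "import", "int", "interface",
    "new", "null", "package", "private", "protected", "public", "return",
    "static", "super", "this", "throws", "try", "void", "while", "true", "false",
}
STRING_LITERAL = re.compile(r'"(?:\\.|[^"\\])*"')
IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")
# Word pieces of an identifier, e.g. "HTTPServerConfig" -> HTTP, Server, Config
CAMEL_PART = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")
# class X [extends Y] [implements A, B] {   (generics are not handled)
CLASS_DECL = re.compile(
    r"\bclass\s+(\w+)(?:\s+extends\s+(\w+))?(?:\s+implements\s+([\w\s,]+?))?\s*\{"
)


def code_term_frequency(code: str) -> Counter:
    """Code-TF sketch: count identifier phrases and their lower-cased words."""
    code = STRING_LITERAL.sub(" ", code)  # literal text is not code vocabulary
    tf = Counter()
    for ident in IDENTIFIER.findall(code):
        if ident in JAVA_STOP_WORDS:
            continue
        parts = [p.lower() for p in CAMEL_PART.findall(ident)]
        if len(parts) > 1:
            tf[ident] += 1  # the whole identifier counted as a phrase
        for word in parts:
            if word not in JAVA_STOP_WORDS:
                tf[word] += 1
    return tf


def class_relations(code: str) -> list[tuple[str, str, str]]:
    """Extract (class, kind-of, superclass) and (class, realize-of, interface)."""
    relations = []
    for name, parent, interfaces in CLASS_DECL.findall(code):
        if parent:
            relations.append((name, "kind-of", parent))
        for iface in interfaces.split(","):
            if iface.strip():
                relations.append((name, "realize-of", iface.strip()))
    return relations
```

For example, the header `public class ArrayStack extends AbstractStack implements Stack, Serializable {` yields one kind-of triple and two realize-of triples. In that same header, the word `stack` is counted once each from `ArrayStack`, `AbstractStack`, and `Stack`.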


Abstract

The invention provides a method for automatically constructing a thesaurus for a specific software historical code library. The method applies the idea of knowledge-base construction: all historical code libraries of a software system are mined, and a thesaurus (a knowledge base) specific to that system is abstracted from them, giving an efficient understanding of how the system's code was built. The method is mainly intended to make code search more accurate. It helps software maintainers and system developers learn the words and phrases used in previous versions of the system and the relationships among them, so that the system can be developed and maintained more effectively and the consistency of the vocabulary used in the system code is improved.

Description

Technical field

[0001] The invention proposes a method for automatically constructing a thesaurus for a specific software history code base. It is mainly used during software development and maintenance to understand the elements used in the code base across all past versions of the system and the relationships among them, and it belongs to the field of software comprehension.

Background art

[0002] As software projects evolve, their complexity grows, and so does the effort needed to maintain and understand them. Developers and maintainers frequently need to understand past versions of a system, and they often face questions such as: which elements did earlier developers define in those versions, and what relationships existed among those elements? People who develop similar systems may search for systems with functions similar to the one they are about to build, to imitate the development of...


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F17/30
CPC: G06F16/43
Inventors: 孙小兵 (Sun Xiaobing), 孙伟松 (Sun Weisong), 李斌 (Li Bin), 朱俊武 (Zhu Junwu), 杨辉 (Yang Hui)
Owner: YANGZHOU UNIV