
A Method for Automatic Thesaurus Construction for Specific Software History Code Base

An automatic thesaurus-construction technology for software code bases. It addresses the low accuracy and lack of pertinence that arise when a general knowledge base is applied to a specific software system.

Active Publication Date: 2018-02-27
YANGZHOU UNIV

AI Technical Summary

Problems solved by technology

Developers and maintainers repeatedly need to understand past versions of a system, and they often face questions such as: which elements did earlier developers define in past versions, and what relationships exist between those elements?
The two better existing approaches recommend related phrases horizontally, that is, they build a knowledge base for the entire software field. When code search or code maintenance is carried out for a specific software system, however, such knowledge bases lack pertinence and still produce inaccurate results.

Examples

Embodiment Construction

[0034] The technical scheme of the present invention is described in detail below with reference to the accompanying drawings:

[0035] Step 1). Extract the code and comments from the historical version repository of the software system (this example uses a software system written in Java) to generate an independent document corpus, and divide the corpus into a code-only document library and a comment-only document library.
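The split described in Step 1 can be illustrated with a short sketch. It is not the patent's implementation: the function name `split_code_and_comments` and the regular-expression approach are assumptions made for illustration, and Java text blocks (`"""`) are not handled. String and character literals are matched first so that a `//` inside a literal is not mistaken for a comment.

```python
import re
from typing import Tuple

# Alternatives are tried left to right at each position, so literals win
# over comment markers that happen to appear inside them.
_LITERAL_OR_COMMENT = re.compile(
    r'"(?:\\.|[^"\\])*"'      # string literal
    r"|'(?:\\.|[^'\\])*'"     # character literal
    r"|/\*.*?\*/"             # block or Javadoc comment
    r"|//[^\n]*",             # line comment
    re.DOTALL,
)


def split_code_and_comments(java_source: str) -> Tuple[str, str]:
    """Return (code_only_text, comment_only_text) for one Java file.

    Hypothetical helper: comments are moved to the comment document,
    everything else (including literals) stays in the code document.
    """
    comments = []

    def _route(match):
        text = match.group(0)
        if text.startswith("/*") or text.startswith("//"):
            comments.append(text)
            return " "          # keep tokens on either side separated
        return text             # literals remain part of the code

    code = _LITERAL_OR_COMMENT.sub(_route, java_source)
    return code, "\n".join(comments)
```

Running this over every file in every historical version would yield the two document libraries; how versions are enumerated (for example, by walking a version-control history) is left open here because the source does not specify it.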

[0036] Step 2). Preprocess the code-only documents in the corpus, including tokenizing, removing stop words, and extracting elements (as shown in Figure 2, including identifiers, class names, method names, and variable names), to obtain words and phrases together with their support in the code (Code-TF). In addition, during tokenization, the inheritance relationship between classes (kind-of) is analyzed. Using the Java grammar pattern "+implements+", keyed on the middle word "implements", the relationship between classes and interfaces (realize-of) is analyzed, and W\WG-...
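A minimal sketch of the preprocessing in Step 2 follows, under several assumptions that are not in the source: Code-TF is taken to be a raw occurrence count, identifiers are split on camelCase and underscores, the stop-word set is a small illustrative subset of Java keywords, and the kind-of relation is read from the Java `extends` keyword. All function and constant names are hypothetical, and generic type parameters in class declarations are not handled.

```python
import re
from collections import Counter

# Illustrative subset only; a real stop-word list would be larger.
JAVA_STOP_WORDS = {"public", "private", "static", "final", "class", "void",
                   "int", "return", "new", "extends", "implements"}

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
CAMEL_PART = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")
CLASS_DECL = re.compile(
    r"\bclass\s+(\w+)"
    r"(?:\s+extends\s+(\w+))?"              # parent class, if any
    r"(?:\s+implements\s+([\w\s,]+?))?"     # comma-separated interfaces
    r"\s*\{"
)


def split_identifier(identifier):
    """Split 'lineCount' or 'line_count' into ['line', 'count']."""
    words = []
    for chunk in identifier.split("_"):
        words.extend(part.lower() for part in CAMEL_PART.findall(chunk))
    return words


def code_term_frequency(code):
    """Count phrases (whole identifiers) and their component words."""
    counts = Counter()
    for identifier in IDENTIFIER.findall(code):
        if identifier.lower() in JAVA_STOP_WORDS:
            continue
        words = split_identifier(identifier)
        if not words:
            continue
        counts[" ".join(words)] += 1        # the identifier as a phrase
        if len(words) > 1:
            counts.update(words)            # each word inside the phrase
    return counts


def extract_relations(code):
    """Return (class, relation, target) triples from class declarations."""
    relations = []
    for name, parent, interfaces in CLASS_DECL.findall(code):
        if parent:
            relations.append((name, "kind-of", parent))
        for iface in interfaces.split(","):
            if iface.strip():
                relations.append((name, "realize-of", iface.strip()))
    return relations
```

For a declaration such as `class FileReader extends Reader implements Closeable { ... }`, `extract_relations` would yield one kind-of triple and one realize-of triple, and `code_term_frequency` would count both the phrase "file reader" and the words "file" and "reader".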

Abstract

The invention proposes a method for automatically constructing a thesaurus for the historical code base of a specific software system. The method applies the idea of knowledge base construction: it refines all of the system's historical code bases and extracts a lexicon (knowledge base) that belongs to that system, giving an efficient understanding of how the system's code was built. It is mainly intended to make code search more accurate. The invention helps software maintainers and system developers understand the words and phrases used in past versions of the system and the relationships between them, develop and maintain the system more effectively, and promote consistent word usage in software code.

Description

Technical field

[0001] The invention proposes a method for automatically constructing a thesaurus for the historical code base of a specific software system. It is mainly used during software development and maintenance to understand the elements used in the code bases of all past versions of a system and the relationships between them, and it belongs to the field of software comprehension.

Background art

[0002] As software projects are developed, their complexity increases, and so does the difficulty of maintaining and understanding them. Developers and maintainers repeatedly need to understand past versions of a system, and they often face questions such as: which elements did earlier developers define in past versions, and what relationships exist between those elements? People developing similar systems may search for systems with functions similar to those of the system they are about to develop, to imitate the development of...

Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/43
Inventor: 孙小兵, 孙伟松, 李斌, 朱俊武, 杨辉
Owner: YANGZHOU UNIV