Method for constructing code bad smell training dataset by combining with code evolution information

A technique for constructing code smell training data sets, applied in fields such as code compilation, program code conversion, and neural learning methods. It addresses the low reliability of tool-generated data sets and the inability of manual labeling to produce large-scale data sets, and it avoids overfitting so as to improve the ability to predict bad smells.

Active Publication Date: 2018-06-01
SUN YAT SEN UNIV
Cites: 4; Cited by: 3

AI Technical Summary

Problems solved by technology

[0005] The method provided by the present invention for constructing a code bad smell training data set combined with code evolution information uses existing tools to detect the entities of the baseline version and the comparison version of the same software, and builds the training data set from the changed bad smell entities and the non-bad-smell entities extracted over the course of code evolution. This addresses the low credibility of data sets generated by existing tools alone, as well as the inability of manual labeling to produce large-scale data sets. In addition, a genetic algorithm is used to reduce the dimensionality of the metric features in the data set, which avoids overfitting and can further improve the ability of the data set to predict bad smells.
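The genetic-algorithm dimensionality reduction mentioned above can be sketched as a search over binary feature masks. This is a minimal illustration, not the patented procedure: the nearest-centroid fitness function, the per-feature penalty, truncation selection, single-point crossover, and every name and parameter value below are assumptions chosen to keep the example self-contained.

```python
import random

def centroid_accuracy(X, y, mask):
    """Accuracy of a nearest-centroid classifier using only the features kept by mask."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    def centroid(label):
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(r[j] for r in rows) / len(rows) for j in cols]
    c0, c1 = centroid(0), centroid(1)
    correct = 0
    for x, t in zip(X, y):
        d0 = sum((x[j] - c) ** 2 for j, c in zip(cols, c0))
        d1 = sum((x[j] - c) ** 2 for j, c in zip(cols, c1))
        correct += int((1 if d1 < d0 else 0) == t)
    return correct / len(X)

def fitness(X, y, mask, penalty=0.01):
    # Assumed fitness: accuracy minus a small cost per selected feature,
    # so that smaller feature subsets are preferred at equal accuracy.
    return centroid_accuracy(X, y, mask) - penalty * sum(mask)

def select_features(X, y, pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Return a 0/1 mask over metric features (requires at least 2 features)."""
    rng = random.Random(seed)
    n = len(X[0])
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda m: fitness(X, y, m), reverse=True)
        parents = ranked[: pop_size // 2]        # truncation selection
        children = [ranked[0][:]]                # elitism: carry over the best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)            # single-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = children
    return max(population, key=lambda m: fitness(X, y, m))
```

For brevity the fitness is measured on the training rows themselves; a realistic setup would score each mask with cross-validation, since the stated purpose of the reduction is to avoid overfitting.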

Examples

Embodiment Construction

[0030] For a given version of a software project, the entities marked as smelly by an automatic code smell detection tool can be divided into two categories. Entities that the tool no longer detects as smelly in a subsequent version are called changed bad smell entities; entities that the tool still detects as smelly in a subsequent version are called unchanged bad smell entities. Research has found that, for a training data set in which an automatic code smell detection tool marks whether each entity has a bad smell, the historical information of software evolution should be taken into account: only the changed bad smell entities are regarded as real bad smell entities, and a supervised machine learning model is trained and built on them together with the entities of the baseline version that the tool has not recognized as bad smells in a subsequent version, instead of relying only ...
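The two categories defined in this paragraph amount to a set difference and a set intersection over the tool's output for the two versions. The sketch below assumes entities are identified by a stable name across versions; the function name and the sample identifiers are hypothetical.

```python
def partition_smelly_entities(baseline_smelly, later_smelly):
    """Split the entities flagged in the baseline by whether a later version still flags them."""
    changed = baseline_smelly - later_smelly     # flagged in baseline, not flagged later
    unchanged = baseline_smelly & later_smelly   # flagged in both versions
    return changed, unchanged

# Hypothetical detection results for two versions of the same project.
baseline = {"A.foo", "B.bar", "C.baz"}
later = {"B.bar", "D.qux"}
changed, unchanged = partition_smelly_entities(baseline, later)
```

Note that under this simplification an entity deleted or renamed in the later version would also count as changed; whether such entities should be excluded is not stated in the text above.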

Abstract

The invention relates to a method for constructing a code bad smell training data set by combining code evolution information. The method comprises the following steps: A: obtaining, from a network source code repository, the source code of a baseline version of a piece of software and of a comparison version released after the baseline version; B: detecting the source code entities of the baseline version and the comparison version, extracting the changed bad smell entities and the entities without bad smell in the baseline version, annotating the changed bad smell entities as entities with bad smell, and annotating the entities without bad smell as entities without bad smell; C: extracting a number of entities without bad smell equivalent to the number of changed bad smell entities; D: calculating the metric features of the changed bad smell entities and the entities without bad smell in the source code of the baseline version; E: forming a changed bad smell training data set from the changed bad smell entities and the entities without bad smell; and F: using a genetic algorithm to reduce the dimensionality of the metric features of the entities in the changed bad smell training data set, and forming the code bad smell training data set from the dimension-reduced changed bad smell entities and entities without bad smell.
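Steps B, C, and E can be illustrated with a short sketch that labels entities, samples an equal number of non-smelly entities, and assembles the rows. It assumes the detection results and metric vectors (step D) are already available; the function name, the `metrics` dictionary, and the rule that a non-smelly entity is one flagged in neither version are assumptions made for illustration.

```python
import random

def build_changed_smell_dataset(baseline_entities, baseline_smelly, later_smelly,
                                metrics, seed=0):
    """Return (metric_vector, label) rows with equal numbers of both labels."""
    # Step B: changed bad smell entities are labeled 1 (has bad smell).
    positives = sorted(baseline_smelly - later_smelly)
    # Assumption: an entity without bad smell is flagged in neither version.
    pool = sorted(e for e in baseline_entities
                  if e not in baseline_smelly and e not in later_smelly)
    # Step C: sample as many non-smelly entities as there are positives.
    rng = random.Random(seed)
    k = min(len(positives), len(pool))
    negatives = rng.sample(pool, k)
    # Step E: combine both groups into one training data set.
    return ([(metrics[e], 1) for e in positives] +
            [(metrics[e], 0) for e in negatives])

# Hypothetical metric vectors computed on the baseline source (step D).
metrics = {"a": [10, 3], "b": [8, 2], "c": [1, 1], "d": [2, 1], "e": [1, 2]}
rows = build_changed_smell_dataset(["a", "b", "c", "d", "e"], {"a", "b"}, {"b"}, metrics)
```

Here entity `b` is excluded entirely because it remains flagged in the comparison version, so it is neither a changed bad smell entity nor an entity without bad smell.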

Description

Technical field

[0001] The present invention relates to the technical field of code smell detection, and more specifically to a method for constructing a code smell training data set in combination with code evolution information.

Background technique

[0002] The automatic detection of code smells is currently one of the active topics in software engineering research. One important approach is to build a model based on a machine learning algorithm that classifies whether code has a bad smell. One key to the accuracy of this approach is the training data set used to build the machine learning model. At present, these methods take multiple open source software projects and either manually review the project source code or use automatic tools (such as iPlasma, inFusion, PMD, etc.) to determine whether each entity has a bad smell. The entities of the software project, together with whether they have a bad smell, are used as the training data set of the supervised machine learning algorithm, and the feature training model of the bad...

Application Information

IPC(8): G06F8/41, G06N3/08, G06K9/62
CPC: G06F8/4435, G06N3/08, G06F18/24, G06F18/214
Inventor: 王逸君, 周晓聪
Owner: SUN YAT SEN UNIV