A method for constructing a code smell training dataset by combining code evolution information

A technology for training datasets and code, applied in code compilation, program code conversion, neural learning methods, etc. It addresses the low reliability of existing datasets and the inability to generate large-scale datasets, with the effects of avoiding overfitting, improving predictive ability, and removing unwanted noise.

Active Publication Date: 2021-02-02
SUN YAT SEN UNIV
4 Cites, 0 Cited by
AI Technical Summary

Problems solved by technology

[0005] The method provided by the present invention for constructing a code smell training dataset in combination with code evolution information uses existing tools to detect the entities of a baseline version and a comparison version of the same software, and extracts the changed smelly entities and the non-smelly entities observed during code evolution to construct the training dataset. This addresses the low credibility of datasets generated by existing tools, as well as the inability of manual labeling to produce large-scale datasets. In addition, a genetic algorithm is used to reduce the dimensionality of the metric features in the dataset, which avoids overfitting and can further improve the dataset's ability to predict code smells.

Examples

Embodiment Construction

[0030] For a given version of a software project, the entities marked as smelly by an automatic code smell detection tool can be divided into two categories. The first consists of entities that the tool no longer detects as smelly in a subsequent version; these are called changed smelly entities. The second consists of entities that are still detected as smelly in a subsequent version; these are called unchanged smelly entities. Research has found that, in a training dataset where an automatic code smell detection tool marks whether each entity is smelly, the historical information of software evolution can be combined as follows: only the changed smelly entities are regarded as real smelly entities, and a supervised machine learning model is trained and built from them together with the baseline-version entities that the tool has not recognized as smelly in a subsequent version, instead of relying only ...
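The categorization in [0030] can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patent's implementation: the detector output is modeled as a hypothetical dictionary from entity name to a boolean verdict, entities missing from the later version are skipped, and "non-smelly" is taken to mean not flagged in either version.

```python
from typing import Dict, Set


def label_entities(baseline_flags: Dict[str, bool],
                   later_flags: Dict[str, bool]) -> Dict[str, Set[str]]:
    """Group baseline entities by how the detector's verdict evolves.

    Both arguments map an entity name (hypothetical format) to the
    detector's verdict, where True means "flagged as smelly".
    Assumption: only entities present in both versions are considered.
    """
    groups: Dict[str, Set[str]] = {
        "changed_smelly": set(),
        "unchanged_smelly": set(),
        "non_smelly": set(),
    }
    for entity, smelly_before in baseline_flags.items():
        if entity not in later_flags:
            continue  # removed or renamed entity; skipped in this sketch
        smelly_after = later_flags[entity]
        if smelly_before and not smelly_after:
            groups["changed_smelly"].add(entity)    # regarded as a real smell
        elif smelly_before and smelly_after:
            groups["unchanged_smelly"].add(entity)  # left out of the training set
        elif not smelly_before and not smelly_after:
            groups["non_smelly"].add(entity)        # candidate negative example
    return groups


# Toy verdicts for a baseline version and a later version.
baseline = {"A": True, "B": True, "C": False, "D": False, "E": True}
later = {"A": False, "B": True, "C": False, "D": True}
groups = label_entities(baseline, later)
```

In this toy input, `A` lost its smell between versions and becomes a positive example, `B` stays flagged and is excluded, and `C` serves as a negative example. `D` (flagged only later) and `E` (absent later) fall outside all three groups under the assumptions above.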

Abstract

The present invention relates to a method for constructing a code smell training dataset in combination with code evolution information, comprising the following steps:
A. Acquire the source code of a software baseline version and of a comparison version released after the baseline version from an online source code repository.
B. Detect the source code entities of the baseline version and the comparison version, extract the changed smelly entities and the non-smelly entities in the baseline version, label the changed smelly entities as smelly, and label the non-smelly entities as not smelly.
C. Extract a number of non-smelly entities comparable to the number of changed smelly entities.
D. Calculate the metric features of the changed smelly entities and the non-smelly entities in the baseline version source code.
E. Combine the changed smelly entities and the non-smelly entities into a smell training dataset.
F. Use a genetic algorithm to reduce the dimensionality of the entities in the smell training dataset; the changed smelly entities and non-smelly entities after dimensionality reduction compose the code smell training dataset.
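Steps C and F can be illustrated with a small standard-library sketch. The abstract does not specify the sampling ratio, the fitness function, or the genetic operators, so everything below is an assumption made for illustration: equal-size random undersampling for step C, and for step F a bit-mask genetic algorithm with truncation selection, one-point crossover, bit-flip mutation, and a fitness equal to nearest-centroid training accuracy minus a small penalty per selected metric. All function names and the synthetic data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Row = Tuple[Sequence[float], int]  # (metric vector, label): 1 = smelly, 0 = not


def balance(smelly: List[Row], clean: List[Row], rng: random.Random) -> List[Row]:
    """Step C (assumed 1:1): sample as many non-smelly rows as smelly rows."""
    return smelly + rng.sample(clean, min(len(smelly), len(clean)))


def fitness(mask: List[int], data: List[Row]) -> float:
    """Assumed fitness: nearest-centroid accuracy on the selected metrics,
    minus 0.01 per selected metric to favor smaller subsets."""
    idx = [i for i, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in data if y == label]
        centroids[label] = [sum(r[i] for r in rows) / len(rows) for i in idx]

    def predict(x: Sequence[float]) -> int:
        dist = {c: sum((x[i] - m) ** 2 for i, m in zip(idx, mu))
                for c, mu in centroids.items()}
        return min(dist, key=dist.get)

    accuracy = sum(predict(x) == y for x, y in data) / len(data)
    return accuracy - 0.01 * len(idx)


def select_features(data: List[Row], n_features: int, rng: random.Random,
                    pop_size: int = 20, generations: int = 30,
                    p_mut: float = 0.1) -> List[int]:
    """Step F: evolve bit masks over the metric columns."""
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, data), reverse=True)
        parents = pop[: pop_size // 2]          # truncation selection, keeps the best
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            children.append([bit ^ (rng.random() < p_mut) for bit in child])
        pop = parents + children
    return max(pop, key=lambda m: fitness(m, data))


# Synthetic metrics: column 0 separates the classes, columns 1 and 2 are noise.
rng = random.Random(0)
smelly = [((10.0 + rng.random(), rng.random(), rng.random()), 1) for _ in range(5)]
clean = [((1.0 + rng.random(), rng.random(), rng.random()), 0) for _ in range(20)]
data = balance(smelly, clean, rng)
mask = select_features(data, n_features=3, rng=rng)
reduced = [([x[i] for i, bit in enumerate(mask) if bit], y) for x, y in data]
```

The per-metric penalty is one simple way to push the search toward fewer features; in practice the fitness would more plausibly be a cross-validated score of whichever classifier is trained on the dataset, which this sketch does not attempt.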

Description

Technical field

[0001] The present invention relates to the technical field of code smell detection, and more specifically to a method for constructing a code smell training dataset in combination with code evolution information.

Background

[0002] The automatic detection of code smells is currently one of the hot topics in software engineering research. One important approach is to build a model based on a machine learning algorithm that classifies whether code is smelly. One key to the accuracy of this approach is the training dataset used to build the machine learning model. At present, these methods take multiple open source software projects and either manually review the project source code or use automatic tools (such as iPlasma, inFusion, PMD, etc.) to determine which entities are smelly. The entities of the software projects, together with whether they are smelly, are used as the training dataset of a supervised machine learning algorithm, and the feature training model of the bad...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F8/41, G06N3/08, G06K9/62
CPC: G06F8/4435, G06N3/08, G06F18/24, G06F18/214
Inventor: 王逸君, 周晓聪
Owner: SUN YAT SEN UNIV