Source code file multi-service label automatic classification method

A method for automatically classifying source code files with multiple business labels, applied in the field of program business understanding. It addresses the scarcity of deep-learning approaches to this task and achieves high accuracy.

Active Publication Date: 2019-07-30
浙江网新恒天软件有限公司

AI Technical Summary

Problems solved by technology

For multi-label classification of natural language text, current research and industrial applications mostly rely on recommendation algorithms; approaches based on deep learning, despite its popularity as a research field, are still rare.



Embodiment Construction

[0048] The present invention will be further described in detail below in conjunction with the drawings and specific embodiments.

[0049] The present invention provides a deep-learning-based method for automatically annotating programs with business labels. As shown in Figure 1, it comprises four parts: data preparation, data preprocessing, word vector representation, and multi-label classification.

[0050] (1) Data preparation

[0051] The data preparation step gathers all the program files of the project, the business label set provided by the experts, and the data labeled by the experts. Assume that a software project consists of n program files; these program files constitute a program file set X, which can be expressed as:

[0052] $X = \{x_1, x_2, \ldots, x_n\}, \quad |X| = n$

[0053] Prepare a set of non-overlapping business labels, summarized by business experts for the software, expressed as:

[0054]

[0055] Each element $\lambda_i$ of the label set represents the business label provided by...
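The data preparation step described above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the file names, the label names, and the `to_multi_hot` helper are hypothetical, and encoding each file's expert-assigned labels as a multi-hot vector over the label set is a common convention assumed here rather than something the text states.

```python
from typing import Dict, List, Sequence


def to_multi_hot(labels: Sequence[str], label_set: Sequence[str]) -> List[int]:
    """Encode a file's business labels as a 0/1 vector over the label set.

    Hypothetical helper: position i is 1 when label_set[i] was assigned.
    Raises KeyError if a label is not part of the expert label set.
    """
    index = {label: i for i, label in enumerate(label_set)}
    vector = [0] * len(label_set)
    for label in labels:
        vector[index[label]] = 1
    return vector


# Program file set X = {x_1, ..., x_n} (hypothetical file names).
program_files: List[str] = [
    "OrderService.java",
    "InvoiceDao.java",
    "LoginController.java",
]

# Business label set summarized by experts (hypothetical labels).
label_set: List[str] = ["order", "billing", "authentication"]

# Expert-labeled subset: each file may carry several labels at once.
expert_annotations: Dict[str, List[str]] = {
    "OrderService.java": ["order", "billing"],
    "InvoiceDao.java": ["billing"],
}

# Training targets for the labeled files only.
targets: Dict[str, List[int]] = {
    name: to_multi_hot(labels, label_set)
    for name, labels in expert_annotations.items()
}
```

Because a single file can carry several labels, each target is a vector rather than a single class index; files without expert labels simply have no entry in `targets`.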


Abstract

The invention discloses a method for automatically classifying source code files with multiple business labels. The method comprises four parts: data preparation, data preprocessing, word vector representation, and multi-label classification. Through multi-label classification learning based on deep learning, the method mines the semantics in code more deeply. Using the text characteristics of a program file, it decomposes the file into a comment part and a code part and encodes each with its own preprocessing and vectorization. The method fuses multiple models and achieves higher accuracy than a single model: the FastText model obtains classification results quickly by means of hierarchical softmax; the TextCNN model mines text features within a fixed window in the code; and the TextRNN model captures variable-length contextual semantics of the code. The method helps programmers screen relevant code more quickly and understand the deeper business meaning of the code.
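Two of the ideas in the abstract, splitting a program file into a comment part and a code part and fusing the outputs of several models, can be illustrated with a short Python sketch. The source specifies neither the comment syntax nor the fusion rule, so both are assumptions here: comments are taken to be Java/C-style (`//` and `/* ... */`), and fusion is taken to be averaging of per-label probabilities followed by a 0.5 threshold. The function names are hypothetical.

```python
import re
from typing import Dict, List, Sequence, Tuple

# Assumption: Java/C-style comments. This simple pattern does not handle
# comment markers that appear inside string literals.
_COMMENT_RE = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)


def split_comments_and_code(source: str) -> Tuple[str, str]:
    """Return (comment_text, code_text) for one program file."""
    comments = [match.group(0) for match in _COMMENT_RE.finditer(source)]
    # Replace each comment with a space so adjacent tokens stay separated.
    code = _COMMENT_RE.sub(" ", source)
    return "\n".join(comments), code


def fuse_predictions(
    per_model_probs: Sequence[Dict[str, float]],
    threshold: float = 0.5,
) -> List[str]:
    """Average each label's probability across models and keep the labels
    whose mean reaches the threshold (assumed fusion rule)."""
    labels = sorted(set().union(*per_model_probs))
    fused = []
    for label in labels:
        # A model that did not score a label contributes 0.0 for it.
        mean = sum(p.get(label, 0.0) for p in per_model_probs) / len(per_model_probs)
        if mean >= threshold:
            fused.append(label)
    return fused


example_source = "// pay order\nint x = 1; /* billing\n logic */ int y = 2;\n"
comment_part, code_part = split_comments_and_code(example_source)

# Hypothetical per-label probabilities from three models
# (for example FastText, TextCNN, and TextRNN classifiers).
example_probs = [
    {"order": 0.9, "billing": 0.2},
    {"order": 0.8, "billing": 0.1},
    {"order": 0.4, "billing": 0.6},
]
predicted_labels = fuse_predictions(example_probs)
```

Averaging is only one possible fusion strategy; weighted voting or a learned combiner would fit the same interface. In a fuller pipeline the comment part and the code part would each go through their own preprocessing and vectorization before reaching the classifiers.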

Description

Technical field

[0001] The method relates to the field of program business understanding, and specifically to a method for automatically classifying source code files in software engineering with multiple business labels.

Background technique

[0002] As the scale of modern software keeps growing, programmers face increasing development and maintenance burdens, especially when serving a specific business area. The biggest challenge for developers is to correctly understand the business meaning of the code in that business area. These tasks rely heavily on the related requirements, design, and change documents, but such documents are often missing or not updated in time, so developers cannot depend on them. Developers can then rely only on the source code, and extracting business meaning from the source code has become an urgent need.

[0003] To describe the business of the code, we can use several phrases with bus...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F8/33, G06F16/35, G06K9/62
CPC: G06F8/33, G06F16/35, G06F18/2411, G06F18/25, Y02D10/00
Inventor: 杨朝晖, 郭倩, 李善平
Owner: 浙江网新恒天软件有限公司