Side information-based code snippet programming language detecting method

A technology based on code fragments and their side information. It is applied in the field of programming language recognition and addresses problems such as the low accuracy of code fragment recognition.

Inactive Publication Date: 2016-08-31
NANJING UNIV

AI Technical Summary

Problems solved by technology

For this reason, the present invention proposes a new programming language recognition method for code fragments, based on side information such as text and labels and on Bayesian techniques, which effectively addresses the low recognition accuracy for code fragments.


Examples

Example 1

[0041] Example 1: quantitative evaluation of the language recognition ability of the SIPLDM method of this patent.

[0042] 1. Input and output data description

[0043] We apply the method of the present invention to a real data set from StackOverflow, a question-and-answer website for the programming community. The input is a set of question posts; each post has code fragments, explanatory text around the code, tags around the code, and a language category. The statistics are shown in Table 1: the data set has 459393 posts, divided into 13 groups by language category. Because languages differ in popularity, the group sizes vary; each group has 35337.92 posts on average. Figure 1 lists several examples of the data. We randomly sampled 90% of the data as training data and used the remaining 10% as testing data.
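The random 90%/10% split described above can be sketched as follows. This is a minimal illustration, not the patent's procedure: the function name `split_posts`, the fixed seed, and the toy post records are assumptions made here for the example. Only the 90/10 ratio, the post count, and the group count come from the text.

```python
import random

def split_posts(posts, train_fraction=0.9, seed=0):
    """Shuffle a copy of `posts` and split it into (train, test) lists."""
    rng = random.Random(seed)          # fixed seed (assumption) for repeatability
    shuffled = list(posts)             # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Toy stand-in for the post set; the field names are hypothetical.
posts = [{"id": i, "language": "python" if i % 2 else "java"} for i in range(100)]
train, test = split_posts(posts)

# Average group size implied by the reported totals: 459393 posts in 13 groups.
average_group_size = 459393 / 13
```

The last line simply reproduces the per-group average quoted in the paragraph from the two reported totals.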

[0044] The output is the language recognition quality evaluation index of t...


Abstract

The present invention discloses a side information-based method for detecting the programming language of code snippets. By analyzing side information attached to code snippets, such as comments, descriptions, and tags, it constructs a more reasonable classifier for code snippet language detection and addresses the low accuracy of traditional detection methods that rely only on source code. The method comprises two main steps. First, the text and known tags around the code snippets are analyzed with a keyword-enhanced multi-label learning technique, so that each snippet obtains a sufficient number of related tags. Then a Bayes classifier is trained on code snippets of known programming language together with their tags, and is used to detect the programming language of code snippets whose language is unknown. Experiments on a real data set collected from the programming Q&A community StackOverflow show that the method achieves higher detection accuracy than traditional detection techniques.
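The second step, training a Bayes classifier on code snippets plus their tags, might look roughly like the sketch below. It is a plain multinomial naive Bayes with add-one smoothing, chosen here as an assumption; the source does not specify the exact Bayesian model, the tokenizer, or the feature encoding. The names `features`, `NaiveBayes`, and the two training samples are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Assumed tokenizer: identifiers/keywords and single punctuation characters.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|[^\w\s]")

def features(code, tags):
    """Combine code tokens and side-information tags into one feature list."""
    tokens = ["code:" + t for t in TOKEN_RE.findall(code)]
    return tokens + ["tag:" + t.lower() for t in tags]

class NaiveBayes:
    """Multinomial naive Bayes with additive smoothing (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, samples):
        # Each sample is (code, tags, language).
        for code, tags, language in samples:
            self.class_counts[language] += 1
            for f in features(code, tags):
                self.feature_counts[language][f] += 1
                self.vocab.add(f)
        return self

    def predict(self, code, tags):
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for language, n in self.class_counts.items():
            counts = self.feature_counts[language]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n / total)            # log prior
            for f in features(code, tags):          # log likelihoods
                score += math.log((counts[f] + self.alpha) / denom)
            if score > best_score:
                best, best_score = language, score
        return best

model = NaiveBayes().fit([
    ("def add(a, b):\n    return a + b", ["python", "function"], "python"),
    ("public static void main(String[] args) { }", ["java", "jvm"], "java"),
])
```

With a snippet as short as `x = y`, the code tokens alone carry little evidence, so in this toy model the tag feature decides the outcome, which is the intuition behind using side information.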

Description

Technical field

[0001] The invention relates to code language identification, in particular to programming language identification for code fragments. Code fragments on the Internet are often surrounded by side information such as text and labels. Using this side information, features are extracted with text analysis techniques and refined into labels, and a Bayesian model is trained on those labels. This effectively enhances the ability to recognize the programming language of code fragments and improves the recognition accuracy of code fragment programming language recognition systems.

Background

[0002] In recent years, with the rapid development of the Internet and the worldwide use of popular programming languages, more and more program source code appears on blogs, forums, Q&A sites and other applications, such as StackOverflow, CSDN and so on. However, on most blogs, forums, and online question-and-answer websites, the source code...
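As a rough illustration of refining surrounding text into labels, the sketch below adds tags to a snippet when known keywords occur in its explanatory text. This is a simple keyword lookup used as a stand-in; it is not the keyword-enhanced multi-label learning technique the patent names, whose details are not given in this excerpt. The table `KEYWORD_TAGS` and the function `augment_tags` are hypothetical.

```python
import re

# Hypothetical keyword-to-tag table; a real system would learn such associations.
KEYWORD_TAGS = {
    "jvm": "java",
    "servlet": "java",
    "pip": "python",
    "django": "python",
    "malloc": "c",
}

def augment_tags(text, known_tags):
    """Return the known tags plus tags implied by keywords found in `text`."""
    words = set(re.findall(r"[a-z0-9+#]+", text.lower()))
    tags = [t.lower() for t in known_tags]
    for keyword, tag in KEYWORD_TAGS.items():
        if keyword in words and tag not in tags:
            tags.append(tag)            # add each implied tag at most once
    return tags

augmented = augment_tags("I installed Django with pip but the view fails", ["web"])
```

The augmented tag list would then be fed, together with the code tokens, into whatever classifier is trained on the labeled snippets.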


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06K9/62
CPC: G06F16/9535, G06F18/24155
Inventors: 吕建, 徐锋, 李立成
Owner: NANJING UNIV