Automatic source code annotation generation method based on data mining

A technology for automatic annotation generation based on data mining, applicable to network data retrieval, network data indexing, and other database retrieval. It addresses the heavy workload of data collection and screening, lagging updates to annotation information, and low annotation coverage.

Inactive Publication Date: 2017-05-17
INST OF SOFTWARE - CHINESE ACAD OF SCI
7 Cites, 21 Cited by

AI Technical Summary

Problems solved by technology

Most open source projects have short development cycles. The Linux kernel, for example, is an excellent open source project with wide application and influence, yet its annotation coverage falls far short of what learners and junior developers need.
Most open source software is also updated rapidly, with release cycles growing ever shorter, and the corresponding annotations cannot be updated in step with version changes.
As a result, open source software suffers both from insufficient annotation coverage of its code and from quality problems: after version updates, annotation information lags behind and becomes inaccurate or incomplete.
Meanwhile, the public information on the Internet about developing and maintaining open source software is vast and of mixed quality. Good and bad material is hard to tell apart, so it is of limited help for learning and development, and collecting and screening it takes considerable work.
[0003] Source code comments are generally written by hand by experienced programmers, and the difficulty of manual commenting has spurred research into automatic commenting methods.
At present, most research on automatic annotation is devoted to generating natural language annotations from source code semantics. Most of it targets object-oriented languages and involves complex techniques such as syntactic and semantic analysis, and some approaches rely on building complex models; these are technically difficult and hard to implement.


Examples

Embodiment 1

[0098] This embodiment sets the following usage scenarios:

[0099] Using the Stack Overflow website data source, apply this method to automatically generate comments for the fork() function.

[0100] 1) Enter the automatic annotation generation system, select the Stack Overflow data source, search for the keyword "linux fork()", and crawl the content of the resulting web pages to obtain the texts of 15 topics. One of the topics reads as follows:

[0101] Question number: 12881111

[0102] Question title: Output of fork() calls

[0103] Problem Description:

[0104] What would be the output of following fork() call?

[0105] fork(){

[0106] fork();

[0107] fork();

[0108] fork()&&fork()||fork();

[0109] fork();

[0110] Print("Saika collection\n");

[0111] Can anyone help me in getting the answer to this code as well as some explanations as i am new to OS? I have found several questions on fork() on SO, but couldn't figure out.

[0112] Question Votes:...
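
The topic above carries a question number, a title, a description, and a vote count. A minimal sketch of how such crawled topics might be represented and screened is shown below. The `Topic` fields, the `select_topics` helper, and the vote threshold are assumptions made for illustration; the excerpt does not state the patent's actual filtering rule for Stack Overflow data.

```python
from dataclasses import dataclass


@dataclass
class Topic:
    """One crawled Stack Overflow topic (field names are hypothetical)."""
    question_id: int
    title: str
    description: str
    votes: int


def mentions_function(topic, func_name):
    """Return True if the title or description contains a call-like mention.

    Naive substring test: searching for "fork(" also matches "vfork(".
    """
    needle = func_name + "("
    return needle in topic.title or needle in topic.description


def select_topics(topics, func_name, min_votes=0):
    """Keep topics that mention func_name and reach min_votes, best-voted first."""
    kept = [t for t in topics
            if mentions_function(t, func_name) and t.votes >= min_votes]
    return sorted(kept, key=lambda t: t.votes, reverse=True)


if __name__ == "__main__":
    # The vote count is a placeholder; the real value is elided in the text.
    sample = Topic(12881111, "Output of fork() calls",
                   "What would be the output of following fork() call?", 0)
    for topic in select_topics([sample], "fork"):
        print(topic.question_id, topic.title)
```

Sorting by votes is one plausible way to prefer better-received topics; any such ranking would have to be tuned against real crawled data.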

Embodiment 2

[0131] This embodiment sets the following usage scenarios:

[0132] Using the Linux kernel emails as the data source, select emails from July 16 to July 23, 2016, and apply this method to automatically generate function comments.

[0133] 1) Enter the automatic comment generation system, select the Linux kernel emails as the data source, and crawl the 4493 emails sent during this period. The title and body text of one of the emails are as follows:

[0134] Title: timer_list: print_tickdevice(): calculate ->min_delta_ns dynamically

[0135] Text: print_tickdevice(), assembling the per-tick device sections in /proc/timer_list, is the last user of struct clock_event_device's ->min_delta_ns member.

[0136] In order to make this one fully obsolete while retaining userspace ABI, calculate the displayed value of 'min_delta_ns' on the fly from ->min_delta_ticks_adjusted.

[0137] Signed-off-by: Nicolai Stange

[0138] ---

[0139] kernel/time/timer_list.c | 5 +++--

[0140] 1 file changed, 3 insertion...
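
An email like the one above mixes descriptive prose with patch bookkeeping (sign-off line, separator, diffstat). The following is a minimal sketch, under assumed rules, of how the function name could be pulled from the subject and the bookkeeping lines dropped. The `NOISE` patterns and both helper names are hypothetical; the excerpt does not give the patent's actual email filtering rule.

```python
import re

# Identifier immediately followed by "()", e.g. "print_tickdevice()".
FUNC_CALL = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\(\)")

# Line patterns treated as noise in this sketch (assumed, not from the patent).
NOISE = (
    re.compile(r"^Signed-off-by:"),
    re.compile(r"^---\s*$"),
    re.compile(r"^\s*\S+\s*\|\s*\d+\s*[+-]*\s*$"),  # diffstat line
    re.compile(r"^\s*\d+ files? changed"),          # diffstat summary
)


def function_names(subject):
    """Return function names written as name() in subject, in order, no repeats."""
    seen = []
    for name in FUNC_CALL.findall(subject):
        if name not in seen:
            seen.append(name)
    return seen


def strip_noise(body):
    """Drop every line of body that matches one of the NOISE patterns."""
    lines = [ln for ln in body.splitlines()
             if not any(p.search(ln) for p in NOISE)]
    return "\n".join(lines).strip()


if __name__ == "__main__":
    subject = "timer_list: print_tickdevice(): calculate ->min_delta_ns dynamically"
    print(function_names(subject))  # ['print_tickdevice']
```

The regular expression only recognises names written with a trailing `()`, which is how the subject line in this embodiment marks the function; names mentioned without parentheses would need a lookup against the kernel's symbol list instead.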

Embodiment 3

[0149] This embodiment sets the following usage scenarios:

[0150] Use the Linux kernel Commit-log as the data source, select the log information between versions v4.8-rc2 and v4.8-rc3, and apply this method to automatically generate function comments.

[0151] 1) Enter the automatic comment generation system, select the Linux kernel Commit-log as the data source, and crawl 136 Commit-log entries in total. The title and body text of one of the Commit-log entries are as follows:

[0152] Title: dm raid: enhance attempt_restore_of_faulty_devices() to support more devices

[0153] Text: dm raid: enhance attempt_restore_of_faulty_devices() to support more devices

[0154] attempt_restore_of_faulty_devices() is limited to 64 when it should support the new maximum of 253 when identifying any failed devices. It clears any revivable devices via an MD personality hot remove and add cycle to allow for their recovery.

[0155] Address by using existing functions to retrieve and updat...
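
A commit log of this kind often contains sentences that describe what a function does. The sketch below shows one assumed way to turn such sentences into a candidate comment: keep the sentences that name the function and wrap them in a C block comment. The selection rule and both helper names are hypothetical and stand in for the patent's two extraction rules, which this excerpt does not spell out.

```python
import re
import textwrap


def sentences_about(text, func_name):
    """Return the sentences of text that mention func_name, in original order."""
    # Naive split: a sentence ends at '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if func_name in p]


def as_c_comment(func_name, sentences, width=72):
    """Wrap the sentences into a C block comment headed by the function name."""
    body = textwrap.wrap(" ".join(sentences), width=width)
    lines = ["/*", f" * {func_name}():"]
    lines += [f" * {ln}" for ln in body]
    lines.append(" */")
    return "\n".join(lines)


if __name__ == "__main__":
    name = "attempt_restore_of_faulty_devices"
    log = (
        "attempt_restore_of_faulty_devices() is limited to 64 when it should "
        "support the new maximum of 253 when identifying any failed devices. "
        "It clears any revivable devices via an MD personality hot remove "
        "and add cycle to allow for their recovery."
    )
    print(as_c_comment(name, sentences_about(log, name)))
```

Note that the second sentence of the log ("It clears ...") is dropped because it refers to the function only by pronoun; resolving such references would need more than substring matching.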

Abstract

The invention relates to a method for automatically generating source code annotations based on data mining. The method comprises: extracting texts that contain the required annotations from three kinds of data sources; forming three filtering rules, based on the characteristics of each data source, to reject irrelevant noise, and applying text processing techniques to preprocess the character format; and summarizing two extraction rules that describe the key characteristics of function annotations, then automatically generating general function annotations according to these two rules combined with the characteristics of the three data sources. The extracted annotations enrich traditional function annotations, provide multi-dimensional information, and support version changes. The method is customized for Linux kernel annotation and is easy to implement, providing readable and reliable function annotation information at low cost. It fills the gap in automatic annotation of Linux kernel functions, provides richer reference information for learning and development, and reduces development workload and difficulty.
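
The character format preprocessing step mentioned in the abstract can be illustrated with a small sketch. It assumes the crawled text arrives as an HTML fragment and that preprocessing means removing markup, decoding entities, and normalising whitespace; the abstract does not say which operations the method actually performs, and the `preprocess` helper is hypothetical.

```python
import html
import re

TAG = re.compile(r"<[^>]+>")     # naive tag matcher, adequate for simple fragments
SPACES = re.compile(r"[ \t]+")


def preprocess(raw):
    """Strip HTML tags, decode entities, and normalise whitespace."""
    text = TAG.sub(" ", raw)      # drop markup such as <p> or <code>
    text = html.unescape(text)    # e.g. "&amp;&amp;" becomes "&&"
    lines = [SPACES.sub(" ", ln).strip() for ln in text.splitlines()]
    return "\n".join(ln for ln in lines if ln)  # discard empty lines


if __name__ == "__main__":
    fragment = "<p>fork() &amp;&amp; fork()  ||  fork();</p>\n\n<code>fork();</code>"
    print(preprocess(fragment))
```

Tags are removed before entities are decoded so that escaped source text such as `&lt;stdio.h&gt;` survives as `<stdio.h>` instead of being mistaken for markup.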

Description

Technical field

[0001] The invention relates to technology for automatically generating function annotations, in particular to text processing and web data crawling technology in the field of information collection in data mining, and proposes a method for automatically generating source code annotations based on data mining.

Background technique

[0002] An annotation is a natural language description of the corresponding source code, written to improve the readability of that code. Its main purpose is to help programmers understand the code and to improve the maintainability of the software system. Excellent software projects require high-quality code together with accurate and comprehensive comments and documentation. Most open source projects have short development cycles; the Linux kernel, for example, is an excellent open source project with wide application and influence, but its annotation coverage is far from meeting the needs of...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F9/44, G06F17/30, G06F17/27
CPC: G06F8/73, G06F16/335, G06F16/951, G06F16/955, G06F40/258
Inventor: 田兆楠, 李斌, 吴红双, 李婧, 贺也平
Owner: INST OF SOFTWARE - CHINESE ACAD OF SCI