
Rapid code file tracing method and device for GitHub large-scale open source codes

A fast code-tracing technology for open source code, applied in the field of source code traceability of open source software. It addresses the high complexity and low execution efficiency of code clone detection algorithms and the impossibility of collecting all open source code, with the effect of ensuring practicality.

Active Publication Date: 2021-03-19
INST OF SOFTWARE - CHINESE ACAD OF SCI
Cites: 4; Cited by: 2

AI Technical Summary

Problems solved by technology

In addition, a large number of new repositories are created on the Internet every day. Even considering only a representative platform such as GitHub, it is almost impossible to collect all of this open source code.
[0007] Second, the above-mentioned code clone detection algorithms have high complexity and low execution efficiency.
[0008] In summary, a method that reduces the cost and processing time of code traceability is needed.




Detailed Description of Embodiments

[0038] In order to make the above objects, features and advantages of the present invention more comprehensible, the present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0039] The core idea of the present invention is that a file to be traced shares key features (identical or similar) with its source file, and that GitHub's code search engine can quickly return results for a query built from those features.

[0040] Figure 1 is a flow chart of the steps of an embodiment of the fast code file tracing method for GitHub large-scale open source code according to the present invention. As shown in Figure 1, the method may include the following steps:

[0041] Step 11: read the file to be traced and construct the initial query for GitHub code search.

[0042] An initial query conforming to the GitHub Code Search API standard is constructed from the file's name, size, programming language, and code statements. The following...
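Because the passage describing exactly how the query is assembled is truncated, the following is only a minimal Python sketch, under stated assumptions, of what step 11 might look like. It builds a query string in the legacy code search syntax accepted by GitHub's REST `GET /search/code` endpoint, combining quoted code statements with the `filename:`, `language:` and `size:` qualifiers. The function name `build_initial_query`, the extension-to-language table, the rule for picking "distinctive" statements (the longest non-comment, non-import lines) and the ±10% size tolerance are all hypothetical choices, not details from the patent.

```python
import os

# Hypothetical extension-to-language table; extend as needed.
EXT_TO_LANGUAGE = {".py": "Python", ".java": "Java", ".c": "C", ".go": "Go", ".js": "JavaScript"}


def build_initial_query(path: str, content: str,
                        max_statements: int = 2, size_tolerance: float = 0.1) -> str:
    """Build a GitHub code search query string from a file's key features.

    The statement-selection heuristic and the size tolerance are
    illustrative assumptions, not values taken from the patent.
    """
    name = os.path.basename(path)
    ext = os.path.splitext(name)[1].lower()
    size = len(content.encode("utf-8"))  # file size in bytes

    # Candidate statements: long, non-comment, non-import lines are assumed more distinctive.
    skip_prefixes = ("#", "//", "import ", "from ")
    stripped = [ln.strip() for ln in content.splitlines()]
    candidates = [ln for ln in stripped
                  if len(ln) >= 20 and not ln.startswith(skip_prefixes)]
    statements = sorted(candidates, key=len, reverse=True)[:max_statements]

    # Quote each statement as a phrase; drop inner double quotes so the phrase stays valid.
    terms = ['"' + s.replace('"', " ") + '"' for s in statements]
    terms.append(f"filename:{name}")
    if ext in EXT_TO_LANGUAGE:
        terms.append(f"language:{EXT_TO_LANGUAGE[ext]}")
    # Use a size range, since a traced copy may have been slightly modified.
    lo = int(size * (1 - size_tolerance))
    hi = int(size * (1 + size_tolerance)) + 1
    terms.append(f"size:{lo}..{hi}")
    return " ".join(terms)


src = "import os\n\ndef load_config(path):\n    return parse_yaml_file(os.path.join(BASE_DIR, path))\n"
print(build_initial_query("project/utils.py", src))
```

GitHub's documentation notes that legacy code search ignores many punctuation characters and limits query length, which is one reason to keep `max_statements` small and to treat the quoted statements as approximate matches rather than exact ones.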


Abstract

The invention provides a rapid code file tracing method and device for GitHub large-scale open source code. The method comprises the following steps: reading a file to be traced and constructing an initial query that meets the GitHub code search API standard; executing the query and obtaining the query results returned by GitHub; extracting from the query results each file path and the code repository in which it is located; obtaining the attributes of each code repository through GitHub's repository API; and sorting the code repositories according to their attributes, returning the sorted result, and taking the sorted result together with the file paths as the code file tracing result. Further, the tracing result is verified manually; if the accuracy after manual verification does not meet the requirement, the code search query is rebuilt and tracing is performed iteratively. The invention can assist code tracing across large-scale code repositories at relatively low cost.
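The abstract does not state which repository attributes are used or in what order, so the sketch below is an assumption-laden illustration, not the patented ranking. It groups the `items` of a code search response (each item carries a `path` and a `repository.full_name`) by repository, then sorts the repositories using `created_at` and `stargazers_count` as returned by the repository API (`GET /repos/{owner}/{repo}`), on the hypothetical premise that the oldest repository, then the most-starred one, is the likeliest origin. The repository lookup is passed in as a function so the example runs offline; `extract_hits`, `rank_repositories` and the sample data are made up for illustration.

```python
from datetime import datetime
from typing import Callable, Dict, List, Tuple


def extract_hits(search_response: dict) -> Dict[str, List[str]]:
    """Group the file paths in a code search response by repository full name."""
    hits: Dict[str, List[str]] = {}
    for item in search_response.get("items", []):
        repo = item["repository"]["full_name"]
        hits.setdefault(repo, []).append(item["path"])
    return hits


def rank_repositories(hits: Dict[str, List[str]],
                      get_repo: Callable[[str], dict]) -> List[Tuple[str, List[str]]]:
    """Order repositories oldest first, breaking ties by more stars (assumed ranking)."""
    def sort_key(repo: str):
        attrs = get_repo(repo)  # e.g. the JSON body of GET /repos/{owner}/{repo}
        created = datetime.strptime(attrs["created_at"], "%Y-%m-%dT%H:%M:%SZ")
        return (created, -attrs.get("stargazers_count", 0))

    return [(repo, hits[repo]) for repo in sorted(hits, key=sort_key)]


# Offline usage with made-up data shaped like GitHub API responses.
search_response = {"items": [
    {"path": "src/utils.py", "repository": {"full_name": "alice/fork"}},
    {"path": "lib/utils.py", "repository": {"full_name": "bob/original"}},
    {"path": "vendor/utils.py", "repository": {"full_name": "alice/fork"}},
]}
repo_attrs = {
    "alice/fork": {"created_at": "2020-05-01T00:00:00Z", "stargazers_count": 3},
    "bob/original": {"created_at": "2015-02-10T00:00:00Z", "stargazers_count": 120},
}
ranked = rank_repositories(extract_hits(search_response), repo_attrs.__getitem__)
print(ranked)
# [('bob/original', ['lib/utils.py']), ('alice/fork', ['src/utils.py', 'vendor/utils.py'])]
```

In a real deployment `get_repo` would wrap an authenticated HTTP request, and the manual verification and iterative re-querying described in the abstract would wrap these calls.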

Description

Technical Field

[0001] The present invention relates to the field of source code traceability of open source software, and in particular to a method and device for fast tracing of source code files in GitHub large-scale open source code.

Background Art

[0002] Open source software is widely used in production and daily life. In software development, reusing existing open source software or its components is very common practice. To reduce code maintenance costs and the risk of open source license conflicts, many development teams need to trace the origin of the open source code used in their software projects.

[0003] The basic approach to code traceability is to collect large-scale open source code and, using code clone detection, search within it for the source code and its associated software projects. Code clone detection means determining, given two code files or code fragments, whether they are similar. [000...
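To make the clone-detection background concrete, here is a deliberately simple similarity measure: the Jaccard index of the two snippets' token sets. This is one common textbook measure chosen for illustration, not a technique taken from the patent, and the tokenizer regex is an assumption. Running even this cheap comparison pairwise across a large collected corpus hints at the cost the background attributes to clone-based tracing.

```python
import re

# Assumed tokenizer: identifiers, integer literals, or any single non-space symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def token_set(code: str) -> set:
    """Return the set of tokens appearing in a code snippet."""
    return set(TOKEN_RE.findall(code))


def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| of the two snippets' token sets."""
    ta, tb = token_set(a), token_set(b)
    if not ta and not tb:
        return 1.0  # two empty snippets are treated as identical
    return len(ta & tb) / len(ta | tb)


print(jaccard_similarity("total = price + tax", "sum = price + tax"))
# prints 0.6666666666666666 (4 shared tokens out of 6 distinct)
```

A practical detector would add a similarity threshold and usually a more robust representation (normalized identifiers, token sequences, or syntax trees), which is where the algorithmic complexity noted in [0007] comes from.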


Application Information

IPC(8): G06F8/75; G06F16/903
CPC: G06F8/751; G06F16/90344
Inventors: 朱家鑫, 叶丹, 陈伟, 吴全国, 窦文生, 魏峻
Owner INST OF SOFTWARE - CHINESE ACAD OF SCI