
Fingerprint feature generation method and matching method for code snippets

A technology for generating and matching fingerprint features of code fragments. It addresses inconsistent matching results and the resulting inaccuracy of software composition analysis, and improves search results.

Pending Publication Date: 2022-03-04
上海安势信息技术有限公司

AI Technical Summary

Problems solved by technology

However, open source components are often adapted during software development. Once a single letter is modified, the code of the entire line changes, so the affected lines of the open source component no longer match their originals. The resulting inconsistent matching results make software composition analysis inaccurate.
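A minimal Python sketch makes the problem concrete. It assumes line-level matching in which each line is stripped of surrounding whitespace and MD5-hashed independently; `line_md5s` is a hypothetical helper, not part of the patent. Changing one letter in the first line changes that line's digest, so the line no longer matches, while the untouched lines still do.

```python
import hashlib


def line_md5s(source: str) -> list[str]:
    """Return the MD5 hex digest of every non-empty line (whitespace stripped)."""
    return [
        hashlib.md5(line.strip().encode("utf-8")).hexdigest()
        for line in source.splitlines()
        if line.strip()
    ]


original = """int add(int a, int b) {
    return a + b;
}"""
# An "adaptive change": one letter of the function name is altered.
modified = original.replace("add", "adD", 1)

for n, (h1, h2) in enumerate(zip(line_md5s(original), line_md5s(modified)), start=1):
    print(f"line {n}: {'match' if h1 == h2 else 'MISMATCH'}")
# line 1: MISMATCH
# line 2: match
# line 3: match
```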



Examples


Embodiment Construction

[0066] The present application is described in further detail below with reference to Figures 1 and 2.

[0067] An embodiment of the present application discloses a method for generating fingerprint features of code fragments. Referring to Figure 1, the method includes the following steps:

[0068] S100. Obtain the source code of the code fragment.

[0069] The source code may be entered manually or retrieved directly from a preset database; any means of obtaining the source code is acceptable.

[0070] S200. Perform code cleaning on the source code to obtain a continuous character string carrying code line number information.

[0071] The code cleaning includes at least one of the following: removing newline characters, removing spaces, and removing comments. In addition to these cleaning operations, staff can also set the objects that need to be remov...
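Step S200 can be sketched in Python as follows. The sketch assumes C-style source with `//` and `/* */` comments and applies all three cleaning operations at once. The hypothetical `clean_code` function returns the continuous string together with a parallel list of 1-based source line numbers, which is only one possible way for the string to "carry code line number information"; the patent's own representation is not shown in this excerpt. String literals are not special-cased, so a comment marker inside a string would be misread.

```python
def clean_code(source: str) -> tuple[str, list[int]]:
    """Remove comments, spaces and newlines from C-style source (hypothetical helper).

    Returns the continuous cleaned string and, for every retained character,
    the 1-based line number it came from. Simplification: "//" or "/*" inside
    a string literal is treated as a comment.
    """
    chars: list[str] = []
    lines: list[int] = []
    line = 1
    i, n = 0, len(source)
    while i < n:
        if source.startswith("//", i):
            # Line comment: jump to the next newline (it is counted below).
            end = source.find("\n", i)
            i = n if end == -1 else end
            continue
        if source.startswith("/*", i):
            # Block comment: skip past "*/", counting the newlines it spans.
            end = source.find("*/", i + 2)
            end = n if end == -1 else end + 2
            line += source.count("\n", i, end)
            i = end
            continue
        c = source[i]
        if c == "\n":
            line += 1
        elif not c.isspace():
            chars.append(c)
            lines.append(line)
        i += 1
    return "".join(chars), lines


src = """int add(int a, int b) { // sum
    /* adds
       two ints */ return a + b;
}"""
cleaned, line_nos = clean_code(src)
print(cleaned)                    # intadd(inta,intb){returna+b;}
print(line_nos[0], line_nos[-1])  # 1 4
```

Keeping the line numbers in a parallel list lets a later match be traced back to the original source lines even though the cleaned string itself contains no newlines.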



Abstract

The invention discloses a method for generating fingerprint features of code snippets and a method for matching them. The generation method comprises the following steps: acquiring the source code of a code snippet; performing code cleaning on the source code to obtain a continuous character string carrying code line number information; sliding a first window of a preset character length over the continuous character string to select character string segments one by one; obtaining a fixed-length code for each character string segment, yielding a plurality of second fixed-length codes; sliding a second window spanning a preset number of fixed-length codes over the second fixed-length codes to select fixed-length code sets one by one; screening a third fixed-length code from each fixed-length code set to obtain a plurality of third fixed-length codes; and taking the plurality of third fixed-length codes as the fingerprint features of the code snippet. The method consolidates the scattered source code into a single continuous string and represents character string segments with fixed-length codes, which reduces data dimensionality and thus the amount of subsequent matching; screening the fixed-length codes reduces the matching amount further.
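The generation steps above resemble the well-known winnowing technique for document fingerprinting. The Python sketch below follows that reading under assumptions the abstract does not state: the fixed-length code is the first 8 bytes of an MD5 digest, the first window is k = 5 characters, the second window is w = 4 codes, and "screening" keeps the minimum code in each window. `fixed_length_code`, `fingerprint`, `shared_codes` and `join_lines` are hypothetical names, and `shared_codes` is only an illustrative comparison, not the patent's matching method.

```python
import hashlib


def fixed_length_code(fragment: str) -> int:
    """Assumed fixed-length code: the first 8 bytes of MD5 as a 64-bit int."""
    digest = hashlib.md5(fragment.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def fingerprint(text: str, line_nos: list[int], k: int = 5, w: int = 4) -> set[tuple[int, int]]:
    """Return (fixed-length code, source line) pairs for a cleaned code string.

    k: first-window length in characters; w: second-window length in codes.
    """
    # First window: one code per k-character segment, tagged with the line
    # on which the segment starts.
    codes = [
        (fixed_length_code(text[i:i + k]), line_nos[i])
        for i in range(len(text) - k + 1)
    ]
    if not codes:
        return set()
    if len(codes) < w:
        return {min(codes)}
    # Second window: slide over w consecutive codes and keep the smallest
    # (winnowing-style screening; an assumption, not the patent's stated rule).
    return {min(codes[j:j + w]) for j in range(len(codes) - w + 1)}


def shared_codes(a: set[tuple[int, int]], b: set[tuple[int, int]]) -> set[int]:
    """Fixed-length codes present in both fingerprints (line numbers ignored)."""
    return {c for c, _ in a} & {c for c, _ in b}


def join_lines(lines: list[str]) -> tuple[str, list[int]]:
    """Join already-cleaned lines into one string plus per-character line numbers."""
    return "".join(lines), [n for n, s in enumerate(lines, start=1) for _ in s]


orig_text, orig_lines = join_lines(["intadd(inta,intb){", "returna+b;", "}"])
mod_text, mod_lines = join_lines(["intadD(inta,intb){", "returna+b;", "}"])

fp_orig = fingerprint(orig_text, orig_lines)
fp_mod = fingerprint(mod_text, mod_lines)
print(len(fp_orig), len(fp_mod), len(shared_codes(fp_orig, fp_mod)))
```

Because every window of w consecutive codes contributes its minimum, any run of at least k + w - 1 identical characters in two cleaned strings yields at least one common fixed-length code. In this sketch, a one-letter edit therefore disturbs only the fingerprints near the change instead of invalidating a whole line.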

Description

Technical Field
[0001] The present application relates to the field of software analysis, and in particular to a fingerprint feature generation method and a matching method for code fragments.

Background
[0002] The use of open source components has greatly facilitated software development, and as a result more and more software has been produced in recent years.
[0003] When an enterprise uses an outsourced software system, it needs to know, for the sake of system security, which open source components the software uses. Enterprises generally use software composition analysis tools for this purpose.
[0004] At present, the mainstream software composition analysis method is based on code fingerprints: the software code is MD5-encoded, the source code and component details in major open source libraries are MD5-encoded as well, and the results are then matched to determine whether the MD5 value corresponding to the softwar...


Application Information

Patent Type & Authority: Application (China)
IPC (8): G06F8/41; G06F16/2455
CPC: G06F8/44; G06F8/43; G06F16/2455
Inventors: 杨钦, 余浩翔, 许渊聪
Owner: 上海安势信息技术有限公司