Interactive code searching method and device based on structured embedding

An interactive code search method based on structured embedding, applied in digital data information retrieval, instrumentation, computing, etc., that addresses problems such as insufficient search performance.

Active Publication Date: 2020-05-15
WUHAN UNIV

AI Technical Summary

Problems solved by technology

[0008] In view of this, the present invention provides an interactive code search method and device based on structured emb



Examples


Embodiment 1

[0093] This embodiment provides an interactive code search method based on structured embedding. Referring to Figure 1, the method includes:

[0094] Step S1: Collect the original data, extract the software repository and the model corpus of the code-description matching pair from the original data, and obtain the social attribute value of each code-description matching pair during the extraction process.

[0095] Specifically, the original data can come from different open-source databases, and the software repositories can be in different programming languages. For example, software repositories covering C#, Java, SQL and Python, together with the model corpus of code-description matching pairs, are crawled from StackOverflow, a software Q&A community.
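Step S1 can be pictured with a minimal sketch. The record layout below (the `title`, `code`, `language` and `score` keys) and the use of a vote score as the social attribute value are assumptions made for illustration; the source does not specify the crawled data format or which social attribute is recorded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeDescriptionPair:
    language: str
    description: str
    code: str
    social_score: int  # hypothetical stand-in for the "social attribute value"


def extract_pairs(posts):
    """Turn raw Q&A records into code-description pairs, keeping the social attribute."""
    pairs = []
    for post in posts:
        # Skip records that lack either side of the pair.
        if not post.get("title") or not post.get("code"):
            continue
        pairs.append(
            CodeDescriptionPair(
                language=post.get("language", "unknown"),
                description=post["title"].strip(),
                code=post["code"].strip(),
                social_score=int(post.get("score", 0)),
            )
        )
    return pairs


# Hypothetical sample records; the second one has no description and is dropped.
raw_posts = [
    {"language": "python", "title": "Reverse a list ", "code": "xs[::-1]", "score": 12},
    {"language": "sql", "title": "", "code": "SELECT 1", "score": 3},
]
pairs = extract_pairs(raw_posts)
```

Keeping the social attribute on the pair itself means later filtering steps can use it without going back to the raw data.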

[0096] Step S2: Perform structured word segmentation and preprocessing on the model corpus to obtain the processed corpus.

[0097] Specifically, S2 is the word segmentation of the code repository and model corpus. Sp...
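The source text is cut off before it describes the segmentation rules, so the following is only a sketch of one plausible form of code word segmentation: split source text into identifiers, numbers and symbols, then break identifiers on underscores and camel-case boundaries. The function names and the regular expressions are assumptions, not the patent's procedure.

```python
import re
import string

# Identifiers, integer literals, or any single non-space symbol.
_TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|[^\sA-Za-z0-9_]")
# Acronym runs, capitalized or lowercase words, remaining capitals, digit runs.
_CAMEL = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")
_IDENT_START = set(string.ascii_letters + "_")


def split_identifier(name):
    """Split snake_case and camelCase identifiers into lowercase sub-words."""
    parts = []
    for chunk in name.split("_"):
        parts.extend(m.group(0).lower() for m in _CAMEL.finditer(chunk))
    return parts


def segment_code(code):
    """Tokenize a code snippet, expanding identifiers into sub-words."""
    tokens = []
    for tok in _TOKEN.findall(code):
        if tok[0] in _IDENT_START:
            tokens.extend(split_identifier(tok))
        else:
            tokens.append(tok)
    return tokens
```

For example, `segment_code("getUserName(user_id)")` yields the sub-words `get`, `user`, `name`, the parentheses, and `user`, `id`, so that query words such as "user name" can match identifier fragments.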

Embodiment 2

[0178] Based on the same inventive concept, this embodiment provides an interactive code search device based on structured embedding. Referring to Figure 5, the device comprises:

[0179] The collection module 201 is used to collect the original data, extract the software repository and the model corpus of the code-description matching pair from the original data, and obtain the social attribute value of each code-description matching pair during the extraction process;

[0180] The structured word segmentation module 202 is used to perform structured word segmentation and preprocessing on the model corpus to obtain the processed corpus;

[0181] The structured word embedding module 203 is used to perform word embedding training on the processed corpus with a preset tool, and to construct the pre-trained structured word embedding;

[0182] The high-quality corpus extraction and division module 204 is used to carry out structured word segmentation and preprocessing on the model corpus, and filter out a preset number of corpus acco...
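The description of module 204 is truncated before it names the filtering criterion. The sketch below assumes, purely for illustration, that pairs are ranked by their social attribute value, that the top `keep` pairs are retained, and that the result is divided into training and test portions. The dictionary key `social_score`, the function name, and the 80/20 ratio are all hypothetical.

```python
import random


def select_and_split(pairs, keep, train_ratio=0.8, seed=0):
    """Keep the `keep` highest-scoring pairs, then split them into train/test."""
    # Rank by the (assumed) social attribute value, highest first.
    ranked = sorted(pairs, key=lambda p: p["social_score"], reverse=True)[:keep]
    # Shuffle with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(ranked)
    cut = int(len(ranked) * train_ratio)
    return ranked[:cut], ranked[cut:]


# Hypothetical corpus of ten pairs with scores 0..9.
corpus = [{"id": i, "social_score": i} for i in range(10)]
train, test = select_and_split(corpus, keep=5)
```

Fixing the seed keeps the division stable across runs, which matters when comparing models trained on the same corpus.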

Embodiment 3

[0188] Referring to Figure 6, based on the same inventive concept, the present application also provides a computer-readable storage medium 300 on which a computer program 311 is stored. When the program is executed, the method described in the first embodiment is implemented.

[0189] Since the computer-readable storage medium introduced in the third embodiment of the present invention is the one used to implement the interactive code search method based on structured embedding of the first embodiment, those skilled in the art can understand its specific structure and variations from the description of the first embodiment, so details are not repeated here. All computer-readable storage media used by the method in Embodiment 1 of the present invention fall within the scope of protection intended by the present invention.


Abstract

The invention provides an interactive code search method based on structured embedding, which comprises the following steps. First, a software repository and a model corpus of code-description matching pairs are extracted from the collected original data. Next, word segmentation is performed on the code repository and the model corpus. A pre-trained structured embedding is then constructed with a preset tool, and a high-quality model corpus is extracted and divided. An interactive code search model, NICS, is then constructed, comprising in sequence a feature extraction module, an interactive attention extraction module and a similarity matching module; a hinge loss function is set for the training network, the pre-trained structured word embedding is loaded into the NICS model, and the NICS model is trained. Finally, the trained NICS model is used to predict on a query to be processed, yielding the code search result corresponding to the query. According to the method, code snippets can be searched effectively, and state-of-the-art performance is obtained in all benchmark tests.
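The abstract names a hinge loss and a similarity matching module without giving their formulas here. As a rough sketch, a common margin-based ranking form is shown below, using cosine similarity between a query vector and code vectors. The cosine measure, the margin value, and the function names are assumptions and are not taken from the patent.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hinge_loss(query, pos_code, neg_code, margin=0.05):
    """Penalize cases where the matching code is not ahead of a non-matching one by `margin`."""
    return max(0.0, margin - cosine(query, pos_code) + cosine(query, neg_code))


def rank_candidates(query, candidates):
    """Return candidate indices ordered from most to least similar to the query."""
    return sorted(range(len(candidates)), key=lambda i: cosine(query, candidates[i]), reverse=True)


# Toy 2-dimensional vectors standing in for learned query and code features.
q = [1.0, 0.0]
good, bad = [1.0, 0.0], [0.0, 1.0]
loss_ok = hinge_loss(q, good, bad)    # matching code already ranked first
loss_bad = hinge_loss(q, bad, good)   # matching and non-matching swapped
order = rank_candidates(q, [bad, good])
```

The loss is zero once the matching snippet outscores the non-matching one by at least the margin, so training effort concentrates on pairs that are still ranked incorrectly or too closely.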

Description

Technical field

[0001] The invention relates to the field of code technology of software engineering, in particular to an interactive code search method and device based on structured embedding.

Background technique

[0002] Code searching is a common developer activity in software development practice and has been an important part of software development for decades. Previous research has shown that over 60% of developers search source code on a daily basis. Since online public code repositories (e.g., StackOverflow, GitHub, Krugle) contain millions of open source projects, many search engines are designed to help developers query the software Q&A community in natural language for relevant code snippets to maintain or fix code. But unfortunately, most existing search engines often return irrelevant code or sample code even when the descriptions of these queries are reconstructed. As a result, code search techniques are currently gaining more and more attention in both a...


Application Information

IPC(8): G06F16/242
CPC: G06F16/2433
Inventor: 彭敏, 黎芮彤, 胡刚, 刘进, 崔晓晖
Owner: WUHAN UNIV