Crowdsourcing test report similarity detection method based on natural language processing

This natural language processing and test report technology, applied in natural language data processing, software testing/debugging, and electrical digital data processing, addresses problems such as the poor detection of similar reports.

Pending Publication Date: 2021-12-03

NANJING UNIV

Problems solved by technology

[0008] The problem to be solved by the present invention is that current crowdsourcing test report similarity detection performs poorly on similar reports that have the same meaning but not identical text.

Embodiment Construction

[0024] Several key technologies involved in the present invention are jieba word segmentation, Word2Vec model, K-means clustering and LSTM-DSSM deep learning model.

[0025] 1. jieba word segmentation

[0026] jieba is a widely used Chinese word segmentation component for Python. It has three main features:

[0027] 1. It supports three segmentation modes: exact mode, full mode, and search engine mode. 2. It supports segmentation of Traditional Chinese text. 3. It supports custom dictionaries.
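The three segmentation modes can be sketched roughly as follows. This is a minimal illustration rather than the patent's implementation: the helper names `segment` and `remove_stop_words`, the sample tokens, and the stop-word set are all hypothetical. The jieba calls used (`jieba.cut` with `cut_all`, and `jieba.cut_for_search`) are part of the library's public API, and jieba is imported lazily so the stop-word helper works without it.

```python
def segment(text, mode="exact"):
    """Segment Chinese text with jieba in one of its three modes (hypothetical wrapper)."""
    if mode not in ("exact", "full", "search"):
        raise ValueError("mode must be 'exact', 'full' or 'search'")
    import jieba  # third-party dependency: pip install jieba
    if mode == "exact":
        # Exact mode: cut the sentence into the most likely word sequence.
        return list(jieba.cut(text, cut_all=False))
    if mode == "full":
        # Full mode: emit every dictionary word found in the sentence.
        return list(jieba.cut(text, cut_all=True))
    # Search engine mode: exact mode plus a finer split of long words.
    return list(jieba.cut_for_search(text))


def remove_stop_words(tokens, stop_words):
    """Drop whitespace-only tokens and tokens listed in stop_words."""
    return [t for t in tokens if t.strip() and t not in stop_words]


# Hypothetical, already segmented report sentence and stop-word set.
sample_tokens = ["登录", "之后", "应用", "就", "闪退", "了"]
sample_stop_words = {"之后", "就", "了"}
filtered = remove_stop_words(sample_tokens, sample_stop_words)
```

With jieba installed, `segment(report_text, mode="search")` returns the finer-grained tokens. A custom dictionary can be loaded beforehand with `jieba.load_userdict(path)`, where `path` points to a user-supplied dictionary file.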

[0028] 2. Word2Vec model

[0029] Word2Vec is a group of related models used to generate word vectors. These models are shallow, two-layer neural networks trained to reconstruct the linguistic contexts of words. The network takes a word as input and must predict the words in adjacent positions. Under the bag-of-words assumption in Word2Vec, the order of words is unimportant. After training is complete, the Word2Vec model can be used to map each word to a v...
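The "predict the words in adjacent positions" task can be illustrated by generating the training pairs such a network would see. The sketch below assumes the skip-gram variant of Word2Vec (the excerpt does not say which architecture the invention uses). The function name `skipgram_pairs`, the `window` parameter, and the sample tokens are hypothetical.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for words within `window` positions of each other."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical preprocessed report sentence.
tokens = ["app", "crashes", "after", "login"]
pairs = skipgram_pairs(tokens, window=1)
```

Each pair is one training example: given the center word, the network is asked to predict the context word. Inside the window the relative order of context words is not used, which matches the bag-of-words assumption mentioned above.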

Abstract

A crowdsourcing test report similarity detection method based on natural language processing uses natural language processing to detect the similarity of complex test reports submitted by crowdsourcing workers. The crowdsourced reports are first preprocessed with Chinese word segmentation, stop word removal, and similar steps. The sentences formed by the preprocessed phrases are represented as word vectors using Word2Vec, and cosine similarity is chosen to measure the distance between the word vectors. A semantic model trained on a large amount of previous crowd test report data is used; each word vector is then taken as input to K-Means cluster analysis, and similar reports are grouped into the same class according to a set similarity threshold, so that the similarity between crowdsourcing test reports can be accurately measured.
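The similarity and clustering steps in the abstract can be sketched in plain Python. This is an illustration under stated assumptions, not the patented procedure: the report vectors are made-up 3-dimensional stand-ins for Word2Vec output, the threshold value 0.9 is hypothetical, and the clustering loop is a K-means variant that assigns points by cosine similarity on unit-normalised vectors (often called spherical K-means), seeded with the first `k` points.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cluster_reports(vectors, k, iterations=10):
    """K-means-style clustering with cosine-similarity assignment."""
    points = [normalize(v) for v in vectors]
    centroids = [list(p) for p in points[:k]]  # assumption: first k points as seeds
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its most similar centroid.
        labels = [
            max(range(k), key=lambda c: cosine_similarity(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels


# Made-up report vectors: two login-crash reports and two payment reports.
report_vectors = [
    [0.9, 0.1, 0.0],  # login crash, wording A
    [0.0, 0.1, 0.9],  # payment timeout
    [0.8, 0.2, 0.1],  # login crash, wording B
    [0.1, 0.0, 0.8],  # payment slow
]
SIMILARITY_THRESHOLD = 0.9  # hypothetical value

labels = cluster_reports(report_vectors, k=2)
is_duplicate = (
    cosine_similarity(report_vectors[0], report_vectors[2]) >= SIMILARITY_THRESHOLD
)
```

Reports that receive the same label land in the same cluster, and the threshold check then decides whether two reports in a cluster are close enough to be treated as duplicates.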

Description

Technical field

[0001] The invention belongs to the field of software engineering. It is an application of natural language processing to software engineering and is used for detecting code similarity.

Background

[0002] Detecting similar crowdsourced test reports is a key technique for improving the utilization of crowdtest reports and reducing the workload of testers who would otherwise read duplicate reports. A crowdsourced test report is the result that crowdsourcing workers feed back to the testers after completing the tasks set by the initiator; the testers then reproduce and locate the bug based on the report. If many reports describe the same bug, testers cannot know before reading whether the bug in a given report has already been reported, so they waste a lot of time reading duplicate reports, which does not help testers to repro...


Application Information

IPC(8): G06F40/247, G06F40/30, G06K9/62, G06F16/35, G06F11/36, G06N3/04

CPC: G06F40/247, G06F40/30, G06F16/35, G06F11/3692, G06N3/044, G06N3/045, G06F18/23213

Inventors: 房春荣, 曹振飞, 王旭, 虞圣呈, 恽叶霄, 李彤宇

Owner: NANJING UNIV
