Crowdsourcing test report similarity detection method based on natural language processing

A technology combining natural language processing and test reports, applicable to natural language data processing, software testing/debugging, electrical digital data processing, and related fields, that addresses problems such as poor detection of similar reports.

Pending Publication Date: 2021-12-03
NANJING UNIV

AI Technical Summary

Problems solved by technology

[0008] The problem to be solved by the present invention is that current crowdsourced test report similarity detection performs poorly on similar reports that have the same meaning but not exactly the same text.



Examples


Detailed Description of Embodiments

[0024] The key technologies involved in the present invention are jieba word segmentation, the Word2Vec model, K-means clustering, and the LSTM-DSSM deep learning model.

[0025] 1. jieba word segmentation

[0026] jieba is a widely used Chinese word segmentation component for Python. It has three main features. 1. It supports three segmentation modes:

[0027] exact mode, full mode, and search engine mode. 2. It supports segmentation of Traditional Chinese text. 3. It supports custom dictionaries.
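The difference between exact mode and full mode can be illustrated with a toy dictionary-based segmenter. This is only a sketch and not jieba's own algorithm: forward maximum matching stands in for exact mode, and the dictionary, the function names `exact_mode` and `full_mode`, and the sample report are hypothetical. The dictionary also plays the role of a custom dictionary of domain terms.

```python
# Toy illustration only. Forward maximum matching is swapped in for
# jieba's exact mode; it is NOT how jieba segments text internally.

# Hypothetical custom dictionary of domain terms.
DICTIONARY = {"登录", "页面", "登录页面", "闪退", "点击", "按钮"}


def exact_mode(text, dictionary=DICTIONARY):
    """Return one non-overlapping segmentation (longest match first)."""
    max_len = max(len(word) for word in dictionary)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def full_mode(text, dictionary=DICTIONARY):
    """Return every dictionary word found in the text, overlaps included."""
    max_len = max(len(word) for word in dictionary)
    found = []
    for i in range(len(text)):
        for size in range(1, min(max_len, len(text) - i) + 1):
            piece = text[i:i + size]
            if piece in dictionary:
                found.append(piece)
    return found


report = "点击登录按钮后登录页面闪退"
print(exact_mode(report))
print(full_mode(report))
```

In this sketch the exact-mode result covers the text once, while the full-mode result also lists shorter words nested inside a longer match, such as 登录 and 页面 inside 登录页面.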

[0028] 2. Word2Vec model

[0029] Word2Vec is a group of related models used to generate word vectors. These models are shallow, two-layer neural networks trained to reconstruct the linguistic contexts of words. The network takes a word as input and is trained to predict the words in adjacent positions. Under the bag-of-words assumption in Word2Vec, the order of those context words is unimportant. After training is completed, the Word2Vec model can be used to map each word to a v...
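The idea of predicting words in adjacent positions can be made concrete by listing the training pairs that a skip-gram style model would see. The sketch below only generates the pairs and does not train a network; the function name `skipgram_pairs`, the window size, and the sample tokens are assumptions made for illustration.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for each word and its neighbours.

    Every word within `window` positions of the center word counts as
    context, and its exact position inside the window is ignored, which
    reflects the bag-of-words assumption.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical, already segmented report sentence.
tokens = ["点击", "登录", "按钮", "闪退"]
for pair in skipgram_pairs(tokens, window=1):
    print(pair)
```

Each pair is one training example: the network receives the first word and is asked to predict the second.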


Abstract

A crowdsourcing test report similarity detection method based on natural language processing uses natural language processing to detect the similarity of complex test reports submitted by crowdsourcing workers. The method preprocesses the crowdsourced reports with Chinese word segmentation, stop word removal, and other steps. Sentences made up of the preprocessed phrases are represented as word vectors using Word2Vec, and cosine similarity is chosen to measure the distance between word vectors. A semantic model trained on a large amount of previous crowdsourced test report data is used; each word vector is then taken as input to K-Means clustering analysis, and similar reports are grouped into the same class according to a set similarity threshold, so that the similarity between crowdsourced test reports can be measured accurately.
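The steps named in the abstract can be sketched end to end on toy data. This is a minimal illustration under stated assumptions, not the patented implementation: the two-dimensional vectors stand in for a trained Word2Vec model, a report vector is assumed to be the average of its word vectors, the initial centroids are fixed by hand, and all names (`WORD_VECTORS`, `STOP_WORDS`, `report_vector`, `kmeans`, `similar_pairs`) and the 0.95 threshold are hypothetical.

```python
import math

# Hypothetical 2-dimensional word vectors standing in for a trained model.
WORD_VECTORS = {
    "登录": [1.0, 0.1],
    "闪退": [0.9, 0.2],
    "崩溃": [0.8, 0.3],
    "图片": [0.1, 1.0],
    "模糊": [0.2, 0.9],
}
STOP_WORDS = {"的", "了", "后"}


def report_vector(tokens):
    """Drop stop words and unknown words, then average the word vectors."""
    vecs = [WORD_VECTORS[t] for t in tokens
            if t not in STOP_WORDS and t in WORD_VECTORS]
    if not vecs:
        raise ValueError("report contains no known words")
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def kmeans(vectors, initial_centroids, iterations=10):
    """K-means that assigns each vector to its most cosine-similar centroid."""
    centroids = [list(c) for c in initial_centroids]  # do not mutate input
    labels = []
    for _ in range(iterations):
        labels = [max(range(len(centroids)),
                      key=lambda k: cosine(v, centroids[k]))
                  for v in vectors]
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels


def similar_pairs(vectors, labels, threshold):
    """Pairs in the same cluster whose cosine similarity meets the threshold."""
    return [(i, j)
            for i in range(len(vectors))
            for j in range(i + 1, len(vectors))
            if labels[i] == labels[j]
            and cosine(vectors[i], vectors[j]) >= threshold]


# Hypothetical reports, already segmented into words.
reports = [["登录", "后", "闪退"], ["登录", "崩溃", "了"], ["图片", "模糊"]]
vectors = [report_vector(r) for r in reports]
labels = kmeans(vectors, [vectors[0], vectors[2]])
print(labels)
print(similar_pairs(vectors, labels, threshold=0.95))
```

In this toy run the two crash reports share no identical wording beyond 登录 yet land in the same cluster and pass the threshold, which is the behaviour the method aims for with reports that have the same meaning but different text.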

Description

Technical field

[0001] The invention belongs to the field of software engineering. It applies natural language processing to software engineering and is used to detect the similarity of crowdsourced test reports.

Background

[0002] Detecting similar crowdsourced test reports is a key technique for improving the utilization of crowdsourced test reports and reducing the time testers spend reading duplicates. A crowdsourced test report is the result that crowdsourcing workers return to testers after completing the tasks set by the initiator, and testers use it to guide the reproduction and localization of bugs. If many crowdsourced test reports describe the same bug, testers cannot know before reading a report whether its bug has already been reported, so they waste a lot of time reading duplicate reports, which does not help testers to repro...


Application Information

IPC(8): G06F40/247, G06F40/30, G06K9/62, G06F16/35, G06F11/36, G06N3/04
CPC: G06F40/247, G06F40/30, G06F16/35, G06F11/3692, G06N3/044, G06N3/045, G06F18/23213
Inventor: 房春荣, 曹振飞, 王旭, 虞圣呈, 恽叶霄, 李彤宇
Owner: NANJING UNIV