
Text deduplication method, device and equipment

A technology involving text and preset text, applied in the fields of text database querying, unstructured text data retrieval, and text database clustering/classification. It addresses the problem that keywords alone cannot accurately represent the semantics of a text, and achieves accurate and efficient text deduplication.

Pending Publication Date: 2019-08-23
SHENZHEN TENCENT INFORMATION TECH CO LTD
Cites: 0; Cited by: 11

AI Technical Summary

Problems solved by technology

In the existing text deduplication methods described above, the keywords extracted after word segmentation are used directly as the basis for calculating the similarity between two texts. Because keyword information alone is limited, it often cannot accurately represent the semantics of a text, so the similarity between texts cannot be calculated accurately from keywords, and deduplication performs poorly.




Embodiment Construction

[0035] The technical solutions in the embodiments of the application are described clearly and completely below with reference to the drawings. Clearly, the described embodiments are only some, rather than all, of the embodiments of the application. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.

[0036] It should be noted that the terms "first" and "second" in the description, claims, and drawings of the present application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so used are interchangeable under appropriate circumstances, so that the embodiments of the application described herein can be practiced in sequences other than those illustrated or des...



Abstract

The invention discloses a text deduplication method, device and equipment. The method comprises: based on a first feedback text fed back by a target object, determining entity keywords and description keywords in the first feedback text; determining, based on a text classification model, a first word vector of the entity keywords and a second word vector of the description keywords; determining a sentence vector of the first feedback text based on the first word vector and the second word vector; calculating the similarity between the sentence vector of the first feedback text and the sentence vector of a second feedback text in a preset text vector library; and performing deduplication on the first feedback text based on the similarity. With the technical solution provided by the invention, the similarity between the first feedback text fed back by the target object and the second feedback text in the preset text vector library can be calculated accurately, which improves the accuracy of text deduplication.
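The abstract's pipeline can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the patented implementation. Keyword extraction is assumed to have already happened, and a toy embedding dictionary stands in for the word vectors the text classification model would supply. The sentence vector is assumed to be a weighted average of the mean entity-keyword vector and the mean description-keyword vector, because the abstract does not say how the two are combined. Cosine similarity and a 0.9 threshold are illustrative choices. The names `sentence_vector`, `deduplicate`, `entity_weight`, `threshold` and the library id `bug-17` are all hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Vector], dim: int) -> Vector:
    """Element-wise mean of the given vectors; a zero vector if none are given."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def sentence_vector(
    entity_keywords: Sequence[str],
    description_keywords: Sequence[str],
    embeddings: Dict[str, Vector],
    entity_weight: float = 0.5,  # hypothetical weighting, not stated in the source
) -> Vector:
    """Combine the entity (first) and description (second) word vectors into one sentence vector."""
    dim = len(next(iter(embeddings.values())))
    first = mean_vector([embeddings[w] for w in entity_keywords if w in embeddings], dim)
    second = mean_vector([embeddings[w] for w in description_keywords if w in embeddings], dim)
    return [entity_weight * a + (1.0 - entity_weight) * b for a, b in zip(first, second)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def deduplicate(vec: Vector, library: Dict[str, Vector], threshold: float = 0.9) -> Optional[str]:
    """Return the id of the most similar library entry if it reaches the threshold, else None."""
    best_id, best_sim = None, -1.0
    for text_id, stored in library.items():
        sim = cosine(vec, stored)
        if sim > best_sim:
            best_id, best_sim = text_id, sim
    return best_id if best_sim >= threshold else None


# Toy 3-d embeddings standing in for the classification model's word vectors (hypothetical).
embeddings = {
    "boss": [1.0, 0.0, 0.0],
    "monster": [0.9, 0.1, 0.0],
    "crash": [0.0, 1.0, 0.0],
    "freeze": [0.0, 0.9, 0.1],
    "lag": [0.0, 0.2, 1.0],
}
# Preset text vector library holding one previously reported defect.
library = {"bug-17": sentence_vector(["boss"], ["crash"], embeddings)}

duplicate = sentence_vector(["monster"], ["freeze"], embeddings)
new_issue = sentence_vector(["boss"], ["lag"], embeddings)
print(deduplicate(duplicate, library))  # bug-17
print(deduplicate(new_issue, library))  # None
```

Here a report pairing "monster" with "freeze" matches the stored "boss"/"crash" report (cosine about 0.996), while one pairing "boss" with "lag" scores about 0.59 and would be kept as new feedback. Keeping entity and description keywords separate lets the combination weight what the player refers to against what went wrong; whether the patent weights them, and how, is not stated in this excerpt.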

Description

Technical Field

[0001] The present application relates to the technical field of Internet text analysis, and in particular to a method, device and equipment for text deduplication.

Background Art

[0002] A new game, or a new version of a game, is tested before it is officially released; for example, hundreds of players may be recruited to play the game and report bugs. Multiple players usually describe the same problem in different ways, so when game defects are tallied later, repeated feedback written in different words must be found and extracted.

[0003] In the existing technology, text deduplication proceeds as follows: the text to be deduplicated is segmented into words; keywords are extracted directly from the segmented words; the similarity between the keywords of two texts is calculated; and finally, the similarity between keywords is used to deduplicat...
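The prior-art flow in [0003] can be sketched as below, which also illustrates the weakness described under "Problems solved by technology". This is a hypothetical sketch: whitespace splitting and a small stop-word list stand in for word segmentation and keyword extraction, and Jaccard overlap stands in for the keyword similarity, which the excerpt does not name. The names `STOPWORDS`, `extract_keywords` and `keyword_similarity` are illustrative only.

```python
from typing import Set

# Hypothetical stop-word list; a real system would segment Chinese text with a
# word segmenter and use a proper keyword extractor instead.
STOPWORDS = {"the", "a", "i", "when", "in", "is", "game"}


def extract_keywords(text: str) -> Set[str]:
    """Whitespace 'segmentation' plus stop-word removal, standing in for keyword extraction."""
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    return {t for t in tokens if t not in STOPWORDS}


def keyword_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two keyword sets (an assumed measure; the source does not name one)."""
    ka, kb = extract_keywords(a), extract_keywords(b)
    union = ka | kb
    if not union:
        return 0.0
    return len(ka & kb) / len(union)


r1 = "The game crashes when I fight the boss."
r2 = "Client shuts down unexpectedly during the monster battle."
r3 = "The game crashes in the boss fight."
print(keyword_similarity(r1, r2))  # 0.0: plausibly the same defect, yet no shared keywords
print(keyword_similarity(r1, r3))  # 1.0: near-verbatim rewording
```

Two differently worded reports of what is plausibly the same crash score 0.0 because they share no keywords, while a near-verbatim rewording scores 1.0. This is the gap that the sentence-vector approach in the abstract is meant to close.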


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/35; G06F16/33
CPC: G06F16/35; G06F16/334
Inventors: 智绪浩, 庄超, 毕研涛, 魏学峰
Owner: SHENZHEN TENCENT INFORMATION TECH CO LTD