Text similarity calculation method and device, computer equipment and computer storage medium

A text similarity calculation method, applied in computing, unstructured text data retrieval, text database clustering/classification, and similar areas, which addresses the problem of unsatisfactory text similarity results and improves accuracy.

Active Publication Date: 2019-09-27
PING AN TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

[0004] In view of this, the present invention provides a text similarity calculation method, device, computer equipment and computer storage medium, whose main purpose is to solve the problem that calculated text similarity results are not ideal.


Examples


Detailed Description of the Embodiments

[0055] Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided for more thorough understanding of the present disclosure and to fully convey the scope of the present disclosure to those skilled in the art.

[0056] An embodiment of the present invention provides a method for calculating text similarity that can accurately calculate the similarity between texts with complex expressions. As shown in Figure 1, the method includes:

[0057] 101. Obtain training word-segmentation data by tokenizing text corpora that contain sentences of different lengths.

[0058] Among them, the text corpus contains multiple pairs of sentence comb...


Abstract

The invention discloses a text similarity calculation method and device in the technical field of text processing, which can accurately calculate the similarity between texts with complex expressions. The method comprises: obtaining a training word-segmentation corpus by segmenting text corpora with different sentence lengths into words; inputting the training word-segmentation corpus as training data into a supervised model for training, thereby constructing a sentence vector conversion model that converts sentences in the text corpus into sentence vectors representing text features; adjusting feature parameters in the sentence vector conversion model according to the sentence vectors obtained by training; performing sentence vector conversion on a plurality of target texts with the adjusted model to obtain a plurality of sentence vectors representing the features of the target texts; and calculating the similarity among the target texts according to those sentence vectors.
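The final step of the abstract, comparing target texts through their sentence vectors, can be illustrated with a minimal Python sketch. The visible text does not say which similarity measure the invention uses, so cosine similarity is an assumption here, and the vectors below are hypothetical placeholders rather than the output of the patent's sentence vector conversion model.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)


def pairwise_similarity(vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Similarity matrix over all pairs of sentence vectors."""
    n = len(vectors)
    return [
        [cosine_similarity(vectors[i], vectors[j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical sentence vectors standing in for the model's output
# (assumption: one fixed-length vector per target text).
sentence_vectors = [
    [1.0, 0.0, 1.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
similarity_matrix = pairwise_similarity(sentence_vectors)
```

Entry `similarity_matrix[i][j]` holds the similarity between target texts `i` and `j`. Identical vectors score close to 1 and orthogonal vectors score 0.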

Description

Technical field

[0001] The invention relates to the technical field of text processing, in particular to a text similarity calculation method, device, computer equipment and computer storage medium.

Background

[0002] Natural language processing is an important direction in the field of computer science and artificial intelligence. Natural language processing tasks often involve finding similar sentences or finding approximate expressions of a sentence, which requires classifying similar sentences by calculating text similarity.

[0003] Text similarity calculation is the most common application problem in the field of natural language processing. At present, the similarity between texts is usually calculated by means of text string distance and text word vectorization. However, this method of calculating similarity between texts represents all texts by splitting strings and word vectors, a...
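For context, the string-distance approach mentioned in the background can be sketched as follows. The source does not name a specific distance, so Levenshtein edit distance is used here as an assumed, commonly used example. The function names and the normalization into a score between 0 and 1 are illustrative choices, not part of the patent.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def string_similarity(a: str, b: str) -> float:
    """Edit distance normalized to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

A score of this kind depends only on surface characters, so two sentences that share most of their characters can score high even when their meanings differ. Sentence-level vectors are meant to address that kind of limitation.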


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/35, G06F17/27
CPC: G06F16/35, G06F40/279, G06F40/289, Y02D10/00
Inventors: 申超波, 阮晓雯, 徐亮
Owner: PING AN TECH (SHENZHEN) CO LTD