
Method and device for calculating similarity between texts, storage medium and electronic equipment

A text similarity calculation technology in the computer field. It addresses context effects that distort the semantic conversion between words of different texts, incomplete semantic information, and inaccurate text similarity.

Active Publication Date: 2019-04-26
NEUSOFT CORP
Cites 8; cited by 7

AI Technical Summary

Problems solved by technology

However, the meaning of a word depends on its context, that is, on where it appears in the text, and this in turn affects how words in one text correspond semantically to words in another. The similarity calculation methods described above operate on each text as a whole and do not consider these word-to-word semantic correspondences. As a result, the semantic information of each text is captured incompletely, and the calculated text similarity is not accurate enough.

Embodiment Construction

[0066] Specific embodiments of the present disclosure will be described in detail below in conjunction with the accompanying drawings. It should be understood that the specific embodiments described here are only used to illustrate and explain the present disclosure, and are not intended to limit the present disclosure.

[0067] Figure 1 is a flowchart of a method for calculating the similarity between texts according to an embodiment of the present disclosure. As shown in Figure 1, the method may include the following steps.

[0068] In step 11, word segmentation and stop-word filtering are performed on the first text and the second text whose similarity is to be calculated. From the processing results, a first word segmentation set corresponding to the first text and a second word segmentation set corresponding to the second text are obtained, each containing no repeated segmented words.
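Step 11 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes whitespace-tokenized input (Chinese text would need a real word segmenter), and `STOP_WORDS`, `segment`, `build_word_set`, and `preprocess_pair` are hypothetical names. Each de-duplicated "set" is kept as an ordered list so the output is deterministic.

```python
# Minimal sketch of step 11, assuming whitespace-tokenized input.
# STOP_WORDS is a hypothetical stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "to", "of", "and", "is"}


def segment(text: str) -> list[str]:
    """Split text into lowercase tokens (stand-in for a real word segmenter)."""
    return text.lower().split()


def build_word_set(text: str, stop_words: set[str] = STOP_WORDS) -> list[str]:
    """Segment, drop stop words, and remove repeats while keeping first-seen order."""
    tokens = [t for t in segment(text) if t not in stop_words]
    return list(dict.fromkeys(tokens))


def preprocess_pair(first: str, second: str) -> tuple[list[str], list[str]]:
    """Return the de-duplicated word sets for the first and second texts."""
    return build_word_set(first), build_word_set(second)


if __name__ == "__main__":
    a, b = preprocess_pair("The cat sat on the mat", "A cat is on a mat and a mat")
    print(a)  # ['cat', 'sat', 'on', 'mat']
    print(b)  # ['cat', 'on', 'mat']
```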

[0069] For the first text and the second text for which the similarity betwee...

Abstract

The invention relates to a method and device for calculating the similarity between texts, a storage medium, and electronic equipment. The method comprises: performing word segmentation and stop-word filtering on a first text and a second text whose similarity is to be calculated, and obtaining from the processing result a first word segmentation set that corresponds to the first text and contains no repeated segmented words, and a second word segmentation set that corresponds to the second text; determining the semantic information transfer cost between the first text and the second text according to the information amount, within its text, of each segmented word in the first and second word segmentation sets and the word embedding vector corresponding to each segmented word; and determining the similarity between the first text and the second text according to the semantic information transfer cost. In this way, the semantic contribution of each word, and of each word's context, to the text is fully taken into account, the basis of the similarity calculation is closer to the semantics of the text, and the calculated similarity is more accurate.
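The abstract does not give the exact transfer-cost formula, so the following is only a sketch of one plausible reading. It treats the semantic information transfer cost as an optimal-transport problem in the style of Word Mover's Distance and uses the relaxed (nearest-neighbour) lower bound instead of an exact solver. The per-word information amounts are supplied as weights (for example, TF-IDF scores), every word is assumed to have an embedding, the toy embeddings are made up, and the 1 / (1 + cost) similarity mapping is an assumption; all function names are hypothetical.

```python
import math

# Type aliases for readability.
Vector = list[float]
Weights = dict[str, float]      # word -> information amount (unnormalized)
Embeddings = dict[str, Vector]  # word -> word embedding vector


def euclidean(u: Vector, v: Vector) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def normalize(weights: Weights) -> Weights:
    """Scale information amounts so each text's weights sum to 1."""
    total = sum(weights.values())
    return {w: v / total for w, v in weights.items()}


def one_way_cost(src: Weights, dst: Weights, emb: Embeddings) -> float:
    """Send each source word's whole weight to its nearest target word."""
    return sum(
        weight * min(euclidean(emb[w], emb[t]) for t in dst)
        for w, weight in src.items()
    )


def transfer_cost(w1: Weights, w2: Weights, emb: Embeddings) -> float:
    """Relaxed WMD: the larger one-way cost, a lower bound on the exact transport cost."""
    p, q = normalize(w1), normalize(w2)
    return max(one_way_cost(p, q, emb), one_way_cost(q, p, emb))


def similarity(w1: Weights, w2: Weights, emb: Embeddings) -> float:
    """Map cost into (0, 1]; zero cost gives 1.0. The 1 / (1 + cost) form is assumed."""
    return 1.0 / (1.0 + transfer_cost(w1, w2, emb))


if __name__ == "__main__":
    # Toy 2-D embeddings, invented for illustration.
    emb = {
        "obama": [1.0, 0.0],
        "president": [0.9, 0.1],
        "speaks": [0.0, 1.0],
        "greets": [0.1, 0.9],
        "banana": [-1.0, -1.0],
    }
    first = {"obama": 1.0, "speaks": 1.0}
    second = {"president": 1.0, "greets": 1.0}
    third = {"banana": 1.0, "greets": 1.0}
    # Paraphrase with no shared words still scores higher than an unrelated text.
    print(similarity(first, second, emb) > similarity(first, third, emb))  # True
```

Swapping in an exact optimal-transport solver would give the full Word Mover's Distance; the relaxed form is used here only to keep the sketch dependency-free.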

Description

Technical field

[0001] The present disclosure relates to the field of computer technology, and in particular to a method, device, storage medium, and electronic equipment for calculating the similarity between texts.

Background

[0002] In the prior art, the similarity between texts is generally calculated by a structured processing method. First, the text is structured; for example, the two pieces of text are converted into vectors using representations such as character-based one-hot, word-based one-hot, or accumulated word embeddings. Similarity is then calculated on the structured results, for example as the Euclidean distance between the text vectors or the cosine of the angle between them. However, the meaning of a word depends on its context in the text, which affects the semantic conversion between words of different texts, and the abov...
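For contrast, the prior-art pipeline in [0002] can be sketched with word-level one-hot vectors summed into bag-of-words counts and compared by cosine similarity and Euclidean distance. This is a generic illustration with hypothetical function names, not code from the patent. The usage shows the weakness that motivates the invention: two paraphrases that share no words get a cosine similarity of 0.

```python
import math
from collections import Counter


def bow_vectors(a: list[str], b: list[str]) -> tuple[list[int], list[int]]:
    """Sum word-level one-hot vectors into count vectors over the joint vocabulary."""
    vocab = sorted(set(a) | set(b))
    ca, cb = Counter(a), Counter(b)
    return [ca[w] for w in vocab], [cb[w] for w in vocab]


def cosine(u: list[int], v: list[int]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def euclidean(u: list[int], v: list[int]) -> float:
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


if __name__ == "__main__":
    first = ["obama", "speaks", "media"]
    second = ["president", "greets", "press"]
    u, v = bow_vectors(first, second)
    print(cosine(u, v))     # 0.0: no shared words, so no measured similarity
    print(euclidean(u, v))  # sqrt(6), about 2.449
```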

Application Information

IPC(8): G06F17/27; G06K9/62; G06F16/332
CPC: G06F40/284; G06F18/22
Inventor: 董超 (Dong Chao)
Owner: NEUSOFT CORP