Text similarity calculation method and device and electronic device

A text similarity calculation technology, applied in the field of text similarity calculation methods, devices, and electronic equipment. It addresses the weak handling of word-order-dependent text, the inability to handle reversed character strings, and the unsatisfactory accuracy of Chinese text similarity, with the effect of improving the accuracy of text similarity calculation.

Active Publication Date: 2019-01-25
广西三方大供应链技术服务有限公司

AI Technical Summary

Problems solved by technology

However, existing text similarity methods, whether they construct feature vectors or calculate edit distance, compare only the character content of the texts and take little or no account of how word order affects grammar and semantics. For example, bullet chat similarity is traditionally calculated with the Levenshtein algorithm (a kind of edit distance algorithm), which is based mainly on the edit distance from the source string to the target string. This method does not take the common substrings of the two strings into account and cannot handle reversed strings.
As a result, prior-art text similarity calculation methods handle Chinese, which depends on word order, poorly, and the accuracy of the Chinese text similarity they calculate is unsatisfactory.
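
The limitation described above can be illustrated with a minimal sketch of a Levenshtein-based similarity. The function names and the normalization by the longer string's length are assumptions made for illustration and are not taken from the patent text. The sample input, in which the two halves of a string are swapped, is likewise chosen here only to show that shared substrings are ignored.

```python
def levenshtein(source: str, target: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    # previous[j] holds the distance between the processed prefix of
    # source and the first j characters of target.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # delete s_char
                current[j - 1] + 1,      # insert t_char
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def edit_similarity(source: str, target: str) -> float:
    """Hypothetical normalization: 1 - distance / length of the longer string."""
    longest = max(len(source), len(target))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(source, target) / longest


# "abcd" and "cdab" share the substrings "ab" and "cd",
# yet the edit distance equals the full string length.
print(levenshtein("abcd", "cdab"))      # 4
print(edit_similarity("abcd", "cdab"))  # 0.0
```

Under this normalization the two strings score 0.0 even though they contain the same two-character blocks, which is the kind of case the passage above says edit distance alone handles poorly.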



Detailed Description of Embodiments

[0054] To enable those skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments, without creative effort, shall fall within the protection scope of the present invention.

[0055] The terms "first", "second" and the like (if any) in the description and claims of the present invention and the above drawings are used to distinguish similar objects and not necessarily to describe a specific order or sequence. It is to be understood that the terms so used are interchangeable under appropriate circumstances...


Abstract

The embodiments of the invention disclose a text similarity calculation method, a device, and an electronic device. The method comprises the following steps: obtaining an original text and a target text; calculating the edit distance between the original text and the target text; determining the longest common substring of the original text and the target text, and obtaining the starting position of the longest common substring in the original text; and calculating the text similarity between the original text and the target text based on the starting position of the longest common substring in the original text. By combining the edit distance between the original text and the target text with their longest common substring, the embodiments produce a text similarity that is closer to reality, which improves the accuracy of the text similarity calculation.
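
The step of determining the longest common substring and its starting position in the original text can be sketched as follows. This is a minimal illustration using the standard dynamic-programming approach. The function name, the tie-breaking rule (the first longest match found in the original text wins), and the use of -1 for "no common substring" are assumptions, not details from the patent. The formula that combines this starting position with the edit distance is not given in this excerpt, so it is deliberately left out.

```python
from typing import Tuple


def longest_common_substring(original: str, target: str) -> Tuple[str, int]:
    """Return the longest common substring and its start index in `original`.

    Returns ("", -1) when the two texts share no character
    (an assumed convention for this sketch).
    """
    best_len = 0
    best_end = 0  # exclusive end index of the best match in `original`
    # previous[j] = length of the common suffix of original[:i-1] and target[:j]
    previous = [0] * (len(target) + 1)
    for i, o_char in enumerate(original, start=1):
        current = [0] * (len(target) + 1)
        for j, t_char in enumerate(target, start=1):
            if o_char == t_char:
                current[j] = previous[j - 1] + 1
                # Strict comparison keeps the earliest longest match.
                if current[j] > best_len:
                    best_len = current[j]
                    best_end = i
        previous = current
    if best_len == 0:
        return "", -1
    start = best_end - best_len
    return original[start:best_end], start


print(longest_common_substring("xxhelloyy", "zhelloz"))  # ('hello', 2)
```

The table is kept to two rows at a time, so memory grows with the length of the target text only, while running time is proportional to the product of the two text lengths.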

Description

Technical field

[0001] The invention relates to the technical field of video playback, and in particular to a text similarity calculation method, device, and electronic equipment.

Background art

[0002] Similarity calculation is used to measure the degree of similarity between objects. In the field of natural language processing, similarity calculation is a basic operation that is widely used in technical scenarios such as data mining, data classification, information retrieval, and information collection. Text similarity calculation is a type of similarity calculation often involved in natural language processing. By calculating the similarity between different texts, it is possible to perform cluster analysis, text matching, or deduplication on large-scale text corpora.

[0003] The text similarity calculation methods in the prior art mainly include cosine similarity, edit distance and neural network language model-based sim...
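
For reference, cosine similarity, one of the prior-art methods named in this section, can be sketched over character-count vectors. The choice of single characters as features and the function name are assumptions made for illustration; the text does not say how the prior-art feature vectors are built.

```python
import math
from collections import Counter


def char_cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between character-count vectors of two texts."""
    counts_a, counts_b = Counter(a), Counter(b)
    # Counter returns 0 for characters missing from the other text.
    dot = sum(counts_a[ch] * counts_b[ch] for ch in counts_a)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention for empty input
    return dot / (norm_a * norm_b)


# Character order is invisible to a bag-of-characters vector:
# both strings map to the same counts.
print(char_cosine_similarity("abcd", "dcba"))
```

Because the vectors record only which characters occur and how often, any reordering of the same characters yields the same score, which illustrates a similarity measure that looks at character content alone.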


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F16/33
CPC: G06F40/253, G06F40/284
Inventor: 徐乐乐
Owner: 广西三方大供应链技术服务有限公司