Text similarity obtaining method and device

A text similarity acquisition technology, applied in the field of text similarity acquisition. It addresses the complex calculation methods, low performance, and large amount of data calculation of existing schemes, and achieves a simpler calculation method with reduced complexity and a reduced amount of data calculation.

Active Publication Date: 2013-05-29
BEIJING FEINNO COMM TECH

AI Technical Summary

Problems solved by technology

[0005] The present invention provides a text similarity acquisition method and device to solve the following problems of existing text similarity calculation schemes: the result depends on word order, the calculation method is relatively complicated, the amount of data calculation is large, and performance is low.

Examples

Embodiment Construction

[0023] To make the objectives, technical solutions, and advantages of the present invention clearer, its implementation is described in further detail below with reference to the accompanying drawings.

[0024] Referring to Figure 1, Embodiment 1 of the present invention provides a method for obtaining text similarity. The method includes:

[0025] 11: Remove the stop words from each text according to predetermined stop-word rules, and extract the available words of the text;

[0026] In this embodiment, a statistics-based word segmentation system first segments the text to be processed and removes the stop words according to the predetermined stop-word rules, yielding a text that contains only available words (that is, the words in the text other than the stop words). Chinese word segmentation is more difficult for a computer than word segmentation of Western languages. Word segmentation is...
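Step 11 can be illustrated with a minimal Python sketch. The stop-word list, the function name `extract_available_words`, and the regex tokenizer are all assumptions made for illustration: the source does not give its stop-word rules, and the tokenizer here is only a stand-in for the statistics-based word segmentation system, which Chinese text would actually require.

```python
import re

# Hypothetical stop-word list; the source does not specify its rules.
STOP_WORDS = frozenset({"a", "an", "the", "of", "is", "and", "to", "in"})


def extract_available_words(text, stop_words=STOP_WORDS):
    """Return the words of `text` that are not stop words, in order."""
    # Stand-in for the statistics-based segmenter: lowercase the text and
    # split it into runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    # Keep only the "available words", i.e. everything except stop words.
    return [token for token in tokens if token not in stop_words]
```

For example, `extract_available_words("The similarity of a text is the key")` keeps only `similarity`, `text`, and `key`.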

Abstract

The invention discloses a text similarity obtaining method and device that calculate the similarity of texts accurately and rapidly, with simple operation and a small amount of data calculation. The method comprises the following steps: stop words in each text are removed according to predetermined stop-word rules, and the available words of the text are extracted; a hash value is calculated for each available word in the text, and a similarity hash value of the text is obtained from the hash values of all its available words; the similarity between different texts is then obtained from the similarity hash values of the texts.
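The hashing steps in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: the abstract does not say how the per-word hash values are combined or how two texts are compared, so the sketch swaps in the well-known SimHash scheme (per-bit majority vote, then Hamming distance). The 64-bit width, the use of MD5, and the names `word_hash`, `text_fingerprint`, and `similarity` are likewise hypothetical.

```python
import hashlib

BITS = 64  # assumed fingerprint width; the source does not state one


def word_hash(word):
    """Stable 64-bit hash of a word (first 8 bytes of its MD5 digest)."""
    digest = hashlib.md5(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def text_fingerprint(words):
    """Combine per-word hashes into one value by per-bit majority vote."""
    votes = [0] * BITS
    for word in words:
        h = word_hash(word)
        for i in range(BITS):
            # A set bit votes +1 for that position, a clear bit votes -1.
            votes[i] += 1 if (h >> i) & 1 else -1
    # Bit i of the fingerprint is 1 when the votes for it are positive.
    return sum(1 << i for i in range(BITS) if votes[i] > 0)


def similarity(fp_a, fp_b):
    """Fraction of bit positions on which two fingerprints agree (0 to 1)."""
    differing_bits = bin(fp_a ^ fp_b).count("1")  # Hamming distance
    return 1.0 - differing_bits / BITS
```

Because the votes are summed, the fingerprint does not depend on the order of the words, which matches the stated goal of avoiding order-dependent comparison. Comparing two texts then costs one XOR and one bit count, regardless of text length.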

Description

Technical field

[0001] The present invention relates to the technical field of text information processing, in particular to a text similarity acquisition method and device, which can be widely used in fields such as information retrieval, machine translation, automatic question answering systems, web page deduplication, text clustering, and text mining.

Background

[0002] With the widespread application of computers and the popularization of the Internet, information of all kinds is expanding rapidly, which brings convenience to people but also causes the problem of information overload.

[0003] Text is the most important information carrier, and the processing and analysis of text documents has become one of the hot spots in data mining and information retrieval technology today. A basic and key issue in text processing technology is the calculation of text similarity. Text similarity calculation can calculate the similarity of different entries in a tex...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27
Inventor: 张雁飞
Owner: BEIJING FEINNO COMM TECH