A method and device for evaluating text similarity

A text similarity evaluation technology, applied in the fields of instruments, calculations, electrical digital data processing, etc. It addresses the poor accuracy and low computational efficiency of existing methods and improves both accuracy and operating speed.

Inactive Publication Date: 2019-01-11
ALIBABA (CHINA) CO LTD
Cites: 7 · Cited by: 0
AI Technical Summary

Problems solved by technology

[0005] Embodiments of the present invention provide a text similarity evaluation method and device that address the poor accuracy and low computational efficiency of existing text similarity evaluation methods.

Examples

Embodiment Construction

[0050] First, an embodiment of the text similarity evaluation method of the present invention is described. Figure 1 is a schematic flow chart of this embodiment, which includes the following steps:

[0051] Step 101: Perform word segmentation on the two target texts to be evaluated, sentence unit by sentence unit, to obtain a word segmentation set, and then screen an effective word segmentation set from that set.

[0052] A sentence unit is a "sentence" in the ordinary sense. It is obtained by splitting the body of the text at certain punctuation marks contained in the text. These marks generally represent pauses or semantic transitions, such as the comma ",", period ".", semicolon ";", exclamation mark "!", ellipsis "...", and so on.
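The splitting described in [0052] can be sketched in a few lines of Python. This is only an illustrative sketch, not the patent's implementation: the function name `split_sentence_units` and the exact punctuation set (including the full-width forms) are assumptions based on the examples listed above.

```python
import re

# Boundary punctuation, assumed from the examples in [0052]:
# comma, period, semicolon, exclamation mark, ellipsis, plus the
# full-width forms commonly used in Chinese text.
_BOUNDARY = re.compile(r"[,.;!…，。；！]+")


def split_sentence_units(text):
    """Split text into sentence units at pause/transition punctuation."""
    parts = _BOUNDARY.split(text)
    # Drop empty fragments left by leading/trailing or repeated punctuation.
    return [part.strip() for part in parts if part.strip()]


units = split_sentence_units("Hello, world. Wait... what; no!")
```

A run of consecutive marks, such as the three dots of an ASCII ellipsis, is treated as a single boundary. A production splitter would also need to handle cases this sketch ignores, for example decimal points inside numbers.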

[0053] There are many word segmentation methods in the prior art, such as forward maximum matching word segmentation method, reve...
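As an illustration of the forward maximum matching method named in [0053], the following Python sketch greedily takes the longest dictionary word at each position. The dictionary contents, the `max_len` parameter, and the function name are hypothetical choices made for this example; the source does not specify them.

```python
def forward_max_match(sentence, dictionary, max_len=4):
    """Greedy left-to-right segmentation against a word dictionary.

    `dictionary` is any container of known words (hypothetical here);
    `max_len` caps the candidate word length in characters.
    """
    tokens = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first, shrinking until a dictionary hit.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


toy_dictionary = {"研究", "研究生", "生命", "起源"}
tokens = forward_max_match("研究生命起源", toy_dictionary)
```

With this toy dictionary the greedy pass yields 研究生 / 命 / 起源 rather than 研究 / 生命 / 起源, which shows how a single segmentation method can produce a poor result for an ambiguous sentence unit.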

Abstract

The invention discloses a text similarity evaluation method and device. The method comprises the following steps: word segmentation is performed on the two target texts to be evaluated, sentence unit by sentence unit, to obtain word segmentation sets, and an effective word segmentation set is screened from those sets; the frequency with which each word in the effective word segmentation set occurs in each of the two target texts is counted, the vector cosine value of the target texts is calculated from these frequencies, and the similarity of the target texts is determined from the vector cosine value. In addition to segmenting each sentence unit, the method performs an optimal screening over the multiple word segmentation results that correspond to a sentence unit, so that one effective word segmentation set is selected for that sentence unit, which improves the accuracy of the segmentation result. Furthermore, because similarity is determined by calculating the vector cosine value of the target texts, the method runs markedly faster than evaluation methods based on word-by-word comparison.
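The frequency-counting and cosine step summarized in the abstract can be sketched as follows. This is a minimal sketch that assumes raw occurrence counts are used directly as vector components; the function name `cosine_similarity` and the handling of empty inputs are choices made for this example, and any weighting or thresholding used in the patent is not reproduced here.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between two term-frequency vectors."""
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    # The shared vocabulary defines the vector dimensions;
    # Counter returns 0 for words missing from one text.
    vocabulary = set(freq_a) | set(freq_b)
    dot = sum(freq_a[word] * freq_b[word] for word in vocabulary)
    norm_a = math.sqrt(sum(count * count for count in freq_a.values()))
    norm_b = math.sqrt(sum(count * count for count in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        # Assumption: an empty text is treated as having no similarity.
        return 0.0
    return dot / (norm_a * norm_b)


score = cosine_similarity(["text", "similarity"], ["text", "evaluation"])
```

Each text is reduced to one frequency vector in a single pass over its words, after which the comparison is one dot product and two norms. That is the source of the speed advantage over word-by-word comparison claimed in the abstract.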

Description

Technical field

[0001] The invention relates to the technical field of mobile communication, in particular to a text similarity evaluation method and device.

Background technique

[0002] With the rapid development of communication and network technology, the Internet has become an important platform for users to publish and obtain information. Within the massive volume of text on the Internet, some texts are highly similar or closely related to one another in subject or content, which makes the information highly redundant. It is therefore necessary to evaluate the similarity of these texts with a similarity evaluation method, and then to deduplicate and classify them, so that these information resources can be managed more accurately and efficiently.

[0003] Existing text similarity evaluation methods are generally based on word-by-word comparison, that is, the two texts to be compared are divided into words (or strings), and then the words contained in the t...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/27
Inventor: 梁捷尹兵
Owner: ALIBABA (CHINA) CO LTD