Method and device for comparing text similarity

A text similarity comparison technology in the Internet field that addresses the failure to consider the importance of word elements, thereby improving comparison accuracy.

Active Publication Date: 2018-06-15
BEIJING QIHOO TECH CO LTD
Cites: 7; Cited by: 0

AI Technical Summary

Problems solved by technology

[0005] However, the existing technology does not consider the importance of word elements in the text feature vector when comparing text similarity, so two pieces of text information that the public is not actually interested in at the same time may be clustered together.
[0006] Specifically, when the existing minimum hash (MinHash) algorithm is used to cluster text information, it does not consider the importance of each word element in the text, so it may cluster together two pieces of text information that the public is not actually interested in at the same time.
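
A minimal sketch of the unweighted MinHash comparison described above, assuming the texts are already tokenized into word sets. The seeded SHA-1 hash family, the function names, and the two example sentences are hypothetical illustrations, not taken from the patent. It shows that every distinct word counts equally, so the one word that actually distinguishes the two texts ("phone" vs. "car") barely lowers the score.

```python
import hashlib

def token_hash(token: str, seed: int) -> int:
    """Seeded 64-bit hash derived from SHA-1 (an illustrative hash family)."""
    digest = hashlib.sha1(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def minhash_signature(tokens: set[str], num_perm: int = 128) -> list[int]:
    """One slot per seeded hash function: the minimum hash over all tokens."""
    if not tokens:
        raise ValueError("MinHash needs at least one token")
    return [min(token_hash(t, seed) for t in tokens) for seed in range(num_perm)]

def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Share of slots where the two signatures agree (estimates Jaccard)."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

def jaccard(a: set[str], b: set[str]) -> float:
    """Exact Jaccard similarity: size of intersection over size of union."""
    return len(a & b) / len(a | b)

# Hypothetical example: the texts differ only in their key word.
doc_a = set("the new phone from the company was released today".split())
doc_b = set("the new car from the company was released today".split())

exact = jaccard(doc_a, doc_b)  # 7 shared words / 9 distinct words = 0.778
approx = estimated_jaccard(minhash_signature(doc_a), minhash_signature(doc_b))
print(f"exact={exact:.3f}  minhash estimate={approx:.3f}")
```

Because each word element carries the same weight, two texts that share mostly filler words score as highly similar even when their subjects differ.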

Detailed Description of Embodiments

[0035] Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be more thoroughly understood, and will fully convey the scope of the present disclosure to those skilled in the art.

[0036] In the present invention, clustering refers to the process of dividing a collection of physical or abstract objects into multiple classes composed of similar objects. A cluster generated by clustering is a collection of data objects that are similar to objects in the same cluster and different from objects in other clusters.
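
To make this definition concrete, here is one simple clustering scheme, offered as an assumption rather than the patent's own procedure: single-linkage threshold clustering over word-set Jaccard similarity, implemented with a union-find structure. The function names, the whitespace tokenizer, the 0.4 threshold, and the headlines are hypothetical.

```python
from itertools import combinations

def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Exact Jaccard similarity: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def cluster_texts(texts: list[str], threshold: float) -> list[list[int]]:
    """Single-linkage clustering: two texts share a cluster when a chain of
    pairs with Jaccard similarity >= threshold connects them."""
    token_sets = [frozenset(t.lower().split()) for t in texts]
    parent = list(range(len(texts)))  # union-find forest over text indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(texts)), 2):
        if jaccard(token_sets[i], token_sets[j]) >= threshold:
            parent[find(i)] = find(j)  # merge the two clusters

    clusters: dict[int, list[int]] = {}
    for i in range(len(texts)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

# Hypothetical headlines: two about markets, two about weather.
news = [
    "stock market rises on tech earnings",
    "tech earnings lift stock market",
    "heavy rain expected in beijing tomorrow",
    "beijing braces for heavy rain tomorrow",
]
print(cluster_texts(news, threshold=0.4))  # [[0, 1], [2, 3]]
```

Each resulting group is a cluster in the sense defined above: its members are linked by sufficiently similar pairs, and no pair spanning two groups reaches the threshold.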

[0037] See Figure 1, which shows a text information clustering method provided by a speci...

Abstract

The invention discloses a text similarity comparison method and device. The method comprises the following steps: extracting initial feature vectors of two or more texts, wherein at least one element in the initial feature vectors is assigned a weight that is a multiple of a minimum weight and the remaining elements are assigned the minimum weight; adding corresponding elements to the initial feature vectors according to the multiple to form new feature vectors; and comparing the similarity of the texts according to the new feature vectors. The text similarity comparison method and device provided by the invention improve the accuracy with which text information is represented, so that similarity comparison results better match user needs.
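
The abstract leaves the expansion step open to interpretation. Below is a minimal sketch of one reading, assuming integer multiples and assuming that "adding corresponding elements according to the multiple" means replicating an element k times as distinct copies, so that a set-based measure such as MinHash counts it k times. The `token#i` naming, the `multiples` dictionary, the SHA-1 hash family, and the example texts are all hypothetical, not taken from the patent.

```python
import hashlib

def expand_features(tokens: set[str], multiples: dict[str, int]) -> set[str]:
    """Build the new feature set (hypothetical scheme): a token weighted k times
    the minimum weight contributes k distinct elements (token#0 .. token#k-1);
    every other token keeps the minimum weight and contributes token#0 only."""
    expanded: set[str] = set()
    for token in tokens:
        k = multiples.get(token, 1)
        if k < 1:
            raise ValueError(f"multiple for {token!r} must be >= 1")
        expanded.update(f"{token}#{i}" for i in range(k))
    return expanded

def jaccard(a: set[str], b: set[str]) -> float:
    """Exact Jaccard similarity: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def token_hash(element: str, seed: int) -> int:
    """Seeded 64-bit hash derived from SHA-1 (an illustrative hash family)."""
    digest = hashlib.sha1(f"{seed}:{element}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def minhash_signature(elements: set[str], num_perm: int = 128) -> list[int]:
    """Unmodified MinHash: replicated copies are just ordinary elements."""
    if not elements:
        raise ValueError("MinHash needs at least one element")
    return [min(token_hash(e, seed) for e in elements) for seed in range(num_perm)]

def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Share of slots where the two signatures agree (estimates Jaccard)."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

# Hypothetical example: the distinguishing words get 4x the minimum weight.
doc_a = set("the new phone from the company was released today".split())
doc_b = set("the new car from the company was released today".split())
multiples = {"phone": 4, "car": 4}

new_a = expand_features(doc_a, multiples)  # 7 + 4 = 11 elements
new_b = expand_features(doc_b, multiples)  # 7 + 4 = 11 elements
print(f"unweighted={jaccard(doc_a, doc_b):.3f}")  # 7/9, about 0.778
print(f"weighted={jaccard(new_a, new_b):.3f}")    # 7/15, about 0.467
est = estimated_jaccard(minhash_signature(new_a), minhash_signature(new_b))
print(f"weighted minhash estimate={est:.3f}")
```

Under this reading, weighting the distinguishing words drops the similarity from 7/9 to 7/15, so texts that share only filler words are less likely to be clustered together; weighting a shared key word instead would raise the score. Replication leaves the MinHash step itself unchanged, at the cost of a larger element set for heavily weighted words.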

Description

Technical field

[0001] The present invention relates to the field of Internet technology, and in particular to a method and device for comparing the similarity of information.

Background

[0002] With the continuous development and popularization of Internet technology, the amount of text information, such as news, that users face is increasing at an alarming rate, and users' need for convenient access to the text information they are interested in is becoming ever more urgent.

[0003] Because the amount of text information is growing rapidly, text categories are becoming increasingly fine-grained and highly time-sensitive: content is updated quickly and remains relevant only briefly. Clustering text effectively, so that it can be provided to different users or different applications, is therefore very important.

[0004] In the prior art, the feature vector of the text is first extracted, and then the similarity of the text is compared accordi...

Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06F17/27; G06F17/30
CPC: G06F16/3334; G06F16/3335; G06F16/3347; G06F40/216; G06F40/289
Inventors: 张伸正 (Zhang Shenzheng), 魏少俊 (Wei Shaojun), 陈培军 (Chen Peijun)
Owner: BEIJING QIHOO TECH CO LTD