Device and method for detecting similar texts

A similar text detection technology, applied in the computer field, that addresses problems such as the large amount of calculation and the high computational complexity of text similarity detection, and achieves the effect of reducing the amount of calculation.

Status: Inactive; Publication Date: 2014-02-26
BEIJING QIHOO TECH CO LTD +1

AI Technical Summary

Problems solved by technology

[0004] However, network applications contain a large number of variants of similar texts: for example, texts that use traditional characters, substitute pinyin for characters, replace the original characters with homophones, or add many meaningless interfering characters. For such texts, the above-mentioned technology has the following disadvantages: (1) the word segmentation results contain errors; (2) texts with the same pronunciation but different characters cannot be judged as similar; (3) two texts that have undergone pinyin processing cannot be recognized as similar texts; (4) the computational complexity is too high (for example, expressing text as a vector requires a large amount of calculation), so the real-time requirements of today's large data volumes cannot be met.



Embodiment Construction

[0040] Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be more thoroughly understood, and will fully convey the scope of the present disclosure to those skilled in the art.

[0041] Figure 1 shows a flow chart of a similar text detection method according to an embodiment of the present invention. Figure 2 shows a detailed flow chart of steps S100, S200 and S300 in Figure 1. The method includes the following steps: S100, S200, S300 and S400.

[0042] S100. Perform text processing on the text to be detected to obtain Chinese text.
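As a rough illustration of what a text-processing step like S100 might do, the sketch below keeps only Chinese characters and discards everything else, which removes meaningless interfering characters. The source does not spell out the processing, so both the helper name `extract_chinese_text` and the choice of the Unicode range U+4E00 to U+9FFF as the definition of "Chinese text" are assumptions made for this example.

```python
import re

# Assumption: "Chinese text" means characters in the basic CJK Unified
# Ideographs block (U+4E00..U+9FFF). The source does not define this.
_NON_CHINESE = re.compile(r"[^\u4e00-\u9fff]")


def extract_chinese_text(raw: str) -> str:
    """Hypothetical helper: drop every character outside U+4E00..U+9FFF."""
    return _NON_CHINESE.sub("", raw)


if __name__ == "__main__":
    # Punctuation, digits, spaces and Latin letters are treated as interference.
    print(extract_chinese_text("低*价~出-售 123 abc!"))
```

This sketch does not attempt traditional-to-simplified conversion, and it discards pinyin typed in Latin letters rather than mapping it back to characters; a fuller implementation would need to handle both.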

[0043] By acquiring the Chinese text f...


Abstract

The invention discloses a device and a method for detecting similar texts, used to recognize similar texts. The device comprises a Chinese text acquiring unit, a pinyin text acquiring unit, a fingerprint acquiring unit and a detecting unit. The Chinese text acquiring unit is adapted to process a text to be detected so as to acquire a Chinese text; the pinyin text acquiring unit is adapted to transform the Chinese characters in the acquired Chinese text into pinyin so as to obtain a pinyin text; the fingerprint acquiring unit is adapted to extract features of the pinyin text and form the feature vector of the pinyin text from the extracted features; and the detecting unit is adapted to judge, according to the feature vector, whether the text to be detected matches a record in a database. With the device and the method, a Chinese text is obtained from the text to be detected, a pinyin text is then obtained, the feature vector of the pinyin text is formed, and whether the text to be detected matches a record in the database is judged according to the feature vector. This reduces the amount of computation and allows multiple variants of similar texts to be recognized accurately.
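The four units described in the abstract can be pictured as a small pipeline. The following is a minimal sketch under several assumptions, not the patented implementation: the toy `PINYIN` table (toneless pinyin for a handful of characters), the use of syllable bigrams as the extracted features, Jaccard overlap as the matching rule, and the `0.8` threshold are all hypothetical choices made for illustration, since the abstract does not specify them.

```python
import re

# Hypothetical toy table of toneless pinyin; a real system would need a
# full character-to-pinyin dictionary.
PINYIN = {"代": "dai", "开": "kai", "发": "fa", "法": "fa", "票": "piao", "漂": "piao"}


def chinese_text(raw: str) -> str:
    """Chinese text acquiring unit (assumed): keep only U+4E00..U+9FFF."""
    return re.sub(r"[^\u4e00-\u9fff]", "", raw)


def pinyin_text(chinese: str) -> list[str]:
    """Pinyin text acquiring unit: map each character to toneless pinyin.
    Characters missing from the toy table are kept unchanged."""
    return [PINYIN.get(ch, ch) for ch in chinese]


def fingerprint(syllables: list[str], n: int = 2) -> frozenset:
    """Fingerprint acquiring unit (assumed): the set of syllable n-grams."""
    return frozenset(
        tuple(syllables[i:i + n]) for i in range(len(syllables) - n + 1)
    )


def similarity(a: frozenset, b: frozenset) -> float:
    """Jaccard overlap between two fingerprints (assumed matching rule)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def matches(raw: str, database: list[frozenset], threshold: float = 0.8) -> bool:
    """Detecting unit: does the text match any stored record?"""
    fp = fingerprint(pinyin_text(chinese_text(raw)))
    return any(similarity(fp, record) >= threshold for record in database)


if __name__ == "__main__":
    database = [fingerprint(pinyin_text("代开发票"))]
    # Homophone substitutions plus interfering characters.
    print(matches("代*开~法-漂", database))
    # A short, mostly unrelated text.
    print(matches("开发", database))
```

Because comparison happens on pinyin rather than on the characters themselves, a variant that swaps in same-sounding characters yields the same fingerprint as the stored record, which is the behaviour the abstract attributes to the device.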

Description

Technical field

[0001] The invention relates to the field of computers, in particular to a similar text detection device and method.

Background technique

[0002] With the rise of network applications such as Q&A communities, a large number of texts have appeared on the Internet, such as users' questions and answers. However, network applications are flooded with advertising information, which makes it inconvenient for users to find information and also reduces the quality of web applications. To solve this problem, research on text similarity calculation has gradually been carried out, in the hope of finding spam such as advertisements by calculating text similarity.

[0003] One similar text detection method is: first extract the features of the text (such as segmenting the text, extracting entity words) and use various techniques to expand the features (such as using a synonym word forest, synonym dictionary and other knowledge base...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
CPC: G06F16/3344
Inventors: 孙林, 陈培军, 秦吉胜
Owner: BEIJING QIHOO TECH CO LTD