Text similarity comparison method

A text similarity comparison technology, applied in natural language data processing, special data processing applications, network data retrieval, etc., that addresses problems such as a huge number of comparisons, heavy workload, and low efficiency and accuracy.

Active Publication Date: 2017-08-04
CHINSESALL DIGITAL PUBLISHING GRP CO LTD +1
5 Cites, 17 Cited by

AI Technical Summary

Problems solved by technology

[0005] When performing content similarity comparison, since there are tens of millions of works in the works library, the number of works captured by the network infringement tracking development platform from the Intern

Embodiment Construction

[0071] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings. It should be understood that the specific embodiments described here are only used to explain the present invention, and are not intended to limit the present invention.

[0072] The network infringement tracking development platform monitors and tracks textual digital works, so the content similarity comparison technical solution for works targets text content. Text comparison is performed on the plain text content obtained after data processing. Both the architecture of the comparison system and the design of the comparison algorithm strongly affect the efficiency of the entire tracking platform.

[0073] In the embodiment of the present invention, text similarity comparison uses a distributed architecture at the system level, and adopts ...
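The distributed arrangement mentioned in [0073] is not detailed in this excerpt. As a rough illustration only, the sketch below partitions a works library into shards, scores a query against each shard in parallel, and merges the partial results. Everything here is an assumption made for illustration: the names `make_shards`, `search_shard`, and `distributed_search`, the use of a thread pool as a stand-in for separate nodes, and the whitespace-token Jaccard score (which would need a proper tokenizer for text without spaces).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def word_jaccard(a: str, b: str) -> float:
    """Placeholder similarity: Jaccard overlap of whitespace-separated tokens."""
    sa, sb = set(a.split()), set(b.split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def make_shards(library: Dict[str, str], shard_count: int) -> List[Dict[str, str]]:
    """Distribute works round-robin over shard_count shards (hypothetical scheme)."""
    shards: List[Dict[str, str]] = [{} for _ in range(shard_count)]
    for index, (work_id, text) in enumerate(sorted(library.items())):
        shards[index % shard_count][work_id] = text
    return shards


def search_shard(shard: Dict[str, str], query: str) -> List[Tuple[str, float]]:
    """Score the query against every work held by one shard."""
    return [(work_id, word_jaccard(query, text)) for work_id, text in shard.items()]


def distributed_search(
    library: Dict[str, str], query: str, shard_count: int = 3, top_k: int = 2
) -> List[Tuple[str, float]]:
    """Fan the query out to all shards, then merge and rank the partial results."""
    shards = make_shards(library, shard_count)
    # Threads stand in for independent comparison nodes in this sketch.
    with ThreadPoolExecutor(max_workers=shard_count) as pool:
        partials = pool.map(lambda shard: search_shard(shard, query), shards)
        scored = [pair for partial in partials for pair in partial]
    # Highest score first; ties broken by work id for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))[:top_k]
```

In a real deployment each shard would presumably live on its own machine and the merge step would run on a coordinator; the sketch only shows the fan-out and merge pattern.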

Abstract

The invention discloses a text similarity comparison method and relates to the technical field of network works comparison. At the system architecture level, the method uses a distributed architecture for text similarity comparison. At the algorithm level, it uses a multi-granularity hierarchical algorithm comprising a coarse-grained similarity comparison at the document level and a fine-grained similarity comparison at the level of text segments. This achieves a good balance between the efficiency and accuracy of content similarity comparison and meets the following performance indexes on the established test data: the average missed detection rate and false alarm rate are <= 10%, and the comparison response time is <= 0.1 s.
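The coarse-then-fine idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which similarity measure, segment size, or thresholds the method uses, so the choices below are assumptions made for illustration: character 3-gram sets compared with Jaccard similarity, fixed-length segments, and the hypothetical parameters `coarse_threshold` and `fine_threshold`. The cheap document-level score filters out unrelated works before the more expensive segment-level comparison runs.

```python
from typing import List, Set, Tuple


def shingles(text: str, n: int = 3) -> Set[str]:
    """Return the set of character n-grams of text with whitespace removed."""
    compact = "".join(text.split())
    if len(compact) < n:
        return {compact} if compact else set()
    return {compact[i:i + n] for i in range(len(compact) - n + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets; 0.0 when either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def split_segments(text: str, size: int = 40) -> List[str]:
    """Cut text into fixed-length segments (an assumed segmentation rule)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def compare(
    suspect: str,
    original: str,
    coarse_threshold: float = 0.1,
    fine_threshold: float = 0.6,
) -> Tuple[float, float]:
    """Return (coarse_score, fine_score) for a suspect text against an original.

    coarse_score: document-level Jaccard similarity.
    fine_score: fraction of suspect segments that closely match some
    segment of the original; 0.0 if the coarse stage rejects the pair.
    """
    coarse = jaccard(shingles(suspect), shingles(original))
    if coarse < coarse_threshold:
        return coarse, 0.0  # Coarse stage: skip the fine-grained comparison.
    original_segments = [shingles(s) for s in split_segments(original)]
    suspect_segments = [shingles(s) for s in split_segments(suspect)]
    if not suspect_segments:
        return coarse, 0.0
    matched = sum(
        1
        for seg in suspect_segments
        if any(jaccard(seg, other) >= fine_threshold for other in original_segments)
    )
    return coarse, matched / len(suspect_segments)
```

The two thresholds trade missed detections against false alarms: raising `coarse_threshold` saves work but risks discarding partial copies before the fine stage can catch them.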

Description

Technical field

[0001] The invention relates to the technical field of online works comparison, and in particular to a text similarity comparison method.

Background technique

[0002] With the rapid development of Internet technology, online works are disseminated ever faster and more widely, and infringements of online works are increasingly common. To address infringement of works over the Internet, a network infringement tracking development platform can be used to monitor and track works.

[0003] In the tracking process, web crawler technology is first used to capture works from the Internet. Their content is then compared for similarity with the works stored in the works library of the network infringement tracking development platform, to determine whether the network works are infringing works.

[0004] Among them, web crawlers (also known as web spiders, web r...

Application Information

IPC(8): G06F17/27, G06F17/30
CPC: G06F16/951, G06F40/289, G06F40/30
Inventor: 张国文
Owner: CHINSESALL DIGITAL PUBLISHING GRP CO LTD