Online novel content similarity comparison method

An online-novel approximation (similarity) technology, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses the heavy computation and slow processing speed of existing methods, avoids missed checks, improves the accuracy of duplicate checking, and avoids excessive complexity.

Status: Inactive; publication date: 2013-07-17
CHINSESALL DIGITAL PUBLISHING GRP CO LTD
Cites: 2; Cited by: 16

AI Technical Summary

Problems solved by technology

Web novels have characteristics that differ from those of web pages and academic papers. Existing methods designed for web pages and academic papers are therefore computationally intensive and slow when applied directly to web novels.




Embodiment Construction

[0012] What makes web-novel content distinctive is its particular chapters, specific characters, and specific time and space (setting); the combination of these three aspects gives each web novel its own characteristics. The approximation comparison of the present method is therefore aimed mainly at: (1) complete repetition of important chapters; (2) repetition of plots involving core characters; (3) time-space mapping conversion.

[0013] As shown in Figure 1, the present invention first preprocesses the online novel to be compared, then extracts its features and compares them with the features of the existing online novels stored in the feature comparison library, thereby determining the degree of approximation between the online novel to be compared and the existing online novels. The approximation comparison method of the present invention is described in detail below.
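The comparison against the feature comparison library can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes each novel has already been reduced to a set of integer fingerprints, and the class name `FeatureLibrary`, the inverted-index layout, and the definition of proportion (shared fingerprints divided by the fingerprints of the novel being checked) are all hypothetical choices.

```python
from collections import defaultdict


class FeatureLibrary:
    """Hypothetical feature comparison library for existing novels."""

    def __init__(self) -> None:
        # Inverted index (an assumption): fingerprint -> ids of novels containing it.
        self._index: dict[int, set[str]] = defaultdict(set)

    def add(self, novel_id: str, fingerprints: set[int]) -> None:
        """Store the fingerprint set of an existing novel."""
        for fp in fingerprints:
            self._index[fp].add(novel_id)

    def query(self, fingerprints: set[int]) -> list[tuple[str, int, float]]:
        """Return (novel_id, shared_count, proportion), most shared first.

        proportion = shared fingerprints / fingerprints of the queried novel.
        """
        shared: dict[str, int] = defaultdict(int)
        for fp in fingerprints:
            # .get() does not insert missing keys into the defaultdict.
            for novel_id in self._index.get(fp, ()):
                shared[novel_id] += 1
        total = len(fingerprints) or 1  # guard against an empty query
        results = [(nid, n, n / total) for nid, n in shared.items()]
        return sorted(results, key=lambda r: (-r[1], r[0]))
```

For example, after `lib.add("novel-A", {1, 2, 3, 4})` and `lib.add("novel-B", {3, 9})`, `lib.query({1, 2, 3, 5})` ranks `novel-A` (3 shared, 0.75) ahead of `novel-B` (1 shared, 0.25). The inverted index means only novels sharing at least one fingerprint are touched, rather than scanning the whole library.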

[0014] 1. Preprocessing:

[0015] 1) Synonym replacement

[0016] Preferably,...
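Synonym replacement normalizes wording, so a copy that only swaps words for synonyms maps back to the same text before fingerprinting. The sketch below is an assumption-laden illustration: it presumes the text is already split into words (Chinese text would first need word segmentation), and the table `SYNONYMS` and function `normalize` are hypothetical. The patent's own dictionary and the preferred details elided in [0016] are not reproduced.

```python
# Hypothetical synonym table: each variant maps to one canonical form.
SYNONYMS: dict[str, str] = {
    "swiftly": "quickly",
    "rapidly": "quickly",
    "sword": "blade",
}


def normalize(words: list[str], synonyms: dict[str, str] = SYNONYMS) -> list[str]:
    """Lowercase each word and replace it with its canonical synonym, if any."""
    return [synonyms.get(w.lower(), w.lower()) for w in words]
```

For instance, `normalize("He swiftly drew his sword".split())` yields `["he", "quickly", "drew", "his", "blade"]`. Mapping every variant to a single canonical word keeps the replacement deterministic, which is what lets later hash fingerprints of two reworded copies coincide.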



Abstract

The invention discloses an online novel content similarity comparison method comprising the following steps. Preprocessing: the online novel to be compared is preprocessed by extracting keywords and replacing them with synonyms, forming a standardized online novel. Feature fingerprint extraction: the online novel to be compared is divided into multiple groups of adjacent, in-order words, and the hash table formed by hashing each group serves as the feature fingerprint. Feature fingerprint comparison: the feature fingerprint is compared with the feature fingerprints of existing online novels stored in a feature comparison library, and the similarity between the online novel to be compared and each existing online novel is determined by the number or proportion of identical feature fingerprints. Applied to online novels, the method uses preprocessing to overcome both the low efficiency caused by the high complexity of existing methods and the missed checks caused by their lack of targeting; it also has a self-learning ability and improves duplicate-checking accuracy.
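The fingerprint extraction and comparison steps can be sketched as below. This is a hedged illustration under stated assumptions: the group size `k = 3`, the use of overlapping (sliding-window) groups, MD5 truncated to 64 bits as the hash, and the helper names `fingerprint` and `similarity` are all hypothetical; the abstract does not specify how groups are formed or which hash is used.

```python
import hashlib


def fingerprint(words: list[str], k: int = 3) -> set[int]:
    """Hash every run of k consecutive words into a 64-bit integer.

    k = 3 and overlapping windows are illustrative assumptions.
    """
    grams = [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]
    # A stable hash (first 8 bytes of MD5) keeps stored fingerprints valid
    # across runs; Python's built-in hash() for str is randomized per process.
    return {
        int.from_bytes(hashlib.md5(g.encode("utf-8")).digest()[:8], "big")
        for g in grams
    }


def similarity(query: set[int], existing: set[int]) -> tuple[int, float]:
    """Return (number of shared fingerprints, shared / size of query)."""
    shared = len(query & existing)
    return shared, (shared / len(query) if query else 0.0)
```

Two seven-word sentences that differ only in their last word, such as "the hero drew his blade at dawn" and "the hero drew his blade at dusk", each produce five three-word groups, of which four match, so `similarity` returns `(4, 0.8)`. Reporting both the count and the proportion mirrors the abstract's "quantity or proportion" criterion; the decision threshold is left open here.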

Description

Technical field
[0001] The present invention relates to a method for comparing content similarity, and more particularly to a method for comparing the content similarity of online novels.
Background art
[0002] With the rapid development of Internet technology, the amount of digital content on the Internet, including web pages on every subject, academic papers, online novels, and student assignments, is growing explosively. Faced with so much digital content, automatic duplicate checking (approximation comparison) of content has become a very important technology. Duplicate checking makes it possible to detect repeated content in search-engine results, detect plagiarism in published content, avoid storing duplicate entries, identify related research content, perform novelty searches (查新), and so on.
[0003] Duplicate checking technology is derived from copy detection technology. Copy detection judges whether the content of a file i...


Application Information

Patent type & authority: Application (China)
IPC (8): G06F17/30; G06F17/27
Inventors: 刘瑞虹 (Liu Ruihong), 姜波 (Jiang Bo)
Owner: CHINSESALL DIGITAL PUBLISHING GRP CO LTD