Chinese document gene matching method based on multi-weight system

A document matching method, applicable to special data processing applications, instruments, and electrical digital data processing, that addresses calculation-related problems and achieves strong practical applicability, ease of implementation, and improved matching accuracy.

Active Publication Date: 2017-12-19
北京云量数盟科技有限公司

AI Technical Summary

Problems solved by technology

However, in some practical applications, some factors play a decisive role

Embodiment Construction

[0031] In order to make the above objects, features and advantages of the present invention more obvious and understandable, the present invention will be further described below through specific embodiments and accompanying drawings.

[0032] Figure 1 is a schematic diagram of the constituent elements of a document gene. A document gene is composed of document carrier features, document attribute features, and document content features. The document carrier features consist of the file name, file size, file creation time, file modification time, and file hash values (MD5, SHA1, SHA256, and SHA512). The document attribute features consist of inherent attributes and statistical attributes: the inherent attributes include document type, document title, document category, document note, document author, document revision number, and the user who last saved the document; the statistical attributes include the document word count, sentence count, and paragraph count; do...
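As an illustration of the carrier features listed above, the following is a minimal Python sketch, not the patent's implementation. The names `CarrierFeatures` and `extract_carrier_features` are hypothetical, and the sketch assumes that the hash values are computed over the raw file bytes and that the operating system's file timestamps stand in for the creation and modification times.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Dict

# The four hash algorithms named in the description.
HASH_ALGORITHMS = ("md5", "sha1", "sha256", "sha512")


@dataclass
class CarrierFeatures:
    """Hypothetical container for the document carrier features."""
    file_name: str
    file_size: int
    created: float   # seconds since the epoch
    modified: float  # seconds since the epoch
    hashes: Dict[str, str]


def extract_carrier_features(path: str) -> CarrierFeatures:
    """Collect carrier features for the file at `path` (illustrative only)."""
    with open(path, "rb") as fh:
        data = fh.read()
    stat = os.stat(path)
    return CarrierFeatures(
        file_name=os.path.basename(path),
        file_size=stat.st_size,
        # Assumption: st_ctime approximates creation time. On Unix it is
        # actually the time of the last metadata change.
        created=stat.st_ctime,
        modified=stat.st_mtime,
        hashes={name: hashlib.new(name, data).hexdigest()
                for name in HASH_ALGORITHMS},
    )
```

Two byte-identical copies of a document yield the same hash values even when their file names or timestamps differ, which is why the carrier features combine hashes with file metadata.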

Abstract

Aimed at Chinese document gene matching in document duplicate-checking scenarios, the present invention proposes a matching method over 28 mixed document gene features. In particular, a multi-weight system is introduced for the first time to reflect the differing importance of genes within and among systems and to form a unified similarity calculation formula. With the document gene matching method provided by the present invention, the weights can be finely configured and conditional jumps in the algorithm can be reduced, so the method is readily implementable and practical.
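The abstract does not reproduce the similarity formula, so the following Python sketch only illustrates one plausible shape of a two-level weighting scheme: per-feature weights inside each system and per-system weights across systems, combined into a single normalized score. The function name `multi_weight_similarity`, the weight values, and the `keywords` content feature are assumptions made for illustration, not details taken from the patent.

```python
from typing import Dict

FeatureScores = Dict[str, Dict[str, float]]


def multi_weight_similarity(
    feature_sims: FeatureScores,
    intra_weights: FeatureScores,
    inter_weights: Dict[str, float],
) -> float:
    """Combine per-feature similarities in [0, 1] into one score in [0, 1].

    feature_sims[system][feature]  -> similarity of that feature
    intra_weights[system][feature] -> weight of the feature within its system
    inter_weights[system]          -> weight of the system among systems
    """
    weighted_total = 0.0
    inter_sum = 0.0
    for system, sims in feature_sims.items():
        weights = intra_weights[system]
        intra_sum = sum(weights[f] for f in sims)
        # Weighted average of the feature similarities inside one system.
        system_score = sum(weights[f] * sims[f] for f in sims) / intra_sum
        weighted_total += inter_weights[system] * system_score
        inter_sum += inter_weights[system]
    # Weighted average across systems.
    return weighted_total / inter_sum


# Illustrative values only.
sims = {
    "carrier": {"file_name": 1.0, "md5": 0.0},
    "attribute": {"title": 1.0, "author": 1.0},
    "content": {"keywords": 0.5},  # hypothetical content feature
}
intra = {
    "carrier": {"file_name": 1.0, "md5": 3.0},
    "attribute": {"title": 2.0, "author": 1.0},
    "content": {"keywords": 1.0},
}
inter = {"carrier": 1.0, "attribute": 1.0, "content": 2.0}
score = multi_weight_similarity(sims, intra, inter)
```

Because every feature contributes through the same weighted sum, changing how much a feature matters is a matter of adjusting a weight rather than adding a conditional branch, which is consistent with the abstract's claim of reduced conditional jumps.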

Description

Technical field

[0001] The invention belongs to the technical field of document similarity and deduplication, and in particular relates to a Chinese document gene matching method based on a multi-weight system.

Background

[0002] The high-speed, massive data of the Internet contains a wide and intricate variety of documents. As a document spreads across the Internet, its content is often partially modified through additions and deletions, which produces variants of the document that carry the same content. This creates obstacles and challenges for tasks such as document matching, traceability, and dissemination analysis.

[0003] Document gene is an important technical means to solve the above problems. It mainly refers to extracting several features from the document and effectively combining them to form a relatively unique representation that describes the essence of t...

Claims


Application Information

IPC(8): G06F17/22
CPC: G06F40/126, G06F40/194
Inventor 李岩
Owner 北京云量数盟科技有限公司