A Genetic Quantification and Representation Method for Chinese Documents Based on Numeric-String Mixed Coding

A numeric-string mixed coding technique, applied to the genetic quantification and characterization of Chinese documents, addresses low matching accuracy, improves document protection, facilitates storage and matching, and prevents unauthorized reading.

Active Publication Date: 2021-03-30
北京云量数盟科技有限公司

AI Technical Summary

Problems solved by technology

Phrase vectors alone often yield low matching accuracy, whereas other statistical features of a document, such as numerical vectors of word count, line count, and paragraph count, can play a key role in matching.

Embodiment Construction

[0039] In order to make the above objects, features and advantages of the present invention more obvious and understandable, the present invention will be further described below through specific embodiments and accompanying drawings.

[0040] Figure 1 is a schematic diagram of the constituent elements of a document gene. A document gene is composed of document carrier features, document attribute features, and document content features. Document carrier features comprise the file name, file size, file creation time, file modification time, and file hash values (MD5, SHA1, SHA256, and SHA512). Document attribute features comprise inherent attributes and statistical attributes: inherent attributes include document type, document title, document category, document notes, document author, document revision number, and the user who last saved the document; statistical attributes include document word count, document sentence count, and document paragraph count; do...
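As a rough illustration of the carrier features and statistical attributes listed in [0040], the following Python sketch gathers them for one file. The function names `carrier_features` and `statistical_attributes`, the dictionary keys, the use of `st_ctime` as a stand-in for creation time, the counting of CJK characters as "words", and the punctuation-based sentence splitting are all assumptions made for this example; the text shown here does not specify how these values are extracted.

```python
import hashlib
import os
import re


def carrier_features(path):
    """Collect the carrier features named in [0040] for one file (hypothetical helper)."""
    stat = os.stat(path)
    hashes = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256", "sha512")}
    # Read the file once in chunks and feed every hash object.
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            for h in hashes.values():
                h.update(chunk)
    features = {
        "file_name": os.path.basename(path),
        "file_size": stat.st_size,
        # Assumption: st_ctime approximates creation time. On Unix it is the
        # time of the last metadata change, not the true creation time.
        "created": int(stat.st_ctime),
        "modified": int(stat.st_mtime),
    }
    features.update({name: h.hexdigest() for name, h in hashes.items()})
    return features


def statistical_attributes(text):
    """Count characters, sentences, and paragraphs in Chinese text (assumed rules)."""
    # Assumption: a paragraph is any non-empty line.
    paragraphs = [p for p in text.splitlines() if p.strip()]
    # Assumption: sentences end at Chinese or ASCII terminal punctuation.
    sentences = [s for s in re.split(r"[。！？!?]+", text) if s.strip()]
    # Assumption: "word count" means the number of CJK unified ideographs.
    chars = re.findall(r"[\u4e00-\u9fff]", text)
    return {
        "word_count": len(chars),
        "sentence_count": len(sentences),
        "paragraph_count": len(paragraphs),
    }
```

Both functions return plain dictionaries, so the numeric values (size, timestamps, counts) and the string values (file name, hash digests) can later be separated into a numeric vector and a string vector.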

Abstract

The invention discloses a method for quantifying and representing Chinese document genes based on numeric-string hybrid coding. It targets a Chinese document gene with 28 hybrid features and is oriented toward document protection and matching-detection scenarios. The method separates the numeric-vector features from the string-vector features and quantifies them separately, then defines the elements within each feature and encodes the separators between features. Finally, using hexadecimal hybrid encoding as the uniform encoding scheme, it writes a hexadecimal data block to a file to form a document gene encoding file. The method greatly improves document protection, prevents unauthorized reading, facilitates the storage and matching of special documents in an Internet environment, and is readily implementable and practical.
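The encoding pipeline in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented encoding: the separator bytes `0x1E` and `0x1F`, the `N`/`S` type tags, and the function names `encode_feature`, `encode_gene`, `decode_gene`, and `write_gene_file` are all hypothetical, since the text shown here does not give the actual separator codes or file layout. The sketch also assumes that every feature is non-empty and that string features contain no `0x1E` or `0x1F` control characters.

```python
FEATURE_SEP = b"\x1e"  # assumed separator between features
ELEMENT_SEP = b"\x1f"  # assumed separator between elements of one feature
NUM_TAG, STR_TAG = b"N", b"S"  # assumed type tags for numeric / string features


def encode_feature(value):
    """Encode one feature: integers as hex digits, strings as UTF-8 bytes."""
    items = value if isinstance(value, list) else [value]
    if all(isinstance(v, int) for v in items):
        body = ELEMENT_SEP.join(format(v, "x").encode("ascii") for v in items)
        return NUM_TAG + body
    body = ELEMENT_SEP.join(str(v).encode("utf-8") for v in items)
    return STR_TAG + body


def encode_gene(features):
    """Join the encoded features and return one uppercase hexadecimal string."""
    block = FEATURE_SEP.join(encode_feature(v) for v in features)
    return block.hex().upper()


def decode_gene(hex_block):
    """Invert encode_gene; every feature is returned as a list of elements."""
    decoded = []
    for raw in bytes.fromhex(hex_block).split(FEATURE_SEP):
        tag, parts = raw[:1], raw[1:].split(ELEMENT_SEP)
        if tag == NUM_TAG:
            decoded.append([int(p, 16) for p in parts])
        else:
            decoded.append([p.decode("utf-8") for p in parts])
    return decoded


def write_gene_file(path, features):
    """Write the hexadecimal data block to a text file."""
    with open(path, "w", encoding="ascii") as fh:
        fh.write(encode_gene(features))
```

For example, `encode_gene([1024, "机密文件.docx", ["文档", "基因"]])` yields a single hexadecimal string, and `decode_gene` on that string recovers `[[1024], ["机密文件.docx"], ["文档", "基因"]]`. Tagging each feature keeps numeric and string features distinguishable after they share one hexadecimal representation.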

Description

Technical field

[0001] The invention belongs to the fields of natural language processing, feature selection and extraction, and formatted document encoding, and in particular relates to a method for quantifying and characterizing Chinese document genes based on numeric-string mixed encoding.

Background technique

[0002] Paperless office refers to carrying out office work without using paper. It requires hardware, software, and communication networks to work together. With the acceleration of modernization and informatization, the paperless office has progressed from a concept to practical use in many industries. It has led to a surge of internal documents in companies and departments, many of which are internal or confidential and must not be freely viewed or circulated. Therefore, a method is needed It ...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/126
CPC: G06F40/126
Inventor: 李岩
Owner: 北京云量数盟科技有限公司