Classification and Grading System of Electronic Official Documents Based on Template

A template-based system for classifying and grading electronic official documents, applicable to text database clustering/classification, text database querying, and electronic digital data processing. It addresses the poor applicability of existing sensitive-word screening and the false positives it produces, and is intended to apply across a wider range of settings.

Active Publication Date: 2022-04-29
STATE GRID HEILONGJIANG ELECTRIC POWER CO LTD ELECTRIC POWER RES INST +2

AI Technical Summary

Problems solved by technology

[0013] The invention addresses two problems in existing information security supervision: uniformly configured sensitive-word lists apply poorly, and screening that only matches sensitive words produces many false positives.



Examples


Specific Embodiment 1

[0043] Embodiment 1: this embodiment is described with reference to Figure 1.

[0044] A template-based electronic document classification and grading system, including:

[0045] A sensitive word and stop word management module, which lets users configure sensitive words and stop words; users can set stop words according to Chinese usage habits;

[0046] Sensitive words are the keywords or parameters in a file or page that the user considers confidential or possibly confidential;

[0047] Stop words are characters or words that the scanning module automatically ignores when indexing pages or processing search requests, in order to save space and improve search efficiency. In general, stop words include modal particles, adverbs, conjunctions, and the like; they usually have no clear meaning by themselves and only have a certain effect when they are put into a co...
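The roles of the two word lists can be illustrated with a minimal Python sketch. It assumes the text has already been split into tokens (Chinese text would need a word segmenter first), and the function names `filter_stop_words` and `count_sensitive_hits` are hypothetical, not taken from the patent.

```python
from collections import Counter


def filter_stop_words(tokens, stop_words):
    """Drop every token that appears in the user-configured stop word list."""
    stop = set(stop_words)
    return [token for token in tokens if token not in stop]


def count_sensitive_hits(tokens, sensitive_words):
    """Count how often each user-configured sensitive word occurs."""
    sensitive = set(sensitive_words)
    return Counter(token for token in tokens if token in sensitive)


# Hypothetical sample data, for illustration only.
tokens = ["the", "dispatch", "plan", "of", "the", "substation",
          "and", "the", "dispatch", "log"]
kept = filter_stop_words(tokens, {"the", "of", "and"})
hits = count_sensitive_hits(kept, {"dispatch", "substation"})
```

Filtering stop words before counting keeps the index small, which is the space and efficiency benefit the paragraph above describes.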

Specific Embodiment 2

[0054] The scanning module described in this embodiment includes a file scanning submodule and a URL scanning submodule:

[0055] The file scanning submodule provides full-text extraction for office documents such as Office-series documents and PDF. For compressed files such as ZIP and RAR, it decompresses the archive, then determines the file type and extracts the text of each member; nested archives are decompressed recursively;
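Recursive decompression of nested archives might look like the following sketch. It is not the patent's implementation: it handles only ZIP (the Python standard library has no RAR support), decides the file type by extension, extracts only `.txt` members, and the names `extract_texts`, `make_zip`, and `max_depth` are assumptions introduced for illustration.

```python
import io
import zipfile


def extract_texts(name, data, max_depth=5):
    """Yield (path, text) for each .txt member, descending into nested ZIPs."""
    if max_depth < 0:
        return  # stop descending once the nesting limit is exhausted
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            member = archive.read(info)
            path = f"{name}/{info.filename}"
            lower = info.filename.lower()
            if lower.endswith(".zip"):
                # Nested archive: recurse with one less level of depth.
                yield from extract_texts(path, member, max_depth - 1)
            elif lower.endswith(".txt"):
                yield path, member.decode("utf-8", errors="replace")


def make_zip(members):
    """Build an in-memory ZIP from {filename: content}; used for the demo."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for filename, content in members.items():
            archive.writestr(filename, content)
    return buffer.getvalue()


inner = make_zip({"secret.txt": "inner text"})
outer = make_zip({"readme.txt": "outer text", "nested.zip": inner})
found = dict(extract_texts("outer.zip", outer))
```

The depth limit is a defensive choice in this sketch, guarding against deliberately deep archive nesting; the source does not say whether the system imposes one.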

[0056] The URL scanning submodule scans the URL (Uniform Resource Locator) at a specified location and uses search-engine crawler technology to crawl recursively up to a configured number of levels, extracting text from HTML pages and page attachments. Supported attachment types include office documents such as Office-series documents and PDF, as well as compressed formats such as ZIP and RAR;
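A depth-limited crawl of this kind can be sketched as follows. The sketch makes several assumptions: it uses a breadth-first queue in place of literal recursion, takes a caller-supplied `fetch` function (hypothetical) so that no network access is needed, and ignores attachments. `PageParser` and `crawl` are illustrative names.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collect visible text fragments and link targets from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())


def crawl(start_url, fetch, max_depth):
    """Return {url: text} for pages within max_depth link hops of start_url."""
    pages = {}
    queued = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)  # fetch returns the page source, or None on failure
        if html is None:
            continue
        parser = PageParser()
        parser.feed(html)
        parser.close()
        pages[url] = " ".join(parser.text_parts)
        if depth < max_depth:
            for href in parser.links:
                target = urljoin(url, href)
                if target not in queued:
                    queued.add(target)
                    queue.append((target, depth + 1))
    return pages


# Hypothetical three-page site held in memory for the demo.
site = {
    "http://example.test/": '<a href="/a">A</a> home',
    "http://example.test/a": '<a href="/b">B</a> page a',
    "http://example.test/b": "page b",
}
one_level = crawl("http://example.test/", site.get, max_depth=1)
```

Tracking every queued URL keeps pages that link to each other from being fetched twice.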

[005...

Specific Embodiment 3

[0058] The file scanning sub-module described in this embodiment encapsulates the text content extraction of different files, that is, only a single interface is provided to realize the content extraction of documents such as Office and PDF. When the URL scanning sub-module extracts HTML content, the encoding of the processed text is UTF-8 by default.

[0059] The scanning module provides a unified multi-format text extraction interface: it supports text extraction from Office and PDF documents, and from HTML pages and their attachments. Text extraction methods differ between document types, and even between versions of the same type, such as Office 2003 and Office 2007; extracting each format through its own interface would make the interface complex and harder to maintain. For this reason, text extraction for the different file types is encapsulated, that is, only a single interface is pro...
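One way to picture a single extraction interface is a registry that maps file suffixes to format-specific extractors behind one entry point. This is a sketch under stated assumptions, not the patent's design: only plain-text and HTML backends are implemented with the standard library, and the names `register` and `extract_text` are hypothetical. Office and PDF backends would be registered the same way using whichever parsing libraries the system relies on.

```python
from html.parser import HTMLParser
from pathlib import Path

_EXTRACTORS = {}  # maps a lowercase file suffix to its extractor function


def register(*suffixes):
    """Decorator that registers an extractor for one or more suffixes."""
    def decorator(func):
        for suffix in suffixes:
            _EXTRACTORS[suffix] = func
        return func
    return decorator


class _TextCollector(HTMLParser):
    """Gather the non-empty text fragments of an HTML document."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        if data.strip():
            self.parts.append(data.strip())


@register(".txt")
def _extract_plain(data):
    return data.decode("utf-8", errors="replace")


@register(".html", ".htm")
def _extract_html(data):
    collector = _TextCollector()
    collector.feed(data.decode("utf-8", errors="replace"))
    collector.close()
    return " ".join(collector.parts)


def extract_text(filename, data):
    """Single entry point: choose the extractor from the file suffix."""
    suffix = Path(filename).suffix.lower()
    try:
        extractor = _EXTRACTORS[suffix]
    except KeyError:
        raise ValueError(f"unsupported file type: {suffix!r}") from None
    return extractor(data)
```

Callers see only `extract_text`, so adding a format means registering one more function without changing any calling code.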



Abstract

A template-based system for classifying and grading electronic official documents. The invention addresses two problems in existing information security supervision: uniformly configured sensitive-word lists apply poorly, and screening that only matches sensitive words produces many false positives. The system includes: a sensitive word and stop word management module for setting sensitive words and stop words; a source file learning module that learns from the sensitive words entered by the user and the imported source files and generates templates; a scanning module for extracting text from the files under inspection; a template management module that supports selecting and exporting both templates and source files uploaded by superior departments in an enterprise intranet environment, and exporting templates only in a non-intranet environment; and a secret-related matching module that matches the extracted text against sensitive words according to the exported template and judges paragraph and full-text similarity. The invention is used for the classification and graded management of electronic official documents.
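The abstract does not say how similarity is measured or how it is combined with sensitive-word matching. As a rough illustration only, the sketch below uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity measure and flags a paragraph only when it both contains a sensitive word and closely resembles a template paragraph. The function names, the 0.8 threshold, and the sample sentences are all assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in similarity score in [0, 1]; the patent's measure is not given."""
    return SequenceMatcher(None, a, b).ratio()


def match_paragraphs(paragraphs, template_paragraphs, sensitive_words,
                     threshold=0.8):
    """Flag paragraphs with a sensitive word AND high template similarity."""
    flagged = []
    for index, paragraph in enumerate(paragraphs):
        hits = [word for word in sensitive_words if word in paragraph]
        best = max((similarity(paragraph, template)
                    for template in template_paragraphs), default=0.0)
        if hits and best >= threshold:
            flagged.append({"paragraph": index, "hits": hits,
                            "similarity": best})
    return flagged


def full_text_similarity(paragraphs, template_paragraphs):
    """Compare the whole document with the whole template source text."""
    return similarity("\n".join(paragraphs), "\n".join(template_paragraphs))


# Hypothetical data: one learned template paragraph and a two-paragraph file.
template = ["The dispatch plan for substation A is restricted."]
document = ["The dispatch plan for substation A is restricted.",
            "We will dispatch lunch orders at noon."]
flagged = match_paragraphs(document, template, ["dispatch"])
```

In this example the second paragraph contains the word "dispatch" but does not resemble the template, so word-only matching would report it while the combined check does not.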

Description

Technical field

[0001] The invention relates to a system for classifying and grading electronic official documents.

Background

[0002] In today's information society, the daily work of government departments at all levels, enterprises, and institutions is inseparable from computer systems. A company's electronic documents are of many types and are widely distributed. They are stored on a variety of storage media, and the company's websites also contain important information and work materials. Ensuring the security of this data has become a focus of information security work. The business data of government bodies and large enterprises and institutions is important basic data, and a data leak can cause major economic losses and serious security risks to the country and to users. Therefore, when the headquarters distributes electronic documents, it is necessary to classify various electronic documents, ...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06F16/33
CPC: G06F40/289
Inventor: 尚方冉庆辉孙立业景菲韩冰张凯王孝余刘生
Owner: STATE GRID HEILONGJIANG ELECTRIC POWER CO LTD ELECTRIC POWER RES INST