Open document isomorphism engine system

An open document isomorphism engine system, applied in the field of information security technology, that addresses the problem that multi-format documents cannot be processed uniformly and aims for concept-based representation and weight calculation that are more accurate than word-based ones.

Status: Inactive
Publication Date: 2009-10-21
SHANGHAI JIAOTONG UNIV

AI Technical Summary

Problems solved by technology

[0007] The purpose of the present invention is to overcome the deficiencies of the prior art by providing an open document isomorphism engine system that extracts the plain-text content of multi-format documents and the semantics it represents. This solves the problem that multi-format documents cannot be processed uniformly, and the system can be applied to semantic analysis and Internet content security analysis projects.

Embodiment Construction

[0034] Embodiments of the present invention will be described in detail below in conjunction with the accompanying drawings. This embodiment is carried out on the premise of the technical solution of the present invention, and the detailed implementation and specific operation process are given, but the protection scope of the present invention is not limited to the following embodiments.

[0035] As shown in Figure 1, under the guidance of the ODLM theory, the present invention implements an engine suited to real-world environments: the Open Document Isomorphism Engine (ODIE) system. According to the practical needs of natural language processing technologies, the processing of electronic documents is theoretically divided into five levels: the physical structure layer, the logical structure layer, the lexical and syntactic analysis layer, the concept extraction layer, and the topic representation layer. In technical implementation, the five levels correspo...

Abstract

An open document isomorphism engine system in the field of information security technology, wherein: the physical structure module accepts various documents as input and outputs the physical structure of each document to the logical structure module; the logical structure module processes the information from the physical structure module to obtain the logical structure of the document and passes it to the lexical and syntactic analysis module; the lexical and syntactic analysis module processes the information from the logical structure module to obtain the analyzed document and passes it to the concept extraction module; the concept extraction module processes the information from the lexical and syntactic analysis module to obtain the concepts and concept attributes converted from the words in the document and passes them to the topic representation module; and the topic representation module processes the information from the concept extraction module to obtain the document topic expressed in units of concepts. The invention solves the problem that multi-format documents cannot be processed uniformly.
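
The five-module data flow described in the abstract can be sketched as a chain of functions, each consuming the previous module's output. This is only an illustrative sketch, not the patented implementation: the function names, the line-based "physical structure", the blank-line paragraph rule, the regex tokenizer, and the tiny `CONCEPT_TABLE` are all assumptions introduced here for demonstration.

```python
import re
from collections import Counter
from typing import List

# Hypothetical word-to-concept table; a real system would use a much
# richer semantic resource. Invented purely for this example.
CONCEPT_TABLE = {
    "car": "vehicle",
    "truck": "vehicle",
    "apple": "fruit",
    "pear": "fruit",
}


def physical_structure(raw: str) -> List[str]:
    """Physical structure module (assumed here to be raw lines)."""
    return raw.splitlines()


def logical_structure(lines: List[str]) -> List[str]:
    """Logical structure module: group non-blank lines into paragraphs."""
    paragraphs: List[str] = []
    current: List[str] = []
    for line in lines:
        if line.strip():
            current.append(line.strip())
        elif current:
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


def lexical_syntactic_analysis(paragraphs: List[str]) -> List[str]:
    """Lexical/syntactic module, reduced here to lowercase word tokens."""
    return [w for p in paragraphs for w in re.findall(r"[a-z]+", p.lower())]


def concept_extraction(tokens: List[str]) -> List[str]:
    """Concept extraction module: map known words to concepts."""
    return [CONCEPT_TABLE[t] for t in tokens if t in CONCEPT_TABLE]


def topic_representation(concepts: List[str], top_n: int = 1) -> List[str]:
    """Topic representation module: most frequent concepts as the topic."""
    return [c for c, _ in Counter(concepts).most_common(top_n)]


def run_pipeline(raw: str) -> List[str]:
    """Run the five modules in the order given in the abstract."""
    lines = physical_structure(raw)
    paragraphs = logical_structure(lines)
    tokens = lexical_syntactic_analysis(paragraphs)
    concepts = concept_extraction(tokens)
    return topic_representation(concepts)


if __name__ == "__main__":
    print(run_pipeline("The car and the truck.\n\nAn apple fell."))
```

Keeping each module as a separate function mirrors the layered description: any stage can be swapped for a more capable one without touching the others, as long as its input and output types stay the same.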

Description

Technical field

[0001] The present invention relates to a system in the technical field of information security, in particular to an open document isomorphism engine system (ODIE, Open Document Isomorphism Engine).

Background technique

[0002] In the field of content security, all content security products based on text information must perform semantic understanding of text and filter out harmful information. Such products all face a common problem: extracting plain-text information from various documents so that it can be understood and filtered. Because real-world document formats are complex and diverse, most systems avoid this difficult problem, which results in low accuracy.

[0003] At present, there are two difficult problems in the process of obtaining plain-text information: (1) how to deal with various original document formats and obtain plain-text information from them. According to the degree of structure, various electronic documents in reality ca...
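
Problem (1) above, obtaining plain text from differing source formats, is often approached by dispatching on the format and routing each document to a format-specific extractor. The following is a minimal sketch of that general idea only; it is not taken from the patent. The `EXTRACTORS` registry, the function names, and the choice of `.txt` and `.html` as example formats are assumptions made for illustration, using only the Python standard library.

```python
from html.parser import HTMLParser
from pathlib import Path
from typing import Callable, Dict, List


class _TextCollector(HTMLParser):
    """Collect visible text from HTML, skipping script and style bodies."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def extract_html(content: str) -> str:
    """Return the visible text of an HTML document, space-joined."""
    collector = _TextCollector()
    collector.feed(content)
    collector.close()
    return " ".join(collector.parts)


def extract_plain(content: str) -> str:
    """Plain text needs no conversion beyond trimming whitespace."""
    return content.strip()


# Hypothetical registry mapping file suffixes to extractor functions.
EXTRACTORS: Dict[str, Callable[[str], str]] = {
    ".txt": extract_plain,
    ".html": extract_html,
    ".htm": extract_html,
}


def to_plain_text(filename: str, content: str) -> str:
    """Pick an extractor by file suffix and return plain text."""
    suffix = Path(filename).suffix.lower()
    if suffix not in EXTRACTORS:
        raise ValueError(f"no extractor registered for {suffix!r}")
    return EXTRACTORS[suffix](content)
```

Supporting a further format under this sketch would mean registering one more function in `EXTRACTORS`; downstream processing would see only plain text regardless of the source format.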

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/27, G06F17/30
Inventor: 刘功申, 杨金升, 王士林
Owner: SHANGHAI JIAOTONG UNIV