Document information extraction and mapping method and system

A document information extraction and graphing technology, applied in text database clustering/classification, unstructured text data retrieval, semantic analysis, and related fields. It addresses the inability to prune misrecognized entity words. Its benefits are lower computer resource consumption, strong interpretability, and improved efficiency.

Pending Publication Date: 2021-11-05
EAST CHINA INST OF COMPUTING TECH
2 Cites, 2 Cited by

AI Technical Summary

Problems solved by technology

However, the prior-art patent cannot prune misrecognized entity words or obtain the classification of entity words. It therefore cannot achieve the goal of extracting entity words from military requirement documents.



Examples


Embodiment 1

[0062] In response to these requirements, the present invention extracts entity and relationship attributes from the itemized data of requirement documents. It imports them into a graph database for storage and visualization. Building on natural language understanding and natural language processing technology and on the open-source natural language processing platform LTP, it analyzes the word-formation features of Chinese requirement documents at the lexical, syntactic, and semantic levels. From that analysis it formulates corresponding information extraction rules. The rules are maintained with the Drools engine. The entity and relationship attributes in the requirement documents are then extracted and graphed to form a requirement knowledge graph.
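The patent maintains its extraction rules in the Drools engine. The Python sketch below is not Drools. It only illustrates the underlying idea of keeping rules as data, separate from the code that applies them. Each rule has a name, a salience, a `when` condition and a `then` action. The example task is the pruning and classification of entity words named in the problem statement. The `Candidate` and `Rule` classes, the rule names, and the suffix-based categories are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    """A candidate entity word produced by an earlier extraction step (hypothetical)."""
    text: str
    category: Optional[str] = None
    pruned: bool = False


@dataclass
class Rule:
    """A when/then rule kept as data, loosely mirroring a Drools rule."""
    name: str
    salience: int  # higher values are evaluated first, as with Drools salience
    when: Callable[[Candidate], bool]
    then: Callable[[Candidate], None]


def run_rules(candidates: List[Candidate], rules: List[Rule]) -> List[Candidate]:
    """Apply rules to each candidate in salience order and drop pruned candidates."""
    ordered = sorted(rules, key=lambda rule: rule.salience, reverse=True)
    for cand in candidates:
        for rule in ordered:
            if cand.pruned:
                break  # a pruned candidate needs no further rules
            if rule.when(cand):
                rule.then(cand)
    return [cand for cand in candidates if not cand.pruned]


# Hypothetical rules: prune one-character "entities" as likely misrecognitions,
# then classify the remaining candidates by their head-noun suffix.
RULES = [
    Rule("prune-single-char", 100,
         lambda c: len(c.text) < 2,
         lambda c: setattr(c, "pruned", True)),
    Rule("classify-system", 50,
         lambda c: c.text.endswith("系统"),
         lambda c: setattr(c, "category", "System")),
    Rule("classify-data", 50,
         lambda c: c.text.endswith("数据"),
         lambda c: setattr(c, "category", "Data")),
]

if __name__ == "__main__":
    found = run_rules([Candidate("指挥系统"), Candidate("的"), Candidate("雷达数据")], RULES)
    for cand in found:
        print(cand.text, cand.category)
```

Unlike Drools, this sketch has no working memory or re-activation. Each candidate is checked once against the rules in salience order, and a pruned candidate skips the remaining rules.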

[0063] According to the method for extracting and graphing requirements management document information based on syntactic and semantic rules provided by the present invention, the method includes the following steps:

[0064]...

Embodiment 2

[0093] Embodiment 2 is a preferred example of Embodiment 1.

[0094] The method for extracting entities of requirements management documents based on syntactic and semantic rules according to the present invention includes:

[0095] baseNP (simple non-nested noun phrase) was first proposed for English by Church in 1988. Chinese non-nested noun phrases differ from their English counterparts. The formal description of the Chinese baseNP (basic entity noun phrase) is divided into four categories:

[0096] 1. baseNP→baseNP+baseNP

[0097] 2. baseNP→baseNP+noun / gerund

[0098] 3. baseNP→definite attributive+baseNP

[0099] 4. baseNP→definite attributive+noun / gerund

[0100] The definite attributives include: adjective | distinguishing word | adverb | verb | noun | locative word | English word | numeral | quantifier.
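Read as a grammar, the baseNP rules admit a simple recognizer. This assumes rules 3 and 4 have the form definite attributive + baseNP and definite attributive + noun/gerund. Under that reading, a span of two or more tokens is a baseNP exactly when three conditions hold:

- every token is a definite attributive or a noun/gerund head;
- the first token is an attributive;
- the last token is a head.

The sketch below applies that test to maximal candidate runs. The coarse POS labels (`adj`, `dist`, `gerund`, ...) are hypothetical stand-ins for a real tagger's tagset.

```python
from typing import List, Tuple

# Hypothetical coarse POS labels; a real pipeline would map its tagger's tags onto these.
DEFINITE_ATTRIBUTIVES = {
    "adj", "dist", "adv", "verb", "noun", "loc", "eng", "num", "quant",
}
HEADS = {"noun", "gerund"}

Token = Tuple[str, str]  # (word, coarse POS label)


def _trim(run: List[Token]) -> List[Token]:
    """Trim a candidate run so it starts with an attributive and ends with a head."""
    start, end = 0, len(run)
    while start < end and run[start][1] not in DEFINITE_ATTRIBUTIVES:
        start += 1
    while end > start and run[end - 1][1] not in HEADS:
        end -= 1
    return run[start:end]


def find_base_nps(tokens: List[Token]) -> List[str]:
    """Return maximal spans derivable from the four baseNP rules (assumed form)."""
    phrases: List[str] = []
    run: List[Token] = []
    for token in tokens + [("", "<end>")]:  # sentinel flushes the final run
        if token[1] in DEFINITE_ATTRIBUTIVES or token[1] in HEADS:
            run.append(token)
            continue
        span = _trim(run)
        if len(span) >= 2:  # a lone noun is a word, not a phrase
            phrases.append("".join(word for word, _ in span))
        run = []
    return phrases


if __name__ == "__main__":
    sentence = [("雷达", "noun"), ("探测", "gerund"), ("系统", "noun"), ("的", "aux"),
                ("数据", "noun"), ("处理", "gerund"), ("模块", "noun"), ("。", "punct")]
    print(find_base_nps(sentence))
```

Verbs and adverbs are listed as attributives, so maximal runs can over-extend across verb sequences. A rule-based system would need further constraints, for example from the dependency tree, to split such runs.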

[0101] The word-formation features of the requirement document are obtained from the word features and the dependency syntax tree. Rules are then formulated to extract entities by pattern matching. This process is essentially one of traversing all ...
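The following is a minimal sketch of one such pattern. It assumes LTP-style dependency output: 1-based head indices, 0 for the root, and labels such as `ATT`, `SBV`, `VOB`, `HED`. Each noun that is not itself an attributive modifier is expanded leftwards over its `ATT` descendants by a depth-first traversal. The parse in the usage example is hand-written for illustration, not real LTP output, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def att_children(heads: Sequence[int], rels: Sequence[str]) -> Dict[int, List[int]]:
    """Map each head index (1-based) to the indices of its ATT modifiers."""
    children: Dict[int, List[int]] = {}
    for idx, (head, rel) in enumerate(zip(heads, rels), start=1):
        if rel == "ATT":
            children.setdefault(head, []).append(idx)
    return children


def extract_entities(words: Sequence[str], postags: Sequence[str],
                     heads: Sequence[int], rels: Sequence[str],
                     noun_tags: Sequence[str] = ("n",)) -> List[str]:
    """Extract noun entities expanded with their (recursive) ATT modifiers."""
    children = att_children(heads, rels)

    def leftmost(idx: int) -> int:
        # Depth-first traversal over ATT edges only.
        return min([idx] + [leftmost(kid) for kid in children.get(idx, [])])

    entities = []
    for idx, tag in enumerate(postags, start=1):
        # Only maximal heads: skip nouns that themselves modify another word.
        if tag in noun_tags and rels[idx - 1] != "ATT":
            entities.append("".join(words[leftmost(idx) - 1: idx]))
    return entities


if __name__ == "__main__":
    # Hand-written parse of "指挥 系统 需要 支持 雷达 数据" (illustrative only).
    words = ["指挥", "系统", "需要", "支持", "雷达", "数据"]
    postags = ["v", "n", "v", "v", "n", "n"]
    heads = [2, 3, 0, 3, 6, 4]
    rels = ["ATT", "SBV", "HED", "VOB", "ATT", "VOB"]
    print(extract_entities(words, postags, heads, rels))
```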



Abstract

The invention provides a document information extraction and mapping method and system. The method comprises three steps:

1. Using natural language understanding and natural language processing technology, obtain the word-formation features of a document from the word features and the dependency syntax tree, formulate rules, and extract entities by pattern matching.
2. In the same way, obtain word-formation features from the word features and the dependency syntax tree, formulate rules, and extract relations and the corresponding entity attributes by pattern matching.
3. Map the extracted entity, relationship, and attribute triples into a graph to generate a document knowledge graph.

Because relations and attributes are extracted on the basis of syntactic and semantic rules, no data labeling or machine-learning training is required. This improves extraction efficiency and reduces the computer resources consumed during extraction.
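Steps 2 and 3 can be sketched together. The sketch assumes LTP-style dependency output: 1-based head indices, 0 for the root, and labels such as `ATT`, `SBV`, `VOB`. A subject-verb-object pattern (`SBV` and `VOB` dependents of the same verb) yields a triple, and the triples are mapped onto nodes and labeled edges. This is an illustrative stand-in, not the patent's rule set. Entity attributes are not modeled, and the in-memory nodes/edges structure is a hypothetical placeholder for a graph-database import.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]


def extract_triples(words: Sequence[str], heads: Sequence[int],
                    rels: Sequence[str]) -> List[Triple]:
    """Extract (subject, relation, object) triples with an SBV-verb-VOB pattern."""
    att: Dict[int, List[int]] = defaultdict(list)   # head -> ATT modifiers
    subj: Dict[int, List[int]] = defaultdict(list)  # verb -> SBV dependents
    obj: Dict[int, List[int]] = defaultdict(list)   # verb -> VOB dependents
    for idx, (head, rel) in enumerate(zip(heads, rels), start=1):
        if rel == "ATT":
            att[head].append(idx)
        elif rel == "SBV":
            subj[head].append(idx)
        elif rel == "VOB":
            obj[head].append(idx)

    def phrase(idx: int) -> str:
        # Expand a word leftwards over its ATT modifiers into an entity string.
        def leftmost(i: int) -> int:
            return min([i] + [leftmost(k) for k in att[i]])
        return "".join(words[leftmost(idx) - 1: idx])

    triples: List[Triple] = []
    for verb in sorted(subj):
        for s in subj[verb]:
            for o in obj.get(verb, []):
                triples.append((phrase(s), words[verb - 1], phrase(o)))
    return triples


def to_graph(triples: List[Triple]) -> Dict[str, list]:
    """Map triples onto a simple property-graph structure (nodes + labeled edges)."""
    nodes = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    edges = [{"source": s, "label": r, "target": o} for s, r, o in triples]
    return {"nodes": nodes, "edges": edges}


if __name__ == "__main__":
    # Hand-written parse of "指挥 系统 处理 雷达 数据" (illustrative only).
    words = ["指挥", "系统", "处理", "雷达", "数据"]
    heads = [2, 3, 0, 5, 3]
    rels = ["ATT", "SBV", "HED", "ATT", "VOB"]
    print(to_graph(extract_triples(words, heads, rels)))
```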

Description

Technical field

[0001] The present invention relates to the technical field of natural language understanding and processing, in particular to a method and system for document information extraction and mapping. More specifically, it relates to a method for extracting and graphing management document information based on syntactic and semantic rules.

Background art

[0002] With the advent of the information and Internet era, the construction of information resources has become the core of current military information construction. Military equipment is being rapidly updated and upgraded, military organizations and personnel are being redeployed, new military tactics are being introduced, and military construction projects and requirement tasks keep increasing. The degree of automation of military information processing therefore needs to be further improved.

[0003] The precise analysis of data plays an increasingly prominent role in modern military intelligence research, and the existence of a la...


Application Information

Patent type & authority: Applications (China)
IPC(8): G06F40/211; G06F40/30; G06F40/295; G06F16/35; G06F16/36
CPC: G06F40/211; G06F40/30; G06F40/295; G06F16/35; G06F16/367
Inventors: 牛硕硕, 王金华, 王盼盼, 李德启, 黄哲
Owner: EAST CHINA INST OF COMPUTING TECH