Multi-language compatible document information accurate extraction system

A document information extraction technology, applied in the field of accurate document information extraction systems. It addresses problems such as inefficient acquisition of useful information, immature document information extraction technology, and low extraction accuracy, and it achieves good scalability and language portability.

Pending Publication Date: 2020-12-25
刘秀萍

AI Technical Summary

Problems solved by technology

[0011] First, the number of electronic documents grows daily, but a large part of any large document library is junk information, and what remains is poorly organized. Faced with large volumes of electronic documents, existing technology obtains the required information mainly by reading the full text or searching item by item, which greatly reduces the efficiency with which people acquire useful information. There is no multi-language compatible system that intelligently, quickly, and accurately extracts the information users are interested in from numerous electronic documents;
[0012] Second, although information extraction is an effective means of obtaining the required information from many documents, building a general and feasible information extraction system with existing technology is extremely difficult and complicated. Early information extraction systems were generally built with knowledge engineering methods, but because the extraction rules were all written by hand, they took a great deal of time and resources to build and were error-prone, resulting in poor portability;
[0013] Third, existing technology focuses on machine-learning-based information extraction, which divides into supervised and unsupervised methods according to whether model training uses a labeled training set. Supervised methods are steadily maturing but face bottlenecks such as the difficulty of quickly obtaining large labeled training sets. Unsupervised methods avoid this problem, but they are still largely undeveloped and have problems that urgently need to be overcome, such as feature space redundancy. Multi-language compatible document information extraction technology is immature and cannot meet the needs of industry;
[0014] Fourth, Chinese lacks the natural word delimiters of English, such as spaces between words, which makes Chinese information extraction more difficult. Existing technology mainly focuses on improving the precision and recall of named entity recognition in Chinese information extraction and on building simple information extraction systems. Work on designing and implementing complex, robust Chinese information extraction systems is comparatively weak: such systems are suitable only for relation extraction with small training sets, their accuracy is low and their portability poor, and they are generally used only in specific fields. Their defects include weak interactivity, low intelligence, low scalability, slow extraction speed, poor language portability, and low extraction accuracy.
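The word-boundary problem in [0014] can be illustrated with a minimal Python sketch. It is not the method of the invention: it assumes a common workaround in which runs of Chinese characters are split into overlapping character bigrams while Latin-script words are taken whole. The function name `tokenize` and the Unicode range used to detect Chinese text are illustrative assumptions.

```python
import re

# Assumption: the basic CJK Unified Ideographs block is enough to spot Chinese runs.
_CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")
# A token candidate is either a run of Chinese characters or a run of Latin letters/digits.
_RUN = re.compile(r"[\u4e00-\u9fff]+|[A-Za-z0-9]+")


def tokenize(text):
    """Split mixed Chinese/English text into feature tokens.

    Latin-script words are delimited by non-alphanumeric characters and lowercased.
    Chinese runs have no delimiters, so they are cut into overlapping bigrams.
    """
    tokens = []
    for match in _RUN.finditer(text):
        run = match.group(0)
        if _CJK_RUN.fullmatch(run):
            if len(run) == 1:
                tokens.append(run)
            else:
                tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
        else:
            tokens.append(run.lower())
    return tokens


english_tokens = tokenize("Support vector machines")
chinese_tokens = tokenize("信息抽取")
```

English input splits cleanly at spaces, whereas the Chinese input has to be cut by a heuristic that also produces bigrams spanning real word boundaries. This is one reason a dedicated word segmenter is usually preferred where one is available.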

Embodiment Construction

[0070] The technical solution of the multi-language compatible accurate document information extraction system provided by the present invention is further described below with reference to the accompanying drawings, so that those skilled in the art can better understand and implement the invention.

[0071] Intelligently, quickly, and accurately obtaining the information users are interested in from numerous electronic documents is becoming an urgent problem. Information extraction based on statistical machine learning has become a research hotspot in the existing technology, along with learning-based document information extraction architectures, but most of them suffer from defects such as weak interactivity, low intelligence, low scalability, slow extraction speed, poor language portability, and low extraction accuracy. For this reason, the present invention proposes a universal, efficient and feasible document information accurate ex...

Abstract

The multi-language compatible accurate document information extraction system provided by the invention implements a multi-language information extraction method based on the support vector machine algorithm. Applied to document classification in practice, the results show that the support vector machine algorithm has clear advantages in active learning ability and classification performance. A general and feasible multi-language compatible document information extraction architecture is designed, and the system is implemented on that architecture. The system is applied to information extraction from Chinese and English science and technology news documents, and the results show that it is a successful practice of entity relation extraction, with advantages including good language portability and scalability, strong interactivity, a high degree of intelligence, high extraction speed, and high extraction precision. The document information extraction system is notably innovative and has outstanding advantages.
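The abstract credits the support vector machine with strong active learning ability but does not state how samples are chosen for labeling. The sketch below assumes one widely used strategy, margin-based uncertainty sampling: the unlabeled documents whose linear SVM score lies closest to the decision boundary are sent to a human annotator first. The weights, feature names, and function names are hypothetical and not taken from the patent.

```python
def decision_value(weights, bias, features):
    """Signed score w.x + b of a linear SVM for a sparse feature dict."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items()) + bias


def select_queries(weights, bias, unlabeled, k=1):
    """Return indices of the k unlabeled documents closest to the hyperplane.

    A small |w.x + b| means the current model is least certain about the document,
    so labeling it is expected to be most informative (assumed strategy).
    """
    ranked = sorted(
        range(len(unlabeled)),
        key=lambda i: abs(decision_value(weights, bias, unlabeled[i])),
    )
    return ranked[:k]


# Hypothetical model and pool: positive scores mean "business news", negative "hardware news".
weights = {"merger": 1.5, "chip": -1.0}
bias = 0.0
pool = [
    {"merger": 2.0},               # score 3.0, confidently positive
    {"merger": 1.0, "chip": 1.0},  # score 0.5, near the boundary
    {"chip": 3.0},                 # score -3.0, confidently negative
]

next_to_label = select_queries(weights, bias, pool, k=1)
```

In a full loop the selected documents would be labeled, added to the training set, and the SVM retrained before the next selection round. This sketch covers only the selection step.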

Description

Technical field

[0001] The invention relates to a document information accurate extraction system, in particular to a document information accurate extraction system compatible with multiple languages, belonging to the technical field of document information extraction.

Background technique

[0002] With the popularity of computers and office intelligence and the rapid development of the Internet, especially the mobile Internet, various forms of electronic documents are continuing to grow rapidly. In recent years, the popularity of the mobile Internet has further accelerated the popularization and application of electronic documents. People are closely related to electronic documents all the time in their daily life, work and study. Although the number of electronic documents is increasing day by day, a large part of the huge document library contains a lot of junk information, and the information in it is still very disordered. When faced with a large number of electronic d...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62, G06N20/10, G06F8/38, G06F8/20, G06F40/289, G06F40/295
CPC: G06N20/10, G06F8/38, G06F8/24, G06F40/289, G06F40/295, G06F18/2411, G06F18/2451
Inventor: 刘秀萍, 王程
Owner: 刘秀萍