
General text information extraction method and system

A text information extraction method and system, applied in unstructured text data retrieval, special data processing applications, natural language data processing, and similar fields. It addresses the problems that existing extraction approaches are either too restrictive in scope or have accuracy that is difficult to predict, which makes them unsuitable where accuracy matters. It achieves the effects of avoiding investment in manual labeling, expanding the scope of application, and reducing dependence.

Pending Publication Date: 2018-05-01
FUJIAN YIRONG INFORMATION TECH +3
Cites: 5; Cited by: 3

AI Technical Summary

Problems solved by technology

[0010] Because the patterns can be induced automatically by the algorithm (forming an "extraction model"), the pattern-induction method adapts well and is strong at extracting key information that follows no readily observable rules. Its disadvantages are:
[0014] The rule-based method gives stable extraction results and needs no manual labeling, but it is too restrictive and its matching range is small, so it is unsuitable for texts that lack a fixed template;
[0015] The automatic pattern induction method has a wide matching range, but it requires a large manually labeled corpus to be prepared in advance, its extraction results are unstable, and its accuracy is hard to predict, so it is unsuitable for scenarios with strict requirements on extraction accuracy.
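The stability-versus-coverage trade-off of the rule-based method can be seen in a few lines. The sketch below is illustrative only: the field, the template wording, and the names `AMOUNT_RULE` and `rule_extract` are hypothetical. A fixed regular expression reliably extracts the amount from text written in its template, but misses the same fact when it is phrased differently.

```python
import re
from typing import Optional

# Hypothetical rule: matches only one fixed phrasing of a contract amount.
AMOUNT_RULE = re.compile(r"contract amount is (\d+(?:\.\d+)?) yuan")


def rule_extract(text: str) -> Optional[str]:
    """Return the amount if the text follows the fixed template, else None."""
    match = AMOUNT_RULE.search(text)
    return match.group(1) if match else None


templated = "The contract amount is 1200.50 yuan."
paraphrased = "Total price of the contract: 1200.50 yuan."

print(rule_extract(templated))    # 1200.50 (template matches)
print(rule_extract(paraphrased))  # None (same fact, different wording)
```

When the rule fires, its output is exact, which is why the method described here can reuse such matches as labeled data instead of labeling by hand.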

Method used



Embodiment Construction

[0045] As shown in Figure 3, the general text information extraction method of the present invention comprises the following steps:

[0046] Step 1: Write a limited number of regular expressions and apply them to the original corpus to extract a text corpus and a field corpus;
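Step 1 can be sketched as follows, under stated assumptions: the field names, the patterns in `FIELD_RULES`, and the function `regex_extract` are hypothetical, and the policy of keeping a document only when every rule matches is one conservative choice, not something the source specifies. The output is a text corpus and an index-aligned field corpus.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical field rules: one regular expression per target field.
FIELD_RULES = {
    "party_a": re.compile(r"Party A: ([^\n;]+)"),
    "amount": re.compile(r"amount of (\d+(?:\.\d+)?) yuan"),
}


def regex_extract(raw_corpus: List[str]) -> Tuple[List[str], List[Dict[str, str]]]:
    """Apply every rule to every document.

    A document joins the text corpus only if all rules match (an assumed,
    conservative policy); its captured values form the aligned field corpus.
    """
    text_corpus: List[str] = []
    field_corpus: List[Dict[str, str]] = []
    for doc in raw_corpus:
        fields: Dict[str, str] = {}
        for name, pattern in FIELD_RULES.items():
            match = pattern.search(doc)
            if match:
                fields[name] = match.group(1).strip()
        if len(fields) == len(FIELD_RULES):
            text_corpus.append(doc)
            field_corpus.append(fields)
    return text_corpus, field_corpus


raw = [
    "Party A: Acme Co.; total amount of 500 yuan.",
    "Meeting memo with no contract terms.",
]
texts, fields = regex_extract(raw)
print(texts)   # ['Party A: Acme Co.; total amount of 500 yuan.']
print(fields)  # [{'party_a': 'Acme Co.', 'amount': '500'}]
```

Requiring all fields keeps the automatically produced labels complete for each kept document, at the cost of discarding partially matching ones.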

[0047] Step 2: Split off a limited proportion of the extracted text corpus as the training text corpus, and take the field corpus corresponding to it as the training field corpus;
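A minimal sketch of Step 2, assuming a random split with a fixed seed; the function name `split_corpus` and the default `train_ratio` of 0.2 are hypothetical, since the source only says "a limited proportion". Text and field entries stay paired, and the remainder is returned as well because the following step uses it for validation.

```python
import random
from typing import Dict, List, Tuple


def split_corpus(
    text_corpus: List[str],
    field_corpus: List[Dict[str, str]],
    train_ratio: float = 0.2,
    seed: int = 0,
) -> Tuple[List[str], List[Dict[str, str]], List[str], List[Dict[str, str]]]:
    """Split aligned corpora into (train texts, train fields, rest texts, rest fields)."""
    if len(text_corpus) != len(field_corpus):
        raise ValueError("text and field corpora must be aligned one-to-one")
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    indices = list(range(len(text_corpus)))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    cut = max(1, int(len(indices) * train_ratio))  # keep at least one training document
    train_idx, rest_idx = indices[:cut], indices[cut:]
    return (
        [text_corpus[i] for i in train_idx],
        [field_corpus[i] for i in train_idx],
        [text_corpus[i] for i in rest_idx],
        [field_corpus[i] for i in rest_idx],
    )


docs = [f"doc{i}" for i in range(10)]
labels = [{"id": str(i)} for i in range(10)]
train_t, train_f, rest_t, rest_f = split_corpus(docs, labels)
print(len(train_t), len(rest_t))  # 2 8
```

Shuffling indices rather than the lists themselves is what keeps each text aligned with its field record.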

[0048] Step 3: Import the training text corpus, the training field corpus, and, for each training field, a limited number of preceding and following context items into the automatic pattern induction method to build the extraction model; the automatic pattern induction method is the CRF (conditional random field) algorithm;
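The data preparation for Step 3 might look like the sketch below. It assumes character-level tagging, interprets the "front and rear limited number" as a window of `window` characters on each side, and uses hypothetical names (`bio_labels`, `char_features`, `to_crf_sequences`). It does not train the CRF itself. The output, a list of per-character feature dicts `X` and a list of label sequences `y`, is the shape a CRF library such as sklearn-crfsuite accepts in `CRF().fit(X, y)`.

```python
from typing import Dict, List, Tuple


def bio_labels(text: str, fields: Dict[str, str]) -> List[str]:
    """Tag each character B-<field>, I-<field> or O from the first occurrence of each value."""
    labels = ["O"] * len(text)
    for name, value in fields.items():
        if not value:
            continue
        start = text.find(value)
        if start < 0:
            continue  # value not found verbatim: leave those characters as O
        labels[start] = f"B-{name}"
        for i in range(start + 1, start + len(value)):
            labels[i] = f"I-{name}"
    return labels


def char_features(text: str, i: int, window: int = 2) -> Dict[str, str]:
    """Features for character i: the character itself plus `window` neighbours on each side."""
    feats = {"char": text[i], "is_digit": str(text[i].isdigit())}
    for offset in range(1, window + 1):
        feats[f"-{offset}"] = text[i - offset] if i - offset >= 0 else "<BOS>"
        feats[f"+{offset}"] = text[i + offset] if i + offset < len(text) else "<EOS>"
    return feats


def to_crf_sequences(
    texts: List[str], field_dicts: List[Dict[str, str]], window: int = 2
) -> Tuple[List[List[Dict[str, str]]], List[List[str]]]:
    """Build CRF training input: one feature-dict sequence and one label sequence per text."""
    X = [[char_features(t, i, window) for i in range(len(t))] for t in texts]
    y = [bio_labels(t, f) for t, f in zip(texts, field_dicts)]
    return X, y


sample = "Party A: Acme; 500 yuan"
X, y = to_crf_sequences([sample], [{"amount": "500"}])
pos = sample.index("500")
print(y[0][pos : pos + 3])  # ['B-amount', 'I-amount', 'I-amount']
```

Overlapping field values would overwrite each other's tags here; a fuller version would need an explicit rule for that case.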

[0049] Step a. Use the corpus remaining after Step 2 as the validation corpus, run the extraction model on it, and judge the accuracy of ...
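One way to judge accuracy on the validation corpus is sketched below, under stated assumptions. The regex-extracted field values serve as the reference, a prediction counts as correct only on an exact string match, and the helper names `decode_bio` and `field_accuracy` are hypothetical. What the method does with the resulting accuracy is cut off in the source and is not modeled here.

```python
from typing import Dict, List, Optional


def decode_bio(text: str, labels: List[str]) -> Dict[str, str]:
    """Turn per-character B-/I-/O labels back into {field: value}, keeping the first span per field."""
    fields: Dict[str, str] = {}
    current: Optional[str] = None
    start = 0
    for i, label in enumerate(labels + ["O"]):  # trailing sentinel flushes the last span
        if current is not None and label != f"I-{current}":
            fields.setdefault(current, text[start:i])
            current = None
        if label.startswith("B-"):
            current, start = label[2:], i
    return fields


def field_accuracy(predicted: List[Dict[str, str]], expected: List[Dict[str, str]]) -> float:
    """Share of reference (document, field) values that the model reproduced exactly."""
    total = correct = 0
    for pred, gold in zip(predicted, expected):
        for name, value in gold.items():
            total += 1
            if pred.get(name) == value:
                correct += 1
    return correct / total if total else 0.0


text = "Pay 500 yuan"
tags = ["O"] * len(text)
tags[4], tags[5], tags[6] = "B-amount", "I-amount", "I-amount"
print(decode_bio(text, tags))  # {'amount': '500'}
print(field_accuracy([{"amount": "500"}, {"amount": "50"}],
                     [{"amount": "500"}, {"amount": "500"}]))  # 0.5
```

Exact matching is strict; partial-overlap scoring would be a reasonable alternative when field boundaries are fuzzy.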



Abstract

The invention provides a general text information extraction method. In the method, a finite number of regular expressions are written to extract an original corpus; a finite proportion of the extracted corpus is split off to serve as a training corpus; an extraction model is built from the training corpus by an automatic pattern induction method; and extraction is performed with the extraction model. The invention further provides a general text information extraction system, which suits business scenarios with different levels of requirements, helps cultivate engineers who meet the corresponding requirements, and enables an "assembly line" workflow for text key information extraction.

Description

Technical field
[0001] The invention relates to a general text information extraction method and system.
Background
[0002] With the continuous development of big data technology, massive, heterogeneous "unstructured documents" that were previously hard to use, such as web pages and office documents, have gradually attracted attention, and there is demand for analyzing and mining them. However, existing mainstream big data technology mainly solves the storage and computation of massive data; the mining and analysis of unstructured data is not yet widely applied and has not formed a stable technical route.
[0003] Text key information extraction refers to automatically extracting structured information from "unstructured documents" according to specific business needs. There are two mainstream types of schemes:
[0004] 1. Extraction bas...

Claims


Application Information

IPC (8): G06F17/22; G06F17/30
CPC: G06F16/36; G06F40/131
Inventors: 倪时龙 (Ni Shilong), 苏江文 (Su Jiangwen), 宋立华 (Song Lihua), 王秋琳 (Wang Qiulin), 陈颖华 (Chen Yinghua)
Owner: FUJIAN YIRONG INFORMATION TECH