Zone defined text information extraction method and device

A text information extraction technology, applied in the field of text processing, that addresses the problems of long extraction time, increased workload, and low extraction efficiency, thereby shortening extraction time and improving extraction efficiency and operation speed.

Active Publication Date: 2018-02-23
ZHONGKE DINGFU BEIJING TECH DEV

AI Technical Summary

Problems solved by technology

[0005] This application provides a method and device for extracting text information in a limited area. It addresses the problem that existing text information extraction methods can only extract information from the entire text. For common words in particular, this easily produces a large number of extraction results, which not only increases extraction time and reduces extraction efficiency, but also requires staff to filter the extraction results for the information they need, increasing their workload.


Examples

Embodiment Construction

[0029] As shown in Figure 1, in a first aspect, an embodiment of the present application provides a method for extracting text information in a limited area, including:

[0030] Step 101: Obtain text and an extraction rule expression corresponding to the text, where the extraction rule expression includes a positioning expression and an information extraction expression.
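A minimal sketch of how the two-part rule obtained in Step 101 might be represented, assuming both parts are written as regular expressions. The source does not specify an expression syntax; the class name `ExtractionRule`, its field names, and the sample patterns are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractionRule:
    """Hypothetical container for one extraction rule expression."""
    positioning: str  # assumed: regex that locates the area to search
    extraction: str   # assumed: regex for the information to pull out

    def compiled(self):
        """Return both parts as compiled patterns (raises re.error if invalid)."""
        return re.compile(self.positioning), re.compile(self.extraction)


# Illustrative rule: look for phone-like numbers inside a "Contact ... Address" area.
rule = ExtractionRule(
    positioning=r"Contact.*?Address",
    extraction=r"\d{3}-\d{4}",
)
pos_re, ext_re = rule.compiled()
```

Keeping the two expressions in one object reflects the text's statement that they belong to a single extraction rule expression associated with a given text.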

[0031] The text may be a document in doc format, a text document in txt format, or an html document, etc. The content of the text may be words, numbers, or a combination of words and numbers, which is not limited in this embodiment.

[0032] The text may be obtained from user-generated content, preferably from news channels, microblog channels and forum channels, with the text content of these channels used as the text information. The news channels include Sina, Netease, Sohu, Tencent, "Today's Headlines", etc.; the microblog cha...


Abstract

The invention discloses a zone-defined text information extraction method and device. The method comprises: obtaining a text and an extraction rule expression corresponding to the text, wherein the extraction rule expression includes a positioning expression and an information extraction expression; matching the positioning expression against the text to obtain a matching result; determining a starting word and an ending word according to the matching result; determining the zone to be extracted from the text according to the starting word and the ending word; obtaining the text information to be extracted according to that zone; matching the text information to be extracted against the information extraction expression; and extracting the information that matches the information extraction expression to obtain the target information. By using the positioning expression in the extraction rule expression, the zone to be extracted can be delimited within the text, and information extraction is then performed only on the text information in that zone, which shortens extraction time, improves extraction efficiency, and improves the accuracy of information extraction.
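The steps in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: it assumes the positioning expression is a regular expression with named groups `start` and `end` marking the starting and ending words, and that the information extraction expression is also a regular expression. The function name `extract_in_zone` and the sample text are hypothetical.

```python
import re


def extract_in_zone(text, positioning, extraction):
    """Extract matches of `extraction` only inside the zone located by `positioning`.

    Assumption: `positioning` is a regex with named groups `start` and `end`
    that capture the starting word and ending word of the zone.
    """
    match = re.search(positioning, text, flags=re.DOTALL)
    if match is None:
        return []                      # no zone found, nothing to extract
    zone_begin = match.end("start")    # zone begins right after the starting word
    zone_end = match.start("end")      # zone stops right before the ending word
    zone = text[zone_begin:zone_end]   # text information to be extracted
    return re.findall(extraction, zone)


text = (
    "Hotline 400-1000. "
    "Contact: call 555-1234 or 555-9876. Address: Room 400-2000, Beijing."
)
positioning = r"(?P<start>Contact).*?(?P<end>Address)"
phone = r"\d{3}-\d{4}"

zone_hits = extract_in_zone(text, positioning, phone)  # zone-limited matches
all_hits = re.findall(phone, text)                     # whole-text matches, for contrast
print(zone_hits)
print(all_hits)
```

In this sample, whole-text matching also picks up number-like strings outside the contact area, while the zone-limited call returns only those between the starting and ending words, which is the reduction in result volume the abstract describes.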

Description

Technical field

[0001] The present application relates to the technical field of text processing, in particular to a method and device for extracting text information in a limited area.

Background

[0002] With the explosive growth of Internet information, the contents of documents are becoming more and more varied. Since the information people need is hidden in content of many different styles, it is increasingly difficult to find. Therefore, people need information extraction methods to find the required information in relevant texts.

[0003] At present, information extraction is mainly based on HTML structure. This approach uses an HTML parser to scan the characters in the HTML text information one by one, analyzes the structural hierarchy of the HTML text information, numbers identical HTML tags sequentially from zero, and finally forms the DOM tree corresponding to the HTML text information, and then s...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/22, G06F17/27
CPC: G06F16/322, G06F16/3331, G06F16/986, G06F40/131, G06F40/14, G06F40/30
Inventor: 席丽娜, 李德彦, 晋耀红
Owner: ZHONGKE DINGFU BEIJING TECH DEV