Text information extraction method and device, computer equipment and storage medium

A text information extraction method, applied in the field of data processing, addresses the heavy workload, high labor cost, and frequent omissions and errors of manual information extraction. It reduces omissions and errors, avoids infinite backtracking in regular-expression matching, and improves efficiency.

Pending Publication Date: 2021-06-18
南京星云数字技术有限公司

AI Technical Summary

Problems solved by technology

[0005] To solve the problems of the prior art, embodiments of the present invention provide a long-text information extraction method, device, computer equipment and storage medium that overcome the heavy workload, low efficiency, high labor cost, and susceptibility to omissions and errors of information extraction in the prior art.



Examples


Embodiment 1

[0056] Specifically, as shown in Figure 1, taking fund disclosure texts as an example, the process of using the above method to extract information from the long text of a fund announcement includes:

[0057] Step 1. Obtain the original long text sequence from which information is to be extracted; this sequence includes the long text of the fund announcement.

[0058] Specifically, the texts to be extracted here mainly include disclosure texts such as fund prospectuses and fund contracts, obtained from official websites that publicly disclose information. Note that the fund prospectuses, fund contracts and other related disclosure texts in the embodiments of the present invention are only exemplary and do not limit the embodiments. Except for the above-mentioned long texts, the present inve...

Embodiment 2

[0086] Figure 3 is a flow chart of a method for extracting text information according to an exemplary embodiment. As shown in Figure 3, the method includes the following steps:

[0087] S1: Obtain text to be extracted and an extraction rule corresponding to the text to be extracted, where the extraction rule includes an extraction field.

[0088] Specifically, the texts to be extracted include, but are not limited to, long texts with a fixed directory structure, such as fund prospectuses and fund contracts. Note that the information extraction method provided by the embodiment of the present invention can also be applied to other long texts with a relatively standardized structure and style. Extraction rules include regular-expression statements held in a configuration file and custom rules. The custom rules are mainly used to configure the fields and other information that users need to extract. The custom rules can be adjusted accor...
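As an illustration of step S1, the following is a minimal sketch of how an extraction rule with an extraction field might be represented and checked. The `ExtractionRule` class, its attribute names, the sample patterns and the `validate_rules` helper are all hypothetical; the source does not specify a configuration format.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractionRule:
    """Hypothetical rule shape: one field extracted from one chapter."""
    field: str    # extraction field, used as the key of the stored pair
    chapter: str  # directory entry whose chapter is searched
    pattern: str  # regular statement with exactly one capture group


# Assumed example rules for a fund contract; real rules would come
# from a configuration file and user-defined custom rules.
RULES = [
    ExtractionRule("fund_name", "Basic Information", r"Fund name:\s*(.+)"),
    ExtractionRule("custodian", "Fund Custodian", r"Custodian:\s*(.+)"),
]


def validate_rules(rules):
    """Compile every pattern and require a single capture group.

    Returns the list of extraction fields in rule order.
    """
    for rule in rules:
        compiled = re.compile(rule.pattern)  # raises re.error if invalid
        if compiled.groups != 1:
            raise ValueError(f"rule {rule.field!r} needs one capture group")
    return [rule.field for rule in rules]


fields = validate_rules(RULES)
```

Validating rules up front is one possible design choice: a malformed pattern then fails before any long text is processed.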


Abstract

The invention discloses a text information extraction method and device, computer equipment and a storage medium. The method comprises: obtaining a text to be extracted and an extraction rule corresponding to that text; determining, according to the file directory of the text, the chapter position of each directory entry within the text, and generating chapter information; dividing the chapter information according to a preset rule to generate a corresponding division list; generating key-value pair information for the text according to the division list and the extraction rule; and storing the key-value pair information in a database. On the one hand, the method improves extraction efficiency and accuracy and avoids omissions and errors in information extraction. On the other hand, splitting the long text avoids the infinite backtracking that regular-expression matching may otherwise encounter, increases the fault tolerance of the code, and reduces overall running time.
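The steps in the abstract can be sketched roughly as follows. This is a simplified illustration, not the patented implementation: the function names, the line-based "preset rule", the rule dictionary and the sample text are assumptions, the text is assumed not to repeat its table of contents in the body, and database storage is omitted.

```python
import re


def locate_chapters(text, directory):
    """Map each directory title to its chapter body, searching in order."""
    positions = []
    cursor = 0
    for title in directory:
        idx = text.find(title, cursor)
        if idx == -1:
            continue  # tolerate a missing heading instead of failing
        positions.append((title, idx))
        cursor = idx + len(title)
    chapters = {}
    for i, (title, start) in enumerate(positions):
        end = positions[i + 1][1] if i + 1 < len(positions) else len(text)
        chapters[title] = text[start + len(title):end].strip()
    return chapters


def split_chapter(body):
    """Assumed preset rule: divide a chapter into non-empty lines."""
    return [line.strip() for line in body.splitlines() if line.strip()]


def extract(text, directory, rules):
    """Apply each rule only to the short pieces of its own chapter."""
    chapters = locate_chapters(text, directory)
    result = {}
    for field, (chapter, pattern) in rules.items():
        for piece in split_chapter(chapters.get(chapter, "")):
            match = re.search(pattern, piece)
            if match:
                result[field] = match.group(1).strip()
                break
    return result


sample_text = (
    "1. Basic Information\n"
    "Fund name: Example Growth Fund\n"
    "Fund type: Equity\n"
    "2. Fund Custodian\n"
    "Custodian: Example Bank\n"
)
directory = ["1. Basic Information", "2. Fund Custodian"]
rules = {
    "fund_name": ("1. Basic Information", r"Fund name:\s*(.+)"),
    "custodian": ("2. Fund Custodian", r"Custodian:\s*(.+)"),
}
pairs = extract(sample_text, directory, rules)
```

Because each pattern is matched against one short piece at a time rather than the whole document, the amount of input any single regular-expression search can backtrack over is bounded by the piece length.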

Description

Technical field

[0001] The invention relates to the technical field of data processing, in particular to a text information extraction method, device, computer equipment and storage medium.

Background technique

[0002] The announcement text information in the financial field is usually very cumbersome, such as common public prospectuses, contract announcements and other related types of text. They are often assembled from information on the order of hundreds of pages. For fund information extraction tasks, the common processing methods in the industry are generally to copy and extract information through manual operation and maintenance, or simple regular expression extraction.

[0003] However, there are some obvious disadvantages in the above-mentioned traditional processing methods. For example, the purely manual method of extracting information has a very heavy workload and involves a lot of repetitive work, which is inefficient and high in labor costs. For simple re...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/211, G06F40/284, G06Q40/06
CPC: G06F40/211, G06F40/284, G06Q40/06
Inventor: 孟泽洋
Owner: 南京星云数字技术有限公司