Method and device for batch structuring of large-scale text information

A text-information structuring technology, applied in the computer field, that addresses problems such as failure to meet demand, wasted human resource costs, and increased overhead.

Active Publication Date: 2020-10-23
TSINGHUA UNIV
4 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

However, this method is not only very inefficient but also wastes considerable human resource costs and increases overhead.
It also has serious limitations: a large number of specialized personnel must be hired to analyze and extract each different type of text information, and the work is not reusable. It is therefore a poor approach that cannot meet the needs of today's big data era.

Examples

Detailed Description of the Embodiments

[0031] To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

[0032] Referring to Figure 1, this embodiment discloses a method for batch structuring of large-scale text information, including:

[0033] S1. Establish different segmentation and extraction rules according to the target information items of different text information, and provide a rule input interface in the form of a configuration file;
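As a minimal sketch of what step S1 could look like, the following Python snippet treats the configuration file as JSON and models each rule as a segmentation delimiter plus one regular expression per target information item. The JSON layout, the field names (`segment_delimiter`, `items`), the sample `notice` rule, and the helpers `load_rules` and `extract_items` are all assumptions made for illustration; the source does not specify the configuration format.

```python
import json
import re

# Hypothetical configuration content; in practice this would be read from a file.
# A raw string keeps the JSON backslash escapes intact.
CONFIG_TEXT = r"""
{
  "notice": {
    "segment_delimiter": ";",
    "items": {
      "name": "Name:\\s*(\\w+)",
      "age": "Age:\\s*(\\d+)"
    }
  }
}
"""


def load_rules(config_text):
    """Parse the rule configuration and pre-compile each item's pattern."""
    raw = json.loads(config_text)
    rules = {}
    for text_type, rule in raw.items():
        rules[text_type] = {
            "segment_delimiter": rule["segment_delimiter"],
            "items": {name: re.compile(p) for name, p in rule["items"].items()},
        }
    return rules


def extract_items(text, rule):
    """Split the text into segments, then search each segment for every item."""
    record = {name: None for name in rule["items"]}
    for segment in text.split(rule["segment_delimiter"]):
        for name, pattern in rule["items"].items():
            match = pattern.search(segment)
            if match and record[name] is None:
                record[name] = match.group(1)  # keep the first hit per item
    return record


RULES = load_rules(CONFIG_TEXT)
SAMPLE = extract_items("Name: Alice; Age: 30", RULES["notice"])
```

Under these assumptions, supporting a new type of text means adding another entry to the configuration rather than changing the extraction code, which is one plausible reading of the "rule input interface" described in S1.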

[0034] In this step, completely d...

Abstract

The invention discloses a method and apparatus for batch structuring of large-scale text information, which can process a large amount of text information in batches in a short time. The method comprises the steps of: establishing different segmentation and extraction rules according to the target information items of different text information, and providing a rule input interface in the form of a configuration file; completing the automatic information extraction operation on each single piece of text information in sequence, in a pipelined manner, according to the corresponding segmentation and extraction rules; establishing a database relationship table according to the data type format and length of each target information item, converting the text information that has undergone automatic information extraction into structured records, and storing the structured records in the database relationship table; and, for text information from which key information was not successfully extracted, inferring candidate information items from the successfully extracted structured record data using a statistical machine learning method, and correcting the existing rules according to the candidate information items and the corresponding text information and rule contents.
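The flow in the abstract can be sketched roughly as follows. This is an illustrative Python outline, not the patented implementation: the regular-expression rules, the table name `records`, the column types and lengths, and the function names are assumptions. The statistical machine learning step that infers candidate items for failed texts is not shown; the sketch only collects those texts so a later stage could process them.

```python
import re
import sqlite3

# Hypothetical extraction rules: one pattern per target information item.
RULES = {
    "name": re.compile(r"Name:\s*(\w+)"),
    "age": re.compile(r"Age:\s*(\d+)"),
}

# Hypothetical column definitions chosen from each item's data type and length.
SCHEMA = {"name": "VARCHAR(32)", "age": "INTEGER"}


def create_table(conn):
    """Create the relationship table with one column per target item."""
    columns = ", ".join(f"{item} {sql_type}" for item, sql_type in SCHEMA.items())
    conn.execute(f"CREATE TABLE records ({columns})")


def extract(text):
    """Apply every rule to one text; return None if any key item is missing."""
    record = {}
    for item, pattern in RULES.items():
        match = pattern.search(text)
        if match is None:
            return None
        record[item] = match.group(1)
    return record


def run_pipeline(texts, conn):
    """Process texts one at a time: extract, store successes, collect failures."""
    failed = []
    for text in texts:
        record = extract(text)
        if record is None:
            failed.append(text)  # kept for later candidate inference and rule correction
            continue
        conn.execute(
            "INSERT INTO records (name, age) VALUES (?, ?)",
            (record["name"], int(record["age"])),
        )
    conn.commit()
    return failed


conn = sqlite3.connect(":memory:")
create_table(conn)
failed = run_pipeline(["Name: Alice Age: 30", "Alice is thirty"], conn)
rows = conn.execute("SELECT name, age FROM records").fetchall()
```

In this sketch the second input matches no rule, so it ends up in `failed` rather than in the table; in the described method, such texts are the ones for which candidate information items would be inferred and the existing rules corrected.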

Description

Technical Field

[0001] The invention relates to the field of computers, in particular to a method and device for batch structuring of large-scale text information.

Background

[0002] In recent years, with the advent of the big data era, the rapid growth of data has become both an opportunity and a challenge shared by many industries. The opportunity is that, by analyzing large amounts of data, data owners can mine frequent patterns and obtain a great deal of latent information, predict the future trends and development of related industries from it, make corresponding decisions, and thereby obtain substantial returns. The challenge is that, although anyone can easily obtain large amounts of data through the Internet, and some professionals can collect it even faster and more efficiently with crawlers, how to manage and use such massive data efficiently has become a difficult problem. ...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/22, G06F16/2455, G06F16/28, G06F16/2458
CPC: G06F16/2282, G06F16/24564, G06F16/2462, G06F16/284
Inventors: 汪东升, 蔡尚铭, 徐涛
Owner: TSINGHUA UNIV