Method, system and device for extracting structured data and storage medium

A method for extracting structured data, applied in the field of data analysis. It addresses the limited range of extractable data, the difficulty of customization, and the inability to meet in-depth customization requirements. Its effects are to reduce the difficulty of customizing and modifying rules, to lower implementation and labor costs, and to improve efficiency.

Active Publication Date: 2018-12-11
BEIJING BAIDU NETCOM SCI & TECH CO LTD
8 Cites, 25 Cited by

AI Technical Summary

Problems solved by technology

[0008] 1) In the general domain, the range of data that can be extracted and structured is limited (constrained by general entity recognition, usually to names of people, places, institutions, etc.), the extraction accuracy is low (constrained by dependency parsing and rules), and customization is usually unsupported or difficult (specialists are required to mine and formulate extraction rules).
[0009] 2) The deep customization needs of professional domains either cannot be met or can only be met with a large amount of manual effort.



Examples


Embodiment 1

[0043] Figure 1 is a flow chart of a method for extracting structured data provided by Embodiment 1 of the present invention. This embodiment is applicable to converting input text into structured data. The method can be executed by the structured data extraction system provided by the embodiments of the present invention. That system can be implemented in software and/or hardware, and can be integrated into a server that provides text structuring services to users. As shown in Figure 1, the method specifically includes:

[0044] S110: through the online recognition subsystem, recognize the input text based on the online recognition model and output structured data.

[0045] In this embodiment, the structured data extraction system can be divided into an online recognition subsystem and an offline labeling subsystem. The online recognition subsystem can be a server that provides data ...
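
The split between the online recognition subsystem and the offline labeling subsystem can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names, the phrase-lookup "model", and the `customize` and `publish` methods are all hypothetical and exist only to show customized data changing an offline model that is then copied to the online side.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class RecognitionModel:
    """Toy model (hypothetical): maps a surface phrase to an attribute name."""
    phrases: dict = field(default_factory=dict)
    version: int = 0

    def recognize(self, text):
        # Emit one {attribute name, attribute value} pair per known phrase found.
        return [
            {"name": attr, "value": phrase}
            for phrase, attr in self.phrases.items()
            if phrase in text
        ]


class OnlineRecognitionSubsystem:
    """Serves recognition requests using the online recognition model."""

    def __init__(self):
        self.online_model = RecognitionModel()

    def recognize(self, text):
        return self.online_model.recognize(text)


class OfflineLabelingSubsystem:
    """Accepts customized data and adjusts the offline recognition model."""

    def __init__(self):
        self.offline_model = RecognitionModel()

    def customize(self, custom_data):
        # custom_data: {phrase: attribute name}, as if supplied through
        # the customization interface.
        self.offline_model.phrases.update(custom_data)
        self.offline_model.version += 1

    def publish(self, online):
        # Update the online model from the offline one. A deep copy keeps
        # later offline edits from leaking into the serving model.
        online.online_model = copy.deepcopy(self.offline_model)


online = OnlineRecognitionSubsystem()
offline = OfflineLabelingSubsystem()
text = "The presiding judge opened the hearing."

before = online.recognize(text)      # nothing is known yet
offline.customize({"presiding judge": "role"})
not_yet = online.recognize(text)     # offline change is not yet published
offline.publish(online)
after = online.recognize(text)       # online model now mirrors the offline one
```

Keeping the two models separate means customization never touches the model that is serving requests until an explicit update step runs.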

Embodiment 2

[0057] Figure 2 is a flow chart of a method for extracting structured data provided by Embodiment 2 of the present invention. This embodiment further refines the above embodiments. The step of obtaining the user's customized data through the customization interface of the offline labeling subsystem and adjusting the offline recognition model according to the customized data is embodied as: obtaining the user's customized data through the interfaces of at least two model adjustment modules in the offline labeling subsystem, each corresponding to a recognition sub-model, and adjusting the corresponding recognition sub-models respectively. As shown in Figure 2, the method specifically includes:

[0058] S210: through the online recognition subsystem, recognize the input text based on the online recognition model and output structured data.

[0059] Specifically, the online recognition subsystem may include an entity recognition subsystem, a relationship...
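
The idea of one adjustment interface per recognition sub-model can be sketched as follows. This is an assumption-laden illustration: the module keys, class names and data shapes are hypothetical, the two sub-model types are borrowed from Embodiment 3, and no actual retraining is performed.

```python
class DictionaryRuleSubModel:
    """Hypothetical sub-model adjusted with dictionary entries."""

    def __init__(self):
        self.entries = {}

    def adjust(self, custom_data):
        # Customized data here is a {surface form: entity type} mapping.
        self.entries.update(custom_data)


class DeepLearningSubModel:
    """Hypothetical sub-model adjusted with labeled samples."""

    def __init__(self):
        self.training_samples = []

    def adjust(self, custom_data):
        # Customized data here is a list of labeled samples.
        # Retraining itself is out of scope for this sketch.
        self.training_samples.extend(custom_data)


class OfflineLabelingInterfaces:
    """Routes customized data to the adjustment module of one sub-model."""

    def __init__(self):
        self.adjustment_modules = {
            "dictionary_rule": DictionaryRuleSubModel(),
            "deep_learning": DeepLearningSubModel(),
        }

    def submit(self, module_name, custom_data):
        if module_name not in self.adjustment_modules:
            raise KeyError(f"unknown adjustment module: {module_name}")
        self.adjustment_modules[module_name].adjust(custom_data)


interfaces = OfflineLabelingInterfaces()
interfaces.submit("dictionary_rule", {"collegial panel": "institution"})
interfaces.submit(
    "deep_learning",
    [("User A serves as presiding judge", [("User A", "person")])],
)
```

Each sub-model receives only the kind of customized data it can use, so a user who only wants to add dictionary entries never has to supply labeled training text.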

Embodiment 3

[0078] Figure 3a is a flow chart of a method for extracting structured data provided by Embodiment 3 of the present invention. This embodiment further refines Embodiment 2: the online recognition subsystem is embodied as an entity recognition subsystem, the recognition sub-models specifically include a dictionary rule recognition sub-model and an entity deep learning sub-model, and the input text is specifically unstructured text. As shown in Figure 3a, the method specifically includes:

[0079] S310: through the entity recognition subsystem, recognize the unstructured text based on the online recognition model and output structured data.

[0080] In this embodiment, the entity recognition subsystem can be used to identify entities. Entities can be names of people, institutions, places and any other entities identified by name, and can also be times, numbers, currency amounts, addresses, etc.
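
As a rough illustration of the dictionary-rule side of entity recognition, the sketch below combines a lookup dictionary for named entities with regular expressions for time and currency. The dictionary contents, the patterns and the function name `recognize_entities` are assumptions made for this example. The entity deep learning sub-model is not covered.

```python
import re

# Hypothetical dictionary: surface form -> entity type.
ENTITY_DICTIONARY = {
    "Beijing": "place",
    "Baidu": "institution",
}

# Hypothetical rules for two of the non-name entity types mentioned above.
ENTITY_RULES = [
    ("time", re.compile(r"\b\d{4}-\d{2}-\d{2}\b")),
    ("currency", re.compile(r"\$\d+(?:\.\d+)?")),
]


def recognize_entities(text):
    """Return entities as (start offset, type, value), ordered by position."""
    found = []
    # Dictionary lookup: every occurrence of a known surface form.
    for surface, entity_type in ENTITY_DICTIONARY.items():
        for match in re.finditer(re.escape(surface), text):
            found.append((match.start(), entity_type, match.group()))
    # Rule matching: every occurrence of each pattern.
    for entity_type, pattern in ENTITY_RULES:
        for match in pattern.finditer(text):
            found.append((match.start(), entity_type, match.group()))
    return sorted(found)


entities = recognize_entities("Baidu paid $12.50 in Beijing on 2018-12-11.")
typed_values = [(entity_type, value) for _, entity_type, value in entities]
```

Adding an entry to `ENTITY_DICTIONARY` or a pattern to `ENTITY_RULES` is the kind of change that customized data could drive, without editing the matching logic itself.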

[0...



Abstract

The embodiments of the invention disclose a method, system, device and storage medium for extracting structured data. The method comprises the following steps: recognizing input text based on an online recognition model, through an online recognition subsystem, to output structured data; obtaining the user's customized data through a customization interface of an offline labeling subsystem, and adjusting an offline recognition model according to the customized data; and updating, through the offline labeling subsystem, the online recognition model of the online recognition subsystem according to the offline recognition model, wherein the online recognition model corresponds to the offline recognition model. The embodiments can update the recognition model according to the user's customized data, reduce the difficulty of customizing and modifying rules, and reduce the implementation cost of text extraction and structuring.

Description

Technical field

[0001] Embodiments of the present invention relate to data parsing technology, and in particular to a method, system, device and storage medium for extracting structured data.

Background art

[0002] The Internet contains a huge amount of unstructured text data. Unstructured text data is data that cannot conveniently be represented by the two-dimensional logical tables of a database. Such data often contains a great deal of information and knowledge, but it is difficult to extract because it is poorly structured. If it can be effectively organized into structured data, typically attribute pairs of the form {attribute name, attribute value}, it becomes much easier for practitioners in various fields to search and is of great practical value.

[0003] For example, the unstructured text reads, "User A, the deputy chief judge of XX Court, XX District, XX City, XX City, serves as the presiding judge, and forms a collegial panel with Judge User B and Pe...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 戴岱, 高原, 贾巍, 肖欣延, 吴甜
Owner: BEIJING BAIDU NETCOM SCI & TECH CO LTD