A Chinese Entity Attribute Extraction Method

A Chinese entity attribute extraction technology, applied in the field of information extraction, that addresses the scarcity of manually labeled datasets and the time and labor needed to build them, and achieves high accuracy.

Active Publication Date: 2021-08-24
湖南四方天箭信息科技有限公司
6 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

At this stage, few authoritative human-labeled datasets exist, and building a manually labeled dataset is time-consuming and labor-intensive.

Method used

Baidu Encyclopedia entry pages are crawled and filtered by entry tag. The infobox data in the remaining pages is used for distant supervision (remote labeling) to produce training data, which is word-segmented, generalized, converted into word vectors, and classified by a bidirectional LSTM model; the classification results are filled into the attribute slots of the corresponding categories.

Detailed Description of the Embodiments

[0035] The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative efforts shall fall within the protection scope of the present invention.

[0036] The present invention provides a technical solution: a method for extracting Chinese entity attributes, comprising the following extraction steps:

[0037] Step 1: Extract the text of Baidu Encyclopedia entry pages and obtain information such as the encyclopedia infobox and the entry tags. The crawl maintains a set of URLs to be crawled and a set of URLs already crawled; a seed page is selected ...
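The bookkeeping described in Step 1 (a set of URLs to be crawled, a set of URLs already crawled, and a seed page) can be illustrated with a minimal sketch. The source text is truncated, so the breadth-first visiting order, the names `to_crawl`, `crawled`, `fetch_links`, and `max_pages`, and the in-memory link graph are all assumptions made for illustration, not details taken from the patent.

```python
from collections import deque
from typing import Callable, Iterable, List, Set


def crawl(seed: str,
          fetch_links: Callable[[str], Iterable[str]],
          max_pages: int = 100) -> List[str]:
    """Visit pages reachable from a seed, each at most once.

    to_crawl holds URLs waiting to be visited; crawled holds URLs already
    visited. Breadth-first order is an assumption of this sketch.
    """
    to_crawl = deque([seed])
    crawled: Set[str] = set()
    order: List[str] = []
    while to_crawl and len(order) < max_pages:
        url = to_crawl.popleft()
        if url in crawled:
            continue  # already visited through another link
        crawled.add(url)
        order.append(url)
        for link in fetch_links(url):
            if link not in crawled:
                to_crawl.append(link)
    return order


# Hypothetical in-memory link graph standing in for real entry pages,
# so the sketch runs without network access.
LINKS = {
    "seed": ["a", "b"],
    "a": ["b", "c"],
    "b": ["seed"],
    "c": [],
}
visited = crawl("seed", lambda url: LINKS.get(url, []))
```

In a real crawl, `fetch_links` would download an entry page and return the entry links found in it; here it is a plain dictionary lookup so that the set handling is the only thing being shown.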

Abstract

The invention discloses a method for extracting Chinese entity attributes. The method extracts the text of Baidu Encyclopedia entry pages, filters the pages by entry tag, and uses the infobox data in the remaining pages for distant supervision (remote labeling) to obtain training data. The training data is word-segmented and generalized, the generalized data is converted into word vectors, and a classifier produces classification results that are filled into the attribute slots of the corresponding categories. The method requires no manually defined features or other additional resources. Because the bidirectional LSTM model learns features from both the forward and backward directions of a sentence, using both preceding and following context, it achieves higher accuracy on the entity attribute extraction task.
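As a rough illustration of the labeling step in the abstract, the sketch below labels a sentence with an infobox attribute when the sentence mentions both the entity and that attribute's value. The matching rule, the function names, and the toy data are assumptions of this sketch. The source does not define "generalization" in the text available here, so the placeholder substitution in `generalize` is only one assumed interpretation. Word segmentation, word vectors, and the bidirectional LSTM classifier are not shown.

```python
from typing import Dict, List, Tuple


def distant_label(entity: str,
                  infobox: Dict[str, str],
                  sentences: List[str]) -> List[Tuple[str, str]]:
    """Pair each sentence with the infobox attributes whose value it contains.

    Assumed rule: a sentence is a training example for an attribute if it
    mentions both the entity and that attribute's infobox value.
    """
    labeled = []
    for sentence in sentences:
        if entity not in sentence:
            continue
        for attribute, value in infobox.items():
            if value in sentence:
                labeled.append((sentence, attribute))
    return labeled


def generalize(sentence: str, entity: str, value: str) -> str:
    """Replace the entity and the attribute value with placeholder tokens
    (an assumed form of generalization, not confirmed by the source)."""
    return sentence.replace(entity, "<ENT>").replace(value, "<VAL>")


# Invented toy data: a fictional entity and infobox, for illustration only.
entity = "张三"
infobox = {"出生地": "长沙", "职业": "工程师"}
sentences = ["张三出生于长沙。", "张三是一名工程师。", "李四是一名工程师。"]

training = distant_label(entity, infobox, sentences)
generalized = [generalize(s, entity, infobox[a]) for s, a in training]
```

The third sentence is skipped because it does not mention the entity, which is the kind of filtering that keeps noisy matches out of the training data.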

Description

Technical field

[0001] The invention relates to the technical field of information extraction, and in particular to a Chinese entity attribute extraction method.

Background

[0002] With the rapid development of the Internet, the volume of data available over the network has grown exponentially. Quickly and accurately extracting the genuinely useful information from this massive data is both critical and urgent, and it is exactly the problem that information extraction research tries to solve. Entity attribute and relation extraction is one information extraction task; its purpose is to extract entity attributes and the relations between entities from unstructured text. The task builds on named entity recognition and provides a foundation for research in event extraction, automatic question answering, machine translation, and related areas of natural language processing.

[0003] At present, there...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06F40/284
CPC: G06F16/355, G06F40/284
Inventors: 赫中翮, 王志超, 周忠诚
Owner: 湖南四方天箭信息科技有限公司