Multi-granularity semantic chunk based entity attribute and attribute value extracting method

A technology for extracting entity attributes and attribute values, applied in natural language data processing, database retrieval, network data retrieval, etc. It addresses problems such as incomplete attribute value semantics.

Active Publication Date: 2017-05-31
BEIJING INSTITUTE OF TECHNOLOGY

AI Technical Summary

Problems solved by technology

[0008] The purpose of the present invention is to solve the problems of existing entity attribute and attribute value knowledge extraction methods, such as incomplete attribute value semantics, difficulty in extracting unspecified attribut



Examples


Embodiment 1

[0069] Step 1: Construct the attribute and attribute value extraction corpus of entities.

[0070] Web crawlers built with Python, Selenium and PhantomJS collect entry pages from Wikipedia, Baidu Encyclopedia and Hudong Baike (Interactive Encyclopedia) and save them to the local computer; these pages form the corpus for entity attribute and attribute value extraction. Free text extraction is then performed on each webpage: the title and free text are extracted, and information such as navigation and pictures is removed. For example, for the entity Forbidden City, the entity's entry pages in Wikipedia, Baidu Encyclopedia and Hudong Baike are collected and saved to the local computer.
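The free text extraction described in Step 1 might look roughly like the following minimal sketch. It is not the patent's implementation: it uses only the Python standard library `html.parser` on an already downloaded page, and the class name `FreeTextExtractor`, the function `extract_free_text` and the set of tags treated as navigation are assumptions made for illustration.

```python
from html.parser import HTMLParser


class FreeTextExtractor(HTMLParser):
    """Collect the page title and paragraph text; ignore navigation and scripts."""

    # Assumption: these containers hold navigation or other non-free-text content.
    SKIP_TAGS = {"nav", "script", "style", "header", "footer"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._skip_depth = 0        # > 0 while inside a skipped container
        self._in_title = False
        self._in_paragraph = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "p" and self._skip_depth == 0:
            self._in_paragraph = True
            self._buffer = []
        # <img> and other tags carry no text, so pictures are dropped implicitly.

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_paragraph:
            text = "".join(self._buffer).strip()
            if text:
                self.paragraphs.append(text)
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data.strip()
        elif self._in_paragraph and self._skip_depth == 0:
            self._buffer.append(data)


def extract_free_text(html):
    """Return (title, list of free-text paragraphs) for one saved page."""
    parser = FreeTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.title, parser.paragraphs
```

Real encyclopedia pages would likely need site-specific rules for what counts as navigation; the tag list here is only a starting point.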

[0071] Step 2: Perform word segmentation, part-of-speech tagging and phrase recognition on the free text sentences in the attribute and attribute value extraction corpus.

[0072] Use the word segmentation and part-of-speech tagging tool of Harbin Institute ...

Embodiment 2

[0115] An entity attribute and attribute value extraction system based on multi-granularity semantic chunks, built on the above method, is shown in Figure 2. It includes a corpus collection module, a word segmentation and phrase recognition module, a semantic role labeling module, a dependency syntactic analysis module, a semantic dependency analysis module, an attribute knowledge extraction module based on semantic role granularity, an attribute knowledge extraction module based on phrase granularity, an attribute knowledge extraction module based on word granularity, and an attribute knowledge classification module. The corpus collection module is connected to the word segmentation and phrase recognition module, the semantic role labeling module, the dependency syntactic analysis module and the semantic dependency analysis module; the word segmentation and phrase recognition module and the semantic role labeling module are each connected to the attribute knowledge extraction module based on semantic role granularity; the w...
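As a rough illustration of the module connections stated in this paragraph, the wiring can be written as a dependency graph and ordered topologically. This is only a sketch: the module identifiers are hypothetical names chosen here, and only the connections that survive in the text are encoded; the links that are cut off in the source are deliberately left out rather than guessed.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Map each module to the modules it receives input from.
# Only connections stated in the text are listed; the names are illustrative.
UPSTREAM = {
    "word_segmentation_and_phrase_recognition": {"corpus_collection"},
    "semantic_role_labeling": {"corpus_collection"},
    "dependency_syntactic_analysis": {"corpus_collection"},
    "semantic_dependency_analysis": {"corpus_collection"},
    "semantic_role_granularity_extraction": {
        "word_segmentation_and_phrase_recognition",
        "semantic_role_labeling",
    },
}


def execution_order(upstream):
    """Return one order in which every module runs after its upstream modules."""
    return list(TopologicalSorter(upstream).static_order())


if __name__ == "__main__":
    for name in execution_order(UPSTREAM):
        print(name)
```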


Abstract

The invention relates to a method for extracting entity attributes and attribute values based on multi-granularity semantic chunks, and belongs to the technical field of Web mining and information extraction. The method comprises the following steps: a corpus is constructed and free text extraction is performed; the corpus is subjected to word segmentation, part-of-speech tagging and phrase recognition; the corpus is subjected to semantic role labeling; the corpus is subjected to dependency syntactic analysis; the corpus is subjected to semantic dependency analysis; candidate (entity, attribute, attribute value) triples are extracted at three granularities: words, phrases and semantic roles; and the candidate triples are classified as correct or incorrect by a trained classifier. Compared with the prior art, the method automatically extracts (entity, attribute, attribute value) triples at the three granularities of words, phrases and semantic roles from free text, improves the accuracy and efficiency of entity attribute and attribute value extraction, and has broad application prospects in fields such as topic detection, information retrieval, automatic summarization and question answering systems.
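The final step of the abstract, filtering candidate triples with a classifier, can be pictured with a small data-structure sketch. Everything named here is hypothetical: the `Triple` record, the `filter_candidates` function, and especially `toy_classifier`, which is a hand-written placeholder rule standing in for the trained classifier that the source mentions but does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# The three granularities named in the abstract.
GRANULARITIES = ("word", "phrase", "semantic_role")


@dataclass(frozen=True)
class Triple:
    """A candidate (entity, attribute, attribute value) triple."""
    entity: str
    attribute: str
    value: str
    granularity: str  # which extractor produced the candidate


def filter_candidates(
    candidates: Iterable[Triple],
    is_correct: Callable[[Triple], bool],
) -> List[Triple]:
    """Keep only the candidates that the classifier judges correct."""
    kept = []
    for triple in candidates:
        if triple.granularity not in GRANULARITIES:
            raise ValueError(f"unknown granularity: {triple.granularity}")
        if is_correct(triple):
            kept.append(triple)
    return kept


def toy_classifier(triple: Triple) -> bool:
    """Placeholder rule (assumption): reject triples with an empty field."""
    return all(part.strip() for part in (triple.entity, triple.attribute, triple.value))
```

In practice `is_correct` would wrap a model trained on labeled triples; passing it in as a callable keeps the filtering step independent of how that model is built.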

Description

Technical field

[0001] The invention belongs to the technical field of Web mining and information extraction, and relates to a method and system for extracting entity attributes and attribute values based on multi-granularity semantic chunks. The invention has broad application prospects in fields such as information retrieval, topic detection and automatic question answering.

Background art

[0002] Knowledge extraction of entity attributes and attribute values is an important research topic in the field of Web mining and information extraction. Entity attribute and attribute value knowledge extraction refers to extracting (entity, attribute, attribute value) triples from text.

[0003] Entity attribute and attribute value knowledge extraction methods fall into three categories: rule-based methods, statistics-based methods and hybrid methods. The rule-based method mainly extracts knowledge according to the organizational structure rules of the we...


Application Information

IPC(8): G06F17/30, G06F17/27
CPC: G06F16/35, G06F16/951, G06F40/295
Inventors: 张春霞, 彭飞, 郭钰, 王树良, 刘振岩
Owner: BEIJING INSTITUTE OF TECHNOLOGY