Pattern-based bootstrap Chinese entity extraction method

An entity extraction technology applied in natural language data processing and special data processing applications. It addresses problems such as the inability to score with entity-internal patterns and features that do not fully account for the characteristics of Chinese word segmentation, and aims for improved extraction results, good transferability, and high reliability.

Active Publication Date: 2017-02-22
THE 28TH RES INST OF CHINA ELECTRONICS TECH GROUP CORP

AI Technical Summary

Problems solved by technology

Prior-art bootstrap Chinese entity extraction methods cannot use the internal patterns of entities for scoring, and the features they extract when scoring unlabeled entities do not fully consider the characteristics of Chinese word segmentation.


Embodiment Construction

[0051] The technical solution of the present invention is described further below with reference to the accompanying drawings and specific embodiments.

[0052] The invention discloses a pattern-based bootstrap Chinese entity extraction method, which performs entity recognition and rule base construction for each entity type, including the following steps:

[0053] S1: The user provides the following input: a. forward seed entities and reverse seed entities; b. internal constraints, internal patterns, and confidence levels of the forward and reverse seed entities; c. external constraints on the forward and reverse seed entities, that is, the context information of the forward and reverse seed entities; d. the original unlabeled text. Of these four types of input, a and d cannot be empty; b and c may be empty;
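The four inputs of step S1 and the rule that a and d are mandatory can be sketched as a small container type. This is only an illustration: the class name `ExtractionInput`, its field names, and the representation of input b as a pattern-to-confidence mapping are hypothetical and not taken from the patent. The sketch also assumes that input a counts as non-empty once at least one forward seed is given, because the text does not say whether reverse seeds are mandatory on their own.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractionInput:
    """Hypothetical container for the four S1 inputs (a, b, c, d)."""

    forward_seeds: list[str]   # input a: forward seed entities
    reverse_seeds: list[str]   # input a: reverse seed entities
    raw_text: str              # input d: original unlabeled text
    # input b (optional): assumed here to map an internal pattern to a confidence
    internal_constraints: dict[str, float] = field(default_factory=dict)
    # input c (optional): context information around the seed entities
    external_constraints: list[str] = field(default_factory=list)

    def validation_errors(self) -> list[str]:
        """Return one message per violated 'cannot be empty' rule."""
        errors = []
        # Assumption: one forward seed is enough to make input a non-empty.
        if not self.forward_seeds:
            errors.append("a: at least one forward seed entity is required")
        if not self.raw_text.strip():
            errors.append("d: the unlabeled text must not be empty")
        return errors
```

An instance built with only seeds and text leaves b and c at their empty defaults, which matches the statement that those two inputs may be omitted.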

[0054] S2: Perform domain-independent word segmentation, part-of-speech tagging, syntax analysis...

Abstract

The invention discloses a pattern-based bootstrap Chinese entity extraction method. Starting from a small number of seed entities, entity-internal patterns, and entity-external patterns, the method iteratively learns more entities and patterns from a corpus. It combines statistics with patterns and does not depend on a large manually annotated corpus or a domain pattern base. Compared with existing pattern-based bootstrap methods, it builds on observed entity-type patterns in specific domains and uses the internal patterns and features of entities to score candidate patterns and candidate entities that cannot be labeled accurately, which improves the accuracy of pattern and entity scoring. The method is therefore suited to entity extraction and knowledge base construction in specific domains.
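The iterative idea in the abstract can be illustrated with a toy loop. This is a sketch under stated assumptions, not the patented scoring method. It assumes the corpus is already word-segmented, that an external pattern is the (left token, right token) context of an entity, that an internal pattern is a suffix, and that the two scores are combined with equal weights. The function names, the 0.5/0.5 weights, and the 0.6 threshold are all hypothetical, and reverse seeds are left out for brevity.

```python
from collections import defaultdict

BOS, EOS = "<s>", "</s>"  # sentence boundary markers (illustrative)


def context_index(sentences):
    """Map each (left, right) token context to the tokens seen inside it."""
    index = defaultdict(set)
    for tokens in sentences:
        padded = [BOS, *tokens, EOS]
        for i in range(1, len(padded) - 1):
            index[(padded[i - 1], padded[i + 1])].add(padded[i])
    return index


def bootstrap(sentences, seeds, suffixes, rounds=3, threshold=0.6):
    """Grow the entity set from seeds; returns (entities, pattern scores)."""
    index = context_index(sentences)
    entities = set(seeds)
    patterns = {}
    for _ in range(rounds):
        # External patterns: contexts that surround a known entity, scored by
        # the share of their fillers that are already known entities.
        patterns = {
            ctx: len(fillers & entities) / len(fillers)
            for ctx, fillers in index.items()
            if fillers & entities
        }
        accepted = set()
        for ctx, confidence in patterns.items():
            for candidate in index[ctx] - entities:
                # Internal pattern: assumed to be a simple suffix match.
                internal = 1.0 if candidate.endswith(tuple(suffixes)) else 0.0
                # Assumed equal-weight combination of both scores.
                if 0.5 * confidence + 0.5 * internal > threshold:
                    accepted.add(candidate)
        if not accepted:
            break  # nothing new was learned in this round
        entities |= accepted
    return entities, patterns


corpus = [
    ["我", "在", "南京市", "工作"],
    ["他", "在", "苏州市", "工作"],
    ["她", "在", "家里", "工作"],
]
entities, patterns = bootstrap(corpus, seeds={"南京市"}, suffixes=["市"])
```

In this toy corpus the context (在, 工作) alone cannot separate 苏州市 from 家里, since both fill it. The suffix 市 lifts 苏州市 above the threshold while 家里 stays below it, which mirrors the role the abstract assigns to internal patterns.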

Description

Technical field

[0001] The invention relates to Chinese natural language processing technology, in particular to a pattern-based bootstrap Chinese entity extraction method.

Background

[0002] Named entity recognition (also known as entity extraction) is a basic task of natural language processing and is widely used in information extraction, question answering, machine translation, and other applications. It was first proposed at the sixth MUC conference, held in 1996. Initially, its purpose was to identify named entities such as person names, place names, and organization names in a corpus. As the range of applications expanded, defining and extending entity categories became a major challenge. The main technical approaches to named entity recognition are pattern-based methods, statistics-based methods, and combinations of the two. Statistics-based methods have been widely studied in academia, and are usually used for domain-indepe...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27
CPC: G06F40/247, G06F40/295, G06F40/30
Inventor: 姜晓夏, 葛唯益, 杨岩, 贺成龙, 宗士强, 徐琳, 王羽
Owner: THE 28TH RES INST OF CHINA ELECTRONICS TECH GROUP CORP