Method and device for extracting organization names based on semantic information

A technology involving semantic information and organization names, applied in the creation of semantic tools, unstructured text data retrieval, special data processing applications, etc. It addresses problems such as poor generalization ability and poor performance.

Status: Inactive; Publication date: 2016-12-21
INSPUR QILU SOFTWARE IND
Cites: 4; Cited by: 12

AI Technical Summary

Problems solved by technology

Existing methods perform well on evaluation and training corpora but poorly in real-world environments.



Examples


Embodiment 1

[0063] A method for extracting organization names based on semantic information comprises the following steps:

[0064] The first step is to automatically extract organization names from Wikipedia, build an abbreviation dictionary, and use the dictionary to derive abbreviation features for organization names;
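The first step ([0064]) can be illustrated with a minimal Python sketch. It assumes the (full name, abbreviation) pairs have already been harvested from Wikipedia; the patent does not say which Wikipedia structures (titles, redirects, infoboxes) it mines, and the function names `build_abbreviation_dict` and `abbreviation_features` are hypothetical. Each token then receives a flag saying whether it is a known organization abbreviation, plus the number of full names it could stand for.

```python
from collections import defaultdict


def build_abbreviation_dict(pairs):
    """Map each abbreviation to the set of full organization names it abbreviates.

    `pairs` is assumed to be an iterable of (full_name, abbreviation) tuples
    already extracted from Wikipedia (hypothetical input format).
    """
    abbr_dict = defaultdict(set)
    for full_name, abbr in pairs:
        # Skip empty entries and pairs where the "abbreviation" is the full name.
        if abbr and abbr != full_name:
            abbr_dict[abbr].add(full_name)
    return dict(abbr_dict)


def abbreviation_features(tokens, abbr_dict):
    """Return one feature dict per token flagging dictionary abbreviations."""
    features = []
    for token in tokens:
        features.append({
            "is_org_abbr": token in abbr_dict,
            # How many full names this abbreviation could expand to (ambiguity).
            "abbr_ambiguity": len(abbr_dict.get(token, ())),
        })
    return features


pairs = [
    ("北京大学", "北大"),
    ("清华大学", "清华"),
    ("北京师范大学", "北师大"),
]
abbr_dict = build_abbreviation_dict(pairs)
print(abbreviation_features(["他", "在", "北大", "读书"], abbr_dict))
```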

[0065] The second step is to combine the traditional word segmentation, part-of-speech tagging and dependency tree features extracted from the training data to form the final feature set;
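The feature combination in the second step ([0065]) might look like the following sketch, which merges word, part-of-speech and dependency-tree features into one dictionary per token, the input form commonly used by linear-chain CRF toolkits. The `Token` structure, the head-index convention (0-based, -1 for the root), the one-token context window and the LTP-style tags in the example are all assumptions for illustration; the patent does not specify its feature templates.

```python
from typing import Dict, List, NamedTuple


class Token(NamedTuple):
    word: str    # segmented word
    pos: str     # part-of-speech tag from the tagger
    head: int    # 0-based index of the syntactic head, -1 for the root (assumed)
    deprel: str  # dependency relation label to the head


def token_features(sentence: List[Token], i: int) -> Dict[str, str]:
    """Combine segmentation, POS and dependency features for token i."""
    tok = sentence[i]
    feats = {
        "word": tok.word,
        "pos": tok.pos,
        "deprel": tok.deprel,
        "head_word": sentence[tok.head].word if tok.head >= 0 else "<ROOT>",
        "head_pos": sentence[tok.head].pos if tok.head >= 0 else "<ROOT>",
    }
    # Neighbouring words and tags give the CRF a small context window.
    for offset in (-1, 1):
        j = i + offset
        inside = 0 <= j < len(sentence)
        feats[f"word[{offset:+d}]"] = sentence[j].word if inside else "<PAD>"
        feats[f"pos[{offset:+d}]"] = sentence[j].pos if inside else "<PAD>"
    return feats


def sentence_features(sentence: List[Token]) -> List[Dict[str, str]]:
    """Feature dicts for every token in a sentence."""
    return [token_features(sentence, i) for i in range(len(sentence))]


sentence = [
    Token("浪潮", "nz", 1, "ATT"),
    Token("集团", "n", 2, "SBV"),
    Token("发布", "v", -1, "HED"),
    Token("产品", "n", 2, "VOB"),
]
for feats in sentence_features(sentence):
    print(feats)
```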

[0066] The third step is to preprocess the Wikipedia documents (text extraction, word segmentation, etc.), cluster the words with the CW clustering method, and use each word's category as its semantic feature;

[0067] The word clustering algorithm CW is applied to a large corpus to obtain word categories automatically.
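The third step ([0066], [0067]) relies on CW word clustering. A plausible reading is that CW is the Chinese Whispers graph clustering algorithm, which fits the abstract's description of a graph-based method prone to oscillation, but that is an assumption. The sketch below is the standard Chinese Whispers procedure, not the patent's improved version: every word starts in its own class, and words repeatedly adopt the class with the largest total edge weight among their neighbours. How the word graph is built is not stated in this excerpt, so the edge weights are taken as given (for example, co-occurrence counts from the Wikipedia corpus). A fixed seed, deterministic tie-breaking and an iteration cap keep runs repeatable; these choices do not by themselves remove oscillation.

```python
import random
from collections import defaultdict
from typing import Dict, Tuple


def chinese_whispers(edges: Dict[Tuple[str, str], float],
                     max_iter: int = 20, seed: int = 0) -> Dict[str, str]:
    """Cluster words in an undirected weighted graph with Chinese Whispers.

    `edges` maps (word_a, word_b) pairs to positive weights (assumed given).
    Returns a mapping word -> class id (class ids are word strings).
    """
    graph: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (u, v), w in edges.items():
        graph[u][v] = graph[u].get(v, 0.0) + w
        graph[v][u] = graph[v].get(u, 0.0) + w

    labels = {node: node for node in graph}  # each word starts in its own class
    nodes = list(graph)
    rng = random.Random(seed)  # fixed seed so runs are repeatable

    for _ in range(max_iter):  # iteration cap guards against endless flipping
        rng.shuffle(nodes)
        changed = False
        for node in nodes:
            # Total edge weight from this node to each neighbouring class.
            scores: Dict[str, float] = defaultdict(float)
            for neighbour, w in graph[node].items():
                scores[labels[neighbour]] += w
            # Strongest class wins; ties go to the lexicographically largest id.
            best = max(scores, key=lambda lab: (scores[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:  # converged: no word switched class in a full pass
            break
    return labels


edges = {
    ("北京大学", "清华大学"): 1.0, ("清华大学", "复旦大学"): 1.0,
    ("北京大学", "复旦大学"): 1.0, ("苹果", "香蕉"): 1.0,
    ("香蕉", "橘子"): 1.0, ("苹果", "橘子"): 1.0,
}
print(chinese_whispers(edges))
```

Each word's class id, e.g. `labels.get(word)`, can then be used as the semantic feature described in [0066].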

[0068] In the fourth step, during CRF-based training, extract the abbreviation feature of the ...
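The text of the fourth step ([0068]) is truncated, so the following is only one plausible way to package training data for a CRF: each token gets a feature dictionary that includes its word cluster (third step) and abbreviation flag (first step), and gold organization spans are converted to BIO tags. The tag names `B-ORG`/`I-ORG`/`O`, the half-open span convention, the `<UNK>` placeholder and the function names are assumptions. Lists of feature dictionaries paired with lists of tag strings are the input shape accepted by crfsuite-based trainers such as sklearn-crfsuite; the training call itself is omitted.

```python
from typing import Dict, List, Set, Tuple


def bio_labels(n_tokens: int, org_spans: List[Tuple[int, int]]) -> List[str]:
    """Convert gold organization spans [start, end) over token indices into BIO tags."""
    labels = ["O"] * n_tokens
    for start, end in org_spans:
        labels[start] = "B-ORG"
        for k in range(start + 1, end):
            labels[k] = "I-ORG"
    return labels


def crf_instance(tokens: List[str],
                 org_spans: List[Tuple[int, int]],
                 clusters: Dict[str, str],
                 abbreviations: Set[str]) -> Tuple[List[Dict[str, object]], List[str]]:
    """Build one (features, labels) training pair for a linear-chain CRF.

    `clusters` maps words to cluster ids (e.g. from CW clustering) and
    `abbreviations` stands in for the abbreviation dictionary; both are
    hypothetical inputs for this sketch.
    """
    features: List[Dict[str, object]] = []
    for tok in tokens:
        features.append({
            "word": tok,
            # Semantic feature: the word's cluster id, or a placeholder if unseen.
            "cluster": clusters.get(tok, "<UNK>"),
            # Abbreviation feature: is this token a known organization abbreviation?
            "is_org_abbr": tok in abbreviations,
        })
    return features, bio_labels(len(tokens), org_spans)


X, y = crf_instance(
    ["浪潮", "集团", "和", "北大", "合作"],
    org_spans=[(0, 2), (3, 4)],
    clusters={"浪潮": "c7", "北大": "c3"},
    abbreviations={"北大"},
)
print(y)  # ['B-ORG', 'I-ORG', 'O', 'B-ORG', 'O']
```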


Abstract

The invention discloses a method and device for extracting organization names based on semantic information. The device comprises an abbreviation dictionary construction module, a word clustering module, a CRF training module and a CRF recognition module. Compared with the prior art, the invention provides a device for extracting organization names based on semantic information, together with a method for automatically building organization name dictionaries from Wikipedia; a graph-based clustering algorithm is used to cluster words, and the class features of words serve as semantic features; the graph clustering algorithm CW is improved to solve its oscillation problem; and a test corpus containing a large number of unregistered (out-of-vocabulary) organization names is built, which makes the evaluation more convincing. Compared with the best current open-source tools, the device improves the F1 score by about 8%.
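The abstract reports an F1 gain of about 8% but does not describe the scoring protocol. As an assumption, the sketch below uses the common exact-match, entity-level definition: BIO tags (such as a CRF recognition module might output) are decoded into organization spans, and precision, recall and F1 are computed over spans that match the gold standard exactly. It assumes the tag set is limited to `B-ORG`, `I-ORG` and `O`; the function names are hypothetical.

```python
from typing import List, Set, Tuple


def bio_to_spans(tags: List[str]) -> Set[Tuple[int, int]]:
    """Decode BIO tags into a set of [start, end) organization spans."""
    spans: Set[Tuple[int, int]] = set()
    start = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        if tag in ("B-ORG", "O") or (tag == "I-ORG" and start is None):
            if start is not None:
                spans.add((start, i))
                start = None
            if tag != "O":
                start = i  # B-ORG, or a stray I-ORG treated as a span start
        # An I-ORG inside an open span simply extends it.
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall and F1 over sentences."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [["B-ORG", "I-ORG", "O", "B-ORG", "O"]]
pred = [["B-ORG", "I-ORG", "O", "O", "B-ORG"]]
print(entity_f1(gold, pred))  # (0.5, 0.5, 0.5)
```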

Description

Technical field

[0001] The invention relates to the field of organization name recognition, in particular to a method and device for extracting organization names based on semantic information.

Background art

[0002] Named entity recognition and relation extraction are the processes of extracting entity-related knowledge from text. They are important information extraction tasks and the basis of many natural language processing fields, and they have significant research and application value.

[0003] The earliest approach to named entity recognition is rule-based: it uses lexical rules, grammatical rules and even semantic rules to identify named entities. The rules are generally written manually by domain experts, or new rules are learned from a training corpus on top of hand-written ones. Named entity recognition then becomes a process of rule matching. The rule-based method is easy to implement and has high precision, but the recall ...
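The rule-matching process described in [0003] can be illustrated with a toy suffix rule over segmented text. The suffix list, the rule and the function name `rule_based_orgs` are hypothetical examples, not taken from the patent. The example also hints at why such rules tend to have limited recall: an organization name without a listed suffix, such as 浪潮 (Inspur), is missed.

```python
from typing import List, Tuple

# Hypothetical hand-written rule: a token that ends with a typical
# organization suffix (and has something before the suffix) is an organization.
ORG_SUFFIXES = ("公司", "集团", "大学", "银行", "研究所")


def rule_based_orgs(tokens: List[str]) -> List[Tuple[int, str]]:
    """Return (index, token) for every token matched by the suffix rule."""
    matches = []
    for i, tok in enumerate(tokens):
        for suffix in ORG_SUFFIXES:
            if tok.endswith(suffix) and len(tok) > len(suffix):
                matches.append((i, tok))
                break  # one matching rule is enough
    return matches


tokens = ["他", "毕业", "于", "北京大学", "，", "现在", "浪潮", "工作"]
print(rule_based_orgs(tokens))  # [(3, '北京大学')]; 浪潮 has no listed suffix
```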


Application Information

IPC(8): G06F17/30
CPC: G06F16/355; G06F16/3325; G06F16/36
Inventors: 毛立花 (Mao Lihua), 唐旋 (Tang Xuan), 崔乐乐 (Cui Lele)
Owner: INSPUR QILU SOFTWARE IND