Method for identifying Cambodian organization names

A recognition method for Cambodian organization names based on the Tri-training algorithm, in the technical field of natural language processing

Inactive Publication Date: 2017-05-31
KUNMING UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0003] The technical problem to be solved by the present invention is to provide a method for identifying the names of Khmer organizati



Examples


Embodiment 1

[0036] Embodiment 1: As shown in Figures 1-3, a method for identifying Cambodian organization names comprises the following steps:

[0037] Step1. First, split the extracted Cambodian text into sentences. Apply word segmentation and part-of-speech tagging to the sentences, proofread the results manually, and then annotate the Cambodian named entities to obtain a sizable annotated corpus of Cambodian organization names;

[0038] Step2. Extract named-entity indicator words from the annotated corpus, build an indicator-word lexicon, construct a feature template, and learn an organization name recognition model with the improved Tri-training algorithm;

[0039] Step3. Apply the organization name recognition model to the selected test corpus to obtain the organization name labeling results.
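
The learning loop in Step2 and the labeling in Step3 can be illustrated with a minimal sketch of standard tri-training. This excerpt does not say what the patent's "improved" variant changes, so the code below is not the inventors' algorithm: it is a simplified version of plain tri-training that omits the error-rate checks of the original algorithm. The names `NaiveBayes`, `tri_train`, and `vote`, the Naive Bayes base learner, and the toy feature strings (stand-ins for the output of a feature template and an indicator-word lookup) are all assumptions made for illustration.

```python
import math
import random
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny Naive Bayes over string features; a stand-in base learner."""

    def fit(self, examples):
        self.label_counts = Counter(label for _, label in examples)
        self.feat_counts = defaultdict(Counter)
        for feats, label in examples:
            self.feat_counts[label].update(feats)
        self.vocab = {f for feats, _ in examples for f in feats}
        return self

    def predict(self, feats):
        total = sum(self.label_counts.values())
        scores = {}
        for label, n in self.label_counts.items():
            # Add-one smoothing so unseen features do not zero out a label.
            denom = sum(self.feat_counts[label].values()) + len(self.vocab) + 1
            scores[label] = math.log(n / total) + sum(
                math.log((self.feat_counts[label][f] + 1) / denom) for f in feats
            )
        return max(scores, key=scores.get)


def tri_train(labeled, unlabeled, rounds=5, seed=0):
    """Simplified tri-training (assumption: no error-rate checks)."""
    rng = random.Random(seed)
    # Start from three bootstrap samples of the labeled data for diversity.
    models = [
        NaiveBayes().fit([rng.choice(labeled) for _ in labeled])
        for _ in range(3)
    ]
    previous = None
    for _ in range(rounds):
        pseudo_sets = []
        for i in range(3):
            j, k = (m for m in range(3) if m != i)
            # Unlabeled examples on which the two other models agree
            # become pseudo-labeled training data for model i.
            pseudo = []
            for x in unlabeled:
                label = models[j].predict(x)
                if label == models[k].predict(x):
                    pseudo.append((x, label))
            pseudo_sets.append(pseudo)
        if pseudo_sets == previous:  # pseudo-labels stopped changing
            break
        # Retrain each model on the labeled data plus its pseudo-labeled set.
        models = [NaiveBayes().fit(labeled + p) for p in pseudo_sets]
        previous = pseudo_sets
    return models


def vote(models, feats):
    """Label one example by majority vote of the three classifiers."""
    return Counter(m.predict(feats) for m in models).most_common(1)[0][0]


# Hypothetical toy data: each example is a tuple of feature strings.
labeled = [
    (("has_indicator", "pos=NOUN"), "ORG"),
    (("has_indicator", "pos=PROPN"), "ORG"),
    (("no_indicator", "pos=VERB"), "O"),
    (("no_indicator", "pos=NOUN"), "O"),
]
unlabeled = [
    ("has_indicator", "pos=NOUN"),
    ("no_indicator", "pos=VERB"),
    ("has_indicator", "pos=PROPN"),
]
models = tri_train(labeled, unlabeled)
print(vote(models, ("has_indicator", "pos=PROPN")))
```

The point of the three-classifier design is that each model receives pseudo-labels only where the other two agree, which lets a small manually annotated corpus be extended with unlabeled text. A real system would replace the toy tuples with features generated from segmented, part-of-speech-tagged Cambodian sentences.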

[0040] Further, the specific steps for obtaining the annotated organization name corpus in Step1 are as follows:

[0041] Step1.1. First, use a crawler program to crawl the web page...


Abstract

The invention relates to a method for identifying Cambodian organization names and belongs to the technical field of natural language processing. The method first splits the extracted Cambodian text into sentences, performs word segmentation and part-of-speech tagging on the sentences, and, after manual checking, annotates the Cambodian named entities to obtain a sizable corpus of Cambodian organization names. Named-entity indicator words are then extracted from the annotated corpus to build an indicator-word lexicon and feature templates, and an organization name recognition model is learned with an improved Tri-training algorithm. Finally, the model is applied to the selected test corpus to obtain the organization name labeling results. The method identifies Cambodian organization names effectively and supports tasks such as information extraction and machine translation. At the time of filing, no work on Cambodian organization name identification had been reported.

Description

Technical field

[0001] The invention relates to a method for recognizing organization names in Cambodian, in particular to a method for recognizing Cambodian organization names based on a Tri-training algorithm, and belongs to the technical field of natural language processing.

Background

[0002] Cambodian, also known as Khmer, belongs to the Mon-Khmer branch of the Austroasiatic language family and is the official language of Cambodia. Exchanges between China and Cambodia are increasingly frequent in many fields, yet lexical analysis of Cambodian text remains scarce. Research on Cambodian named entity recognition is therefore of great significance for the political and economic analysis of Cambodia and for tracking public opinion. Lexical analysis of Cambodian, and especially Cambodian named entity recognition, requires a lot...


Application Information

IPC(8): G06F17/27
CPC: G06F40/295, G06F40/30
Inventor: 严馨 (Yan Xin), 王若兰 (Wang Ruolan), 余正涛 (Yu Zhengtao), 郭剑毅 (Guo Jianyi)
Owner: KUNMING UNIV OF SCI & TECH