A neural network method for Cambodian entity recognition based on topic model word vector

Topic model and entity recognition technology, applied to neural network Cambodian entity recognition, addresses the problems of low recognition accuracy, polysemy, and homonymy, and achieves high recognition accuracy.

Inactive Publication Date: 2019-01-15
KUNMING UNIV OF SCI & TECH
Cites: 5 | Cited by: 2

AI Technical Summary

Problems solved by technology

[0005] The present invention provides a neural network Khmer entity recognition method based on topic model word vectors. It addresses the low recognition accuracy of Khmer named entities and the problems of polysemy and homonymy in Khmer entity recognition.

Examples

Embodiment 1

[0035] Embodiment 1: As shown in Figure 1, a neural network Khmer entity recognition method based on topic model word vectors proceeds as follows. First, a Khmer text corpus is obtained and preprocessed. A topic model is then constructed for the preprocessed text and used to obtain the topic number of each word in the text; this topic number is treated as a pseudo-word. The preprocessed text and the pseudo-words are placed in the same corpus text, and the skip-gram model is used to obtain, at the same time, the word vector of each word and the topic vector corresponding to that word. The word vector and the topic vector are concatenated to obtain the topic word vector. Finally, the topic word vector is fed as an input feature into the constructed deep learning model to perform Khmer entity recognition.
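The data flow in paragraph [0035] can be illustrated with a minimal Python sketch. It is not the patented implementation. Several details are assumptions made only for illustration: the pseudo-word format `<topic_k>`, the choice to interleave each pseudo-word directly after its word, the window size, the placeholder tokens `w1` to `w3`, and the hand-written toy vectors, which stand in for vectors that a trained skip-gram model would produce. All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def topic_token(topic_id: int) -> str:
    """Hypothetical pseudo-word format for a topic number."""
    return f"<topic_{topic_id}>"


def build_mixed_sentence(words: Sequence[str], topic_ids: Sequence[int]) -> List[str]:
    """Place each word and its topic pseudo-word in the same token sequence.

    Interleaving is an assumption; the source only says that words and
    pseudo-words go into the same corpus text.
    """
    if len(words) != len(topic_ids):
        raise ValueError("each word needs exactly one topic number")
    mixed: List[str] = []
    for word, topic_id in zip(words, topic_ids):
        mixed.append(word)
        mixed.append(topic_token(topic_id))
    return mixed


def skipgram_pairs(sentence: Sequence[str], window: int = 1) -> List[Tuple[str, str]]:
    """Enumerate (center, context) training pairs as skip-gram would see them."""
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(sentence):
        start = max(0, i - window)
        stop = min(len(sentence), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, sentence[j]))
    return pairs


def topic_word_vector(word: str, topic_id: int,
                      vectors: Dict[str, List[float]]) -> List[float]:
    """Concatenate a word vector with the vector of its topic pseudo-word."""
    return list(vectors[word]) + list(vectors[topic_token(topic_id)])


# Toy input: placeholder tokens and topic numbers (not real corpus data).
words = ["w1", "w2", "w3"]
topic_ids = [0, 1, 0]

mixed = build_mixed_sentence(words, topic_ids)
pairs = skipgram_pairs(mixed, window=1)

# Hand-written stand-ins for vectors a trained skip-gram model would output.
toy_vectors = {
    "w1": [0.1, 0.2],
    "w2": [0.3, 0.4],
    "w3": [0.5, 0.6],
    "<topic_0>": [0.9, 0.8],
    "<topic_1>": [0.7, 0.6],
}
feature = topic_word_vector("w1", 0, toy_vectors)
print(mixed)
print(feature)
```

Because words and pseudo-words share one vocabulary, a single skip-gram training run yields both kinds of vector, and the concatenated feature lets the same surface word receive different inputs under different topics.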

[0036] Further, the specific steps of the method are as follows:

[0037] Step1. First, u...

Abstract

The invention relates to a neural network Cambodian entity recognition method based on topic model word vectors, belonging to the technical field of natural language processing. The invention first obtains a Cambodian text corpus and preprocesses it, then constructs a topic model. The topic number of each word in the text is obtained with the constructed topic model, and the topic number is treated as a pseudo-word. The preprocessed text and the pseudo-words are put into the same corpus text, and the skip-gram model is used to obtain the word vector of each word and the topic vector corresponding to each word in the text. The word vector and the topic vector are concatenated to obtain a topic word vector. Finally, the topic word vector is fed as an input feature into the constructed deep learning model to perform Cambodian entity recognition. The invention better handles the polysemy and homonymy present in the text and achieves high recognition accuracy for Cambodian named entities.

Description

Technical field

[0001] The invention relates to a neural network Cambodian entity recognition method based on topic model word vectors, and belongs to the technical field of natural language processing.

Background art

[0002] With the rapid development of the modern economy, exchanges and cooperation between China and Southeast Asian countries have become more and more frequent, and exchanges and cooperation with Cambodia in economics, culture, education, and other areas are increasing year by year. As relations between China and Cambodia grow closer, it is particularly important to pay attention to and learn about Cambodian culture, but the language barrier between the two countries makes this difficult. There is therefore a growing need for natural language processing techniques to overcome these difficulties.

[0003] Cambodian language, also known as Khmer language, belongs to the Khme...

Application Information

IPC(8): G06F17/27, G06F16/951, G06N20/00
CPC: G06F40/295
Inventor: 严馨, 谢俊, 徐广义, 张磊, 周枫, 郭剑毅
Owner KUNMING UNIV OF SCI & TECH