A method and system for processing the Uyghur language based on the Latin alphabet

A Uyghur language processing technology, applied in the fields of digital data processing, natural language data processing, neural learning methods, and related areas. It addresses problems such as the lack of effective data samples and the inability to form an accurate expression of Uyghur vocabulary features.

Active Publication Date: 2020-12-22
北京一览群智数据科技有限责任公司
Cites: 3; Cited by: 0

AI Technical Summary

Problems solved by the technology

[0005] In view of the above problems, embodiments of the present invention provide a Uyghur language processing method and system based on Latin letters. They solve the technical problem that existing language model training lacks valid data samples and cannot form an accurate expression of Uyghur vocabulary features.

Embodiment Construction

[0041] To make the purpose, technical solutions, and advantages of the present invention clearer, the present invention is further described below with reference to the accompanying drawings and specific embodiments. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

[0042] One embodiment of the present invention provides a Uyghur language processing method based on Latin letters, shown in Figure 1. As shown in Figure 1, this embodiment includes:

[0043] Step 100: Establish an alphabetic index of the Uyghur corpus, form basic vectors of the Uyghur corpus according to the alphabetic index, and use the basic vectors to form a Uyghur sentence training set.
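
The Python sketch below roughly illustrates Step 100. It builds a character index from a toy corpus, turns each character into a one-hot basic vector, and collects each sentence's vectors into a training set. Several details are assumptions for illustration: the one-hot encoding, the sorted index order, indexing the space as a word separator, and the helper names (`build_letter_index`, `sentence_to_vectors`, `build_training_set`). The excerpt does not say how the patent's basic vectors are actually constructed.

```python
def build_letter_index(corpus):
    """Map every distinct character in the corpus (letters, plus the space
    used as a word separator, by assumption) to an integer, in sorted order."""
    chars = sorted({ch for sentence in corpus for ch in sentence.lower()})
    return {ch: i for i, ch in enumerate(chars)}


def one_hot(i, size):
    """Hypothetical basic vector for index i: a one-hot list of length `size`."""
    return [1.0 if j == i else 0.0 for j in range(size)]


def sentence_to_vectors(sentence, index):
    """Encode one sentence as a sequence of one-hot character vectors."""
    return [one_hot(index[ch], len(index)) for ch in sentence.lower()]


def build_training_set(corpus, index):
    """Sentence training set: one vector sequence per sentence."""
    return [sentence_to_vectors(s, index) for s in corpus]


# Illustrative Latin-script strings, not data from the patent.
corpus = ["salam dunya", "kitab"]
index = build_letter_index(corpus)
training_set = build_training_set(corpus, index)
print(len(index))            # 13 distinct characters, including the space
print(len(training_set[1]))  # "kitab" -> 5 character vectors
```

A letter-level index stays small and closed, so any word written in the script can be encoded. This is presumably what makes it attractive when word-level data is scarce.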

[0044] The Uyghur corp...

Abstract

The invention provides a Uyghur language processing method and system based on Latin letters. It solves the technical problem that existing language model training lacks effective data samples and cannot form an accurate expression of Uyghur vocabulary features. The method comprises:

1. Establishing a letter index of the Uyghur corpus, forming basic vectors of the Uyghur corpus according to the letter index, and using the basic vectors to form a Uyghur sentence training set.
2. Training a recurrent neural network on the sentence training set to form a Uyghur sentence model.
3. Obtaining, according to the Uyghur sentence model, semantic hidden feature vectors of Uyghur vocabulary to form word vectors.

The method helps form a vector dimension space suited to practical semantic processing tasks and provides a sound sample measurement basis for specific semantic processing tasks. It also avoids a serious defect of existing recurrent neural network structures, which lack effective identification of implicit word-level correlations in Uyghur.
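
The abstract's pipeline first runs a recurrent network over letter sequences and then reads word-level hidden feature vectors from it. The Python sketch below gives a rough idea of how that might look. It shows only the forward pass of a plain Elman RNN with random, untrained weights, and it takes each word's vector to be the hidden state at the word's last letter. The network type, the hidden size, that read-out rule, and all function names (`make_rnn`, `rnn_forward`, `word_vectors`) are hypothetical choices. The training objective is not specified in this excerpt.

```python
import math
import random


def encode(sentence, index):
    """One-hot character vectors (index built from the sentence here)."""
    return [[1.0 if j == index[ch] else 0.0 for j in range(len(index))]
            for ch in sentence]


def make_rnn(input_size, hidden_size, seed=0):
    """Untrained Elman RNN parameters; training would learn these values."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {"W_xh": mat(hidden_size, input_size),
            "W_hh": mat(hidden_size, hidden_size),
            "b_h": [0.0] * hidden_size}


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_forward(params, inputs):
    """h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h); returns every hidden state."""
    h = [0.0] * len(params["b_h"])
    states = []
    for x in inputs:
        pre = [a + r + b for a, r, b in zip(matvec(params["W_xh"], x),
                                            matvec(params["W_hh"], h),
                                            params["b_h"])]
        h = [math.tanh(p) for p in pre]
        states.append(h)
    return states


def word_vectors(sentence, states):
    """Pair each word with the hidden state at its last letter (assumption)."""
    pairs, start = [], 0
    for word in sentence.split(" "):
        end = start + len(word)  # one past the word's last letter
        if word:
            pairs.append((word, states[end - 1]))
        start = end + 1          # skip the single separating space
    return pairs


sentence = "salam dunya"
index = {ch: i for i, ch in enumerate(sorted(set(sentence)))}
params = make_rnn(len(index), hidden_size=8)
states = rnn_forward(params, encode(sentence, index))
for word, vec in word_vectors(sentence, states):
    print(word, [round(v, 3) for v in vec[:3]])
```

Each word's vector depends on the letters that precede it, so the same word can receive different vectors in different sentences. Averaging a word's per-letter hidden states would be another possible read-out.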

Description

Technical Field

[0001] The invention relates to the technical field of natural language recognition, and in particular to a method and system for processing the Uyghur language based on Latin letters.

Background

[0002] In the prior art, semantic processing of human natural language is usually performed by training a language model, and a good language model can greatly improve the accuracy of natural language processing. When the byte-pair-encoding (BPE) algorithm is used, low-frequency words tend to be missing from the resulting corpus dictionary. The Word2Vec algorithm generates a static word vector of a specified dimension for each word, reflecting each word's hidden features through the richness of its dimensions. However, it is prone to OOV (out-of-vocabulary) problems because of the limited capacity of the lexicon. This type of model facilitates natural language semantic processing tasks, but its disadvantage is that it ignores wo...
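
A small Python comparison makes the OOV problem described in [0002] concrete. A fixed word-level lexicon cannot map a word it has never seen, whereas a letter-level index can still encode it. The lexicon, the sample strings, and the 26-letter inventory are illustrative assumptions, not data from the patent.

```python
# Assumed letter inventory: the 26 ASCII letters only (an illustrative
# subset; Uyghur Latin script also uses letters such as é, ö, ü).
LETTERS = "abcdefghijklmnopqrstuvwxyz"
LETTER_INDEX = {ch: i for i, ch in enumerate(LETTERS)}


def word_level_lookup(sentence, vocab):
    """Word IDs from a fixed lexicon; None marks an out-of-vocabulary word."""
    return [vocab.get(w) for w in sentence.split()]


def letter_level_lookup(sentence, letter_index):
    """Letter IDs per word; any word spelled with known letters is covered."""
    return [[letter_index[ch] for ch in w] for w in sentence.split()]


vocab = {"kitab": 0, "salam": 1}  # tiny hypothetical lexicon
sentence = "salam kitablar"       # "kitablar" is a suffixed form absent from vocab
print(word_level_lookup(sentence, vocab))           # [1, None]
print(letter_level_lookup(sentence, LETTER_INDEX))  # [[18, 0, 11, 0, 12], [10, 8, 19, 0, 1, 11, 0, 17]]
```

In an agglutinative language, suffixation produces many surface forms that a word-level lexicon will never have seen. A letter-level representation sidesteps this, at the cost of longer input sequences.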

Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06F40/30; G06F40/289; G06F40/284; G06F40/129; G06F16/31; G06N3/04; G06N3/08
CPC: G06F16/316; G06N3/08; G06N3/044; G06N3/045
Inventors: 钱泓锦, 黄真, 窦志成, 刘占亮
Owner: 北京一览群智数据科技有限责任公司