Uygur language processing method and system based on Latin letters

A Uyghur language processing method, applied in electronic digital data processing, natural language data processing, and neural learning methods, that addresses the lack of effective data samples and the resulting inability to form accurate expressions of Uyghur lexical features.

Active Publication Date: 2020-07-17
北京一览群智数据科技有限责任公司

AI Technical Summary

Problems solved by technology

[0005] In view of the above problems, embodiments of the present invention provide a Uyghur language processing method and system based on the Latin alphabet, which solve the technical problem that existing language model training lacks valid data samples and therefore cannot form an accurate representation of Uyghur vocabulary features.



Detailed Description of the Embodiments

[0041] To make the purpose, technical solution, and advantages of the present invention clearer, the present invention is further described below with reference to the accompanying drawings and specific embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by persons of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

[0042] A Uyghur language processing method based on Latin letters according to one embodiment of the present invention is shown in Figure 1. In Figure 1, this embodiment includes:

[0043] Step 100: Establish an alphabetic index of the Uyghur corpus, form the basic vectors of the Uyghur corpus according to the alphabetic index, and use the basic vectors to form a Uyghur sentence training set.
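
Step 100 can be sketched roughly as follows. This is a minimal illustration only: it assumes the alphabetic index maps each distinct character of the Latin-script corpus to an integer, and that a "basic vector" is a one-hot vector per letter. The excerpt does not spell out either detail, and the function names and sample strings are hypothetical placeholders, not taken from the patent.

```python
from typing import Dict, List


def build_letter_index(corpus: List[str]) -> Dict[str, int]:
    """Map every distinct character in the corpus to an integer index.

    Characters are sorted so the index is deterministic (an assumption;
    the source does not say how indices are assigned).
    """
    letters = sorted({ch for sentence in corpus for ch in sentence})
    return {ch: i for i, ch in enumerate(letters)}


def one_hot(index: int, size: int) -> List[int]:
    """Return a one-hot vector of length `size` with a 1 at `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec


def sentence_to_vectors(sentence: str, letter_index: Dict[str, int]) -> List[List[int]]:
    """Encode a sentence as a sequence of per-letter basic vectors."""
    size = len(letter_index)
    return [one_hot(letter_index[ch], size) for ch in sentence]


# Placeholder Latin-script strings used purely for illustration.
corpus = ["men kitab oquymen", "u mektepke baridu"]
letter_index = build_letter_index(corpus)

# The sentence training set: one sequence of basic vectors per sentence.
training_set = [sentence_to_vectors(s, letter_index) for s in corpus]
```

Indexing letters rather than whole words keeps the vocabulary small and fixed, which is what lets later steps assign a representation to any word spelled with indexed letters.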

[0044] The Uyghur corp...


Abstract

The invention provides a Uyghur language processing method and system based on Latin letters, and solves the technical problem that existing language model training lacks effective data samples and cannot form an accurate representation of Uyghur vocabulary features. The method comprises the following steps: establishing a letter index of the Uyghur corpus, forming basic vectors of the Uyghur corpus according to the letter index, and forming a Uyghur sentence training set from the basic vectors; training a recurrent neural network on the sentence training set to form a Uyghur sentence model; and obtaining implicit semantic feature vectors of Uyghur words from the sentence model to form word vectors. A vector dimension space adapted to the actual semantic processing task can thus be formed, providing a good sample measurement basis for the specific task. This avoids the serious defect that existing recurrent neural network structures cannot effectively recognize implicit correlations in Uyghur at the word level.
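
The last step of the abstract, deriving a word vector from a letter-level recurrent model, can be illustrated with a small sketch. It rests on several assumptions not confirmed by the excerpt: a plain Elman-style recurrent cell, random untrained weights (training is omitted entirely), and the final hidden state taken as the word vector. The class `TinyLetterRNN` and the sample words are hypothetical.

```python
import math
import random
from typing import Dict, List


def init_matrix(rows: int, cols: int, rng: random.Random) -> List[List[float]]:
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyLetterRNN:
    """Untrained Elman-style RNN that consumes one letter index per step."""

    def __init__(self, vocab_size: int, hidden_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        self.w_xh = init_matrix(hidden_size, vocab_size, rng)   # input -> hidden
        self.w_hh = init_matrix(hidden_size, hidden_size, rng)  # hidden -> hidden

    def step(self, letter_id: int, h: List[float]) -> List[float]:
        # With a one-hot input, W_xh @ x is simply column `letter_id` of W_xh.
        return [
            math.tanh(
                self.w_xh[i][letter_id]
                + sum(self.w_hh[i][j] * h[j] for j in range(self.hidden_size))
            )
            for i in range(self.hidden_size)
        ]

    def word_vector(self, word: str, letter_index: Dict[str, int]) -> List[float]:
        """Run the cell over the word's letters; return the final hidden state."""
        h = [0.0] * self.hidden_size
        for ch in word:
            h = self.step(letter_index[ch], h)
        return h


# Assumed letter index over the basic Latin alphabet, for illustration only.
letter_index = {ch: i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz")}
rnn = TinyLetterRNN(vocab_size=len(letter_index), hidden_size=8)

vec_a = rnn.word_vector("kitab", letter_index)
vec_b = rnn.word_vector("kitablar", letter_index)
```

Because the model reads letters, any word spelled with indexed letters receives a vector, including inflected forms never seen as whole words; the hidden size sets the dimension of the resulting word vectors.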

Description

Technical field

[0001] The invention relates to the technical field of natural language recognition, and in particular to a method and system for processing the Uyghur language based on Latin letters.

Background

[0002] In the prior art, semantic processing of human natural language is usually performed by training a language model, and a good language model can greatly improve the accuracy of natural language processing. When the byte-pair encoding (BPE) algorithm is used, low-frequency words tend to be missing from the resulting corpus dictionary. The Word2Vec algorithm generates a static word vector of a specified dimension for each word, reflecting the hidden features of each word through the richness of its dimensions, but it is prone to out-of-vocabulary (OOV) problems because of the limited capacity of the lexicon. This type of model facilitates the development of natural language semantic processing tasks, but the disadvantage is that it ignores wo...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/30, G06F40/289, G06F40/284, G06F40/129, G06F16/31, G06N3/04, G06N3/08
CPC: G06F16/316, G06N3/08, G06N3/044, G06N3/045
Inventor: 钱泓锦, 黄真, 窦志成, 刘占亮
Owner: 北京一览群智数据科技有限责任公司