A method and system for processing Uyghur language based on Latin alphabet

A Uyghur language processing method, applied in the fields of digital data processing, natural language data processing, neural learning methods, etc. It can solve problems such as the lack of effective data samples and the inability to form an accurate expression of Uyghur vocabulary features.

Active Publication Date: 2020-12-22

北京一览群智数据科技有限责任公司

Cites: 3; Cited by: 0

AI Technical Summary

Problems solved by technology

[0005] In view of the above problems, an embodiment of the present invention provides a Latin-letter-based Uyghur language processing method and system, which solves the technical problem that existing language model training lacks valid data samples and cannot form an accurate expression of Uyghur vocabulary features.

Embodiment Construction

[0041] In order to make the purpose, technical solution and advantages of the present invention clearer, the present invention is further described below in conjunction with the accompanying drawings and specific embodiments. Apparently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

[0042] One embodiment of the present invention, a Latin-letter-based Uyghur language processing method, is shown in Figure 1. As shown in Figure 1, this embodiment includes:

[0043] Step 100: Establish an alphabetic index of the Uyghur corpus, form the basic vectors of the Uyghur corpus according to the alphabetic index, and use the basic vectors to form a Uyghur sentence training set.
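Step 100 can be sketched in Python as follows. The text here does not say how the alphabetic index or the basic vectors are built, so this sketch assumes the index simply enumerates every distinct character of the Latin-script corpus, that each basic vector is a one-hot vector over that index, and that a sentence training sample is a next-letter prediction pair. The names `build_letter_index`, `basic_vector`, `sentence_training_set`, the `<pad>`/`<eos>` symbols and the two toy sentences are all hypothetical.

```python
import unicodedata

# Reserved symbols: hypothetical choices for this sketch, not from the patent.
PAD, EOS = "<pad>", "<eos>"


def build_letter_index(corpus):
    """Assign an integer index to every distinct character in the corpus.

    Each Unicode character is treated as one letter; Latin-script digraphs
    such as "sh" or "ng" are NOT merged (an assumption for this sketch).
    """
    letters = sorted({ch for s in corpus for ch in unicodedata.normalize("NFC", s)})
    index = {PAD: 0, EOS: 1}
    for ch in letters:
        index[ch] = len(index)
    return index


def basic_vector(symbol, index):
    """One-hot basic vector for a single letter (assumed encoding)."""
    vec = [0] * len(index)
    vec[index[symbol]] = 1
    return vec


def sentence_training_set(corpus, index):
    """Turn each sentence into (input_ids, target_ids) for next-letter prediction."""
    pairs = []
    for s in corpus:
        ids = [index[ch] for ch in unicodedata.normalize("NFC", s)] + [index[EOS]]
        pairs.append((ids[:-1], ids[1:]))  # targets are inputs shifted by one
    return pairs


corpus = ["men kitab oqudum", "u mektepke bardi"]  # toy sentences
index = build_letter_index(corpus)
train = sentence_training_set(corpus, index)
print(len(index), train[0][0][:3], train[0][1][-1])  # → 17 [9, 6, 10] 1
```

Treating each Unicode character as a letter keeps the index small and lets any word written in the script be encoded; a real system might instead merge digraphs such as "sh" or "ng" into single symbols.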

[0044] The Uyghur corp...

Abstract

The invention provides a Latin-letter-based Uyghur language processing method and system, which solves the technical problem that existing language model training lacks effective data samples and cannot form accurate expressions of Uyghur vocabulary features. The method comprises: establishing a letter index of the Uyghur corpus, forming basic vectors of the Uyghur corpus according to the letter index, and using the basic vectors to form a Uyghur sentence training set; training a recurrent neural network with the sentence training set to form a Uyghur sentence model; and obtaining, according to the Uyghur sentence model, the semantic hidden feature vectors of Uyghur vocabulary to form word vectors. This helps form a vector dimension space suited to actual semantic processing tasks and provides a good sample measurement basis for specific semantic processing tasks. It also avoids the serious defect that existing recurrent neural network structures cannot effectively identify implicit correlations in Uyghur at the word level.
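The second and third steps of the abstract (training a recurrent network on the sentence set, then reading word vectors from its hidden features) might look roughly like the following minimal Python sketch. It assumes a plain Elman RNN over one-hot letter inputs and assumes a word's vector is the hidden state at the word's last letter; the abstract does not specify either choice. Training (for example a softmax next-letter output layer optimised with backpropagation through time) is omitted, and random weights stand in for a trained model. All function names are hypothetical.

```python
import math
import random


def make_rnn(vocab_size, hidden_size, seed=0):
    """Randomly initialised Elman RNN weights (stand-ins for trained weights)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {"W_xh": mat(hidden_size, vocab_size),
            "W_hh": mat(hidden_size, hidden_size),
            "b_h": [0.0] * hidden_size}


def rnn_hidden_states(params, ids):
    """h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h), with x_t a one-hot letter vector."""
    h = [0.0] * len(params["b_h"])
    states = []
    for i in ids:
        # W_xh @ one_hot(i) is simply column i of W_xh; the old h is used throughout.
        h = [math.tanh(params["W_xh"][r][i] + params["b_h"][r]
                       + sum(w * hv for w, hv in zip(params["W_hh"][r], h)))
             for r in range(len(h))]
        states.append(h)
    return states


def word_vectors(params, sentence, index):
    """Hypothetical readout: each word's vector is the hidden state at its last
    letter, computed while reading the whole sentence (so it carries context)."""
    states = rnn_hidden_states(params, [index[ch] for ch in sentence])
    vectors, pos = {}, 0
    for word in sentence.split(" "):
        if word:  # skip empty strings produced by repeated spaces
            vectors[word] = states[pos + len(word) - 1]
        pos += len(word) + 1  # +1 for the separating space
    return vectors


sentence = "men kitab oqudum"  # toy sentence
index = {ch: i for i, ch in enumerate(sorted(set(sentence)))}
params = make_rnn(vocab_size=len(index), hidden_size=8)
vecs = word_vectors(params, sentence, index)
print({w: len(v) for w, v in vecs.items()})  # → {'men': 8, 'kitab': 8, 'oqudum': 8}
```

Because each hidden state depends on everything read before it in the sentence, the same word can receive different vectors in different sentences, unlike a static Word2Vec table.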

Description

Technical field

[0001] The invention relates to the technical field of natural language recognition, in particular to a Latin-letter-based method and system for processing the Uyghur language.

Background art

[0002] In the prior art, semantic processing of human natural language is usually performed by training a language model, and a good language model can greatly improve processing accuracy. When the byte-pair encoding (BPE) algorithm is used, low-frequency words can be missing from the resulting corpus dictionary. The Word2Vec algorithm generates a static word vector of a specified dimension for each word, reflecting each word's hidden features through the richness of its dimensions, but because it is limited by the capacity of its lexicon it is prone to out-of-vocabulary (OOV) problems. This type of model facilitates natural language semantic processing tasks, but the disadvantage is that it ignores wo...
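The out-of-vocabulary limitation described above can be illustrated with a toy Python comparison. It assumes a static embedding table keyed by whole words (as in Word2Vec) and a letter-level index of the kind the invention builds; the vectors, the letter set and the function names are made up for illustration. The plural form "kitablar" (books) is absent from the word table even though "kitab" (book) is present, yet every one of its letters is covered by the letter index.

```python
def static_lookup(word, embeddings):
    """Word2Vec-style static table: returns None for an out-of-vocabulary word."""
    return embeddings.get(word)


def covered_by_letters(word, letter_index):
    """A letter-level index can encode any word whose letters are all known."""
    return all(ch in letter_index for ch in word)


embeddings = {"kitab": [0.12, -0.40, 0.33]}  # toy vector, not real data
letter_index = {ch: i for i, ch in enumerate("abdeiklmnoqrstu")}  # toy letter set
print(static_lookup("kitablar", embeddings), covered_by_letters("kitablar", letter_index))
# → None True
```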

Application Information

Patent Type & Authority: Patents (China)

IPC (8): G06F40/30; G06F40/289; G06F40/284; G06F40/129; G06F16/31; G06N3/04; G06N3/08

CPC: G06F16/316; G06N3/08; G06N3/044; G06N3/045

Inventors: 钱泓锦, 黄真, 窦志成, 刘占亮

Owner: 北京一览群智数据科技有限责任公司
