Uygur language processing method and system based on Latin letters

A Uyghur language processing method, applied in electronic digital data processing, natural language data processing, neural learning methods and related fields, that addresses the lack of valid data samples and the inability to form accurate expressions of Uyghur lexical features.

Active Publication Date: 2020-07-17

北京一览群智数据科技有限责任公司 (Beijing Yilan Qunzhi Data Technology Co., Ltd.)

Cites 3 documents; cited by 2.


AI Technical Summary
Problems solved by technology

[0005] In view of the above problems, an embodiment of the present invention provides a Uyghur language processing method and system based on Latin letters, which solves the technical problem that existing language model training lacks valid data samples and cannot form an accurate expression of Uyghur vocabulary features.



Embodiment Construction

[0041] To make the purpose, technical solution and advantages of the present invention clearer, the present invention is further described below with reference to the accompanying drawings and specific embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

[0042] One embodiment of the present invention provides a Uyghur language processing method based on Latin letters, as shown in Figure 1. As shown in Figure 1, this embodiment includes:

[0043] Step 100: Establish an alphabetic index of the Uyghur corpus, form the basic vectors of the Uyghur corpus according to the alphabetic index, and use the basic vectors to form a Uyghur sentence training set.
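The patent text available here does not say how the letter index or the basic vectors are built. The sketch below is one plausible reading of Step 100, assuming that every distinct Latin letter found in the corpus receives an integer id, that the basic vector of a letter is its one-hot vector, and that a sentence in the training set is simply the sequence of basic vectors of its characters. The reserved `<unk>` and space symbols, and all function names, are hypothetical.

```python
from typing import Dict, List

UNK = "<unk>"   # hypothetical reserved id for letters never seen in the corpus
SPACE = " "     # word-boundary symbol, kept so word spans can be recovered later


def build_letter_index(corpus: List[str]) -> Dict[str, int]:
    """Map every distinct letter in the corpus to an integer id (sorted for reproducibility)."""
    letters = sorted({ch for sent in corpus for ch in sent.lower() if ch.isalpha()})
    index = {UNK: 0, SPACE: 1}
    for ch in letters:
        index[ch] = len(index)
    return index


def symbol_id(ch: str, index: Dict[str, int]) -> int:
    """Look up one character (case-insensitively), falling back to the <unk> id."""
    return index.get(ch.lower(), index[UNK])


def basic_vector(ch: str, index: Dict[str, int]) -> List[int]:
    """Assumed basic vector: the one-hot encoding of a single character."""
    vec = [0] * len(index)
    vec[symbol_id(ch, index)] = 1
    return vec


def build_training_set(corpus: List[str], index: Dict[str, int]) -> List[List[List[int]]]:
    """Each sentence becomes a sequence of one-hot vectors; punctuation is dropped."""
    training_set = []
    for sent in corpus:
        kept = [ch for ch in sent if ch.isalpha() or ch == SPACE]
        training_set.append([basic_vector(ch, index) for ch in kept])
    return training_set


# Two short Uyghur sentences in Latin script, used only as toy input.
corpus = ["men kitab oqudum", "u mektepke bardi"]
index = build_letter_index(corpus)
train = build_training_set(corpus, index)
print(len(index), len(train), len(train[0]))
```

Deriving the index from the corpus, rather than hard-coding an alphabet, keeps the sketch independent of any particular Latin orthography for Uyghur; a real system would more likely fix the alphabet in advance.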

[0044] The Uyghur corp...


Abstract

The invention provides a Uyghur language processing method and system based on Latin letters, which solves the technical problem that existing language model training lacks valid data samples and cannot form an accurate expression of Uyghur vocabulary features. The method comprises the following steps: establishing a letter index of a Uyghur corpus, forming basic vectors of the corpus according to the letter index, and using the basic vectors to form a Uyghur sentence training set; training a recurrent neural network with the sentence training set to form a Uyghur sentence model; and obtaining latent semantic feature vectors of Uyghur words from the sentence model to form word vectors. This yields a vector dimension space adapted to the actual semantic processing task and provides a sound sample-measurement basis for that task. It also avoids a serious defect of existing recurrent neural network structures, namely that they cannot effectively recognize implicit word-level correlations in Uyghur.
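The abstract does not give the network architecture or say how word vectors are read out of the sentence model. The sketch below illustrates the idea under explicit assumptions: a plain Elman RNN runs over letter ids (selecting row `x` of the input matrix is equivalent to multiplying a one-hot basic vector by it), and a word's vector is taken to be the hidden state at the word's last letter. The weights are random and the training loop is omitted, so the printed vectors carry no meaning; `TinyCharRNN`, `word_vectors`, the hidden size and the toy index are all hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]


class TinyCharRNN:
    """Elman RNN over letter ids: h_t = tanh(W_xh[x_t] + W_hh @ h_{t-1} + b).

    Weights are random here; in the patent's scheme they would be learned from
    the sentence training set (training is omitted in this sketch).
    """

    def __init__(self, vocab_size: int, hidden_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(hidden_size)
        # Row x of w_xh is what the one-hot basic vector of letter x selects.
        self.w_xh = [[rng.uniform(-scale, scale) for _ in range(hidden_size)] for _ in range(vocab_size)]
        self.w_hh = [[rng.uniform(-scale, scale) for _ in range(hidden_size)] for _ in range(hidden_size)]
        self.b = [0.0] * hidden_size
        self.hidden_size = hidden_size

    def step(self, x_id: int, h: Vector) -> Vector:
        """One recurrence step."""
        out = []
        for j in range(self.hidden_size):
            recur = sum(self.w_hh[j][k] * h[k] for k in range(self.hidden_size))
            out.append(math.tanh(self.w_xh[x_id][j] + recur + self.b[j]))
        return out

    def run(self, ids: List[int]) -> List[Vector]:
        """Hidden state after every symbol of the sentence."""
        h = [0.0] * self.hidden_size
        states = []
        for x_id in ids:
            h = self.step(x_id, h)
            states.append(h)
        return states


def word_vectors(sentence: str, index: Dict[str, int], rnn: TinyCharRNN) -> List[Tuple[str, Vector]]:
    """Assumed readout: a word's vector is the hidden state at its last letter."""
    ids = [index.get(ch.lower(), 0) for ch in sentence]  # 0 = hypothetical <unk> id
    states = rnn.run(ids)
    result = []
    start = 0
    for pos, ch in enumerate(sentence + " "):
        if ch == " ":
            if pos > start:
                result.append((sentence[start:pos], states[pos - 1]))
            start = pos + 1
    return result


# Toy letter index: 0 = <unk>, 1 = space, then one id per letter.
letters = "abdeikmnopqrtu"
index = {"<unk>": 0, " ": 1, **{ch: i + 2 for i, ch in enumerate(letters)}}
rnn = TinyCharRNN(vocab_size=len(index), hidden_size=8)
for word, vec in word_vectors("men kitab oqudum", index, rnn):
    print(word, [round(v, 3) for v in vec[:3]])
```

Because the readout state depends on the letters that precede it in the sentence, the vectors in this sketch are context dependent; the patent text shown here does not say whether its word vectors are static or contextual.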

Description

Technical field

[0001] The invention relates to the technical field of natural language recognition, in particular to a method and system for processing the Uyghur language based on Latin letters.

Background art

[0002] In the prior art, semantic processing of human natural language is usually performed by training a language model, and a good language model can greatly improve the accuracy of natural language processing. When the byte-pair encoding (BPE) algorithm is used, low-frequency words tend to be missing from the resulting corpus dictionary. The Word2Vec algorithm generates a static word vector of a specified dimension for each word, reflecting each word's hidden features through the richness of its dimensions, but because it is limited by the capacity of the lexicon it is prone to the OOV (out-of-vocabulary) problem. This type of model facilitates natural language semantic processing tasks, but its disadvantage is that it ignores wo...
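The OOV problem described in [0002] can be made concrete with a small illustration. It is not Word2Vec itself: it only assumes a static word vocabulary that, like many Word2Vec implementations, discards words below a minimum frequency (`min_count` here is a hypothetical parameter of this sketch). A rare word then has no entry to look up, whereas a letter-level encoding, in the spirit of the letter index in the claimed method, can still represent any word written in the indexed alphabet.

```python
from collections import Counter
from typing import Dict, List, Optional


def build_word_vocab(corpus: List[str], min_count: int = 2) -> Dict[str, int]:
    """Static word vocabulary as used by lookup-table models: rare words are dropped."""
    counts = Counter(w for sent in corpus for w in sent.split())
    kept = sorted(w for w, c in counts.items() if c >= min_count)
    return {w: i for i, w in enumerate(kept)}


def word_lookup(word: str, vocab: Dict[str, int]) -> Optional[int]:
    """Returns None for an out-of-vocabulary word: no vector can be retrieved."""
    return vocab.get(word)


def letter_ids(word: str, letter_index: Dict[str, int]) -> List[int]:
    """Letter-level encoding: any word spelled in the indexed alphabet gets a representation."""
    return [letter_index.get(ch, 0) for ch in word]  # 0 = hypothetical <unk> letter


# Toy Uyghur corpus in Latin script.
corpus = ["men kitab oqudum", "men mektepke bardim", "u kitab oqudi"]
vocab = build_word_vocab(corpus)
letter_index = {ch: i + 1 for i, ch in enumerate(sorted(set("".join(corpus).replace(" ", ""))))}

print(word_lookup("oqudum", vocab))       # rare word: dropped from the static vocabulary
print(letter_ids("oqudum", letter_index))  # still encodable letter by letter
```

The letter-level route does not by itself give a good vector for the word; it only guarantees an input that a model such as the recurrent network in the claimed method can turn into one.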


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06F40/30, G06F40/289, G06F40/284, G06F40/129, G06F16/31, G06N3/04, G06N3/08

CPC: G06F16/316, G06N3/08, G06N3/044, G06N3/045

Inventors: 钱泓锦, 黄真, 窦志成, 刘占亮 (Qian Hongjin, Huang Zhen, Dou Zhicheng, Liu Zhanliang)

Owner: 北京一览群智数据科技有限责任公司 (Beijing Yilan Qunzhi Data Technology Co., Ltd.)
