
Natural language model training method and system based on word granularity

A word-granularity natural language technology, applied in natural language data processing, neural learning methods, biological neural network models, etc.; it addresses the problem that there are few Chinese natural language models operating at word granularity.

Pending Publication Date: 2021-08-31
ZHIZHESIHAIBEIJINGTECH CO LTD

AI Technical Summary

Problems solved by technology

[0002] At present, common Chinese natural language models usually operate at character granularity; that is, each Chinese sentence is split into individual Chinese characters for processing. Few Chinese natural language models operate at word granularity.
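To make the character-versus-word distinction concrete, here is a minimal sketch contrasting the two tokenization granularities. The dictionary contents and the greedy forward longest-match strategy are illustrative assumptions; the patent does not specify the segmentation algorithm.

```python
def char_tokenize(sentence):
    """Character granularity: every Chinese character is one token."""
    return list(sentence)

def word_tokenize(sentence, dictionary, max_len=4):
    """Word granularity: greedy forward longest-match against a dictionary."""
    tokens, i = [], 0
    while i < len(sentence):
        # Try the longest candidate first; fall back to a single character.
        for span in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + span]
            if span == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += span
                break
    return tokens

dictionary = {"自然", "语言", "模型", "自然语言"}
sentence = "自然语言模型"
print(char_tokenize(sentence))              # ['自', '然', '语', '言', '模', '型']
print(word_tokenize(sentence, dictionary))  # ['自然语言', '模型']
```

The word-level tokens keep multi-character units (such as transliterated names) intact, which is the motivation the background section gives for word granularity.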

Method used



Embodiment Construction

[0034] Hereinafter, embodiments of the present disclosure will be described with reference to the drawings. It should be understood, however, that these descriptions are exemplary only, and are not intended to limit the scope of the present disclosure. Also, in the following description, descriptions of well-known structures and techniques are omitted to avoid unnecessarily obscuring the concept of the present disclosure.

[0035] The terminology used herein is for the purpose of describing particular embodiments only, and is not intended to limit the present disclosure. The words "a", "an" and "the" used herein are intended to include the plural forms as well, unless the context clearly indicates otherwise. In addition, the terms "comprises", "comprising", etc. used herein indicate the presence of the stated features, steps, operations and/or components, but do not exclude the presence or addition of one or more other features, steps, operations or components.



Abstract

The invention provides a natural language model training method and system based on word granularity. The method comprises the steps of: segmenting a training corpus into words according to a dictionary, so that the granularity of each segmented token is a word; sorting the segmentation results by word frequency and dividing them into three groups, namely high-frequency, medium-frequency and low-frequency words; embedding the three groups of words as vectors; encoding and decoding through a Transformer layer to obtain a plurality of floating-point vectors; passing these floating-point vectors through an adaptive linear layer, selecting the vectors whose probability exceeds a threshold as output, and converting them into predicted words; when training the natural language model, using Chinese sentences as sample data and predetermined predicted words as sample labels; and training the natural language model with an adaptive optimizer that compresses the variables generated during training using a low-rank matrix decomposition.
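The frequency-grouping step in the abstract can be sketched as follows: count the segmented words, rank them by frequency, and split the ranked vocabulary into high-, medium- and low-frequency groups. The two cutoff fractions are assumptions for illustration; the patent does not disclose the actual thresholds.

```python
from collections import Counter

def group_by_frequency(tokens, high_frac=0.2, low_frac=0.5):
    """Split the vocabulary of `tokens` into three frequency bands.

    `high_frac` and `low_frac` are hypothetical cutoffs: the top 20% of
    ranked words are treated as high-frequency, everything past the 50%
    mark as low-frequency, and the remainder as medium-frequency.
    """
    counts = Counter(tokens)
    ranked = [w for w, _ in counts.most_common()]  # most frequent first
    n = len(ranked)
    high = ranked[:max(1, int(n * high_frac))]
    low = ranked[int(n * low_frac):]
    mid = ranked[len(high):int(n * low_frac)]
    return high, mid, low

tokens = ["的"] * 50 + ["模型"] * 20 + ["训练"] * 10 + ["词"] * 5 + ["低频"]
high, mid, low = group_by_frequency(tokens)
print(high, mid, low)  # ['的'] ['模型'] ['训练', '词', '低频']
```

Grouping the vocabulary this way is what makes an adaptive (frequency-tiered) output layer possible, since high-frequency words can be given cheaper, more frequently evaluated parameters than rare ones.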

Description

technical field [0001] The present disclosure relates to the technical field of natural language processing, and in particular to a word-granularity natural language model training method, system, electronic device and computer-readable storage medium. Background technique [0002] At present, common Chinese natural language models usually operate at character granularity; that is, each Chinese sentence is split into individual Chinese characters for processing, and few Chinese natural language models operate at word granularity. However, words play a very important role in Chinese: the meanings expressed by many Chinese words are often not directly related to their constituent characters, such as transliterated country names and transliterated product names. From this point of view, a Chinese natural language model built purely in units of words can express semantics that individual characters do not have. In addition, the language model of training wo...
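The abstract above also mentions an adaptive optimizer that compresses the variables generated during training using a low-rank matrix decomposition. A well-known instance of this idea is the rank-1 factoring of the second-moment estimate used by Adafactor: instead of storing a full matrix of running squared gradients, store only its row and column statistics and reconstruct the matrix on demand. Whether the patent uses exactly this scheme is an assumption of this sketch.

```python
import numpy as np

def factored_second_moment(grad_sq, r=None, c=None, beta=0.9):
    """Maintain rank-1 factors (r, c) of a running second-moment matrix.

    Stores O(rows + cols) numbers instead of O(rows * cols); the full
    matrix is approximated as outer(r, c) / mean(r), which is exact when
    the input is itself rank-1.
    """
    row = grad_sq.mean(axis=1)  # per-row statistics
    col = grad_sq.mean(axis=0)  # per-column statistics
    r = row if r is None else beta * r + (1 - beta) * row
    c = col if c is None else beta * c + (1 - beta) * col
    v_hat = np.outer(r, c) / r.mean()  # reconstruct on demand
    return r, c, v_hat

g = np.random.default_rng(0).normal(size=(4, 3))
r, c, v = factored_second_moment(g ** 2)
# Only 4 + 3 numbers are persisted instead of 4 * 3.
```

The memory saving matters most for the large embedding and output matrices a word-granularity vocabulary produces, which is plausibly why the abstract pairs the low-rank compression with word-level training.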

Claims


Application Information

IPC(8): G06F40/216, G06F40/126, G06F40/242, G06F40/284, G06N3/04, G06N3/08
CPC: G06F40/216, G06F40/284, G06F40/126, G06F40/242, G06N3/08, G06N3/047
Inventor: 李子中, 刘奕志, 熊杰, 薛娇, 方宽
Owner ZHIZHESIHAIBEIJINGTECH CO LTD