Word segmentation model building method and apparatus

A word segmentation modeling technique, applied in the field of machine translation, that addresses the negative effects of wrong word segmentation and the neglect of Chinese word segmentation accuracy, and improves the accuracy of word segmentation.

Active Publication Date: 2017-02-15
新译信息科技(深圳)有限公司

AI Technical Summary

Problems solved by technology

However, these methods usually ignore the accuracy of Chinese word segmentation itse

Examples

Example Embodiment

[0019] The technical solutions in the embodiments of the present invention will be clearly and completely described below in conjunction with the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, rather than all the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.

[0020] The terms "including" and "having", and any variations of them, in the specification and claims of the present invention are intended to cover non-exclusive inclusion. For example, a process or device that includes a series of steps or structures is not necessarily limited to those clearly listed; instead, it may include other steps or structures that are not clearly listed or that are inherent to the process or device.

[0021...

Abstract

Embodiments of the invention provide a word segmentation model building method and apparatus. The method comprises the steps of performing alignment on characters in a first corpus and words in a second corpus to obtain an alignment relationship between the first corpus and the second corpus, wherein the first corpus is a corpus without a space division boundary between words; determining boundary information of the words in the first corpus according to the alignment relationship between the first corpus and the second corpus; and performing training according to the boundary information of the words in the first corpus to generate a word segmentation model. According to the word segmentation model building method and apparatus provided by the embodiments of the invention, the word segmentation accuracy, especially the word segmentation accuracy of the corpus without the space division boundary between the words can be improved.
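The boundary-determination step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. It assumes the character-to-word alignment has already been computed and is given as one target-word index per source character (`None` for unaligned characters), and it assumes a simple rule that the source does not state: adjacent characters aligned to the same target word form one word. The function names `boundaries_from_alignment` and `bmes_tags` are hypothetical, and the B/M/E/S labelling is one common way to encode boundaries for a sequence labeller, not something the abstract specifies.

```python
from typing import List, Optional


def boundaries_from_alignment(chars: List[str],
                              align: List[Optional[int]]) -> List[str]:
    """Group adjacent characters that align to the same target-word index.

    Assumption (not from the source): a word boundary falls wherever two
    neighbouring characters align to different target words, and every
    unaligned character (None) becomes a single-character word.
    """
    if len(chars) != len(align):
        raise ValueError("chars and align must have the same length")
    words: List[str] = []
    current = ""
    prev: Optional[int] = None
    for i, (ch, a) in enumerate(zip(chars, align)):
        if i > 0 and a is not None and a == prev:
            current += ch          # same target word: extend current word
        else:
            if current:
                words.append(current)
            current = ch           # different target word: start a new word
        prev = a
    if current:
        words.append(current)
    return words


def bmes_tags(words: List[str]) -> List[str]:
    """Convert segmented words into per-character B/M/E/S boundary labels."""
    tags: List[str] = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


# Hypothetical example: first corpus "我喜欢苹果", second corpus "I like apples".
chars = list("我喜欢苹果")
align = [0, 1, 1, 2, 2]            # assumed alignment to ["I", "like", "apples"]
words = boundaries_from_alignment(chars, align)
print(words)                       # ['我', '喜欢', '苹果']
print(bmes_tags(words))            # ['S', 'B', 'E', 'B', 'E']
```

The resulting per-character labels are the kind of boundary information that could then be used to train a segmentation model; the training step itself is omitted here because the visible text does not describe it.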

Description

Technical field

[0001] The embodiments of the present invention relate to the technical field of machine translation, and in particular to a method and apparatus for building a word segmentation model.

Background

[0002] The natural language processing community today generally holds that, because Chinese has no spaces between words to mark boundaries, high-quality word segmentation is the key to Chinese language processing. A number of experiments have shown that the accuracy of Chinese word segmentation directly affects the quality of statistical machine translation. Mainstream statistical machine translation models are also trained on segmented parallel corpora, which means that every training sentence is segmented. For Chinese, the biggest obstacle is that the training corpus used comes from annotated syntax trees. Obviously, these word segmentation standards only take into account the characteristics of a single language, and do not conf...

Application Information

IPC(8): G06F17/28, G06F17/27
CPC: G06F40/284, G06F40/58
Inventor: 田亮
Owner: 新译信息科技(深圳)有限公司