Word segmentation model building method and apparatus

A word segmentation model building technology, applied in the field of machine translation. It addresses problems such as existing methods ignoring the accuracy of Chinese word segmentation and the negative effect of wrong word segmentation, and it improves word segmentation accuracy.

Active Publication Date: 2017-02-15
新译信息科技(深圳)有限公司

AI Technical Summary

Problems solved by technology

Existing methods usually ignore the accuracy of Chinese word segmentation itself, and also suffer from the negative effects of wrong alignment on word segmentation.

Embodiment Construction

[0019] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0020] The terms "comprising" and "having" and any variations thereof in the description and claims of the present invention are intended to cover a non-exclusive inclusion. For example, a process comprising a series of steps, or a device comprising a series of structures, need not be limited to the steps or structures expressly listed; instead, it may include other steps or structures that are not expressly listed or that are inherent to the process or device.

[0021] Figure 1 is a schematic ...

Abstract

Embodiments of the invention provide a word segmentation model building method and apparatus. The method comprises: aligning characters in a first corpus with words in a second corpus to obtain an alignment relationship between the first corpus and the second corpus, wherein the first corpus is a corpus in which words are not separated by spaces; determining boundary information of the words in the first corpus according to the alignment relationship between the first corpus and the second corpus; and performing training according to the boundary information of the words in the first corpus to generate a word segmentation model. The method and apparatus provided by the embodiments of the invention can improve word segmentation accuracy, especially for a corpus in which words are not separated by spaces.
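The three steps in the abstract (align, derive boundaries, train) can be illustrated with a small Python sketch. This is not the patented implementation: the alignment is supplied by hand rather than computed, the rule that adjacent characters aligned to the same target word form one source word is an assumption made for illustration, and the "model" is a toy per-character tag lookup standing in for whatever training procedure the patent actually uses. All function names are hypothetical.

```python
from collections import Counter, defaultdict


def boundaries_from_alignment(chars, char_to_word):
    """Return one flag per character: True if a word boundary follows it.

    Assumption (not from the source): adjacent characters aligned to the
    same target word index belong to one source word. Unaligned
    characters (None) are treated as single-character words.
    """
    ends = []
    for i in range(len(chars)):
        is_last = i == len(chars) - 1
        same_word = (
            not is_last
            and char_to_word[i] is not None
            and char_to_word[i] == char_to_word[i + 1]
        )
        ends.append(not same_word)
    return ends


def to_bmes(ends):
    """Convert boundary flags to B/M/E/S character tags."""
    tags = []
    at_start = True  # True when the current character begins a word
    for end in ends:
        if at_start and end:
            tags.append("S")  # single-character word
        elif at_start:
            tags.append("B")  # begin
        elif end:
            tags.append("E")  # end
        else:
            tags.append("M")  # middle
        at_start = end
    return tags


def train(tagged_sentences):
    """Toy stand-in for training: remember the most frequent tag per character."""
    counts = defaultdict(Counter)
    for chars, tags in tagged_sentences:
        for ch, tag in zip(chars, tags):
            counts[ch][tag] += 1
    return {ch: ctr.most_common(1)[0][0] for ch, ctr in counts.items()}


def segment(model, text):
    """Segment text with the toy model; unknown characters default to 'S'."""
    words, current = [], ""
    for ch in text:
        tag = model.get(ch, "S")
        if tag in ("B", "S") and current:
            words.append(current)  # a new word starts, so close the open one
            current = ""
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


# Hand-made example: first corpus has no spaces, second corpus is word-separated.
source_chars = list("我喜欢猫")
target_words = ["I", "like", "cats"]
alignment = [0, 1, 1, 2]  # index into target_words for each source character

ends = boundaries_from_alignment(source_chars, alignment)
tags = to_bmes(ends)
model = train([(source_chars, tags)])
result = segment(model, "猫喜欢我")
```

In this sketch the two characters of 喜欢 are both aligned to "like", so they are grouped into one word, and the learned tags carry over to a new sentence that reuses the same characters. A realistic system would obtain the alignment from a statistical aligner and train a sequence model on the derived tags.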

Description

Technical Field

[0001] The embodiments of the present invention relate to the technical field of machine translation, and in particular to a method and apparatus for building a word segmentation model.

Background

[0002] The natural language processing community generally holds that, because Chinese has no spaces between words to mark boundaries, high-quality word segmentation is the key to Chinese language processing. A number of experiments have shown that the accuracy of Chinese word segmentation directly affects the quality of statistical machine translation. Mainstream statistical machine translation models are also trained on segmented parallel corpora, which means that every training sentence is segmented. For Chinese, the biggest obstacle is that the training corpus used comes from annotated syntax trees. Obviously, these word segmentation standards only take into account the characteristics of a single language, and do not conf...

Application Information

IPC(8): G06F17/28, G06F17/27
CPC: G06F40/284, G06F40/58
Inventor: 田亮
Owner: 新译信息科技(深圳)有限公司