Word segmentation method and device, and device used for word segmentation

A word segmentation method and device, applied in the fields of special data processing applications, instruments, and electrical digital data processing. It addresses problems such as inaccurate words in segmentation results, which degrade machine translation and lead to inaccurate translation results, and it improves segmentation accuracy.

Active Publication Date: 2018-05-25
BEIJING SOGOU TECHNOLOGY DEVELOPMENT CO LTD
Cites: 6; Cited by: 2

AI Technical Summary

Problems solved by technology

However, no existing word segmentation method achieves 100% accuracy: the segmentation results it produces may contain inaccurate words, and those inaccurate words degrade machine translation.
Take the source text "Have you seen the word on the left?" as an example. An existing word segmentation method divides it into "left", "de", "ci everyone", "du", "see", "le", "what". Among these, "ci everyone" (also glossed as "ci masters") is an inaccurate word: the machine translation device translates it as a single unit and therefore produces an inaccurate translation result.
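To make the example concrete, the Python sketch below shows how the inaccurate token could be picked out by checking each token of the segmentation sequence against a preset dictionary. It is an illustration under stated assumptions, not the patented implementation: the function name `find_target_vocabulary` and the dictionary contents are hypothetical, and English glosses stand in for the original Chinese characters, with "ci" and "everyone" written together as `cieveryone` to mimic unspaced text.

```python
# Hypothetical helper: the patent does not name this function.
def find_target_vocabulary(sequence, dictionary):
    """Return (position, token) pairs for tokens absent from the preset dictionary."""
    return [(i, token) for i, token in enumerate(sequence) if token not in dictionary]


# Toy data (assumption): English glosses stand in for the Chinese characters,
# and the merged token "ci" + "everyone" is written as one unspaced string.
sequence = ["left", "de", "cieveryone", "du", "see", "le", "what"]
preset_dictionary = {"left", "de", "ci", "everyone", "du", "see", "le", "what"}

targets = find_target_vocabulary(sequence, preset_dictionary)
print(targets)  # [(2, 'cieveryone')]
```

Recording the position alongside the token makes it straightforward to splice a re-segmented result back into the original sequence.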

Embodiment Construction

[0076] In order to make the above objects, features and advantages of the present invention more comprehensible, the present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0077] An embodiment of the present invention provides a word segmentation scheme. From the word segmentation sequence corresponding to the text to be segmented, the scheme obtains the target vocabulary, that is, words that do not exist in a preset dictionary, and performs word segmentation on the target vocabulary based on the preset dictionary to obtain a corresponding segmentation result. Because the word segmentation sequence is only the preliminary result of segmenting the text, it may contain words that the machine translation device cannot translate, and the embodiment of the present invention The abo...
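The text does not say which dictionary-based algorithm re-segments the target vocabulary. As one plausible assumption, the sketch below uses forward maximum matching, a classic dictionary-based technique: at each position it takes the longest dictionary word that starts there. The function name `forward_max_match` and the toy dictionary are hypothetical.

```python
# Hypothetical sketch: forward maximum matching is an assumed choice of
# algorithm, not necessarily the one used in the patent.
def forward_max_match(text, dictionary):
    """Split text into dictionary words using forward maximum matching.

    At each position, take the longest dictionary word that starts there;
    if none matches, emit a single character so the loop always advances.
    """
    if not dictionary:
        return list(text)
    max_len = max(len(word) for word in dictionary)
    words = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                words.append(candidate)
                i += length
                break
        else:
            # No dictionary word starts here: fall back to one character.
            words.append(text[i])
            i += 1
    return words


toy_dictionary = {"ci", "every", "everyone", "one"}  # hypothetical dictionary
print(forward_max_match("cieveryone", toy_dictionary))  # ['ci', 'everyone']
```

Even with "every" in the dictionary, the greedy longest match prefers "everyone". This approach is simple and fast, but it can mis-split when a long word swallows the beginning of the next one.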

Abstract

The embodiments of the invention provide a word segmentation method and device, and a device used for word segmentation. The method specifically comprises the following steps: obtaining a word segmentation sequence corresponding to a text to be segmented; obtaining, from the word segmentation sequence, target vocabulary that is not present in a preset dictionary, wherein the preset dictionary is used for storing vocabulary; and performing word segmentation processing on the target vocabulary based on the preset dictionary to obtain a corresponding segmentation result. With the embodiments of the invention, vocabulary that a machine translation device cannot translate can be segmented, which improves the accuracy of the word segmentation result and, in turn, the accuracy of the translation result.
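The three steps of the abstract can be strung together as in the hedged sketch below, which is not the patented implementation. Step one (the preliminary segmentation) is assumed to come from an existing segmenter and is passed in as a list, and the re-segmentation step uses a dynamic-programming search for the cheapest split into dictionary pieces, a technique swapped in here for illustration. The function names, the cost values, and the toy data (English glosses standing in for Chinese characters) are all hypothetical.

```python
# Hypothetical sketch of the three steps in the abstract; not the patented code.

def min_word_segmentation(text, dictionary):
    """Split text into the cheapest sequence of pieces (dynamic programming).

    A dictionary word costs 1; a single character missing from the dictionary
    costs 2, so unknown characters are used only where no dictionary cover exists.
    best[i] holds (cost, pieces) for the prefix text[:i].
    """
    n = len(text)
    best = [None] * (n + 1)
    best[0] = (0, [])
    for end in range(1, n + 1):
        for start in range(end):
            piece = text[start:end]
            if piece in dictionary:
                cost = 1
            elif end - start == 1:
                cost = 2
            else:
                continue
            total = best[start][0] + cost
            if best[end] is None or total < best[end][0]:
                best[end] = (total, best[start][1] + [piece])
    return best[n][1]


def refine_segmentation(sequence, dictionary):
    """Steps 2 and 3: find tokens not in the dictionary and re-segment them."""
    refined = []
    for token in sequence:
        if token in dictionary:
            refined.append(token)  # known word: keep as-is
        else:
            refined.extend(min_word_segmentation(token, dictionary))  # target vocabulary
    return refined


# Step 1 (assumed): a preliminary sequence from an existing segmenter, using
# English glosses in place of the original Chinese characters.
preliminary = ["left", "de", "cieveryone", "du", "see", "le", "what"]
preset_dictionary = {"left", "de", "ci", "everyone", "du", "see", "le", "what"}

print(refine_segmentation(preliminary, preset_dictionary))
# ['left', 'de', 'ci', 'everyone', 'du', 'see', 'le', 'what']
```

Tokens already in the dictionary pass through untouched, so only the target vocabulary is re-segmented and the rest of the preliminary result is preserved.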

Description

Technical field

[0001] The present invention relates to the technical field of natural language processing, and in particular to a word segmentation method and device, and a device for word segmentation.

Background art

[0002] Word segmentation is an important basic technology in the field of natural language processing. Word segmentation divides a sentence into individual words; it is the process of recombining a continuous sequence of characters into a word sequence according to certain specifications. Taking Chinese word segmentation as an example, the goal is to divide a sentence into individual Chinese words. Segmenting a sentence into words is the first step toward machine recognition of human language, so word segmentation technology is widely used in natural language processing applications such as text-to-speech conversion, machine translation, speech recognition...
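One common way to represent "recombining consecutive characters into a word sequence" is to tag each character with a boundary label: B (begins a multi-character word), M (middle), E (ends it), or S (single-character word). The patent text does not mention this scheme; the sketch below is only an illustration, with a hypothetical function name `tags_to_words` and Latin letters standing in for Chinese characters.

```python
# Hypothetical illustration of B/M/E/S boundary tags; not from the patent.
def tags_to_words(chars, tags):
    """Rebuild a word sequence from per-character B/M/E/S boundary tags.

    B = begins a multi-character word, M = middle, E = ends it,
    S = single-character word.
    """
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        if tag == "S":
            if current:  # tolerate an unterminated B/M run
                words.append(current)
                current = ""
            words.append(ch)
        elif tag == "B":
            if current:
                words.append(current)
            current = ch
        elif tag == "M":
            current += ch
        elif tag == "E":
            words.append(current + ch)
            current = ""
        else:
            raise ValueError(f"unknown tag: {tag!r}")
    if current:
        words.append(current)
    return words


print(tags_to_words("thecatsat", "BMEBMEBME"))  # ['the', 'cat', 'sat']
```

Under this representation, a segmenter only has to predict one label per character, and the word sequence follows mechanically from the labels.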

Application Information

Patent type & authority: Application (China)
IPC(8): G06F17/27; G06F17/28
CPC: G06F40/284; G06F40/247; G06F40/40
Inventors: 姜里羊, 王宇光, 陈伟, 程善伯
Owner: BEIJING SOGOU TECHNOLOGY DEVELOPMENT CO LTD