Word segmentation method and device, device for word segmentation

A word segmentation method and word segmentation technology, applied in the fields of instruments, computing, and electrical digital data processing. It addresses the problem that inaccurate word segmentation affects machine translation and leads to inaccurate translation results, and it achieves the effect of improving the accuracy rate.

Active Publication Date: 2022-01-18
BEIJING SOGOU TECHNOLOGY DEVELOPMENT CO LTD
6 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

No existing word segmentation method achieves 100% accuracy. The word segmentation results obtained by existing methods therefore contain inaccurate words, and those inaccurate words adversely affect machine translation.
Taking the source text "Have you seen the word on the left" as an example, an existing word segmentation method divides the source text into "left", "de", "ci everyone", "du", "see", "Le", "What". Here "ci everyone" is an inaccurate word; the machine translation device translates with "ci everyone" as the granularity and therefore produces an inaccurate translation result.

Examples

Embodiment Construction

[0076] To make the above objects, features, and advantages of the present invention more comprehensible, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

[0077] Embodiments of the present invention provide a word segmentation scheme. The scheme can obtain, from the word segmentation sequence corresponding to the text to be segmented, a target word that does not exist in a preset dictionary, and segment the target word according to the preset dictionary to obtain a corresponding segmentation result. The word segmentation sequence is the initial result of segmenting the text to be segmented, and it may contain words that the machine translation device cannot translate. The segmentation processing of the embodiments of the present invention acts as a secondary segmentation of the target words in the word segmentation sequence, i.e. the seg...

Abstract

Embodiments of the present invention provide a word segmentation method and device, and a device for word segmentation. The method specifically includes: obtaining the word segmentation sequence corresponding to the text to be segmented; obtaining, from the word segmentation sequence, the target words that do not exist in a preset dictionary, where the preset dictionary is used to store words; and segmenting the target words according to the preset dictionary to obtain the corresponding segmentation results. Because the embodiments can further segment words that the machine translation device cannot translate, they can improve the accuracy of the word segmentation results and, in turn, the accuracy of the translation results.
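The three steps in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the excerpt does not say how a target word is split "according to the preset dictionary", so forward maximum matching with a single-character fallback is assumed here. The function names, the dictionary contents, and the Latin-letter tokens standing in for Chinese words are all hypothetical.

```python
def segment_with_dictionary(word, dictionary):
    """Split `word` by forward maximum matching (an assumed strategy).

    At each position the longest dictionary entry is taken; if nothing
    matches, a single character is emitted so the loop always advances.
    """
    max_len = max((len(entry) for entry in dictionary), default=1)
    pieces, i = [], 0
    while i < len(word):
        for size in range(min(max_len, len(word) - i), 0, -1):
            candidate = word[i:i + size]
            if size == 1 or candidate in dictionary:
                pieces.append(candidate)
                i += size
                break
    return pieces


def refine_segmentation(tokens, dictionary):
    """Re-segment only the tokens that are missing from the dictionary."""
    refined = []
    for token in tokens:
        if token in dictionary:
            refined.append(token)  # known word: keep as-is
        else:
            # target word: apply the secondary segmentation
            refined.extend(segment_with_dictionary(token, dictionary))
    return refined


# Hypothetical data: "DEF" plays the role of an inaccurate word.
preset_dictionary = {"AB", "C", "D", "EF", "G"}
initial_sequence = ["AB", "C", "DEF", "G"]
print(refine_segmentation(initial_sequence, preset_dictionary))
# prints ['AB', 'C', 'D', 'EF', 'G']
```

Only tokens missing from the dictionary are touched, so the initial segmentation is preserved wherever the translator can already handle it.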

Description

Technical field

[0001] The present invention relates to the technical field of natural language processing, and more particularly to a word segmentation method and device, and a device for word segmentation.

Background art

[0002] Word segmentation is an important basic technology in the field of natural language processing. Word segmentation cuts a sentence into individual words; that is, it recombines a continuous sentence into a word sequence according to certain specifications. Taking Chinese word segmentation as an example, the goal is to cut a sentence into individual Chinese words. Cutting a sentence into individual words is the first step toward machine recognition of human language, so word segmentation is widely used in branches of natural language processing such as text-to-speech, machine translation, speech recognition, text summarization, and text retrieval.

[0003] Machine translation technology is the use of a computer in a natural language (source...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/284, G06F40/242, G06F40/40
CPC: G06F40/284, G06F40/247, G06F40/40
Inventor: 姜里羊, 王宇光, 陈伟, 程善伯
Owner: BEIJING SOGOU TECHNOLOGY DEVELOPMENT CO LTD