
A method for establishing a multi-granularity dictionary, a word segmentation method and a device thereof

A multi-granularity dictionary technology, applied in special data processing applications, instruments, electrical digital data processing, etc. It can solve the problems that manually compiled dictionaries can hardly ensure consistent granularity, which affects application performance, and that segmentation ambiguity is difficult to resolve.

Active Publication Date: 2016-06-29
BEIJING BAIDU NETCOM SCI & TECH CO LTD
Cites: 3; Cited by: 0

AI Technical Summary

Problems solved by technology

Previously, dictionaries at each granularity could be compiled manually, according to the needs of different applications, to serve as the basis for word segmentation. However, manually compiled dictionaries can hardly ensure consistent granularity, which affects the performance of specific applications.
[0003] On the other hand, ambiguity still arises during word segmentation. Ambiguity refers to the situation in which a string admits more than one segmentation. For example, "Xinhua Medical Devices" can be segmented either as "Xinhua Medical / Devices" or as "Xinhua / Medical Devices". When such ambiguity occurs, the single-granularity dictionary of the prior art can hardly provide a basis for disambiguation.
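To make the ambiguity concrete, the following minimal sketch enumerates every way a token sequence can be split into dictionary entries. It is illustrative only and not the patent's method: the function name `all_segmentations` and the toy dictionary are hypothetical, and English tokens stand in for the Chinese characters of the original example.

```python
from functools import lru_cache
from typing import List, Sequence, Set


def all_segmentations(units: Sequence[str], dictionary: Set[str]) -> List[List[str]]:
    """Return every split of `units` into consecutive pieces found in `dictionary`.

    Hypothetical helper: units are joined with spaces to form candidate entries.
    """
    units = tuple(units)

    @lru_cache(maxsize=None)
    def from_index(i: int) -> List[List[str]]:
        # All segmentations of units[i:]; the empty suffix has exactly one.
        if i == len(units):
            return [[]]
        results: List[List[str]] = []
        for j in range(i + 1, len(units) + 1):
            piece = " ".join(units[i:j])
            if piece in dictionary:
                for rest in from_index(j):
                    results.append([piece] + rest)
        return results

    return from_index(0)


# Toy single-granularity dictionary (an assumption for illustration).
toy_dictionary = {"Xinhua", "Xinhua Medical", "Devices", "Medical Devices"}
print(all_segmentations(["Xinhua", "Medical", "Devices"], toy_dictionary))
# [['Xinhua', 'Medical Devices'], ['Xinhua Medical', 'Devices']]
```

Both segmentations are equally valid against a flat word list, which is why a single-granularity dictionary offers no basis for preferring one of them.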

Embodiment Construction

[0035] In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.

[0036] The process of establishing a multi-granularity dictionary in the present invention is actually a process of organizing the collected original vocabulary into a compound dictionary with multiple levels. The entry structure of the compound dictionary is shown in the following table:

[0037] Table 1

[0038]
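Because Table 1 is not reproduced here, the exact fields of a compound-dictionary entry are unknown. The sketch below assumes, based only on the abstract, that an entry records its text, whether it is a basic word or a phrase (word group) word, and the internal components of a phrase word. The class `Entry`, its field names, and the helper `finest_units` are all hypothetical; the helper shows how nested components let one entry be read out at its finest granularity.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entry:
    """Hypothetical compound-dictionary entry (field names are assumptions)."""
    text: str
    kind: str  # "basic" or "phrase" (hypothetical labels)
    components: List[str] = field(default_factory=list)  # internal components of a phrase entry


def finest_units(text: str, dictionary: Dict[str, Entry]) -> List[str]:
    """Recursively replace a phrase entry by its components until only basic words remain."""
    entry = dictionary[text]
    if entry.kind == "basic" or not entry.components:
        return [entry.text]
    units: List[str] = []
    for part in entry.components:
        units.extend(finest_units(part, dictionary))
    return units


# Toy entries (assumed data, not taken from the patent).
EXAMPLE = {
    "Xinhua": Entry("Xinhua", "basic"),
    "Medical": Entry("Medical", "basic"),
    "Devices": Entry("Devices", "basic"),
    "Medical Devices": Entry("Medical Devices", "phrase", ["Medical", "Devices"]),
    "Xinhua Medical Devices": Entry("Xinhua Medical Devices", "phrase", ["Xinhua", "Medical Devices"]),
}
print(finest_units("Xinhua Medical Devices", EXAMPLE))
# ['Xinhua', 'Medical', 'Devices']
```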

[0039] In the following, each part of the above entry structure is introduced through the description of the process of establishing the dictionary. Please refer to Figure 1, which is a schematic flow chart of the method for establishing a multi-granularity dictionary in the present invention. As shown in Figure 1, the process of building a multi-granularity dictionary mainly inc...

Abstract

The invention provides a method for establishing a multi-granularity dictionary, a word segmentation method, and a device thereof. The method for establishing the multi-granularity dictionary comprises the following steps: A. collecting an original word list; B. identifying basic words and phrase (word group) words to form a basic word list and a phrase word list, respectively; C. determining the subordinate words and sub-phrase words corresponding to each phrase word, and taking them as the internal components of that phrase word; D. saving the basic words and the phrase words as dictionary entries, and saving the internal components of each phrase word as the internal components of the corresponding dictionary entry, thereby obtaining the multi-granularity dictionary. In this way, a unified segmentation dictionary can be established that supports a variety of applications.
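Steps A to D can be sketched roughly as follows. The abstract does not say how basic words and phrase words are identified, so this sketch assumes a simple rule: a word is basic if it cannot be split into two or more other words of the list, and otherwise its internal components are its split into the fewest other words (ties go to the first split found). Words are modeled as space-separated English tokens; the function names are hypothetical.

```python
from typing import Dict, List, Optional, Set


def coarsest_split(word: str, vocab: Set[str]) -> Optional[List[str]]:
    """Split `word` into the fewest other vocabulary words, or return None if impossible.

    The word itself is excluded as a piece, so any result has at least two components.
    """
    tokens = word.split()
    n = len(tokens)
    best: List[Optional[List[str]]] = [None] * (n + 1)  # best[i]: fewest-piece split of tokens[:i]
    best[0] = []
    for i in range(1, n + 1):
        for j in range(i):
            piece = " ".join(tokens[j:i])
            if best[j] is None or piece == word or piece not in vocab:
                continue
            candidate = best[j] + [piece]
            if best[i] is None or len(candidate) < len(best[i]):
                best[i] = candidate
    return best[n]


def build_dictionary(raw_words: List[str]) -> Dict[str, List[str]]:
    """Hypothetical sketch of steps A-D: map each entry to its internal components."""
    vocab = set(raw_words)                   # step A: original word list
    basic: List[str] = []
    phrases: Dict[str, List[str]] = {}
    for word in sorted(vocab):
        parts = coarsest_split(word, vocab)  # step B: can the word be decomposed?
        if parts is None:
            basic.append(word)               # basic word: no decomposition exists
        else:
            phrases[word] = parts            # step C: subordinate words / sub-phrases
    # Step D: every word becomes an entry; only phrase entries carry components.
    entries: Dict[str, List[str]] = {word: [] for word in basic}
    entries.update(phrases)
    return entries


raw = ["Xinhua", "Medical", "Devices", "Medical Devices", "Xinhua Medical", "Xinhua Medical Devices"]
print(build_dictionary(raw)["Xinhua Medical Devices"])
# ['Xinhua', 'Medical Devices']
```

The fewest-pieces rule is only one plausible choice; a real system would need a principled way to pick among competing splits, which is exactly the ambiguity problem described in the background.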

Description

【Technical Field】 [0001] The invention relates to natural language processing technology, in particular to a method for building a multi-granularity dictionary, a word segmentation method, and a device thereof. 【Background Art】 [0002] Word segmentation is very important in natural language processing applications, and the segmentation result directly affects the performance of specific applications. Different applications have different requirements for segmentation granularity. For example, in machine translation, it is best to segment with a large granularity so that proper nouns such as person names, place names, and organization names can be identified, which improves translation accuracy; for speech recognition, small-grained segmentation meets the demand. In addition, for search engines, building an index library with small-grained words ca...
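One way a single multi-granularity dictionary could serve these different needs is to segment at the coarsest granularity and then expand entries through their internal components as far as the application requires. The sketch below is an assumption, not the patent's segmentation method: the function `at_granularity`, the `depth` parameter, and the component map are hypothetical. `depth=0` keeps large-granularity units (as suggested for machine translation), and larger depths move toward the small-grained units suggested for speech recognition or search indexing.

```python
from typing import Dict, List


def at_granularity(tokens: List[str], components: Dict[str, List[str]], depth: int) -> List[str]:
    """Expand each token by up to `depth` levels of internal components.

    depth=0 returns the coarse tokens unchanged; each extra level splits phrase
    entries one step further toward basic words. Basic words are never split.
    """
    if depth <= 0:
        return list(tokens)
    out: List[str] = []
    for tok in tokens:
        parts = components.get(tok, [])
        if parts:
            out.extend(at_granularity(parts, components, depth - 1))
        else:
            out.append(tok)
    return out


# Hypothetical component map for phrase entries.
COMPONENTS = {
    "Xinhua Medical Devices": ["Xinhua", "Medical Devices"],
    "Medical Devices": ["Medical", "Devices"],
}
coarse = ["Xinhua Medical Devices"]
print(at_granularity(coarse, COMPONENTS, 0))  # ['Xinhua Medical Devices']
print(at_granularity(coarse, COMPONENTS, 2))  # ['Xinhua', 'Medical', 'Devices']
```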

Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06F17/30; G06F17/27
Inventors: 何径舟 (He Jingzhou), 王丽杰 (Wang Lijie)
Owner BEIJING BAIDU NETCOM SCI & TECH CO LTD