
A method and device for word segmentation based on word weight

A word segmentation and weighting technology, applied in fields such as instruments, computing, and electric digital data processing. It addresses problems such as word segmentation errors and improves the segmentation success rate by eliminating ambiguity.

Active Publication Date: 2017-04-26
BEIJING QIHOO TECH CO LTD
Cites: 4; Cited by: 0

AI Technical Summary

Problems solved by technology

Therefore, when word frequency alone is used to represent word weight, the text is segmented as "Cui|Huawei", which is a word segmentation error.

Detailed Description of the Embodiments

[0056] Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and its scope will be fully conveyed to those skilled in the art.

[0057] Referring to Figure 1, a flow chart of the steps of a method for word segmentation based on word weights according to an embodiment of the present invention is shown. The method may specifically include the following steps:

[0058] Step 101: segment the corpus according to one or more segmentation methods to obtain one or more segmented words;

[0059] In a specific implementation, each segmentation method perf...
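Step 101 does not name the segmentation methods, and the paragraph describing them is truncated. As a minimal sketch, assuming two common dictionary-based methods (forward and backward maximum matching) and a hypothetical toy dictionary, the step could produce one candidate segmentation per method. The helper names `forward_max_match`, `backward_max_match`, and `segment_all` are illustrative, not taken from the patent.

```python
from __future__ import annotations

# ASSUMPTION: the patent does not name its segmentation methods; forward and
# backward maximum matching are common dictionary-based choices used here
# only for illustration.


def forward_max_match(text: str, vocab: set[str], max_len: int) -> list[str]:
    """Scan left to right, always taking the longest dictionary word (or one character)."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text: str, vocab: set[str], max_len: int) -> list[str]:
    """Scan right to left, always taking the longest dictionary word (or one character)."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()
    return tokens


def segment_all(text: str, vocab: set[str]) -> dict[str, list[str]]:
    """Hypothetical Step 101 helper: one candidate segmentation per method."""
    max_len = max(map(len, vocab), default=1)
    methods = {"fmm": forward_max_match, "bmm": backward_max_match}
    return {name: method(text, vocab, max_len) for name, method in methods.items()}


# Hypothetical toy dictionary and input string.
toy_vocab = {"研究", "研究生", "生命", "起源"}
candidates = segment_all("研究生命起源", toy_vocab)
print(candidates)
```

On the textbook string 研究生命起源 ("study the origin of life"), the two methods disagree: forward matching yields 研究生|命|起源 and backward matching yields 研究|生命|起源. Disagreements of this kind are what a weight-based selection step has to resolve.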

Abstract

The embodiments of the invention provide a method and device for word segmentation based on word weights. The method comprises: segmenting a corpus according to one or more segmentation methods to obtain one or more segmented words; counting, for each segmented word, a first word frequency, namely how often the segmented word appears in the corpus, and a second word frequency, namely how often the characters forming the segmented word appear consecutively in the corpus; calculating the weight of each segmented word in the corpus from the first and second word frequencies; and selecting the word segmentation result from the one or more segmented words according to the weights. By exploiting these word-frequency characteristics of the segmented words, the ambiguity caused by high-frequency words among the segmented words is reduced or eliminated, and the word segmentation success rate is improved.
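The abstract does not give the formula that combines the two frequencies. The sketch below assumes, purely for illustration, that a word's weight is its first word frequency divided by (number of methods × second word frequency), i.e. the share of the word's consecutive-character occurrences that the methods actually segmented as that word, and that a candidate segmentation is scored by the product of its word weights. The function names, the weight formula, the scoring rule, and the toy corpus are all hypothetical.

```python
from __future__ import annotations

from collections import Counter
from math import prod


def count_occurrences(text: str, word: str) -> int:
    """Number of positions where the characters of `word` appear consecutively in `text`."""
    return sum(1 for i in range(len(text) - len(word) + 1) if text.startswith(word, i))


def word_weights(sentences: list[str], outputs: list[list[list[str]]]) -> dict[str, float]:
    """outputs[m][s] is the token list that segmentation method m produced for sentences[s]."""
    n_methods = len(outputs)
    # First word frequency: how often each word was produced as a segmented word.
    first = Counter(tok for out in outputs for sent in out for tok in sent)
    weights = {}
    for word, f1 in first.items():
        # Second word frequency: how often the word's characters appear consecutively.
        f2 = sum(count_occurrences(s, word) for s in sentences)
        # ASSUMPTION: weight = f1 / (n_methods * f2); this lies in (0, 1] whenever
        # every output is a true segmentation of its sentence.
        weights[word] = f1 / (n_methods * f2) if f2 else 0.0
    return weights


def select(candidates: list[list[str]], weights: dict[str, float]) -> list[str]:
    """ASSUMPTION: score each candidate by the product of its word weights (unknown words score 0)."""
    return max(candidates, key=lambda toks: prod(weights.get(t, 0.0) for t in toks))


# Hypothetical toy corpus segmented by two methods, A and B.
sentences = ["研究生命起源", "生命科学", "研究方法"]
outputs = [
    [["研究生", "命", "起源"], ["生命", "科学"], ["研究", "方法"]],  # method A
    [["研究", "生命", "起源"], ["生命", "科学"], ["研究", "方法"]],  # method B
]
weights = word_weights(sentences, outputs)
best = select([outputs[0][0], outputs[1][0]], weights)
print(best)
```

In this toy corpus, 命 appears twice as consecutive characters but is produced as a standalone word only once, by one of the two methods, so its weight drops to 0.25. As a result, 研究|生命|起源 (score 0.5625) is selected over 研究生|命|起源 (score 0.125).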

Description

Technical Field

[0001] The present invention relates to the technical field of word segmentation, and in particular to a method for word segmentation based on word weight and a device for word segmentation based on word weight.

Background

[0002] With the rapid development of the Internet, network applications have become increasingly diverse, and the amount of information on the Internet has grown dramatically.

[0003] In many situations, users need to input key information to obtain related information, for example by entering keywords in a search engine to search for web pages, or entering keywords in a forum to search for posts.

[0004] Word segmentation is the basis of information processing and information retrieval, both of which are generally performed after word segmentation.

[0005] When word segmentation is ambiguous, segmentation and disambiguation generally use word graphs, that is, c...
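Paragraph [0005] is cut off before it defines the word graph. For orientation only, the sketch below uses a common construction: nodes are positions between characters, an edge i→j exists when text[i:j] is a dictionary word (or a single character), and the segmentation is the highest-scoring path through the graph, found by dynamic programming. This is a generic technique, not necessarily the graph described in the full text. The dictionary, the word weights, and the 1e-6 floor for unweighted words are hypothetical.

```python
from __future__ import annotations

from math import log


def build_word_graph(text: str, vocab: set[str], max_len: int) -> dict[int, list[int]]:
    """Edge i -> j when text[i:j] is a dictionary word or a single character."""
    graph = {i: [] for i in range(len(text))}
    for i in range(len(text)):
        for j in range(i + 1, min(len(text), i + max_len) + 1):
            if j == i + 1 or text[i:j] in vocab:
                graph[i].append(j)
    return graph


def best_path(text: str, graph: dict[int, list[int]], weight: dict[str, float],
              floor: float = 1e-6) -> list[str]:
    """Dynamic programming over the DAG: maximize the sum of log word weights."""
    n = len(text)
    best = [float("-inf")] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for i in range(n):  # nodes are visited in left-to-right (topological) order
        for j in graph[i]:
            # Words without a weight fall back to a small hypothetical floor.
            score = best[i] + log(weight.get(text[i:j], floor))
            if score > best[j]:
                best[j], back[j] = score, i
    tokens, j = [], n
    while j > 0:  # follow back-pointers from the end node to node 0
        tokens.append(text[back[j]:j])
        j = back[j]
    return tokens[::-1]


# Hypothetical dictionary and word weights, for illustration only.
text = "研究生命起源"
vocab = {"研究", "研究生", "生命", "起源"}
weights = {"研究": 0.75, "研究生": 0.5, "生命": 0.75, "命": 0.25, "起源": 1.0}
graph = build_word_graph(text, vocab, max_len=3)
path = best_path(text, graph, weights)
print(path)
```

With these illustrative weights, the path 研究|生命|起源 (log-score about -0.575) outscores 研究生|命|起源 (about -2.079). Working in log space turns the product of weights into a sum, which keeps the dynamic program simple and avoids underflow on long inputs.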

Application Information

Patent Type & Authority: Patent (China)
IPC: IPC(8) G06F17/27
Inventor: 陈进平 (Chen Jinping)
Owner: BEIJING QIHOO TECH CO LTD