Chinese word segmentation method and system

A Chinese word segmentation technology, applied in special data processing applications, instruments, electrical digital data processing, etc., that addresses problems such as difficulty in improving word segmentation accuracy, high computational complexity, and reduced segmentation efficiency, so as to improve word segmentation efficiency and accuracy.

Active Publication Date: 2011-01-19
BEIJING FEINNO COMM TECH

AI Technical Summary

Problems solved by technology

The disadvantage of the statistics-based word segmentation method is that its computational complexity is high, which reduces segmentation efficiency. In addition, because the training corpus is limited, word segmentation accuracy is difficult to improve.

Embodiment Construction

[0057] Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings. The following examples illustrate the present invention but are not intended to limit its scope.

[0058] Figure 1 is a flow chart of a Chinese word segmentation method according to an embodiment of the present invention. The method includes:

[0059] Step S101: segment the Chinese text according to word semantics, segment the ambiguous fields, and output a first text string in units of words. The main purpose of this step is to match words against a dictionary containing a large number of entries and to segment ambiguous fields effectively. The dictionary is loaded with a large number of names of people, places, and organizations, as well as pseudo-ambiguous fields, which can improve the segmentation speed and the accuracy and recall of names and pl...
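The dictionary matching described in step S101 can be illustrated with a small sketch. The text shown here does not say which matching strategy the invention uses, so forward maximum matching (a common dictionary-based technique) is assumed; the function name `forward_max_match` and the toy dictionary are hypothetical. The sketch does not resolve ambiguous fields, which the patent handles separately.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Greedy longest-match segmentation against a set of known words.

    Assumption: forward maximum matching stands in for the patent's
    dictionary matching, whose exact strategy is not given in the text.
    """
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until the dictionary hits.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy dictionary; a real one would hold many names and places.
toy_dictionary = {"北京", "大学", "北京大学", "学生"}
first_text_string = forward_max_match("北京大学学生", toy_dictionary)
print(first_text_string)
```

Because the longest entry wins, "北京大学" is kept whole rather than split into "北京" and "大学". Characters that match no entry fall through as single-character words.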

Abstract

The invention discloses a Chinese word segmentation method comprising the following steps: performing word segmentation on a Chinese text according to word semantics, segmenting ambiguous fields, and outputting a first text string in units of words; and identifying and combining Chinese names in the first text string to generate a second text string in units of words. The ambiguous fields are segmented by combining a dictionary rule method with a statistical method, and both the segmentation of ambiguous fields and the identification of names use a character-tagging maximum entropy model from the statistical method. The invention also discloses a Chinese word segmentation system comprising a word segmentation module, a name identification module, and the like. The method and the system improve word segmentation efficiency and accuracy.
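The abstract names a maximum entropy model but gives no detail on its labels or features. One common setup for statistical segmenters, assumed here and not confirmed by the text, labels each character as B (begin), M (middle), E (end), or S (single) and then reads the words off the labels. The sketch below shows only that final decoding step; the function name `tags_to_words` and the tag set are hypothetical, and the tags are supplied by hand rather than predicted by a model.

```python
def tags_to_words(chars, tags):
    """Rebuild words from per-character B/M/E/S labels.

    Assumption: the B/M/E/S tag set is illustrative; the patent text
    shown does not specify the labels its model produces.
    """
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")
    words = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        # For well-formed input, E closes a multi-character word and
        # S marks a complete one-character word.
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:
        # Tolerate a sequence that ends without a closing E or S.
        words.append(current)
    return words


example_words = tags_to_words("我爱北京", ["S", "S", "B", "E"])
print(example_words)
```

In a full system the tag sequence would come from the trained model, and the same decoding could apply to a name span once its characters are labeled.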

Description

Technical field

[0001] The invention relates to the technical field of natural language processing (NLP), in particular to a Chinese word segmentation method and system.

Background

[0002] In recent years, with the increasing popularity of the Internet, the scale of text on the Internet has gradually expanded, and information resources have continued to increase. In order to retrieve and mine valuable information from a large number of resources, Internet companies are vigorously developing technologies in the field of natural language processing. Chinese word segmentation is the basis and premise of natural language processing technology.

[0003] In the current field of natural language processing, Chinese word segmentation technology is mainly divided into two types: rule-based word segmentation methods and statistics-based word segmentation methods.

[0004] In the rule-based word segmentation method, the advantage of dictionary matching word segmentation is...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/27
Inventors: 牟小峰, 杨正
Owner: BEIJING FEINNO COMM TECH