Chinese character word distinguishing method and system

A Chinese word segmentation method and system, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses two problems: word segmentation accuracy that cannot be improved further, and new words that are absent from the dictionary and therefore cannot be recognized. The effect is improved segmentation accuracy.

Active Publication Date: 2007-11-28
SHENZHEN SHI JI GUANG SU INFORMATION TECH

AI Technical Summary

Problems solved by technology

[0006] The purpose of the present invention is to provide a Chinese word segmentation system, which aims to solve the problem that the existing technology


Examples

Embodiment Construction

[0037] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0038] The present invention first applies a traditional word segmentation method to perform first-level word segmentation on the input text. It then extracts the continuous single-character sequences from the first-level segmentation result, calculates the word-forming probability of each character, and uses these probabilities to perform second-level word segmentation on those sequences. In this way it identifies new words that are not included in the dictionary and follow no regular pattern, and improves the accuracy of Chinese word segmentation.
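The second-level step described in [0038] can be sketched as follows. This is only an illustration under stated assumptions, not the patented procedure: the excerpt does not say how word-forming probabilities are estimated or combined, so the probability table, the 0.5 threshold, the merge rule, and the function name `second_level_segment` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical word-forming probabilities per character. A real system would
# estimate them from a corpus; these values are invented for illustration.
WORD_FORMING_PROB: Dict[str, float] = {
    "奥": 0.9, "巴": 0.8, "马": 0.7, "的": 0.05, "我": 0.1,
}


def second_level_segment(tokens: List[str],
                         prob: Dict[str, float],
                         threshold: float = 0.5) -> List[str]:
    """Merge runs of adjacent single-character tokens into candidate new words.

    Assumed rule (not from the source): a single character joins the current
    run when its word-forming probability is at least `threshold`; a run of
    two or more characters is emitted as one new word.
    """
    result: List[str] = []
    run: List[str] = []

    def flush() -> None:
        # Emit the pending run: merged if it has 2+ characters, else unchanged.
        if len(run) >= 2:
            result.append("".join(run))
        else:
            result.extend(run)
        run.clear()

    for token in tokens:
        if len(token) == 1 and prob.get(token, 0.0) >= threshold:
            run.append(token)
        else:
            flush()
            result.append(token)
    flush()
    return result


# First-level output in which an out-of-dictionary name was split into
# single characters.
first_level = ["我", "喜欢", "奥", "巴", "马", "的", "演讲"]
print(second_level_segment(first_level, WORD_FORMING_PROB))
```

Here the three adjacent single characters with high assumed probabilities are merged into one token, while the low-probability function character is left as a separate token.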

[0039] Fig. 1 shows the structur...


Abstract

The invention discloses a Chinese word segmentation method and system in the field of computer and Chinese information processing. The method comprises the following steps: A, performing first-level word segmentation on the input text with a traditional word segmentation method; B, extracting continuous single-character sequences from the first-level segmentation result and performing second-level word segmentation on them using word-forming probabilities; C, using the new words identified by the second-level segmentation to update the segmentation result, and outputting it. The invention can identify new words that are not in the dictionary and follow no regular pattern, which increases the accuracy of Chinese word segmentation.

Description

Technical field

[0001] The invention relates to the field of computer and Chinese information processing, and more specifically, to a Chinese word segmentation method and system.

Background art

[0002] Chinese information processing technology has been widely used in computer networks, database technology, software engineering and other computer fields. Automatic Chinese word segmentation is important basic work for Chinese information processing: many Chinese information processing tasks involve word segmentation, such as machine translation, automatic summarization, automatic classification, full-text retrieval of Chinese literature databases, and search engines. Because Chinese text is written continuously with no spaces between words, the first problem encountered in Chinese text processing is word segmentation. Correct segmentation of words is a necessary condition for Chinese text processing.

[0003] Chinese wo...
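To make the background concrete, here is a minimal sketch of a dictionary-based segmenter. The excerpt does not name the "traditional" method, so forward maximum matching is an assumption here, and the toy dictionary, the `max_len` parameter, and the function name `forward_max_match` are hypothetical.

```python
from typing import List, Set


def forward_max_match(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Segment `text` by greedily taking the longest dictionary word at each position.

    Characters that start no dictionary word are emitted as single-character
    tokens, which is how out-of-dictionary words end up split apart.
    """
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Toy dictionary (hypothetical); the name in the middle of the sentence is
# deliberately missing from it.
toy_dictionary = {"我", "喜欢", "的", "演讲"}
print(forward_max_match("我喜欢奥巴马的演讲", toy_dictionary))
```

With this toy dictionary, the out-of-dictionary name comes out as three separate single-character tokens. A dictionary lookup alone cannot recognize such a new word.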


Application Information

IPC(8): G06F17/28
Inventor: 张会鹏
Owner: SHENZHEN SHI JI GUANG SU INFORMATION TECH