Method and system for updating Chinese word segmentation scene database

A technology for updating a Chinese word segmentation scene database, applied in the fields of instruments, computing, and electrical digital data processing. It addresses two problems of existing approaches: updates rely on a single update method, and they cannot incorporate scene information. As a result, it helps avoid word segmentation ambiguity.

Active Publication Date: 2019-01-04
北京如布科技有限公司

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to provide a method for updating a Chinese word segmentation scene database, which solves the problems that existing scene database updates cannot incorporate scene information and offer only a single update method.


Embodiment Construction

[0046] Preferred embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although preferred embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

[0047] As is well known, English text is built from words separated by spaces, whereas Chinese text is built from characters, and all the characters in a sentence are written continuously to express a complete meaning. Chinese word segmentation divides a sequence of Chinese characters into a sequence of words, and it is the basis of Chinese natural language processing. The word segmentation module is the module that performs Chinese word segmentation. Currently commonl...

Abstract

One embodiment of the invention provides a method and a system for updating a Chinese word segmentation scene library. The method comprises the following steps: selecting a segmented word from the correct word segmentation result of a sentence; establishing features of the segmented word based on the correct word segmentation result; calculating maximum entropy model scores for the features; comparing the maximum of those scores with a first preset threshold value; and, if the maximum is larger than the first preset threshold value, adding the segmented word to the subject dictionary corresponding to that maximum. The embodiment incorporates scene information and supports multiple update modes.
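The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patent's implementation: the feature templates (previous and next word), the topic names, the hand-set weights, and the threshold of 0.8 are all assumptions chosen for illustration, and the function names are hypothetical. The scoring function is a plain log-linear (maximum entropy) model that normalizes per-topic scores into probabilities.

```python
import math
from collections import defaultdict

# Hypothetical toy weights: (feature, topic) -> weight. A real system would
# learn these from topic-labelled segmentation data.
WEIGHTS = {
    ("prev=播放", "music"): 2.0,
    ("next=的", "music"): 0.2,
    ("prev=播放", "weather"): -0.5,
    ("prev=查询", "weather"): 1.5,
}
TOPICS = ["music", "weather"]  # assumed subject (topic) dictionaries


def build_features(words, index):
    """Build context features for words[index] from a correct segmentation."""
    prev_word = words[index - 1] if index > 0 else "<s>"
    next_word = words[index + 1] if index + 1 < len(words) else "</s>"
    return [f"prev={prev_word}", f"next={next_word}"]


def maxent_scores(features):
    """Return P(topic | features) under a log-linear (maximum entropy) model."""
    raw = {t: sum(WEIGHTS.get((f, t), 0.0) for f in features) for t in TOPICS}
    z = sum(math.exp(v) for v in raw.values())  # normalization constant
    return {t: math.exp(v) / z for t, v in raw.items()}


def update_scene_library(words, index, dictionaries, threshold=0.8):
    """Add words[index] to the best-scoring topic dictionary if the score
    exceeds the threshold; return the chosen topic, or None if nothing was added."""
    scores = maxent_scores(build_features(words, index))
    best_topic = max(scores, key=scores.get)
    if scores[best_topic] > threshold:
        dictionaries[best_topic].add(words[index])
        return best_topic
    return None


dictionaries = defaultdict(set)
update_scene_library(["播放", "青花瓷", "的"], 1, dictionaries)
```

With these toy weights, the word 青花瓷 following 播放 scores well above the threshold for the assumed "music" topic and is added to that dictionary, while a word with no informative context features gets a uniform score and is left out.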

Description

Technical field

[0001] The invention relates to the field of natural language processing, in particular to a method and system for updating a Chinese word segmentation scene database.

Background technique

[0002] Chinese word segmentation is a major difficulty in Chinese analysis and computer processing. The Chinese word segmentation scene library is used to store correct Chinese word segmentation related to specific application scenarios, which is an important part of the Chinese word segmentation algorithm and directly affects the accuracy of word segmentation. Most of the existing Chinese word segmentation algorithms provide some interfaces, allowing users to update the scene database according to the application scenarios, so as to solve the problem that some special words cannot be correctly segmented in specific application scenarios.

[0003] However, in existing word segmentation algorithms, the update of the scene database and the execution of word segmentation ar...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/27
CPC: G06F40/242, G06F40/284
Inventor: 柳艳红, 郭祥, 郭瑞
Owner: 北京如布科技有限公司