System and method for optimizing Chinese word segmentation

A Chinese word segmentation technology, applied in the field of data processing, addresses three problems: non-Chinese characters are not effectively distinguished, segmentation results are unsatisfactory, and mixed-type text cannot be segmented effectively. It improves the accuracy, integrity, and effectiveness of word segmentation.

Pending Publication Date: 2022-04-26
北京思特奇信息技术股份有限公司

AI Technical Summary

Problems solved by technology

[0002] IKAnalyzer is an open-source, lightweight Chinese word segmentation toolkit written in Java. It supports only simple handling of segmentation ambiguity and merged output of quantifiers. On complex Chinese text its actual segmentation results are unsatisfactory: many phrases are not recognized as words, so the results cannot be applied effectively.
[0003] English letters, digits, and other characters are not effectively distinguished. In particular, text that combines several character types, such as mixed Chinese and English phrases, cannot be segmented effectively.
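
The character-type problem in [0003] can be illustrated with a minimal Python sketch. It is not the method of the invention, which this excerpt does not disclose; it only shows one plausible preprocessing step, splitting text into runs of a single character type so that Chinese, Latin letters, and digits can be handled separately. The function names `char_type` and `split_by_type` and the sample string are hypothetical.

```python
from itertools import groupby


def char_type(ch: str) -> str:
    """Classify one character into a coarse type (illustrative categories)."""
    if "\u4e00" <= ch <= "\u9fff":  # CJK Unified Ideographs basic block
        return "cjk"
    if ch.isascii() and ch.isalpha():
        return "latin"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "other"


def split_by_type(text: str) -> list[tuple[str, str]]:
    """Group consecutive characters of the same type into (type, run) pairs."""
    return [(kind, "".join(run)) for kind, run in groupby(text, key=char_type)]


# Hypothetical mixed Chinese/English/digit input.
for kind, run in split_by_type("华为Mate60手机"):
    print(kind, run)
```

Each run can then be passed to a segmenter suited to its type, or recombined by rule when a business term spans several types.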

Embodiment Construction

[0040] Exemplary embodiments of the present invention will now be described with reference to the drawings; however, the present invention may be embodied in many different forms and is not limited to the embodiments described herein, which are provided to disclose the present invention thoroughly and completely and to fully convey its scope to those skilled in the art. The terms used in the exemplary embodiments shown in the drawings do not limit the present invention. In the figures, the same units/elements are given the same reference numerals.

[0041] Unless otherwise specified, the terms (including scientific and technical terms) used herein have the meanings commonly understood by those skilled in the art. In addition, terms defined in commonly used dictionaries should be understood to have meanings consistent with the context of their related fields, and should not be understood as idealized or over...

Abstract

The invention discloses a system and a method for optimizing Chinese word segmentation, and belongs to the technical field of data processing. The system comprises an auxiliary dictionary module, which establishes an auxiliary dictionary on top of a core dictionary according to business requirements, uses the auxiliary dictionary to identify the words and sentences to be recognized in the business, and obtains a recognition result; and a Chinese word segmentation optimization module, which performs word segmentation on the recognition result. The method overcomes several defects of existing Chinese word segmentation tools. Through Chinese word segmentation optimization rules, the auxiliary dictionary technique, and linear dynamic configuration customized per scenario, it improves the accuracy, integrity, and effectiveness of Chinese word segmentation, enhances the configuration extensibility of the related component tools, improves segmentation efficiency, and provides a safe and reliable general solution for Chinese word segmentation.
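
The idea of layering a business-specific auxiliary dictionary over a core dictionary can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of the invention: the excerpt does not specify the matching rule, so forward maximum matching, a common dictionary-based technique, is substituted here. The names `build_lexicon` and `forward_max_match` and all dictionary entries are hypothetical.

```python
def build_lexicon(core: set[str], auxiliary: set[str]) -> set[str]:
    """Combine the core dictionary with business-specific auxiliary entries."""
    return set(core) | set(auxiliary)


def forward_max_match(text: str, lexicon: set[str]) -> list[str]:
    """Greedy left-to-right segmentation: take the longest lexicon entry
    starting at each position, falling back to a single character."""
    max_len = max((len(word) for word in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical dictionaries: the auxiliary entry is a mixed-type business term.
core = {"办理", "套餐"}
auxiliary = {"5G畅享套餐"}
text = "办理5G畅享套餐"

print(forward_max_match(text, core))
print(forward_max_match(text, build_lexicon(core, auxiliary)))
```

With the core dictionary alone, the mixed-type term is broken into single characters; once the auxiliary entry is added, it is kept as one token, which is the kind of improvement the abstract attributes to the auxiliary dictionary.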

Description

Technical field

[0001] The present invention relates to the technical field of data processing, and more specifically, to a system and method for optimizing Chinese word segmentation.

Background

[0002] IKAnalyzer is an open-source, lightweight Chinese word segmentation toolkit written in Java. It supports only simple handling of segmentation ambiguity and merged output of quantifiers. On complex Chinese text its actual segmentation results are unsatisfactory: many phrases are not correctly identified during segmentation, so the results cannot be applied effectively.

[0003] English letters, digits, and other characters are not effectively distinguished. In particular, text that combines several character types, such as mixed Chinese and English phrases, cannot be segmented effectively.

[0004] The present invention aims to overcome the shortcomings of the prior art. According to the character...

Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F40/284, G06F40/242, G06F40/30
CPC: G06F40/284, G06F40/242, G06F40/30
Inventor: 石川
Owner: 北京思特奇信息技术股份有限公司