Chinese word segmentation method and device and search lexicon reading method

A Chinese word segmentation technology, applied in the computer field, that addresses mis-segmentation and the resulting loss of accuracy in word segmentation results, and achieves accurate word segmentation, an improved recognition rate, and high word segmentation efficiency.

Pending Publication Date: 2021-07-13
深圳市华南城数字科技有限公司

AI Technical Summary

Problems solved by technology

[0006] It can be seen that both the forward maximum matching algorithm and the reverse maximum matching algorithm shorten the candidate string by one character at a time until only a single character remains. If a fixed word lies in the middle of the sentence, this process may segment it incorrectly, which affects the accuracy of the word segmentation result.

Examples


Specific Embodiment II

[0079] Specific embodiment II: the following details how the forward maximum matching method, the reverse maximum matching method and the two-way maximum matching method are specifically carried out:

[0080] First, what the maximum matching method is: it is a dictionary-based method that takes the length of the longest word in the dictionary as the length of the first scan string and looks that string up in the dictionary. For example, if the longest word in the dictionary is "People's Republic of China", with 7 Chinese characters, the initial maximum matching length is 7 Chinese characters. The scan string is then shortened one character at a time and looked up in the dictionary again.
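The lookup described in [0080] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `max_word_length` and `longest_match_at`, the toy dictionary, and the single-character fallback are all assumptions made for the example. The 7-character word is written in its Chinese form, 中华人民共和国.

```python
def max_word_length(dictionary):
    """Length of the longest dictionary word; used as the first scan length."""
    return max(len(word) for word in dictionary)


def longest_match_at(text, start, dictionary):
    """Try the longest candidate starting at `start`, then shorten it by one
    character at a time until a dictionary word is found."""
    limit = min(max_word_length(dictionary), len(text) - start)
    for length in range(limit, 0, -1):
        candidate = text[start:start + length]
        if candidate in dictionary:
            return candidate
    # Assumption: when nothing matches, fall back to the single character.
    return text[start]


# Hypothetical toy dictionary for illustration only.
TOY_DICT = {"中华人民共和国", "中华", "人民", "共和国"}

print(max_word_length(TOY_DICT))
print(longest_match_at("中华人民共和国万岁", 0, TOY_DICT))
print(longest_match_at("中华人民共和国万岁", 2, TOY_DICT))
```

Starting at position 0 the full 7-character word matches on the first try; starting at position 2 the candidate has to shrink from 7 characters down to 2 before "人民" is found.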

[0081] The following is a detailed description of these matching methods with "We are playing in the safari park":

[0082] 1. Forward maximum matching method:

[0083] Forward means taking characters from front to back, with the candidate length going from 7 down to 1, one character shorter each time, until the dictionary h...
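A minimal sketch of forward maximum matching as commonly implemented, for illustration only. The function name `forward_max_match`, the toy dictionaries, and the Chinese sentence 我们在野生动物园玩 (an assumed back-translation of "We are playing in the safari park") are not taken from the source text.

```python
def forward_max_match(sentence, dictionary, max_len):
    """Scan from front to back. At each position try the longest candidate
    (up to max_len characters) and shorten it by one character until it is
    a dictionary word or only one character is left."""
    tokens = []
    pos = 0
    while pos < len(sentence):
        limit = min(max_len, len(sentence) - pos)
        for length in range(limit, 0, -1):
            candidate = sentence[pos:pos + length]
            # A single leftover character is always emitted as a token.
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                pos += length
                break
    return tokens


# Hypothetical toy dictionaries; the sentence is an assumed back-translation.
SENTENCE = "我们在野生动物园玩"
DICT_A = {"我们", "在", "野生", "动物园", "野生动物园", "玩"}
DICT_B = DICT_A | {"在野", "生动"}

print(forward_max_match(SENTENCE, DICT_A, max_len=7))
print(forward_max_match(SENTENCE, DICT_B, max_len=7))
```

With `DICT_A` the sentence is split as 我们 / 在 / 野生动物园 / 玩. With `DICT_B`, which also contains 在野 and 生动, the greedy front-to-back scan commits to 在野 first and the fixed word 野生动物园 in the middle is lost, which is the kind of mis-segmentation the document describes.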


Abstract

To overcome the defects in the prior art, the invention provides a Chinese word segmentation method and device and a search lexicon reading method. The method comprises: segmenting the sentence to be segmented according to the input maximum word length to obtain a first word segmentation result; gradually reducing the maximum word length and segmenting the sentence again each time the maximum word length changes, to obtain an Nth word segmentation result; and comparing the first through Nth word segmentation results with a word bank to obtain an output list. According to the invention, the sentence to be segmented can be segmented accurately, and in particular the recognition rate of fixed words in the middle of the sentence can be improved. The method has the advantages of high word segmentation efficiency and accurate word segmentation results.
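The abstract does not say how each pass cuts the sentence. The sketch below shows one possible reading, stated here as an assumption rather than as the patented procedure: each pass enumerates every substring of the current length, and the pieces from all passes are then compared with the word bank. The names `windows` and `multi_length_lookup`, the toy word bank, and the Chinese example sentence are hypothetical.

```python
def windows(sentence, length):
    """All contiguous substrings of the given length, front to back."""
    return [sentence[i:i + length] for i in range(len(sentence) - length + 1)]


def multi_length_lookup(sentence, word_bank, max_len):
    """Assumed reading of the abstract: cut the sentence once per word length,
    from max_len down to 1, and keep the pieces found in the word bank."""
    output = []
    for length in range(min(max_len, len(sentence)), 0, -1):
        for piece in windows(sentence, length):
            # Compare each pass's pieces with the word bank; skip duplicates.
            if piece in word_bank and piece not in output:
                output.append(piece)
    return output


# Hypothetical word bank and sentence, for illustration only.
WORD_BANK = {"我们", "在", "野生", "动物园", "野生动物园", "玩"}
print(multi_length_lookup("我们在野生动物园玩", WORD_BANK, max_len=5))
```

Under this reading the fixed word 野生动物园 in the middle of the sentence is found in the first pass regardless of what precedes it, because no earlier greedy match can consume part of it.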

Description

Technical field

[0001] The invention relates to the field of computer technology, in particular to a Chinese word segmentation method and device, a system, an electronic device, a storage medium and a method for reading a search lexicon.

Background technique

[0002] Word segmentation technology belongs to the category of natural language understanding technology and is the first link in semantic understanding. It is a technology that can correctly segment words in a sentence. It is the foundation of text classification, information retrieval, machine translation, automatic indexing, speech input and output of text and so on. However, due to the complexity of Chinese itself and its writing habits, Chinese word segmentation technology has become a difficult point in word segmentation technology.

[0003] Basic Algorithm of Chinese Word Segmentation: In recent years, people have done some research on Chinese word segmentation technology, and put forward a variety of Chinese wo...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/284, G06F16/33, G06F16/338
CPC: G06F40/284, G06F16/334, G06F16/338
Inventor: 叶群莉魏文华李彩秀刘宁农翠华
Owner: 深圳市华南城数字科技有限公司