Word segmentation method and device, electronic equipment and storage medium

A word segmentation technology for electronic equipment, applied in the fields of electrical digital data processing, instruments, and computing, which addresses the low accuracy and low efficiency of word segmentation results.

Pending Publication Date: 2020-01-24
CHINA MOBILE SUZHOU SOFTWARE TECH CO LTD +1

AI Technical Summary

Problems solved by technology

[0005] The present invention provides a word segmentation method, device, electronic equipment and storage medium to solve the prior-art problems of low accuracy and low efficiency of word segmentation results when using a word segmentation model obtained by retraining.



Examples

Embodiment 1

[0060] Figure 1 is a schematic diagram of a word segmentation process provided by an embodiment of the present invention. The process includes the following steps:

[0061] S101: Input word segmentation data into a pre-saved baseline word segmentation model, and determine a preliminary word segmentation result of the word segmentation data based on the baseline word segmentation model.

[0062] The word segmentation method provided by the embodiment of the present invention is applied to an electronic device. A baseline word segmentation model is pre-stored in the electronic device, and the baseline word segmentation model is an existing word segmentation model.

[0063] The electronic device can obtain the word segmentation data to be segmented. The word segmentation data may be input by the user, or may be collected by the electronic device from other devices through a collection interface.

[0064] After the word segmentation data is obtained by the electronic device, th...
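
The flow of step S101 and the later stages summarized in the abstract (baseline segmentation, splitting into segmentation units, merging by a preset rule) can be sketched as a three-stage pipeline. This is a minimal illustration, not the patented implementation: `toy_baseline`, `toy_splitter`, `make_lexicon_merger` and the lexicon are hypothetical stand-ins. In particular, the whitespace split replaces the baseline model, the per-character split replaces the pre-trained word segmentation model, and the longest-match lexicon lookup replaces the preset merging rule.

```python
from typing import Callable, List, Set


def toy_baseline(text: str) -> List[str]:
    """Stand-in for the pre-saved baseline model: split on whitespace."""
    return text.split()


def toy_splitter(words: List[str]) -> List[str]:
    """Stand-in for the pre-trained model: one segmentation unit per character."""
    return [ch for word in words for ch in word]


def make_lexicon_merger(lexicon: Set[str]) -> Callable[[List[str]], List[str]]:
    """Stand-in merging rule (assumption): greedy longest match against a lexicon."""
    max_len = max((len(w) for w in lexicon), default=1)

    def merge(units: List[str]) -> List[str]:
        merged: List[str] = []
        i = 0
        while i < len(units):
            # Try the longest candidate span first, falling back to one unit.
            for j in range(min(len(units), i + max_len), i, -1):
                candidate = "".join(units[i:j])
                if j - i == 1 or candidate in lexicon:
                    merged.append(candidate)
                    i = j
                    break
        return merged

    return merge


def segment_text(
    text: str,
    baseline: Callable[[str], List[str]],
    splitter: Callable[[List[str]], List[str]],
    merger: Callable[[List[str]], List[str]],
) -> List[str]:
    preliminary = baseline(text)   # S101: preliminary word segmentation result
    units = splitter(preliminary)  # segmentation units from the second model
    return merger(units)           # final result after merging


result = segment_text("ab cd", toy_baseline, toy_splitter, make_lexicon_merger({"abc"}))
print(result)
```

In this toy run the baseline output `["ab", "cd"]` is re-split and re-merged into a different result, which mirrors the idea of correcting the baseline output without changing the baseline model itself.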

Embodiment 2

[0078] On the basis of the above embodiments, in the embodiments of the present invention, before merging the at least two segmentation units according to the preset merging rules, the method further includes:

[0079] The segmentation result is input into a pre-trained tagger, and an annotation sequence of the segmentation result is output based on the tagger, wherein the annotation sequence includes a word tag for each of the at least two segmentation units;

[0080] According to the preset merging rule, merging the at least two segmentation units includes:

[0081] Merge each of the segmentation units according to the word tags of each of the segmentation units and a preset merging rule.

[0082] When the electronic device merges at least two segmentation units, if the combination is performed according to the label information corresponding to each segmentation unit, the electronic device can first determine that each segmentation unit corresponds to...
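
One way to picture the annotation sequence described in [0079] is as a list that pairs every segmentation unit with exactly one word tag. The sketch below is illustrative only: the tag names `B`, `M`, `E` and `S` (word start, word middle, word end, single-unit word) are an assumed convention, and `toy_tagger` is a hypothetical stand-in that treats its whole input as one word. A pre-trained tagger would predict the tags instead.

```python
from typing import Callable, List, Sequence, Tuple

TaggedUnit = Tuple[str, str]  # (segmentation unit, word tag)


def toy_tagger(units: Sequence[str]) -> List[str]:
    """Hypothetical tagger: labels the whole input as a single word."""
    if not units:
        return []
    if len(units) == 1:
        return ["S"]
    return ["B"] + ["M"] * (len(units) - 2) + ["E"]


def annotate(
    units: Sequence[str],
    tagger: Callable[[Sequence[str]], Sequence[str]],
) -> List[TaggedUnit]:
    """Build the annotation sequence, checking there is one tag per unit."""
    tags = tagger(units)
    if len(tags) != len(units):
        raise ValueError("tagger must return exactly one tag per segmentation unit")
    return list(zip(units, tags))


print(annotate(["a", "b", "c"], toy_tagger))
```

The length check reflects the requirement that the annotation sequence carries a word tag for each segmentation unit, so that the later merging step can read units and tags in lockstep.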

Embodiment 3

[0088] On the basis of the above embodiments, in the embodiment of the present invention, merging each segmentation unit according to the word tag of each segmentation unit and the preset merging rule includes:

[0089] Sequentially read each of the segmentation units and the word tags of each of the segmentation units, and merge in the following manner until the merger of each of the segmentation units is completed:

[0090] If there is a first segmentation unit whose word tag is the word-start tag, search for the adjacent second segmentation unit whose word tag is the word-end tag, and determine the third segmentation unit located between the first segmentation unit and the second segmentation unit in the annotation sequence; then, following the order in the annotation sequence, combine the first segmentation unit, the third segmentation unit and the second segmentation unit into one complete word;

[0091] If there i...
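
The start-to-end merging case in [0090] can be sketched as a single left-to-right pass over the units and their tags. This is a hedged illustration rather than the claimed rule set: the tag names `B` and `E` are assumed labels for the word-start and word-end tags, "adjacent" is interpreted as the nearest following end tag, and the fallback for every other situation (emit the unit as its own word) is an assumption, since the remaining cases are truncated in the text above.

```python
from typing import List, Sequence


def merge_by_tags(
    units: Sequence[str],
    tags: Sequence[str],
    start_tag: str = "B",  # assumed name for the word-start tag
    end_tag: str = "E",    # assumed name for the word-end tag
) -> List[str]:
    if len(units) != len(tags):
        raise ValueError("units and tags must have the same length")

    words: List[str] = []
    n = len(units)
    i = 0
    while i < n:
        if tags[i] == start_tag:
            # Find the nearest following unit tagged as word end.
            end = next((j for j in range(i + 1, n) if tags[j] == end_tag), None)
            if end is not None:
                # Join the first unit, any units in between, and the end unit.
                words.append("".join(units[i:end + 1]))
                i = end + 1
                continue
        # Assumed fallback: any other unit becomes a word by itself.
        words.append(units[i])
        i += 1
    return words


print(merge_by_tags(["a", "b", "c", "d"], ["B", "M", "E", "S"]))
```

Here `a`, `b` and `c` play the roles of the first, third and second segmentation units respectively, and they are joined in annotation-sequence order into one word.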

Abstract

The invention discloses a word segmentation method and device, electronic equipment and a storage medium. The word segmentation method comprises: inputting a word segmentation word stock into a pre-stored baseline word segmentation model, and determining a preliminary word segmentation result of the word segmentation word stock based on the baseline word segmentation model; inputting the preliminary word segmentation result into a pre-trained word segmentation model, and outputting a segmentation result of the preliminary word segmentation result based on the word segmentation model, the segmentation result comprising segmentation units, and each segmentation unit comprising a segmentation character and/or a segmentation character set; and combining the segmentation units according to a preset combination rule to determine a final word segmentation result of the word segmentation word stock. Because the existing baseline word segmentation model is not changed, the convergence rate of the word segmentation model is ensured and word segmentation efficiency is improved. Because the word segmentation result of the baseline word segmentation model is corrected, the accuracy of the word segmentation result is improved.

Description

Technical field

[0001] The present invention relates to the technical field of word segmentation processing, in particular to a word segmentation method, device, electronic equipment and storage medium.

Background technique

[0002] Word segmentation refers to dividing a sequence of language characters into individual words. Word segmentation technology is the basis of text mining. For a piece of input text, successful word segmentation lets a computer automatically recognize the meaning of words and sentences, and so enables natural language processing.

[0003] Commonly used word segmentation models are generally statistics-based or dictionary-based. The generalization ability of both kinds of model is generally relatively poor. Even if the supervised word segmentation model based on statistics has a certain generalization ability, due to the small amount of manually labeled corpus, the word ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/289
Inventor: 唐海庆童超胡小克梁俊
Owner: CHINA MOBILE SUZHOU SOFTWARE TECH CO LTD