Word segmentation method, device and equipment based on multilevel dictionary and readable storage medium

A word segmentation method using dictionary technology, applied in the computer field, that addresses problems such as poor word segmentation performance.

Active Publication Date: 2021-01-12
SUZHOU UNIV

AI Technical Summary

Problems solved by technology

[0005] The purpose of this application is to provide a word segmentation method, device, equipment and readable storage medium based on a multi-level dictionary, to solve the problem that current word segmentation models all use single-level dictionaries, which results in poor word segmentation performance.

Examples

Embodiment 1

[0050] Embodiment 1 of the word segmentation method based on a multi-level dictionary provided by the present application is described below. Referring to Figure 1, Embodiment 1 includes:

[0051] S101. For the target sentence, generate a vector representation of each character, and generate feature representations of each character in at least two dictionaries;

[0052] Specifically, the process of generating the feature representations of each character in at least two dictionaries includes: for each character, generating its feature representation in each dictionary, then splicing (concatenating) these representations to obtain the feature representation of the character in the at least two dictionaries.
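The splicing step in [0052] can be illustrated with a minimal Python sketch. The excerpt does not say how a character's feature representation in a single dictionary is computed, so the sketch assumes a simple scheme: four binary flags marking whether the character begins, is inside, ends, or alone forms a dictionary word found in the sentence. The function names `dict_features` and `multi_dict_features` and the `max_len` parameter are hypothetical and are not taken from the patent.

```python
from typing import List, Set


def dict_features(sentence: str, dictionary: Set[str], max_len: int = 4) -> List[List[int]]:
    """Per-character [begin, middle, end, single] flags for ONE dictionary.

    Assumed feature scheme (not specified in the source): a flag is set when
    some dictionary word of length <= max_len covers the character in that role.
    """
    feats = [[0, 0, 0, 0] for _ in sentence]
    n = len(sentence)
    for start in range(n):
        for length in range(1, max_len + 1):
            end = start + length
            if end > n:
                break
            if sentence[start:end] not in dictionary:
                continue
            if length == 1:
                feats[start][3] = 1          # character is a word by itself
            else:
                feats[start][0] = 1          # first character of the word
                for mid in range(start + 1, end - 1):
                    feats[mid][1] = 1        # interior characters
                feats[end - 1][2] = 1        # last character of the word
    return feats


def multi_dict_features(sentence: str, dictionaries: List[Set[str]], max_len: int = 4) -> List[List[int]]:
    """Concatenate (splice) each character's features across all dictionaries."""
    per_dict = [dict_features(sentence, d, max_len) for d in dictionaries]
    return [
        [flag for feats in per_dict for flag in feats[i]]
        for i in range(len(sentence))
    ]
```

With two dictionaries, each character ends up with an 8-dimensional vector: the first four flags come from the first dictionary and the last four from the second, so a downstream model can tell which dictionary supports a given word boundary.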

[0053] The at least two dictionaries may be dictionaries divided by field, dictionaries divided by word-forming probability, or dictionaries divided by both field and word-forming probability at the same time....
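As one possible reading of "dictionaries classified according to their word-forming probabilities", the sketch below splits a single scored word list into several levels using probability thresholds. The excerpt does not describe how the levels are actually built, so the thresholding rule, the function name `split_by_probability`, and the input format (a word-to-probability mapping) are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def split_by_probability(word_probs: Dict[str, float], thresholds: List[float]) -> List[Set[str]]:
    """Partition words into len(thresholds) + 1 levels by word-forming probability.

    Assumed rule (hypothetical): level 0 holds words at or above the highest
    threshold, level 1 the next band, and the last level everything below
    the lowest threshold.
    """
    cuts = sorted(thresholds, reverse=True)
    levels: List[Set[str]] = [set() for _ in range(len(cuts) + 1)]
    for word, prob in word_probs.items():
        # Index of the first threshold the probability reaches, else the last level.
        level = next((i for i, cut in enumerate(cuts) if prob >= cut), len(cuts))
        levels[level].add(word)
    return levels
```

Each resulting set can then be treated as a separate dictionary, so that a character's features distinguish words that are almost always words from strings that are words only occasionally.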

Abstract

The invention discloses a word segmentation method based on a multi-level dictionary. The method uses at least two dictionaries to assist a word segmentation model in word segmentation. When representing a character, it generates both a conventional vector representation and a feature representation of the character in the at least two dictionaries, and finally determines the word-forming label of the character according to the vector representation and the feature representation. By distinguishing the status and importance of different words, the method improves the word segmentation performance of the whole scheme, as well as its domain adaptability and word segmentation accuracy. The invention further provides a word segmentation device and equipment based on the multi-level dictionary and a readable storage medium, whose technical effects correspond to those of the method.

Description

Technical field

[0001] The present application relates to the field of computer technology, in particular to a word segmentation method, device, equipment and readable storage medium based on a multi-level dictionary.

Background

[0002] Chinese word segmentation is the process of dividing an input sentence into a sequence of words. An additional dictionary is usually provided to the model to alleviate the shortage of manually annotated training data. However, current word segmentation schemes all use single-level dictionaries. They ignore the fact that different words in the dictionary have different word-forming probabilities, and that the same string may be a word in one field but not in another. As a result, the word segmentation model performs poorly.

[0003] The word segmentation method based on a single-level dictionary also has the problem of little influence on the actual word segmentati...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/242, G06F40/289, G06N3/02, G06N3/08, G06N20/00
CPC: G06F40/289, G06F40/242, G06N3/08, G06N20/00, G06N3/02
Inventor: 李正华, 周厚全, 侯洋, 周仕林, 张民
Owner: SUZHOU UNIV