Word segmentation method, device, equipment and readable storage medium based on multi-level dictionary

A word segmentation method and dictionary technology applied in the computer field. It addresses problems such as poor word segmentation performance, and improves word segmentation accuracy, domain adaptability, and overall word segmentation performance.

Active Publication Date: 2021-06-01
SUZHOU UNIV

AI Technical Summary

Problems solved by technology

[0005] The purpose of this application is to provide a word segmentation method, device, equipment and readable storage medium based on a multi-level dictionary, to solve the problem that current word segmentation models all use single-level dictionaries, which results in poor word segmentation performance.

Examples

Embodiment 1

[0050] The following introduces Embodiment 1 of a word segmentation method based on a multi-level dictionary provided by the present application. Referring to Figure 1, Embodiment 1 includes:

[0051] S101. For the target sentence, generate a vector representation of each character, and generate feature representations of each character in at least two dictionaries;

[0052] Specifically, the process of generating the feature representations of each character in at least two dictionaries includes: for each character, generating its feature representation in each dictionary, and concatenating these per-dictionary representations to obtain the feature representation of the character in the at least two dictionaries.
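The per-dictionary feature generation and concatenation described in [0052] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the source does not specify the feature template, so the three-bit feature (character begins, is inside, or ends a dictionary word) and the names `dict_features`, `multi_level_features` and `max_len` are assumptions made for the example.

```python
from typing import List, Sequence, Set


def dict_features(sentence: str, dictionary: Set[str], max_len: int = 4) -> List[List[int]]:
    """Hypothetical feature template: for each character, three bits saying
    whether it begins, lies inside, or ends some dictionary word.
    Only multi-character words (length 2..max_len) are matched."""
    feats = [[0, 0, 0] for _ in sentence]
    n = len(sentence)
    for start in range(n):
        for end in range(start + 2, min(n, start + max_len) + 1):
            if sentence[start:end] in dictionary:
                feats[start][0] = 1            # first character of the word
                feats[end - 1][2] = 1          # last character of the word
                for mid in range(start + 1, end - 1):
                    feats[mid][1] = 1          # interior characters
    return feats


def multi_level_features(
    sentence: str, dictionaries: Sequence[Set[str]], max_len: int = 4
) -> List[List[int]]:
    """Concatenate each character's features from every dictionary, in order."""
    per_dict = [dict_features(sentence, d, max_len) for d in dictionaries]
    return [
        [bit for feats in per_dict for bit in feats[i]]
        for i in range(len(sentence))
    ]


# Illustrative dictionaries (assumed, not from the source).
level_1 = {"苏州", "大学"}
level_2 = {"苏州大学"}
features = multi_level_features("苏州大学很好", [level_1, level_2])
```

Each character ends up with one feature block per dictionary, so a downstream model can tell which dictionary a match came from rather than seeing a single merged signal.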

[0053] The at least two dictionaries may be dictionaries classified by domain, dictionaries classified by word-forming probability, or even dictionaries classified by both domain and word-forming probability at the same time....
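One way to obtain dictionaries classified by word-forming probability is to partition a single word list by probability thresholds. The sketch below assumes each word already has an estimated word-forming probability. The function name `split_by_probability`, the threshold value, and the example probabilities are hypothetical and are not taken from the source.

```python
from typing import Dict, List, Sequence, Set


def split_by_probability(
    word_probs: Dict[str, float], thresholds: Sequence[float]
) -> List[Set[str]]:
    """Partition words into len(thresholds) + 1 levels.

    `thresholds` must be in descending order. A word goes to the first
    level whose threshold its probability meets; words below every
    threshold go to the last level.
    """
    levels: List[Set[str]] = [set() for _ in range(len(thresholds) + 1)]
    for word, prob in word_probs.items():
        for k, threshold in enumerate(thresholds):
            if prob >= threshold:
                levels[k].add(word)
                break
        else:
            levels[-1].add(word)  # below all thresholds
    return levels


# Illustrative probabilities only (assumed values, not measured data).
word_probs = {"苏州": 0.95, "大学": 0.90, "州大": 0.10}
levels = split_by_probability(word_probs, thresholds=[0.8])
```

The same idea extends to domain-based classification by keeping one word set per domain instead of one per probability band.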

Abstract

This application discloses a word segmentation method based on multi-level dictionaries. The method uses at least two dictionaries to assist the word segmentation model. When representing a character, it generates not only a conventional vector representation but also a feature representation of the character in the at least two dictionaries, and finally determines the word-forming label of the character from the vector representation and the feature representation. By distinguishing the status and importance of different words, the method improves the word segmentation performance of the overall scheme, as well as its domain adaptability and word segmentation accuracy. In addition, the present application provides a word segmentation device, equipment and readable storage medium based on a multi-level dictionary, whose technical effects correspond to those of the above method.

Description

Technical field

[0001] The present application relates to the field of computer technology, and in particular to a word segmentation method, device, equipment and readable storage medium based on a multi-level dictionary.

Background

[0002] Chinese word segmentation is the process of dividing an input sentence into a sequence of words. An additional dictionary is usually provided to the model to compensate for the shortage of manually annotated training data. However, current word segmentation schemes all use single-level dictionaries. They ignore the fact that different words in the dictionary have different word-forming probabilities, and also the fact that the same string may be a word in one domain but not in another. As a result, the word segmentation model performs poorly.

[0003] The word segmentation method based on a single-level dictionary also has the problem of little influence on the actual word segmentati...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/242, G06F40/289, G06N3/02, G06N3/08, G06N20/00
CPC: G06F40/289, G06F40/242, G06N3/08, G06N20/00, G06N3/02
Inventor: 李正华, 周厚全, 侯洋, 周仕林, 张民
Owner: SUZHOU UNIV