Word segmentation method, device and equipment based on multilevel dictionary and readable storage medium

A word segmentation method and dictionary technology, applied in the computer field, can solve problems such as poor word segmentation performance
CN112214994A · Active · Publication Date: 2021-01-12 · SUZHOU UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications (China)
Current Assignee / Owner
SUZHOU UNIV
Publication Date
2021-01-12


Abstract

The invention discloses a word segmentation method based on a multilevel dictionary. The method uses at least two dictionaries to assist a word segmentation model: when representing a character, it generates both a conventional vector representation of the character and a feature representation of the character in the at least two dictionaries, and then determines the word-forming label of the character from the vector representation and the feature representation. By distinguishing the status and importance of different words, the method improves the word segmentation performance of the whole scheme, along with its domain adaptability and segmentation accuracy. The invention further provides a word segmentation device, equipment, and a readable storage medium based on the multilevel dictionary, whose technical effects correspond to those of the method.
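The per-character dictionary features described above can be illustrated with a minimal sketch. It assumes a simple feature template that the text here does not specify: for each dictionary level, four binary flags record whether the character begins, sits inside, ends, or alone forms a word found in that dictionary. The function name `dictionary_features`, the flag layout, and the `max_len` window are all hypothetical choices made for illustration. In a full tagger these flags would presumably be combined with the character's conventional vector representation before the word-forming label is predicted; that step is not shown.

```python
from typing import List, Set


def dictionary_features(
    sentence: str,
    dictionaries: List[Set[str]],
    max_len: int = 4,
) -> List[List[int]]:
    """Return, for each character, 4 binary flags per dictionary level.

    Flag order within a level (an assumed layout, not taken from the patent):
    [begins a word, inside a word, ends a word, is a single-character word].
    """
    n = len(sentence)
    feats = [[0] * (4 * len(dictionaries)) for _ in range(n)]
    for level, words in enumerate(dictionaries):
        base = 4 * level  # offset of this level's flag group
        for start in range(n):
            for length in range(1, max_len + 1):
                end = start + length
                if end > n:
                    break
                if sentence[start:end] not in words:
                    continue
                if length == 1:
                    feats[start][base + 3] = 1
                else:
                    feats[start][base + 0] = 1
                    feats[end - 1][base + 2] = 1
                    for mid in range(start + 1, end - 1):
                        feats[mid][base + 1] = 1
    return feats


if __name__ == "__main__":
    # Hypothetical two-level setup: a general dictionary and a domain dictionary.
    general = {"南京", "南京市", "市长", "长江", "大桥"}
    domain = {"长江大桥"}
    sentence = "南京市长江大桥"
    for ch, row in zip(sentence, dictionary_features(sentence, [general, domain])):
        print(ch, row)
```

Keeping a separate flag group per level lets a downstream model weight matches from each dictionary differently, which is the property a single merged dictionary cannot provide.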

Description

technical field

[0001] The present application relates to the field of computer technology, in particular to a word segmentation method, device, equipment and readable storage medium based on a multi-level dictionary.

Background technique

[0002] Chinese word segmentation is the process of dividing an input sentence into a sequence of words. An additional dictionary is usually provided to the model to compensate for the shortage of manually annotated training data. However, current word segmentation schemes all use single-level dictionaries. They ignore the fact that different words in the dictionary have different word-forming probabilities, and that the same string may be a word in one field but not in another, so the word segmentation model performs poorly.

[0003] The word segmentation method based on a single-level dictionary also has the problem of little influence on the actual word segmentati...

Claims
