Word segmentation recognition lexicon construction method and Chinese word segmentation method and device

A lexicon construction method and word segmentation technology in the computer field, addressing the problem that manually collected lexicons contain a limited number of words, which cannot meet word segmentation requirements.

Active Publication Date: 2020-05-19
CCB FINTECH CO LTD
5 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0005] Compared with the massive amount of text data on the network, the words that can be collected manually are very limited, so the thesaurus stores only a very limited number of words.
Consequently, when word segmentation is performed with a manually constructed or maintained thesaurus, the words stored in the thesaurus often cannot meet the word segmentation requirements.



Embodiment Construction

[0109] Exemplary embodiments of the present invention are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present invention to facilitate understanding, and they should be regarded as exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

[0110] A short sentence is a segment of an article delimited by punctuation such as commas, enumeration commas (、), quotation marks, periods, and question marks.
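The definition above can be illustrated with a minimal splitting sketch. The source lists only some delimiters followed by "etc.", so the full delimiter set below and the names SHORT_SENTENCE_DELIMITERS and split_short_sentences are hypothetical choices for illustration, not the patent's implementation.

```python
import re

# Hypothetical delimiter set: the source names commas, enumeration commas,
# quotation marks, periods and question marks "etc."; the rest is an assumption.
SHORT_SENTENCE_DELIMITERS = "，、“”‘’「」。？！；：,.?!;:\"'"

# One or more consecutive delimiters count as a single boundary.
_SPLIT_PATTERN = re.compile("[" + re.escape(SHORT_SENTENCE_DELIMITERS) + "]+")


def split_short_sentences(text: str) -> list[str]:
    """Split text into short sentences at punctuation, dropping empty pieces."""
    return [piece.strip() for piece in _SPLIT_PATTERN.split(text) if piece.strip()]
```

For example, "我爱北京，天安门很美。你去过吗？" yields the three short sentences "我爱北京", "天安门很美" and "你去过吗".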

[0111] A neuron is an abstract node that carries a signal and serves as a building block of a neural network.

[0112] A word is the smallest language unit that can be used independently.

[0113] Chinese word segmentation is the proces...


Abstract

The invention discloses a word segmentation recognition lexicon construction method and a Chinese word segmentation method and device, and relates to the technical field of computers. One specific embodiment of the method comprises the following steps: the short sentences in a training text set are deduplicated; a neuron is constructed for each character in the deduplicated short sentences, with the signal type indicated by each neuron matching its corresponding character; according to the relative position and occurrence frequency of every two characters in a short sentence, a link relationship is constructed between the two corresponding neurons to form a short sentence neural network for that short sentence, where the link relationship indicates a link coefficient and a signal transmission direction; and the short sentence neural networks are fused to form a word segmentation recognition lexicon. This embodiment can effectively increase the vocabulary size of the lexicon and improve word segmentation accuracy.
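The construction steps in the abstract can be sketched as follows. The abstract does not say how relative position or the link coefficient is quantified, so several choices here are assumptions: one neuron per distinct character, links directed from the earlier to the later character, a hypothetical max_distance window on relative position, fusion by summing link counts, and a coefficient defined as the share of a source character's outgoing links at a given distance. All class and function names (Neuron, SentenceNetwork, build_network, fuse, link_coefficients, build_lexicon) are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Neuron:
    # The neuron's signal type is the character it stands for.
    signal: str


@dataclass
class SentenceNetwork:
    # One neuron per distinct character (an assumption).
    neurons: dict[str, Neuron] = field(default_factory=dict)
    # Directed links keyed by (source char, target char, distance); the value is
    # how often that ordered pair occurs at that distance. The direction runs
    # from the earlier character to the later one (an assumption).
    links: Counter = field(default_factory=Counter)


def build_network(sentence: str, max_distance: int = 2) -> SentenceNetwork:
    """Build a short sentence network from character positions and frequencies."""
    net = SentenceNetwork()
    for ch in sentence:
        net.neurons.setdefault(ch, Neuron(ch))
    for i, src in enumerate(sentence):
        for d in range(1, max_distance + 1):
            if i + d >= len(sentence):
                break
            net.links[(src, sentence[i + d], d)] += 1
    return net


def fuse(networks) -> SentenceNetwork:
    """Merge short sentence networks by uniting neurons and summing link counts."""
    merged = SentenceNetwork()
    for net in networks:
        for ch, neuron in net.neurons.items():
            merged.neurons.setdefault(ch, neuron)
        merged.links.update(net.links)  # Counter.update adds counts
    return merged


def link_coefficients(lexicon: SentenceNetwork) -> dict[tuple[str, str, int], float]:
    """Hypothetical coefficient: share of a source's outgoing links at distance d."""
    totals: Counter = Counter()
    for (src, _dst, d), n in lexicon.links.items():
        totals[(src, d)] += n
    return {
        (src, dst, d): n / totals[(src, d)]
        for (src, dst, d), n in lexicon.links.items()
    }


def build_lexicon(short_sentences, max_distance: int = 2) -> SentenceNetwork:
    """Deduplicate short sentences, build one network each, then fuse them."""
    unique = dict.fromkeys(short_sentences)  # order-preserving deduplication
    return fuse(build_network(s, max_distance) for s in unique)
```

For instance, build_lexicon(["北京大学", "北京大学", "北方"], max_distance=1) keeps only one copy of "北京大学", so the links 北→京 and 北→方 each occur once and each receive a coefficient of 0.5 under this sketch's definition.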

Description

Technical field

[0001] The invention relates to the field of computer technology, and in particular to a method for constructing a word segmentation recognition lexicon, a Chinese word segmentation method, and a device.

Background art

[0002] Thesaurus-based Chinese word segmentation is currently one of the more commonly used word segmentation methods, so building and maintaining the thesaurus is the foundation of word segmentation.

[0003] Existing thesauri are mainly built and maintained manually: existing words, such as those in the "Modern Chinese Standard Dictionary", and new words that appear on the Internet are collected by hand and stored in the thesaurus.

[0004] In the course of realizing the present invention, the inventor found that the prior art has at least the following problems:

[0005] Compared with the massive text data in the network, the words collected manually are very limited, resulting in a very l...
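To make the background concrete, forward maximum matching, a classic dictionary-based segmentation technique, is sketched below. It is a generic illustration of thesaurus-based segmentation, not the method claimed by this invention; the function name forward_max_match and the max_len default are hypothetical. It shows why a small vocabulary hurts: any word missing from the thesaurus falls apart into single characters.

```python
def forward_max_match(text: str, lexicon: set[str], max_len: int = 4) -> list[str]:
    """Greedy left-to-right segmentation using the longest lexicon match."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until it is in the lexicon;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words
```

With the lexicon {"北京"}, "北京大学生" is segmented as ["北京", "大", "学", "生"]; adding "北京大学" to the lexicon changes the result to ["北京大学", "生"].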


Application Information

Patent type & authority: Applications (China)
IPC(8): G06F40/289, G06F40/216, G06N3/04, G06N3/08
CPC: G06N3/08, G06N3/045, Y02D10/00
Inventor: 李胤文
Owner: CCB FINTECH CO LTD