Standard lexicon word segmentation method, device and equipment and computer readable storage medium

A word segmentation method for standard lexicons, applicable to computing, instrumentation, electronic digital data processing, and related fields. It addresses two problems: existing standard lexicons do not meet NLP requirements, and they cannot be used directly for word segmentation.

Active Publication Date: 2019-06-07
PING AN TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

[0003] When a standard lexicon is segmented with a dictionary or with manually assisted annotation, the limited coverage of the dictionary or annotation makes accurate segmentation impossible. The standard lexicon is then unsuitable for NLP applications in specific fields, such as the medical field, and the existing standard lexicon must be re-segmented.




Embodiment Construction

[0042] It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0043] The standard lexicon word segmentation method of the embodiment of the present invention is mainly applied to a standard lexicon word segmentation device, which may be a PC (personal computer), a portable computer, a mobile terminal, or another device with display and processing functions.

[0044] Refer to Figure 1, which is a schematic diagram of the hardware structure of the standard lexicon word segmentation device involved in the embodiment of the present invention. In this embodiment, the standard lexicon word segmentation device may include a processor 1001 (such as a Central Processing Unit, CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to realiz...


Abstract

The invention provides a standard lexicon word segmentation method, device, equipment, and computer-readable storage medium. The method comprises: splitting the standard words in a standard lexicon to be segmented into single Chinese characters to form an original Chinese character library, and calculating a first adjacent probability and a first Bayesian probability between every two Chinese characters in the original Chinese character library; performing a Chinese character merging operation on the original Chinese character library according to the first adjacent probability and the first Bayesian probability to obtain a to-be-adjusted Chinese character library; judging whether the minimum of the second adjacent probabilities between every two Chinese characters in the to-be-adjusted Chinese character library is greater than a preset threshold; if so, performing the Chinese character merging operation on the to-be-adjusted Chinese character library according to the second adjacent probability and a second Bayesian probability, until the minimum adjacent probability between every two Chinese characters in the resulting target Chinese character library is less than or equal to the preset threshold; otherwise, outputting the merged Chinese character groups as standard words. The method improves the word segmentation accuracy of the standard lexicon and its universality.
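
The loop described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The abstract does not define the two probabilities or the merging rule, so the sketch assumes that the adjacent probability of a pair is its share of all adjacent pairs in the library, that the Bayesian probability is the conditional probability of the right unit given the left unit, and that each round merges the single pair with the highest adjacent probability, with ties broken by the Bayesian probability. The names pair_stats, merge_best_pair, segment_lexicon, and max_rounds are hypothetical.

```python
from collections import Counter


def pair_stats(words):
    """Return (adjacency, bayes) dicts keyed by (left, right) unit pairs.

    Assumed definitions (not given in the source):
      adjacency[(a, b)] = count(a followed by b) / count(all adjacent pairs)
      bayes[(a, b)]     = count(a followed by b) / count(a as a left unit)
    """
    pair_counts = Counter()
    left_counts = Counter()
    for tokens in words:
        for left, right in zip(tokens, tokens[1:]):
            pair_counts[(left, right)] += 1
            left_counts[left] += 1
    total = sum(pair_counts.values())
    adjacency = {p: c / total for p, c in pair_counts.items()}
    bayes = {p: c / left_counts[p[0]] for p, c in pair_counts.items()}
    return adjacency, bayes


def merge_best_pair(words, adjacency, bayes):
    """Merge every occurrence of the best-scoring pair (assumed merging rule)."""
    best = max(adjacency, key=lambda p: (adjacency[p], bayes[p]))
    merged = []
    for tokens in words:
        out, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == best:
                out.append(tokens[i] + tokens[i + 1])
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        merged.append(out)
    return merged


def segment_lexicon(standard_words, threshold, max_rounds=100):
    """Split words into characters, merge once, then keep merging while the
    minimum adjacent probability stays above the threshold."""
    # Original character library: every standard word as a list of characters.
    words = [list(w) for w in standard_words]
    adjacency, bayes = pair_stats(words)  # "first" probabilities
    if not adjacency:
        return words
    words = merge_best_pair(words, adjacency, bayes)  # to-be-adjusted library
    for _ in range(max_rounds):  # max_rounds is a safety cap (assumption)
        adjacency, bayes = pair_stats(words)  # "second" probabilities
        if not adjacency or min(adjacency.values()) <= threshold:
            break  # stop condition from the abstract
        words = merge_best_pair(words, adjacency, bayes)
    return words
```

For example, segment_lexicon(["高血压", "高血糖", "低血压"], threshold=0.3) first merges 高 and 血, then stops because the smallest adjacent probability among the remaining pairs (0.25) no longer exceeds the threshold. Real lexicons would need the probability definitions and threshold from the full patent description, which this excerpt does not include.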

Description

Technical field

[0001] The present invention relates to the technical field of natural language processing, in particular to a standard lexicon word segmentation method, device, equipment, and computer-readable storage medium.

Background

[0002] NLP (Natural Language Processing) is a subfield of artificial intelligence. Currently, NLP computation relies mainly on an existing word segmentation database, and segmentation can be performed with dictionaries or manually assisted annotation.

[0003] When a standard lexicon is segmented with a dictionary or with manually assisted annotation, the limited coverage of the dictionary or annotation makes accurate segmentation impossible. The standard lexicon is then unsuitable for NLP applications in specific fields, such as the medical field, and the existing standard lexicon must be re-segmented.

[0004] Therefore, how to improve the word segmentation accu...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/27
Inventor: 黄越陈明东
Owner: PING AN TECH (SHENZHEN) CO LTD