Standard word library word segmentation method, device and equipment and computer readable storage medium

A word segmentation method and standard word technology, applied in the field of data processing

Active Publication Date: 2019-05-17
PING AN TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

[0004] The main purpose of the present invention is to provide a standard lexicon word segmentation method, device, equipment and computer...



Embodiment Construction

[0056] It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0057] The invention provides a standard lexicon word segmentation method.

[0058] Please refer to Figure 1, which is a schematic flowchart of the first embodiment of the standard lexicon word segmentation method of the present invention. In this embodiment, the standard lexicon word segmentation method includes:

[0059] Step S10, splitting the standard words in the standard lexicon to be segmented into individual Chinese characters to form a Chinese character library, and generating the adjacent frequency between every two Chinese characters in the Chinese character library;
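Step S10 can be illustrated with a minimal Python sketch. It assumes that "adjacent frequency" means the number of times two characters appear next to each other, in order, within the standard words; the source does not spell out the exact definition. The function name `build_character_library` and the three-word sample lexicon are hypothetical and serve only as illustration.

```python
from collections import Counter


def build_character_library(standard_words):
    """Split standard words into single characters and count adjacent pairs.

    Assumption: the adjacent frequency of (a, b) is the number of times
    character a is immediately followed by character b inside a word.
    """
    characters = set()
    adjacent_freq = Counter()
    for word in standard_words:
        chars = list(word)  # one entry per Chinese character
        characters.update(chars)
        for left, right in zip(chars, chars[1:]):
            adjacent_freq[(left, right)] += 1
    return characters, adjacent_freq


# Hypothetical sample lexicon, not taken from the patent.
words = ["高血压", "高血糖", "低血压"]
characters, adjacent_freq = build_character_library(words)
```

With this sample lexicon, the pairs 高血 and 血压 each receive a count of 2, while 血糖 and 低血 each receive a count of 1.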

[0060] The standard lexicon word segmentation method of the present invention is applied to a server, which performs word segmentation on each standard word in the standard lexicon, and each ...


Abstract

The invention discloses a standard word library word segmentation method, device, and equipment, and a computer-readable storage medium. The method comprises: splitting the standard words in a standard word library to be segmented into single Chinese characters to form a Chinese character library, and generating the adjacent frequency between every two Chinese characters in the Chinese character library; merging the Chinese characters in the Chinese character library according to the adjacent frequencies to generate Chinese character groups, and updating the adjacent frequencies among the Chinese characters in the Chinese character library after the merging operation; judging whether the maximum value of the adjacent frequencies among the Chinese characters in the updated Chinese character library is smaller than a preset threshold; if not, returning to the step of merging the Chinese characters in the Chinese character library according to the adjacent frequencies; and if so, forming the standard segmented words of the standard word library to be segmented from the Chinese character groups. By segmenting the standard words through the adjacent frequencies among the Chinese characters, the scheme can effectively improve the word segmentation accuracy of the standard word library to be segmented.
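The merge-and-update loop in the abstract can be sketched as follows. This is a hedged illustration, not the patented implementation: it assumes that each round merges only the single most frequent adjacent pair, that ties are broken by first occurrence, and that the threshold is checked before each merge rather than after it. The names `count_adjacent`, `merge_pair`, and `segment_lexicon`, and the sample lexicon, are hypothetical.

```python
from collections import Counter


def count_adjacent(sequences):
    """Count how often each ordered pair of neighbouring units occurs."""
    freq = Counter()
    for seq in sequences:
        for pair in zip(seq, seq[1:]):
            freq[pair] += 1
    return freq


def merge_pair(seq, pair):
    """Replace every occurrence of `pair` in `seq` with one merged unit."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def segment_lexicon(standard_words, threshold):
    """Merge the most frequent adjacent pair until the maximum adjacent
    frequency drops below `threshold` (assumed stopping rule)."""
    sequences = [list(word) for word in standard_words]
    while True:
        freq = count_adjacent(sequences)  # recomputed after every merge
        if not freq:
            break
        pair, top = max(freq.items(), key=lambda item: item[1])
        if top < threshold:
            break
        sequences = [merge_pair(seq, pair) for seq in sequences]
    return sequences


# Hypothetical sample lexicon and threshold, not taken from the patent.
segments = segment_lexicon(["高血压", "高血糖", "低血压"], threshold=2)
```

With this sample input, the first round merges 高 and 血 (frequency 2). After the update every remaining pair has frequency 1, which is below the threshold, so the loop stops with the groups 高血/压, 高血/糖, and 低/血/压. Recomputing all frequencies on every round keeps the sketch short; an incremental update would be more efficient for a large lexicon.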

Description

Technical field

[0001] The present invention mainly relates to the technical field of data processing, and in particular to a standard lexicon word segmentation method, device, equipment, and computer-readable storage medium.

Background technique

[0002] NLP (Natural Language Processing) is a sub-field of artificial intelligence. Currently, NLP computation relies mainly on existing word segmentation databases, whose word segmentation can be performed through dictionaries or manually assisted annotation.

[0003] When a standard lexicon is segmented through a dictionary or manually assisted annotation, the limitations of the dictionary or the manual annotation make it impossible to segment the standard lexicon accurately, so the resulting word segmentation does not meet the requirements of NLP in specific fields, such as the medical field. The application then needs to re-segment the existing standard lexicon. Therefore, how t...


Application Information

IPC(8): G06F17/27
Inventor: 黄越陈明东
Owner: PING AN TECH (SHENZHEN) CO LTD