
Chinese word segmentation and word frequency method and device for multi-user custom dictionaries

A Chinese word segmentation technique for user-defined dictionaries, applied in digital data processing, text database querying, and special-purpose data processing applications. It addresses the problems of repeated initialization operations and long processing time.

Pending Publication Date: 2021-10-08
上海众言网络科技有限公司

AI Technical Summary

Problems solved by technology

[0004] The main purpose of the present invention is to provide a Chinese word segmentation and word frequency method and device suitable for multiple user-defined dictionaries. It aims to solve the problem that an existing Chinese word segmenter must be initialized multiple times, which takes a long time, when it provides an online word segmentation and word frequency service for multiple user-defined dictionaries.



Examples


Detailed Description of the Embodiments

[0039] To enable those skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in those embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present invention, without creative effort, shall fall within the protection scope of the present invention.

[0040] It should be noted that the terms "first" and "second" in the description and claims of the present invention and in the above drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that the data so used may be interchanged under appropriate...


Abstract

The invention discloses a Chinese word segmentation and word frequency method and device suitable for multi-user custom dictionaries. The method comprises the following steps: initializing a hanlp word segmentation service; using the Aho-Corasick algorithm to generate a word frequency result for each user's custom dictionary from a first text and the multi-user custom dictionaries; generating a second-text word segmentation and word frequency result through the hanlp word segmentation service, based on the first text and the multi-user custom dictionaries; and combining the word frequency result of each user's custom dictionary with the second-text word segmentation and word frequency result to obtain the final Chinese word segmentation and word frequency result. In the invention, the Aho-Corasick algorithm quickly locates the positions of custom dictionary entries in the text, and the matched original text is replaced with space characters. As a result, a word segmentation service for multi-user custom dictionaries needs only a single initialization operation to provide a high-performance word segmentation and word frequency service that supports the custom dictionaries of multiple users under high concurrency.
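The steps in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `AhoCorasick` class, `segment_stub`, `word_frequencies`, the user names, and the sample dictionaries are all hypothetical, and `segment_stub` (a plain whitespace split) merely stands in for the hanlp word segmentation service, which would be initialized once and shared by all users. Counting every overlapping dictionary match is also an assumption; the source does not say how overlaps are handled.

```python
from collections import Counter, deque


class AhoCorasick:
    """Minimal Aho-Corasick automaton over a fixed set of words (illustrative)."""

    def __init__(self, words):
        self.goto = [{}]   # per-node transitions: char -> node id
        self.fail = [0]    # per-node failure link
        self.out = [[]]    # per-node list of words ending at this node
        for word in set(words):
            if word:
                self._insert(word)
        self._build_failure_links()

    def _insert(self, word):
        node = 0
        for ch in word:
            nxt = self.goto[node].get(ch)
            if nxt is None:
                nxt = len(self.goto)
                self.goto[node][ch] = nxt
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
            node = nxt
        self.out[node].append(word)

    def _build_failure_links(self):
        # Breadth-first traversal; depth-1 nodes keep the root as failure link.
        queue = deque(self.goto[0].values())
        while queue:
            node = queue.popleft()
            for ch, child in self.goto[node].items():
                f = self.fail[node]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(ch, 0)
                # Inherit words that end at the failure target (proper suffixes).
                self.out[child] = self.out[child] + self.out[self.fail[child]]
                queue.append(child)

    def find(self, text):
        """Yield (start, end, word) for every dictionary word found in text."""
        node = 0
        for i, ch in enumerate(text):
            while node and ch not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(ch, 0)
            for word in self.out[node]:
                yield i - len(word) + 1, i + 1, word


def segment_stub(text):
    """Hypothetical stand-in for the shared segmentation service."""
    return text.split()


def word_frequencies(text, automaton, segment=segment_stub):
    """Count custom-dictionary hits, blank them out, then segment the rest."""
    counts = Counter()
    chars = list(text)
    for start, end, word in automaton.find(text):
        counts[word] += 1                      # assumption: overlaps all count
        chars[start:end] = " " * (end - start)  # replace the match with spaces
    counts.update(segment("".join(chars)))      # frequencies of the second text
    return counts


# One automaton per user, built once; the segmenter itself is never re-initialized.
user_dictionaries = {"alice": ["云计算"], "bob": ["机器学习"]}
automata = {user: AhoCorasick(words) for user, words in user_dictionaries.items()}

text = "我们用云计算做机器学习"
results = {user: word_frequencies(text, ac) for user, ac in automata.items()}
```

Because the stub only splits on whitespace, the non-dictionary parts of the sample sentence come back as unsegmented chunks; with a real segmenter in its place they would be split into words. The point of the sketch is the control flow: per-user matching is cheap, and the expensive segmenter is shared.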

Description

Technical field

[0001] The invention relates to the technical field of Chinese word segmentation, and in particular to a Chinese word segmentation and word frequency method and device applicable to multiple user-defined dictionaries.

Background

[0002] At present, the online word segmentation and word frequency services for user-defined dictionaries provided by open-source Chinese word segmenters (such as the hanlp segmenter and the jieba segmenter) support adding entries to a single user-defined dictionary, but the user-defined dictionary has to be loaded into the segmenter. To support online word segmentation and word frequency for multiple user-defined dictionaries, the Chinese segmenter must be re-initialized for each user-defined dictionary, which takes a long time.

[0003] Aiming at the time-consuming problem of multiple initialization operations when the Chinese word segmenter provides online word segmentation and word frequency services for multiple u...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/289, G06F40/216, G06F16/33
CPC: G06F40/289, G06F40/216, G06F16/3344
Inventor: 王平, 潘成, 赵鹏
Owner: 上海众言网络科技有限公司