Word segmentation method and device

A word segmentation method and device, applied in the field of data processing, that address the low efficiency, high maintenance cost, and heavy manpower consumption of existing approaches, avoiding the resulting efficiency loss and reducing construction cost.

Pending Publication Date: 2022-06-28
CLOUDMINDS SHANGHAI ROBOTICS CO LTD

AI Technical Summary

Problems solved by technology

In the above methods, dictionaries or word-segmented texts are usually created manually. Because these dictionaries and texts are large, building them requires considerable manpower, efficiency is low, and construction and maintenance costs are high.



Embodiment Construction

[0025] To help those skilled in the art better understand the technical solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings.

[0026] It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings of the present disclosure are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It is to be understood that data so designated may be interchanged under appropriate circumstances, so that the embodiments of the disclosure described herein can be practiced in orders other than those illustrated or described herein. The implementations described in the following illustrative examples do not represent all implementations consistent with this disclosure. Rather, they are merely examples of apparatus and methods ...


Abstract

The invention provides a word segmentation method and device. The method comprises the following steps: dividing a corpus to be processed into a plurality of corpus segments according to a preset granularity; inserting mask segments between the corpus segments, and inputting the corpus to be predicted, which contains the corpus segments and the mask segments, into a pre-trained language model; predicting, by the pre-trained language model, the corpus information in the mask segments adjacent to the corpus segments; and performing word segmentation on the corpus to be processed based on the corpus segments and the predicted corpus information to obtain a target word segmentation result. Because the pre-trained language model predicts the corpus information of the mask segments and segmentation is completed using that predicted information, word segmentation can be performed without a dictionary or word-segmented text. This avoids the loss of efficiency caused by constructing such dictionaries or texts manually and improves word segmentation efficiency.
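The abstract does not specify the model or the exact rule that turns predicted mask content into word boundaries, so the following Python sketch illustrates only the data flow under stated assumptions. The corpus is split into fixed-size character segments, a hypothetical `[MASK]` token is inserted between adjacent segments, and a caller-supplied function `predict_boundaries` (hypothetical) stands in for the pre-trained language model. Treating each mask prediction as a boolean "word boundary here" decision is an assumption made for illustration, not the patent's stated rule; a real system would derive that decision from the content the model predicts.

```python
from typing import Callable, List

# Hypothetical mask token; the real token depends on the chosen pre-trained model.
MASK = "[MASK]"


def split_by_granularity(corpus: str, granularity: int) -> List[str]:
    """Divide the corpus into consecutive segments of `granularity` characters."""
    if granularity < 1:
        raise ValueError("granularity must be at least 1")
    return [corpus[i:i + granularity] for i in range(0, len(corpus), granularity)]


def build_masked_input(segments: List[str]) -> List[str]:
    """Insert one mask slot between every pair of adjacent segments."""
    tokens: List[str] = []
    for index, seg in enumerate(segments):
        if index > 0:
            tokens.append(MASK)
        tokens.append(seg)
    return tokens


def segment_corpus(
    corpus: str,
    granularity: int,
    predict_boundaries: Callable[[List[str]], List[bool]],
) -> List[str]:
    """Sketch of the masked-prediction segmentation flow.

    `predict_boundaries` is a hypothetical stand-in for the pre-trained language
    model: given the masked token list, it returns one boolean per mask slot,
    where True means "a word boundary falls here" (an assumed interpretation).
    """
    segments = split_by_granularity(corpus, granularity)
    tokens = build_masked_input(segments)
    boundaries = predict_boundaries(tokens)
    if len(boundaries) != max(len(segments) - 1, 0):
        raise ValueError("expected one prediction per mask slot")

    words: List[str] = segments[:1]  # the first segment always starts a word
    for seg, is_boundary in zip(segments[1:], boundaries):
        if is_boundary:
            words.append(seg)  # start a new word
        else:
            words[-1] += seg  # merge with the previous word
    return words


if __name__ == "__main__":
    # Toy predictor with fixed outputs, standing in for a real model.
    def toy_model(tokens: List[str]) -> List[bool]:
        return [True, True, False]

    print(segment_corpus("我爱北京", 1, toy_model))  # ['我', '爱', '北京']
```

Keeping the predictor as an injected callable separates the segmentation bookkeeping from the model, so any masked language model could be plugged in once a concrete decision rule is chosen.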

Description

Technical Field

[0001] The present disclosure relates to the technical field of data processing, and in particular to a word segmentation method and device.

Background

[0002] With the development of artificial intelligence technology, Natural Language Processing (NLP) has become one of its important branches. In natural language processing, corpus data needs to be segmented into words to provide a basis for subsequent semantic recognition.

[0003] At present, there are two main approaches to Chinese word segmentation. The first is dictionary-based word segmentation, in which the string to be matched is compared against a manually constructed dictionary: if the word corresponding to the string is found in the dictionary, the match succeeds and the word is recognized. Examples include the forward maximum matching method, the reverse maximum matching method, and the bidirectional maximum matching method. The other approach uses statistics-based word segmentation...
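The dictionary-based methods named in [0003] can be illustrated with a short Python sketch. The toy dictionary, the `max_len` window, and the tie-breaking rule used for the bidirectional variant (prefer fewer words, then fewer single-character words, otherwise the reverse result) are common textbook choices assumed here for illustration; they are not taken from the patent.

```python
from typing import List, Set


def forward_max_match(text: str, dictionary: Set[str], max_len: int) -> List[str]:
    """Scan left to right, always taking the longest dictionary word at the cursor."""
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    words: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest window first and shrink it until a dictionary hit;
        # a single character is accepted as a fallback so the scan always advances.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def reverse_max_match(text: str, dictionary: Set[str], max_len: int) -> List[str]:
    """Scan right to left, taking the longest dictionary word ending at the cursor."""
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    words: List[str] = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            candidate = text[j - size:j]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                j -= size
                break
    words.reverse()  # words were collected from the end of the text
    return words


def bidirectional_max_match(text: str, dictionary: Set[str], max_len: int) -> List[str]:
    """Run both scans and pick one result with an assumed, commonly used heuristic."""
    forward = forward_max_match(text, dictionary, max_len)
    backward = reverse_max_match(text, dictionary, max_len)
    if len(forward) != len(backward):
        return forward if len(forward) < len(backward) else backward
    singles_fwd = sum(1 for w in forward if len(w) == 1)
    singles_bwd = sum(1 for w in backward if len(w) == 1)
    return forward if singles_fwd < singles_bwd else backward


if __name__ == "__main__":
    vocab = {"研究", "研究生", "生命", "起源"}  # toy dictionary for illustration
    text = "研究生命起源"
    print(forward_max_match(text, vocab, 3))        # ['研究生', '命', '起源']
    print(reverse_max_match(text, vocab, 3))        # ['研究', '生命', '起源']
    print(bidirectional_max_match(text, vocab, 3))  # ['研究', '生命', '起源']
```

The forward scan greedily takes 研究生 and strands 命, while the reverse scan recovers 研究 / 生命 / 起源. Every variant only ever produces words that are already in the hand-built dictionary, which is the dependency the disclosure aims to remove.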


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F40/289; G06F40/242; G06F40/205
CPC: G06F40/289; G06F40/242; G06F40/205
Inventor: 罗镇权
Owner: CLOUDMINDS SHANGHAI ROBOTICS CO LTD