Semi-automatic word segmentation corpus labeling and training device

A semi-automatic training device, applied in special data processing applications, instruments, and electrical and digital data processing, that addresses the difficulty of organizing varied language information into a form machines can read directly. It reduces labor costs and complexity while improving efficiency and accuracy.

Active Publication Date: 2019-09-27

10TH RES INST OF CETC


Problems solved by technology

Due to the generality and complexity of Chinese language knowledge, it is difficult to organize various language information into a form that can be directly read by machines. Therefore, the word segmentation system based on comprehension is still in the experimental stage.

Embodiment Construction

[0023] See figure. In the preferred embodiment described below, a semi-automatic word segmentation corpus labeling and training device includes a text corpus labeling preparation module, a semi-automatic corpus word segmentation labeling module, a feedback model learning training module, and a word segmentation labeling model effect evaluation module. The device is characterized in that the text corpus labeling preparation module prepares the labeling tasks: it distinguishes data from different sources, selects the corpus sources, pre-labels the corpus data to be labeled according to source or subject for a single word segmentation, and manages both the corpus to be labeled and the segmented corpus data. Then, through multiple word segmentation algorithms such as bidirectional maximum matching based on integrated dictionaries, conditional random fields (CRF), JIEBA, and the bidirectional LSTM network (BI-LSTM), it submits the raw corpus wo...
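One of the algorithms named above, dictionary-based bidirectional maximum matching, can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the toy dictionary, and the tie-breaking rule (fewer words, then fewer single-character words, otherwise the backward result) are assumptions chosen because they are a common heuristic.

```python
def forward_max_match(text, dictionary, max_len):
    """Greedy left-to-right matching: take the longest dictionary word at each position."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the scan cannot stall.
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text, dictionary, max_len):
    """Greedy right-to-left matching: take the longest dictionary word ending at each position."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                words.append(piece)
                j -= size
                break
    words.reverse()
    return words


def bidirectional_max_match(text, dictionary):
    """Run both directions and keep the result preferred by the (assumed) heuristic."""
    max_len = max((len(w) for w in dictionary), default=1)
    fwd = forward_max_match(text, dictionary, max_len)
    bwd = backward_max_match(text, dictionary, max_len)
    if len(fwd) != len(bwd):
        return fwd if len(fwd) < len(bwd) else bwd
    fwd_singles = sum(len(w) == 1 for w in fwd)
    bwd_singles = sum(len(w) == 1 for w in bwd)
    return fwd if fwd_singles < bwd_singles else bwd


# Hypothetical toy dictionary; a real system would load an integrated dictionary.
toy_dictionary = {"研究", "研究生", "生命", "的", "起源"}
segmented = bidirectional_max_match("研究生命的起源", toy_dictionary)
```

On this toy input the forward pass greedily takes 研究生 and strands 命 as a single character, while the backward pass yields 研究 / 生命 / 的 / 起源; the heuristic keeps the backward result because it contains fewer single-character words.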

Abstract

The invention discloses a semi-automatic word segmentation corpus labeling and training device, which aims to overcome the defects of the corpora used in word segmentation corpus labeling and training. The device is realized through the following technical scheme. A text corpus labeling preparation module manages the corpora to be labeled and the segmented corpora. Based on multiple word segmentation algorithms, such as bidirectional maximum matching based on an integrated dictionary, CRF, and JIEBA, the word segmentation labeling work for the raw corpus is submitted to a semi-automatic corpus word segmentation labeling module. Word segmentation labeling tasks are created, an applicable labeling algorithm model is selected, and automatic labeling is carried out. On the basis of fusing the automatic labeling results, the training model corpus and the labeling model generated by the text corpus labeling preparation module are fed back to the feedback model learning training module. Model learning training is then selected and carried out, a unified training model interface is called to generate a core dictionary, and the word segmentation training model table is updated. Finally, a comprehensive evaluation model for the labeling algorithms is established to evaluate the model labeling effect, so that a new word segmentation labeling task is completed.
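The abstract mentions fusing the automatic labeling results of several algorithms but does not say how. A minimal sketch of one plausible approach, majority voting on word boundaries, is shown below. The voting rule, the function names, and the idea of listing non-unanimous boundaries for manual review are all assumptions made for illustration.

```python
from collections import Counter


def boundaries(words):
    """Return the set of character offsets at which a word ends."""
    cuts, pos = set(), 0
    for word in words:
        pos += len(word)
        cuts.add(pos)
    return cuts


def count_votes(text, candidates):
    """Count, for each offset, how many candidate segmentations end a word there."""
    if not candidates:
        raise ValueError("at least one candidate segmentation is required")
    votes = Counter()
    for words in candidates:
        if "".join(words) != text:
            raise ValueError("every candidate must cover the text exactly")
        votes.update(boundaries(words))
    return votes


def fuse_segmentations(text, candidates, min_votes=None):
    """Keep a boundary when at least min_votes candidates agree (default: strict majority)."""
    votes = count_votes(text, candidates)
    if not text:
        return []
    if min_votes is None:
        min_votes = len(candidates) // 2 + 1
    # The end of the text is always a boundary, whatever the threshold.
    cuts = sorted({p for p, n in votes.items() if n >= min_votes} | {len(text)})
    fused, start = [], 0
    for cut in cuts:
        fused.append(text[start:cut])
        start = cut
    return fused


def disputed_boundaries(text, candidates):
    """Offsets proposed by some but not all candidates; candidates for human review."""
    votes = count_votes(text, candidates)
    return sorted(p for p, n in votes.items() if n < len(candidates))


# Hypothetical outputs of three segmenters for the same sentence.
sentence = "研究生命的起源"
outputs = [
    ["研究生", "命", "的", "起源"],
    ["研究", "生命", "的", "起源"],
    ["研究", "生命", "的", "起", "源"],
]
fused = fuse_segmentations(sentence, outputs)
to_review = disputed_boundaries(sentence, outputs)
```

Voting on boundaries rather than on whole words lets the fusion combine partial agreement between segmenters, and the disputed offsets give an annotator a short list of places to check by hand, which fits the semi-automatic workflow the abstract describes.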

Description

Technical field

[0001] The invention relates to the technical field of text mining, in particular to a semi-automatic labeling training device for word segmentation data.

Background technique

[0002] Words are the smallest, independently active, and meaningful language components, but there are no obvious distinguishing marks between words in Chinese. Therefore, Chinese word analysis is the basis and key of Chinese information processing. The accuracy of word segmentation is closely related to the accuracy of part-of-speech tagging. Organically integrating the process of word segmentation and part-of-speech tagging is conducive to eliminating ambiguity and improving overall efficiency. A Chinese sentence is composed of consecutive words, and there is no space separation between words. Part-of-speech tagging refers to the process of determining an appropriate part-of-speech for each word in a sentence. Chinese word segmentation is the first "process" of Chinese information...
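The background refers to the accuracy of word segmentation without defining a measure. One widely used way to quantify it, assumed here rather than taken from the patent, is word-level precision, recall, and F1 computed over character spans: a predicted word counts as correct only if both of its boundaries match the reference. The function names and sample data below are hypothetical.

```python
def to_spans(words):
    """Convert a word list into a set of (start, end) character spans."""
    spans, start = set(), 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def segmentation_prf(gold, predicted):
    """Return (precision, recall, F1) of predicted words against the gold segmentation."""
    gold_spans, pred_spans = to_spans(gold), to_spans(predicted)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical reference labeling and one segmenter's output for the same sentence.
gold_words = ["研究", "生命", "的", "起源"]
predicted_words = ["研究生", "命", "的", "起源"]
scores = segmentation_prf(gold_words, predicted_words)
```

Here only 的 and 起源 match the reference spans, so two of four predicted words are correct and two of four reference words are recovered, giving precision, recall, and F1 of 0.5 each.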

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06F17/27, G06K9/62

CPC: G06F40/211, G06F40/289, G06F18/214

Inventor: 代翔, 崔莹, 黄细凤, 孙涛, 李强

Owner: 10TH RES INST OF CETC
