Semi-automatic word segmentation material labeling training device

A semi-automatic training device, applied in natural language data processing, instruments, computing and related fields, addresses the difficulty of organizing linguistic information into a form that machines can read directly. It improves the efficiency and accuracy of labeling and reduces manual work, cost and complexity.

Active Publication Date: 2022-07-08
10TH RES INST OF CETC

AI Technical Summary

Problems solved by technology

Because Chinese linguistic knowledge is broad and complex, it is difficult to organize linguistic information into a form that machines can read directly. Understanding-based word segmentation systems are therefore still at the experimental stage.



Embodiment Construction

[0023] See the figure. In the preferred embodiment described below, a semi-automatic word segmentation corpus labeling training device includes a text corpus labeling preparation module, a semi-automatic corpus word segmentation labeling module, a feedback model learning and training module, and a word segmentation labeling model effect evaluation module. The text corpus labeling preparation module prepares the labeling tasks: it distinguishes data from different sources, selects the corpus sources, and manages the corpus data to be labeled, grouped by source or topic, for each individual word segmentation task. It then submits the raw corpus word segmentation and labeling work to the semi-automatic corpus word segmentation labeling module through a variety of word segmentation algorithms, such as bidirectional maximum matching word segmentation based on an integrated dictionary, conditional random fields (CRF), JIEBA, bidirectional LSTM networks,...
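As an illustration of one of the algorithms named above, bidirectional maximum matching can be sketched as follows. This is a minimal sketch and not the patented implementation: the function names, the toy dictionary and the tie-breaking rule (fewer tokens, then fewer single-character tokens, preferring the backward pass on a tie) are assumptions chosen for illustration, since the source does not specify them.

```python
def forward_max_match(text, dictionary, max_len):
    """Scan left to right, always taking the longest dictionary word."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is accepted even if it is not in the dictionary.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, dictionary, max_len):
    """Scan right to left, always taking the longest dictionary word."""
    tokens = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()
    return tokens


def bidirectional_max_match(text, dictionary):
    """Run both passes and keep the result preferred by a simple heuristic."""
    max_len = max((len(word) for word in dictionary), default=1)
    fwd = forward_max_match(text, dictionary, max_len)
    bwd = backward_max_match(text, dictionary, max_len)

    def cost(tokens):
        # Assumed heuristic: fewer tokens first, then fewer one-character tokens.
        return (len(tokens), sum(1 for t in tokens if len(t) == 1))

    # On a tie the backward result is kept (an assumption, not from the source).
    return fwd if cost(fwd) < cost(bwd) else bwd


# Hypothetical toy dictionary, used only for this example.
toy_dictionary = {"研究", "研究生", "生命", "的", "起源"}
segmented = bidirectional_max_match("研究生命的起源", toy_dictionary)
```

In this toy case the forward pass greedily takes "研究生" and leaves a stray single character, while the backward pass does not, so the heuristic keeps the backward result.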


Abstract

The invention is a semi-automatic word segmentation material labeling training device that aims to overcome the drawbacks of corpus use in word segmentation labeling and training. It is realized through the following technical solution. The text corpus labeling preparation module manages the corpus to be labeled and the word segmentation material and, through algorithms such as bidirectional maximum matching word segmentation, CRF and JIEBA, submits the work to the semi-automatic corpus word segmentation labeling module, which creates a word segmentation labeling task, selects an appropriate algorithm model and carries out automatic labeling. The feedback model learning and training module selects a model for learning and training, calls the unified training model interface to generate the core dictionary, and updates the word segmentation training model table. A comprehensive evaluation model of the labeling algorithms is then established to evaluate the labeling effect of the model, completing the new word segmentation labeling task.
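The abstract does not say which metrics the evaluation model uses. As a hedged illustration only, the sketch below scores a predicted segmentation against a manually labeled one with word-level precision, recall and F1, a common choice for word segmentation. The function names and the sample data are hypothetical and are not taken from the patent.

```python
def to_spans(tokens):
    """Convert a token list into a set of (start, end) character spans."""
    spans = set()
    start = 0
    for token in tokens:
        spans.add((start, start + len(token)))
        start += len(token)
    return spans


def segmentation_prf(gold_tokens, predicted_tokens):
    """Return (precision, recall, f1) for one sentence.

    A predicted word counts as correct only if both of its boundaries
    match a word in the gold (manually labeled) segmentation.
    """
    gold = to_spans(gold_tokens)
    predicted = to_spans(predicted_tokens)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example: gold labels versus an automatic segmentation.
gold_example = ["研究", "生命", "的", "起源"]
auto_example = ["研究生", "命", "的", "起源"]
scores = segmentation_prf(gold_example, auto_example)
```

Scores like these could be used to compare the candidate algorithms on the same labeled sample before one is selected for automatic labeling.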

Description

Technical field

[0001] The invention relates to the technical field of text mining, and in particular to a semi-automatic labeling training device for word segmentation material.

Background

[0002] Words are the smallest independent, meaningful units of language, but Chinese has no explicit delimiter between words. Chinese lexical analysis is therefore the foundation of Chinese information processing. The accuracy of word segmentation is closely tied to the accuracy of part-of-speech tagging, and integrating the two processes helps eliminate ambiguity and improve overall efficiency. Chinese sentences consist of consecutive words with no spaces between them. Part-of-speech tagging is the process of assigning an appropriate part of speech to each word in a sentence. Chinese word segmentation is the first step of Chinese information processing, and plays an ...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/289, G06F40/211, G06K9/62
CPC: G06F40/211, G06F40/289, G06F18/214
Inventor: 代翔, 崔莹, 黄细凤, 孙涛, 李强
Owner 10TH RES INST OF CETC