Short text clustering-based labeling system and method

Text clustering and short-text processing techniques, applied to labeling systems based on short text clustering, address problems such as poor result accuracy, low labeling efficiency, and high communication costs. The effects are assured correctness, lower communication costs, and improved labeling efficiency.

Active Publication Date: 2018-10-12
思派(北京)网络科技有限公司

AI Technical Summary

Problems solved by technology

[0006] In view of the above analysis, the embodiments of the present invention aim to provide a labeling system and method based on short text clustering that solve the problems of low labeling efficiency, difficult training, poor result accuracy, and high communication cost in the prior art.


Examples

Embodiment 1

[0047] As shown in Figure 2, a specific embodiment of the present invention discloses a labeling system based on short text clustering, including an input module, a text clustering algorithm module, a result display module, a quick labeling module, and an output module.

[0048] Optionally, in this embodiment, the input module, the text clustering algorithm module, the result display module, the quick labeling module, and the output module are connected sequentially. Other connection arrangements can achieve the same effect; those skilled in the art will understand this, so it is not repeated here.

[0049] The labeling system based on short text clustering runs on a computer. The input module receives the text to be processed (the file to be processed) imported by the user and converts it into at least one single-line subtext. After this operation, each line represents one single-line subtext to be processed, and...

Embodiment 2

[0058] As shown in Figure 3, as an optimization of the above embodiment, the labeling system based on short text clustering may also include a multi-text alignment module. Optionally, in this embodiment, the multi-text alignment module is placed between the text clustering algorithm module and the result display module; it can also be placed elsewhere to implement the corresponding function, as those skilled in the art will understand, so it is not repeated here.

[0059] The multi-text alignment module vertically aligns all single-line subtexts in each group output by the text clustering algorithm module, that is, it places text shared by different single-line subtexts in the same column as far as possible. The alignment results (the vertically aligned single-line subtexts of each group) are sent to the result display module and displayed group by group, so that users can quickly browse all the text vertically.
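
The column alignment described in [0059] can be illustrated with a rough sketch. The source does not disclose the alignment algorithm at this point, so the code below swaps in a simple "align every line against the first line" strategy built on Python's difflib. The function name align_subtexts and the "-" gap character are assumptions made for illustration only.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def align_subtexts(subtexts: list[str], gap: str = "-") -> list[str]:
    """Pad a group of similar lines so shared text tends to share columns.

    Assumption: this is a simple star alignment against the first line,
    not the patent's algorithm. `gap` must be a single character that
    does not occur in the input lines.
    """
    if not subtexts:
        return []
    rows = [subtexts[0]]
    for text in subtexts[1:]:
        center = rows[0]  # first row, possibly already padded with gaps
        matcher = SequenceMatcher(None, center, text, autojunk=False)
        new_rows = [""] * len(rows)
        new_text = ""
        for _tag, i1, i2, j1, j2 in matcher.get_opcodes():
            # Each opcode covers one block of columns; pad the shorter side.
            width = max(i2 - i1, j2 - j1)
            for k, row in enumerate(rows):
                new_rows[k] += row[i1:i2].ljust(width, gap)
            new_text += text[j1:j2].ljust(width, gap)
        rows = new_rows + [new_text]
    return rows


if __name__ == "__main__":
    for row in align_subtexts(["fever 3 days", "fever 10 days", "high fever 3 days"]):
        print(row)
```

Every returned row has the same length, and stripping the gap character from a row recovers the original line. A display layer could render the gap as a space.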

[0060] Optionally, the i...

Embodiment 3

[0074] As shown in Figure 6, this embodiment provides a method for labeling using the labeling system based on short text clustering described in Embodiment 2, including the following steps:

[0075] S1. In the input module, preprocess the input text to be processed. Convert the text into at least one single-line subtext, then remove empty lines and deduplicate, discarding empty text and single-line subtexts whose text is identical. After this step, every single-line subtext has distinct content.

[0076] S2. In the text clustering algorithm module, run clustering analysis on all single-line subtexts. A hierarchical clustering algorithm places similar single-line subtexts in the same group, and the grouping results and the text similarity of the single-line subtexts within each group are adjusted by modifying the edit distance, so as to reduce the amount of text that must be read.
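
Step S1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name preprocess is hypothetical, and stripping surrounding whitespace before comparing lines is an assumption the source does not state.

```python
from __future__ import annotations


def preprocess(raw_text: str) -> list[str]:
    """Split text into single-line subtexts, dropping empty lines and exact duplicates.

    Assumption: leading/trailing whitespace is stripped before comparison.
    The first occurrence of each line is kept and input order is preserved.
    """
    seen: set[str] = set()
    subtexts: list[str] = []
    for line in raw_text.splitlines():
        line = line.strip()
        if not line or line in seen:
            continue  # skip empty lines and lines already collected
        seen.add(line)
        subtexts.append(line)
    return subtexts


if __name__ == "__main__":
    print(preprocess("fever 3 days\n\ncough\nfever 3 days\n"))
```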
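
Step S2 can be sketched in pure Python under stated assumptions. The source names hierarchical clustering and edit distance but gives neither the linkage rule nor how the distance is tuned. Here average linkage, a length-normalized Levenshtein distance, and a threshold parameter are all assumptions, and the function names are hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer length, so values lie in [0, 1]."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def cluster_subtexts(subtexts: list[str], threshold: float = 0.4) -> list[list[str]]:
    """Agglomerative clustering: merge the closest pair of groups until the
    smallest average-linkage distance exceeds `threshold` (an assumed knob)."""
    clusters = [[s] for s in subtexts]

    def linkage(c1: list[str], c2: list[str]) -> float:
        total = sum(normalized_distance(a, b) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (i, j), best = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > threshold:
            break
        clusters[i].extend(clusters.pop(j))  # i < j, so index i stays valid
    return clusters


if __name__ == "__main__":
    lines = ["fever for 3 days", "fever for 5 days",
             "cough at night", "cough at nights"]
    for group in cluster_subtexts(lines):
        print(group)
```

In this sketch, lowering the threshold yields more, tighter groups, and raising it yields fewer, looser ones. This quadratic-per-merge loop is only meant for small inputs.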

[0077] S3....

Abstract

The invention relates to a short text clustering-based labeling system and method, belongs to the technical field of clinical medical labeling, and solves the problems of low labeling efficiency, difficult training, poor result accuracy, and excessively high communication cost in the prior art. The short text clustering-based labeling system comprises an input module, a text clustering algorithm module, a multi-text alignment module, a result display module, a quick labeling module, and an output module. Compared with the prior art, the labeling system and method adopt a text clustering algorithm and a multi-text alignment algorithm, which greatly reduces the amount of similar subtext to be read and increases reading speed. Vertical multi-text comparative browsing makes manual comparison much more convenient for the user. Furthermore, the algorithm needs no training set, and IT personnel do not need to modify the algorithm for different medical texts, so the communication cost is extremely low.

Description

Technical field

[0001] The invention relates to the technical field of clinical medical labeling, in particular to a labeling system and method based on short text clustering.

Background

[0002] Clinical research and drug trials often involve text processing problems. For example, a doctor's professional description may need to be converted into several preset structured options; the trial protocol may require important information to be entered as original text; or actual circumstances that fall outside the preset options may need to be documented as text. All of these situations require the recorder to convert the text to be processed into standard structured options.

[0003] There are three main methods of text annotation at present: manual annotation, fully automatic annotation based on natural language processing (NLP) or script programs, and semi-automatic annotation. At present, almost all hospitals, research in...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/21, G06F17/27, G06K9/62
CPC: G06F40/103, G06F40/289, G06F18/22
Inventors: 陶英, 郑鑫
Owner: 思派(北京)网络科技有限公司