Method and system for constructing text label system, method and system for completing iteration and storage medium

A technique for building text label systems, applied in text database indexing, text database querying, and unstructured text data retrieval. It addresses problems such as high labor cost and the lack of self-iterative updating.

Inactive Publication Date: 2020-03-27
GUANGZHOU BAILING DATA CO LTD

AI Technical Summary

Problems solved by technology

Whether in up-front preparation or in later maintenance, the existing approach requires deep involvement of business personnel, so the labor cost is too high, and it has no self-iterative update capability.



Embodiment

[0041] As shown in Figure 1, the method of the present invention for building and iteratively improving a text label system comprises the following steps:

[0042] S1. Preliminary construction of the label system: perform text clustering to aggregate similar texts into data sets; from each set of similar texts, use information extraction to extract feature words and descriptors; compute similarity scores with a similarity measure (such as the Jaccard coefficient or MinHash) and, based on those scores, merge similar feature words to obtain the final feature phrases; based on the industry field and its characteristics, summarize class labels and configure for each label at least one feature word that can be used to describe it, finally forming the first version of the label system and its configuration lexicon. This step proceeds as follows:

[0043] S101. Data preparation: prepare the data object. In the present invention, ...
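
The similarity-based merging of feature words described in S1 can be illustrated with a minimal sketch. The patent names the Jaccard coefficient but the visible text does not say how words are turned into sets or how merging is carried out, so the character-bigram representation, the greedy grouping strategy, the 0.5 threshold, and the function names `char_bigrams`, `jaccard`, and `merge_similar_words` are all assumptions made for illustration.

```python
from __future__ import annotations


def char_bigrams(word: str) -> set[str]:
    """Return the set of character bigrams of a word (assumed representation)."""
    if len(word) < 2:
        return {word}
    return {word[i:i + 2] for i in range(len(word) - 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_similar_words(words: list[str], threshold: float = 0.5) -> list[list[str]]:
    """Greedily group words by similarity to each group's first word.

    The threshold and the greedy strategy are hypothetical choices,
    not taken from the patent text.
    """
    groups: list[list[str]] = []
    for word in words:
        for group in groups:
            if jaccard(char_bigrams(word), char_bigrams(group[0])) >= threshold:
                group.append(word)
                break
        else:
            # No existing group was similar enough, so start a new one.
            groups.append([word])
    return groups


if __name__ == "__main__":
    feature_words = ["refund", "refunds", "delivery", "deliver", "invoice"]
    for group in merge_similar_words(feature_words):
        print(group)
```

Each resulting group could then be treated as one merged feature phrase. For large vocabularies, MinHash (also named in S1) would approximate the same Jaccard score without comparing full sets.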


Abstract

The invention belongs to the field of natural language processing and relates to a method and a system for building a text label system and iteratively improving it, and to a storage medium. The method comprises the following steps. First, a label system is preliminarily built: class labels are summarized, at least one feature word describing the label is configured for each label, and a first-version label system and its configuration lexicon are formed. Next, the label system is verified by evaluating the coverage of the current label system and its configuration lexicon. Then the label system is improved: text clustering and information extraction are run again on the text data not covered by the configuration lexicon of the current label system, yielding a new batch of feature words and descriptors. Finally, these are compared with the current label system and its configuration lexicon on the basis of text similarity, feature words with high similarity are merged, and class labels are named for the newly discovered descriptors, giving the latest label system. The invention thereby extracts and integrates labels automatically, lets the constructed system improve and optimize itself, and reduces manual intervention.
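
The verification step in the abstract, which evaluates how well the current label system and its configuration lexicon cover the text data, can be sketched as follows. The abstract does not define coverage, so this sketch assumes a text counts as covered when it contains at least one configured feature word as a substring. The function names `match_labels` and `evaluate_coverage` and the sample lexicon are hypothetical.

```python
from __future__ import annotations


def match_labels(text: str, lexicon: dict[str, list[str]]) -> list[str]:
    """Return the labels whose feature words occur in the text (substring match, an assumption)."""
    return [
        label
        for label, feature_words in lexicon.items()
        if any(word in text for word in feature_words)
    ]


def evaluate_coverage(
    texts: list[str], lexicon: dict[str, list[str]]
) -> tuple[float, list[str]]:
    """Return the share of texts matched by any label, plus the uncovered texts.

    The uncovered texts are the ones that would be sent back through
    clustering and information extraction in the improvement step.
    """
    uncovered = [text for text in texts if not match_labels(text, lexicon)]
    ratio = (len(texts) - len(uncovered)) / len(texts) if texts else 0.0
    return ratio, uncovered


if __name__ == "__main__":
    # Hypothetical first-version lexicon: label -> feature words.
    lexicon = {
        "logistics": ["delivery", "courier"],
        "billing": ["invoice", "refund"],
    }
    texts = [
        "the courier was late",
        "please send the invoice",
        "the app keeps crashing",
    ]
    ratio, uncovered = evaluate_coverage(texts, lexicon)
    print(f"coverage: {ratio:.2f}")
    print("uncovered:", uncovered)
```

A low coverage ratio would indicate that another improvement round is needed; the stopping criterion itself is not given in the visible text.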

Description

Technical field

[0001] The invention belongs to the field of natural language processing, and specifically relates to a method, a system, and a storage medium for building a text label system and iteratively improving it.

Background

[0002] In the era of big data, the volume of information people encounter every day is growing exponentially. As a high-level description of an information subject, labels can free people from large amounts of redundant information. The sheer volume of data makes manual summarization of labels too expensive and difficult, and subjective bias also affects the accuracy of label recognition, especially in natural language processing. It is therefore particularly important to use text mining technology to recognize text labels automatically.

[0003] At present, most patent documents concern the application of an existing label system in vertical search, index evaluation, marketi...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/31, G06F16/35, G06F16/33
CPC: G06F16/313, G06F16/353, G06F16/334
Inventor: 姜磊, 杨钊, 赖招展, 王鹏雨, 陈南山, 朱振航, 何慧, 沈广盈, 屈吕杰
Owner: GUANGZHOU BAILING DATA CO LTD