Text multi-label analysis method, device, electronic equipment and storage medium

A multi-label analysis method, applied in the field of text multi-label analysis methods, devices, electronic equipment and storage media. It addresses the problems that inaccurate word segmentation has a substantial impact on the results, that the topic-term correspondence matrix struggles to meet the precision that Chinese semantics requires, and that no good solution to these issues has been available.

Active Publication Date: 2021-03-02
北京智慧星光信息技术有限公司
5 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

However, because the approach is in essence also a bag-of-words model, inaccurate word segmentation and regional variation in terminology have a substantial impact on the results, so the final modeled topic-term correspondence matrix struggles to meet the precision that Chinese semantics requires.
[0005] No satisfactory solution to the above problems currently exists.

Examples

Detailed Description of the Embodiments

[0046] It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0047] The text multi-label analysis method of the present embodiment comprises the following steps:

[0048] S1. Acquire training text data, where the training text data includes multiple texts, and perform word segmentation on the training text data.

[0049] A word segmentation algorithm divides a sentence into a sequence of words. For example, "I pass by Peking University" can be segmented into "I / pass by / Peking University". The pkuseg word segmentation model provided by Peking University can be used for segmentation. Because pkuseg offers pretrained models for different domains and also supports training on newly labeled data, a self-trained model can be obtained to produce more accurate segmentation results.
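To make the idea of step S1 concrete, the following is a minimal sketch of word segmentation using forward maximum matching over a small dictionary. It is a toy stand-in and not pkuseg or the patent's segmenter. The function name `forward_max_match`, the vocabulary, and the Chinese sentence (an assumed rendering of "I pass by Peking University") are all hypothetical and chosen only for illustration.

```python
from typing import List, Set


def forward_max_match(sentence: str, vocabulary: Set[str], max_word_len: int = 4) -> List[str]:
    """Toy segmenter: at each position take the longest dictionary word.

    This is only an illustration of what "segmenting a sentence into words"
    means; a trained model such as pkuseg works very differently.
    """
    words = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first, then shrink it one character at a time.
        for size in range(min(max_word_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            # A single character is always accepted so the loop makes progress.
            if size == 1 or candidate in vocabulary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical vocabulary and sentence, assumed for this example only.
vocabulary = {"我", "路过", "北京", "大学", "北京大学"}
segments = forward_max_match("我路过北京大学", vocabulary)
print(" / ".join(segments))
```

A dictionary-based matcher like this breaks down on out-of-vocabulary and ambiguous words, which is one reason the text recommends a trainable, domain-adapted model instead.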

[0050] S2, use N-gram (a language m...

Abstract

The invention discloses a text multi-label analysis method, device, electronic equipment and storage medium. The method includes: acquiring training text data that comprises multiple texts and performing word segmentation on it; modeling the segmented words with N-gram to form, as a corpus, a text collection of multiple word sequence sets, each word sequence set containing multiple word sequences made up of N words; training the LDA model on the corpus, scoring and verifying the training results, and persisting the high-scoring LDA model to obtain a persistent LDA model; and using the persistent LDA model to extract topic clusters from the test text data, then defining labels with a semi-supervised method combined with terms to determine the cluster labels of the topic clusters. By combining LDA with N-gram, the invention takes word order into account while retaining LDA's ability to generate topic clusters quickly, so it can accurately obtain the topic clusters of a text and its topic meaning.
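The N-gram corpus construction described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the helper names `build_word_sequences` and `build_corpus`, the choice of N = 2, and the sample segmented texts are all hypothetical. Each text becomes one word sequence set, and each word sequence is a tuple of N consecutive words, which preserves local word order.

```python
from typing import List, Tuple


def build_word_sequences(words: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return every run of n consecutive words in a segmented text."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A text shorter than n words yields an empty word sequence set.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def build_corpus(segmented_texts: List[List[str]], n: int) -> List[List[Tuple[str, ...]]]:
    """Build one word sequence set per text; together they form the corpus."""
    return [build_word_sequences(words, n) for words in segmented_texts]


# Illustrative segmented texts, assumed for this example only.
segmented_texts = [
    ["I", "pass by", "Peking University"],
    ["Peking University", "provides", "pkuseg"],
]
corpus = build_corpus(segmented_texts, n=2)
for word_sequence_set in corpus:
    print(word_sequence_set)
```

In a fuller pipeline each word sequence would typically be treated as a single token when training the LDA model, so that topics are built from ordered word groups instead of isolated words.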

Description

Technical field

[0001] The invention relates to the field of data processing, in particular to a text multi-label analysis method, device, electronic equipment and storage medium.

Background

[0002] At present, the topic meaning of a text is defined automatically mainly through keyword definition or text classification. Keyword definition methods are mostly extractive or generative. Extractive methods take words that represent the topic of the article directly from the text, while generative models work better for articles that lack key expressions. However, because of the complexity of the Chinese context, as in the comment text "These can only be explained or not enough", the current mainstream generative architecture seq2seq lacks understanding of such overly implicit short texts, and the response performance of the service cannot meet the rapidly increasing data volume processing n...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06F40/289, G06K9/62
CPC: G06F16/35, G06F40/289, G06F18/2155
Inventor: 龚浩, 李彦才, 李青龙, 白剑波, 彭璿韜
Owner: 北京智慧星光信息技术有限公司