Text classification method based on multi-source features, terminal equipment and storage medium

A text classification technology, applied in text database clustering/classification, neural learning methods, text database querying, etc. It addresses limited keyword extraction ability while keeping the dictionary short and computing costs low, and achieves enhanced text semantic features, increased interpretability, and reduced noise.

Pending Publication Date: 2022-05-06

XIAMEN MEIYA PICO INFORMATION

AI Technical Summary

Problems solved by technology

[0005] (1) Traditional keyword extraction methods include the LDA topic model, TF-IDF, TextRank, and improvements based on these methods. For longer texts, keyword extraction based on traditional statistical learning produces relatively noisy results.

Traditional approaches also rely on a single extraction method, which limits the keywords they can extract, especially in specialized professional fields.
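
As a point of reference for the statistical methods named above, here is a minimal sketch of TF-IDF keyword ranking over pre-segmented documents. It is not taken from the patent: the function name `tfidf_keywords`, the toy corpus, and the plain `log(N / df)` weighting are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def tfidf_keywords(docs, doc_index, top_k=3):
    """Rank the tokens of docs[doc_index] by TF-IDF against the whole corpus."""
    n_docs = len(docs)
    # Document frequency: number of documents in which each token appears.
    df = Counter(tok for doc in docs for tok in set(doc))
    tf = Counter(docs[doc_index])
    total = len(docs[doc_index])
    scores = {
        tok: (count / total) * math.log(n_docs / df[tok])
        for tok, count in tf.items()
    }
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]

# Toy, already-segmented corpus (hypothetical data).
corpus = [
    ["transfer", "money", "account", "money"],
    ["meet", "dinner", "tonight"],
    ["account", "password", "login"],
]
print(tfidf_keywords(corpus, 0, top_k=2))
```

Because the score depends only on counts, a frequent but uninformative token in a long text can still rank highly, which is one way the noise described above can arise.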

[0006] (2) Feature extraction methods based on deep learning include methods that integrate LSTM and LDA. Although such methods can obtain good keyword extraction results from trained word embeddings, the contribution of each word to the classification is not equal, and there is a problem of lexical feature redundancy.

Keyword extraction based on the attention mechanism and on multi-feature fusion compensates to some extent for the shortcomings of single-source keyword extraction, but the keywords obtained are still limited.

[0007] (3) Text classification methods based on a single feature source, or on multi-source features built from character features, cannot obtain multi-dimensional features at the same time. Methods based on lexical (word-level) features already have a certain feature capture and semantic understanding capability. Methods based on character embeddings can greatly shorten the dictionary, reduce computation cost, and improve efficiency, but lack semantic information.

Examples

Embodiment 1

[0039] The embodiment of the present invention provides a text classification method based on multi-source features. As shown in Figure 1 and Figure 2, the method includes the following steps:

[0040] S1: Receive the text to be analyzed and perform word segmentation processing on it.

[0041] In this embodiment, the text to be analyzed is chat record text. Chat records may contain a lot of noise, which degrades the classification result, so the text is also preprocessed to remove characters or words that affect classification. The preprocessing in this embodiment consists of data cleaning and stop-word removal.
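
The preprocessing described in [0041] could be sketched roughly as follows. This is an illustration under assumptions, not the patent's implementation: the stop-word list, the cleaning rules (dropping URLs and punctuation), and the whitespace split standing in for a real word segmenter are all hypothetical.

```python
import re

# Hypothetical stop-word list; a real system would load a much larger one.
STOP_WORDS = {"the", "a", "is", "to", "ok"}

def clean_text(text):
    """Data cleaning (assumed rules): drop URLs and punctuation, normalize spaces."""
    text = re.sub(r"https?://\S+", " ", text)   # remove links
    text = re.sub(r"[^\w\s]", " ", text)        # remove punctuation and symbols
    return re.sub(r"\s+", " ", text).strip().lower()

def preprocess(text):
    """Clean the text, split it into tokens, and remove stop words."""
    tokens = clean_text(text).split()  # whitespace split stands in for a segmenter
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("OK, send the money to http://example.com/pay now!!!"))
```

For Chinese chat text, the whitespace split would be replaced by a proper word segmentation step such as the one in S1.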

[0042] S2: By adding a self-attention mechanism to the LSTM network, obtain the word weight matrix M_word of each word and the character weight matrix M_char of each character in the text to be analyzed.

[0043] This step S2 is used to obtain the attention weight based on the attention mecha...
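
Since the formula for S2 is not visible in this excerpt, the following is only a generic sketch of how attention weights can be derived from per-token hidden states. It assumes a simplified dot-product attention with a hypothetical learned query vector, and uses hand-written toy vectors in place of real LSTM outputs; the patent's actual computation of M_word and M_char may differ.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(hidden_states, query):
    """One weight per token: softmax over the dot product of h_i and the query."""
    scores = [sum(h * q for h, q in zip(h_i, query)) for h_i in hidden_states]
    return softmax(scores)

# Toy hidden states standing in for LSTM outputs (hypothetical values),
# one vector per token, plus a hypothetical learned query vector.
hidden = [[0.1, 0.0], [2.0, 1.0], [0.3, 0.2]]
query = [1.0, 1.0]
weights = attention_weights(hidden, query)
print(weights)
```

The weights sum to 1, and the token whose hidden state aligns best with the query receives the largest weight. Running the same computation over word-level and character-level sequences would give one weight set for each granularity.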

Embodiment 2

[0086] The present invention also provides a text classification terminal device based on multi-source features, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the steps of the method in Embodiment 1 of the present invention are implemented.

Abstract

The invention relates to a multi-source feature-based text classification method, terminal equipment and a storage medium. The method comprises the following steps: S1, receiving a text and performing word segmentation; S2, obtaining a word attention weight matrix and a character attention weight matrix by adding a self-attention mechanism to the LSTM network; S3, constructing a keyword table, and looking up a core keyword table in the keyword table based on the word segmentation result; S4, extracting keywords with N keyword extraction algorithms to obtain N candidate keyword tables; S5, based on the word attention weight matrix and the character attention weight matrix, obtaining an expanded keyword table and an expanded key-character table from the candidate keyword tables; S6, taking all characters and words in the core keyword table, the expanded keyword table and the expanded key-character table as key characters and keywords; S7, performing feature extraction on the key characters and keywords; and S8, based on the extracted features, predicting the category of the text through a classification network. According to the invention, the text classification accuracy is improved.
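
The abstract does not spell out how steps S3 to S6 combine the tables, so the sketch below shows only one plausible reading, at word level. The function names, the attention threshold, and the rule "keep a candidate if its attention weight reaches the threshold" are assumptions made for illustration, not the patent's stated procedure.

```python
def core_keywords(tokens, keyword_table):
    """S3 (assumed): core keywords are segmented tokens found in the keyword table."""
    found = []
    for t in tokens:
        if t in keyword_table and t not in found:
            found.append(t)
    return found

def expanded_keywords(candidate_tables, attention, threshold):
    """S5 (assumed): keep candidates whose attention weight reaches the threshold."""
    merged = []
    for table in candidate_tables:
        for t in table:
            if t not in merged and attention.get(t, 0.0) >= threshold:
                merged.append(t)
    return merged

def final_keywords(core, expanded):
    """S6: union of the core and expanded tables, preserving order."""
    return core + [t for t in expanded if t not in core]

# Hypothetical data: segmented text, a keyword table, two candidate tables
# (standing in for N = 2 extraction algorithms), and per-word attention weights.
tokens = ["send", "money", "account", "now"]
keyword_table = {"money", "password"}
candidates = [["money", "account"], ["account", "now"]]
attention = {"money": 0.5, "account": 0.3, "now": 0.05}

core = core_keywords(tokens, keyword_table)
expanded = expanded_keywords(candidates, attention, threshold=0.1)
print(final_keywords(core, expanded))
```

In this reading, the attention weights act as a filter that lets several extraction algorithms contribute candidates without passing all of their noise on to the feature extraction in S7.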

Description

Technical field

[0001] The invention relates to the field of text classification, in particular to a multi-source feature-based text classification method, terminal equipment and storage media.

Background technique

[0002] In recent years, the mobile Internet has developed rapidly. As of December 2020, the number of Internet users in China has reached 989 million, of which mobile Internet users accounted for 99.7%, and instant messaging apps accounted for 99.2% of mobile Internet users. Communication software represented by QQ and WeChat has become an indispensable part of most people's daily work and life, and it has also become a tool for criminals to publish false news and conduct illegal activities on the Internet. In the field of electronic forensics, keywords are mined from mobile phones, and key topics are quickly and intelligently classified, bringing key progress to the detection of cases.

[0003] The diversity of chat topics, the complexity of data, the serious ...

Application Information

IPC(8): G06F40/289, G06F40/30, G06F16/33, G06F16/35, G06N3/04, G06N3/08

CPC: G06F40/289, G06F40/30, G06F16/3344, G06F16/353, G06N3/08, G06N3/044

Inventor: 刘晓芳, 杜新胜, 陈志明, 赵建强, 庄灿波

Owner: XIAMEN MEIYA PICO INFORMATION
