
A multilingual text classification method fusing theme information and BiLSTM-CNN

A technology for topic information and text classification, applied in text database clustering/classification, unstructured text data retrieval, character and pattern recognition, etc. It addresses the reduced efficiency caused by reliance on machine-translation accuracy and improves classification accuracy.

Pending Publication Date: 2019-06-14
YANBIAN UNIV
Cites: 4; Cited by: 20

AI Technical Summary

Problems solved by technology

The existing translation-based approach is relatively simple, but it relies heavily on the accuracy of machine translation, which reduces efficiency.

Examples

Embodiment 1

[0090] In the first embodiment, the text set of one language from the multilingual text classification corpus established in step 1 is used in an experiment to verify the effectiveness of the sub-model. The parameters in this example are: embedding size 220, hidden-layer size 150, number of topics 220, and batch size 64. The comparison model is TextCNN, which consists of a convolutional layer, an activation layer, a pooling layer, and a fully connected layer; the comparison verifies that the sub-model improves text classification accuracy.
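To make the TextCNN baseline concrete, here is a minimal forward-pass sketch in plain Python of the layer sequence named above (convolution, activation, pooling, fully connected). It is an illustration, not the authors' code: only the embedding size of 220 comes from the text, while the kernel width, number of filters, number of classes, max-over-time pooling, softmax output, and random weights are assumptions, and training is omitted.

```python
import math
import random

EMBEDDING_SIZE = 220  # from Embodiment 1
# Hypothetical settings; the source does not give them.
KERNEL_WIDTH = 3
NUM_FILTERS = 4
NUM_CLASSES = 5


def conv1d(seq, filters, biases):
    """'Valid' 1-D convolution over a list of word vectors.

    seq: T vectors of length d. filters: F kernels, each KERNEL_WIDTH vectors
    of length d. Returns F feature maps, each of length T - KERNEL_WIDTH + 1.
    """
    k = len(filters[0])
    maps = []
    for kernel, b in zip(filters, biases):
        fmap = []
        for t in range(len(seq) - k + 1):
            s = b
            for j in range(k):
                s += sum(w * x for w, x in zip(kernel[j], seq[t + j]))
            fmap.append(s)
        maps.append(fmap)
    return maps


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def textcnn_forward(seq, filters, biases, fc_w, fc_b):
    """Convolution -> ReLU -> max-over-time pooling -> fully connected -> softmax."""
    pooled = [max(max(0.0, v) for v in fmap) for fmap in conv1d(seq, filters, biases)]
    logits = [sum(w * p for w, p in zip(row, pooled)) + b for row, b in zip(fc_w, fc_b)]
    return softmax(logits)


rng = random.Random(0)


def rand_vec(n):
    return [rng.uniform(-0.1, 0.1) for _ in range(n)]


sentence = [rand_vec(EMBEDDING_SIZE) for _ in range(12)]  # 12 random "word vectors"
filters = [[rand_vec(EMBEDDING_SIZE) for _ in range(KERNEL_WIDTH)] for _ in range(NUM_FILTERS)]
biases = [0.0] * NUM_FILTERS
fc_w = [rand_vec(NUM_FILTERS) for _ in range(NUM_CLASSES)]
fc_b = [0.0] * NUM_CLASSES
probs = textcnn_forward(sentence, filters, biases, fc_w, fc_b)
print([round(p, 3) for p in probs])
```

In a real experiment the word vectors would come from the trained embeddings and the weights from training; the sketch only shows how the four layer types compose.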

Embodiment 2

[0092] This embodiment is essentially the same as Embodiment 1, with the following differences:

[0093] In this embodiment, the multilingual text corpus established in step 1 is used for multilingual text classification. The model is extended to three languages: texts in each language are trained simultaneously, and their features are cascaded (concatenated) in the final neural network layer, which allows the method to classify multilingual texts accurately.
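The cascading step can be sketched as follows, assuming that "cascading" means concatenating each language's feature vector before a shared output layer. The encoder here is a hypothetical placeholder (mean-pooled word vectors plus the document's topic vector) standing in for each language's BiLSTM-CNN sub-model; the language names, sizes, and random inputs are illustrative only, and joint training is not shown.

```python
import math
import random

# Hypothetical sizes for illustration; the source does not give them.
EMB_DIM = 6
NUM_TOPICS = 4
NUM_CLASSES = 3
LANGS = ("lang_a", "lang_b", "lang_c")  # three languages; names are placeholders


def placeholder_encoder(word_vecs, topic_vec):
    """Stand-in for one language's BiLSTM-CNN sub-model: mean-pools the word
    vectors and appends the document's topic vector (hypothetical)."""
    n = len(word_vecs)
    mean = [sum(v[i] for v in word_vecs) / n for i in range(len(word_vecs[0]))]
    return mean + topic_vec


def cascade_classify(doc, encoders, fc_w, fc_b):
    """doc maps each language to (word_vecs, topic_vec) for parallel texts.

    Each language is encoded separately, the features are concatenated in a
    fixed language order ("cascaded"), and a shared softmax layer classifies.
    """
    features = []
    for lang in LANGS:
        word_vecs, topic_vec = doc[lang]
        features.extend(encoders[lang](word_vecs, topic_vec))
    logits = [sum(w * x for w, x in zip(row, features)) + b for row, b in zip(fc_w, fc_b)]
    m = max(logits)  # numerically stable softmax
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(1)
doc = {
    lang: ([[rng.gauss(0, 1) for _ in range(EMB_DIM)] for _ in range(5)],
           [1.0 / NUM_TOPICS] * NUM_TOPICS)
    for lang in LANGS
}
encoders = {lang: placeholder_encoder for lang in LANGS}
feat_dim = len(LANGS) * (EMB_DIM + NUM_TOPICS)  # size of the concatenated features
fc_w = [[rng.uniform(-0.1, 0.1) for _ in range(feat_dim)] for _ in range(NUM_CLASSES)]
fc_b = [0.0] * NUM_CLASSES
probs = cascade_classify(doc, encoders, fc_w, fc_b)
print([round(p, 3) for p in probs])
```

The fixed order in `LANGS` matters: the output layer's weights are tied to positions in the concatenated vector, so every document must be assembled in the same language order.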

[0094] In summary, the patented method enables multilingual text classification, and the multilingual neural network it trains can also classify text in a single language. The method overcomes the language barrier, improves the accuracy of multilingual text classification, and is extensible.

Abstract

The invention relates to the technical field of text classification in natural language processing, in particular to a multilingual text classification method fusing topic information and BiLSTM-CNN. The method is implemented as follows: first, multilingual parallel corpora of Chinese and English are collected to construct a parallel corpus; the text in each language of the corpus is preprocessed; word vectors for each language are trained with a word embedding technique; text topic vectors for each language are extracted with a topic model; and a neural network model suited to multiple languages is built, which fuses the topic information to produce multilingual text representations. The method overcomes language barriers, is highly adaptable, meets the requirements of multilingual text classification, and is highly practical.
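The abstract does not state exactly how the BiLSTM, the CNN, and the topic vectors are connected. The sketch below assumes one plausible arrangement: a BiLSTM runs over a document's word vectors, a convolution with ReLU and max-over-time pooling summarizes the BiLSTM states, and the document's topic vector (for example, a document-topic distribution from a topic model such as LDA) is concatenated to the pooled features. Sizes are toy values, gate biases are omitted, weights are random, training is not shown, and all helper names are hypothetical.

```python
import math
import random

# Toy sizes; Embodiment 1 reports embedding 220, hidden 150, topics 220.
EMB_DIM, HIDDEN, NUM_TOPICS = 8, 6, 5
KERNEL_WIDTH, NUM_FILTERS = 2, 4  # hypothetical CNN settings

rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def new_lstm(d_in, hidden):
    # One weight matrix per gate acting on [x_t; h_{t-1}]; biases omitted for brevity.
    return {gate: rand_matrix(hidden, d_in + hidden) for gate in "ifgo"}


def run_lstm(params, seq, hidden):
    h, c, outputs = [0.0] * hidden, [0.0] * hidden, []
    for x in seq:
        z = x + h  # concatenate input and previous hidden state
        i = [sigmoid(a) for a in matvec(params["i"], z)]
        f = [sigmoid(a) for a in matvec(params["f"], z)]
        g = [math.tanh(a) for a in matvec(params["g"], z)]
        o = [sigmoid(a) for a in matvec(params["o"], z)]
        c = [ft * ct + it * gt for ft, ct, it, gt in zip(f, c, i, g)]
        h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
        outputs.append(h)
    return outputs


def bilstm(fwd, bwd, seq, hidden):
    forward = run_lstm(fwd, seq, hidden)
    backward = run_lstm(bwd, seq[::-1], hidden)[::-1]  # re-align to time order
    return [a + b for a, b in zip(forward, backward)]  # 2 * hidden values per step


def cnn_maxpool(states, filters):
    k = len(filters[0])
    pooled = []
    for kernel in filters:
        scores = [sum(sum(w * x for w, x in zip(kernel[j], states[t + j])) for j in range(k))
                  for t in range(len(states) - k + 1)]
        pooled.append(max(max(0.0, s) for s in scores))  # ReLU, then max over time
    return pooled


def encode_with_topics(word_vecs, topic_vec, fwd, bwd, filters):
    """BiLSTM -> CNN + max-pooling, then concatenate the document-topic vector."""
    states = bilstm(fwd, bwd, word_vecs, HIDDEN)
    return cnn_maxpool(states, filters) + topic_vec


fwd, bwd = new_lstm(EMB_DIM, HIDDEN), new_lstm(EMB_DIM, HIDDEN)
filters = [rand_matrix(KERNEL_WIDTH, 2 * HIDDEN) for _ in range(NUM_FILTERS)]
words = rand_matrix(7, EMB_DIM)  # 7 random "word vectors"
topics = [0.4, 0.3, 0.1, 0.1, 0.1]  # e.g. a document-topic distribution
rep = encode_with_topics(words, topics, fwd, bwd, filters)
print(len(rep))  # NUM_FILTERS + NUM_TOPICS = 9
```

Concatenation is the simplest fusion choice; it keeps the topic vector's values intact in the final representation, so a downstream classifier can weight topic evidence and sequence features independently.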

Description

Technical field

[0001] The invention relates to the technical field of text classification in natural language processing, in particular to a multilingual text classification method that combines topic information and BiLSTM-CNN.

Background art

[0002] With the rapid development of the Internet, more and more Internet data exists in the form of text, and with growing internationalization, multilingual text data is becoming increasingly common. Text information in a single language no longer satisfies users, and demand for multilingual text materials continues to grow; people want to find the information they need quickly and efficiently in multilingual text data. As a research direction in natural language processing, multilingual text classification is an effective way to cope with the growth of multilingual text information.

[0003] Multilingual text classification, whose purpose is to extend the existi...

Application Information

IPC(8): G06F16/35; G06F17/27; G06K9/62
Inventors: 崔荣一, 孟先艳, 赵亚慧, 易志伟, 田明杰, 徐凯斌, 杨飞扬, 王琪, 黄政豪, 金国哲, 张振国, 胡荣, 王大千
Owner: YANBIAN UNIV