Method, device, and medium for classifying long maritime and maritime-commerce texts based on fusion features

A technique that fuses features for text classification, applied in neural learning methods, semantic analysis, instruments, etc. It addresses problems such as the loss of document hierarchy information, and aims for reduced complexity and high classification accuracy.

Pending Publication Date: 2022-05-31
NANJING UNIV OF INFORMATION SCI & TECH

AI Technical Summary

Problems solved by technology

For long texts, feeding the whole document to the model as a single long sequence not only strains the model's performance but also discards the document's hierarchical structure information.

Examples

Embodiment 1

[0058] This embodiment provides a method for classifying long texts of maritime affairs based on fusion features, the method comprising:

[0059] First, obtain the long maritime text to be classified, segment it, and feed the resulting segments into the pre-trained BERT model to obtain the word vectors and the sentence vector of each local text;

[0060] Second, feed the word vectors into a convolutional neural network (CNN) to generate the feature vector of the local text, and fuse that feature vector with the BERT sentence vector to form the final sentence vector of the local text;

[0061] Then, input the n fused sentence vectors, one for each segment of the divided long text, into a bidirectional long short-term memory network (Bi-LSTM) to extract the global information of the text;

[0062] Finally, by introducing an attentio...
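
The local steps above (segmentation, convolution over word vectors, fusion with the BERT sentence vector) can be illustrated with a minimal pure-Python sketch. The source does not state the segment length, the filter sizes, the activation, the pooling scheme, or how the two vectors are fused, so the fixed-length chunking, ReLU, max-over-time pooling, and concatenation below are all assumptions. The function names are hypothetical, and the word and sentence vectors are plain lists standing in for BERT outputs.

```python
from typing import List

Vector = List[float]


def segment_tokens(tokens: List[str], max_len: int) -> List[List[str]]:
    """Split a token list into consecutive chunks of at most max_len tokens.

    Assumption: fixed-length chunking; the source only says the text is segmented.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]


def conv1d_max_pool(word_vectors: List[Vector], kernel: List[Vector],
                    bias: float = 0.0) -> float:
    """Slide one filter over the word vectors, apply ReLU, then max-pool.

    kernel has one weight vector per covered position (its length is the
    filter width). ReLU and max-over-time pooling are assumptions.
    """
    width = len(kernel)
    if len(word_vectors) < width:
        raise ValueError("sequence is shorter than the filter width")
    activations = []
    for start in range(len(word_vectors) - width + 1):
        total = bias
        for offset in range(width):
            total += sum(w * x for w, x in
                         zip(kernel[offset], word_vectors[start + offset]))
        activations.append(max(0.0, total))
    return max(activations)


def fuse(cnn_features: Vector, bert_sentence_vector: Vector) -> Vector:
    """Fuse local CNN features with the sentence vector.

    Assumption: fusion is plain concatenation.
    """
    return cnn_features + bert_sentence_vector


def local_sentence_vector(word_vectors: List[Vector],
                          bert_sentence_vector: Vector,
                          kernels: List[List[Vector]]) -> Vector:
    """Build the final sentence vector of one segment from several filters."""
    cnn_features = [conv1d_max_pool(word_vectors, k) for k in kernels]
    return fuse(cnn_features, bert_sentence_vector)
```

In this sketch each segment yields one fused vector, so a document split into n segments produces the n vectors that the Bi-LSTM step consumes.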

Embodiment 2

[0088] This embodiment provides a device for classifying long texts of maritime affairs based on fusion features, including:

[0089] Acquisition module: used to obtain long texts of maritime affairs to be classified;

[0090] Segmentation module: used to segment the long maritime text to be classified and obtain the segmented texts;

[0091] Word embedding layer module: used to feed the segmented texts into the pre-trained BERT model to obtain the word vectors and the BERT sentence vector of each local text;

[0092] CNN layer module: used to feed the word vectors into the convolutional neural network, generate the feature vector of the local text, and fuse that feature vector with the BERT sentence vector to form the final sentence vector of the local text;

[0093] Bi-LSTM layer module: used to input the final sentence vector of each local text into the bidirectional long short-term memory network to extract the global information of the text;

[0094] Attenti...
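
To make the role of the Bi-LSTM layer module concrete, here is a minimal pure-Python sketch of a bidirectional LSTM pass over the per-segment sentence vectors. It uses the standard LSTM gate equations; the parameter layout (a dictionary with keys such as "Wi" and "bi"), the function names, and the choice to concatenate the forward and backward hidden states are assumptions, not details taken from the source.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[Vector]
Params = Dict[str, list]  # hypothetical layout: "Wi", "Wf", "Wo", "Wg", "bi", ...


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def affine(weights: Matrix, v: Vector, bias: Vector) -> Vector:
    """Compute weights @ v + bias with plain lists."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, bias)]


def lstm_step(x: Vector, h: Vector, c: Vector, p: Params) -> Tuple[Vector, Vector]:
    """One standard LSTM step on the concatenated input [x; h]."""
    z = x + h
    i = [sigmoid(a) for a in affine(p["Wi"], z, p["bi"])]    # input gate
    f = [sigmoid(a) for a in affine(p["Wf"], z, p["bf"])]    # forget gate
    o = [sigmoid(a) for a in affine(p["Wo"], z, p["bo"])]    # output gate
    g = [math.tanh(a) for a in affine(p["Wg"], z, p["bg"])]  # candidate cell
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new


def run_lstm(seq: List[Vector], p: Params, hidden_size: int) -> List[Vector]:
    """Run an LSTM left to right and return the hidden state at every step."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    outputs = []
    for x in seq:
        h, c = lstm_step(x, h, c, p)
        outputs.append(h)
    return outputs


def bilstm(seq: List[Vector], fwd: Params, bwd: Params,
           hidden_size: int) -> List[Vector]:
    """Concatenate forward and backward hidden states for each segment."""
    forward = run_lstm(seq, fwd, hidden_size)
    backward = run_lstm(seq[::-1], bwd, hidden_size)[::-1]
    return [f + b for f, b in zip(forward, backward)]
```

Because the backward pass reads the segments in reverse, the output for an early segment also reflects later segments, which is what lets this layer carry document-level context.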

Embodiment 3

[0097] The embodiment of the present invention also provides a device for classifying maritime long texts based on fusion features, including a processor and a storage medium;

[0098] The storage medium is used to store instructions;

[0099] The processor is operable in accordance with the instructions to perform the steps of the following method:

[0100] First, obtain the long maritime text to be classified, segment it, and feed the resulting segments into the pre-trained BERT model to obtain the word vectors and the sentence vector of each local text;

[0101] Second, feed the word vectors into a convolutional neural network (CNN) to generate the feature vector of the local text, and fuse that feature vector with the BERT sentence vector to form the final sentence vector of the local text;

[0102] Then, input the sentence vectors of n groups of text fusion after the ...

Abstract

The invention provides a method, device, and medium for classifying long maritime and maritime-commerce texts based on fusion features. The method comprises the following steps: first, the preprocessed long text is segmented, and each short segment is fed into the pre-trained BERT model to obtain the word vectors and the sentence vector of the local text; second, the word vectors are fed into a convolutional neural network to generate the feature vector of the local text, which is fused with the BERT sentence vector to form the final sentence vector of the local text; then, the n fused sentence vectors obtained from the divided long text are input into a bidirectional long short-term memory network to extract the global information of the text; finally, an attention mechanism is introduced to focus on key content, and softmax is used to obtain the final probability representation of the long text, improving the efficiency and accuracy of model classification.
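
The final step of the abstract (attention over the Bi-LSTM outputs followed by softmax) can be sketched in pure Python as follows. The abstract does not specify the form of the attention mechanism, so the dot-product scoring against a single query vector, the linear output layer, and all function and parameter names here are assumptions made for illustration.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states: List[Vector], query: Vector) -> Vector:
    """Weight each segment's hidden state and sum them into one document vector.

    Assumption: scores are dot products with a (normally learned) query vector.
    """
    scores = [sum(q * h for q, h in zip(query, state)) for state in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    return [sum(w * state[d] for w, state in zip(weights, hidden_states))
            for d in range(dim)]


def classify(doc_vector: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Map the document vector to class probabilities with a linear layer and softmax."""
    logits = [sum(w * x for w, x in zip(row, doc_vector)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)
```

Segments whose hidden states align with the query receive larger weights, so they contribute more to the document vector that the softmax layer classifies.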

Description

Technical field

[0001] The invention relates to a method, device, and medium for classifying long maritime texts based on fusion features, and belongs to the technical field of natural language processing.

Background

[0002] With the continuing reform of China's judicial system, major courts have published large numbers of judgment documents on the Internet, but the lack of document category labels makes it difficult for legal professionals to search this massive body of judgment text. How to classify judgment documents automatically, quickly, and efficiently is an urgent problem.

[0003] Judgment document classification is a text classification task. Text classification, one of the most classic and fundamental tasks in natural language processing (NLP), is widely used in topic classification, sentiment analysis, and question-answer matching. According to the pre...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/211, G06F40/30, G06K9/62, G06N3/04, G06N3/08
CPC: G06F40/211, G06F40/30, G06N3/08, G06N3/044, G06N3/045, G06F18/2415
Inventor: 鲍闯, 李鹏, 冯姣, 王文超
Owner: NANJING UNIV OF INFORMATION SCI & TECH