Long text cascade classification method, system, device, and storage medium

A cascade classification technique for long text, applied to text database clustering/classification, unstructured text data retrieval, semantic analysis, and similar areas. It addresses limits on classification model performance and on the range of application of upper-level models, the inability of text intervals to interact with one another, and the loss of key information. Its stated effects are improved user experience and stickiness, improved capture ability, and improved performance.

Pending Publication Date: 2020-11-13
杭州识度科技有限公司

AI Technical Summary

Problems solved by technology

[0005] (1) Every word in the document is computed against every other word, which results in high computational complexity, especially when the document length exceeds the threshold.
[0006] (2) Different interval texts obtained by segmentation ca...



Examples


Detailed Description of the Embodiments

[0057] Embodiments of the present invention will be described in detail below in conjunction with the accompanying drawings.

[0058] As shown in Figure 1, an embodiment of the present invention provides a long text cascade classification method comprising the following steps:

[0059] 1. Preprocess the input long text data.

[0060] 1) First segmentation.

[0061] Segment the long text according to the specified interval length d to obtain the first interval s_1. Taking the legal field as an example, as shown in Figure 2, the first interval obtained is: [Zhang Xiulian's debt due to family...].

[0062] 2) Subsequent segmentation.

[0063] Building on the first segmentation of the input text, segment the subsequent intervals according to the specified interval length d and the step-size overlap to obtain all intervals.

[0064] Taking the legal field as an example, as shown in Figure 2, after continued segmentation, the second interval s o...
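
The two segmentation steps above can be sketched as a sliding window. This is a minimal sketch under stated assumptions, not the patent's implementation: the function name split_into_intervals is hypothetical, and it assumes that the "step size overlap" is the number of characters shared by consecutive intervals, so that the window advances by d - overlap characters each time. If the source instead means the stride itself, replace the stride line accordingly.

```python
from typing import List


def split_into_intervals(text: str, d: int, overlap: int) -> List[str]:
    """Split `text` into intervals of length `d` (hypothetical helper).

    Assumption: `overlap` is the number of characters that two
    consecutive intervals share, so the window moves by d - overlap.
    """
    if d <= 0:
        raise ValueError("d must be positive")
    if not 0 <= overlap < d:
        raise ValueError("overlap must satisfy 0 <= overlap < d")

    stride = d - overlap
    intervals: List[str] = []
    start = 0
    while start < len(text):
        # The first pass yields s_1; later passes yield the remaining intervals.
        intervals.append(text[start:start + d])
        if start + d >= len(text):
            break  # this window already reaches the end of the text
        start += stride
    return intervals


print(split_into_intervals("abcdefghij", d=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

The last interval may be shorter than d when the text length does not line up with the stride; whether the method pads or drops such an interval is not stated in the visible text.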


Abstract

The invention relates to a long text cascade classification method comprising the following steps: S1, preprocessing the input long text data with a sliding window mechanism and segmenting it into a plurality of intervals; S2, performing semantic encoding on the text in each interval to obtain a local semantic vector for each interval; S3, carrying out keyword extraction on the interval text, encoding it into keyword vectors, and splicing (concatenating) the keyword vectors with the local semantic vectors to obtain an overall semantic vector of the long text; S4, reducing the dimension of the overall semantic vector and using a classifier to compute the category label probability distribution from the reduced vector; and S5, training a classification model on the long text corpus according to steps S1-S4. According to the technical scheme provided by the invention, the performance of the underlying classification model can be improved, which promotes the development of other intelligent services in the vertical field and thereby improves the user experience and stickiness of intelligent products.
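
Steps S2 to S4 can be illustrated with a toy, pure-Python sketch. Everything here is an assumption made for illustration: toy_encode is a hashed bag-of-tokens stand-in for a real semantic encoder, top_keywords is a frequency-based stand-in for keyword extraction, whitespace tokenization and mean pooling over intervals are arbitrary choices, and the projection and weight matrices are hand-written rather than trained (S5 is not shown). None of these names or choices come from the patent.

```python
import math
from collections import Counter
from typing import List

DIM = 8  # toy vector size (assumption)


def toy_encode(tokens: List[str], dim: int = DIM) -> List[float]:
    """Stand-in encoder: hashed bag-of-tokens, L2-normalised."""
    vec = [0.0] * dim
    for tok in tokens:
        vec[sum(ord(c) for c in tok) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def top_keywords(tokens: List[str], k: int = 2) -> List[str]:
    """Stand-in keyword extractor: the k most frequent tokens."""
    return [word for word, _ in Counter(tokens).most_common(k)]


def overall_vector(intervals: List[str]) -> List[float]:
    """S2 + S3: splice [local ; keyword] per interval, then mean-pool."""
    if not intervals:
        raise ValueError("at least one interval is required")
    spliced = []
    for text in intervals:
        tokens = text.split()
        local = toy_encode(tokens)                    # S2: local semantic vector
        keyword = toy_encode(top_keywords(tokens))    # S3: keyword vector
        spliced.append(local + keyword)               # S3: vector splicing
    n = len(spliced)
    return [sum(column) / n for column in zip(*spliced)]


def softmax(scores: List[float]) -> List[float]:
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(vec, projection, weights) -> List[float]:
    """S4: linear dimension reduction, then softmax over labels."""
    reduced = [sum(p * v for p, v in zip(row, vec)) for row in projection]
    scores = [sum(w * r for w, r in zip(row, reduced)) for row in weights]
    return softmax(scores)


intervals = ["loan contract dispute loan", "dispute over repayment of loan"]
vec = overall_vector(intervals)  # length 2 * DIM
# Untrained, hand-written matrices: 16 -> 4 dimensions, then 3 labels.
projection = [[1.0 if j % 4 == i else 0.0 for j in range(2 * DIM)] for i in range(4)]
weights = [[0.5, -0.5, 0.0, 0.0], [0.0, 0.0, 0.5, -0.5], [0.1, 0.1, 0.1, 0.1]]
probs = classify(vec, projection, weights)
print(len(vec), len(probs))
```

In a trained system the projection and classifier weights would be learned from the long text corpus as in S5; here they only demonstrate the shapes that flow through the pipeline.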

Description

Technical field

[0001] The invention belongs to the technical field of natural language processing, and in particular relates to a long text cascade classification method, system, device, and storage medium.

Background

[0002] With the continuing spread of artificial intelligence technology, industries in vertical fields are developing rapidly, and new AI products are constantly emerging.

[0003] For example, products such as case retrieval, regulation retrieval, and intelligent consulting robots in the legal vertical field require the support of intelligent semantic analysis technology. Intelligent semantic analysis technology in turn comprises a variety of algorithm models, such as element extraction models, relation extraction models, and complex event extraction models, and these models rely on a classification model as their underlying support. The performance of the bottom layer is closely related to the a...


Application Information

IPC(8): G06F16/35, G06F40/30
CPC: G06F16/35, G06F40/30
Inventor: 刘广峰, 张卓仁
Owner: 杭州识度科技有限公司