
Unsupervised classification method for long texts

An unsupervised classification method for long texts, applied in the field of network information. It addresses the length of long texts, the scarcity of work on long-text classification, and the high time complexity of classifying such texts.

Pending Publication Date: 2021-09-10
深圳市查策网络信息技术有限公司

AI Technical Summary

Problems solved by technology

Because long texts are lengthy and their structure and wording conventions differ from those of ordinary text, classifying them has high time complexity and low accuracy.
At present, there are relatively few studies on long-text classification.



Detailed Description of the Embodiments

[0027] The present invention will be further described in detail below in conjunction with the embodiments and the accompanying drawings, but the embodiments of the present invention are not limited thereto.

[0028] An unsupervised classification method for long texts, comprising the following steps:

[0029] (1) Because of their length, long texts are mainly composed of three parts: the title, the body, and the issuing department. Before a long text is classified, it must first be filtered, and three parts are extracted from it: the title text $t_1$, the body text $t_2$, and the issuing-department text $t_3$.
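Step (1) can be sketched as a small splitter. The source does not say how the three parts are located, so the rule below is a hypothetical heuristic: the first non-empty line is taken as the title $t_1$, the last non-empty line as the issuing department $t_3$, and everything in between as the body $t_2$. The names `LongTextParts` and `split_long_text` are illustrative, not taken from the patent.

```python
from typing import NamedTuple


class LongTextParts(NamedTuple):
    """The three parts named in step (1): title t1, body t2, issuing department t3."""
    title: str
    body: str
    department: str


def split_long_text(raw: str) -> LongTextParts:
    """Hypothetical heuristic splitter (not the patent's filtering rule).

    Assumes the first non-empty line is the title, the last non-empty line is
    the issuing department, and every line in between belongs to the body.
    """
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    if len(lines) < 3:
        raise ValueError("expected a title line, at least one body line and a department line")
    return LongTextParts(
        title=lines[0],
        body="\n".join(lines[1:-1]),
        department=lines[-1],
    )
```

Documents whose issuing department is not on the last line, or that carry dates and document numbers around the signature, would need a different rule than this one.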

[0030] (2) Next, the weight coefficients $c_1$, $c_2$, $c_3$ of the title text $t_1$, the body text $t_2$, and the issuing-department text $t_3$ are extracted. Each weight coefficient indicates the relative importance of the corresponding part within the text being classified.
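The source is cut off before it explains how $c_1$, $c_2$, $c_3$ are obtained or how they drive the fusion into the new long text $T$, so the sketch below shows only one hypothetical way to apply the coefficients. It assumes they are non-negative integers and repeats each part that many times, so that after segmentation a heavier part contributes proportionally more tokens to any averaged feature vector. The function name `fuse_by_repetition` and the integer-weight assumption are illustrative, not from the patent.

```python
from typing import Sequence


def fuse_by_repetition(parts: Sequence[str], weights: Sequence[int], sep: str = "\n") -> str:
    """Hypothetical fusion of (t1, t2, t3) into one text T.

    Each part is repeated `weight` times, so a part with weight 3 contributes
    three times as many tokens to T as it would with weight 1.
    """
    if len(parts) != len(weights):
        raise ValueError("need exactly one weight per part")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative integers")
    pieces = []
    for part, weight in zip(parts, weights):
        pieces.extend([part] * weight)  # repeat the part in proportion to its weight
    return sep.join(pieces)
```

For example, weights (2, 1, 1) turn the parts (t1, t2, t3) into t1, t1, t2, t3 joined by newlines. Repetition is a blunt instrument; weighting part-level vectors directly would give the same emphasis without inflating the length of T.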

[0031] The preferred ...


Abstract

The invention relates to an unsupervised classification method for long texts, comprising the following steps: filtering a long text to be classified and extracting its three parts, namely the title text, the body text, and the issuing-department text; extracting weight coefficients for the title text, the body text, and the issuing-department text; fusing the three parts into a new long text T according to the extracted weight coefficients; performing Chinese word segmentation on T and extracting the segmentation information; inputting the segmentation information into a word vector model to obtain word vector information; calculating a feature vector of T from the word vector information; and clustering the feature vectors of the long texts T to obtain the text classification. The invention improves long-text classification by reducing its time complexity and increasing its accuracy, making it more convenient for users to read and classify long texts.
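The remaining steps of the abstract (segmentation, word vectors, a feature vector for T, clustering) can be sketched end to end under several stated assumptions, because the source names none of the tools. Forward maximum matching against a small vocabulary stands in for Chinese word segmentation. A hand-made dictionary of word vectors stands in for the trained word vector model. The feature vector of T is assumed to be the mean of its word vectors. Plain k-means is swapped in as the clustering algorithm. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Vector = Tuple[float, ...]


def fmm_segment(text: str, vocab: Set[str], max_len: int) -> List[str]:
    """Forward maximum matching: at each position take the longest vocabulary
    word (up to max_len characters), falling back to a single character."""
    max_len = max(1, max_len)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        if text[i].isspace():  # skip whitespace between words
            i += 1
            continue
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def text_vector(tokens: Sequence[str], vectors: Dict[str, Vector], dim: int) -> Vector:
    """Assumed feature vector of T: the mean of the vectors of known tokens."""
    found = [vectors[t] for t in tokens if t in vectors]
    if not found:
        return tuple(0.0 for _ in range(dim))
    return tuple(sum(v[d] for v in found) / len(found) for d in range(dim))


def kmeans(points: Sequence[Vector], k: int, max_iters: int = 20) -> List[int]:
    """Plain k-means (Lloyd's algorithm), seeded with the first k points."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max(1, max_iters)):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:  # no point changed cluster, so stop
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(coords) / len(members) for coords in zip(*members)]
    return labels


def classify_long_texts(texts: Sequence[str], vectors: Dict[str, Vector], k: int) -> List[int]:
    """Segment each fused text T, compute its feature vector, and cluster."""
    if not vectors:
        raise ValueError("need at least one word vector")
    vocab = set(vectors)
    dim = len(next(iter(vectors.values())))
    max_len = max(len(w) for w in vocab)
    features = [text_vector(fmm_segment(t, vocab, max_len), vectors, dim) for t in texts]
    return kmeans(features, k)
```

With toy vectors placing 税收 and 减免 near (1, 0) and 人才 and 引进 near (0, 1), `classify_long_texts(["税收减免", "人才引进", "减免税收", "引进人才"], vectors, 2)` returns `[0, 1, 0, 1]`, grouping the two tax texts together and the two talent texts together. A production system would replace each stand-in with a real segmenter, a trained embedding model, and a clustering method chosen for the corpus.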

Description

Technical Field

[0001] The invention relates to the technical field of network information, and in particular to an unsupervised classification method for long texts.

Background

[0002] To support the development of enterprises, the state and the government have issued a wide range of policy texts that enterprises need. Guided by these texts, enterprises can understand the government's guidance more directly and accurately, and can also better understand the market, so as to produce products that better meet market demand. Government preferential-support texts cover departmental investment promotion, tax reduction and exemption, financing support, improvement of the business environment, talent recruitment, and more, and they directly or indirectly promote the healthy development of enterprises. The various texts of the state and the government have become an important basis for the formulation of enterprise d...


Application Information

IPC (8): G06K9/62, G06F40/284, G06F40/30, G06N3/08
CPC: G06F40/284, G06F40/30, G06N3/088, G06F18/23, G06F18/22, G06F18/24
Inventors: 林正春 (Lin Zhengchun), 兰林 (Lan Lin), 陈功文 (Chen Gongwen)
Owner: 深圳市查策网络信息技术有限公司