Unsupervised news automatic classification method

An unsupervised automatic classification technology, applied in the field of information classification, that addresses problems such as low efficiency and wasted manpower, with the effect of reducing the manpower burden, speeding up classification and improving professionalism.

Active Publication Date: 2020-09-04
王旭

AI Technical Summary

Problems solved by technology

[0004] To overcome the manpower cost and inefficiency caused by manual word segmentation in the above-mentioned prior art, the present invention provides an unsupervised automatic news classification method in which the training and test data require no manual word segmentation or classification, improving classification efficiency.


Examples

Embodiment

[0063] As shown in Figure 1, an embodiment of an unsupervised automatic news classification method comprises the following steps:

[0064] Step 1: Use the simhash method to check the obtained news for duplicates; simhash is an algorithm for comparing the similarity of articles whose main idea is dimensionality reduction, and its output is the simhash value.
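
The simhash comparison in Step 1 can be illustrated with a minimal sketch. The excerpt does not specify the token hash, the fingerprint width, the tokenization or the duplicate threshold, so the MD5-based 64-bit hash, the whitespace tokens and `max_distance=3` below are assumptions chosen only for illustration, and the function names are hypothetical.

```python
import hashlib
from collections import Counter


def simhash(tokens, bits=64):
    """Reduce a token list to a single `bits`-wide fingerprint."""
    weights = Counter(tokens)  # assumption: weight each token by its count
    acc = [0] * bits
    for tok, w in weights.items():
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            # A set bit votes +w for that position, a clear bit votes -w.
            acc[i] += w if (h >> i) & 1 else -w
    fingerprint = 0
    for i in range(bits):
        if acc[i] > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(tokens_a, tokens_b, max_distance=3):
    """Treat two articles as duplicates when their fingerprints are close."""
    return hamming_distance(simhash(tokens_a), simhash(tokens_b)) <= max_distance


if __name__ == "__main__":
    a = "stocks rise as markets rally".split()
    b = "stocks rise as markets rally today".split()
    print(hamming_distance(simhash(a), simhash(b)))
```

Because each article collapses to one integer, comparing two articles costs a single XOR and bit count, which is what makes the approach practical for checking a large news library.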

[0065] Step 2: Generate the news vocabulary vector table (wordvec) with word2vec; word2vec is a neural network used to map vocabulary words to word vectors (wordvec).

[0066] Step 3: Calculate the term frequency-inverse document frequency (TF-IDF) value of the vocabulary in the news, and compute the weighted average of the top k keywords according to the vocabulary vector table to obtain the document vector table (docvec) of the news; TF-IDF is a commonly used weighting technique for information retrieval and data mining, and is used to evaluate the im...
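
As a rough illustration of Step 3, the sketch below computes TF-IDF weights and then averages the word vectors of the k highest-weighted words. The excerpt does not give the exact TF-IDF variant or how the weighted average is normalized, so the plain `tf * log(N / df)` formula and the division by the sum of the selected weights are assumptions; `wordvec` here is a hand-written toy table standing in for the word2vec output of Step 2, and the function names are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: tf-idf weight} dict per tokenized document."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each word once per doc
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append(
            {w: (c / len(doc)) * math.log(n_docs / df[w]) for w, c in tf.items()}
        )
    return scores


def doc_vector(weights, wordvec, k):
    """Weighted average of the vectors of the k highest-weighted known words."""
    top = sorted(
        (w for w in weights if w in wordvec),
        key=lambda w: weights[w],
        reverse=True,
    )[:k]
    dim = len(next(iter(wordvec.values())))
    total = sum(weights[w] for w in top)
    if total == 0:
        return [0.0] * dim  # no informative keyword: fall back to a zero vector
    return [sum(weights[w] * wordvec[w][i] for w in top) / total for i in range(dim)]


if __name__ == "__main__":
    docs = [["bank", "rate", "rate"], ["match", "goal"]]
    wordvec = {  # toy 2-dimensional vectors, for illustration only
        "bank": [1.0, 0.0],
        "rate": [0.0, 1.0],
        "match": [1.0, 1.0],
        "goal": [2.0, 2.0],
    }
    weights = tf_idf(docs)
    print(doc_vector(weights[0], wordvec, k=2))
```

Restricting the average to the top k keywords keeps common, low-information words from diluting the document vector.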

Abstract

The invention relates to the field of information classification, and in particular to an unsupervised automatic news classification method comprising the following steps: 1, deduplicating the acquired news with the simhash method; 2, generating a vocabulary vector table from the news with word2vec; 3, calculating the term frequency-inverse document frequency (TF-IDF) value of the vocabulary in the news, and computing the weighted average of the top k keywords according to the vocabulary vector table to obtain the document vector table of the news; 4, calculating a classification model for each type of news through logistic regression; and 5, calculating the document vector table for the unclassified news library, and calculating the probability that each text in the news library belongs to a given category through the classification models of step 4. The training process adopts unsupervised learning and requires no manual labeling, which increases speed and reduces the manpower burden; once the model is trained, text classification takes little computation time and can meet large-volume text classification demands.
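
Steps 4 and 5 of the abstract (one logistic regression model per news category, then a per-category probability for each unclassified document vector) might look like the following minimal sketch. It is a plain batch gradient descent written for illustration, not the patent's implementation: the learning rate, epoch count, toy 2-dimensional document vectors and category names are all assumptions, and the excerpt does not say how the training documents for each category are obtained, so the labels below are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit binary logistic regression with batch gradient descent."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, target in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - target
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def train_one_vs_rest(X, labels):
    """Step 4: one binary model per category (category vs. everything else)."""
    return {
        c: train_logreg(X, [1.0 if label == c else 0.0 for label in labels])
        for c in sorted(set(labels))
    }


def classify(models, x):
    """Step 5: probability that document vector x belongs to each category."""
    return {
        c: sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        for c, (w, b) in models.items()
    }


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]  # toy document vectors
    labels = ["finance", "finance", "sports", "sports"]    # hypothetical categories
    models = train_one_vs_rest(X, labels)
    print(classify(models, [0.95, 0.05]))
```

Once the per-category weights are fitted, scoring a new document is one dot product and one sigmoid per category, which is consistent with the abstract's point that classification after training is fast.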

Description

Technical field

[0001] The invention relates to the field of information classification, and more specifically to an unsupervised automatic news classification method.

Background art

[0002] News is information disseminated through newspapers, radio, television and other media channels, and is a genre of writing for conveying information. However, in the face of explosive and disorganized news data, it is increasingly difficult for users to obtain useful information and to quickly find news in the categories they care about. Therefore, there is an urgent need to classify massive amounts of news.

[0003] The patent publication "A Chinese Financial News Text Classification Method Based on Convolutional Neural Network", application number CN108399230A, discloses a news text classification method that mainly includes word vector training, text preprocessing, neural network model training,...

Application Information

IPC(8): G06F16/35, G06F40/279, G06F40/289, G06N3/04
CPC: G06F16/35, G06F40/279, G06F40/289, G06N3/04, Y02D10/00
Inventor: 王旭, 汪金生, 宋日辉, 张雷, 张旭东, 张恒非, 钟丹彬, 季育轩, 岳毅然, 谭震超
Owner: 王旭