Hadoop platform-based improved parallel KNN online public opinion classification algorithm

A classification algorithm and network public opinion technology, applied in computing, structured data retrieval, instruments, etc., that addresses the large volume of network public opinion data and the difficulty of classifying it, and achieves efficient and accurate classification.

Inactive Publication Date: 2018-04-20
贵州商学院
2 Cites, 4 Cited by

AI Technical Summary

Problems solved by technology

Because Internet public opinion data is large in volume, unstructured, and decentralized, traditional text classification algorithms struggle to classify it quickly and efficiently.



Embodiment Construction

[0027] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0028] Referring to Figure 1, an improved parallel KNN network public opinion classification algorithm based on the Hadoop platform includes the following steps:

[0029] Step 1: Upload the test set and training set data to the HDFS cluster;

[0030] Step 2: The HDFS cluster outputs results as key-value pairs through the Map function. The key in the Map function is the row number of the test data set, that is, the offset, and the value represents th...
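The steps visible above can be illustrated with a small in-memory sketch of a map/reduce-style KNN. It is not the patented improvement, whose details are truncated here, but a plain KNN under stated assumptions: the map phase is keyed by the offset of each test record, as in Step 2; the emitted value is assumed to be a (distance, label) pair for each training sample; Euclidean distance and majority voting are assumed; and the function names `knn_map`, `knn_reduce`, and `run_job` are hypothetical and do not come from Hadoop or the source.

```python
from collections import Counter, defaultdict
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_map(offset, test_vector, training_set):
    """Map phase (assumed form): key is the test record's offset,
    value is a (distance, label) pair for each training sample."""
    for train_vector, label in training_set:
        yield offset, (euclidean(test_vector, train_vector), label)


def knn_reduce(offset, candidates, k):
    """Reduce phase (assumed form): keep the k nearest candidates
    for one test record and return the majority label."""
    nearest = sorted(candidates)[:k]
    votes = Counter(label for _, label in nearest)
    return offset, votes.most_common(1)[0][0]


def run_job(test_set, training_set, k=3):
    """Simulate the shuffle step locally by grouping map output by key."""
    grouped = defaultdict(list)
    for offset, test_vector in enumerate(test_set):
        for key, value in knn_map(offset, test_vector, training_set):
            grouped[key].append(value)
    return dict(knn_reduce(key, values, k) for key, values in grouped.items())


# Toy data: 2-D feature vectors with hypothetical topic labels.
training = [
    ((0.0, 0.0), "sports"),
    ((0.1, 0.2), "sports"),
    ((5.0, 5.0), "finance"),
    ((5.1, 4.9), "finance"),
    ((4.8, 5.2), "finance"),
]
tests = [(0.2, 0.1), (5.0, 5.1)]
predictions = run_job(tests, training, k=3)
```

On a real cluster the grouping in `run_job` would be handled by the framework's shuffle, and the training set would be read from HDFS rather than passed in memory.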


Abstract

The invention discloses an improved parallel KNN online public opinion classification algorithm based on the Hadoop platform. The algorithm uses Hadoop's distributed storage characteristics and a MapReduce program designed for parallel kNN to address the problems that arise when processing bulk data, and the classification ability and classification efficiency of the parallel kNN algorithm are verified by testing. Experimental results show that the algorithm classifies online public opinion data rapidly, efficiently, and correctly when processing bulk online public opinion data.

Description

Technical field

[0001] The invention relates to the technical field of network big data computing, in particular to an improved parallel KNN network public opinion classification algorithm based on the Hadoop platform.

Background technique

[0002] With the rapid development of mobile Internet, mobile terminals, and social platforms, online media such as Weibo and blogs have gradually become an important medium for people to obtain information, and also an important channel for people to release information. As a result, the daily data volume on the Internet is growing geometrically. Internet public opinion has become an important factor affecting social development and stability. It is therefore of practical significance to monitor massive Internet public opinion, process sensitive information on the Internet in a timely manner, and classify, analyze, provide early warning on, and guide information on different topics. Due to the large amount of data, unstructured, and decentralized nature o...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06Q50/00
CPC: G06Q50/01, G06F16/285, G06F16/355
Inventor: 杜少波, 何文华, 杨露, 李静, 陈显祥
Owner: 贵州商学院