Spark platform based high efficiency text classification method

A high-efficiency text classification technology applied in the field of big data processing. It addresses the inability to use ordinary PCs, low resource utilization, and increased network transmission, and it improves cluster resource utilization, promotes the improvement of Spark-based algorithms, and improves classification accuracy.

Status: Inactive
Publication Date: 2016-07-06

HUNAN UNIV

4 Cites, 48 Cited by


AI Technical Summary

Problems solved by technology

[0027] At present, most machine learning algorithms are still serial. Serial algorithms are adequate when the amount of data is small, but with the advent of cloud computing and the era of big data, data is growing exponentially and traditional serial algorithms clearly cannot meet processing requirements. Earlier grid computing and parallel computing approaches have low resource utilization, which leads to high cost, and they require dedicated servers, so ordinary PCs cannot be used. Hadoop can handle part of the big data processing workload, but it implements its functionality through a map function and a reduce function, and the two communicate through the HDFS file system.

As a result, Hadoop reads from and writes to the HDFS file system more often, which increases network transmission.

Embodiment Construction

[0049] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0050] Referring to Figure 1, the high-efficiency text classification method based on the Spark platform in this embodiment comprises the following steps:

[0051] Step 101: Build the HDFS file system and the Spark platform on virtual machines running on a physical server, and upload the data set to the HDFS file system.

[0052] Step 102: Submit jobs to the Spark cluster through the client. Spark reads data from the HDFS file system, converts the input data into a resilient distributed dataset (RDD) and starts a certain number of partitions according to the number of partitions in the RDD set by th...
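
The data flow described in Step 102 can be pictured with a plain-Python stand-in. This is only a sketch under stated assumptions: it does not use the Spark API, the helper names `split_into_partitions`, `count_words` and `run_tasks` are hypothetical, and round-robin assignment is chosen for simplicity rather than taken from the patent. It shows records being divided into a user-chosen number of partitions, one task processing each partition, and the partial results being merged afterwards.

```python
from collections import Counter
from typing import Callable, List


def split_into_partitions(records: List[str], num_partitions: int) -> List[List[str]]:
    """Distribute records round-robin over num_partitions lists (hypothetical helper)."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    partitions: List[List[str]] = [[] for _ in range(num_partitions)]
    for index, record in enumerate(records):
        partitions[index % num_partitions].append(record)
    return partitions


def count_words(partition: List[str]) -> Counter:
    """One 'task': count whitespace-separated words in a single partition."""
    counts: Counter = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts


def run_tasks(partitions: List[List[str]], task: Callable[[List[str]], Counter]) -> Counter:
    """Run one task per partition and merge the partial counts."""
    total: Counter = Counter()
    for partition in partitions:
        total.update(task(partition))  # Counter.update adds counts together
    return total


# Illustrative input only; in the patent the data comes from HDFS.
lines = ["spark reads data", "data becomes an rdd", "spark runs tasks"]
partitions = split_into_partitions(lines, num_partitions=2)
word_counts = run_tasks(partitions, count_words)
```

Because each task touches only its own partition, the tasks are independent of one another, which is the property that lets a cluster run them on different nodes at the same time.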

Abstract

The present invention provides a Spark-based high-efficiency text classification method. The method comprises: constructing an HDFS file system and a Spark platform on virtual machines running on a physical server, and uploading a data set to the HDFS file system; having the Spark platform read data from the HDFS file system, convert the data into an RDD, and store the RDD in memory; dividing all tasks into different stages and then running each task; preprocessing the RDD; performing training; and testing the classification model. The method makes up for the defects of the naive Bayes model and further improves processing speed. It also promotes data mining and machine learning, in particular the conversion of conventional data mining algorithms into parallel ones; improves the classification precision of the improved Bayes algorithm; promotes the improvement of Spark platform based algorithms; and improves cluster resource utilization.
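
The preprocess, train, and test sequence named in the abstract can be illustrated with a textbook multinomial naive Bayes classifier in plain Python. This is a minimal sketch, not the patent's improved algorithm: the add-one (Laplace) smoothing, the whitespace tokenizer, the function names, and the tiny data set are all assumptions made for illustration, and nothing here runs on Spark.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Preprocessing stand-in: lowercase and split on whitespace."""
    return text.lower().split()


def train(samples: List[Tuple[str, str]]):
    """Collect per-class document counts and word counts from (text, label) pairs."""
    class_docs: Counter = Counter()
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    vocabulary = set()
    for text, label in samples:
        class_docs[label] += 1
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_docs, word_counts, vocabulary


def predict(model, text: str) -> str:
    """Return the class with the highest log posterior, using add-one smoothing."""
    class_docs, word_counts, vocabulary = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        score = math.log(class_docs[label] / total_docs)  # log prior
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for token in tokenize(text):
            if token in vocabulary:  # ignore words never seen in training
                score += math.log((word_counts[label][token] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, samples: List[Tuple[str, str]]) -> float:
    """Testing step: fraction of samples whose predicted label matches."""
    correct = sum(1 for text, label in samples if predict(model, text) == label)
    return correct / len(samples)


# Made-up example data, for illustration only.
training_set = [
    ("the team won the match", "sports"),
    ("the player scored a goal", "sports"),
    ("the market fell sharply", "finance"),
    ("stocks and bonds rallied", "finance"),
]
test_set = [("the player won", "sports"), ("stocks fell", "finance")]

model = train(training_set)
test_accuracy = accuracy(model, test_set)
```

The training step only accumulates counts, so in a parallel setting each partition could produce its own counts and the results could be summed; that is one reason naive Bayes is a natural fit for this kind of platform.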

Description

Technical Field

[0001] The invention relates to the technical field of big data processing, in particular to a high-efficiency text classification method based on the Spark platform.

Background

[0002] With the rapid development of information technology and the increasingly widespread use of the Internet, the Internet has become the most important source of information. Especially with the advent of the era of cloud computing and big data, the data on the Internet is growing exponentially. This data is large in volume, high in dimensionality, complex and irregular in structure, and noisy, yet it holds considerable commercial value. Faced with such huge and complex information, quickly organizing, managing, and using it, and mining valuable information from it, is a major challenge.

[0003] Most of the data today is stored on the Internet in the form of text. Text classification technology is an important basis fo...

Application Information

IPC(8): G06F17/30

CPC: G06F16/35

Inventors: 唐卓, 鲁彬, 李肯立, 李巧巧, 陈建国, 熊燎特

Owner: HUNAN UNIV
