Telecom operator mass data processing method based on Hadoop platform

A massive data processing technology for telecom operators, applied in electrical digital data processing, special data processing applications, instruments, etc. It addresses the ineffective storage and processing of business data, and its effects include targeted marketing, improved efficiency, and improved data utilization.

Inactive Publication Date: 2013-12-04
NANJING UNIV OF POSTS & TELECOMM
Cites: 2; Cited by: 39

AI Technical Summary

Problems solved by technology

Traditional processing methods based on relational databases can no longer effectively store and process the growing volume and new types of business data. The development of Hadoop distributed technology provides the technical means to solve these problems.

Method used

A Sqoop tool extracts data from the original data system into the Hadoop platform; tables are created in Hive and a Hive script is written according to the data model and business requirements of the operator's data warehouse; the script is executed to convert the source data and load it into a Hive target table; the target table is then analyzed with Hive query language statements or a MapReduce program.

Examples

Embodiment

[0040] The data used in this embodiment is the voice settlement list data of a provincial telecom operator. The experimental source data is one day of province-wide voice settlement list data from the ODS warehouse, with a file size of 24 GB. The Hadoop cluster consists of 4 servers: 1 NameNode and 3 DataNodes. The hardware configuration is: 8 * Quad-Core AMD Opteron(tm) Processor 2376, 4 GB memory, and 126 GB * 2 hard disks. All servers run Red Hat Enterprise Linux Server release 6.3 (Santiago). The installed software versions are Hadoop 1.0.3, Hive 0.9.0, and Sqoop 1.4.3.
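To give a rough sense of scale for this setup, the following sketch estimates how many HDFS blocks the 24 GB source file would occupy and how much raw disk space it would consume. The block size (64 MB) and replication factor (3) are the Hadoop 1.x defaults and are assumptions here; the embodiment does not state the values actually configured. The capacity figure also assumes the listed disks apply to each DataNode and ignores space used by the operating system.

```python
import math

# Figures stated in the embodiment.
FILE_SIZE_GB = 24
DATANODES = 3
DISKS_PER_NODE = 2
DISK_SIZE_GB = 126

# Assumptions (not stated in the source): Hadoop 1.x default settings.
BLOCK_SIZE_MB = 64   # default dfs.block.size in Hadoop 1.x
REPLICATION = 3      # default dfs.replication


def hdfs_footprint(file_size_gb, block_size_mb, replication):
    """Return (block_count, raw_storage_gb) for one file stored in HDFS."""
    blocks = math.ceil(file_size_gb * 1024 / block_size_mb)
    return blocks, file_size_gb * replication


blocks, raw_gb = hdfs_footprint(FILE_SIZE_GB, BLOCK_SIZE_MB, REPLICATION)
# Nominal DataNode capacity, assuming the disk configuration is per server.
capacity_gb = DATANODES * DISKS_PER_NODE * DISK_SIZE_GB

print(f"HDFS blocks: {blocks}")
print(f"Raw storage with replication: {raw_gb} GB of {capacity_gb} GB nominal")
```

Under these assumptions, the block count is also an approximate upper bound on the number of map tasks a full scan of the file would launch, since each block typically becomes one input split.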

[0041] Table 1

[0042]
Step                           Relational database approach   Hadoop platform approach
Time to extract data (hours)   3                              2
ETL time (hours)               5                              1.5
Query analysis time (hours)    1                              0.5
Total time (hours)             9                              4

[0043] It can be seen from Table 1 that the processing efficiency of each step-based method ba...
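The comparison in Table 1 can be restated as speedup ratios. The short sketch below uses only the hours reported in the table; the variable and function names are illustrative and do not come from the patent.

```python
# Times in hours, copied from Table 1: (relational database, Hadoop platform).
TABLE_1 = {
    "Extract data": (3.0, 2.0),
    "ETL": (5.0, 1.5),
    "Query analysis": (1.0, 0.5),
}


def speedups(table):
    """Return per-step speedup ratios plus the two total times in hours."""
    ratios = {step: rdb / hadoop for step, (rdb, hadoop) in table.items()}
    rdb_total = sum(rdb for rdb, _ in table.values())
    hadoop_total = sum(hadoop for _, hadoop in table.values())
    ratios["Total"] = rdb_total / hadoop_total
    return ratios, rdb_total, hadoop_total


ratios, rdb_total, hadoop_total = speedups(TABLE_1)
for step, ratio in ratios.items():
    print(f"{step}: {ratio:.2f}x faster on the Hadoop platform")
```

The per-step totals reproduce the table's bottom row (9 hours versus 4 hours), and the largest relative gain is in the ETL step.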

Abstract

The invention discloses a method for processing the massive data of telecom operators based on a Hadoop platform. The method comprises the following steps: first, a Sqoop tool is used to extract data from the original data system onto the Hadoop local server; next, tables are created in Hive and a Hive script is written according to the data model and business requirements of the operator's data warehouse; then the Hive script is executed to convert the source data and load it into a Hive target table; finally, Hive query language statements or a MapReduce program are written as required to query and analyze the data in the target table. While meeting business requirements, the method makes full use of the tools available on the Hadoop platform to process the massive data of telecom operators, and it greatly improves working efficiency.
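The four steps in the abstract might be wired together roughly as follows. This is a minimal sketch, not the patent's actual scripts: the JDBC URL, user name, table names, column names, and HDFS path are all hypothetical placeholders, and the sketch only builds and prints the command lines rather than running them. It relies on the standard `sqoop import` options and the Hive CLI's `-f` (run a script file) and `-e` (run an inline query) options.

```python
import shlex

# Hypothetical placeholders; none of these names come from the patent.
JDBC_URL = "jdbc:mysql://ods-host/ods"
SOURCE_TABLE = "voice_settlement"
HDFS_DIR = "/data/ods/voice_settlement"

# Steps 2 and 3: table definitions plus the convert-and-load statement,
# intended to be saved together as one Hive script (etl.hql).
ETL_SCRIPT = f"""
CREATE EXTERNAL TABLE IF NOT EXISTS src_voice_settlement (
  caller STRING, callee STRING, duration_sec INT, fee DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '{HDFS_DIR}';

CREATE TABLE IF NOT EXISTS tgt_voice_daily (
  caller STRING, total_duration_sec BIGINT, total_fee DOUBLE);

INSERT OVERWRITE TABLE tgt_voice_daily
SELECT caller, SUM(duration_sec), SUM(fee)
FROM src_voice_settlement
GROUP BY caller;
"""

# Step 4: an example analysis query against the target table.
QUERY = ("SELECT caller, total_fee FROM tgt_voice_daily "
         "ORDER BY total_fee DESC LIMIT 10")


def build_pipeline(mappers=3):
    """Return (stage name, argv list) pairs in execution order."""
    extract = [
        "sqoop", "import",
        "--connect", JDBC_URL,
        "--username", "etl_user",  # hypothetical account
        "-P",                      # prompt for the password at run time
        "--table", SOURCE_TABLE,
        "--target-dir", HDFS_DIR,
        "--fields-terminated-by", ",",
        "-m", str(mappers),        # number of parallel map tasks
    ]
    transform = ["hive", "-f", "etl.hql"]
    query = ["hive", "-e", QUERY]
    return [("extract", extract),
            ("transform and load", transform),
            ("query", query)]


for stage, argv in build_pipeline():
    print(f"{stage}: {shlex.join(argv)}")
```

Keeping the table definitions and the load statement in one script mirrors the abstract's ordering, where the script is written first and executed as a separate step. In this sketch Sqoop writes to an HDFS directory that the external source table points at, which is one common arrangement and an assumption about how the extracted data reaches Hive.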

Description

[0001] Technical field

[0002] The invention proposes a method for processing the massive data of telecom operators based on a Hadoop platform, and belongs to the fields of computer communication and big data processing.

[0003] Background

[0004] The rapid development of the mobile Internet has led to a rapid increase in the data generated and used by users. The emergence of massive data and changes in data structures pose great challenges for telecom operators in managing, analyzing, and processing data. Traditional processing methods based on relational databases can no longer effectively store and process the growing volume and new types of business data. The development of Hadoop distributed technology provides the technical means to solve these problems.

[0005] Hadoop is an open source project managed by the Apache Software Foundation. It is a software implementation based on Google's cloud computing concepts: Bigtable, MapReduce, and GFS. Hadoop enables use...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F9/44
Inventors: SHEN Jianhua (沈建华), WANG Xiang (王翔)
Owner: NANJING UNIV OF POSTS & TELECOMM