Parallel clustering method for processing large geographical grid data

A raster data clustering technology, applied in geographic information databases, electronic digital data processing, structured data retrieval, etc. It addresses systems that cannot run, or run very inefficiently, on very large grids, and achieves a small communication cost.

Inactive Publication Date: 2015-11-11
CHANGCHUN INST OF TECH

AI Technical Summary

Problems solved by technology

[0008] The purpose of the present invention is to solve two problems: a very large geographical grid exceeds the loading limit of a single machine, so the system cannot operate or operates very inefficiently; and the processes must exchange calculation results with one another during multi-computer parallel clustering. To this end, a parallel clustering method for large geographic raster data is proposed.


Examples

Specific Embodiment 1

[0048] Specific Embodiment 1: With reference to Figure 1, the parallel clustering method for processing large geographic grid data in this embodiment is carried out according to the following steps:

[0049] Step 1: On the computer cluster, the management node starts the management process. The management process calculates the number of computing nodes that will participate in the calculation according to the volume of the large geographic raster data, starts a computing process on each computing node, and numbers each computing process. Here, the computer cluster comprises 5 to 100 computers connected by a network; one computer in the cluster is chosen as the management node, and all other nodes in the cluster serve as computing nodes. The large geographic raster data is larger than 1000 MB;

[0050] Step 2: The management process reads the large geographic grid data row by row, loads...

Specific Embodiment 2

[0072] Specific Embodiment 2: This embodiment differs from Specific Embodiment 1 in the details of Step 1, in which the management node starts the management process on the computer cluster, and the management process calculates the number of computing nodes participating in the calculation according to the volume of the large geographic raster data, starts a computing process on each computing node, and numbers each computing process. The specific steps (the processing flow is shown in Figure 2) are as follows:

[0073] (1) The management node starts the management process;

[0074] (2) The management process reads the number of rows RowNum, the number of columns ColNum, and the total file size SumSize of the large geographic raster data to be clustered. The size RowSize of each row of the data is calculated as:

[0075] RowSize=SumSize / ColNum;

[0076] (3) The management process c...

Specific Embodiment 3

[0083] Specific Embodiment 3: This embodiment differs from Specific Embodiment 1 or 2 in the details of Step 2, in which the management process reads the large geographic raster data row by row and loads the entire data set into the N computing processes in a distributed manner, sending each row to the computing process whose number matches the corresponding ID. The specific steps are as follows (see Figure 3):

[0084] (1) Set the variable counter = 0 in the management process;

[0085] (2) counter=counter+1;

[0086] (3) The management process reads row counter of the large geographic raster data and sends it to the computing process whose number is the ID corresponding to that row. The computing process stores row counter of the large geographic raster data in the memory space of the computing ...
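The row-by-row distribution in steps (1) to (3) can be illustrated with a minimal Python sketch. It rests on assumptions that are not in the source: the rule that maps a row to a process ID is not given in the visible text, so the hypothetical `target_process` function assigns contiguous blocks of `rows_per_process` rows to consecutive processes, and plain in-memory lists stand in for the memory of each computing process instead of real network transfers.

```python
from typing import Dict, Iterable, List, Tuple

Row = List[float]


def target_process(counter: int, rows_per_process: int, n_processes: int) -> int:
    """Hypothetical mapping from a 1-based row number to a process ID (1..N).

    Assumption: contiguous blocks of rows go to consecutive processes, and
    the last process absorbs any remainder.
    """
    return min((counter - 1) // rows_per_process + 1, n_processes)


def distribute_rows(rows: Iterable[Row], rows_per_process: int,
                    n_processes: int) -> Dict[int, List[Tuple[int, Row]]]:
    """Read rows one at a time and hand each to its computing process.

    Lists stand in for the memory of each computing process; a real cluster
    would send the row over the network instead.
    """
    memory: Dict[int, List[Tuple[int, Row]]] = {
        pid: [] for pid in range(1, n_processes + 1)
    }
    counter = 0                                    # (1) counter = 0
    for row in rows:                               # only one row is held at a time
        counter += 1                               # (2) counter = counter + 1
        pid = target_process(counter, rows_per_process, n_processes)
        memory[pid].append((counter, row))         # (3) "send" and store the row
    return memory


if __name__ == "__main__":
    grid = [[float(r * 10 + c) for c in range(4)] for r in range(7)]
    stored = distribute_rows(iter(grid), rows_per_process=3, n_processes=3)
    for pid, held in stored.items():
        print(pid, [number for number, _ in held])
```

Because the management process iterates over the rows lazily, it never needs the whole grid in memory, which is the point of reading the data row by row.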

Abstract

The invention relates to a parallel clustering method, in particular to a parallel clustering method for processing large geographical grid data, and belongs to the field of parallel clustering of large geographical grid data. It solves two problems: geographical grids that are very large exceed the loading limit of a single machine, so that the system cannot operate or operates very inefficiently; and the computing processes must exchange their calculation results with one another when multiple computers perform parallel clustering. The method includes the following steps: step 1, the number of nodes is calculated, and each computing process is numbered; step 2, the data are transmitted to the computing processes, the numbers of the computing processes serving as IDs; step 3, M sets of initial clustering solutions are generated; step 4, a table of cluster center vectors is sent to the computing processes; step 5, the management process controls the iterative solution process; step 6, the content of the row with the highest value in field 4 is sent to all the computing processes; step 7, the grid clustering results are written into a geographical grid data file. The parallel clustering method for processing large geographical grid data is implemented through these steps.
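To illustrate why a manager/worker split can keep communication small, the following Python sketch shows one iteration in which each computing process returns only per-cluster sums and counts rather than its raw cells. This is an assumption-laden illustration, not the patented procedure: it uses a k-means-style nearest-center update, which the source does not confirm, and it does not model the M candidate solutions or the "field 4" scoring mentioned in the abstract. All function names are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Partial = Tuple[List[Vector], List[int]]  # (per-cluster sums, per-cluster counts)


def nearest(cell: Vector, centers: List[Vector]) -> int:
    """Index of the center closest to cell (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(cell, c)) for c in centers]
    return dists.index(min(dists))


def local_partial_sums(cells: List[Vector], centers: List[Vector]) -> Partial:
    """Work done by one computing process on the cells it holds."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for cell in cells:
        k = nearest(cell, centers)
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += cell[d]
    return sums, counts


def merge_and_update(partials: List[Partial],
                     centers: List[Vector]) -> List[Vector]:
    """Work done by the management process: merge partials, recompute centers."""
    new_centers = []
    for k, old in enumerate(centers):
        total = sum(counts[k] for _, counts in partials)
        if total == 0:
            new_centers.append(list(old))  # an empty cluster keeps its center
            continue
        new_centers.append([
            sum(sums[k][d] for sums, _ in partials) / total
            for d in range(len(old))
        ])
    return new_centers


if __name__ == "__main__":
    centers = [[0.0], [10.0]]
    worker_cells = [[[0.0], [2.0], [9.0]], [[1.0], [11.0]]]
    partials = [local_partial_sums(cells, centers) for cells in worker_cells]
    print(merge_and_update(partials, centers))
```

In this sketch the data sent back per iteration grows with the number of clusters and attributes, not with the number of grid cells held by each process.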

Description

Technical field

[0001] The invention relates to a parallel clustering method, in particular to a parallel clustering method for processing large geographic grid data.

Background

[0002] In the field of geographic information system technology, geographic raster data is an important data type. In geographic raster data, each grid cell records the spatial, social, economic, and environmental attributes of a surface area, which provides a data basis for describing information about the surface.

[0003] Clustering of geographic raster data automatically divides raster data into categories according to raster position and the numerical distribution of attribute characteristics, without known samples being supplied in advance. Through clustering, people can divide geographic raster data into homogeneous or similar areas without any additional data input and obtain more general knowledge of an area, which can then be used for vector map drawing, ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/29
Inventors: 潘欣, 赵健, 孙宏彬, 任斌, 徐宏年
Owner: CHANGCHUN INST OF TECH