A high-resolution remote sensing image distributed processing system and method
By employing a distributed storage and computing framework and utilizing deep convolutional neural network training, the efficiency problem of high-resolution remote sensing image storage and computing was solved, achieving efficient image data processing.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- HENAN UNIVERSITY OF TECHNOLOGY
- Filing Date
- 2022-08-23
- Publication Date
- 2026-06-23
AI Technical Summary
The centralized storage mode for high-resolution remote sensing images has a storage limit that cannot meet data storage needs, and the serial processing on a single machine is extremely time-consuming, which cannot meet the timeliness requirements.
A distributed storage model is adopted, in which remote sensing images are distributed and stored on different nodes. A distributed computing framework is used for parallel data processing, and deep convolutional neural network training and batch updates are used to improve processing efficiency.
It enables efficient storage and computation of remote sensing image data, improves the efficiency and adaptability of image processing, and solves the limitations of centralized storage and stand-alone processing.
Description
Technical Field
[0001] This invention belongs to the field of remote sensing image processing technology, specifically relating to a high-resolution remote sensing image distributed processing system and method.
Background Technology
[0002] The surge in high-resolution remote sensing images presents significant challenges to their storage and computation.
[0003] In terms of storage, centralized storage has a storage limit, and as the number of remote sensing images increases further, it will be unable to meet the storage needs of the image data. Conversely, distributed storage, with its expandable storage nodes, offers excellent scalability and adaptability in data storage.
[0004] In terms of computation, processing images serially on a single machine is extremely time-consuming and unsuitable for the time-sensitive requirements of remote sensing image processing. In contrast, distributed computing methods allocate computational tasks to various storage nodes, with each storage node completing a portion of the data computation, thus greatly improving the computational efficiency of image processing.
[0005] It can be seen that distributed storage and distributed computing models are suitable for the processing needs of remote sensing image data.
Summary of the Invention
[0006] To address the problems existing in current technologies, this invention aims to provide a distributed processing system and method for high-resolution remote sensing images. This invention utilizes the scalability of a distributed storage model, allowing remote sensing images to be distributed and stored across different nodes. As the number of storage nodes increases, the image storage capacity also increases accordingly. By employing a corresponding distributed computing framework, the "computation is moved to the data," enabling simultaneous computation on remote sensing image data distributed across different nodes, significantly improving the processing efficiency of image data.
[0007] To achieve the above objectives, the present invention adopts the following technical solution:
[0008] A high-resolution remote sensing image distributed processing system includes a remote sensing image input file format extension module, a remote sensing image metadata management and optimization module, and a remote sensing image distributed computing module.
[0009] The remote sensing image input file format extension module extends the input file format of the distributed file system for high-resolution remote sensing image data, so that the distributed file system can effectively store the remote sensing image data. By constructing GfFileName, GfFileContent, GfImageFileInputFormat, and GfImageFileRecordReader, the input file format for remote sensing image data is extended, and each remote sensing image in the HDFS distributed file system is stored with its image name in GfFileName and its image content in GfFileContent.
[0010] The remote sensing image metadata management and optimization module uses the metadata of the remote sensing image data to establish indexes and references for the data itself. By referencing the metadata, it achieves access to the data itself, realizing the structured processing of remote sensing image information. In particular, accessing the data itself by referencing the metadata enables distributed algorithms to effectively read remote sensing image data.
[0011] The distributed computing module for remote sensing images adopts a data parallel approach, with multiple nodes jointly training a deep convolutional neural network, thereby improving the training speed and processing efficiency of remote sensing image data.
[0012] It should be noted that the deep convolutional neural network is trained, and its parameters are updated, by batch update: the computation is carried out over the whole batch of remote sensing images before the parameters are updated for the next iteration.
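The batch update can be written out as follows. This is a sketch under assumptions the text does not state: plain gradient descent with a learning rate $\eta$, a per-image error $E_i$, and a batch of $N$ images; the bias $b$ is updated in the same way.

```latex
\Delta w = \sum_{i=1}^{N} \Delta w_i = -\eta \sum_{i=1}^{N} \frac{\partial E_i}{\partial w},
\qquad
w \leftarrow w + \Delta w
```

Because the per-image changes are summed before $w$ changes, the result does not depend on the order in which the images are processed.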
[0013] The present invention also provides a method for implementing a distributed processing system for high-resolution remote sensing images, characterized in that the method includes the following steps:
[0014] S1: extend the input file format for remote sensing images;
[0015] S2: manage and optimize the metadata of the remote sensing images in the extended format;
[0016] S3: perform distributed computation on the remote sensing images processed in step S2.
[0017] It should be noted that step S1 further includes:
[0018] S1.1 GfFileName and GfFileContent are used to store the image name and image content of the remote sensing image;
[0019] S1.2 GfImageFileInputFormat inherits from FileInputFormat, which in turn implements the InputFormat interface; GfImageFileRecordReader implements the RecordReader interface;
[0020] S1.3 Data is read using InputFormat and assigned to a Mapper for processing; finally, the Mapper reads the key/value pairs.
[0021] S1.4 FileInputFormat enables the use of files as input data to the system;
[0022] S1.5 GfImageFileInputFormat inherits from the FileInputFormat class and includes the following member methods: configure(), isSplitable(), getSplits(), and getRecordReader(). configure() configures relevant properties, isSplitable() determines whether to split data into blocks, getSplits() splits the data into blocks, and getRecordReader() reads the corresponding records. Furthermore, getRecordReader(), which reads records, calls the GfImageFileRecordReader() method in the GfImageFileRecordReader class.
[0023] S1.6 GfImageFileInputFormat first executes configure() for configuration operations, then uses isSplitable() to determine whether the data block needs to be split. If splitting is required, it executes the data block splitting function getSplits(); then it starts executing the getRecordReader() method to read records, which will call the constructor of the last of the four classes built earlier, namely the GfImageFileRecordReader() method of the GfImageFileRecordReader class;
[0024] S1.7 The GfImageFileRecordReader class mainly includes the constructor GfImageFileRecordReader() and the methods createKey(), createValue(), and next(). The createKey() and createValue() methods return the key and value objects of a key-value pair, the next() method reads records successively, and the constructor GfImageFileRecordReader() reads the remote sensing image data.
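A minimal Python sketch of the record-reader protocol in S1.7, assuming each remote sensing image is delivered as a single record whose key is the file name and whose value is the raw bytes. The classes here are hypothetical stand-ins written for illustration: they mimic the createKey()/createValue()/next() pattern but are not Hadoop's Java API, and the injected `read_bytes` callable replaces HDFS access so the sketch runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class GfFileName:
    """Hypothetical stand-in for the key type: the image file name."""
    value: str = ""


@dataclass
class GfFileContent:
    """Hypothetical stand-in for the value type: the raw image bytes."""
    value: bytes = b""


class WholeImageRecordReader:
    """Yields exactly one (name, content) record for one image file."""

    def __init__(self, path: str, read_bytes: Callable[[str], bytes]) -> None:
        # read_bytes replaces HDFS access so the sketch runs standalone
        self._path = path
        self._read_bytes = read_bytes
        self._done = False

    def create_key(self) -> GfFileName:
        return GfFileName()

    def create_value(self) -> GfFileContent:
        return GfFileContent()

    def next(self, key: GfFileName, value: GfFileContent) -> bool:
        """Fill key/value with the whole image once, then report exhaustion."""
        if self._done:
            return False
        key.value = self._path
        value.value = self._read_bytes(self._path)
        self._done = True
        return True


def read_all(reader: WholeImageRecordReader) -> List[Tuple[str, bytes]]:
    """Drive the reader the way a Mapper loop would, reusing key/value objects."""
    records: List[Tuple[str, bytes]] = []
    key, value = reader.create_key(), reader.create_value()
    while reader.next(key, value):
        records.append((key.value, value.value))
    return records
```

Treating each image as one unsplit record is a common choice for raster files, since a compressed image generally cannot be decoded starting from an arbitrary byte offset; the text does not say whether the invention does this.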
[0025] It should be noted that InputFormat provides the distributed framework with the following functions: validating the correctness of the job input; using the getSplits method to split the input data into logical InputSplits, each of which is assigned to a separate Mapper for processing (logically, each InputSplit contains all the key/value pairs provided to that Mapper; the getLength method of InputSplit returns the size of the split, and the getLocations method returns the corresponding list of locations); and using the createRecordReader method to return a RecordReader object, which the Mapper uses to read the key/value pairs in the split.
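The split logic above can be illustrated with a small Python model. `FileSplit`, `get_splits`, and `assign_mappers` are hypothetical names, not Hadoop classes; for simplicity every block of a file is assumed to live on the same hosts, and `block_size` is a free parameter rather than an HDFS setting.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FileSplit:
    """Simplified logical InputSplit: a byte range of one file (hypothetical)."""
    path: str
    start: int
    length: int
    hosts: Tuple[str, ...]

    def get_length(self) -> int:
        return self.length

    def get_locations(self) -> Tuple[str, ...]:
        return self.hosts


def get_splits(files: Dict[str, Tuple[int, Tuple[str, ...]]],
               block_size: int, splitable: bool) -> List[FileSplit]:
    """files maps path -> (size in bytes, hosts storing that file's blocks)."""
    splits: List[FileSplit] = []
    for path, (size, hosts) in sorted(files.items()):
        if not splitable:
            splits.append(FileSplit(path, 0, size, hosts))  # whole file
            continue
        for start in range(0, size, block_size):
            splits.append(FileSplit(path, start, min(block_size, size - start), hosts))
    return splits


def assign_mappers(splits: List[FileSplit]) -> Dict[int, FileSplit]:
    """One Mapper per split; a scheduler would favor a host in get_locations()."""
    return dict(enumerate(splits))
```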
[0026] It should be noted that the management of the extended-format remote sensing image metadata in step S2 is as follows:
[0027] The master node manages the metadata, which is of three types: file directories, file blocks, and location information. The master node stores the metadata of all remote sensing images and folders as a file system tree, persisted in the form of a namespace image and an edit (modification) log file; this metadata includes file data block handles and information about the nodes on which the blocks are distributed. The master node also responds to information requests from clients. Through program commands, a client can perform local import, export, file deletion, directory creation, and other operations, thereby processing the remote sensing image data.
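To make the three kinds of metadata concrete, here is a toy Python model of the master node's bookkeeping. It is an assumption-laden sketch, not HDFS's NameNode: the class, the `blk_N` identifiers, and the log entry format are all hypothetical, and the `edit_log` list only stands in for the modification log file.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class MasterMetadata:
    """Toy master-node metadata: directories, file -> blocks, block -> nodes."""
    directories: Set[str] = field(default_factory=lambda: {"/"})
    file_blocks: Dict[str, List[str]] = field(default_factory=dict)
    block_locations: Dict[str, Set[str]] = field(default_factory=dict)
    edit_log: List[str] = field(default_factory=list)  # stand-in for the log file
    next_block_id: int = 0

    def mkdir(self, path: str) -> None:
        self.directories.add(path)
        self.edit_log.append(f"MKDIR {path}")

    def import_file(self, path: str, num_blocks: int, nodes: List[str]) -> List[str]:
        """Register an imported image as num_blocks blocks stored on nodes."""
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.directories:
            raise FileNotFoundError(parent)
        blocks = []
        for _ in range(num_blocks):
            block_id = f"blk_{self.next_block_id}"
            self.next_block_id += 1
            self.block_locations[block_id] = set(nodes)
            blocks.append(block_id)
        self.file_blocks[path] = blocks
        self.edit_log.append(f"IMPORT {path}")
        return blocks

    def delete(self, path: str) -> None:
        for block_id in self.file_blocks.pop(path):
            del self.block_locations[block_id]
        self.edit_log.append(f"DELETE {path}")

    def locate(self, path: str) -> Dict[str, Set[str]]:
        """Metadata lookup: which nodes hold each block of this image."""
        return {b: self.block_locations[b] for b in self.file_blocks[path]}
```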
[0028] It should be noted that the optimization of the extended format remote sensing image metadata in step S2 includes:
[0029] S2.1 Remote sensing image file redundancy: multiple slave nodes are used to back up image information, improving the fault tolerance of the system. Each block is stored and, at the same time, replicated to other slave nodes, so that if a storage node fails or malfunctions, a copy of the remote sensing image data block can still be found on another slave node. This ensures efficient access to the data and provides fault tolerance for remote sensing image data.
[0030] S2.2 Heartbeat detection of remote sensing image data: slave nodes periodically send heartbeat signals to the master node so that it can verify their current status and connectivity. If no heartbeat signal is received from a slave node, that slave node storing remote sensing images is judged to have failed, and no further remote sensing data storage is scheduled on it. In addition, through the heartbeat mechanism, the slave nodes periodically send their block information lists, which are aggregated into the corresponding mapping table on the master node. This makes it possible to locate and retrieve the data itself from the metadata of the remote sensing image data, ensuring the integrity and availability of the data.
[0031] S2.3 Safe mode optimization of the distributed file system and replica threshold setting: when the distributed storage system for remote sensing images starts up, the master node checks the Block information of all data in the system and determines whether the correctness coefficient of each Block is equal to or greater than the set replica threshold. Any Block of remote sensing image data that falls below the threshold is copied.
[0032] S2.4 Remote sensing image data integrity detection: when the distributed file system creates a data block, it calculates a checksum for the block, which is used to verify the integrity of the data. If the data is found to be incomplete, the nearest corresponding replica is read instead. When storing remote sensing image data, different storage strategies can be adopted according to the capacity of each slave node in order to balance the load. Finally, pipelined replication is used for remote sensing image data: when the client-side buffer reaches the configured block size, the client notifies the master node of the distributed system and starts replicating the data block to the slave nodes provided by the master node. While the first slave node writes the data to disk, it also begins sending the data to the second slave node. This method reduces the time needed to write remote sensing image data blocks while ensuring data integrity.
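The checksum check in S2.4, with fallback to another replica, might look like the following Python sketch. The choice of CRC32 (via the standard `zlib` module) and the nearest-first ordering of the replica list are assumptions; the text says only that a checksum is computed when the block is created. The sketch also shows why the redundancy in S2.1 matters: a corrupt copy is skipped in favor of an intact one.

```python
import zlib
from typing import List


class Replica:
    """One stored copy of a block plus the checksum taken at creation time."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.crc = zlib.crc32(data)  # CRC32 is an assumed checksum choice

    def is_intact(self) -> bool:
        return zlib.crc32(self.data) == self.crc


def read_block(replicas: List[Replica]) -> bytes:
    """Return data from the first intact replica (list assumed nearest-first)."""
    for replica in replicas:
        if replica.is_intact():
            return replica.data
    raise IOError("every replica of this block failed its checksum")
```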
[0033] It should be noted that in step S3, the Mapper performs forward propagation and backward error calculation on the remote sensing image data, solves for the local changes in the model parameters, and generates intermediate key-value pairs of the form <key = w, value = Δw>. The Combiner is then executed to aggregate the model parameter results locally, which also reduces the I/O transmission cost of the data. Finally, the Reducer receives the output of the Combiner, aggregates the local parameter changes from each node to obtain the global changes, and performs batch updates.
[0034] The beneficial effects of this invention are as follows:
[0035] 1. By leveraging the scalability of the distributed storage model, remote sensing images can be distributed and stored on different nodes. As the number of storage nodes increases, the storage capacity for images also increases accordingly.
[0036] 2. By utilizing the corresponding distributed computing framework, the computation is moved to the data, and the remote sensing image data distributed and stored on different nodes are processed simultaneously, which greatly improves the processing efficiency of image data.
Detailed Implementation
[0037] The present invention will be further described below. It should be noted that the following embodiments are based on the present technical solution and provide detailed implementation methods and specific operation processes, but the protection scope of the present invention is not limited to these embodiments.
[0038] This invention relates to a high-resolution remote sensing image distributed processing system. The system includes a remote sensing image input file format extension module, a remote sensing image metadata management and optimization module, and a remote sensing image distributed computing module.
[0039] The remote sensing image input file format extension module extends the input file format of the distributed file system for high-resolution remote sensing image data, so that the distributed file system can effectively store the remote sensing image data. By constructing GfFileName, GfFileContent, GfImageFileInputFormat, and GfImageFileRecordReader, the input file format for remote sensing image data is extended, and each remote sensing image in the HDFS distributed file system is stored with its image name in GfFileName and its image content in GfFileContent.
[0040] The remote sensing image metadata management and optimization module uses the metadata of the remote sensing image data to establish indexes and references for the data itself. By referencing the metadata, it achieves access to the data itself, realizing the structured processing of remote sensing image information. In particular, accessing the data itself by referencing the metadata enables distributed algorithms to effectively read remote sensing image data.
[0041] The distributed computing module for remote sensing images adopts a data parallel approach, with multiple nodes jointly training a deep convolutional neural network, thereby improving the training speed and processing efficiency of remote sensing image data.
[0042] Furthermore, the deep convolutional neural network of the present invention is trained, and its parameters are updated, by batch update: the computation is carried out over the whole batch of remote sensing images before the parameters are updated for the next iteration.
[0043] The present invention also provides a method for implementing a distributed processing system for high-resolution remote sensing images, characterized in that the method includes the following steps:
[0044] S1: extend the input file format for remote sensing images;
[0045] S2: manage and optimize the metadata of the remote sensing images in the extended format;
[0046] S3: perform distributed computation on the remote sensing images processed in step S2.
[0047] Furthermore, step S1 of the present invention further includes:
[0048] S1.1 GfFileName and GfFileContent are used to store the image name and image content of the remote sensing image;
[0049] S1.2 GfImageFileInputFormat inherits from FileInputFormat, which in turn implements the InputFormat interface; GfImageFileRecordReader implements the RecordReader interface;
[0050] S1.3 Data is read using InputFormat and assigned to a Mapper for processing; finally, the Mapper reads the key/value pairs.
[0051] S1.4 FileInputFormat enables the use of files as input data to the system;
[0052] S1.5 GfImageFileInputFormat inherits from the FileInputFormat class and includes the following member methods: configure(), isSplitable(), getSplits(), and getRecordReader(). configure() configures relevant properties, isSplitable() determines whether to split data into blocks, getSplits() splits the data into blocks, and getRecordReader() reads the corresponding records. Furthermore, getRecordReader(), which reads records, calls the GfImageFileRecordReader() method in the GfImageFileRecordReader class.
[0053] S1.6 GfImageFileInputFormat first executes configure() for configuration operations, then uses isSplitable() to determine whether the data block needs to be split. If splitting is required, it executes the data block splitting function getSplits(); then it starts executing the getRecordReader() method to read records, which will call the constructor of the last of the four classes built earlier, namely the GfImageFileRecordReader() method of the GfImageFileRecordReader class;
[0054] S1.7 The GfImageFileRecordReader class mainly includes the constructor GfImageFileRecordReader() and the methods createKey(), createValue(), and next(). The createKey() and createValue() methods return the key and value objects of a key-value pair, the next() method reads records successively, and the constructor GfImageFileRecordReader() reads the remote sensing image data.
[0055] Furthermore, in this invention, InputFormat provides the distributed framework with the following functions: verifying the correctness of the job input; using the getSplits method to split the input data into logical InputSplits, each of which is assigned to a separate Mapper for processing (logically, each InputSplit contains all the key/value pairs provided to that Mapper; the getLength method of InputSplit returns the size of the split, and the getLocations method returns the corresponding list of locations); and using the createRecordReader method to return a RecordReader object, which the Mapper uses to read the key/value pairs from the split.
[0056] Furthermore, in step S2 of the present invention, the management of remote sensing image metadata in the extended format...
[0057] The master node manages the metadata, which is of three types: file directories, file blocks, and location information. The master node stores the metadata of all remote sensing images and folders as a file system tree, persisted in the form of a namespace image and an edit (modification) log file; this metadata includes file data block handles and information about the nodes on which the blocks are distributed. The master node also responds to information requests from clients. Through program commands, a client can perform local import, export, file deletion, directory creation, and other operations, thereby processing the remote sensing image data.
[0058] Furthermore, the optimization of the extended format remote sensing image metadata in step S2 of the present invention includes:
[0059] S2.1 Remote sensing image file redundancy: multiple slave nodes are used to back up image information, improving the fault tolerance of the system. Each block is stored and, at the same time, replicated to other slave nodes, so that if a storage node fails or malfunctions, a copy of the remote sensing image data block can still be found on another slave node. This ensures efficient access to the data and provides fault tolerance for remote sensing image data.
[0060] S2.2 Heartbeat detection of remote sensing image data: slave nodes periodically send heartbeat signals to the master node so that it can verify their current status and connectivity. If no heartbeat signal is received from a slave node, that slave node storing remote sensing images is judged to have failed, and no further remote sensing data storage is scheduled on it. In addition, through the heartbeat mechanism, the slave nodes periodically send their block information lists, which are aggregated into the corresponding mapping table on the master node. This makes it possible to locate and retrieve the data itself from the metadata of the remote sensing image data, ensuring the integrity and availability of the data.
[0061] S2.3 Safe mode optimization of the distributed file system and replica threshold setting: when the distributed storage system for remote sensing images starts up, the master node checks the Block information of all data in the system and determines whether the correctness coefficient of each Block is equal to or greater than the set replica threshold. Any Block of remote sensing image data that falls below the threshold is copied.
[0062] S2.4 Remote sensing image data integrity detection: when the distributed file system creates a data block, it calculates a checksum for the block, which is used to verify the integrity of the data. If the data is found to be incomplete, the nearest corresponding replica is read instead. When storing remote sensing image data, different storage strategies can be adopted according to the capacity of each slave node in order to balance the load. Finally, pipelined replication is used for remote sensing image data: when the client-side buffer reaches the configured block size, the client notifies the master node of the distributed system and starts replicating the data block to the slave nodes provided by the master node. While the first slave node writes the data to disk, it also begins sending the data to the second slave node. This method reduces the time needed to write remote sensing image data blocks while ensuring data integrity.
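The heartbeat mechanism in S2.2 can be modeled as below. `HeartbeatMonitor`, `timeout_s`, and the use of explicit timestamps instead of a real clock are illustrative assumptions; the text does not specify the heartbeat interval or the failure timeout.

```python
from typing import Dict, Set


class HeartbeatMonitor:
    """Master-side record of slave liveness and block reports (toy model)."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s                 # hypothetical failure timeout
        self.last_seen: Dict[str, float] = {}      # node -> last heartbeat time
        self.block_map: Dict[str, Set[str]] = {}   # block id -> reporting nodes

    def heartbeat(self, node: str, now: float, block_report: Set[str]) -> None:
        """Record a heartbeat and merge the node's block list into the map."""
        self.last_seen[node] = now
        for holders in self.block_map.values():   # drop the node's old report
            holders.discard(node)
        for block_id in block_report:
            self.block_map.setdefault(block_id, set()).add(node)

    def live_nodes(self, now: float) -> Set[str]:
        """Nodes eligible for new image blocks: heard from within the timeout."""
        return {n for n, t in self.last_seen.items() if now - t <= self.timeout_s}

    def dead_nodes(self, now: float) -> Set[str]:
        """Nodes treated as failed; no further storage is scheduled on them."""
        return set(self.last_seen) - self.live_nodes(now)
```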
[0063] Furthermore, step S3 of the present invention includes the Mapper performing forward propagation and backward error calculation on the remote sensing image data, solving for the local changes in the model parameters, and generating intermediate key-value pairs of the form <key = w, value = Δw>. The Combiner then aggregates the model parameter results locally, while reducing the I/O transmission cost of the data. Finally, the Reducer accepts the output of the Combiner, aggregates the local parameter changes of each node to obtain the global changes, and performs batch updates.
[0064] Example
[0065] 1. Remote sensing image input file format extension
[0066] For high-resolution remote sensing image data, the input file format of the distributed file system needs to be extended so that the distributed file system can effectively store remote sensing image data. For the HDFS distributed file system, GfImageFileInputFormat and GfImageFileRecordReader were built and implemented, drawing on the existing TextInputFormat and SequenceFileInputFormat classes, to extend the input file format for remote sensing image data. Each remote sensing image in HDFS is stored with its image name in GfFileName and its image content in GfFileContent.
[0067] To enable the distributed file system to store remote sensing image data, the GfFileName, GfFileContent, GfImageFileInputFormat, and GfImageFileRecordReader classes constructed above are used. They are described below:
[0068] First, GfFileName and GfFileContent are used to store the image name and image content of remote sensing images.
[0069] Second, GfImageFileInputFormat inherits from FileInputFormat, which in turn implements the InputFormat interface. GfImageFileRecordReader implements the RecordReader interface.
[0070] Third, InputFormat is used to read data and assign it to a Mapper for processing; finally, the Mapper reads the key/value pairs. InputFormat provides the following functions for the distributed framework: 1) validating the correctness of the job input; 2) using the getSplits method to split the input data into logical InputSplits, each assigned to a separate Mapper for processing. Logically, each InputSplit contains all the key/value pairs provided to a particular Mapper. The getLength method in InputSplit returns the size of the InputSplit, and the getLocations method returns the corresponding location list; 3) using the createRecordReader method to return a RecordReader object, which the Mapper uses to read the key/value pairs in the split.
[0071] Fourth, FileInputFormat allows files to be used as input data to the system. Therefore, to pass files to a distributed processing framework, an input format must inherit from the FileInputFormat class.
[0072] Fifth, `GfImageFileInputFormat` inherits from the `FileInputFormat` class and includes the following member methods: `configure()`, `isSplitable()`, `getSplits()`, and `getRecordReader()`. `configure()` configures relevant properties, `isSplitable()` determines whether to split the data into blocks, `getSplits()` splits the data into blocks, and `getRecordReader()` reads the corresponding records. Furthermore, `getRecordReader()`, which reads records, calls the `GfImageFileRecordReader()` method within the `GfImageFileRecordReader` class.
[0073] Sixth, GfImageFileInputFormat first executes configure() for configuration, then uses isSplitable() to determine if the data block needs to be split. If splitting is required, it executes the data block splitting function getSplits(). Then it starts executing the getRecordReader() method to read records, which will call the constructor of the last of the four classes built earlier, namely the GfImageFileRecordReader class's GfImageFileRecordReader() method.
[0074] Seventh, the GfImageFileRecordReader class mainly includes the constructor GfImageFileRecordReader() and the methods createKey(), createValue(), and next(). The createKey() and createValue() methods return the key and value objects of a key-value pair, the next() method reads records successively, and the constructor GfImageFileRecordReader() reads the remote sensing image data.
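The call order in the Sixth step can be traced with a hypothetical Python mirror of GfImageFileInputFormat. The class, the `gf.split.images` property, and the placeholder record contents are invented for illustration. Note that in stock Hadoop the framework always calls getSplits(), and FileInputFormat consults isSplitable() internally; this sketch instead follows the order as the text describes it.

```python
from typing import Dict, Iterator, List, Tuple


class GfImageFileInputFormatSketch:
    """Hypothetical Python mirror of the call order described in the text."""

    def __init__(self) -> None:
        self.trace: List[str] = []
        self.split_images = False

    def configure(self, conf: Dict[str, str]) -> None:
        self.trace.append("configure")
        # "gf.split.images" is a made-up property name for this sketch
        self.split_images = conf.get("gf.split.images", "false") == "true"

    def is_splitable(self) -> bool:
        self.trace.append("isSplitable")
        return self.split_images

    def get_splits(self, paths: List[str]) -> List[List[str]]:
        self.trace.append("getSplits")
        return [[p] for p in paths]  # one logical split per image

    def get_record_reader(self, split: List[str]) -> Iterator[Tuple[str, str]]:
        self.trace.append("getRecordReader")
        # Stands in for constructing a GfImageFileRecordReader over the split
        return ((p, f"<content of {p}>") for p in split)


def run_input_format(fmt: GfImageFileInputFormatSketch,
                     conf: Dict[str, str], paths: List[str]) -> List[Tuple[str, str]]:
    fmt.configure(conf)                  # 1. configuration
    if fmt.is_splitable():               # 2. split decision
        splits = fmt.get_splits(paths)   # 3. split only when required
    else:
        splits = [list(paths)]           # whole input as one split
    records: List[Tuple[str, str]] = []
    for split in splits:                 # 4. read records split by split
        records.extend(fmt.get_record_reader(split))
    return records
```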
[0075] By extending the input file format of the distributed file system as described above, support for importing and storing remote sensing image data by the distributed file system is realized.
[0076] 2. Remote sensing image metadata management and optimization
[0077] By utilizing the metadata of remote sensing image data to establish indexes and references for the data itself, and by referencing the metadata, access to the data itself is achieved, thus enabling structured processing of remote sensing image information. Simultaneously, accessing the data itself through metadata references allows distributed algorithms to efficiently read remote sensing image data.
[0078] The metadata of remote sensing image data is managed by the master node. Metadata mainly includes three types: file directories, file blocks, and location information. By referencing the metadata, the remote sensing image data itself can be accessed, greatly facilitating subsequent data computation. The master node stores the metadata of all remote sensing images and folders as a file system tree, in the form of a namespace image and an edit (modification) log file, which includes information such as file data block handles and the nodes on which blocks are distributed. The master node also responds to client information requests. Clients can process the remote sensing image data by performing local import, export, file deletion, directory creation, and other operations through program commands.
[0079] In a distributed storage file system for remote sensing images, the master node can manage multiple slave nodes. The master node's main functions include storing remote sensing image metadata, filenames, and directory information for individual remote sensing image files. The master node can receive heartbeat messages from slave nodes to confirm their operational status. Slave nodes are responsible for storing and backing up remote sensing image data.
[0080] To ensure the effective storage and management of remote sensing images and their metadata, the system can be optimized accordingly:
[0081] 1) Remote sensing image file redundancy. Multiple slave nodes are used to back up image information to improve system fault tolerance. Each block is stored and, at the same time, transferred to and copied on other slave nodes. If a storage node fails or malfunctions, a copy of the remote sensing image data block can be found on another slave node. This ensures both efficient access to data and fault tolerance for remote sensing image data.
[0082] 2) Heartbeat Detection of Remote Sensing Image Data. Slave nodes periodically send heartbeat signals to the master node to verify their current status and connectivity. If no heartbeat signal is detected, it is assumed that a slave node storing the remote sensing image data has failed, and subsequent remote sensing data storage will not be scheduled for that slave node. Furthermore, the heartbeat detection method ensures that slave nodes periodically send and aggregate their block information lists into the corresponding mapping table in the master node. This facilitates the detection and retrieval of the data itself using the metadata of the remote sensing image data, thereby ensuring data integrity and availability.
[0083] 3) Distributed file system safe mode optimization and replica threshold setting. When the distributed storage system for remote sensing images starts up, the master node checks the Block information of all data in the system and determines whether the correctness coefficient of each Block is equal to or greater than the set replica threshold. Any Block of remote sensing image data that falls below the threshold is copied.
[0084] 4) Remote sensing image data integrity detection: when the distributed file system creates a data block, it calculates a checksum for that block, which is used to verify the integrity of the data. If the data is found to be incomplete, the nearest corresponding replica is read instead. During remote sensing image data storage, different storage strategies can be adopted based on the capacity of each slave node in order to balance the load. Finally, pipelined replication is used for remote sensing image data. Once the client-side buffer reaches the configured block size, the client notifies the master node of the distributed system and starts replicating the remote sensing image data block to the slave nodes provided by the master node. While the first slave node writes to disk, it also begins sending data to the second slave node. This method reduces the time consumed in writing the remote sensing image data block while ensuring data integrity.
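The replica-threshold check in item 3) reduces to counting usable copies per block. This Python sketch assumes the "correctness coefficient" means the number of replicas held by live slave nodes, which is an interpretation rather than something the text defines; the function name and its arguments are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def blocks_to_replicate(block_map: Dict[str, Set[str]], live_nodes: Set[str],
                        replica_threshold: int) -> List[Tuple[str, int]]:
    """Start-up check: (block id, copies missing) for each block whose count of
    replicas on live nodes is below the threshold."""
    shortfall: List[Tuple[str, int]] = []
    for block_id, holders in sorted(block_map.items()):
        valid = len(holders & live_nodes)  # assumed "correctness coefficient"
        if valid < replica_threshold:
            shortfall.append((block_id, replica_threshold - valid))
    return shortfall
```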
[0085] Through the above-mentioned metadata management and optimization of remote sensing images, the reliability, adaptability and robustness of the distributed storage system are ensured. Through the indexing and referencing of metadata, the remote sensing image data itself can be accessed, which greatly improves the processing efficiency of remote sensing images.
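The time saving from the pipelined replication in item 4) can be seen with a small scheduling model. In this Python sketch, which is an illustration and not HDFS's actual write path, a block is cut into packets and slave node k handles packet p at time step p + k, writing it and forwarding it to node k + 1; the packet size and node names are arbitrary.

```python
from typing import Dict, List, Tuple


def pipeline_replicate(block: bytes, nodes: List[str],
                       packet_size: int) -> Tuple[Dict[str, bytes], int]:
    """Simulate pipelined replication of one block along a chain of nodes.

    Node k (0-based) writes packet p at time step p + k and forwards it to
    node k + 1, so every node is busy with a different packet at once.
    Returns the bytes each node ends up storing and the number of steps.
    """
    packets = [block[i:i + packet_size] for i in range(0, len(block), packet_size)]
    stored: Dict[str, List[bytes]] = {node: [] for node in nodes}
    steps = len(packets) + len(nodes) - 1
    for t in range(steps):
        for k, node in enumerate(nodes):
            p = t - k
            if 0 <= p < len(packets):
                stored[node].append(packets[p])  # write locally, then forward
    return {node: b"".join(parts) for node, parts in stored.items()}, steps
```

With P packets and R nodes the pipeline finishes in P + R - 1 steps, compared with P × R steps if each node had to receive the whole block before forwarding it.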
[0086] 3. Distributed computing of remote sensing images
[0087] A data-parallel approach is adopted, with multiple nodes collaboratively training a deep convolutional neural network, thereby improving the training speed and processing efficiency of remote sensing image data. The training and parameter updates of the deep convolutional network employ batch updates. Batch updates perform overall computation on the remote sensing image before proceeding to the next iteration. The order of input does not affect the network's training, which also provides conditions for distributed computation of remote sensing image data.
[0088] Distributed training of deep convolutional networks is performed, with each computing node storing an identical and complete deep neural network. Each node optimizes and trains the deep neural network model based on the corresponding data, and then summarizes and updates the data.
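Why data parallelism gives the same result as single-node batch training can be checked on a toy model. This Python sketch replaces the convolutional network with a one-parameter linear model and squared error (an assumption made to keep the arithmetic exact with integers): the gradient summed over per-node shards equals the full-batch gradient, whatever the sharding or input order.

```python
from typing import List, Tuple

Sample = Tuple[int, int]  # (feature x, target y): toy stand-in for an image


def local_gradient(w: int, shard: List[Sample]) -> int:
    # d/dw of sum(0.5 * (w*x - y)**2) over the shard, computed exactly
    return sum((w * x - y) * x for x, y in shard)


def data_parallel_gradient(w: int, data: List[Sample], num_nodes: int) -> int:
    # Round-robin sharding; every node holds the same full model (here just w)
    shards = [data[i::num_nodes] for i in range(num_nodes)]
    return sum(local_gradient(w, shard) for shard in shards)
```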
[0089] First, the Mapper performs forward propagation and backward error calculation on the remote sensing image data, solves for the local change of the model parameters, and generates intermediate key-value pairs such as <key = w, value = Δw>. Next, the Combiner aggregates the model-parameter results locally, which reduces the I/O cost of transmitting the data. Finally, the Reducer receives the output of the Combiner, sums the local parameter changes from all nodes to obtain the global change, and performs the batch update.
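A minimal sketch of the <key, value> flow between Mapper, Combiner and Reducer follows. It assumes the per-sample Δw and Δb values have already been produced by backpropagation (the numbers below are hypothetical). The sketch uses plain Python functions, not the Hadoop API, and shows that combining locally cuts the number of pairs sent over the network without changing the global sums.

```python
from collections import defaultdict

def mapper(sample_grads):
    """Emit one intermediate <key, value> pair per parameter per sample."""
    for dw, db in sample_grads:
        yield ("w", dw)
        yield ("b", db)

def combiner(pairs):
    """Sum values per key on the local node, so only one pair per key is transmitted."""
    acc = defaultdict(float)
    for key, value in pairs:
        acc[key] += value
    return list(acc.items())

def reducer(all_pairs):
    """Aggregate the local changes from every node into the global change per key."""
    total = defaultdict(float)
    for key, value in all_pairs:
        total[key] += value
    return dict(total)

# Two hypothetical nodes, each holding per-sample (Δw, Δb) values.
node_grads = [
    [(0.5, 0.125), (0.25, -0.25), (1.0, 0.375)],
    [(-0.5, 0.5), (0.75, 0.0)],
]
raw = [list(mapper(g)) for g in node_grads]
combined = [combiner(p) for p in raw]

sent_without = sum(len(p) for p in raw)       # pairs transmitted without a Combiner
sent_with = sum(len(p) for p in combined)     # pairs transmitted after local combining
global_delta = reducer(pair for node in combined for pair in node)
print(sent_without, sent_with, global_delta)  # 10 4 {'w': 2.0, 'b': 0.75}
```

For brevity, the Reducer here consumes an ungrouped stream. A real MapReduce shuffle groups the values by key before the Reducer sees them.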
[0090] The distributed training method of the deep neural network of the present invention is as follows:
[0091] Mapper process:
[0092] 1. Initialize the network parameters and read them.
[0093] 2. Parse the <remote sensing image target category, sample value> key-value pair.
[0094] 3. Solve for the local change of the network parameters through forward propagation and backward error calculation.
[0095] 4. Output the <key = w, value = Δw>, <key = b, value = Δb> key-value pairs.
[0096] Combiner process:
[0097] The Combiner merges the output of the map side, which reduces network bandwidth consumption and improves data processing efficiency.
[0098] Reducer process:
[0099] 1. Input <key = w, value = Δw>, <key = b, value = Δb>.
[0100] 2. Iterate over the local gradient changes received from all nodes and sum them to obtain the global gradient change.
[0101] 3. Output <key = w, value = global Δw> and <key = b, value = global Δb>.
[0102] 4. Use the Reducer result to perform batch updates on the parameters of the deep convolutional neural network model.
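The Mapper, Combiner and Reducer steps above can be strung together into a training loop. For brevity, a one-feature logistic regression stands in for the deep convolutional neural network. Each hypothetical node holds a small shard of <target category, sample value> pairs, and the Mapper sums its shard locally so that it also plays the Combiner's role. `mapper`, `reducer`, `train`, the learning rate and the data are illustrative assumptions, not the patent's code.

```python
import math

def mapper(params, records):
    """Mapper steps 1-4 for one node's shard of <category, sample value> pairs."""
    w, b = params                                     # 1. read the shared parameters
    dw = db = 0.0
    for category, sample in records:                  # 2. parse <category, sample> pairs
        p = 1.0 / (1.0 + math.exp(-(w * sample + b)))    # forward propagation
        err = p - category                            # backward error of the log-loss
        dw += err * sample                            # 3. local change of w and b,
        db += err                                     #    summed locally like a Combiner
    return [("w", dw), ("b", db)]                     # 4. emit <key, value> pairs

def reducer(pairs_per_node):
    """Reducer steps 1-3: sum every node's local changes into the global change."""
    total = {"w": 0.0, "b": 0.0}
    for pairs in pairs_per_node:
        for key, value in pairs:
            total[key] += value
    return total

def train(shards, lr=0.5, epochs=200):
    params = (0.0, 0.0)
    n = sum(len(shard) for shard in shards)
    for _ in range(epochs):
        delta = reducer(mapper(params, shard) for shard in shards)
        # Reducer step 4: exactly one batch update per full pass over every shard
        params = (params[0] - lr * delta["w"] / n, params[1] - lr * delta["b"] / n)
    return params

# Three hypothetical nodes; category 1 for positive feature values, 0 otherwise.
shards = [[(0, -2.0), (1, 1.5)], [(0, -1.0), (1, 2.0)], [(0, -0.5), (1, 0.5)]]
w, b = train(shards)
predictions = [int(w * x + b > 0) for shard in shards for _, x in shard]
print(round(w, 2), round(b, 2), predictions)
```

Each epoch applies a single update computed from the sum over all shards, which matches the batch update in step 4.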
[0103] Those skilled in the art can make various corresponding changes and modifications based on the technical solutions and concepts described above, and all such changes and modifications shall fall within the protection scope of the claims of the present invention.
Claims
1. A high-resolution remote sensing image distributed processing system, characterized in that the system includes a remote sensing image input file format extension module, a remote sensing image metadata management and optimization module, and a distributed computing module for remote sensing images, wherein: the remote sensing image input file format extension module is used to extend the input file format of the distributed file system for high-resolution remote sensing image data, so that the distributed file system can effectively store the remote sensing image data. The input file format is extended by constructing GfFileName, GfFileContent, GfImageFileInputFormat and GfImageFileRecordReader: GfFileName and GfFileContent store, respectively, the image name and the image content of each remote sensing image in the distributed file system HDFS; GfImageFileInputFormat inherits from FileInputFormat, which in turn implements the InputFormat interface; and GfImageFileRecordReader implements the RecordReader interface. The remote sensing image metadata management and optimization module uses the metadata of the remote sensing image data to establish indexes and references to the data itself. Accessing the data through its metadata realizes structured processing of remote sensing image information and, in particular, allows the distributed algorithm to read the remote sensing image data effectively. The distributed computing module for remote sensing images adopts a data-parallel approach, in which multiple nodes jointly train a deep convolutional neural network, thereby improving the training speed and processing efficiency of remote sensing image data.
2. The high-resolution remote sensing image distributed processing system according to claim 1, characterized in that the training and parameter updates of the deep convolutional neural network use batch updates, where a batch update computes over the whole batch of remote sensing images before the next round of iteration.
3. A method for implementing the high-resolution remote sensing image distributed processing system according to claim 1, characterized in that the method includes the following steps: S1, extending the input file format for remote sensing images; S2, managing and optimizing the metadata of the remote sensing images in the extended format; S3, performing distributed computation on the remote sensing images processed in step S2.
4. The method of the high-resolution remote sensing image distributed processing system according to claim 3, characterized in that step S1 further includes: S1.1 data is read through InputFormat and assigned to a Mapper for processing, and the Mapper then reads the key/value pairs; S1.2 FileInputFormat allows files to be used as input data to the system; S1.3 the GfImageFileInputFormat class contains the member methods configure(), isSplitable(), getSplits() and getRecordReader(), where configure() sets the relevant properties, isSplitable() determines whether the data needs to be split, getSplits() splits the data into blocks, and getRecordReader() reads the corresponding records by calling the GfImageFileRecordReader() constructor of the GfImageFileRecordReader class; S1.4 GfImageFileInputFormat first executes configure() to perform configuration, then uses isSplitable() to determine whether the data block needs to be split; if splitting is required, the splitting function getSplits() is executed; the record-reading method getRecordReader() is then executed, which calls the constructor of the last of the four classes constructed earlier, namely GfImageFileRecordReader() of the GfImageFileRecordReader class; S1.5 the GfImageFileRecordReader class contains the constructor GfImageFileRecordReader() and the methods createKey(), createValue() and next(), where createKey() and createValue() return the key and value objects, next() reads the records one after another, and GfImageFileRecordReader() performs the work of reading the remote sensing image data.
5. The method of the high-resolution remote sensing image distributed processing system according to claim 4, characterized in that InputFormat describes the input specification of a job to the distributed framework, which relies on it to validate the correctness of the job input and to split the input data into logical InputSplits with the getSplits method, each InputSplit being assigned to a separate Mapper for processing. Logically, each InputSplit contains all the key/value pairs to be processed by a given Mapper. The getLength method of InputSplit returns the size of the InputSplit, and the getLocations method returns the list of nodes on which the split's data is located. The createRecordReader method returns a RecordReader object, which the Mapper uses to read the key/value pairs from the split.
6. The method of the high-resolution remote sensing image distributed processing system according to claim 3, characterized in that the management of the extended-format remote sensing image metadata in step S2 includes the following: the master node manages the metadata, which is of three types: file directories, file blocks and location information. The master node stores the metadata of all remote sensing images and folders in a file system tree in the form of a namespace image and an edit log file, including the data block handles of each file and the nodes on which those blocks are distributed. The master node also responds to information requests from clients. Through program instructions, the client processes remote sensing image data by performing local import, export, file deletion and directory creation operations.
7. The method of the high-resolution remote sensing image distributed processing system according to claim 3, characterized in that the optimization of the extended-format remote sensing image metadata in step S2 includes: S2.1 Remote sensing image file redundancy: multiple slave nodes back up the image information to improve the fault tolerance of the system, and each block is stored and transferred at the same time; if a storage node fails, a copy of the remote sensing image data block can still be found on another slave node, which keeps data access efficient and makes the remote sensing image data fault-tolerant. S2.2 Heartbeat detection of remote sensing image data: slave nodes periodically send heartbeat signals to the master node so that it can verify their current status and connectivity. If no heartbeat signal is received from a slave node storing remote sensing images, that node is determined to have failed, and no further remote sensing data storage is scheduled on it. In addition, through heartbeat detection the slave nodes periodically send their Block information lists, which are collected into the corresponding mapping table on the master node; this makes it easy to locate and retrieve the data itself from the metadata of the remote sensing image data, ensuring the integrity and availability of the data. S2.3 Safe-mode optimization of the distributed file system and replica threshold setting: when the distributed storage system for remote sensing images starts, the master node checks the Block information of all data in the system and determines whether the number of correct replicas of each Block is equal to or greater than the set replica threshold; if it is not, the Block of the remote sensing image data is replicated.
S2.4 Remote sensing image data integrity detection: when the distributed file system creates a data block, it calculates a checksum for the block, which is later used to verify the integrity of the data; if verification fails, the data is treated as incomplete and the nearest replica of the block is read instead. When remote sensing image data is stored, different storage strategies are adopted according to the capacity of each slave node in order to balance the load. Finally, the remote sensing image data is replicated in a pipelined manner: when the client's cache reaches the configured block size, the client notifies the master node of the distributed system and begins replicating the data block to the slave nodes provided by the master node. While the first slave node writes the block to disk, it simultaneously forwards the data to the second slave node. This reduces the time needed to write remote sensing image data blocks while ensuring data integrity.
8. The method of the high-resolution remote sensing image distributed processing system according to claim 3, characterized in that step S3 includes: the Mapper performs forward propagation and backward error calculation on the remote sensing image data, solves for the local change of the model parameters, and generates intermediate key-value pairs such as <key = w, value = Δw>; the Combiner is then executed to aggregate the model-parameter results locally, reducing the I/O cost of transmitting the data; finally, the Reducer receives the output of the Combiner, sums the local parameter changes from all nodes to obtain the global change, and performs the batch update.
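Claims 4 and 5 describe a whole-file input format built on Hadoop's Java MapReduce API. The Python sketch below only mirrors that control flow (configure, isSplitable, getSplits, getRecordReader, then repeated next() calls) so that it runs without Hadoop; it is not the Hadoop API. The in-memory `storage` and `hosts` dictionaries stand in for HDFS, and the snake_case method names are hypothetical counterparts of the Java methods. In Hadoop's older `mapred` API, `RecordReader.next(key, value)` fills mutable objects obtained from `createKey()` and `createValue()` and returns a boolean. Because Python strings and bytes are immutable, this sketch's `next()` returns the pair instead.

```python
from dataclasses import dataclass, field

@dataclass
class FileSplit:
    """One logical split: here a whole image file, plus the hosts that store it."""
    path: str
    length: int
    locations: list = field(default_factory=list)

    def get_length(self):
        return self.length

    def get_locations(self):
        return self.locations

class GfImageFileRecordReader:
    """Emits exactly one <GfFileName, GfFileContent> record for its split."""

    def __init__(self, split, storage):
        self.split = split
        self.storage = storage        # hypothetical stand-in for HDFS: path -> bytes
        self.consumed = False

    def create_key(self):
        return ""                     # empty GfFileName

    def create_value(self):
        return b""                    # empty GfFileContent

    def next(self):
        """Return (name, content) on the first call and None afterwards."""
        if self.consumed:
            return None
        self.consumed = True
        return self.split.path, self.storage[self.split.path]

class GfImageFileInputFormat:
    def configure(self, conf):
        self.conf = dict(conf)

    def is_splitable(self, path):
        return False                  # keep each image whole for a single Mapper

    def get_splits(self, storage, hosts):
        splits = []
        for path, data in storage.items():
            if self.is_splitable(path):
                raise NotImplementedError("block-level splitting is not sketched here")
            splits.append(FileSplit(path, len(data), hosts.get(path, [])))
        return splits

    def get_record_reader(self, split, storage):
        return GfImageFileRecordReader(split, storage)

# Hypothetical image files and the slave nodes holding them.
storage = {"/gf2/scene_001.tif": b"II*\x00 scene one", "/gf2/scene_002.tif": b"II*\x00 scene two"}
hosts = {"/gf2/scene_001.tif": ["slave-1", "slave-2"], "/gf2/scene_002.tif": ["slave-3"]}

fmt = GfImageFileInputFormat()
fmt.configure({})
records = []
for split in fmt.get_splits(storage, hosts):       # one split -> one Mapper
    reader = fmt.get_record_reader(split, storage)
    while (record := reader.next()) is not None:
        records.append(record)
print([name for name, _ in records])
```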
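The master-node behavior in claims 6 and 7 (namespace metadata, heartbeats carrying block reports, and the replica threshold check) can be sketched as a single in-memory class. All names, the 8-byte block size, the 10-second heartbeat timeout and the threshold of 3 are hypothetical illustration values. A real HDFS NameNode persists its namespace image and edit log to disk and schedules actual block copies, which this sketch only records.

```python
import itertools

class MasterNode:
    """Toy master node: namespace metadata, heartbeats, block reports, replica checks."""

    def __init__(self, replica_threshold=3, heartbeat_timeout=10.0, block_size=8):
        self.dirs = {"/"}              # file directory metadata
        self.files = {}                # path -> list of block ids (file block metadata)
        self.block_map = {}            # block id -> set of slaves (location metadata)
        self.edit_log = []             # namespace changes, appended in order
        self.last_heartbeat = {}       # slave -> time of its last heartbeat
        self.replica_threshold = replica_threshold
        self.heartbeat_timeout = heartbeat_timeout
        self.block_size = block_size
        self._ids = itertools.count()

    # Client requests (claim 6)
    def mkdir(self, path):
        self.dirs.add(path)
        self.edit_log.append(("mkdir", path))

    def create(self, path, size):
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.dirs:
            raise FileNotFoundError(parent)
        n_blocks = -(-size // self.block_size)       # ceiling division
        self.files[path] = [next(self._ids) for _ in range(n_blocks)]
        self.edit_log.append(("create", path))
        return self.files[path]

    def delete(self, path):
        for blk in self.files.pop(path):
            self.block_map.pop(blk, None)
        self.edit_log.append(("delete", path))

    # Heartbeats and block reports (S2.2)
    def heartbeat(self, slave, now, block_report=()):
        self.last_heartbeat[slave] = now
        for blk in block_report:
            self.block_map.setdefault(blk, set()).add(slave)

    def live_slaves(self, now):
        return {s for s, t in self.last_heartbeat.items() if now - t <= self.heartbeat_timeout}

    # Replica threshold check (S2.3)
    def under_replicated(self, now):
        live = self.live_slaves(now)
        return {blk: len(s & live) for blk, s in self.block_map.items()
                if len(s & live) < self.replica_threshold}

    def re_replicate(self, now):
        live = sorted(self.live_slaves(now))
        for blk in self.under_replicated(now):
            holders = self.block_map[blk] & set(live)
            if not holders:
                continue                             # no surviving copy to copy from
            for target in live:
                if len(holders) >= self.replica_threshold:
                    break
                holders.add(target)                  # stands in for copying the block
            self.block_map[blk] = holders

m = MasterNode()
m.mkdir("/gf2")
blocks = m.create("/gf2/scene_001.tif", size=20)      # 3 blocks of at most 8 bytes
for s in ["slave-1", "slave-2", "slave-3"]:
    m.heartbeat(s, 0.0, blocks)                       # block reports from three slaves
m.heartbeat("slave-4", 0.0)
for s in ["slave-2", "slave-3", "slave-4"]:
    m.heartbeat(s, 12.0)                              # slave-1 falls silent
print(m.under_replicated(15.0))                       # {0: 2, 1: 2, 2: 2}
m.re_replicate(15.0)
print(m.under_replicated(15.0))                       # {}
```

Once slave-1 stops sending heartbeats, each block is left with two live copies, which is below the threshold. `re_replicate` then adds the remaining live node as a new holder of each block.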