Analysis method, system and electronic device for vector data
By obtaining the fixed-level grid index and morphological parameters of vector data, calculating the optimal grid index level, and performing correlation calculations and deduplication, the problems of computational resource consumption and low efficiency in distributed overlay analysis of vector data are solved, achieving more efficient calculation and data storage.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- GUANGDONG SOUTH DIGITAL TECH
- Filing Date
- 2023-04-06
- Publication Date
- 2026-06-26
AI Technical Summary
Existing technologies suffer from excessive computational resource consumption and low execution efficiency when performing distributed overlay analysis of vector data at different scales. In particular, when the grid index level is inappropriate, it leads to an unbalanced load among computing nodes, affecting overall computing performance and data storage efficiency.
By obtaining the fixed-level grid index and morphological parameters of the vector data to be analyzed, the scale coefficient and the optimal grid index level are calculated. Correlation calculations are performed using the calculated grid index, and cross-grid data is processed using a preset data deduplication strategy. Finally, the data is overlaid to obtain the analysis results.
It achieves load-balanced distributed computing, improves the computing performance of vector data at different scales, shortens the computing time, ensures the efficiency of data entry, and avoids additional pressure on storage and preprocessing.
Description
Technical Field
[0001] This invention relates to the field of data analysis technology, and in particular to a method, system and electronic device for analyzing vector data.
Background Technology
[0002] Traditional Geohash uses a grid index to store vector data. During distributed overlay computation, the data is uniformly divided according to a fixed scale based on its range. The data blocks are then overlaid and computed within the computation nodes according to the relationships between the grid indexes. The computation results from each node are deduplicated and merged to obtain the final result.
[0003] Building upon traditional methods, existing technologies also employ multi-level grid indexes for data preprocessing. During the distributed storage phase of vector data, multiple levels of grid indexes are established. In distributed overlay analysis, the grid indexes are selected based on the higher-level data in the distribution attributes of the vector patches, and then grid segmentation and overlay analysis are performed using traditional methods.
[0004] Traditional technical solutions do not consider the differences in distribution and density between vector datasets. When two vector layers are overlaid, grid division inevitably produces data imbalance. Because the data is distributed among the distributed computing nodes according to the number of grids, uneven data volumes between grids lead to an uneven computational load between nodes. The slowest nodes then bound the whole job (the barrel effect), which reduces overall computing performance.
[0005] In big data environments, using multi-level grid indexes for data preprocessing during vector data storage can, to some extent, solve the problem of superimposed computation of vector data at different scales. During computation, using appropriate grid indexes to partition the data can achieve relative load balancing. However, the degree of load balancing is affected by the level of the multi-level grid index. If the grid index level is too small, data will accumulate in various grids, resulting in low parallelism during computation, inability to fully utilize cluster resources, high computational load for each subtask, and long computation time. Calculations using higher-level grid indexes increase the computational load during data ingestion and consume more storage space, reducing write efficiency. During computation, higher-level grids lead to more computational subtasks. Although each subtask has a smaller and faster computational load, the startup, execution, and destruction of a single computational task increase the performance cost of task scheduling, also slowing down the overall computation.
[0006] In summary, existing technologies still suffer from low execution efficiency and excessive computational resource consumption when performing distributed overlay analysis of vector data at different scales.
Summary of the Invention
[0007] In view of this, the purpose of this invention is to provide a method, system, and electronic device for analyzing vector data. The method produces a more balanced distributed computing load, which improves the distributed computing performance of vector data at different scales, makes fuller use of computing resources, and shortens computing time. Because the computational grid index is derived through a dynamic index pruning algorithm, the method places no additional pressure on data storage and preprocessing, so data entry efficiency is preserved.
[0008] In a first aspect, embodiments of the present invention provide a method for analyzing vector data, the method comprising:
[0009] Obtain the vector data to be analyzed, and determine the fixed-level grid index and morphological parameters corresponding to the vector data;
[0010] Calculate the scale factor and optimal grid index level of vector data using a fixed-level grid index and morphological parameters;
[0011] The grid index corresponding to the optimal grid index level is determined as the computational grid index, and the computational grid index is used to perform correlation calculations on the vector data to obtain overlay data pairs;
[0012] The preset data deduplication strategy is used to deduplicate cross-grid data contained in the overlay data pairs;
[0013] The superimposed data pairs that have undergone deduplication are superimposed to obtain the analysis results of the vector data.
[0014] In some implementations, the steps of acquiring the vector data to be analyzed and determining the multi-level grid index corresponding to the vector data include:
[0015] The overlay vector data is stored using a pre-defined big data cluster, and the overlay vector data is identified as the vector data to be analyzed.
[0016] The geohash algorithm is used to generate a fixed-level index for vector data, and the fixed-level index is used to determine the multi-level grid index corresponding to the vector data.
[0017] In some implementations, the steps of calculating the scale factor of the vector data and the optimal grid index level using a fixed-level grid index and morphological parameters include:
[0018] Obtain the first vector data and the second vector data contained in the vector data, and obtain the vector sample data corresponding to the first vector data and the second vector data respectively through a random sampling method;
[0019] Using the morphological parameters corresponding to the vector sample data, the first scale coefficient corresponding to the first vector data and the second scale coefficient corresponding to the second vector data are determined respectively.
[0020] The maximum value of the first and second scale coefficients is determined as the scale coefficient, and the value of the scale coefficient is used to determine the optimal grid index level corresponding to the fixed-level grid index.
[0021] In some implementations, the step of determining the first scale coefficient corresponding to the first vector data and the second scale coefficient corresponding to the second vector data using the morphological parameters corresponding to the vector sample data includes:
[0022] Obtain the polygon perimeter, polygon area, and number of polygon vertices from the shape parameters of the vector layer;
[0023] Based on the polygon's perimeter, area, and number of vertices, the first and second scale coefficients are calculated using a pre-defined scale coefficient formula; the scale coefficient formula is as follows:
[0024] k = w1*p + w2*n;
[0025] p = c^2 / s;
[0026] p is the shape index; c is the perimeter of the polygon; s is the area of the polygon; n is the number of vertices of the polygon; w1 and w2 are constants, and w1 + w2 = 1.
[0027] In some implementations, the optimal grid index level corresponding to a fixed-level grid index is determined using the numerical value of the scale coefficient, including:
[0028] Obtain the numerical value of the scaling factor;
[0029] The optimal grid index level corresponding to the scale coefficient value is obtained from the fixed-level grid index using a pre-defined correspondence table. In the correspondence table, the optimal grid index level is:
level 0 when the scale coefficient is greater than 231485.15;
level 1 when the scale coefficient is greater than 105772.52 and less than 231485.15;
level 2 when the scale coefficient is greater than 53694.83 and less than 105772.52;
level 3 when the scale coefficient is greater than 31773.26 and less than 53694.83;
level 4 when the scale coefficient is greater than 17662.14 and less than 31173.26;
level 5 when the scale coefficient is greater than 9337.42 and less than 17662.14;
level 6 when the scale coefficient is greater than 5784.61 and less than 9337.42;
level 7 when the scale coefficient is greater than 3177.49 and less than 5784.61;
level 8 when the scale coefficient is greater than 2165.23 and less than 3177.49;
level 9 when the scale coefficient is greater than 1253.57 and less than 2165.23;
level 10 when the scale coefficient is greater than 742.15 and less than 1253.57;
level 11 when the scale coefficient is greater than 519.52 and less than 742.15;
level 12 when the scale coefficient is greater than 405.29 and less than 519.52;
level 13 when the scale coefficient is greater than 312.47 and less than 405.29;
level 14 when the scale coefficient is greater than 231.86 and less than 312.47;
level 15 when the scale coefficient is greater than 167.43 and less than 231.86;
level 16 when the scale coefficient is greater than 113.62 and less than 167.43;
level 17 when the scale coefficient is greater than 78.57 and less than 113.62;
level 18 when the scale coefficient is greater than 57.61 and less than 78.57.
[0030] In some implementations, the step of determining the grid index corresponding to the optimal grid index level as the computational grid index, and using the computational grid index to perform correlation calculations on the vector data to obtain overlay data pairs includes:
[0031] Obtain the grid index corresponding to the optimal grid index level, and determine the grid index as the computation grid index;
[0032] The vector data is divided according to the computational grid index, and the vector data located in the same grid after division are correlated and calculated to obtain superimposed data pairs.
[0033] In some implementations, the step of deduplicating cross-grid data contained in overlay data pairs using a preset data deduplication strategy includes:
[0034] Retrieve the first and second geometric objects contained in cross-grid data;
[0035] Calculate the first bounding rectangle of the first geometric object and the second bounding rectangle of the second geometric object;
[0036] Calculate the intersection of the first and second bounding rectangles, and obtain the top-left vertex of the intersection as a reference point;
[0037] Calculate the grid index of the reference point and determine whether the grid index is consistent with the grid indices of the first and second geometric objects; if so, save the overlay data pair; otherwise, delete the overlay data pair.
[0038] In some implementations, the step of performing overlay calculations on the deduplicated overlay data pairs to obtain the analysis results of the vector data includes:
[0039] The JTS overlay analysis operator is used to perform overlay calculations on the deduplicated overlay data pairs to obtain the first analysis result;
[0040] Iterate through the first analysis results and filter out the empty data contained in the first analysis results to obtain the second analysis results;
[0041] The second analysis result is saved to a preset distributed database to obtain the analysis result of the vector data.
[0042] In a second aspect, embodiments of the present invention provide a vector data analysis system, the system comprising:
[0043] The first analysis module is used to acquire the vector data to be analyzed and determine the fixed-level grid index and morphological parameters corresponding to the vector data.
[0044] The second analysis module is used to calculate the scale factor and optimal grid index level of vector data using a fixed-level grid index and morphological parameters.
[0045] The third analysis module is used to determine the grid index corresponding to the optimal grid index level as the computational grid index, and to use the computational grid index to perform correlation calculations on the vector data to obtain superimposed data pairs.
[0046] The fourth analysis module is used to perform deduplication on cross-grid data contained in the overlay data pairs using a preset data deduplication strategy;
[0047] The fifth analysis module is used to perform superposition calculations on the superimposed data pairs that have undergone deduplication to obtain the analysis results of the vector data.
[0048] In a third aspect, embodiments of the invention also provide an electronic device, including a memory and a processor, wherein the memory stores a computer program that can run on the processor, and wherein the processor, when executing the computer program, implements the steps of the vector data analysis method of the first aspect above.
[0049] In a fourth aspect, embodiments of the present invention also provide a readable storage medium storing a computer program, wherein the computer program, when run by a processor, implements the steps of the vector data analysis method of the first aspect.
[0050] The embodiments of the present invention bring at least the following beneficial effects:
[0051] This invention provides a method, system, and electronic device for analyzing vector data. The method first acquires the vector data to be analyzed and determines the fixed-level grid index and morphological parameters corresponding to the vector data. Then, it calculates the scale coefficient and optimal grid index level of the vector data using the fixed-level grid index and morphological parameters. Next, it determines the grid index corresponding to the optimal grid index level as the computational grid index and uses the computational grid index to perform correlation calculations on the vector data, obtaining overlay data pairs. Then, it uses a preset data deduplication strategy to deduplicate cross-grid data within the overlay data pairs. Finally, it performs overlay calculations on the deduplicated overlay data pairs to obtain the analysis results of the vector data. This method can create a more balanced distributed computing load, thereby greatly improving the distributed computing performance of vector data at different scales, making full use of computing resources, shortening computing time, and, based on a dynamic index pruning algorithm, does not put pressure on data storage and preprocessing, greatly shortening computing time while ensuring data entry efficiency.
[0052] Other features and advantages of the invention will be set forth in the following description; some may be inferred or unambiguously determined from the description, or learned by practicing the techniques described above.
[0053] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, preferred embodiments are described below in detail with reference to the accompanying drawings.
Brief Description of the Drawings
[0054] To more clearly illustrate the specific embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the specific embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.
[0055] Figure 1 is a flowchart of a vector data analysis method provided in an embodiment of the present invention;
[0056] Figure 2 is a flowchart of step S101 of the method, which acquires the vector data to be analyzed and determines the multi-level grid index corresponding to the vector data;
[0057] Figure 3 is a flowchart of step S102 of the method, which calculates the scale coefficient of the vector data and the optimal grid index level using the multi-level grid index;
[0058] Figure 4 is a flowchart of step S302 of the method, which uses the morphological parameters of the vector layer to determine the first scale coefficient corresponding to the first vector data and the second scale coefficient corresponding to the second vector data respectively;
[0059] Figure 5 is a flowchart of determining the optimal grid index level from the numerical value of the scale coefficient in the method;
[0060] Figure 6 is a flowchart of step S103 of the method, which determines the grid index corresponding to the optimal grid index level as the computational grid index and uses the computational grid index to perform correlation calculations on the vector data to obtain the overlay data pairs;
[0061] Figure 7 is a flowchart of step S104 of the method, which uses a preset data deduplication strategy to deduplicate cross-grid data contained in the overlay data pairs;
[0062] Figure 8 is a flowchart of step S105 of the method, which performs overlay calculations on the deduplicated overlay data pairs to obtain the analysis results of the vector data;
[0063] Figure 9 is a flowchart of another vector data analysis method provided in an embodiment of the present invention;
[0064] Figure 10 is a schematic diagram of the two-level grid index used to partition vector data in the method;
[0065] Figure 11 is a schematic diagram of a cross-grid geometric object in the method;
[0066] Figure 12 is a schematic diagram of the structure of a vector data analysis system provided in an embodiment of the present invention;
[0067] Figure 13 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present invention.
[0068] Reference numerals:
[0069] 1210 - First Analysis Module; 1220 - Second Analysis Module; 1230 - Third Analysis Module; 1240 - Fourth Analysis Module; 1250 - Fifth Analysis Module;
[0070] 101 - Processor; 102 - Memory; 103 - Bus; 104 - Communication interface.
Detailed Implementation
[0071] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0072] Traditional Geohash uses a grid index for vector data storage. During distributed overlay computation, the data is uniformly divided according to a fixed scale based on its range. The divided data is then overlaid within computation nodes according to the relationships between the grid indices. The computation results from each node are deduplicated and merged to obtain the final result. Building upon this traditional method, current technologies also employ multi-level grid indexes for data preprocessing. Multiple levels of grid indexes are established for the distributed storage of vector data. During distributed overlay analysis, the grid index is selected based on the higher-level data in the distribution attributes of the vector patches, and then grid partitioning and overlay analysis are performed using traditional methods.
[0073] Traditional technical solutions fail to consider the differences in distribution and density between vector datasets. When two vector layers are overlaid, data imbalance during grid partitioning is inevitable. Because data is distributed across the distributed computing nodes according to the number of grids, uneven data volumes between grids lead to uneven computational loads among nodes. The slowest nodes then bound the whole job (a "weakest link" effect), reducing overall computational performance. In big data environments, using multi-level grid indexes for data preprocessing during vector data storage can, to some extent, solve the problem of overlaying vector data at different scales, and partitioning the data with an appropriate grid index during computation can achieve relative load balancing. However, the degree of balance depends on the grid index level. If the level is too low, data accumulates within each grid, resulting in low parallelism, underused cluster resources, a high computational load per subtask, and long computation time. Building higher-level grid indexes increases the computational load when data is written to the database and consumes more storage space, which reduces write efficiency. During computation, higher-level grids also produce more computational subtasks. Although each subtask is smaller and faster, the cost of starting, running, and destroying every task adds task-scheduling overhead and slows down the overall computation.
[0074] In summary, existing technologies for distributed overlay analysis of vector data at different scales still suffer from low execution efficiency and excessive computational resource consumption. Therefore, this invention provides a vector data analysis method, system, and electronic device. This method can create a more balanced distributed computing load, thereby significantly improving the distributed computing performance of vector data at different scales, fully utilizing computing resources, shortening computation time, and, based on a dynamic index pruning algorithm, not putting pressure on data storage and preprocessing. This greatly reduces computation time while ensuring efficient data entry.
[0075] To facilitate understanding of this embodiment, a method for analyzing vector data disclosed in this invention is first described in detail. As shown in Figure 1, the method includes:
[0076] Step S101: Obtain the vector data to be analyzed and determine the fixed-level grid index and morphological parameters corresponding to the vector data.
[0077] This step involves acquiring the vector data to be analyzed through methods such as data import into a database, and then using the vector data to calculate a multi-level grid index with a fixed hierarchy. In practice, the number of grid index levels should not exceed 18, as computational performance degrades significantly after 18 levels. The morphological parameters are the geometric parameters of the vector data, mainly including perimeter, area, and number of vertices.
[0078] Step S102: Calculate the scale factor and optimal grid index level of the vector data using a fixed-level grid index and morphological parameters.
[0079] The vector layers contained in the vector data are obtained through the fixed-level grid index. The morphological parameters are then used to analyze the scale and distribution of the vector data in those layers, which yields the corresponding scale coefficients. The optimal grid index level is then determined from the scale coefficients.
[0080] Step S103: Determine the grid index corresponding to the optimal grid index level as the computational grid index, and use the computational grid index to perform correlation calculations on the vector data to obtain the superimposed data pairs.
[0081] After obtaining the optimal grid index level, the specific grid index is determined using the grid index value corresponding to the optimal grid index level, and this is recorded as the computational grid index. Then, the vector data is partitioned according to the computational grid index, and the partitioned vector data is correlated and calculated to obtain overlay data pairs.
[0082] Step S104: Use a preset data deduplication strategy to deduplicate cross-grid data contained in the overlay data pair.
[0083] When dividing data by grid index, there may be cases where geometric objects cross grids. A preset data deduplication strategy can be used to delete cross-grid data in the overlay data, thereby deduplicating the cross-grid data.
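A minimal sketch of one such strategy follows, based on the reference-point approach stated in the summary (bounding rectangles, their intersection, and the top-left vertex of the intersection). It is an illustration under stated assumptions, not the patented implementation: a uniform square grid of size `CELL` stands in for the computational grid index, and the names `cell_of`, `bbox_intersection`, and `keep_pair` are hypothetical. Dropping pairs whose bounding rectangles do not intersect is also an assumption, since the text does not cover that case.

```python
import math

CELL = 1.0  # hypothetical uniform cell size standing in for the computational grid


def cell_of(x, y):
    """Return the grid cell (column, row) that contains point (x, y)."""
    return (math.floor(x / CELL), math.floor(y / CELL))


def bbox_intersection(a, b):
    """Intersect two boxes given as (min_x, min_y, max_x, max_y); None if disjoint."""
    min_x, min_y = max(a[0], b[0]), max(a[1], b[1])
    max_x, max_y = min(a[2], b[2]), min(a[3], b[3])
    if min_x > max_x or min_y > max_y:
        return None
    return (min_x, min_y, max_x, max_y)


def keep_pair(cell, box_a, box_b):
    """Keep a pair only in the cell that holds the reference point.

    The reference point is the top-left vertex of the intersection of the two
    bounding rectangles, so exactly one cell keeps each cross-grid pair.
    """
    inter = bbox_intersection(box_a, box_b)
    if inter is None:
        return False  # assumption: disjoint bounding rectangles cannot overlap
    ref_x, ref_y = inter[0], inter[3]  # top-left vertex: (min_x, max_y)
    return cell_of(ref_x, ref_y) == cell
```

With two rectangles that both straddle cells (0, 0) and (1, 0), the same pair is produced in both cells, but only the cell containing the reference point keeps it, so no cross-node merge is needed to remove the duplicate.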
[0084] Step S105: Perform superposition calculations on the superimposed data pairs that have completed deduplication to obtain the analysis results of the vector data.
[0085] In some implementations, step S101, which acquires the vector data to be analyzed and determines the multi-level grid index corresponding to the vector data, as shown in Figure 2, includes:
[0086] Step S201: Store the overlay vector data using a preset big data cluster, and determine the overlay vector data as the vector data to be analyzed;
[0087] Step S202: Use the geohash algorithm to generate a fixed-level index for the vector data, and use the fixed-level index to determine the multi-level grid index corresponding to the vector data.
[0088] In practice, the overlay vectors include two data sets, A and B. These two data sets can be stored using a large data cluster. During storage, a fixed-level grid index is generated using the geohash algorithm, with a maximum depth of 18 levels, and this index is inserted as a separate column. This grid index supports calculations at any level; that is, grid indexes with any level less than 18 can be obtained through dynamic pruning techniques.
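The pruning idea can be sketched as follows, assuming that a grid index level corresponds to the number of base-32 characters in a standard geohash string, so that a coarser index is simply a prefix of the stored 18-character index. This mapping of "level" to prefix length is an assumption, as is encoding a single point rather than a polygon. The function names `geohash_encode` and `prune` are hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet
MAX_LEVEL = 18  # deepest level stored at ingestion, per the text


def geohash_encode(lat, lon, length=MAX_LEVEL):
    """Encode a point as a geohash string of `length` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, value = 0, 0
    use_lon = True  # bits are interleaved, starting with longitude
    while len(chars) < length:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def prune(index, level):
    """Derive the coarser level-`level` index from a stored index by truncation."""
    if not 0 <= level <= len(index):
        raise ValueError("level out of range for this index")
    return index[:level]
```

Because truncation needs only the stored column, no additional index has to be built or written at ingestion time, which matches the stated goal of not adding storage or preprocessing pressure.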
[0089] In some implementations, step S102, which calculates the scale coefficient of the vector data and the optimal grid index level using the fixed-level grid index and morphological parameters, as shown in Figure 3, includes:
[0090] Step S301: Obtain the first vector data and the second vector data contained in the vector data, and obtain the vector sample data corresponding to the first vector data and the second vector data respectively by random sampling method;
[0091] Step S302: Using the morphological parameters corresponding to the vector sample data, determine the first scale coefficient corresponding to the first vector data and the second scale coefficient corresponding to the second vector data respectively.
[0092] Step S303: Determine the maximum value of the first scale coefficient and the second scale coefficient as the scale coefficient, and use the value of the scale coefficient to determine the optimal grid index level corresponding to the fixed-level grid index.
[0093] In real-world scenarios, Spark can be used to load vector data A and B, and the two vector layers A and B can be analyzed for vector data scale and distribution through random sampling to obtain the scale coefficients k1 and k2 of the two vector data.
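Steps S301 to S303 can be sketched in plain Python as follows (the text uses Spark, which is omitted here to keep the example self-contained). The text does not say how the coefficients of the sampled features are combined into one coefficient per layer, so taking their mean is an assumption. The names `layer_scale_coefficient` and `overall_scale_coefficient`, the `coeff_fn` callable, and the default sample size are hypothetical.

```python
import random
from statistics import mean


def layer_scale_coefficient(features, coeff_fn, sample_size, rng):
    """Estimate a layer's scale coefficient from a random sample of its features.

    Assumption: the layer coefficient is the mean of the sampled features'
    coefficients; the source does not specify the aggregation.
    """
    sample = rng.sample(features, min(sample_size, len(features)))
    return mean(coeff_fn(feature) for feature in sample)


def overall_scale_coefficient(layer_a, layer_b, coeff_fn, sample_size=100, seed=0):
    """Return max(k1, k2), the scale coefficient used to pick the grid level."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    k1 = layer_scale_coefficient(layer_a, coeff_fn, sample_size, rng)
    k2 = layer_scale_coefficient(layer_b, coeff_fn, sample_size, rng)
    return max(k1, k2)
```

Sampling keeps the cost of this analysis small relative to the overlay itself, since only a subset of each layer's features is measured.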
[0094] In some implementations, step S302, which determines the first scale coefficient corresponding to the first vector data and the second scale coefficient corresponding to the second vector data using the morphological parameters corresponding to the vector sample data, as shown in Figure 4, includes:
[0095] Step S401: Obtain the polygon perimeter, polygon area, and number of polygon vertices included in the shape parameters of the vector layer;
[0096] Step S402: Based on the perimeter of the polygon, the area of the polygon, and the number of vertices of the polygon, calculate the first scale coefficient and the second scale coefficient respectively using the preset scale coefficient formula.
[0097] The scaling factor is calculated as follows:
[0098] k = w1*p + w2*n;
[0099] p = c^2 / s;
[0100] p is the shape index; c is the perimeter of the polygon; s is the area of the polygon; n is the number of vertices of the polygon; w1 and w2 are constants, and w1 + w2 = 1.
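As a worked illustration of the formula, the following sketch computes k for a simple polygon given as a list of vertex coordinates. The weights w1 = w2 = 0.5 are placeholder values chosen only to satisfy w1 + w2 = 1, since the text does not give the constants. The function names `polygon_metrics` and `scale_coefficient` are hypothetical, and the shoelace formula for the area is a standard choice rather than something the source specifies.

```python
import math


def polygon_metrics(ring):
    """Return (perimeter c, area s, vertex count n) of a simple polygon.

    `ring` is a list of (x, y) vertices without a repeated closing vertex.
    """
    n = len(ring)
    perimeter = 0.0
    twice_area = 0.0
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        perimeter += math.hypot(x2 - x1, y2 - y1)
        twice_area += x1 * y2 - x2 * y1  # shoelace formula term
    return perimeter, abs(twice_area) / 2.0, n


def scale_coefficient(ring, w1=0.5, w2=0.5):
    """Compute k = w1*p + w2*n with shape index p = c^2 / s."""
    if not math.isclose(w1 + w2, 1.0):
        raise ValueError("w1 + w2 must equal 1")
    c, s, n = polygon_metrics(ring)
    if s == 0:
        raise ValueError("degenerate polygon with zero area")
    p = c ** 2 / s
    return w1 * p + w2 * n
```

For a unit square, c = 4, s = 1, and n = 4, so p = 16 and, with the placeholder weights, k = 0.5 * 16 + 0.5 * 4 = 10.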
[0101] In some implementations, determining the optimal grid index level corresponding to the fixed-level grid index from the numerical value of the scale coefficient, as shown in Figure 5, includes:
[0102] Step S501: Obtain the value of the scaling factor;
[0103] Step S502: Use a preset correspondence table to obtain the optimal grid index level corresponding to the scale coefficient value from the fixed-level grid index;
[0104] In the corresponding relationship table, when the scale coefficient is greater than 231485.15, the corresponding optimal grid index level is level 0.
[0105] When the scale factor is greater than 105772.52 and less than 231485.15, the corresponding optimal grid index level is level 1.
[0106] When the scale factor is greater than 53694.83 and less than 105772.52, the corresponding optimal grid index level is the second level.
[0107] When the scale factor is greater than 31773.26 and less than 53694.83, the corresponding optimal grid index level is the 3rd level.
[0108] When the scale factor is greater than 17662.14 and less than 31173.26, the corresponding optimal grid index level is the 4th level.
[0109] When the scale factor is greater than 9337.42 and less than 17662.14, the corresponding optimal grid index level is the 5th level.
[0110] When the scale factor is greater than 5784.61 and less than 9337.42, the corresponding optimal grid index level is the 6th level.
[0111] When the scale factor is greater than 3177.49 and less than 5784.61, the corresponding optimal grid index level is the 7th level.
[0112] When the scale factor is greater than 2165.23 and less than 3177.49, the corresponding optimal grid index level is the 8th level.
[0113] When the scale factor is greater than 1253.57 and less than 2165.23, the corresponding optimal grid index level is the 9th level.
[0114] When the scale factor is greater than 742.15 and less than 1253.57, the corresponding optimal grid index level is the 10th level.
[0115] When the scale factor is greater than 519.52 and less than 742.15, the corresponding optimal grid index level is the 11th level.
[0116] When the scale factor is greater than 405.29 and less than 519.52, the corresponding optimal grid index level is the 12th level.
[0117] When the scale factor is greater than 312.47 and less than 405.29, the corresponding optimal grid index level is the 13th level.
[0118] When the scale factor is greater than 231.86 and less than 312.47, the corresponding optimal grid index level is the 14th level.
[0119] When the scale factor is greater than 167.43 and less than 231.86, the corresponding optimal grid index level is the 15th level.
[0120] When the scale factor is greater than 113.62 and less than 167.43, the corresponding optimal grid index level is the 16th level.
[0121] When the scale factor is greater than 78.57 and less than 113.62, the corresponding optimal grid index level is the 17th level.
[0122] When the scale factor is greater than 57.61 and less than 78.57, the corresponding optimal grid index level is the 18th level.
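The correspondence table above can be sketched as a lookup over the lower bounds of each interval. This is an illustrative sketch rather than the patented implementation: the function name is hypothetical, the treatment of values that fall exactly on a bound or below 57.61 is an assumption (the table does not specify them), and because only lower bounds are used, values between 31173.26 and 31773.26 are assigned to level 4.

```python
# Lower bound of each level's interval, taken from the correspondence table.
# Index in the list == grid index level (0..18).
LOWER_BOUNDS = [
    231485.15, 105772.52, 53694.83, 31773.26, 17662.14,
    9337.42, 5784.61, 3177.49, 2165.23, 1253.57,
    742.15, 519.52, 405.29, 312.47, 231.86,
    167.43, 113.62, 78.57, 57.61,
]


def optimal_level(k):
    """Return the first level whose lower bound is strictly below k.

    Assumptions: a value equal to a bound falls into the next (finer) level,
    and values not above 57.61 return None because the table does not cover them.
    """
    for level, bound in enumerate(LOWER_BOUNDS):
        if k > bound:
            return level
    return None
```

For example, a scale coefficient of 1000.0 lies between 742.15 and 1253.57 and therefore maps to level 10.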
[0123] After the optimal grid index level corresponding to the scale coefficient value is obtained from the preset correspondence table, in some implementations step S103, in which the grid index corresponding to the optimal grid index level is determined as the computational grid index and the computational grid index is used to perform correlation calculations on the vector data to obtain overlay data pairs, is shown in Figure 6 and includes:
[0124] Step S601: Obtain the grid index corresponding to the optimal grid index level, and determine the grid index as the computation grid index;
[0125] Step S602: Divide the vector data according to the computational grid index, and perform correlation calculation on the vector data located in the same grid after division to obtain superimposed data pairs.
[0126] In specific scenarios, the index-based dynamic pruning technique can be used to calculate an m-level grid index. Vector data A and B are then partitioned according to this m-level grid index, and the vector data A and B located within the same grid after partitioning are correlated to obtain overlay data pairs. For details, see Figure 10, which shows a two-level grid index partitioning vector data.
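The partition-and-associate step can be sketched as follows. This is a simplified illustration under stated assumptions: a uniform square grid with a hypothetical `cell_size` stands in for the m-level Geohash grid, each geometric object is represented only by its bounding rectangle `(min_x, min_y, max_x, max_y)`, and all function names are invented for the example.

```python
import math
from collections import defaultdict
from itertools import product


def cells_for_bbox(bbox, cell_size):
    """All grid cells (col, row) that a bounding rectangle touches."""
    min_x, min_y, max_x, max_y = bbox
    cols = range(math.floor(min_x / cell_size), math.floor(max_x / cell_size) + 1)
    rows = range(math.floor(min_y / cell_size), math.floor(max_y / cell_size) + 1)
    return list(product(cols, rows))


def partition(layer, cell_size):
    """Map each cell to the ids of the objects stored in it.

    An object that crosses cell boundaries is recorded once per cell.
    """
    grid = defaultdict(list)
    for obj_id, bbox in layer.items():
        for cell in cells_for_bbox(bbox, cell_size):
            grid[cell].append(obj_id)
    return grid


def candidate_pairs(layer_a, layer_b, cell_size):
    """Pair objects of A and B that fall into the same cell."""
    grid_a = partition(layer_a, cell_size)
    grid_b = partition(layer_b, cell_size)
    pairs = []
    for cell in sorted(grid_a.keys() & grid_b.keys()):
        for a_id, b_id in product(grid_a[cell], grid_b[cell]):
            pairs.append((cell, a_id, b_id))
    return pairs


layer_a = {"a": (0.5, 0.5, 1.5, 1.5)}
layer_b = {"b": (0.8, 0.2, 1.2, 1.2), "far": (5.0, 5.0, 5.5, 5.5)}
pairs = candidate_pairs(layer_a, layer_b, cell_size=1.0)
```

Because a and b both span four cells, the pair (a, b) is produced four times, once per shared cell, which is why a deduplication step is needed afterwards.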
[0127] In some implementations, step S104, in which a preset data deduplication strategy is used to deduplicate the cross-grid data contained in the overlay data pairs, is shown in Figure 7 and includes:
[0128] Step S701: Obtain the first and second geometric objects contained in the cross-grid data;
[0129] Step S702: Calculate the first outer rectangle of the first geometric object and the second outer rectangle of the second geometric object;
[0130] Step S703: Calculate the intersection of the first outer rectangle and the second outer rectangle, and obtain the upper left corner vertex of the intersection as a reference point;
[0131] Step S704: Calculate the grid index of the reference point and determine whether the grid index is consistent with the grid index corresponding to the first geometric object and the second geometric object; if yes, save the overlay data pair; if no, delete the overlay data pair.
[0132] As shown in Figure 10 and Figure 11, when data is divided by grid index, some geometric objects cross grid boundaries (e.g., geometric objects a and b in Figure 10). A geometric object that spans multiple grids is stored as a record in each of those grids. Consequently, data association may produce duplicate overlay data pairs (e.g., the overlay data pair of a and b exists in grids 0, 1, 2, and 3), and these duplicates must be removed.
[0133] The specific operation steps are as follows (each overlay data pair is processed during the data association process):
[0134] Calculate the bounding rectangles R1 and R2 of geometric objects a and b respectively; calculate the intersection of R1 and R2, and take the top-left vertex of the intersection as the reference point t; calculate the grid index i of the reference point t; determine whether the grid index i is consistent with the grid index of geometric objects a and b. If they are consistent, save the overlay data pair; otherwise, discard it. This completes the deduplication operation for cross-grid data.
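The reference-point rule described above can be sketched as a predicate that each grid evaluates for its own copy of a pair. The sketch assumes a uniform square grid with a hypothetical `cell_size`, bounding rectangles given as `(min_x, min_y, max_x, max_y)`, and a coordinate system in which y increases upward, so the top-left vertex is `(min_x, max_y)`; the function names are illustrative.

```python
import math


def bbox_intersection(r1, r2):
    """Intersection of two bounding rectangles, or None if they are disjoint."""
    min_x, min_y = max(r1[0], r2[0]), max(r1[1], r2[1])
    max_x, max_y = min(r1[2], r2[2]), min(r1[3], r2[3])
    if min_x > max_x or min_y > max_y:
        return None
    return (min_x, min_y, max_x, max_y)


def cell_of(x, y, cell_size):
    """Grid index (col, row) of a point."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def keep_pair(r1, r2, current_cell, cell_size):
    """Keep the pair only in the cell that contains the reference point t."""
    inter = bbox_intersection(r1, r2)
    if inter is None:
        return False
    ref_x, ref_y = inter[0], inter[3]  # top-left vertex, assuming y grows upward
    return cell_of(ref_x, ref_y, cell_size) == current_cell


R1 = (0.5, 0.5, 1.5, 1.5)  # bounding rectangle of object a
R2 = (0.8, 0.2, 1.2, 1.2)  # bounding rectangle of object b
shared_cells = [(0, 0), (0, 1), (1, 0), (1, 1)]  # the pair occurs in all four
kept = [c for c in shared_cells if keep_pair(R1, R2, c, cell_size=1.0)]
```

Since every copy of the pair computes the same reference point, exactly one grid keeps it, and no communication between computing nodes is needed to agree on which one.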
[0135] In some implementations, step S105, in which overlay calculations are performed on the deduplicated overlay data pairs to obtain the analysis results of the vector data, is shown in Figure 8 and includes:
[0136] Step S801: Use the JTS overlay analysis operator to perform overlay calculations on the deduplicated overlay data pairs, and obtain the first analysis result;
[0137] Step S802: Traverse the first analysis results and filter out the empty data contained in the first analysis results to obtain the second analysis results;
[0138] Step S803: Save the second analysis result to a preset distributed database to obtain the analysis result of the vector data.
[0139] The JTS overlay analysis operator is used to overlay the grouped and deduplicated data, while filtering out records with empty calculation results, and storing the results in a distributed database, thus completing the vector data analysis process.
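The three-stage flow of steps S801 to S803 can be sketched as below. This is only a structural illustration: the rectangle intersection is a stand-in for the JTS overlay analysis operator (JTS itself is a Java library and is not called here), a plain list stands in for the distributed database, and all names are hypothetical.

```python
def rectangle_overlay(r1, r2):
    """Stand-in for the overlay operator: rectangle intersection.

    Rectangles are (min_x, min_y, max_x, max_y). Returns None when the
    result is empty (no overlap, or overlap with zero area).
    """
    min_x, min_y = max(r1[0], r2[0]), max(r1[1], r2[1])
    max_x, max_y = min(r1[2], r2[2]), min(r1[3], r2[3])
    if min_x >= max_x or min_y >= max_y:
        return None
    return (min_x, min_y, max_x, max_y)


def overlay_and_store(pairs, layer_a, layer_b, store):
    """S801: overlay every pair; S802: drop empty results; S803: save."""
    first_result = [
        (a_id, b_id, rectangle_overlay(layer_a[a_id], layer_b[b_id]))
        for a_id, b_id in pairs
    ]
    second_result = [row for row in first_result if row[2] is not None]
    store.extend(second_result)  # a list stands in for the distributed database
    return second_result


layer_a = {"a": (0.0, 0.0, 2.0, 2.0)}
layer_b = {"b1": (1.0, 1.0, 3.0, 3.0), "b2": (5.0, 5.0, 6.0, 6.0)}
database = []
saved = overlay_and_store([("a", "b1"), ("a", "b2")], layer_a, layer_b, database)
```

Here the pair (a, b2) yields an empty result and is filtered out, so only the overlap of a and b1 is saved.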
[0140] Figure 9 shows another method for analyzing vector data: first, the vector data is entered into the database and an 18-level grid index is calculated; then the scale coefficient k and the optimal grid index level m of the vector data are calculated; next, the m-level grid index is calculated and the data is associated; subsequently, cross-grid data is deduplicated; and finally, the overlay calculation is performed and the analysis results are saved.
[0141] As can be seen from the above embodiments, this vector data analysis method produces a more balanced distributed computing load, which greatly improves the distributed computing performance for vector data at different scales, makes full use of computing resources, and shortens computing time. Because the method relies on the index-based dynamic pruning algorithm, it places no additional pressure on data storage or preprocessing, so computing time is shortened while data entry efficiency is preserved.
[0142] Corresponding to the above method embodiments, this invention provides a vector data analysis system. As shown in Figure 12, the system includes:
[0143] The first analysis module 1210 is used to acquire the vector data to be analyzed and determine the fixed-level grid index and morphological parameters corresponding to the vector data.
[0144] The second analysis module 1220 uses a fixed-level grid index and morphological parameters to calculate the scale factor and the optimal grid index level of the vector data.
[0145] The third analysis module 1230 is used to determine the grid index corresponding to the optimal grid index level as the computational grid index, and to use the computational grid index to perform correlation calculations on the vector data to obtain superimposed data pairs.
[0146] The fourth analysis module 1240 is used to perform deduplication on cross-grid data contained in the overlay data pairs using a preset data deduplication strategy;
[0147] The fifth analysis module 1250 is used to perform superposition calculations on the superimposed data pairs that have undergone deduplication to obtain the analysis results of the vector data.
[0148] In some implementations, the first analysis module 1210 is used to: store the superimposed vector data using a preset big data cluster, and determine the superimposed vector data as the vector data to be analyzed; generate a fixed-level index of the vector data using the geohash algorithm, and determine the multi-level grid index corresponding to the vector data using the fixed-level index.
[0149] In some implementations, the second analysis module 1220 is used to: acquire first vector data and second vector data contained in the vector data; acquire vector sample data corresponding to the first vector data and second vector data respectively through random sampling method; determine the first scale coefficient corresponding to the first vector data and the second scale coefficient corresponding to the second vector data respectively using the morphological parameters corresponding to the vector sample data; determine the maximum value of the first scale coefficient and the second scale coefficient as the scale coefficient, and determine the optimal grid index level corresponding to the fixed-level grid index using the value of the scale coefficient.
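The sampling and maximum-selection logic can be sketched as follows. The source does not state how the morphological parameters of the sampled features are combined into one coefficient per data set, so the mean of the per-polygon coefficients used here is an assumption, as are the sample size, the seed, and the function names.

```python
import random
from statistics import mean


def layer_scale_coefficient(polygon_coefficients, sample_size, rng):
    """Estimate one layer's scale coefficient from a random sample.

    Averaging the sampled per-polygon coefficients is an assumed aggregation.
    """
    size = min(sample_size, len(polygon_coefficients))
    sample = rng.sample(polygon_coefficients, size)
    return mean(sample)


def overall_scale_coefficient(coeffs_a, coeffs_b, sample_size=100, seed=0):
    """Take the larger of the two per-layer coefficients, as described above."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    k1 = layer_scale_coefficient(coeffs_a, sample_size, rng)
    k2 = layer_scale_coefficient(coeffs_b, sample_size, rng)
    return max(k1, k2)


# Per-polygon coefficients would come from the scale coefficient formula.
k = overall_scale_coefficient([10.0] * 5, [20.0] * 5)
```

Sampling keeps the cost of estimating the coefficient small relative to the size of the full data sets.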
[0150] In some implementations, the second analysis module 1220, while determining the first scale coefficient corresponding to the first vector data and the second scale coefficient corresponding to the second vector data using the shape parameters of the vector layer, is further configured to: obtain the polygon perimeter, polygon area, and number of polygon vertices contained in the shape parameters of the vector layer; and calculate the first scale coefficient and the second scale coefficient respectively using a preset scale coefficient formula based on the polygon perimeter, polygon area, and number of polygon vertices; wherein the scale coefficient formula is:
[0151] k = w1*p + w2*n;
[0152] $p = c^2 / s$;
[0153] p is the shape index; c is the perimeter of the polygon; s is the area of the polygon; n is the number of vertices of the polygon; w1 and w2 are constants, and w1 + w2 = 1.
[0154] In some implementations, the second analysis module 1220, in determining the optimal grid index level corresponding to the fixed-level grid index using the scale coefficient value, is further configured to: obtain the scale coefficient value; and obtain the optimal grid index level corresponding to the scale coefficient value from the fixed-level grid index using a preset correspondence table. In the correspondence table, when the scale coefficient is greater than 231485.15, the corresponding optimal grid index level is level 0; when the scale coefficient is greater than 105772.52 and less than 231485.15, the corresponding optimal grid index level is level 1; when the scale coefficient is greater than 53694.83 and less than 105772.52, the corresponding optimal grid index level is level 2; when the scale coefficient is greater than 31773.26 and less than 53694.83, the corresponding optimal grid index level is level 3; when the scale coefficient is greater than 17662.14 and less than 31173.26, the corresponding optimal grid index level is level 4; when the scale coefficient is greater than 9337.42 and less than 17662.14, the corresponding optimal grid index level is level 5; when the scale coefficient is greater than 5784.61 and less than 9337.42, the corresponding optimal grid index level is level 6; when the scale coefficient is greater than 3177.49 and less than 5784.61, the corresponding optimal grid index level is level 7; when the scale coefficient is greater than 2165.23 and less than 3177.49, the corresponding optimal grid index level is level 8; when the scale coefficient is greater than 1253.57 and less than 2165.23, the corresponding optimal grid index level is level 9; when the scale coefficient is greater than 742.15 and less than 1253.57, the corresponding optimal grid index level is level 10; when the scale coefficient is greater than 519.52 and less than 742.15, the corresponding optimal grid index level is level 11; when the scale coefficient is greater than 405.29 and less than 519.52, the corresponding optimal grid index level is level 12; when the scale coefficient is greater than 312.47 and less than 405.29, the corresponding optimal grid index level is level 13; when the scale coefficient is greater than 231.86 and less than 312.47, the corresponding optimal grid index level is level 14; when the scale coefficient is greater than 167.43 and less than 231.86, the corresponding optimal grid index level is level 15; when the scale coefficient is greater than 113.62 and less than 167.43, the corresponding optimal grid index level is level 16; when the scale coefficient is greater than 78.57 and less than 113.62, the corresponding optimal grid index level is level 17; and when the scale coefficient is greater than 57.61 and less than 78.57, the corresponding optimal grid index level is level 18.
[0155] In some implementations, the third analysis module 1230 is used to: obtain the grid index corresponding to the optimal grid index level and determine the grid index as the computational grid index; divide the vector data according to the computational grid index and perform correlation calculation on the vector data located in the same grid after division to obtain superimposed data pairs.
[0156] In some implementations, the fourth analysis module 1240 is used to: acquire a first geometric object and a second geometric object contained in the cross-grid data; calculate a first outer rectangle of the first geometric object and a second outer rectangle of the second geometric object; calculate the intersection of the first outer rectangle and the second outer rectangle, and acquire the upper left vertex of the intersection as a reference point; calculate the grid index of the reference point, and determine whether the grid index is consistent with the grid index corresponding to the first geometric object and the second geometric object; if yes, save the overlay data pair; if no, delete the overlay data pair.
[0157] In some implementations, the fifth analysis module 1250 is used to: perform superposition calculations on the superimposed data that has undergone deduplication using the JTS superposition analysis operator to obtain a first analysis result; traverse the first analysis result and filter out the empty data contained in the first analysis result to obtain a second analysis result; save the second analysis result to a preset distributed database to obtain the analysis result of the vector data.
[0158] As can be seen from the above embodiments, this vector data analysis system produces a more balanced distributed computing load, which greatly improves the distributed computing performance for vector data at different scales, makes full use of computing resources, and shortens computing time. Because the system relies on the index-based dynamic pruning algorithm, it places no additional pressure on data storage or preprocessing, so computing time is shortened while data entry efficiency is preserved.
[0159] The vector data analysis system provided in this embodiment of the invention has the same technical features as the vector data analysis method provided in the above embodiments, and therefore can solve the same technical problems and achieve the same technical effects. For the sake of brevity, for any parts not mentioned in this embodiment, refer to the corresponding content in the foregoing vector data analysis method embodiments.
[0160] This embodiment also provides an electronic device. As shown in the structural schematic diagram of Figure 13, the device includes a processor 101 and a memory 102; the memory 102 is used to store one or more computer instructions, which are executed by the processor 101 to implement the above-mentioned vector data analysis method.
[0161] The electronic device shown in Figure 13 also includes a bus 103 and a communication interface 104, with the processor 101, communication interface 104 and memory 102 connected via the bus 103.
[0162] The memory 102 may include high-speed random access memory (RAM), and may also include non-volatile memory, such as at least one disk storage device. The bus 103 may be an ISA bus, PCI bus, or EISA bus, etc. The bus can be divided into an address bus, a data bus, a control bus, etc. For ease of representation, the bus is drawn in Figure 13 as a single double-headed arrow, but this does not mean that there is only one bus or one type of bus.
[0163] The communication interface 104 is used to connect to at least one user terminal and other network units through a network interface, and to send encapsulated IPv4 packets to the user terminal through the network interface.
[0164] Processor 101 may be an integrated circuit chip with signal processing capabilities. In implementation, each step of the above method can be completed by the integrated logic circuitry in the hardware of processor 101 or by instructions in software form. The processor 101 can be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), etc.; it can also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this disclosure. The general-purpose processor can be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this disclosure can be directly manifested as execution by a hardware decoding processor, or execution by a combination of hardware and software modules in the decoding processor. The software module can reside in a mature storage medium in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. This storage medium is located in memory 102. The processor 101 reads the information in memory 102 and, in conjunction with its hardware, completes the steps of the method described in the foregoing embodiments.
[0165] This invention also provides a readable storage medium storing a computer program, which, when executed by a processor, performs the steps of the vector data analysis method described in the foregoing embodiments.
[0166] In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods can be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. Furthermore, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Additionally, the coupling or direct coupling or communication connection shown or discussed may be through some communication interfaces; the indirect coupling or communication connection between devices or units may be electrical, mechanical, or other forms.
[0167] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0168] In addition, the functional units in the various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.
[0169] If the aforementioned functions are implemented as software functional units and sold or used as independent products, they can be stored in a processor-executable, non-volatile, computer-readable storage medium. Based on this understanding, the technical solution of this invention, essentially, or the part that contributes to the prior art, or a portion of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0170] Finally, it should be noted that the above-described embodiments are merely specific implementations of the present invention, used to illustrate its technical solutions and not to limit them, and the scope of protection of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that, within the technical scope disclosed in the present invention, they can still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of the technical features. Such modifications, changes, or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and should all be covered within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.
Claims
1. A method for analyzing vector data, characterized in that the method includes: obtaining the vector data to be analyzed, and determining the fixed-level grid index and morphological parameters corresponding to the vector data; calculating the scale coefficient and the optimal grid index level of the vector data using the fixed-level grid index and the morphological parameters; determining the grid index corresponding to the optimal grid index level as the computational grid index, and performing correlation calculations on the vector data using the computational grid index to obtain overlay data pairs; deduplicating the cross-grid data contained in the overlay data pairs using a preset data deduplication strategy; and performing overlay calculations on the deduplicated overlay data pairs to obtain the analysis results of the vector data; wherein the step of calculating the scale coefficient and the optimal grid index level of the vector data using the fixed-level grid index and the morphological parameters includes: obtaining the first vector data and the second vector data contained in the vector data, and obtaining, by random sampling, the vector sample data corresponding to the first vector data and the second vector data respectively; determining, using the morphological parameters corresponding to the vector sample data, the first scale coefficient corresponding to the first vector data and the second scale coefficient corresponding to the second vector data respectively; and determining the maximum of the first scale coefficient and the second scale coefficient as the scale coefficient, and determining the optimal grid index level corresponding to the fixed-level grid index using the value of the scale coefficient; and wherein the step of determining the first scale coefficient corresponding to the first vector data and the second scale coefficient corresponding to the second vector data using the morphological parameters corresponding to the vector sample data includes: obtaining the polygon perimeter, polygon area, and number of polygon vertices included in the morphological parameters; and calculating the first scale coefficient and the second scale coefficient respectively using a preset scale coefficient formula based on the polygon perimeter, the polygon area, and the number of polygon vertices; wherein the scale coefficient formula is: k = w1 * p + w2 * n; $p = c^2 / s$; p is the shape index; c is the perimeter of the polygon; s is the area of the polygon; n is the number of vertices of the polygon; w1 and w2 are constants, and w1 + w2 = 1.
2. The method for analyzing vector data according to claim 1, characterized in that the step of obtaining the vector data to be analyzed and determining the fixed-level grid index and morphological parameters corresponding to the vector data includes: storing the superimposed vector data using a preset big data cluster, and determining the superimposed vector data as the vector data to be analyzed; and generating a fixed-level index for the vector data using the geohash algorithm, and determining the multi-level grid index corresponding to the vector data using the fixed-level index.
3. The method for analyzing vector data according to claim 1, characterized in that determining the optimal grid index level corresponding to the fixed-level grid index using the value of the scale coefficient includes: obtaining the value of the scale coefficient; and obtaining, from the fixed-level grid index, the optimal grid index level corresponding to the value of the scale coefficient using a preset correspondence table; wherein, in the correspondence table, when the scale coefficient is greater than 231485.15, the corresponding optimal grid index level is level 0; when the scale coefficient is greater than 105772.52 and less than 231485.15, the corresponding optimal grid index level is level 1; when the scale coefficient is greater than 53694.83 and less than 105772.52, the corresponding optimal grid index level is level 2; when the scale coefficient is greater than 31773.26 and less than 53694.83, the corresponding optimal grid index level is level 3; when the scale coefficient is greater than 17662.14 and less than 31173.26, the corresponding optimal grid index level is level 4; when the scale coefficient is greater than 9337.42 and less than 17662.14, the corresponding optimal grid index level is level 5; when the scale coefficient is greater than 5784.61 and less than 9337.42, the corresponding optimal grid index level is level 6; when the scale coefficient is greater than 3177.49 and less than 5784.61, the corresponding optimal grid index level is level 7; when the scale coefficient is greater than 2165.23 and less than 3177.49, the corresponding optimal grid index level is level 8; when the scale coefficient is greater than 1253.57 and less than 2165.23, the corresponding optimal grid index level is level 9; when the scale coefficient is greater than 742.15 and less than 1253.57, the corresponding optimal grid index level is level 10; when the scale coefficient is greater than 519.52 and less than 742.15, the corresponding optimal grid index level is level 11; when the scale coefficient is greater than 405.29 and less than 519.52, the corresponding optimal grid index level is level 12; when the scale coefficient is greater than 312.47 and less than 405.29, the corresponding optimal grid index level is level 13; when the scale coefficient is greater than 231.86 and less than 312.47, the corresponding optimal grid index level is level 14; when the scale coefficient is greater than 167.43 and less than 231.86, the corresponding optimal grid index level is level 15; when the scale coefficient is greater than 113.62 and less than 167.43, the corresponding optimal grid index level is level 16; when the scale coefficient is greater than 78.57 and less than 113.62, the corresponding optimal grid index level is level 17; and when the scale coefficient is greater than 57.61 and less than 78.57, the corresponding optimal grid index level is level 18.
4. The method for analyzing vector data according to claim 1, characterized in that the step of determining the grid index corresponding to the optimal grid index level as the computational grid index, and using the computational grid index to perform correlation calculations on the vector data to obtain overlay data pairs, includes: obtaining the grid index corresponding to the optimal grid index level, and determining the grid index as the computational grid index; and dividing the vector data according to the computational grid index, and performing correlation calculations on the vector data located in the same grid after division to obtain the overlay data pairs.
5. The method for analyzing vector data according to claim 1, characterized in that the step of using a preset data deduplication strategy to deduplicate the cross-grid data contained in the overlay data pairs includes: obtaining the first geometric object and the second geometric object contained in the cross-grid data; calculating the first bounding rectangle of the first geometric object and the second bounding rectangle of the second geometric object; calculating the intersection of the first bounding rectangle and the second bounding rectangle, and taking the top-left vertex of the intersection as a reference point; and calculating the grid index of the reference point and determining whether it is consistent with the grid index corresponding to the first geometric object and the second geometric object; if yes, saving the overlay data pair; if no, deleting the overlay data pair.
6. The method for analyzing vector data according to claim 1, characterized in that the step of performing overlay calculations on the deduplicated overlay data pairs to obtain the analysis results of the vector data includes: performing overlay calculations on the deduplicated overlay data pairs using the JTS overlay analysis operator to obtain a first analysis result; traversing the first analysis result and filtering out the empty data it contains to obtain a second analysis result; and saving the second analysis result to a preset distributed database to obtain the analysis results of the vector data.
7. A vector data analysis system, characterized in that the system includes: a first analysis module, used to acquire the vector data to be analyzed and determine the fixed-level grid index and morphological parameters corresponding to the vector data; a second analysis module, used to calculate the scale coefficient and the optimal grid index level of the vector data using the fixed-level grid index and the morphological parameters; a third analysis module, used to determine the grid index corresponding to the optimal grid index level as the computational grid index, and to use the computational grid index to perform correlation calculations on the vector data to obtain overlay data pairs; a fourth analysis module, used to deduplicate the cross-grid data contained in the overlay data pairs using a preset data deduplication strategy; and a fifth analysis module, used to perform overlay calculations on the deduplicated overlay data pairs to obtain the analysis results of the vector data; wherein the second analysis module is further configured to: acquire first vector data and second vector data contained in the vector data; acquire, by random sampling, vector sample data corresponding to the first vector data and the second vector data respectively; determine the first scale coefficient corresponding to the first vector data and the second scale coefficient corresponding to the second vector data respectively using the morphological parameters corresponding to the vector sample data; determine the maximum of the first scale coefficient and the second scale coefficient as the scale coefficient; and determine the optimal grid index level corresponding to the fixed-level grid index using the value of the scale coefficient; and wherein, in determining the first scale coefficient corresponding to the first vector data and the second scale coefficient corresponding to the second vector data using the morphological parameters corresponding to the vector sample data, the second analysis module is further configured to: obtain the polygon perimeter, polygon area, and number of polygon vertices contained in the morphological parameters; and calculate the first scale coefficient and the second scale coefficient respectively using a preset scale coefficient formula based on the polygon perimeter, the polygon area, and the number of polygon vertices; wherein the scale coefficient formula is: k = w1 * p + w2 * n; $p = c^2 / s$; p is the shape index; c is the perimeter of the polygon; s is the area of the polygon; n is the number of vertices of the polygon; w1 and w2 are constants, and w1 + w2 = 1.
8. An electronic device, characterized in that it includes a processor and a storage device; the storage device stores a computer program that, when executed by the processor, implements the steps of the vector data analysis method according to any one of claims 1 to 6.