Space-time address intelligent matching method and system based on multi-source data fusion
By constructing an enhanced knowledge graph and combining relational graph convolutional networks with graph isomorphic networks, a unified model of multi-level geographic structures and cross-relational links is achieved, solving the problems of insufficient address matching accuracy and poor adaptability in existing technologies, and realizing accurate matching and continuous optimization of complex addresses.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Application (China)
- Current Assignee / Owner: INFORMATION CENT LAND & RESOURCES OF ZHEJIANG PROVINCE
- Filing Date: 2026-05-13
- Publication Date: 2026-06-12
AI Technical Summary
Existing address processing technologies struggle to guarantee matching accuracy when faced with complex, diverse, and time-varying address data. Traditional methods lack the ability to fuse multi-source data and fail to fully utilize structured information from user input, business records, and external knowledge bases, leading to frequent matching errors. Furthermore, existing relational graph convolutional networks cannot distinguish the differences between the positions of different relational chains, and multi-hop path modeling techniques lack the comprehensive expressive ability across relations and multi-hop links, resulting in poor model adaptability.
By constructing an enhanced knowledge graph and combining relational graph convolutional networks and graph isomorphic networks, a unified model of multi-level geographic structures and cross-relational links is achieved. By utilizing multi-hop link modeling mechanisms and online update mechanisms, accurate matching and real-time optimization of complex addresses are realized.
It improves the integrity and consistency of address resolution, enhances the distinguishability and robustness of address matching, and can continuously adapt to changes in address naming and user error correction data, thus realizing the real-time adaptability of the address matching model.
[Figure: abstract drawing, CN122196199A]
Description
Technical Field
[0001] This invention relates to the field of intelligent address processing, and in particular to a spatiotemporal address intelligent matching method and system based on multi-source data fusion.
Background Technology
[0002] Existing address processing technologies primarily rely on rule matching, keyword retrieval, or comparison using static address databases. However, when faced with complex, diverse, and time-varying address data, matching accuracy is difficult to guarantee. Due to frequent administrative division adjustments and changes in road and community naming, static address databases from a single source cannot fully reflect the dynamic geographic entity relationships in real-world scenarios, leading to matching errors when processing aliases, historical names, or hierarchical changes. Traditional methods generally lack the ability to integrate multi-source data, failing to fully utilize structured information from user input, business records, and external knowledge bases, thus limiting address understanding and matching capabilities in complex scenarios.
[0003] Some studies have attempted to model geographic entity relationships using graph convolutional structures. However, existing relational graph convolutional networks typically use a uniform aggregation method for different relationship types, failing to distinguish the differences in node positions across different relationship chains and making it difficult to express multi-level, cross-spatial structural associations in addresses. Multi-hop path modeling techniques in existing research mostly target single-relationship links, lacking the comprehensive ability to express cross-relationship and multi-hop links, resulting in insufficient stability and robustness of address matching under complex path conditions. Traditional address matching models often employ offline training methods, unable to be updated online based on user error correction and business feedback, leading to poor adaptability to real-time scene changes.
[0004] Therefore, how to provide a spatiotemporal address intelligent matching method and system based on multi-source data fusion is a problem that urgently needs to be solved by those skilled in the art.
Summary of the Invention
[0005] One objective of this invention is to propose a spatiotemporal address intelligent matching method and system based on multi-source data fusion. This invention uses a relational graph convolutional network and a graph isomorphic network to uniformly model multi-level geographical structures and cross-relationship links, thereby achieving accurate matching of complex addresses. It has the advantages of strong structural expression capabilities and high accuracy in cross-relationship reasoning.
[0006] The spatiotemporal address intelligent matching method based on multi-source data fusion according to embodiments of the present invention includes the following steps: S1. Obtain multi-source address text data, perform preprocessing, construct an initial heterogeneous graph, and define address relationships as edge types; S2. Introduce an external geographic knowledge graph to perform entity alignment and relationship completion on nodes in the initial heterogeneous graph, and integrate administrative division, historical evolution and spatial adjacency information to generate an enhanced knowledge graph; S3. Construct a relational graph convolutional network based on the enhanced knowledge graph, define transformation matrices for different relations, perform relation-aware convolution on the nodes in the graph, and obtain the address embedding vector and the standard address embedding vector; S4. Input the address embedding vector into the graph isomorphic network, and use the recursive matching mechanism of isomorphic subgraphs to recursively expand and parse the local isomorphic patterns of the structural chain generated from the enhanced knowledge graph, producing the structural expression vector; S5. Taking the structural expression vector and the relationship chains extracted from the enhanced knowledge graph as input, perform multi-hop link modeling on the relationship chains to obtain the link representation vector, and generate a matching feature vector by jointly computing the structural expression vector and the link representation vector; S6. Calculate the similarity between the matching feature vector and the standard address embedding vector, and use supervised signals to optimize the relational graph convolutional network and graph isomorphic network to complete the training of the address matching model; S7. Deploy the trained address matching model in a real-time address processing environment, perform matching on the input address and output the standard address matching result, generate incremental samples from the real-time feedback, and periodically update the model parameters online using the incremental samples.
[0007] Optionally, S2 specifically includes: S21. Obtain the administrative division entities, road entities, community entities and spatial adjacency relationships in the external geographic knowledge graph, compare the entity numbers in the external knowledge graph with the node numbers in the initial heterogeneous graph, and generate entity alignment results. S22. Based on the entity alignment results, complete the administrative region hierarchy, geographical inclusion, and spatial adjacency relationships in the external knowledge graph to the missing relationship positions in the initial heterogeneous graph. S23. Based on the completion of the relationship, merge the node attributes in the initial heterogeneous graph and the entity attributes in the external knowledge graph to form an enhanced knowledge graph that includes administrative division levels, historical evolution relationships and spatial adjacency relationships.
[0008] Optionally, S3 specifically includes: S31. Read the node set and relation set from the enhanced knowledge graph, and map the relation type identifier of each type of relation to the relation encoding of the relation graph convolutional network; S32. Based on the relation encoding, set an independent transformation matrix for each relation type, and input the initial features of the nodes into the transformation matrix to generate relation feature representations; S33. In each layer of the relation graph convolutional network, according to the aggregation order of relation feature representation, the features of adjacent nodes belonging to different relation types are weighted and aggregated with the features of the current node, and the convolution result of the address node is output. S34. Repeatedly perform relation feature transformation and aggregation operations through several layers of relation graph convolutional networks to obtain address embedding vectors and standard address embedding vectors.
[0009] Optionally, S4 specifically includes: S41. For each address embedding vector, retrieve administrative division nodes and building-related nodes that are associated with the address node from the enhanced knowledge graph. Arrange the nodes in order to form a structural chain according to the administrative division from high to low and the building level from outside to inside. Only one node identifier and the corresponding address embedding vector are retained at each position in the structural chain. S42. Construct isomorphic subgraphs by sliding along the structural chain with a fixed window length. Each isomorphic subgraph consists of several adjacent nodes in the structural chain and the edge connections between these nodes in the enhanced knowledge graph. Save the node embedding vectors and edge connections of all isomorphic subgraphs as a set of isomorphic subgraphs. S43. For each isomorphic subgraph in the set of isomorphic subgraphs, according to a fixed node traversal order, sum the embedding vector of each node in the subgraph with the embedding vector of the adjacent node, input the summation result into a preset nonlinear mapping function to obtain the local pattern representation vector of the isomorphic subgraph, and save all local pattern representation vectors according to their starting position index in the structural chain. S44. Starting from the bottom of the structural chain, for the set of local pattern representation vectors corresponding to the same starting position index, add them sequentially according to the hierarchical order of the structural chain and divide by the number of vectors to obtain the average vector. Use the average vector as the intermediate structure representation vector of the current level, and use the intermediate structure representation vector as the input for the calculation of the next level. Repeat this process from bottom to top along the structural chain until all levels of the structural chain have been traversed. S45. After completing the bottom-up recursive aggregation of all levels, the intermediate structure representation vector corresponding to the top position of the structural chain and the original address embedding vector of the address node are linearly weighted and summed according to the preset weight coefficients. The weighted sum is used as the structural expression vector of the address node.
[0010] Optionally, S5 specifically includes: S51. Starting from the address node in the enhanced knowledge graph, expand outward along the geographical inclusion relationship, spatial adjacency relationship and temporal evolution relationship. Record the identifiers of all nodes in the expansion path according to the node access order, and form a relationship chain with each expansion path. Each node in the relationship chain is represented by the node embedding vector generated by the relationship graph convolutional network. S52. For each relation chain, start from the first node in the chain and traverse sequentially. Concatenate the current node's embedding vector with the previous node's embedding vector. Input the concatenated vector into a linear transformation function to generate the hop count feature vector corresponding to the hop count. Arrange all hop count feature vectors into a link feature sequence in hop count order. S53. Calculate the dot product between each hop count feature vector and the structural expression vector in the link feature sequence to obtain the attention score of the hop count feature, and normalize all attention scores to form attention weights. S54. Multiply each hop count feature vector in the link feature sequence with the corresponding attention weight and then add them together to generate the link representation vector of the relationship chain; S55. For all relational chains of the same address node, add the link representation vector of each relational chain in order of the shortest to the longest relational chain length, and obtain the link aggregation vector by dividing by the number of relational chains. S56. Perform a linear weighted summation of the link aggregation vector and the structure expression vector according to a preset ratio coefficient, and use the weighted summation result as the matching feature vector of the address node.
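Steps S52 to S56 can be illustrated with a minimal pure-Python sketch. It is not the patented implementation. Several details are assumptions that the text leaves open: hop features start at the second node of a chain (the first node has no predecessor), the score normalization is a softmax, and the final "preset ratio coefficient" is modeled as `alpha` and `1 - alpha`. The names `link_vector`, `matching_feature`, `W`, and `alpha` are hypothetical.

```python
import math

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    """Assumed normalization for the attention scores in S53."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def link_vector(chain, W, s):
    """S52-S54: hop features, dot-product attention against s, weighted sum."""
    # Each hop concatenates the current node embedding with the previous one.
    hops = [matvec(W, cur + prev) for prev, cur in zip(chain, chain[1:])]
    weights = softmax([dot(h, s) for h in hops])
    return [sum(w * h[i] for w, h in zip(weights, hops)) for i in range(len(s))]

def matching_feature(chains, W, s, alpha=0.5):
    """S55-S56: average link vectors (shortest chain first), then mix with s."""
    links = [link_vector(c, W, s) for c in sorted(chains, key=len)]
    mean = [sum(col) / len(links) for col in zip(*links)]
    return [alpha * m + (1 - alpha) * si for m, si in zip(mean, s)]

# Toy data: 2-dimensional embeddings; W simply copies the current node's embedding.
W = [[1, 0, 0, 0], [0, 1, 0, 0]]
s = [1.0, 1.0]                      # structural expression vector (illustrative)
chain = [[1.0, 0.0], [0.0, 1.0]]    # one relationship chain with a single hop
feature = matching_feature([chain], W, s, alpha=0.5)
```

Because the hop features are dotted with the structural expression vector, the linear transformation must map the concatenated pair back to the same dimension as that vector.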
[0011] Optionally, S6 specifically includes: S61. Calculate the cosine similarity between the matching feature vector and the standard address embedding vector, and construct a loss function based on the similarity and matching labels; S62. Perform backpropagation on the loss function and update the linear transformation weight parameters in the relational graph convolutional network, graph isomorphic network, and link modeling steps based on the gradient values of the loss function with respect to the network parameters. S63. Repeat the similarity calculation and parameter update. Stop training when the decrease value of the loss function is less than the preset convergence threshold in several consecutive iterations. S64. After training stops, save all network parameters to form a trained address matching model.
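As a rough illustration of S61 and S63, the sketch below computes cosine similarity and applies the stopping rule. The source does not say how the loss is built from the similarity and the matching label, so `pair_loss` is only one plausible choice and is an assumption. The `threshold` and `patience` values are likewise hypothetical stand-ins for the "preset convergence threshold" and "several consecutive iterations".

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors; returns 0.0 if either is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def pair_loss(sim, label, margin=0.0):
    """Assumed loss: pull matching pairs toward 1, push non-matching pairs below margin."""
    return 1.0 - sim if label == 1 else max(0.0, sim - margin)

def should_stop(losses, threshold=1e-4, patience=3):
    """S63: stop once the loss decrease stays below threshold for `patience` steps in a row."""
    if len(losses) <= patience:
        return False
    recent = losses[-(patience + 1):]
    return all(prev - cur < threshold for prev, cur in zip(recent, recent[1:]))

sim = cosine([1.0, 0.0], [1.0, 0.0])
loss = pair_loss(sim, label=1)
```

In a full training loop, `should_stop` would be checked after each iteration's loss is appended to the history, and the parameters would be saved once it returns true (S64).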
[0012] Optionally, S7 specifically includes: S71. Deploy the trained address matching model to the real-time address processing environment, calculate the matching feature vector for the input address data and output the standard address matching result; S72. Collect user error correction information and business feedback information generated in the real-time address processing environment, and construct an incremental sample set based on the feedback content; S73. Perform vector similarity calculation and loss function calculation on the incremental sample set and the current parameters of the model, and use the gradient value of the loss function to update the linear transformation weight parameters in the relational graph convolutional network, graph isomorphic network and link modeling steps; S74. Repeatedly perform parameter updates according to the preset update cycle. When the update cycle reaches the set condition, write the updated parameters into the address matching model to complete the online model update.
[0013] Optionally, the relational graph convolutional network specifically includes: The input layer receives the initial feature vectors of nodes in the enhanced knowledge graph and constructs a multi-relationship adjacency structure based on the relationship types between nodes. The relation transformation layer sets an independent relation transformation matrix for each relation type and adds a chain order encoding vector to each relation transformation matrix; The relation aggregation layer performs weighted aggregation based on the chain order encoding vector. It calculates the weighting coefficients for adjacent nodes that belong to the same relation type but are in different relation chain positions, and adds the node feature vector to the weighted aggregation result to obtain the node's convolution output vector. The multi-layer stacked structure repeats the relation transformation layer and relation aggregation layer in a fixed order for several layers. Node features propagate along several relation chains in the enhanced knowledge graph to generate node embedding vectors.
[0014] Optionally, the graph isomorphic network specifically includes: The isomorphic subgraph input layer receives several isomorphic subgraphs constructed from structural chains according to a fixed window length, and uses the node embedding vectors and edge connection relationships in each isomorphic subgraph as input data for the isomorphic subgraph. The local pattern computation layer adds the node embedding vector and the neighboring node embedding vector in a fixed traversal order of the window nodes, and inputs the sum into a nonlinear mapping function to generate a local pattern representation vector. The recursive aggregation layer adopts a bottom-up recursive aggregation method, using the intermediate structure representation vector of each level as the input to the recursive aggregation of the next higher level; The global structure generation layer performs a linear weighted summation of the intermediate structure representation vector and the initial embedding vector of the address node according to a preset scaling factor to obtain the structural expression vector.
[0015] The spatiotemporal address intelligent matching system based on multi-source data fusion according to an embodiment of the present invention includes the following modules: The data processing module is used to acquire multi-source address text data and perform word segmentation, entity recognition and structured parsing to generate a set of nodes and a set of relationships for graph construction. The knowledge graph construction module is used to build an initial heterogeneous graph based on the node set and the relationship set, and to perform entity alignment and relationship completion on the external geographic knowledge graph to form an enhanced knowledge graph; The relation graph convolution feature generation module is used to perform multi-layer relation graph convolution operations on the enhanced knowledge graph to generate node embedding vectors for structural parsing and standard address comparison. The graph isomorphic structure parsing module is used to construct structural chains based on node embedding vectors and generate structural expression vectors through recursive isomorphic parsing. The training and inference module is used to calculate matching feature vectors based on structural representation vectors and node embedding vectors, and to perform similarity calculation, loss calculation and parameter update based on supervised samples, and output address matching results. The online update module is used to receive real-time business feedback, build an incremental sample set, and update the model parameters based on the incremental sample set within a preset update cycle to achieve dynamic model optimization.
[0016] The beneficial effects of this invention are: (1) This invention constructs an enhanced knowledge graph that integrates administrative divisions, spatial adjacency and temporal evolution information, and combines it with a relational graph convolutional network to generate multi-relation features, so that address entities can obtain an embedded representation with hierarchy and relevance in a multi-source data environment. This solves the problem that traditional methods cannot accurately express complex address structures and improves the completeness and consistency of address resolution.
[0017] (2) This invention utilizes the recursive structure parsing capability and multi-hop link modeling mechanism of graph isomorphic networks to jointly learn the representation of address structure chains and cross-relationship paths, enabling the model to capture local structural patterns and cross-relationship link semantics at the same time. This effectively solves the defects of existing technologies, such as difficulty in distinguishing path structures and insufficient expression of link semantics, and improves the overall discriminability and robustness of address matching.
[0018] (3) This invention introduces an online update mechanism, which uses real-time business feedback to build an incremental sample set and periodically update the model parameters, enabling the model to continuously adapt to changes in address naming, administrative division adjustments and user error correction data. This overcomes the problem that traditional offline training mode cannot evolve dynamically and realizes the continuous optimization and real-time adaptability of the address matching model.
Attached Figure Description
[0019] The accompanying drawings are provided to further illustrate the invention and form part of the specification. They are used in conjunction with embodiments of the invention to explain the invention and do not constitute a limitation thereof. In the drawings: Figure 1 is a flowchart of the spatiotemporal address intelligent matching method based on multi-source data fusion proposed in this invention; Figure 2 is a structural diagram of the spatiotemporal address intelligent matching system based on multi-source data fusion proposed in this invention; Figure 3 is a flowchart illustrating the knowledge graph construction process of the spatiotemporal address intelligent matching method based on multi-source data fusion proposed in this invention.
Detailed Implementation
[0020] The present invention will now be described in further detail with reference to the accompanying drawings. These drawings are simplified schematic diagrams, illustrating only the basic structure of the invention, and therefore only show the components relevant to the invention.
[0021] Referring to Figures 1-3, a spatiotemporal address intelligent matching method based on multi-source data fusion includes the following steps: S1. Obtain multi-source address text data, perform preprocessing, construct an initial heterogeneous graph, and define address relationships as edge types; S2. Introduce an external geographic knowledge graph to perform entity alignment and relationship completion on nodes in the initial heterogeneous graph, and integrate administrative division, historical evolution and spatial adjacency information to generate an enhanced knowledge graph; S3. Construct a relational graph convolutional network based on the enhanced knowledge graph, define transformation matrices for different relations, perform relation-aware convolution on the nodes in the graph, and obtain the address embedding vector and the standard address embedding vector; S4. Input the address embedding vector into the graph isomorphic network, and use the recursive matching mechanism of isomorphic subgraphs to recursively expand and parse the local isomorphic patterns of the structural chain generated from the enhanced knowledge graph, producing the structural expression vector; S5. Taking the structural expression vector and the relationship chains extracted from the enhanced knowledge graph as input, perform multi-hop link modeling on the relationship chains to obtain the link representation vector, and generate a matching feature vector by jointly computing the structural expression vector and the link representation vector; S6. Calculate the similarity between the matching feature vector and the standard address embedding vector, and use supervised signals to optimize the relational graph convolutional network and graph isomorphic network to complete the training of the address matching model; S7. Deploy the trained address matching model in a real-time address processing environment, perform matching on the input address and output the standard address matching result, generate incremental samples from the real-time feedback, and periodically update the model parameters online using the incremental samples.
[0022] In this embodiment, S1 specifically includes: S11. Receive multi-source address text data, which comes from user input records, business history records and external address databases, and divide the data from each source into independent data entries according to time stamps. S12. Perform word segmentation on each address text, and segment the place names, road names, community names, building numbers and house numbers in the text into word units to obtain a structured word unit sequence; S13. Perform entity recognition operation on the structured word sequence, identify provincial entities, municipal entities, district entities, street entities, community entities, building and door number entities according to the preset address entity label table, and assign a unique entity number to the identified entities. S14. Construct a node set based on entity number, and construct address entity nodes, geographical unit nodes and attribute nodes respectively for content belonging to administrative division, building unit, geographical region and address text fragment; S15. Extract structural location information from the word sequence, construct address inclusion relationship based on adjacency order, construct spatial adjacency relationship based on geographical rules, and construct time evolution relationship based on data timestamps. Record the above relationships as an address relationship set in the form of edges. S16. Based on the set of nodes and the set of address relationships, construct an initial heterogeneous graph according to the data structure format of the heterogeneous graph. The initial heterogeneous graph includes node identifiers, node types, edge types, and relationship directions.
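The output of S14 to S16 can be pictured as a small typed graph. The sketch below is only an illustration under assumptions: the class `HeteroGraph`, the helper `build_from_entities`, the edge type name `"contains"`, and the sample entity identifiers are all hypothetical. It derives only the inclusion relationship from adjacency order. Spatial adjacency and time evolution edges would be added in the same way from geographical rules and timestamps.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class HeteroGraph:
    """Minimal heterogeneous graph: typed nodes plus directed, typed edges."""
    node_types: Dict[str, str] = field(default_factory=dict)
    edges: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_node(self, node_id: str, node_type: str) -> None:
        self.node_types[node_id] = node_type

    def add_edge(self, src: str, edge_type: str, dst: str) -> None:
        # Direction is src -> dst, e.g. a district "contains" a street.
        self.edges.append((src, edge_type, dst))

def build_from_entities(entities: List[Tuple[str, str]]) -> HeteroGraph:
    """entities: (entity_id, label) pairs ordered from coarse to fine (S13 output)."""
    graph = HeteroGraph()
    for entity_id, label in entities:
        graph.add_node(entity_id, label)
    # Inclusion relationship from adjacency order in the parsed address (S15).
    for (parent, _), (child, _) in zip(entities, entities[1:]):
        graph.add_edge(parent, "contains", child)
    return graph

# Illustrative entity sequence recognized from one address text.
parsed = [("E1", "province"), ("E2", "city"), ("E3", "district"), ("E4", "street")]
graph = build_from_entities(parsed)
```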
[0023] In this embodiment, S2 specifically includes: S21. Obtain the administrative division entities, road entities, and community entities and their spatial adjacency relationships from the external geographic knowledge graph. Compare the entity numbers in the external knowledge graph with the node numbers in the initial heterogeneous graph to generate entity alignment results. Load the standard name, alias name, and entity number of the entity into the entity name mapping table. Perform regularization processing on each node name in the initial heterogeneous graph. Regularization processing includes removing administrative suffixes, removing spaces, and converting traditional Chinese characters to simplified Chinese characters to obtain regularized names. Compare the standard name and alias name in the mapping table based on the regularized names to form entity alignment results. S22. Based on the entity alignment results, complete the administrative region hierarchy, geographical inclusion, and spatial adjacency relationships in the external knowledge graph to the missing relationship positions in the initial heterogeneous graph. S23. Based on the completed relationships, merge the node attributes in the initial heterogeneous graph and the entity attributes in the external knowledge graph to form an enhanced knowledge graph that includes administrative division levels, historical evolution relationships, and spatial adjacency relationships. When merging node attributes, attribute fields of the same entity are selected according to a preset priority. Among them, the administrative division level and spatial adjacency attributes provided by the external knowledge graph have higher priority than the attributes in the initial heterogeneous graph. Non-conflicting fields are merged in a set manner, and conflicting fields are updated according to a priority strategy.
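The name regularization, alias-table alignment, and priority-based attribute merge described above can be sketched as follows. This is a hedged illustration, not the patented procedure. The suffix list, the tiny traditional-to-simplified table `T2S`, the order of the regularization steps, and the field names `admin_level` and `adjacent` are assumptions. A real system would use a full character conversion table. For conflicting fields outside the external-priority list, the sketch keeps the local value, which is also an assumption.

```python
ADMIN_SUFFIXES = ("街道", "省", "市", "区", "县", "镇", "乡")   # hypothetical suffix list
T2S = {"區": "区", "縣": "县", "鎮": "镇", "鄉": "乡"}           # tiny illustrative table

def regularize(name):
    """Remove whitespace, convert traditional characters, strip one administrative suffix."""
    name = "".join(name.split())
    name = "".join(T2S.get(ch, ch) for ch in name)
    for suffix in ADMIN_SUFFIXES:
        if name.endswith(suffix) and len(name) > len(suffix):
            return name[:-len(suffix)]
    return name

def build_name_table(entities):
    """entities: (entity_id, standard_name, aliases) from the external knowledge graph."""
    table = {}
    for entity_id, standard_name, aliases in entities:
        for name in [standard_name, *aliases]:
            table.setdefault(regularize(name), entity_id)
    return table

def align(node_names, table):
    """Map each node of the initial graph to an external entity id, or None if unmatched."""
    return {node: table.get(regularize(name)) for node, name in node_names.items()}

def merge_attributes(local, external, external_priority=("admin_level", "adjacent")):
    """Union of fields; on conflict the external value wins only for priority fields."""
    merged = dict(local)
    for key, value in external.items():
        if key not in merged or key in external_priority:
            merged[key] = value
    return merged

table = build_name_table([("E1", "西湖区", ["西湖"])])
alignment = align({"n1": "西湖 區", "n2": "未知"}, table)
merged = merge_attributes(
    {"name": "西湖区", "admin_level": 3, "source": "user"},
    {"admin_level": 4, "adjacent": ["E2"]},
)
```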
[0024] In this embodiment, S3 specifically includes: S31. Read the node set and relation set from the enhanced knowledge graph, and map the relation type identifier of each relation to the relation encoding of the relation graph convolutional network; the initial feature information of the node is composed of word segmentation index features generated by structured word sequence, entity category features generated by address entity label table, sequence number features generated by structural position information, entity number features generated by entity alignment operation, administrative division level features generated by administrative division level relation completion process, spatial adjacency attribute features generated by spatial adjacency relation completion process, and node type features generated by node construction process. Each feature is combined according to a fixed encoding method to form the initial feature vector of the node, and input into the relation graph convolutional network; S32. Based on the relation encoding, set an independent transformation matrix for each relation type, and input the initial features of the nodes into the transformation matrix to generate relation feature representations; S33. In each layer of the relation graph convolutional network, according to the aggregation order of relation feature representation, the features of adjacent nodes belonging to different relation types are weighted and aggregated with the features of the current node, and the convolution result of the address node is output. In this embodiment, the aggregation in S33 specifically includes: S331. Aggregation order: Based on the relation type identifier recorded for each edge in the enhanced knowledge graph, adjacent nodes belonging to different relation types are divided into relation groups; each relation group is traversed in a fixed order, which is based on the relation type encoding from smallest to largest; when processing each relation group, the features of all adjacent nodes in the group are linearly transformed with the transformation matrix of the corresponding relation type in sequence, and then traversed and aggregated in the order of node index from smallest to largest; after completing the aggregation of all relation groups, the self-loop features of the current node are processed, and the features of the current node are input into the self-loop transformation matrix as the final self-loop group aggregation result; S332. Aggregation rule: For all adjacent node features within the same relation group, a summation aggregation method is used to sum the linearly transformed adjacent node feature vectors according to the group; the aggregation result for each relation type is multiplied by the corresponding relation normalization coefficient, which is determined by the reciprocal of the number of adjacent nodes under that relation type; the weighted aggregation results of all relation types are added in the aggregation order to form a relation aggregation vector; the relation aggregation vector is added to the linear transformation result of the self-loop features to form the output feature vector of the current node in this convolutional layer; the output feature vector of the current convolutional layer is input into the next layer of the relation graph convolutional network, and the above aggregation process is repeated until the calculation of all convolutional layers is completed; S34. Repeatedly perform relation feature transformation and aggregation operations through several layers of relation graph convolutional networks to obtain address embedding vectors and standard address embedding vectors.
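The aggregation order and rule of S331 and S332 can be made concrete with a small pure-Python sketch of one convolutional layer. It assumes edges are stored as `(src, rel, dst)` triples in which `src` is an adjacent node feeding `dst`, that relation codes and node indices are integers, and that no activation function is applied, since the text mentions none. The function and parameter names (`rgcn_layer`, `rel_weights`, `self_weight`) are hypothetical.

```python
from collections import defaultdict

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def vec_add(a, b):
    return [ai + bi for ai, bi in zip(a, b)]

def rgcn_layer(features, edges, rel_weights, self_weight):
    """One relation-aware layer: per-relation sum, 1/count scaling, plus self-loop term."""
    out = {}
    dim_out = len(self_weight)
    for node, h in features.items():
        # S331: split incoming neighbours into relation groups.
        groups = defaultdict(list)
        for src, rel, dst in edges:
            if dst == node:
                groups[rel].append(src)
        agg = [0.0] * dim_out
        for rel in sorted(groups):                 # relation codes, smallest first
            neighbours = sorted(groups[rel])       # node indices, smallest first
            total = [0.0] * dim_out
            for n in neighbours:                   # S332: sum transformed neighbours
                total = vec_add(total, matvec(rel_weights[rel], features[n]))
            coeff = 1.0 / len(neighbours)          # reciprocal of neighbour count
            agg = vec_add(agg, [coeff * t for t in total])
        # Self-loop group is processed last and added to the relation aggregate.
        out[node] = vec_add(agg, matvec(self_weight, h))
    return out

identity = [[1, 0], [0, 1]]
features = {0: [1, 0], 1: [0, 1], 2: [1, 1]}
edges = [(1, 0, 0), (2, 0, 0), (2, 1, 0)]          # node 0 has neighbours under relations 0 and 1
rel_weights = {0: identity, 1: [[2, 0], [0, 2]]}
layer_out = rgcn_layer(features, edges, rel_weights, identity)
```

Stacking several layers, as in S34, amounts to feeding `layer_out` back in as `features` with a new set of weight matrices for each layer.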
[0025] In this embodiment, S4 specifically includes: S41. For each address embedding vector, retrieve administrative division nodes and building-related nodes that are associated with the address node from the enhanced knowledge graph. Arrange the nodes in order to form a structural chain according to the administrative division from high to low and the building level from outside to inside. Only one node identifier and the corresponding address embedding vector are retained at each position in the structural chain. In this embodiment, the retrieval method in S41 is as follows: Based on the hierarchical relationship of administrative divisions in the enhanced knowledge graph, starting from the target address node, the higher-level administrative nodes are visited sequentially along the upward edge of the administrative division until the top-level node of the administrative structure is reached. All visited administrative nodes are recorded as a set of administrative division nodes. The administrative divisions are arranged in descending order as follows: provincial level, prefecture-level city level, district / county level, street / township level, and community / village level. The relevant administrative nodes are arranged according to this fixed hierarchical order. Based on the geographic inclusion relationships in the enhanced knowledge graph, starting from the target address node, the system traverses downwards along the inclusion relationship edges, with the access order proceeding from the outermost to the innermost layer of the building. The accessed building-related nodes are recorded as a set of building nodes. The building hierarchy from the outside to the inside is as follows: community, building, unit, floor, room number. The relevant building nodes are arranged in this fixed order. The set of administrative division nodes and the set of building nodes are arranged in a fixed order from high to low administrative division level and from outside to inside building level. The resulting node sequence is used as the node order of the structural chain.
[0026] In this embodiment, the method of retaining a single node at each position in S41 is as follows.
For administrative levels: read the standard name of each candidate administrative node; calculate the similarity between the text fragment of the corresponding level in the address text and the standard name of each candidate node, using the longest common subsequence similarity formula; select the candidate node with the highest similarity score as the final administrative node of that level; if several candidates tie for the highest similarity, select the node with the smallest administrative division code as the final node.
For building levels: read the building number or naming field of each candidate node; calculate the strict text matching degree between the words of the corresponding level in the address text and the building number of the candidate node, and give priority to candidate nodes whose text is completely identical; if none is completely identical, select the candidate node with the smallest edit distance; if there is still a tie, select the candidate node with the most edges connected to the node at the previous level in the enhanced knowledge graph as the final node.
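As a non-limiting illustration, the node retention rules of S41 can be sketched in Python as follows. The text does not give the exact longest common subsequence similarity formula, so the normalization `2 * LCS / (len(a) + len(b))` is an assumption; the `AdminCandidate` and `BuildingCandidate` records and their field names are likewise hypothetical.

```python
from dataclasses import dataclass


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ch_a == ch_b else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    """ASSUMED normalization: 2 * LCS / (len(a) + len(b)); not stated in the text."""
    if not a and not b:
        return 1.0
    return 2.0 * lcs_length(a, b) / (len(a) + len(b))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


@dataclass
class AdminCandidate:          # hypothetical record for an administrative node
    code: int                  # administrative division code
    name: str                  # standard name


@dataclass
class BuildingCandidate:       # hypothetical record for a building-related node
    number: str                # building number or naming field
    parent_edges: int          # edges to the node at the previous level


def pick_admin(fragment: str, candidates: list) -> AdminCandidate:
    # Highest LCS similarity wins; ties go to the smallest division code.
    return min(candidates, key=lambda c: (-lcs_similarity(fragment, c.name), c.code))


def pick_building(fragment: str, candidates: list) -> BuildingCandidate:
    # An exact match has edit distance 0, so ordering by distance covers both
    # rules; remaining ties go to the candidate with the most parent edges.
    return min(candidates, key=lambda c: (edit_distance(fragment, c.number), -c.parent_edges))
```

Encoding each rule chain as a single sort key keeps the tie-breaking order explicit and deterministic.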
[0027] S42. Construct isomorphic subgraphs by sliding a window of fixed length along the structural chain. Each isomorphic subgraph consists of several adjacent nodes in the structural chain and the edge connections between these nodes in the enhanced knowledge graph. Save the node embedding vectors and edge connections of all isomorphic subgraphs as the set of isomorphic subgraphs. In this embodiment, the window length is set to 3. Take each node in the structural chain in turn as the starting position of the window and take 3 consecutive nodes to form a window node set. Extract the edge connections between the nodes of the window node set from the enhanced knowledge graph to form an isomorphic subgraph. The window moves backward one node at a time until fewer than 3 consecutive nodes remain. All constructed isomorphic subgraphs constitute the set of isomorphic subgraphs.
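As a non-limiting illustration, the sliding-window construction of S42 can be sketched in Python as follows. The representation of the graph as a set of `(source, target)` pairs and the dictionary layout of each subgraph are assumptions made only for this sketch.

```python
def build_window_subgraphs(chain, edges, window=3):
    """Slide a fixed-length window along the structural chain.

    chain: ordered list of node identifiers (the structural chain).
    edges: iterable of (source, target) pairs from the enhanced knowledge graph.
    Returns one subgraph per window start position.
    """
    subgraphs = []
    # The last valid start index is len(chain) - window; shorter chains yield nothing.
    for start in range(len(chain) - window + 1):
        nodes = chain[start:start + window]
        node_set = set(nodes)
        # Keep only edges whose two endpoints both lie inside the window.
        kept = sorted((u, v) for (u, v) in edges if u in node_set and v in node_set)
        subgraphs.append({"start": start, "nodes": nodes, "edges": kept})
    return subgraphs
```

A chain of 5 nodes with window length 3 yields 3 subgraphs, starting at positions 0, 1 and 2.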
[0028] S43. For each isomorphic subgraph in the isomorphic subgraph set, following a fixed node traversal order, sum the embedding vector of each node in the subgraph with the embedding vectors of its adjacent nodes. Input the summation result into a preset nonlinear mapping function to obtain the local pattern representation vector of the isomorphic subgraph. All local pattern representation vectors are saved according to their starting position index in the structural chain. Adjacent nodes are nodes in the enhanced knowledge graph that are directly connected to the current node through at least one edge. To keep each isomorphic subgraph structurally independent, only adjacency relationships between nodes within the window are retained: the outgoing and incoming edges of the node are read, and only the neighbors belonging to the current window node set are kept to form the adjacent node set. The adjacent nodes are arranged in ascending order of node identifier, and the node embedding vector is added to the adjacent node embedding vectors in that sequence to form an aggregate vector. In this embodiment, the nonlinear mapping function is the ReLU function.
S44. Starting from the bottom of the structural chain, for the set of local pattern representation vectors corresponding to the same starting position index, add them sequentially in the hierarchical order of the structural chain and divide by the number of vectors to obtain the average vector. Use the average vector as the intermediate structure representation vector of the current level, and use this intermediate structure representation vector as the input for the calculation of the next level. Repeat this process from bottom to top along the structural chain until all levels of the structural chain have been traversed.
S45. After completing the bottom-up recursive aggregation of all levels, the intermediate structure representation vector corresponding to the top-level position in the structural chain and the original address embedding vector of the address node are linearly weighted and summed according to preset weight coefficients. The weighted sum is used as the structural representation vector of the address node. All weights are randomly initialized according to a uniform distribution from -0.01 to 0.01.
In this embodiment, S5 specifically includes:
S51. Starting from the address node in the enhanced knowledge graph, expand outward along geographical inclusion relationships, spatial adjacency relationships, and temporal evolution relationships. Record the identifiers of all nodes on each expansion path in node access order, and form one relationship chain from each expansion path. Each node in the relationship chain is represented by the node embedding vector generated by the relational graph convolutional network. The maximum number of hops of an expansion path is set to a fixed integer; when the expansion reaches the maximum number of hops or there are no more nodes to expand, path growth is terminated. In this embodiment, the maximum number of hops is set to 2.
S52. For each relationship chain, traverse sequentially from the first node in the chain: concatenate the current node's embedding vector with the previous node's embedding vector, input the concatenated vector into a linear transformation function to generate the hop count feature vector for that hop, and assemble all hop count feature vectors into a link feature sequence in hop order. At the first hop of the relationship chain, the previous node's embedding vector is set to zero and concatenated with the current node's embedding vector. The weight matrix of the linear transformation function is initialized by uniformly distributed random sampling, and the bias vector is set to 0.
S53. Calculate the dot product between each hop count feature vector in the link feature sequence and the structural representation vector to obtain the attention score of that hop count feature, and normalize all attention scores to form attention weights. In this embodiment, the Softmax function is used for normalization.
S54. Multiply each hop count feature vector in the link feature sequence by the corresponding attention weight and add the results together to generate the link representation vector of the relationship chain.
S55. For all relationship chains of the same address node, add the link representation vectors of the relationship chains in ascending order of chain length, and divide by the number of relationship chains to obtain the link aggregation vector. When two relationship chains have the same length, sort them by the lexicographical order of the node identifier sequences in the chains.
S56. Perform a linear weighted summation of the link aggregation vector and the structural representation vector according to a preset ratio coefficient, and use the weighted summation result as the matching feature vector of the address node.
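As a non-limiting illustration, steps S52 to S56 can be sketched in plain Python as follows. The function names, the chain layout (`{"ids": [...], "emb": [...]}`), the initialization bound of 0.01 for the linear transformation, and the convex form `ratio * link + (1 - ratio) * structure` of the weighted summation in S56 are assumptions; the text only states that the weights are uniformly sampled and that a preset ratio coefficient is used.

```python
import math
import random


def init_linear(dim, rng, bound=0.01):
    # S52: uniform random weights, zero bias. The bound is an ASSUMPTION.
    weights = [[rng.uniform(-bound, bound) for _ in range(2 * dim)] for _ in range(dim)]
    return weights, [0.0] * dim


def linear(weights, bias, x):
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    peak = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def link_vector(chain_embeddings, structure_vec, weights, bias):
    dim = len(structure_vec)
    prev = [0.0] * dim                       # S52: zero vector before the first hop
    hop_features = []
    for emb in chain_embeddings:
        # Concatenate current and previous embeddings, then apply the linear map.
        hop_features.append(linear(weights, bias, emb + prev))
        prev = emb
    attn = softmax([dot(h, structure_vec) for h in hop_features])               # S53
    return [sum(a * h[k] for a, h in zip(attn, hop_features)) for k in range(dim)]  # S54


def matching_feature(chains, structure_vec, weights, bias, ratio=0.5):
    # S55: order by chain length, then lexicographically by node identifiers.
    ordered = sorted(chains, key=lambda c: (len(c["ids"]), c["ids"]))
    vecs = [link_vector(c["emb"], structure_vec, weights, bias) for c in ordered]
    dim = len(structure_vec)
    agg = [sum(v[k] for v in vecs) / len(vecs) for k in range(dim)]
    # S56: linear weighted sum; the convex form and ratio value are ASSUMED.
    return [ratio * a + (1 - ratio) * s for a, s in zip(agg, structure_vec)]
```

Because the attention weights sum to 1, the link representation vector is a convex combination of the hop count feature vectors, weighted toward the hops most aligned with the structural representation vector.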
[0029] In this embodiment, S6 specifically includes:
S61. Calculate the cosine similarity between the matching feature vector and the standard address embedding vector, and construct a loss function based on the similarity and the matching labels; the loss function is the cross-entropy loss function.
S62. Perform backpropagation on the loss function, and update the linear transformation weight parameters in the relational graph convolutional network, the graph isomorphic network, and the link modeling steps based on the gradients of the loss function with respect to the network parameters; the parameter update method is stochastic gradient descent.
S63. Repeat the similarity calculation and parameter update. Stop training when the decrease of the loss function stays below the preset convergence threshold for a number of consecutive iterations. In this embodiment, the number of consecutive iterations is set to 10 and the loss decrease threshold is set to 10⁻⁴.
S64. After training stops, save all network parameters to form the trained address matching model.
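As a non-limiting illustration, the similarity, loss and stopping rule of S61 and S63 can be sketched in Python as follows. The text does not say how the cosine similarity is turned into a probability for the cross-entropy loss, so the mapping `(similarity + 1) / 2` is an assumption, as are the names `binary_cross_entropy` and `ConvergenceMonitor`.

```python
import math


def cosine_similarity(a, b):
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0                      # convention chosen for zero vectors
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def binary_cross_entropy(similarity, label, eps=1e-7):
    # ASSUMPTION: map cosine similarity in [-1, 1] to a probability in (0, 1).
    p = min(max((similarity + 1.0) / 2.0, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


class ConvergenceMonitor:
    """Signals a stop once the loss decrease stays below `threshold`
    for `patience` consecutive iterations (10 and 1e-4 in the embodiment)."""

    def __init__(self, threshold=1e-4, patience=10):
        self.threshold = threshold
        self.patience = patience
        self.previous = None
        self.streak = 0

    def should_stop(self, loss):
        if self.previous is not None and self.previous - loss < self.threshold:
            self.streak += 1
        else:
            self.streak = 0             # a large enough decrease resets the count
        self.previous = loss
        return self.streak >= self.patience
```

In a training loop, `should_stop` would be called once per iteration with the current loss, after the stochastic gradient descent step of S62.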
[0030] In this embodiment, S7 specifically includes:
S71. Deploy the trained address matching model to the real-time address processing environment, calculate the matching feature vector for the input address data, and output the standard address matching result.
S72. Collect the user error correction information and business feedback information generated in the real-time address processing environment, and construct an incremental sample set based on the feedback content.
S73. Perform vector similarity calculation and loss function calculation on the incremental sample set with the current parameters of the model, and use the gradient of the loss function to update the linear transformation weight parameters in the relational graph convolutional network, the graph isomorphic network, and the link modeling steps.
S74. Repeatedly execute parameter updates according to the preset update cycle. When the update cycle reaches the set condition, write the updated parameters into the address matching model to complete the online model update. In this embodiment, the system automatically triggers the model update process once every 24 hours, using the incremental sample set collected in the previous cycle to perform the similarity calculation and parameter update operations. When the number of consecutive parameter updates reaches 7, or the loss function decreases below 10⁻⁴, all updated network parameters are written into the model to replace the parameter version currently in use. After the new parameter version is written, the parameter update count is reset and the next 24-hour update cycle begins.
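As a non-limiting illustration, the publishing rule of S74 can be sketched in Python as follows. The class `OnlineUpdater`, the `train_step` callable and the dictionary representation of parameters are hypothetical. The sketch also assumes that "the loss function decreases below 10⁻⁴" refers to the loss value itself; the text could also be read as referring to the per-cycle decrease.

```python
from dataclasses import dataclass, field


@dataclass
class OnlineUpdater:
    live_params: dict                    # parameter version currently serving requests
    max_updates: int = 7                 # consecutive updates before publishing
    loss_threshold: float = 1e-4
    staged_params: dict = field(default_factory=dict)
    update_count: int = 0

    def run_cycle(self, incremental_samples, train_step):
        """Run one update cycle (every 24 hours in the embodiment).

        train_step is a hypothetical callable taking (params, samples) and
        returning (new_params, loss). Returns True when a new version is published.
        """
        base = self.staged_params or self.live_params
        self.staged_params, loss = train_step(base, incremental_samples)
        self.update_count += 1
        if self.update_count >= self.max_updates or loss < self.loss_threshold:
            self.live_params = self.staged_params   # replace the version in use
            self.staged_params = {}
            self.update_count = 0                   # reset for the next cycle
            return True
        return False
```

Keeping staged and live parameters separate means the serving model only changes when one of the two publishing conditions is met.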
[0031] In this embodiment, the relational graph convolutional network specifically includes:
The input layer, which receives the initial feature vectors of nodes in the enhanced knowledge graph and constructs a multi-relationship adjacency structure based on the relationship types between nodes.
The relation transformation layer, which sets an independent relation transformation matrix for each relation type and adds a chain order encoding vector to each relation transformation matrix.
The relation aggregation layer, which performs weighted aggregation based on the chain order encoding vector. It calculates the weighting coefficients for adjacent nodes that belong to the same relation type but are at different relationship chain positions, and adds the node feature vector to the weighted aggregation result to obtain the node's convolution output vector.
The multi-layer stacked structure, which repeats the relation transformation layer and the relation aggregation layer in a fixed order for several layers. Node features propagate along several relationship chains in the enhanced knowledge graph to generate node embedding vectors.
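As a non-limiting illustration, one relation transformation and aggregation layer can be sketched in plain Python as follows. The text does not specify how the chain order encoding enters the transformation or how the weighting coefficients are computed. This sketch assumes that the encoding vector is added to the transformed neighbor feature and that the coefficients are a softmax over one hypothetical scalar score per chain position; all names and the edge layout are hypothetical.

```python
import math


def matvec(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def rgcn_layer(features, edges, rel_matrices, order_codes, order_scores):
    """One layer of relation-aware convolution with chain order encoding.

    features:     {node: feature vector}
    edges:        list of (source, relation, chain_position, target)
    rel_matrices: {relation: square transformation matrix}
    order_codes:  {chain_position: encoding vector}   (ASSUMED additive)
    order_scores: {chain_position: scalar score}      (ASSUMED weighting input)
    """
    incoming = {node: [] for node in features}
    for src, rel, pos, dst in edges:
        transformed = matvec(rel_matrices[rel], features[src])
        message = [t + c for t, c in zip(transformed, order_codes[pos])]
        incoming[dst].append((order_scores[pos], message))

    output = {}
    for node, own in features.items():
        msgs = incoming[node]
        if not msgs:
            output[node] = list(own)     # no neighbors: keep the node's own features
            continue
        peak = max(score for score, _ in msgs)
        exps = [math.exp(score - peak) for score, _ in msgs]
        total = sum(exps)
        agg = [sum(e / total * msg[k] for e, (_, msg) in zip(exps, msgs))
               for k in range(len(own))]
        # Add the node feature vector to the weighted aggregation result.
        output[node] = [a + o for a, o in zip(agg, own)]
    return output
```

Stacking several such layers, each fed with the output of the previous one, corresponds to the multi-layer stacked structure described above.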
[0032] In this embodiment, the graph isomorphic network specifically includes:
The isomorphic subgraph input layer, which receives several isomorphic subgraphs constructed from structural chains according to a fixed window length, and uses the node embedding vectors and edge connection relationships in each isomorphic subgraph as the input data for that subgraph.
The local pattern computation layer, which adds the node embedding vector and the neighboring node embedding vectors in a fixed traversal order of the window nodes, and inputs the sum into a nonlinear mapping function to generate a local pattern representation vector.
The recursive aggregation layer, which adopts a bottom-up recursive aggregation method, using the intermediate structure representation vector of each level as the input of the recursive aggregation at the next higher level.
The global structure generation layer, which performs a linear weighted summation of the intermediate structure representation vector and the initial embedding vector of the address node according to a preset scaling factor to obtain the structural representation vector.
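As a non-limiting illustration, the local pattern computation, recursive aggregation and global structure generation layers can be sketched in plain Python as follows. Several details are assumptions rather than statements from the text: the per-node aggregate vectors of a window are summed before the ReLU function is applied, the recursive aggregation is read as averaging each level's local pattern with the vector passed up from the level below, and the scaling factor `alpha` is applied in convex form.

```python
def relu(vec):
    return [max(0.0, v) for v in vec]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def local_pattern(nodes, edges, emb):
    """Local pattern vector of one window (isomorphic subgraph).

    nodes: window node ids in chain order; edges: (source, target) pairs;
    emb: {node: embedding vector}.
    """
    window = set(nodes)
    total = [0.0] * len(emb[nodes[0]])
    for node in nodes:                                   # fixed traversal order
        # In-window neighbors over outgoing and incoming edges, ascending by id.
        neighbors = sorted({v for u, v in edges if u == node and v in window}
                           | {u for u, v in edges if v == node and u in window})
        agg = list(emb[node])
        for nb in neighbors:
            agg = add(agg, emb[nb])
        total = add(total, agg)                          # ASSUMED readout: sum over nodes
    return relu(total)


def recursive_aggregate(patterns):
    """Bottom-up aggregation over patterns indexed by window start position.

    ASSUMED reading: each level averages its own pattern with the intermediate
    vector from the level below, and passes the result upward.
    """
    state = None
    for pattern in reversed(patterns):                   # bottom of the chain first
        group = [pattern] if state is None else [pattern, state]
        state = [sum(vals) / len(group) for vals in zip(*group)]
    return state


def structure_vector(top_state, address_emb, alpha=0.5):
    # Linear weighted sum; alpha is a hypothetical preset scaling factor.
    return [alpha * t + (1 - alpha) * a for t, a in zip(top_state, address_emb)]
```

The three functions are applied in sequence: one `local_pattern` call per window, one `recursive_aggregate` call over all windows of a chain, then `structure_vector` with the address node's own embedding.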
[0033] The spatiotemporal address intelligent matching system based on multi-source data fusion according to an embodiment of the present invention includes the following modules:
The data processing module, which is used to acquire multi-source address text data and perform word segmentation, entity recognition and structured parsing to generate a set of nodes and a set of relationships for graph construction.
The knowledge graph construction module, which is used to build an initial heterogeneous graph based on the node set and the relationship set, and to perform entity alignment and relationship completion using the external geographic knowledge graph to form an enhanced knowledge graph.
The relational graph convolution feature generation module, which is used to perform multi-layer relational graph convolution operations on the enhanced knowledge graph to generate node embedding vectors for structural parsing and standard address comparison.
The graph isomorphic structure parsing module, which is used to construct structural chains based on node embedding vectors and generate structural representation vectors through recursive isomorphic parsing.
The training and inference module, which is used to calculate matching feature vectors based on structural representation vectors and node embedding vectors, to perform similarity calculation, loss calculation and parameter update based on supervised samples, and to output address matching results.
The online update module, which is used to receive real-time business feedback, build an incremental sample set, and update the model parameters based on the incremental sample set within a preset update cycle to achieve dynamic model optimization.
[0034] Example 1: To verify the feasibility of this invention in practice, it was applied to a typical large-scale address processing scenario. This scenario involves a large amount of address text data from multiple channels, including user-inputted non-normalized addresses, historical address representations, geographical entity names with aliases or abbreviations, and address formats containing time-varying information. Traditional methods in such scenarios generally face problems such as high matching error rates, difficulty in handling hierarchical changes, and inability to utilize multi-relationship structure information. This invention aims to address these pain points, enabling stable, accurate, and continuously optimized matching results for complex addresses in a multi-source environment.
[0035] In this scenario, multi-source address text data from the business operating system, historical databases, and other external sources are collected and processed through word segmentation, entity recognition, and structuring to form initial nodes and relational structures. Subsequently, administrative division information, road structures, historical evolution records, and other content from an external geographic knowledge base are integrated into the initial heterogeneous graph, forming an enhanced knowledge graph with more complete structural representation capabilities. Based on this graph, the relational graph convolutional network of this invention performs independent transformations on different relation types and introduces chain order encoding, enabling nodes at different chain positions to obtain distinguishable feature representations. The graph isomorphic network performs recursive pattern parsing on the structural chains generated in the graph, enabling nodes to obtain complete hierarchical structural semantics. A multi-hop link modeling mechanism further performs semantic modeling on cross-relational paths, allowing the final matching features to simultaneously express link structure, node semantics, and hierarchical information.
[0036] In practical applications, this invention inputs the processed address data into the model, which outputs the matching probability between the data and the standard address. The system automatically selects the standard address with the highest matching probability as the final result. For user feedback and corrections during actual use, this invention automatically constructs incremental samples and performs periodic online updates, enabling the model to continuously learn new address representations. Through this mechanism, the model can not only handle known structures but also adaptively adjust to situations such as naming changes, region splitting, or merging.
[0037] To further demonstrate the effectiveness of this invention, the performance of traditional methods and the method of this invention was compared in real-world business scenarios. Based on real address data, various metrics were statistically analyzed, including accuracy, multi-relationship chain utilization, complex address recognition rate, accuracy improvement before and after online updates, and average processing latency. The test data included over 100,000 unstructured address texts, containing a large number of address types with historical names, aliases, and cross-level jump descriptions. Under unified testing conditions, the accuracy of this invention in complex address recognition scenarios significantly outperformed traditional solutions, especially in areas with complex multi-relationship chain structures. The online update mechanism of this invention also improves the model's stability under dynamic address changes, further enhancing accuracy after several iterations.
[0038] Based on the above tests, it is evident that this invention has advantages in large-scale, multi-source, and dynamically changing address environments, effectively improving the accuracy and stability of address matching tasks, and can continue to evolve in practical applications.
[0039] Table 1: Model Performance Comparison
[0040] As can be seen from the data in Table 1, this invention outperforms traditional methods in all core performance metrics. Regarding address matching accuracy, this invention enhances structural representation capabilities through an enhanced knowledge graph, relational graph convolutional network, and graph isomorphism parsing mechanism, increasing accuracy from 87.2% to 96.8%, a 9.6 percentage point increase. In terms of multi-relationship chain utilization, this invention effectively utilizes cross-relationship link features through chain order encoding and chain order weighted aggregation mechanisms, significantly improving this metric from 42.5% to 91.3%, fully demonstrating its ability in complex structure modeling. In complex address recognition scenarios, this invention also demonstrates advantages, increasing the recognition rate from 78.1% to 94.6%, effectively solving the problem that traditional methods cannot handle aliases, historical names, and cross-level jump descriptions.
[0041] The online update mechanism of this invention brings additional performance improvements to the model. After several online updates, the overall address matching accuracy improved by an additional 3.7%, demonstrating that the model can learn new address representations in a timely manner and continuously optimize itself. Although the model structure is more complex, due to the efficient feature aggregation mechanism, the average processing latency of this invention is still lower than that of traditional solutions, decreasing from 63.4 ms to 58.1 ms, demonstrating better real-time processing capabilities. In summary, this invention demonstrates advantages in accuracy, structural understanding, dynamic update capability, and processing efficiency.
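The gains quoted in paragraphs [0040] and [0041] can be checked with simple arithmetic. The following Python sketch uses only the figures stated in the text; the variable names are illustrative, and the relative latency reduction is a derived value that the text itself does not report.

```python
# (traditional method, this invention) as stated in the text, in percent.
metrics = {
    "address matching accuracy": (87.2, 96.8),
    "multi-relationship chain utilization": (42.5, 91.3),
    "complex address recognition rate": (78.1, 94.6),
}

# Absolute gains in percentage points.
gains = {name: round(new - old, 1) for name, (old, new) in metrics.items()}

# Average processing latency in milliseconds.
latency_old_ms, latency_new_ms = 63.4, 58.1
latency_reduction_ms = round(latency_old_ms - latency_new_ms, 1)
latency_reduction_pct = round(100 * (latency_old_ms - latency_new_ms) / latency_old_ms, 1)

print(gains)
print(latency_reduction_ms, latency_reduction_pct)
```

The accuracy gain evaluates to 9.6 percentage points, matching the figure given in paragraph [0040].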
[0042] The above description is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitutions or modifications made by those skilled in the art within the scope of the technology disclosed in the present invention, based on the technical solution and inventive concept of the present invention, should be covered within the scope of protection of the present invention.
Claims
1. A spatiotemporal address intelligent matching method based on multi-source data fusion, characterized in that it includes the following steps: S1. Obtain multi-source address text data, perform preprocessing, construct an initial heterogeneous graph, and define address relationships as edge types; S2. Introduce an external geographic knowledge graph to perform entity alignment and relationship completion on nodes in the initial heterogeneous graph, and integrate administrative division, historical evolution and spatial adjacency information to generate an enhanced knowledge graph; S3. Construct a relational graph convolutional network based on the enhanced knowledge graph, define transformation matrices for different relations, perform relation-aware convolution on the nodes in the graph, and obtain the address embedding vector and the standard address embedding vector; S4. Input the address embedding vector into the graph isomorphic network, and use the recursive matching mechanism of isomorphic subgraphs to recursively expand and parse the local isomorphic patterns of the structural chain generated in the enhanced knowledge graph to generate the structural representation vector; S5. Using the structural representation vector and the relationship chains extracted from the enhanced knowledge graph as input, perform multi-hop link modeling on the relationship chains to obtain the link representation vector, and generate a matching feature vector based on the joint calculation of the structural representation vector and the link representation vector; S6. Calculate the similarity between the matching feature vector and the standard address embedding vector, and use supervised signals to optimize the relational graph convolutional network and the graph isomorphic network to complete the training of the address matching model; S7. Deploy the trained address matching model in a real-time address processing environment, perform matching on the input address and output the standard address matching result, generate incremental samples from the real-time feedback, and periodically update the model parameters online using the incremental samples.
2. The spatiotemporal address intelligent matching method based on multi-source data fusion according to claim 1, characterized in that, S2 specifically includes: S21. Obtain the administrative division entities, road entities, community entities and spatial adjacency relationships in the external geographic knowledge graph, compare the entity numbers in the external knowledge graph with the node numbers in the initial heterogeneous graph, and generate entity alignment results. S22. Based on the entity alignment results, complete the administrative region hierarchy, geographical inclusion, and spatial adjacency relationships in the external knowledge graph to the missing relationship positions in the initial heterogeneous graph. S23. Based on the completion of the relationship, merge the node attributes in the initial heterogeneous graph and the entity attributes in the external knowledge graph to form an enhanced knowledge graph that includes administrative division levels, historical evolution relationships and spatial adjacency relationships.
3. The spatiotemporal address intelligent matching method based on multi-source data fusion according to claim 2, characterized in that, S3 specifically includes: S31. Read the node set and relation set from the enhanced knowledge graph, and map the relation type identifier of each type of relation to the relation encoding of the relation graph convolutional network; S32. Based on the relation encoding, set an independent transformation matrix for each relation type, and input the initial features of the nodes into the transformation matrix to generate relation feature representations; S33. In each layer of the relation graph convolutional network, according to the aggregation order of relation feature representation, the features of adjacent nodes belonging to different relation types are weighted and aggregated with the features of the current node, and the convolution result of the address node is output. S34. Repeatedly perform relation feature transformation and aggregation operations through several layers of relation graph convolutional networks to obtain address embedding vectors and standard address embedding vectors.
4. The spatiotemporal address intelligent matching method based on multi-source data fusion according to claim 3, characterized in that, S4 specifically includes: S41. Retrieve administrative division nodes and building-related nodes associated with address nodes from the enhanced knowledge graph to form a structural chain, and retain a node identifier and address embedding vector at each position. S42. Construct isomorphic subgraphs by sliding according to a fixed window length. Each subgraph consists of adjacent nodes and their connections in the structural chain, and is saved as a set of isomorphic subgraphs. S43. For each isomorphic subgraph, sum the node embedding vector and the neighboring node embedding vector, and input the result into a nonlinear mapping function to obtain the local pattern representation vector. S44. Starting from the bottom layer, take a weighted average of the same set of local pattern representation vectors in hierarchical order to generate intermediate structure representation vectors. S45. The intermediate structure representation vector at the top level is weighted and summed with the original embedding vector of the address node to obtain the structure representation vector.
5. The spatiotemporal address intelligent matching method based on multi-source data fusion according to claim 4, characterized in that, S5 specifically includes: S51. Starting from the address node in the enhanced knowledge graph, expand outward along the geographical inclusion relationship, spatial adjacency relationship and temporal evolution relationship to form a relationship chain. Each node is represented by an embedding vector generated by a convolutional network. S52. Traverse the relationship chain, concatenate the embedding vectors of the current node and the previous node in turn, and generate the hop count feature vector to form the link feature sequence. S53. Calculate the dot product of the link features and the structural representation vector to obtain the attention score and normalize it; S54. Multiply each hop count feature vector in the link feature sequence with the corresponding attention weight and sum them to generate the link representation vector of the relationship chain; S55. Add the link representation vectors of all relational chains in order to obtain the link aggregation vector; S56. Perform a linear weighted summation of the link aggregation vector and the structure expression vector according to a preset ratio coefficient, and use the weighted summation result as the matching feature vector of the address node.
6. The spatiotemporal address intelligent matching method based on multi-source data fusion according to claim 5, characterized in that, S6 specifically includes: S61. Calculate the cosine similarity between the matching feature vector and the standard address embedding vector, and construct a loss function based on the similarity and matching labels; S62. Perform backpropagation on the loss function and update the linear transformation weight parameters in the relational graph convolutional network, graph isomorphic network, and link modeling steps based on the gradient values of the loss function with respect to the network parameters. S63. Repeat the similarity calculation and parameter update. Stop training when the decrease value of the loss function is less than the preset convergence threshold in several consecutive iterations. S64. After training stops, save all network parameters to form a trained address matching model.
7. The spatiotemporal address intelligent matching method based on multi-source data fusion according to claim 6, characterized in that, S7 specifically includes: S71. Deploy the trained address matching model to the real-time address processing environment, calculate the matching feature vector for the input address data and output the standard address matching result; S72. Collect user error correction information and business feedback information generated in the real-time address processing environment, and construct an incremental sample set based on the feedback content; S73. Perform vector similarity calculation and loss function calculation on the incremental sample set with the current parameters of the model, and use the gradient value of the loss function to update the linear transformation weight parameters in the relational graph convolutional network, graph isomorphic network and link modeling steps; S74. Repeatedly perform parameter updates according to the preset update cycle. When the update cycle reaches the set condition, write the updated parameters into the address matching model to complete the online model update.
8. The spatiotemporal address intelligent matching method based on multi-source data fusion according to claim 7, characterized in that, The relational graph convolutional network specifically includes: The input layer receives the initial feature vectors of nodes in the enhanced knowledge graph and constructs a multi-relationship adjacency structure based on the relationship types between nodes. The relation transformation layer sets an independent relation transformation matrix for each relation type and adds a chain order encoding vector to each relation transformation matrix; The relation aggregation layer performs weighted aggregation based on the chain order encoding vector. It calculates the weighting coefficients for adjacent nodes that belong to the same relation type but are in different relation chain positions, and adds the node feature vector to the weighted aggregation result to obtain the node's convolution output vector. The multi-layer stacked structure repeats the relation transformation layer and relation aggregation layer in a fixed order for several layers. Node features propagate along several relation chains in the enhanced knowledge graph to generate node embedding vectors.
9. The spatiotemporal address intelligent matching method based on multi-source data fusion according to claim 8, characterized in that, The graph isomorphic network specifically includes: The isomorphic subgraph input layer receives several isomorphic subgraphs constructed from structural chains according to a fixed window length, and uses the node embedding vectors and edge connection relationships in each isomorphic subgraph as input data for the isomorphic subgraph. The local pattern computation layer adds the node embedding vector and the neighboring node embedding vector in a fixed traversal order of the window nodes, and inputs the sum into a nonlinear mapping function to generate a local pattern representation vector. The recursive aggregation layer adopts a bottom-up recursive aggregation method, using the intermediate structure representation vector as the input of the previous level of recursive aggregation; The global structure generation layer performs a linear weighted summation of the intermediate structure representation vector and the initial embedding vector of the address node according to a preset scaling factor to obtain the structure representation vector.
10. A spatiotemporal address intelligent matching system based on multi-source data fusion, applied to the spatiotemporal address intelligent matching method based on multi-source data fusion as described in any one of claims 1 to 9, characterized in that it includes the following modules: The data processing module is used to acquire multi-source address text data and perform word segmentation, entity recognition and structured parsing to generate a set of nodes and a set of relationships for graph construction; The knowledge graph construction module is used to build an initial heterogeneous graph based on the node set and the relationship set, and to perform entity alignment and relationship completion using the external geographic knowledge graph to form an enhanced knowledge graph; The relational graph convolution feature generation module is used to perform multi-layer relational graph convolution operations on the enhanced knowledge graph to generate node embedding vectors for structural parsing and standard address comparison; The graph isomorphic structure parsing module is used to construct structural chains based on node embedding vectors and generate structural representation vectors through recursive isomorphic parsing; The training and inference module is used to calculate matching feature vectors based on structural representation vectors and node embedding vectors, to perform similarity calculation, loss calculation and parameter update based on supervised samples, and to output address matching results; The online update module is used to receive real-time business feedback, build an incremental sample set, and update the model parameters based on the incremental sample set within a preset update cycle.