Address text information correlation learning method based on data preprocessing

A method combining data preprocessing with address text information, applied in the field of deep learning. It addresses problems such as irregular address text and an unbalanced number of training set samples, with the effect of improving the model's comprehension of address text, alleviating the sample imbalance, and improving generalization ability.

Pending Publication Date: 2022-04-08
HANGZHOU DIANZI UNIV

AI Technical Summary

Problems solved by technology

[0004] To address the deficiencies of the prior art, the present invention proposes a method for learning the relevance of address text information based on data preprocessing. First, the addresses in the pre-training set are masked, and the preprocessed pre-training set is used to pre-train the initialized model, making the knowledge learned by the model more targeted. Next, the two addresses of each address relationship pair in the training set are corrected and completed, and a specific training set division strategy divides the entire training set into multiple sub-training sets. The pre-trained model is then used for integrated (ensemble) training and prediction on each sub-training set, which addresses both the irregularity of the address text itself and the imbalance in the number of training set samples. Finally, the generalization ability of the integrated model is tested.

Examples


Embodiment Construction

[0052] The present invention is further explained below with reference to the accompanying drawings.

[0053] As shown in Figure 1, a method for learning the relevance of address text information based on data preprocessing includes the following steps:

[0054] Step 1. Pre-training data masking

[0055] Collect all individual addresses that contain complete address location information from the national database of statistical zoning codes and urban-rural division codes, and randomly mask each address. Take the address "Xiasha Street, Qiantang District, Hangzhou City, Zhejiang Province" as an example.

[0056] If individual words are masked at random, the masked words in the original address will, with high probability, be non-contiguous, and the address information after masking is: "Xiasha[mask, Qiantang District, [mask] City, [mask] City, Zhejiang Province ]road". If a special phrase representing location information is masked instead, the masked words represent the key elements of the address, such as randomly sele...
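The two masking strategies described in Step 1 can be illustrated with a minimal sketch. Several things here are assumptions rather than details from the patent: the helper names `mask_random_tokens` and `mask_location_element`, the whitespace tokenization of the English rendering of the address, and the number of masked tokens. The sketch only shows the contrast between scattered random masks and masking one whole address element.

```python
import random

MASK = "[mask]"

def mask_random_tokens(tokens, n_mask, rng):
    """Mask n_mask tokens at randomly chosen positions (usually non-contiguous)."""
    positions = set(rng.sample(range(len(tokens)), n_mask))
    return [MASK if i in positions else tok for i, tok in enumerate(tokens)]

def mask_location_element(elements, rng):
    """Mask every token of one randomly chosen address element (a contiguous span)."""
    target = rng.randrange(len(elements))
    return [[MASK] * len(el) if i == target else list(el)
            for i, el in enumerate(elements)]

# Example address from the text, split into elements and then into tokens.
# The split itself is an assumption made for illustration.
elements = [["Xiasha", "Street"], ["Qiantang", "District"],
            ["Hangzhou", "City"], ["Zhejiang", "Province"]]
tokens = [tok for el in elements for tok in el]

rng = random.Random(0)  # fixed seed so repeated runs give the same masks
random_masked = mask_random_tokens(tokens, n_mask=3, rng=rng)
element_masked = mask_location_element(elements, rng)

print(" ".join(random_masked))
print(", ".join(" ".join(el) for el in element_masked))
```

Masking a whole element forces the model to recover a key part of the address from the remaining elements, which is presumably why the text singles out phrases that carry location information.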


Abstract

The invention discloses an address text information correlation learning method. To address the lack of pertinence in the knowledge a model gains from pre-training, the non-standardization of address texts, and the unbalanced sample quantity of the training set, the method comprises the following steps: first, masking preprocessing is applied to the addresses in a pre-training set, and the initialized model is pre-trained on the preprocessed pre-training set; next, a training set is built, the two addresses of each address relation pair in the training set are corrected and supplemented, and a special training set division strategy divides the whole training set into a plurality of sub-training sets; the pre-trained model is then used for integrated training and prediction on all the sub-training sets; finally, the generalization ability of the integrated model is tested.
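The abstract does not say what the special division strategy is, so the following is only a sketch of one common way to split an unbalanced training set into sub-training sets and combine the resulting models. The strategy shown (dealing the majority class across subsets and giving every subset all remaining samples), the majority-vote combination, the function names, and the 0/1 labels are all assumptions, not the patent's method.

```python
from collections import Counter

def split_into_sub_training_sets(samples, n_subsets):
    """Split (address_pair, label) samples into n_subsets sub-training sets.

    Assumed strategy: the majority class is dealt round-robin across the
    subsets, and every subset also receives all remaining samples.
    """
    by_label = {}
    for sample in samples:
        by_label.setdefault(sample[1], []).append(sample)
    majority_label = max(by_label, key=lambda label: len(by_label[label]))
    majority = by_label.pop(majority_label)
    minority = [s for group in by_label.values() for s in group]
    return [majority[i::n_subsets] + minority for i in range(n_subsets)]

def majority_vote(predictions):
    """Combine the labels predicted by the per-subset models for one sample."""
    return Counter(predictions).most_common(1)[0][0]

# Toy data: 9 pairs with label 0 and 3 pairs with label 1 (labels are hypothetical).
samples = [((f"addr_a{i}", f"addr_b{i}"), 0) for i in range(9)]
samples += [((f"addr_c{i}", f"addr_d{i}"), 1) for i in range(3)]

subsets = split_into_sub_training_sets(samples, n_subsets=3)
for subset in subsets:
    print(Counter(label for _, label in subset))
print(majority_vote([1, 0, 1]))
```

Under this assumed strategy each sub-training set is balanced, one copy of the pre-trained model would be fine-tuned per subset, and the per-model predictions would be merged by the vote.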

Description

Technical field

[0001] The invention relates to the field of deep learning, and in particular to a method for learning the relevance of address text information based on data preprocessing.

Background

[0002] With the rapid development of the Internet and Internet of Things technology, address text information correlation tasks have a wide range of real-world application scenarios, such as location services based on geographic information search, rapid search and positioning of emergency location information, and alignment of different address location information systems. Address text information correlation is currently judged mainly by supervised learning methods, which fall into two types. In the first, the address pair to be judged is joined with special separator characters and input into the model, which directly outputs the classification result. The two addresses in the...
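The first type of method in the background, joining the address pair with special separator characters before feeding it to a classifier, can be sketched as follows. The `[CLS]` and `[SEP]` marker names, the segment ids, and the whitespace tokenization follow a BERT-style convention and are assumptions; the text does not name the separators or the model.

```python
def build_pair_input(addr_a, addr_b, cls="[CLS]", sep="[SEP]"):
    """Join two addresses into one token sequence for a pair classifier.

    Returns the tokens and a parallel list of segment ids
    (0 for the first address, 1 for the second).
    """
    tokens_a = addr_a.split()  # whitespace tokenization is an assumption
    tokens_b = addr_b.split()
    tokens = [cls] + tokens_a + [sep] + tokens_b + [sep]
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, segment_ids

tokens, segment_ids = build_pair_input(
    "Xiasha Street, Qiantang District",
    "Qiantang District, Hangzhou City",
)
print(tokens)
print(segment_ids)
```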


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62, G06N20/00
Inventor: 何中杰, 施渊烈, 王越胜
Owner: HANGZHOU DIANZI UNIV