Transfer learning method for heterogeneous users

A user-oriented transfer learning technology, applied in the field of transfer learning for heterogeneous users, which addresses the problems that the transfer effect is easily degraded and that classification accuracy cannot be guaranteed, achieving the effects of reducing the risk of privacy leakage, meeting the requirement for high classification accuracy, and improving classification accuracy

Pending Publication Date: 2021-01-22
FUJIAN NORMAL UNIV
0 Cites 0 Cited by

AI-Extracted Technical Summary

Problems solved by technology

However, in real application scenarios, traditional machine learning methods still cannot fully meet practical requirements
On the one hand, it is relatively difficult to obtain labeled data
Most of the data generated in daily life is unlabeled, and the cost of manual labeling is high; moreover, data collection often has to take personal privacy and security into account, which further increases the difficulty of data acquisition
On the other hand, traditional machine learning must rebuild and retrain the model every time the data is updated, which consumes a great deal of time and resources
[0003] Transfer learning alleviates the data pressure of traditional machine learning to a certain extent, but it cannot be carried out in every situation, and the effect of the "transfer" is affected by many factors
Most current research uses randomly selected source-domain data, resulting in low classification accuracy, and cannot adapt to us...

Method used

[0082] The present invention adopts the above technical solution. First, the server and the other participants never obtain the original data, which reduces the risk of privacy leakage to a certain extent. Second, through domain delimitation and secondary dimensionality-reduction screening, the sample data are more strongly correlated with the classification target, so the method can adapt to user heterogeneity, classifies more effectively, and largely meets the demand for high classification accuracy. In addition, the cyclic dual-classification algorithm combining Softmax and CNN lets supervised learning guide unsupervised learning and improves the classification accuracy on insufficiently labeled data. The present invention selects and delimits the source domain and the target domain from the data obtained through multiple channels at the local end, so as to ensure a sufficient amount of data for transfer learning. On this basis, it meets the requirement of multi-objective output and improves classification accuracy.

Abstract

The invention discloses a transfer learning method for heterogeneous users. The server and the other participants cannot obtain the original data, so the risk of privacy disclosure is reduced to a certain extent. Through domain delimitation and secondary dimensionality-reduction screening, the correlation between the sample data and the classification target is higher, the method can adapt to user heterogeneity, the classification effect is better, and the requirement for high classification accuracy can be met to a great extent. In addition, a cyclic double-classification algorithm combining Softmax and CNN guides unsupervised learning through supervised learning and improves the classification accuracy on label-deficient data. The source domain and the target domain are selected and delimited from data obtained through multiple channels at the local end, so that a sufficient data volume is guaranteed for transfer learning. On this basis, the requirement of multi-target output is met, and the classification accuracy is improved.

Application Domain

Character and pattern recognition; Neural architectures +2

Technology Topic

Multiple target; Dimensionality reduction +6

Image

  • Transfer learning method for heterogeneous users

Examples

  • Experimental program (1)

Example Embodiment

[0043]To make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be described clearly and completely in conjunction with the drawings in the embodiments of the present application.
[0044]As shown in Figure 1 or Figure 2, the present invention discloses a transfer learning method for heterogeneous users, which includes the following steps:
[0045]Step 1. Participants perform data collection and primary processing on the local side to achieve the first data dimensionality reduction.
[0048]Step 2. According to the needs of the participants, the server selects and delimits the source domain and the target domain to achieve the second data dimensionality reduction.
[0049]Step 3. Use the S-CNN cyclic classification algorithm for classification.
[0050]Further, as a preferred embodiment, the specific steps of step 1 are:
[0051]Step 1-1, the participant uses its local raw data X_{n×h} to calculate the data covariance matrix F, where n is the number of entries of the participant's local raw data and h is the dimension of the data;
[0052]Step 1-2, according to |λE − F| = 0, calculate all eigenvalues λ of F and their corresponding eigenvectors μ, where E is the identity matrix;
[0053]Step 1-3, sort the eigenvalues λ_i (λ_i ∈ λ) and select the number of principal components according to a predetermined threshold r;
[0054]Step 1-4, output the eigenvector set (μ_1, μ_2, ..., μ_r) corresponding to the first r eigenvalues, calculate the modulus of each eigenvector, and unitize the r eigenvectors to form the eigenmatrix A;
[0055]Step 1-5, calculate the projection matrix X′_{n×r} = X_{n×h} · A (r < h);
[0056]Step 1-6, the server receives and stores the locally reduced data sets uploaded by all participants, forming a data pool {X′_1, X′_2, ..., X′_N}, where X′_v represents the sample data matrix uploaded by the v-th participant and N represents the number of participants.
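The local reduction in steps 1-1 to 1-6 can be illustrated with a minimal NumPy sketch. It is only an illustration under stated assumptions: the extract does not reproduce the patent's covariance formula, so F is taken here as the sample covariance of the centered data, the number of retained components r is chosen via a cumulative-variance threshold (one reading of the "predetermined threshold" in step 1-3), and the function and variable names are not from the patent.

```python
import numpy as np

def local_reduce(X, threshold=0.95):
    """Steps 1-1 to 1-5: first dimensionality reduction on one participant's raw data X (n x h).

    Assumptions: F is the sample covariance of the centered data, and the number r of
    principal components is chosen so that the retained eigenvalues cover `threshold`
    of the total variance.
    """
    Xc = X - X.mean(axis=0)                       # center the raw data
    F = np.cov(Xc, rowvar=False)                  # step 1-1: covariance matrix F (h x h)
    eigvals, eigvecs = np.linalg.eigh(F)          # step 1-2: eigenvalues/eigenvectors of F
    order = np.argsort(eigvals)[::-1]             # step 1-3: sort eigenvalues high to low
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = np.cumsum(eigvals) / eigvals.sum()
    r = int(np.searchsorted(explained, threshold)) + 1
    A = eigvecs[:, :r]                            # step 1-4: unit eigenvectors form matrix A (h x r)
    return X @ A                                  # step 1-5: projection X' (n x r), r < h

# Step 1-6: the server stores only the reduced matrices, never the raw data.
rng = np.random.default_rng(0)
raw = [rng.normal(size=(100, 20)) for _ in range(3)]   # three participants' local data
data_pool = [local_reduce(X) for X in raw]             # {X'_1, ..., X'_N}
```

In this sketch each participant uploads only the projected matrix X @ A; neither the raw data nor the projection matrix leaves the local side, which is the basis of the privacy argument repeated in paragraph [0082].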
Further, as a preferred embodiment, the specific steps of step 2 are:
[0057]Step 2-1, participant u uploads its classification requirements (N_u, M_u, acc_u), where N_u is the number of source domains, M_u is the number of categories, and acc_u is the minimum required classification accuracy;
[0058]Step 2-2, the server calculates the correlation between each data sample matrix X′_v in the data pool and the data sample X′_u uploaded by participant u, as follows:
[0059]I(X′_v, X′_u) = Σ P(x′_v, x′_u) · log( P(x′_v | x′_u) / P(x′_v) )
[0060]where I represents the correlation between the data matrices X′_v and X′_u; x′ represents a set of data of the data matrix X′; P(x′_v, x′_u) is the joint probability distribution of the two sets of data x′_v and x′_u; P(x) is the probability distribution of the data x; P(x′_v | x′_u) is the probability distribution of the data x′_v given the data x′_u; and KL stands for the Kullback-Leibler divergence (distance);
[0061]Step 2-3, according to the correlation I(X′_v, X′_u), sort the X′_v from high to low, select the top N_u most relevant matrices, as required by the participant, as the source domain X_S of this transfer learning, and take the participant's own sample data X′_u as the target domain X_T;
[0062]Step 2-4, perform the secondary dimensionality reduction: use the transfer component analysis (TCA) algorithm to map the data of the multiple domains into the same dimensional space; the feature mapping is accompanied by a reduction in the number of features, and finally new data feature sample matrices are obtained.
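The selection in steps 2-1 to 2-3 can be sketched as follows. It is only an illustration under assumptions the extract does not fix: the probability distributions in the correlation formula are estimated with a 2-D histogram over one-dimensional summaries of the matrices (the first column is used here), and all names (mutual_information, select_domains, bins) are illustrative.

```python
import numpy as np

def mutual_information(a, b, bins=16):
    """Estimate I(a; b) = sum P(a, b) * log(P(a | b) / P(a)) from a 2-D histogram,
    i.e. the KL distance between the joint distribution and the product of marginals."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)          # marginal over a (rows)
    p_b = p_ab.sum(axis=0, keepdims=True)          # marginal over b (columns)
    mask = p_ab > 0
    return float(np.sum(p_ab[mask] * np.log(p_ab[mask] / (p_a @ p_b)[mask])))

def select_domains(data_pool, X_u, N_u, bins=16):
    """Steps 2-2 and 2-3: rank the pooled matrices by correlation with X'_u and
    keep the top N_u as the source domain; X'_u itself is the target domain."""
    m = min(len(X_u), *(len(Xv) for Xv in data_pool))
    ref = X_u[:m, 0]                               # assumption: first-column summary
    scores = [mutual_information(Xv[:m, 0], ref, bins) for Xv in data_pool]
    top = np.argsort(scores)[::-1][:N_u]           # sort I(X'_v, X'_u) high to low
    X_S = [data_pool[v] for v in top]              # source domains X_S
    return X_S, X_u                                # (X_S, X_T)
```

select_domains returns the N_u most relevant pooled matrices as X_S and the participant's own matrix as X_T, which are then passed to the TCA step detailed next.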
[0063]Further, as a preferred embodiment, the specific steps of step 2-4 are:
[0064]Step 2-4-1, define the kernel matrix K: separately calculate the kernel matrices K_{S,S} and K_{T,T} on X_S and X_T and the cross-domain kernel matrices K_{S,T} and K_{T,S}, and then use formula (1) to construct the kernel matrix K:
[0065]K = [[K_{S,S}, K_{S,T}], [K_{T,S}, K_{T,T}]]   (1)
[0066]where K is an (n1 + n2) × (n1 + n2) matrix, and n1 and n2 are respectively the numbers of samples of X_S and X_T mapped into the reproducing kernel Hilbert space (RKHS);
[0067]Step 2-4-2, using the empirical kernel mapping, decompose the kernel matrix K as K = (K K^(-1/2))(K^(-1/2) K);
[0068]Step 2-4-3, based on K, calculate the characteristic distance between the source domain and the target domain according to formula (2):
[0069]Dist(X_S, X_T) = tr(KL)   (2)
[0070]where tr(KL) denotes the trace of the matrix KL;
[0071]Step 2-4-4, calculate the transformation matrix W according to formula (3):
[0072]
[0073]where Dist(X_S, X_T) denotes the maximum mean discrepancy (MMD) distance between the empirical means of the two domains X_S and X_T, that is, the KL distance between the two domains; and W is the matrix that transforms the empirical kernel mapping features into an m-dimensional space;
[0074]Step 2-4-5, output the final source domain matrix WX_S and the final target domain matrix WX_T.
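The mapping in steps 2-4-1 to 2-4-5 follows transfer component analysis. The sketch below assumes the textbook TCA formulation (a linear kernel, the usual MMD coefficient matrix L, a centering matrix H, and a trade-off parameter mu), because the extract does not reproduce the patent's formula (3); it also assumes the source matrices have been stacked into a single array with the same column dimension as the target.

```python
import numpy as np

def tca(X_S, X_T, m=5, mu=1.0):
    """Steps 2-4-1 to 2-4-5: map source and target data into a shared m-dimensional space."""
    n1, n2 = len(X_S), len(X_T)
    n = n1 + n2
    X = np.vstack([X_S, X_T])

    K = X @ X.T                                   # step 2-4-1: block kernel matrix (linear kernel),
                                                  # i.e. [[K_SS, K_ST], [K_TS, K_TT]] of formula (1)

    e = np.vstack([np.full((n1, 1), 1.0 / n1),    # MMD coefficient matrix L, so that
                   np.full((n2, 1), -1.0 / n2)])  # tr(KL) is the domain distance of formula (2)
    L = e @ e.T

    H = np.eye(n) - np.ones((n, n)) / n           # centering matrix

    # standing in for the patent's formula (3): W is taken as the m leading eigenvectors of
    # (KLK + mu*I)^(-1) K H K, which minimises tr(W^T K L K W) + mu*||W||^2 while keeping
    # the variance of the embedded data
    M = np.linalg.solve(K @ L @ K + mu * np.eye(n), K @ H @ K)
    eigvals, eigvecs = np.linalg.eig(M)
    W = eigvecs[:, np.argsort(-eigvals.real)[:m]].real   # step 2-4-4

    Z = K @ W                                     # embedded samples in the shared space
    return Z[:n1], Z[n1:]                         # step 2-4-5: the WX_S and WX_T matrices
```

With a non-linear kernel, K would be built from a kernel function rather than X @ X.T; the extract does not specify which kernel the patent uses.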
[0075]Further, as a preferred embodiment, the specific classification steps of step 3 are:
[0076]Step 3-1, initialization: first train on the labeled data of WX_S to obtain the initialized Softmax classifier;
[0077]Step 3-2, select a batch of samples from WX_T and initialize the cyclic-discrimination count q of this batch of data to 0;
[0078]Step 3-3, use the Softmax classifier to predict the unlabeled sample data and attach pseudo-labels to it, obtaining the primary classification result;
[0079]Step 3-4, also pass the same batch of samples through the CNN classifier to obtain the secondary classification result;
[0080]Step 3-5, compare the primary classification result with the secondary classification result; when the two are inconsistent, set q = q + 1 and judge whether q is greater than the threshold Q: if so, delete this batch of data and return to step 3-2; otherwise, return to step 3-3;
[0081]Step 3-6, use the retained sample data to train the Softmax classifier, obtain the classification accuracy acc, and compare it with the required accuracy acc_u of participant u; when acc is greater than acc_u, output acc and the classification result; otherwise, return to step 3-3.
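The cyclic double classification of steps 3-1 to 3-6 can be sketched with scikit-learn, under stated assumptions: a multinomial LogisticRegression plays the Softmax classifier, an MLPClassifier stands in for the CNN branch because the TCA-embedded samples are plain vectors in this sketch, refitting on the agreeing part of a batch when the two classifiers disagree is an assumption about what "return to step 3-3" entails, and y_T_eval is used only to measure acc in step 3-6.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

def s_cnn_cycle(WX_S, y_S, WX_T, y_T_eval, acc_u=0.8, Q=3, batch=32, max_rounds=20, seed=0):
    """Step 3 sketch: Softmax/CNN cyclic double classification with pseudo-labels."""
    softmax = LogisticRegression(max_iter=1000).fit(WX_S, y_S)              # step 3-1: initialise Softmax
    cnn = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                        random_state=seed).fit(WX_S, y_S)                   # CNN stand-in

    X_keep, y_keep = [np.asarray(WX_S)], [np.asarray(y_S)]
    rng = np.random.default_rng(seed)
    acc = softmax.score(WX_T, y_T_eval)
    for _ in range(max_rounds):
        idx = rng.choice(len(WX_T), size=min(batch, len(WX_T)), replace=False)  # step 3-2: batch, q = 0
        Xb = WX_T[idx]
        for q in range(Q + 1):
            pseudo = softmax.predict(Xb)                                    # step 3-3: pseudo-labels
            second = cnn.predict(Xb)                                        # step 3-4: secondary result
            agree = pseudo == second                                        # step 3-5: compare the two
            if agree.all():
                X_keep.append(Xb); y_keep.append(pseudo)                    # consistent: keep the batch
                break
            if agree.any():                                                 # inconsistent: q = q + 1,
                softmax.fit(np.vstack([np.vstack(X_keep), Xb[agree]]),      # refit before re-running 3-3
                            np.concatenate([np.concatenate(y_keep), pseudo[agree]]))
        else:
            continue                                                        # q exceeded Q: drop batch, back to 3-2

        softmax = LogisticRegression(max_iter=1000).fit(np.vstack(X_keep),
                                                        np.concatenate(y_keep))  # step 3-6: retrain
        acc = softmax.score(WX_T, y_T_eval)
        if acc >= acc_u:
            break                                                           # required accuracy acc_u met
    return softmax, acc
```

Q and acc_u mirror the loop threshold and the participant's required accuracy from the description; the returned classifier and acc correspond to the output of step 3-6.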
[0082]The present invention adopts the above technical solutions. First, the server and the other participants never obtain the original data, which reduces the risk of privacy leakage to a certain extent. Second, through domain delimitation and secondary dimensionality-reduction screening, the sample data are more strongly correlated with the classification target, so the method can adapt to the heterogeneity of users, classifies more effectively, and largely meets the need for high classification accuracy. In addition, the cyclic double-classification algorithm combining Softmax and CNN lets supervised learning guide unsupervised learning and improves the classification accuracy on under-labeled data. The invention selects and delimits the source domain and the target domain from the data obtained through multiple channels at the local end, ensuring that transfer learning has a sufficient amount of data. On this basis, it meets the need for multi-target output and improves the classification accuracy.
[0084]Obviously, the described embodiments are part of the embodiments of the present application, rather than all of the embodiments. In the case of no conflict, the embodiments in the application and the features in the embodiments can be combined with each other. The components of the embodiments of the present application generally described and shown in the drawings herein may be arranged and designed in various different configurations. Therefore, the detailed description of the embodiments of the present application is not intended to limit the scope of the claimed application, but merely represents selected embodiments of the present application. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of this application.

