Cross-feature federated anomaly detection method based on isolation forest

An anomaly detection technology based on isolation forests, applied in the fields of instruments, character and pattern recognition, computer parts, etc. It addresses the difficulty of detecting abnormal data from feature data.

Pending Publication Date: 2021-11-30
HANGZHOU FRAUDMETRIX TECH CO LTD +1

AI Technical Summary

Problems solved by technology

[0004] In view of the above problems, the embodiments of the present application provide an isolation-forest-based cross-feature federated anomaly detection method. It addresses the difficulty of detecting abnormal data when the feature data of the samples is distributed across different parties and no party may disclose its feature data: the method makes full use of the feature data of all participants while keeping each party's data private.


Examples

Embodiment 1

[0143] Embodiment 1, model training process

[0144] Assume that institutions A, B, and C share a common set of 100,000 samples. Institution A holds 2 feature dimensions of these 100,000 samples, institution B holds another 5 feature dimensions, and institution C holds a further 10 feature dimensions. The feature data distribution across the three institutions is shown in Table 1.

[0145] Table 1

[0146]
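
The feature split described in [0144] can be sketched as follows. This is only an illustration of the data layout, not the patent's implementation: the feature values are synthetic, and the names `FEATURE_DIMS`, `N_SAMPLES`, and `make_vertical_partition` are hypothetical. Only the sample count and the 2/5/10 dimension split come from the text.

```python
import random

# Feature dimensions per institution, as stated in the embodiment.
FEATURE_DIMS = {"A": 2, "B": 5, "C": 10}
N_SAMPLES = 100_000


def make_vertical_partition(n_samples, feature_dims, seed=0):
    """Return {party: {sample_id: feature_vector}} over a shared ID space.

    The values are synthetic placeholders (assumption); only the shape of
    the partition mirrors the embodiment.
    """
    rng = random.Random(seed)
    return {
        party: {sid: [rng.random() for _ in range(dim)]
                for sid in range(n_samples)}
        for party, dim in feature_dims.items()
    }


parties = make_vertical_partition(N_SAMPLES, FEATURE_DIMS)

# Every party holds the same sample IDs but a different slice of features.
shared_ids = set(parties["A"]) & set(parties["B"]) & set(parties["C"])
total_dims = sum(len(rows[0]) for rows in parties.values())
```

Each institution sees all 100,000 sample IDs but only its own columns, which is the cross-feature (vertical) setting the method targets.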

[0147] Institution A wants to use the feature data of institutions B and C, in a federated manner, to detect abnormal data among the 100,000 samples. Institution A therefore sends a federated anomaly detection request to institutions B and C, and both agree to participate. Institution A acts as the initiator, institutions B and C act as the two participants, and the federated anomaly detection proceeds in the following steps:

[0148] (1) Institution A, as the initiator, first sends the modeling ...

Embodiment 2

[0175] Embodiment 2, prediction embodiment

[0176] (1) Institutions A, B, and C participate together. For each isolation tree in the isolation forest model, they determine the leaf node into which each prediction sample falls, and compute the path length of the sample at that leaf node from the number of training samples on the leaf node and the layer number of the leaf node;

[0177] (2) Institutions A, B, and C calculate the path length of each prediction sample in every isolation tree according to formula (1), and then calculate its average path length over all isolation trees;

[0178] (3) Institutions A, B, and C calculate the anomaly score of each prediction sample according to formula (b), based on the sample's average path length and the number of training samples of a single isolation tree. In the end, institutions A, B, and C all obtain the anomaly score of each prediction sample;
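
Formula (1) and formula (b) are not reproduced in this excerpt. The sketch below therefore assumes the standard isolation forest definitions: the path length is the leaf depth plus a correction term c(leaf size), and the score is 2 raised to the negative ratio of the mean path length to c(subsample size). The function names and the example leaf values are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup
    over n points (standard isolation forest normaliser, assumed here)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def path_length(leaf_depth, leaf_size):
    """Step (1): depth of the leaf plus a correction for the training
    samples that were left unsplit in that leaf."""
    return leaf_depth + c_factor(leaf_size)


def anomaly_score(leaves, subsample_size):
    """Steps (2) and (3): average the path lengths over all trees, then
    normalise by the subsample size of a single tree.

    leaves: one (leaf_depth, leaf_size) pair per tree.
    """
    mean_path = sum(path_length(d, s) for d, s in leaves) / len(leaves)
    return 2.0 ** (-mean_path / c_factor(subsample_size))


# Hypothetical leaves for two samples across three trees of 256 samples.
isolated_early = anomaly_score([(2, 1), (3, 1), (2, 1)], 256)
isolated_late = anomaly_score([(8, 3), (8, 2), (7, 4)], 256)
```

Under these assumptions, a sample that is isolated after few splits gets a score closer to 1, and a sample whose mean path length equals c(subsample size) gets exactly 0.5.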

[0179] Among them, the path length prediction process of...


Abstract

The embodiments of the invention provide a cross-feature federated anomaly detection method and system based on an isolation forest, an electronic device, and a storage medium. When each isolation tree in the federated isolation forest model is constructed, the initiator randomly selects one node-splitting party for the current node from among the initiator and the participants, and the node-splitting party completes the split of the current node. The node split information is then sent to the non-splitting parties so that they perform the same node split. In this way, isolation trees with the same structure can be built from the feature data held by each party without leaking that data. The trees then form an isolation forest, which can be used to detect abnormal data in test samples more comprehensively.
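
The tree construction summarised in the abstract can be simulated in a single process as below. This is a minimal sketch under stated assumptions, not the patented protocol: the "node split information" is assumed to be the sample IDs assigned to each branch, the depth limit of ceil(log2(n)) is borrowed from the standard isolation forest, and all names (`build_federated_tree`, `private_splits`, `leaf_sizes`) are hypothetical.

```python
import itertools
import math
import random


def build_federated_tree(parties, sample_ids, rng, max_depth):
    """Simulate one federated isolation tree.

    parties: {party_name: {sample_id: feature_vector}} (vertical partition).
    Returns (shared_tree, private_splits). shared_tree holds only what every
    party may see; private_splits[party] holds that party's own
    (feature_index, threshold) per node and is never shared.
    """
    node_counter = itertools.count()
    private_splits = {name: {} for name in parties}

    def grow(ids, depth):
        if len(ids) <= 1 or depth >= max_depth:
            return {"leaf": True, "size": len(ids), "depth": depth}

        # The initiator randomly picks which party splits this node.
        owner = rng.choice(sorted(parties))
        local = parties[owner]

        # The chosen party splits using only its own features.
        feature = rng.randrange(len(local[ids[0]]))
        values = [local[i][feature] for i in ids]
        low, high = min(values), max(values)
        if low == high:  # nothing to separate on this feature (simplification)
            return {"leaf": True, "size": len(ids), "depth": depth}
        threshold = rng.uniform(low, high)

        node_id = next(node_counter)
        private_splits[owner][node_id] = (feature, threshold)

        # Assumed split information sent to the others: sample IDs per branch.
        left = [i for i in ids if local[i][feature] < threshold]
        right = [i for i in ids if local[i][feature] >= threshold]
        return {
            "leaf": False,
            "node_id": node_id,
            "owner": owner,
            "left": grow(left, depth + 1),
            "right": grow(right, depth + 1),
        }

    return grow(list(sample_ids), 0), private_splits


def leaf_sizes(node):
    """Collect the number of training samples on every leaf."""
    if node["leaf"]:
        return [node["size"]]
    return leaf_sizes(node["left"]) + leaf_sizes(node["right"])


# Toy data: 3 parties, 64 shared samples, 2/5/10 private features each.
rng = random.Random(7)
dims = {"A": 2, "B": 5, "C": 10}
parties = {
    name: {sid: [rng.random() for _ in range(d)] for sid in range(64)}
    for name, d in dims.items()
}
max_depth = math.ceil(math.log2(64))
tree, private_splits = build_federated_tree(parties, range(64), rng, max_depth)
```

The shared tree records only which party owns each node and how the sample IDs were divided, so every party ends up with the same tree structure while feature indices and thresholds stay with the party that chose them.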

Description

Technical field

[0001] The present application relates to the field of federated learning, and in particular to a method, system, electronic device, and storage medium for cross-feature federated anomaly detection based on isolation forests.

Background

[0002] An outlier is an extreme value in a sample that deviates significantly from the other observations in that sample. Samples usually contain some outliers. If all of the sample data is fed into analysis or model training without inspection, the outliers will adversely affect the results. Outlier detection is therefore necessary before data analysis or processing. For example, in credit risk assessment or fraud risk assessment in the field of risk control, abnormal users are marked after detection to reduce risk in commercial transactions.

[0003] Currently, methods such as an isolation forest model compose...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62
CPC: G06F18/24323, G06F18/214
Inventor: 朱帆, 孟丹, 李宏宇, 李晓林
Owner: HANGZHOU FRAUDMETRIX TECH CO LTD