Cross-feature federal abnormal data detection method based on isolated forest
Abnormal data detection and isolation technology, applied in the fields of instruments, character and pattern recognition, computer components, etc.; it addresses problems such as the difficulty of detecting abnormal data from feature data.
Pending Publication Date: 2021-11-30
HANGZHOU FRAUDMETRIX TECH CO LTD +1
Cites: 0 | Cited by: 4
AI Technical Summary
Problems solved by technology
[0004] In view of the above problems, an embodiment of the present application provides a cross-feature federated anomaly data detection method based on an isolation forest. It solves the problem of detecting abnormal data from the feature data of all participants when the feature data of the samples is distributed across different parties, is difficult to use fully, and must not be disclosed by any party.
Examples
Embodiment 1
[0143] Embodiment 1: model training process
[0144] Assume that institutions A, B, and C share 100,000 common samples. Institution A holds 2-dimensional features of these 100,000 samples, institution B holds another 5 feature dimensions of the same samples, and institution C holds another 10. The feature data distribution of the three institutions is shown in Table 1.
[0145] Table 1
[0146]
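The data layout in Embodiment 1 is a vertical (cross-feature) partition: every institution holds the same sample IDs but different feature columns. The Python sketch below reproduces only that layout, using synthetic random values in place of the real data; the variable names and the assumption that sample IDs are already aligned across institutions are illustrative and not taken from the patent.

```python
import numpy as np

# Hypothetical illustration of the vertical split in Embodiment 1:
# all three institutions hold the same 100,000 sample IDs, but disjoint feature columns.
N_SAMPLES = 100_000
FEATURE_DIMS = {"A": 2, "B": 5, "C": 10}  # feature dimensions per institution, from the text

rng = np.random.default_rng(seed=0)
sample_ids = np.arange(N_SAMPLES)  # common sample IDs, assumed already aligned

# Each institution only ever sees its own block of columns (synthetic values here).
party_features = {
    party: rng.normal(size=(N_SAMPLES, dims))
    for party, dims in FEATURE_DIMS.items()
}

for party, block in party_features.items():
    print(party, block.shape)
```

Row i of every block refers to the same sample, so no institution ever needs to assemble the full 17-column matrix.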
[0147] Institution A wants to use the feature data of federated institutions B and C to detect abnormal data among the 100,000 samples above. Institution A therefore sends a federated anomaly detection request to institutions B and C, which agree to participate. Institution A acts as the initiator and institutions B and C as the two participants; federated anomaly detection proceeds as follows:
[0148] (1) Institution A, as the initiator, first sends the modeling ...
Embodiment 2
[0175] Embodiment 2: prediction
[0176] (1) Institutions A, B, and C jointly determine, in each isolation tree of the isolation forest model used for prediction, the leaf node into which each prediction sample falls, and compute the sample's path length at that leaf from the leaf's depth (number of layers) and the number of training samples on the leaf;
[0177] (2) Institutions A, B, and C compute the path length of each prediction sample in every isolation tree according to formula (1), and then its average path length over all isolation trees;
[0178] (3) Institutions A, B, and C compute the anomaly score of each prediction sample according to formula (b), based on its average path length and the number of training samples of a single isolation tree. In the end, institutions A, B, and C all obtain the anomaly score of the prediction sample;
[0179] Here, the path length prediction process of...
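The prediction steps resemble the path-length and scoring computations of the standard isolation forest (Liu et al.). Formula (1) and formula (b) are not reproduced on this page, so the sketch below assumes they take the usual form: a sample reaching a leaf at depth e that holds n training samples gets h(x) = e + c(n), with c(n) = 2H(n-1) - 2(n-1)/n and H(i) approximated by ln(i) + 0.5772; the anomaly score is s = 2^(-E[h(x)]/c(psi)), where psi is the number of training samples per tree. The function names are hypothetical.

```python
import math
from typing import Sequence

EULER_GAMMA = 0.5772156649015329


def c(n: int) -> float:
    """Average path length of an unsuccessful BST search over n points (standard iForest)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0  # exact value; the log approximation of H(1) is poor here
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximates H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def leaf_path_length(leaf_depth: int, leaf_size: int) -> float:
    """h(x) = e + c(n) for a sample landing on a leaf at depth e holding n training samples."""
    return leaf_depth + c(leaf_size)


def anomaly_score(path_lengths: Sequence[float], subsample_size: int) -> float:
    """s = 2 ** (-E[h(x)] / c(psi)), where E[h(x)] is the mean over all trees."""
    mean_h = sum(path_lengths) / len(path_lengths)
    return 2.0 ** (-mean_h / c(subsample_size))


# A sample isolated at depth 2 in three trees, each trained on 256 samples.
h = [leaf_path_length(2, 1) for _ in range(3)]
print(anomaly_score(h, 256))
```

Under this assumed form, scores close to 1 mark samples isolated after few splits (likely anomalies), a score of 0.5 corresponds to an average path length equal to c(psi), and scores well below 0.5 indicate normal samples.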
Abstract
The embodiments of the invention provide a cross-feature federated abnormal data detection method and system based on an isolation forest, an electronic device, and a storage medium. When each isolation tree in the federated isolation forest model is constructed, the initiator randomly selects one node-splitting party for the current node from among itself and the participants, and that party splits the current node. The node splitting information is then sent to the non-splitting parties so that they perform the same split. In this way, isolation trees with identical structure are built from the feature data stored by every party without leaking that data. These trees form an isolation forest, which can then be used to detect abnormal data in test samples more comprehensively.
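The tree-construction step in the abstract can be sketched as follows. The page does not say what the "node splitting information" contains, so this sketch assumes it is the partition of sample IDs into left and right children, while the feature index and threshold stay private to the splitting party; at prediction time the recorded splitter evaluates its own rule at each node. All class and function names (`Party`, `Node`, `build_tree`, `route`, `leaves`) are hypothetical, and the data is synthetic.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Party:
    """One institution; it holds only its own feature columns (hypothetical class)."""
    name: str
    features: Dict[int, List[float]]  # sample_id -> this party's local feature vector
    # Split rules this party created, kept private: node_id -> (feature_index, threshold)
    private_splits: Dict[int, Tuple[int, float]] = field(default_factory=dict)

    def split(self, node_id: int, sample_ids: List[int], rng: random.Random):
        """Split a node on a random local feature; return (left_ids, right_ids) or None."""
        dim = len(next(iter(self.features.values())))
        f = rng.randrange(dim)
        values = [self.features[i][f] for i in sample_ids]
        lo, hi = min(values), max(values)
        if lo == hi:
            return None  # chosen feature is constant on this node
        t = rng.uniform(lo, hi)
        left = [i for i in sample_ids if self.features[i][f] < t]
        right = [i for i in sample_ids if self.features[i][f] >= t]
        if not left or not right:
            return None
        self.private_splits[node_id] = (f, t)
        return left, right


@dataclass
class Node:
    """Tree structure shared by all parties; it stores no feature index or threshold."""
    node_id: int
    depth: int
    size: int  # number of training samples that reached this node
    splitter: Optional[str] = None  # name of the party that split this node
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(parties, sample_ids, max_depth, rng):
    next_id = [0]

    def grow(ids, depth):
        node = Node(next_id[0], depth, len(ids))
        next_id[0] += 1
        if depth >= max_depth or len(ids) <= 1:
            return node
        party = rng.choice(parties)  # initiator randomly picks the node-splitting party
        result = party.split(node.node_id, ids, rng)
        if result is None:
            return node  # this sketch stops here instead of retrying another party
        # Assumption: only the left/right sample-ID partition is broadcast,
        # so every party grows the same structure without seeing the split rule.
        node.splitter = party.name
        node.left = grow(result[0], depth + 1)
        node.right = grow(result[1], depth + 1)
        return node

    return grow(list(sample_ids), 0)


def route(node, parties_by_name, sample_id):
    """Descend to a leaf; at each node only the recorded splitter evaluates its private rule."""
    while node.splitter is not None:
        owner = parties_by_name[node.splitter]
        f, t = owner.private_splits[node.node_id]
        node = node.left if owner.features[sample_id][f] < t else node.right
    return node


def leaves(node):
    if node.splitter is None:
        return [node]
    return leaves(node.left) + leaves(node.right)


rng = random.Random(42)
ids = list(range(256))
parties = [
    Party(name, {i: [rng.gauss(0.0, 1.0) for _ in range(dim)] for i in ids})
    for name, dim in [("A", 2), ("B", 5), ("C", 10)]
]
tree = build_tree(parties, ids, max_depth=8, rng=rng)
by_name = {p.name: p for p in parties}
print(len(leaves(tree)), route(tree, by_name, 0).depth)
```

Because the splitting party is drawn at random for every node, each institution's features contribute to the forest. In this sketch no institution sends its raw feature values or split thresholds to the others, although the shared ID partitions themselves still carry some information.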
Description
Technical field
[0001] The present application relates to the field of federated learning, in particular to a method, system, electronic device, and storage medium for detecting anomalies in cross-feature federated data based on isolation forests.
Background
[0002] An outlier is an extreme value in a sample that deviates significantly from the other observations in the sample to which it belongs. Samples usually contain some outliers. If all sample data is fed into analysis or model training without screening, the outliers will distort the results. Outlier detection is therefore necessary before data analysis or processing. In risk control, for example in credit risk or fraud risk assessment, users found to be abnormal are flagged after detection to reduce risk in commercial transactions.
[0003] Currently, methods such as an isolated forest model compose...