
Entity matching method and computer program based on non-principal attribute outlier detection

An entity matching technique that applies outlier detection to non-primary attributes, used in the Internet field. It addresses problems such as weak supervision and heavy labeling workload, and improves matching accuracy and recall.

Active Publication Date: 2021-12-07
CIVIL AVIATION UNIV OF CHINA

AI Technical Summary

Problems solved by technology

[0007] In summary, the existing technology has the following shortcoming: the supervised classifier model introduced in this paper must be trained, and labeling the training data requires a great deal of manual work. Weak supervision or crowdsourcing could be used to let the system find matches automatically; reducing the manual labeling workload is a focus of future research.


Examples


Detailed Description of the Embodiments

[0049] For a further understanding of the content, features, and effects of the present invention, the following embodiments are described in detail with reference to the accompanying drawings:

[0050] Referring to Figure 1, an entity matching method based on non-primary attribute outlier detection includes the following steps:

[0051] Step 1: Data preprocessing, i.e., processing the original data entities to generate the input data set for entity matching (EM). Depending on the input and output data involved, data preprocessing consists of two main parts:

[0052] Data extraction: Based on the goal of the experiment, identify the non-primary attributes common to the different data sources, extract them incrementally, and save the extracted data to a separate table. Then use regular expressions or natural language processing techniques to remove obviously wrong or meaningless field values.

[0053] Data archiving and cleaning: Use archivin...
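The data extraction part of Step 1 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the record layout (dictionaries with an integer `id`), the function names `common_attributes` and `extract`, the `last_seen_id` watermark used for incremental extraction, and the `MEANINGLESS` pattern are all assumptions made for illustration.

```python
import re

# Hypothetical pattern for "obviously wrong or meaningless" field values:
# empty strings, placeholders such as N/A or null, and runs of dashes or
# question marks. A real rule set would depend on the data source.
MEANINGLESS = re.compile(r"\s*(?:n/?a|null|none|unknown|-+|\?+)?\s*", re.IGNORECASE)


def common_attributes(records_a, records_b, primary):
    """Return the sorted non-primary attributes present in both sources."""
    keys_a = set().union(*(rec.keys() for rec in records_a))
    keys_b = set().union(*(rec.keys() for rec in records_b))
    return sorted((keys_a & keys_b) - set(primary))


def extract(records, attributes, last_seen_id=0):
    """Incrementally extract the given attributes into a new table.

    Only records with id > last_seen_id are processed (an assumed form of
    incremental extraction); meaningless values are replaced by None.
    """
    table = []
    for rec in records:
        if rec["id"] <= last_seen_id:
            continue
        row = {"id": rec["id"]}
        for attr in attributes:
            value = rec.get(attr)
            text = "" if value is None else str(value)
            row[attr] = None if MEANINGLESS.fullmatch(text) else text.strip()
        table.append(row)
    return table
```

Calling `extract` again with `last_seen_id` set to the largest `id` already processed would skip rows extracted in an earlier run, which is one simple way to realize incremental extraction.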


Abstract

The invention discloses an entity matching method based on non-primary attribute outlier detection, belonging to the technical field of the Internet. The method has two main aspects: first, it uses non-primary attribute values to eliminate the ambiguity caused by the diversity of primary attribute values; second, it uses an outlier model to screen the data quickly and extract matching pairs. Rules chosen according to the characteristics of the primary attribute first coarsely screen the data to reduce the number of record pairs. On this basis, the five steps of the outlier model perform further screening to obtain a preliminary set of entity pairs. The data set is then sampled according to the generated entity pair set, and finally machine learning is used to select an appropriate matcher and train it. To a certain extent, the invention overcomes the drawback that outlier matching based on traditional singular value decomposition cannot be applied to large-scale data.
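The two screening stages described in the abstract can be sketched roughly as follows. The text shown here does not spell out the five steps of the outlier model, so this sketch substitutes a plain z-score test on attribute agreement as a stand-in; the function names, the `block_key` rule, and the `z_threshold` parameter are hypothetical and not taken from the patent.

```python
from statistics import mean, pstdev


def coarse_pairs(left, right, block_key):
    """Rule-based coarse screening: pair only records sharing a blocking key."""
    buckets = {}
    for rec in right:
        buckets.setdefault(block_key(rec), []).append(rec)
    return [(a, b) for a in left for b in buckets.get(block_key(a), [])]


def attribute_agreement(a, b, attributes):
    """Fraction of non-primary attributes on which the two records agree."""
    hits = sum(
        1 for k in attributes if a.get(k) is not None and a.get(k) == b.get(k)
    )
    return hits / len(attributes)


def outlier_pairs(pairs, attributes, z_threshold=1.0):
    """Keep pairs whose agreement score is unusually high (stand-in z-score test)."""
    scores = [attribute_agreement(a, b, attributes) for a, b in pairs]
    if not scores:
        return []
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return []  # all pairs look alike, so none stands out
    return [p for p, s in zip(pairs, scores) if (s - mu) / sigma > z_threshold]
```

The intuition behind the stand-in is that most candidate pairs within a block do not refer to the same entity, so a true match should appear as an outlier in the agreement scores. The surviving pairs would then serve as the preliminary entity pair set used for sampling and for training a matcher.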

Description

Technical field

[0001] The invention belongs to the technical field of the Internet, and in particular relates to an entity matching method and a computer program based on non-primary attribute outlier detection.

Background

[0002] Over the next 30 years, data applications will become increasingly prominent, which will inevitably affect the construction and development of civil aviation informatization. With the spread of the mobile Internet, convenient applications can be pushed to smart terminals, and big data technology can be used to analyze passenger behavior and understand passengers' concerns, so as to improve the flying experience of users.

[0003] From the perspective of global civil aviation development, intensified market competition has long kept the civil aviation industry operating at meager profit levels. With the continuous worsening of the global financial crisis in recent years, the survival pressure of airlines is increasi...


Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06F16/36, G06N20/00, G06Q50/30
CPC: G06Q50/30
Inventors: 曹卫东, 王广森, 王怀超
Owner: CIVIL AVIATION UNIV OF CHINA