User data preprocessing system for link prediction relation recommendation

A data preprocessing and link prediction technology, applied in database management systems, relational databases, electrical digital data processing, and related fields, that addresses problems such as poorly chosen field matching algorithms, poor generality of data preprocessing systems, and difficulty of extension.

Inactive Publication Date: 2021-10-15
HENAN POLYTECHNIC INST

AI Technical Summary

Problems solved by technology

[0003] The existing technology has the following shortcomings: existing data preprocessing systems process large-scale data inefficiently, choose field matching algorithms poorly, and have detection accuracy that depends on feature selection. They are also not general-purpose, are hard to extend, and schedule resources inadequately.


Examples


Embodiment 1

[0047] The user builds a data mining case and submits a task. After the system receives the task case and either receives the data file or connects to the database location, it decomposes the task and obtains the sequence of subtasks. The discretization recommendation engine passes the file to the data analysis agent, which analyzes it to obtain the task's data file information. After this information is submitted to the discretization recommendation agent, the intelligent control agent checks the data in the discretization database and performs rough set analysis to obtain the corresponding discretization decision rules.
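The rough set step can be illustrated with a minimal sketch. It assumes a small decision table in which already discretized descriptors of a data file are the condition attributes and the recommended discretization algorithm is the decision attribute. The attribute names (`size`, `skew`) and algorithm labels are hypothetical and do not come from the patent. The sketch extracts only certain rules, that is, rules from indiscernibility classes that lie entirely inside one decision class (its lower approximation).

```python
from collections import defaultdict


def certain_rules(table, condition_attrs, decision_attr):
    """Return {condition values: decision} for consistent indiscernibility classes."""
    # Rows with identical condition values are indiscernible and share a class.
    classes = defaultdict(list)
    for row in table:
        key = tuple(row[attr] for attr in condition_attrs)
        classes[key].append(row[decision_attr])

    rules = {}
    for key, decisions in classes.items():
        # A class with a single decision value lies in the lower approximation
        # of that decision class, so it yields a certain rule.
        if len(set(decisions)) == 1:
            rules[key] = decisions[0]
    return rules


# Hypothetical knowledge-base entries (illustrative values only).
knowledge_base = [
    {"size": "large", "skew": "high", "algorithm": "equal_frequency"},
    {"size": "large", "skew": "high", "algorithm": "equal_frequency"},
    {"size": "small", "skew": "low", "algorithm": "equal_width"},
    {"size": "small", "skew": "high", "algorithm": "equal_width"},
    {"size": "small", "skew": "high", "algorithm": "equal_frequency"},
]

rules = certain_rules(knowledge_base, ["size", "skew"], "algorithm")
```

The class `("small", "high")` maps to two different algorithms, so it produces no certain rule; a fuller treatment would report it as a possible rule from the boundary region.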

[0048] Further, in the above technical solution, the link prediction relationship recommendation layer obtains the edge set of the weighted relationship network graph and divides it into a training set and a test set. Link prediction is performed on the training set to obtain a prediction result, and a preset index value is obtained based on preset indicators...
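The split-then-predict procedure described above can be sketched as follows. The patent text shown here does not name the similarity score or the indicator, so this sketch assumes a weighted common-neighbors score and AUC as the preset indicator. All function names are hypothetical.

```python
import random
from collections import defaultdict


def split_edges(edges, test_ratio=0.2, seed=0):
    """Shuffle weighted edges (u, v, w) and split them into training and test sets."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


def build_adjacency(edges):
    """Build an undirected weighted adjacency map from the training edges."""
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w
    return adj


def weighted_common_neighbors(adj, u, v):
    """Score a node pair by summing the edge weights to each shared neighbor."""
    common = set(adj.get(u, {})) & set(adj.get(v, {}))
    return sum(adj[u][z] + adj[v][z] for z in common)


def auc(adj, test_edges, non_edges):
    """Fraction of (held-out edge, non-edge) pairs ranked correctly; ties count 0.5."""
    wins, total = 0.0, 0
    for u, v, _ in test_edges:
        positive = weighted_common_neighbors(adj, u, v)
        for a, b in non_edges:
            negative = weighted_common_neighbors(adj, a, b)
            if positive > negative:
                wins += 1.0
            elif positive == negative:
                wins += 0.5
            total += 1
    return wins / total
```

Scores are computed only from the training adjacency, so the held-out test edges act as unseen links. An AUC near 0.5 means the score ranks no better than chance.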

Embodiment 2

[0061] After the system parses the data configuration file, the data extraction module extracts data from the external data file, and the data transfer module stores the data as text. Each record carries a uniquely identifying ID, and its text is stored in the order of the fields in the configuration file, so that restoring the data only requires parsing the fields in the configuration file to identify which field each value in the record belongs to. After extraction is complete, the extracted data is stored in the HDFS file system in the specified format for use by the subsequent preprocessing module. The data preprocessing module is divided into three parts: the first is data integrity, consistency, and validity detection; the second is similar duplicate data detection; the third is abnormal data detection. For data integrity, consistency, and validity detection, after obtaining the data to be processed from the HDFS file system, first de...
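The storage convention described above, an ID followed by values in configuration-file field order, can be sketched as below. The tab separator, the field names, and the function names are assumptions for illustration. The sketch also assumes values contain no tabs or newlines, and it writes to plain strings rather than to HDFS.

```python
FIELD_SEP = "\t"  # assumed separator; the source does not specify the text format


def to_line(record_id, record, fields):
    """Serialize one record as ID plus values, in configuration-file field order."""
    values = [str(record.get(name, "")) for name in fields]
    return FIELD_SEP.join([str(record_id)] + values)


def from_line(line, fields):
    """Restore a record by pairing each value with the field at the same position."""
    parts = line.rstrip("\n").split(FIELD_SEP)
    record_id, values = parts[0], parts[1:]
    if len(values) != len(fields):
        raise ValueError("record does not match the configured field list")
    return record_id, dict(zip(fields, values))


# Hypothetical field order, as it might be read from the configuration file.
config_fields = ["name", "phone", "city"]

line = to_line("u001", {"city": "Nanyang", "name": "Li", "phone": "555-0100"}, config_fields)
restored_id, restored = from_line(line, config_fields)
```

Because field names are not repeated in every record, the stored text stays compact, but the configuration file must be kept alongside the data to interpret it.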

Embodiment 3

[0069] Processing by the data preprocessing module improves data quality. The task of the data storage module is to import this data into the specified database in batches. The storage medium and storage method must be chosen according to the characteristics of each business type: data related to business logic, which has strong relational constraints, is stored in an RDBMS, while large-scale data and data with weak structural requirements use distributed storage in an HBase database. Data must therefore be storable not only in the RDBMS database but also in the HBase database.
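A minimal sketch of this routing and batch import follows. It uses in-memory lists as stand-ins for the two stores and calls no real RDBMS or HBase client. The `DatasetProfile` type, the row-count threshold, and the choice to send large relational datasets to HBase are assumptions, since the source does not say how that case is resolved.

```python
from dataclasses import dataclass
from itertools import islice

LARGE_SCALE_ROWS = 1_000_000  # hypothetical threshold for "large-scale" data


@dataclass
class DatasetProfile:
    relational: bool  # True when the data has strong relational constraints
    row_count: int


def choose_backend(profile):
    """Pick a store: RDBMS for relational data, HBase for weakly structured or large data."""
    if profile.relational and profile.row_count < LARGE_SCALE_ROWS:
        return "rdbms"
    return "hbase"


def batches(rows, size):
    """Yield successive lists of at most `size` rows."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def store(rows, profile, sinks, batch_size=500):
    """Write rows to the chosen sink batch by batch; return (backend, rows written)."""
    backend = choose_backend(profile)
    written = 0
    for chunk in batches(rows, batch_size):
        sinks[backend].extend(chunk)  # stand-in for one bulk insert or bulk put
        written += len(chunk)
    return backend, written
```

In a real deployment each `sinks[backend].extend(chunk)` call would be replaced by one bulk write to the corresponding database, which is where batching pays off.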


Abstract

The invention discloses a user data preprocessing system for link prediction relation recommendation. The data preprocessing system comprises a link prediction relation recommendation layer, a data layer, a data processing layer, a system management layer, a service layer, an application layer, a data access layer and a view layer. By analyzing the characteristics of specific preprocessing algorithms and obtaining a corresponding intelligent recommendation scheme, the invention constructs an overall data preprocessing framework with a multi-agent architecture. The framework integrates preprocessing algorithms, user interaction, system scheduling and other functions into one system. The link prediction relation layer makes the framework open and extensible and supports preprocessing tasks in different settings. To make the data preprocessing system intelligent, the knowledge discovery model agent describes each part of the data preprocessing process, a knowledge base provides the user with a principled algorithm recommendation scheme, and intelligent algorithm recommendation is achieved through the knowledge classification capability of rough set theory.

Description

Technical field

[0001] The present invention relates to the technical field of ledger management, and more specifically to a user data preprocessing system for link prediction relationship recommendation.

Background technique

[0002] With the rapid development of the Internet of Things, the mobile Internet, and smartphones, data output is increasing exponentially, and big data technology has emerged in response. For various reasons, collected data inevitably has quality problems. Using such "dirty data" for data mining may lead to incorrect knowledge and wrong analysis, which misleads researchers and even enterprises and causes them losses. To improve data quality, user data sets require data preprocessing, in which similar duplicate data detection and abnormal data detection are particularly important.

[0003] The existing technology has the following shortcomings: the existing data preprocessi...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/25, G06F16/215, G06F16/23, G06F16/2458, G06F16/2457, G06F16/27, G06F16/28, G06F16/182, G06F16/16, G06F16/17, G06F21/31, G06N5/02
CPC: G06F16/254, G06F16/252, G06F16/215, G06F16/2365, G06F16/2465, G06F16/2457, G06F16/27, G06F16/283, G06F16/284, G06F16/182, G06F16/168, G06F16/1734, G06F21/31, G06N5/022
Inventor: 任越美, 李垒, 杨云, 孙立伟, 毛峥, 边青全, 王培培, 吕品
Owner: HENAN POLYTECHNIC INST