2667 results about "Data cleansing" patented technology

Data cleansing or data cleaning is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database and refers to identifying incomplete, incorrect, inaccurate or irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse data. Data cleansing may be performed interactively with data wrangling tools, or as batch processing through scripting.
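As a rough illustration of the batch, script-based cleansing described above, the following minimal sketch trims whitespace, drops incomplete records, and removes duplicates. The field names (`name`, `email`) and the `clean_records` helper are hypothetical and chosen only for this example.

```python
def clean_records(records, required=("name", "email")):
    """Return cleaned records: trimmed, complete, and de-duplicated.

    `required` lists hypothetical field names that every record must have.
    """
    seen = set()
    cleaned = []
    for rec in records:
        # Correct: strip stray whitespace from string fields.
        norm = {k: v.strip() if isinstance(v, str) else v for k, v in rec.items()}
        # Correct: normalize email case so duplicates can be detected.
        if norm.get("email"):
            norm["email"] = norm["email"].lower()
        # Remove: skip records with a missing or empty required field.
        if any(not norm.get(field) for field in required):
            continue
        # Remove: skip records already seen under the same key.
        key = tuple(norm[field] for field in required)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(norm)
    return cleaned


raw = [
    {"name": " Ada ", "email": "ADA@example.com"},
    {"name": "Ada", "email": "ada@example.com"},  # duplicate after normalization
    {"name": "Bob", "email": ""},                 # incomplete
]
print(clean_records(raw))
```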

System and method for constructing information-analysis-oriented knowledge maps

The invention discloses a system and method for constructing information-analysis-oriented knowledge maps. The system comprises a data acquisition module, a text extraction module, an entity recognition module, a semantic analysis module and an entity-relation extraction module, wherein the data acquisition module is used for carrying out cleaning and simple preprocessing on acquired data and outputting the data to the text extraction module; the text extraction module is used for carrying out data cleaning and preprocessing on structured and unstructured data and conveying clean data to the entity recognition module; the entity recognition module is used for segmenting words of a text, marking the word characteristics of the segmented words, then extracting terms and conveying extracted results to the semantic analysis module; the semantic analysis module is used for analyzing and extracting relation among bodies, generating a semantic metadata model by a body construction tool and outputting the semantic metadata model to the entity-relation extraction module; and the entity-relation extraction module is used for finally generating knowledge map language by extracting taxonomic relation and non-taxonomic relation. The system and method disclosed by the invention have the advantages that by combination of syntactic training and association rules, not only are external input and artificial intervention reduced, but also the entity relation can be continuously recognized.
Owner:NO 32 RES INST OF CHINA ELECTRONICS TECH GRP

Text joins for data cleansing and integration in a relational database management system

An organization's data records are often noisy because of transcription errors, incomplete information, and lack of standard formats for textual data. A fundamental task during data cleansing and integration is matching strings, perhaps across multiple relations, that refer to the same entity (e.g., organization name or address). Furthermore, it is desirable to perform this matching within an RDBMS, which is where the data is likely to reside. In this paper, we adapt the widely used and established cosine similarity metric from the information retrieval field to the relational database context in order to identify potential string matches across relations. We then use this similarity metric to characterize this key aspect of data cleansing and integration as a join between relations on textual attributes, where the similarity of matches exceeds a specified threshold. Computing an exact answer to the text join can be expensive. For query processing efficiency, we propose an approximate, sampling-based approach to the join problem that can be easily and efficiently executed in a standard, unmodified RDBMS. Therefore, the present invention includes a system for string matching across multiple relations in a relational database management system comprising generating a set of strings from a set of characters, decomposing each string into a subset of tokens, establishing at least two relations within the strings, establishing a similarity threshold for the relations, sampling the at least two relations, correlating the relations for the similarity threshold and returning all of the tokens which meet the criteria of the similarity threshold.
Owner:AMERICAN TELEPHONE & TELEGRAPH CO +1
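
The text-join idea in the abstract above can be sketched in memory as follows. This is an assumption-laden illustration, not the patented method: it computes the exact join over all pairs rather than the approximate, sampling-based join inside an RDBMS, and the q-gram tokenization, the TF-IDF weighting formula, and the function names are choices made for this example only.

```python
import math
from collections import Counter


def qgrams(s, q=3):
    """Decompose a string into overlapping q-gram tokens (padded with '#')."""
    padded = "#" * (q - 1) + s.lower() + "#" * (q - 1)
    return [padded[i:i + q] for i in range(len(padded) - q + 1)]


def tfidf_vectors(strings, q=3):
    """Build unit-length TF-IDF vectors (dicts) over q-gram tokens."""
    counts = [Counter(qgrams(s, q)) for s in strings]
    df = Counter(tok for c in counts for tok in c)  # document frequency
    n = len(strings)
    vectors = []
    for c in counts:
        vec = {t: tf * math.log(1 + n / df[t]) for t, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def text_join(left, right, threshold=0.5, q=3):
    """Return (left_string, right_string, similarity) for pairs at or above threshold."""
    vectors = tfidf_vectors(left + right, q)
    lv, rv = vectors[:len(left)], vectors[len(left):]
    matches = []
    for i, a in enumerate(lv):
        for j, b in enumerate(rv):
            # Cosine similarity of unit vectors is their dot product.
            sim = sum(w * b.get(t, 0.0) for t, w in a.items())
            if sim >= threshold:
                matches.append((left[i], right[j], sim))
    return matches


for l, r, sim in text_join(["AT&T Corp"], ["AT&T Corporation", "Microsoft Corp"], threshold=0.0):
    print(f"{l!r} ~ {r!r}: {sim:.3f}")
```

The nested loop makes the cost quadratic in the number of strings, which is the expense the abstract's sampling-based approximation is meant to avoid.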

Recommendation system based on graph convolution technology

A recommendation system based on graph convolution technology comprises a preprocessing module, a heterogeneous graph generation module, a model training module and a recommendation result generation module. The preprocessing module performs data cleaning and format standardization on the interaction records between users and items, generates an interaction sequence for each user, and outputs the interaction sequences to the heterogeneous graph generation module. The heterogeneous graph generation module constructs three heterogeneous graphs representing user preferences, dependencies among items and similarities among users according to the user interaction sequence data, and outputs the generated graph structure data to the model training module. The model training module trains the graph convolution model based on the graph structure data and generates a vector representation for each user and item. The recommendation result generation module calculates the user's preference for all items according to the vector representations, and generates the final recommendation result. The invention solves the problem that the number of neighbors of each node is not equal, and the information of the neighbors of the nodes in the heterogeneous graph is mined by the convolution operation, so that the recommendation effect is improved.
Owner:SHANGHAI JIAO TONG UNIV
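
One common way a graph convolution copes with nodes that have different numbers of neighbors is to average the neighbor features before applying a shared linear transform. The sketch below shows a single mean-aggregation layer on a plain (homogeneous) graph. It is a simplified illustration under that assumption, not the model described in the abstract above, and the function and parameter names are hypothetical.

```python
def graph_conv_layer(features, adjacency, weight):
    """One mean-aggregation graph convolution step with ReLU.

    features:  list of per-node feature lists
    adjacency: dict mapping node index -> list of neighbor indices (no self-loops)
    weight:    matrix (list of rows) of shape [in_dim][out_dim]
    """
    out_dim = len(weight[0])
    output = []
    for i, own in enumerate(features):
        # Include the node itself so isolated nodes still get a representation.
        neighborhood = list(adjacency.get(i, [])) + [i]
        dim = len(own)
        # Averaging makes the result independent of the neighbor count.
        agg = [sum(features[j][d] for j in neighborhood) / len(neighborhood)
               for d in range(dim)]
        # Shared linear transform followed by ReLU.
        row = [max(0.0, sum(agg[d] * weight[d][k] for d in range(dim)))
               for k in range(out_dim)]
        output.append(row)
    return output


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adjacency = {0: [1, 2], 1: [0], 2: [0]}  # node 0 has two neighbors, the others one
identity = [[1.0, 0.0], [0.0, 1.0]]
print(graph_conv_layer(features, adjacency, identity))
```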

Time sequence classification early warning method for storage device

The invention discloses a time sequence classification early warning method for a storage device. The method comprises the steps of collecting storage device parameters in real time; cleaning data; performing ARIMA time sequence analysis; and performing logistic regression analysis and early warning mechanism output. Under the background of a big data environment, time sequence prediction analysis is performed by adopting an ARIMA model according to historical data and hard disk SMART information obtained by statistics; the correlation between a SMART eigenvalue and a fault rate of the storage device is analyzed; and an eigenvalue more suitable for a Logistic model is selected to perform classification prediction. A machine learning method is adopted for predicting the fault rate of the storage device, so that the problems of classification singleness and low early warning intensity in final prediction of the storage device are solved, the defects of hysteresis, low accuracy, poor actual early warning effect and difficult application to the big data environment for a disk early warning mechanism in the prior art are overcome, the occurrence probability of each early warning intensity can be predicted, and an effective solution is provided for real-time operation maintenance and monitoring in a data center environment.
Owner:HUAZHONG UNIV OF SCI & TECH

Malicious traffic detection method, system and apparatus, and computer readable storage medium

The invention discloses a malicious traffic detection method. The method comprises the following steps: correspondingly establishing malicious and normal data sample libraries by using obtained malicious and normal data traffic samples; executing a data cleaning operation and a preprocessing operation on the data sample libraries in sequence to obtain training data, and constructing a traffic detection model by using the training data and a deep learning algorithm; judging whether to-be-measured data traffic contains malicious data by using the traffic detection model; and if so, sending alarm information carrying the to-be-measured data traffic belonging to malicious data via a preset path. Feature learning and training are performed by using the malicious and normal data traffic samples via the automatic learning property of the deep learning algorithm, and the feature information extraction operation is completed without consuming precious human resources, thereby improving the work efficiency and the precision of malicious traffic discrimination. The invention further discloses a malicious traffic detection system and apparatus and a computer readable storage medium, which have the above beneficial effects.
Owner:SANGFOR TECH INC

Medical clinical quality monitoring and evaluation system based on single-disease model

The invention discloses a medical clinical quality monitoring and evaluation system based on a single-disease model, and relates to the field of medical quality management. The system comprises a clinical data integration subsystem, a data cleaning and standardization subsystem, a statistic analysis and evaluation algorithm subsystem and a clinical quality management application subsystem. The clinical data integration subsystem is used for sending collected case data in the original diagnosis and treatment process of single diseases into a clinical record database; the data cleaning and standardization subsystem is used for selecting case data from the clinical record database and processing the data to form a single-disease evaluation database; the statistic analysis and evaluation algorithm subsystem is used for performing index calculation and comprehensive evaluation calculation on the received data; the clinical quality management application subsystem is used for displaying a comprehensive evaluation result obtained after the statistic analysis and evaluation algorithm subsystem performs calculation. Through the system, processing and statistic evaluation of the clinical data in clinical quality management are achieved, and a medical quality manager can truly and objectively master the quality of the diagnosis and treatment process of various diseases in various clinical departments in real time in a full-quantized mode.
Owner:HUAJU MEDICAL ASSESSMENT INFORMATION TECH BEIJING CO LTD

Mass unstructured distribution network data integration method based on knowledge mapping technology

Active | CN107330125A | Reduce the calculation amount of data fusion; Reduce storage pressure | Special data processing applications; Informatization; Data source
The invention discloses a mass unstructured distribution network data integration method based on knowledge mapping technology. The method includes that a data collection unit collects unstructured distribution network data of each informatization system, and quality analysis and data cleaning processing are performed on the unstructured distribution network data of each informatization system; according to the processed unstructured distribution network data of each informatization system, data local index based on local knowledge mapping is constructed; the data local index based on the local knowledge mapping is sent to a data management center through a big data connector; the data management center constructs data global index based on global knowledge mapping. Collection, quality analysis and data cleaning of distributed multi-source heterogeneous data are advanced to each informatization system, so that data fusion calculation quantity, storage pressure and data scheduling burden of the data management center are lowered; the data global index based on the global knowledge mapping is utilized to integrate data sources, so that convenience is brought to data inquiry and extraction, and workload of the data management center is reduced.
Owner:YUNNAN POWER GRID CO LTD ELECTRIC POWER RES INST

Potential customer mining and recommending method

The invention provides a potential customer mining and recommending method, which comprises the following steps of obtaining personal information and social activity information of a user from a social platform, fusing the personal information and the social activity information with locally stored user shopping records, and carrying out data cleaning and screening to obtain data for training and testing a potential customer classification model; constructing a user portrait according to the personal information, the social record and the shopping record of the user, processing the social record and the shopping record of the user into a feature vector form which can be used by a model, then training a user interest prediction model, and dividing the users into potential customers and passers-by; and finally, identifying and providing more targeted commodity pages for the potential customers according to the interests of the potential customers. According to the method, the interest of the user can be judged while the user is accurately classified; corresponding products are displayed or precise advertisement putting is implemented according to the interest judgement of the users, so that conversion of potential customers is realized; targeted recommendation can also be provided for old customers, and customer stickiness is improved.
Owner:GUANGDONG UNIV OF TECH

Intelligent equipment machine learning safety monitoring system based on user behavior

The invention discloses an intelligent equipment machine learning safety monitoring system based on user behavior. The system comprises a first-level machine learning model oriented to third-party intelligent equipment user behavior data, and a second-level user behavior machine learning model on the intelligent equipment side based on an MPU memory protection mechanism. Using the user behavior data of a third-party cloud platform, the first-level learning model performs data cleaning on two types of data, namely the data of the same intelligent equipment type and the behavior data of the same individual user, determines the data and the correlations that the intelligent equipment needs to use, and determines a subject of the intelligent equipment user behavior according to the type of the intelligent equipment. In the second-level model, the intelligent equipment side first uses the memory protection mechanism of the MPU to divide safety protection regions for the safety monitoring model obtained from the first-level machine learning model, finally enabling the monitoring system to effectively protect the security of the intelligent equipment and the user.
Owner:INST OF INFORMATION ENG CAS