2673 results about "Data cleansing" patented technology
Data cleansing or data cleaning is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database and refers to identifying incomplete, incorrect, inaccurate or irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse data. Data cleansing may be performed interactively with data wrangling tools, or as batch processing through scripting.
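As a minimal sketch of the definition above, the following batch script detects incomplete, inaccurate and duplicate records and then modifies or removes them. The field names (`name`, `age`), the accepted age range and the rejection reasons are illustrative assumptions, not part of any listed patent.

```python
def clean_records(records):
    """Split records into cleaned rows and rejected rows with a reason."""
    cleaned, rejected = [], []
    seen = set()
    for rec in records:
        name = (rec.get("name") or "").strip()
        age = rec.get("age")
        # Incomplete record: the (assumed) required field is empty.
        if not name:
            rejected.append((rec, "missing name"))
            continue
        # Inaccurate record: age is not an integer in a plausible range.
        if not isinstance(age, int) or not 0 <= age <= 130:
            rejected.append((rec, "invalid age"))
            continue
        # Duplicate record: same person after case normalization.
        key = (name.lower(), age)
        if key in seen:
            rejected.append((rec, "duplicate"))
            continue
        seen.add(key)
        # Modify in place of the dirty value: trimmed, title-cased name.
        cleaned.append({"name": name.title(), "age": age})
    return cleaned, rejected
```

Keeping the rejected rows together with a reason, rather than silently dropping them, makes it possible to audit what the cleansing step removed.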
The invention discloses a system and method for constructing information-analysis-oriented knowledge maps. The system comprises a data acquisition module, a text extraction module, an entity recognition module, a semantic analysis module and an entity-relation extraction module, wherein the data acquisition module is used for carrying out cleaning and simple preprocessing on acquired data and outputting the data to the text extraction module; the text extraction module is used for carrying out data cleaning and preprocessing on structured and unstructured data and conveying clean data to the entity recognition module; the entity recognition module is used for segmenting words of a text, marking the word characteristics of the segmented words, then extracting terms and conveying extracted results to the semantic analysis module; the semantic analysis module is used for analyzing and extracting relation among bodies, generating a semantic metadata model by a body construction tool and outputting the semantic metadata model to the entity-relation extraction module; and the entity-relation extraction module is used for finally generating knowledge map language by extracting taxonomic relation and non-taxonomic relation. The system and method disclosed by the invention have the advantages that by combination of syntactic training and association rules, not only are external input and artificial intervention reduced, but also the entity relation can be continuously recognized.
The invention discloses a credit recording system and method based on a block chain. The credit recording system comprises a plurality of user terminals which are connected to the block chain and used for acquiring and verifying credit data, wherein each user terminal is installed with an encryption module for encrypting target credit data to be transmitted to form encrypted credit data, a communication module for transmitting the encrypted credit data together with the block chain, and a storage module for storing the target credit data of the user terminal and other encrypted credit data transmitted by the block chain. The credit data are wide in sources, independent, reliable and real; the cleaning and sieving complexity of the credit data are lowered greatly; and the credit recording system and method have flexible and diverse application scenes and high extensibility.
The invention discloses a data control method based on data platforms. The data control method includes: (1) acquiring data from a plurality of data platforms and integrating the data, wherein the integrated data includes user data of the data platforms, original data of data items, multi-dimensional descriptions of user behavior, multi-dimensional descriptions of the data items, online data and offline data; (2) processing the integrated data with a distributed processing framework and performing a normalization operation, a standardization operation and a data cleaning operation, wherein the normalization operation normalizes numerical data, the standardization operation organizes the data into a structured form, preserving data integrity, reducing redundancy and increasing the uniformity of the data, and the data cleaning operation cleans incomplete data, wrong data and repeated data; and (3) extracting the processed data and displaying the data. The data control method based on the data platforms improves the speed of data searching through a novel data control mode.
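Step (2) of this abstract can be sketched on a single machine as follows. The abstract does not say which normalization is used, so min-max scaling is an assumption here, as are the field name passed in and the rule that negative values count as wrong data; the distributed processing framework is not modeled.

```python
def min_max_normalize(values):
    """Scale numbers to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def clean_and_normalize(rows, field):
    """Drop incomplete, wrong and repeated rows, then normalize `field`."""
    seen, kept = set(), []
    for row in rows:
        value = row.get(field)
        if value is None:
            continue  # incomplete data
        if not isinstance(value, (int, float)) or value < 0:
            continue  # wrong data (assumed rule: must be a non-negative number)
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # repeated data
        seen.add(key)
        kept.append(dict(row))
    if not kept:
        return []
    scaled = min_max_normalize([r[field] for r in kept])
    for r, s in zip(kept, scaled):
        r[field] = s
    return kept
```

Cleaning runs before normalization so that wrong values cannot distort the minimum and maximum used for scaling.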
The invention discloses a method and a device for building a classification forecasting mixed model. The method includes: dividing a sample data set into data sets of different types according to data characteristics, performing data cleaning for the data sets and performing variable selection after data cleaning is finished to generate variable sets of different types, and adopting at least one classification forecasting single model for each variable set to build the classification forecasting mixed model. Through the method and the device, the classification forecasting mixed model is respectively built after data subdivision, and accuracy of classification forecasting is improved.
A recommendation system based on graph convolution technology comprises a preprocessing module, a heterogeneous graph generation module, a model training module and a recommendation result generation module. The preprocessing module performs data cleaning and format standardization on the interaction records between users and items, generates an interaction sequence for each user and outputs the interaction sequences to the heterogeneous graph generation module. The heterogeneous graph generation module constructs three heterogeneous graphs representing user preferences, dependencies among items and similarities among users according to the user interaction sequence data, and outputs the generated graph structure data to the model training module. The model training module trains the graph convolution model based on the graph structure data and generates a vector representation for each user and item. The recommendation result generation module calculates each user's preference for all items according to the vector representations, and generates the final recommendation result. The invention solves the problem that the number of neighbors of each node is not equal, and the information of the neighbors of the nodes in the heterogeneous graphs is mined by the convolution operation, so that the recommendation effect is improved.
The invention discloses a method and device for cleaning mass data. The method comprises the steps of first configuring data cleaning rule files, obtaining data cleaning rules corresponding to a to-be-cleaned data table according to a table name of the data cleaning rules, automatically generating cleaning codes to perform cleaning, tagging every to-be-cleaned datum in the cleaning process, analyzing which data cleaning rule the data trigger by tag analysis, and accordingly performing corresponding cleaning processing. The device for cleaning the mass data comprises a data rule configuration module, a data cleaning code generation module, an execution module and an analysis module, and the mass data are cleaned through the mass data cleaning method. The mass data can be effectively cleaned, the efficiency is high, dirty data which are cleaned out are reserved in a classified mode, and sources and whereabouts of every dirty datum can be located precisely.
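A small sketch of the rule-driven flow described above: rules are looked up by table name, each row is tagged with the rule it triggers, and dirty rows are kept grouped by tag so their origin can be traced. The abstract describes generating cleaning code from rule files; this sketch swaps that for plain Python predicates, and the table name, rule names and fields are hypothetical.

```python
def _missing_email(row):
    return not row.get("email")


def _bad_age(row):
    age = row.get("age")
    return not isinstance(age, int) or not 0 <= age <= 130


# Hypothetical rule configuration, keyed by table name.
RULES = {
    "customers": [
        ("R1_missing_email", _missing_email),
        ("R2_bad_age", _bad_age),
    ],
}


def clean_table(table_name, rows):
    """Return (clean_rows, dirty_rows_by_rule_tag) for one table."""
    rules = RULES.get(table_name, [])
    clean, dirty = [], {}
    for row in rows:
        # Tag the row with the first rule it triggers, if any.
        tag = next((rule_id for rule_id, is_dirty in rules if is_dirty(row)), None)
        if tag is None:
            clean.append(row)
        else:
            dirty.setdefault(tag, []).append(row)
    return clean, dirty
```

Grouping dirty rows by tag is what allows a later analysis step to report which rule removed which row.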
The invention discloses a time sequence classification early warning method for a storage device. The method comprises the steps of collecting storage device parameters in real time; cleaning data; performing ARIMA time sequence analysis; and performing logistic regression analysis and early warning mechanism output. Under the background of a big data environment, time sequence prediction analysis is performed by adopting an ARIMA model according to historical data and hard disk SMART information obtained by statistics; the correlation between a SMART eigenvalue and a fault rate of the storage device is analyzed; and an eigenvalue more suitable for a Logistic model is selected out to perform classification prediction. A machine learning method is adopted for predicting the fault rate of the storage device, so that the problems of classification singleness and low early warning intensity in final prediction of the storage device are solved, the defects of hysteresis, low accuracy, poor actual early warning effect and difficult application to the big data environment for a disk early warning mechanism in the prior art are overcome, the occurrence probability of each early warning intensity can be predicted, and an effective solution is provided for real-time operation maintenance and monitoring in a data center environment.
The embodiment of the invention discloses a big data statistics method and system, a computer device and a storage medium. The method provided by the embodiment of the invention comprises the following steps: reading the binlog of a MySQL database, and sequentially putting log records into a message queue; consuming the message queue through an ETL service, wherein log records in the message queue are extracted, cleaned, converted and loaded to obtain corresponding business data, and the business data are loaded to a corresponding data warehouse; and carrying out real-time analysis, aggregation, query and offline calculation on the business data through a Spark distributed query engine to obtain a corresponding statistical result. According to the technical scheme, the data are imported into the warehouse in an incremental mode, the data are cleaned before being stored, the statistical data are calculated in advance through offline calculation, and the statistical data are directly taken out when the service system is used, so that the statistical speed is increased and the statistical pressure on the database is reduced.
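The queue-then-ETL flow in this abstract can be illustrated with an in-process stand-in. Here `queue.Queue` plays the role of the message queue, a Python list plays the data warehouse, and a plain function replaces the Spark aggregation; the log record layout (`table`, `ts`, `amount`) is a hypothetical shape, not the real binlog format.

```python
from collections import defaultdict
from queue import Queue


def etl_consume(message_queue, warehouse):
    """Extract, clean, convert and load every queued log record."""
    while not message_queue.empty():
        record = message_queue.get()  # extract
        # Clean: skip records from other tables or without an amount.
        if record.get("table") != "orders" or record.get("amount") is None:
            continue
        # Convert: keep only the day and a numeric amount.
        row = {"day": record["ts"][:10], "amount": float(record["amount"])}
        warehouse.append(row)  # load


def daily_totals(warehouse):
    """Precompute per-day totals, standing in for the offline calculation."""
    totals = defaultdict(float)
    for row in warehouse:
        totals[row["day"]] += row["amount"]
    return dict(totals)


if __name__ == "__main__":
    q = Queue()
    q.put({"table": "orders", "ts": "2024-01-01T10:00:00", "amount": "10.5"})
    q.put({"table": "users", "ts": "2024-01-01T11:00:00", "amount": None})
    store = []
    etl_consume(q, store)
    print(daily_totals(store))
```

Because records are appended as they arrive, the warehouse grows incrementally and the totals can be recomputed or cached ahead of queries.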
The invention discloses a method of comprehensively judging a resident trip mode based on handset signaling data, which belongs to the transport planning and management data analyzing field. A data source is from handset signaling data provided by mobile network service providers. Data cleaning, integrating and position conversion are further performed. A resident trip mode is further judged by mobile space-time path describing and stopover point identifying. The method which can effectively discriminate seven common trip modes including walking, bicycling, routine bus, electric vehicle, self-driving, taxi and rail transit thus acquires trip mode information of residents. A data basis is further provided for fields of special traffic programs, comprehensive traffic programs and intelligent traffic systems of cities, etc.
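The abstract does not specify how stopover points are identified. One common simplification, shown here purely as an assumption, treats a stopover as a run of consecutive positions that stay within a fixed radius of the run's first point for at least a minimum duration. The `(time, x, y)` tuple layout and both thresholds are illustrative.

```python
from math import hypot


def stopover_points(track, radius, min_duration):
    """Return (start_time, end_time) pairs for detected stopovers.

    track: list of (time, x, y) tuples sorted by time.
    """
    stops = []
    i, n = 0, len(track)
    while i < n:
        t0, x0, y0 = track[i]
        j = i + 1
        # Extend the run while points stay within `radius` of the anchor.
        while j < n and hypot(track[j][1] - x0, track[j][2] - y0) <= radius:
            j += 1
        end_time = track[j - 1][0]
        if end_time - t0 >= min_duration:
            stops.append((t0, end_time))
            i = j  # continue after the stopover
        else:
            i += 1  # no stopover anchored here; try the next point
    return stops
```

The segments between detected stopovers are the trips whose speed and path would then feed the trip mode judgement.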
The invention discloses a mass unstructured distribution network data integration method based on knowledge mapping technology. The method includes that a data collection unit collects unstructured distribution network data of each informatization system, and quality analysis and data cleaning processing are performed on the unstructured distribution network data of each informatization system; according to the processed unstructured distribution network data of each informatization system, data local index based on local knowledge mapping is constructed; the data local index based on the local knowledge mapping is sent to a data management center through a big data connector; the data management center constructs data global index based on global knowledge mapping. Collection, quality analysis and data cleaning of distributed multi-source heterogeneous data are advanced to each informatization system, so that data fusion calculation quantity, storage pressure and data scheduling burden of the data management center are lowered; the data global index based on the global knowledge mapping is utilized to integrate data sources, so that convenience is brought to data inquiry and extraction, and workload of the data management center is reduced.
The invention provides an elevator as-required maintenance system based on big data analysis. The elevator as-required maintenance system comprises a data source, a data access module and a data processing module. The data access module obtains data from the data source and conducts parsing and distribution, and the data processing module receives the data distributed by the data access module and conducts storage, modeling and analysis; and the data access module comprises a data parsing unit, a data cleaning unit and a data distribution unit. The invention further provides an elevator as-required maintenance method based on big data analysis. Internet of Things equipment is applied to collecting maintenance-related elevator data and elevator safety operation failure data, and failure situations in the daily use of an elevator are combined with the online status of the Internet of Things equipment, so that a daily maintenance mode combining the Internet of Things with maintenance according to elevator safety operation requirements is achieved, a quantitative index mathematical model of the elevator is built for maintenance according to risks and situations, and data support is provided for maintenance reformation.
The invention provides a potential customer mining and recommending method, which comprises the following steps of obtaining personal information and social activity information of a user from a social platform, fusing the personal information and the social activity information with locally stored user shopping records, and carrying out data cleaning and screening to obtain data for training and testing a potential customer classification model; constructing a user portrait according to the personal information, the social record and the shopping record of the user, processing the social record and the shopping record of the user into a feature vector form which can be used by a model, then training a user interest prediction model, and dividing the users into potential customers and passers-by; and finally, identifying and providing more targeted commodity pages for the potential customers according to the interests of the potential customers. According to the method, the interest of the user can be judged while the user is accurately classified; corresponding products are displayed or precise advertisement putting is implemented according to the interest judgement of the users, so that conversion of potential customers is realized; targeted recommendation can also be provided for old customers, and customer stickiness is improved.
The invention discloses a method and a device for constructing a patent data knowledge map. The method comprises the following steps: obtaining patent data of an existing patent database, preprocessing the patent data to unify a patent data format, and segmenting the patent data after being merged with the same type to obtain the segmentation data of each type of patent data; performing knowledge extraction of preprocessed patent data, performing data cleaning of word segmentation data of each type of patent data to obtain the corresponding subject original file, extracting keywords to obtain subject words, and constructing a patent subject database for each type of patent data; defining the entity of patent data, determining the subject of patent data, identifying the entity and subject of patent according to the general knowledge map, mining the semantic relationship between the entity and the subject, and constructing the patent data knowledge map.
A method, system, and apparatus are provided for processing tables embedded within documents wherein a first table header is detected by using semantic groupings of table header terms to identify a minimum number of table header terms in a scanned line of a text document; a potential data zone is extracted by applying white space correlation analysis to a portion of the text document that is adjacent to the first table header; one or more data zone columns from the potential data zone are grouped and aligned with a corresponding header column in the first table header to form a candidate table; data cleansing is performed on the candidate table; and then one or more columns of the candidate table are evaluated using natural language processing to apply a specified table analysis.
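The header detection and column alignment steps can be sketched as below. The semantic groupings in `HEADER_TERMS` are made-up examples, and the white space correlation analysis is replaced by a much simpler assumption: columns are separated by runs of two or more spaces, and the data zone ends at the first line whose column count differs from the header's.

```python
import re

# Hypothetical semantic groupings of header terms.
HEADER_TERMS = {
    "name": {"name", "item", "description"},
    "quantity": {"qty", "quantity", "units"},
    "amount": {"price", "amount", "total", "cost"},
}

COLUMN_GAP = re.compile(r"\s{2,}")


def header_groups(line):
    """Return the semantic groups whose terms appear as cells in `line`."""
    cells = COLUMN_GAP.split(line.strip().lower())
    return {
        group
        for group, terms in HEADER_TERMS.items()
        for cell in cells
        if cell in terms
    }


def find_table(lines, min_terms=2):
    """Locate the first header line and collect the aligned rows below it."""
    for i, line in enumerate(lines):
        if len(header_groups(line)) < min_terms:
            continue
        header = COLUMN_GAP.split(line.strip())
        rows = []
        for data_line in lines[i + 1:]:
            cells = COLUMN_GAP.split(data_line.strip())
            if len(cells) != len(header):
                break  # end of the candidate data zone
            rows.append(cells)
        return header, rows
    return None
```

Requiring a minimum number of distinct groups, rather than any single keyword, reduces false positives on ordinary sentences that happen to contain a word such as "total".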
The invention discloses an object-level information excavation system, which comprises a data collection module, a data cleaning module, a content pretreatment module and an object correlation search module, wherein, the data collection module used to collect data comprises a WEB grabber, the data cleaning module used to process structured data comprises a data verification module and a repeat-ridding process module, the content pretreatment module used to pretreat unstructured data comprises a metadata management module and a content analyzer, and the object correlation search module used to analyze the correlation degree of the processed content of the content pretreatment module comprises a correlation degree analyzer. The invention also discloses an object-level information excavation method, which comprises the following steps that: information is collected from web pages; the data cleaning process is carried out to the structured data collected; the content pretreatment operation is carried out to the unstructured data collected; the object correlation search operation is carried out to the content obtained after the pretreatment.
The invention provides a cleaning method of power communication operation and maintenance data, and more particularly to a power operation and maintenance data cleaning method based on an isolation forest algorithm and a neural network. The method includes: firstly, using the improved isolation forest algorithm to construct an isolation forest model iForest solving a target problem; then defining an evaluation system of the isolation forest algorithm on abnormal data; and carrying out prediction correction on abnormal data attributes, which are detected through an isolation forest, through training the BP neural network. According to the method, optimization is carried out on a power communication operation and maintenance data cleaning method based on the isolation forest algorithm and the neural network, abnormality detection precision is improved, data correction errors are reduced, and effective optimization is realized for a power operation and maintenance data cleaning program on aspects of abnormal-data positioning accuracy, a data correction accuracy rate, training time, resource occupation and the like.
A cleansing system for improving operation of a plant. A server is coupled to the cleansing system for communicating with the plant via a communication network. A computer system has a web-based platform for receiving and sending plant data related to the operation of the plant over the network. A display device interactively displays the plant data. A data cleansing unit is configured for performing an enhanced data cleansing process for allowing an early detection and diagnosis of the operation of the plant based on at least one environmental factor. The data cleansing unit calculates and evaluates an offset amount representing a difference between a measurement and a simulation for detecting an error of measurement during the operation of the plant based on the plant data.
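The offset check in this abstract reduces to comparing each measurement with its simulated counterpart. In the sketch below the fixed `tolerance` threshold is an assumption, since the abstract does not say how the offset is evaluated, and the environmental factor is not modeled.

```python
def offsets(measured, simulated):
    """Offset amount: measurement minus simulation, per sample."""
    return [m - s for m, s in zip(measured, simulated)]


def flag_measurement_errors(measured, simulated, tolerance):
    """Indices of samples whose absolute offset exceeds `tolerance`."""
    return [
        i
        for i, offset in enumerate(offsets(measured, simulated))
        if abs(offset) > tolerance
    ]
```

Flagged indices point at readings that disagree with the simulation and are therefore candidates for a measurement error rather than a real change in plant operation.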
The invention provides a product demand preference characteristic digging and quality evaluation method based on comment information. The method comprises the following steps: 1, crawling data, namely by using a network crawling technique, crawling product comment appointed information of an E-commerce platform and storing in a databank; 2, performing data preprocessing and product characteristic word extraction, namely performing data cleansing and preprocessing on acquired data, and further performing product characteristic extraction on preprocessed data by using a BiLSTM-CRF (Bidirectional Long Short-Term Memory-Conditional Random Field) model; and 3, performing product demand preference characteristic digging and quality evaluation. By adopting the method, quality problems of products can be rapidly understood according to feedback information of customers, demand preference characteristics of the customers can be understood, and thus companies can make relatively good decisions to meet the customers.
The invention discloses an intelligent equipment machine learning safety monitoring system based on user behavior. The system comprises a first-level machine learning model oriented to third-party intelligent equipment user behavior data and a second-level user behavior machine learning model on the intelligent equipment side based on an MPU memory protection mechanism. The first-level machine learning model uses the user behavior data of a third-party cloud platform to perform data cleaning on two types of data, namely the data of the same intelligent equipment type and the behavior data of the same individual user, determines the data and the correlations that the intelligent equipment needs to use, and determines a subject of the intelligent equipment user behavior according to the type of the intelligent equipment. In the second-level user behavior machine learning model, the intelligent equipment side first uses the memory protection mechanism of the MPU to divide safety protection regions on the safety monitoring model obtained from the first-level machine learning model, finally enabling the monitoring system to effectively protect the security of the intelligent equipment and the user.
A namespace exploits individual resource identity attributes of an application to allow the integration of resource instances from applications into a configuration management database (CMDB), prior to any data cleansing or namespace harmonization activities. An approach for incremental reconciliation of resource instances within the CMDB is defined.