2673 results about "Data cleansing" patented technology

Data cleansing or data cleaning is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database and refers to identifying incomplete, incorrect, inaccurate or irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse data. Data cleansing may be performed interactively with data wrangling tools, or as batch processing through scripting.

Sharable multi-tenant reference data utility and methods of operation of same

Inactive | US20060235715A1 | Good quality data | Quality improvement | Finance | Data set | Application software

A multi-source multi-tenant reference data utility and methods for forming and maintaining the same, delivering high quality reference data in response to requests from clients, implemented using a shared infrastructure, and also providing added value services using the client's reference data. Included are data cleansing and quality assurance of the received data with full tracking of the sourcing of each value, storage of resulting entity values in a repository which allows retrievals and enforces source based entitlements, and delivery of retrieved data in the form of on demand datasets supporting a wide range of client application needs. An advantageous implementation has additional services for reporting on data quality and usage, a selection of value adding data driven computations and business document storage. By using a shared infrastructure and amortizing the costs of data quality assurance across a plurality of clients, while ensuring that clients only receive values from data sources to which they are licensed, better quality data at lower cost is delivered.

Owner:IBM CORP

System and method for constructing information-analysis-oriented knowledge maps

Inactive | CN106815293A | Save human effort | Many sources of solutions | Web data indexing | Relational databases | Information analysis | Data acquisition

The invention discloses a system and method for constructing information-analysis-oriented knowledge maps. The system comprises a data acquisition module, a text extraction module, an entity recognition module, a semantic analysis module and an entity-relation extraction module. The data acquisition module cleans and performs simple preprocessing on the acquired data and outputs the data to the text extraction module. The text extraction module performs data cleaning and preprocessing on structured and unstructured data and conveys the clean data to the entity recognition module. The entity recognition module segments the text into words, marks the word characteristics of the segmented words, extracts terms and conveys the extracted results to the semantic analysis module. The semantic analysis module analyzes and extracts the relations among ontologies, generates a semantic metadata model with an ontology construction tool and outputs the semantic metadata model to the entity-relation extraction module. The entity-relation extraction module finally generates the knowledge map language by extracting taxonomic and non-taxonomic relations. By combining syntactic training with association rules, the system and method reduce external input and manual intervention and allow entity relations to be recognized continuously.

Owner:NO 32 RES INST OF CHINA ELECTRONICS TECH GRP

Text joins for data cleansing and integration in a relational database management system

Inactive | US20050027717A1 | Relational databases | Comparison of digital values | Cosine similarity | Relational database management system

An organization's data records are often noisy because of transcription errors, incomplete information, and a lack of standard formats for textual data. A fundamental task during data cleansing and integration is matching strings, perhaps across multiple relations, that refer to the same entity (e.g., organization name or address). Furthermore, it is desirable to perform this matching within an RDBMS, which is where the data is likely to reside. In this paper, we adapt the widely used and established cosine similarity metric from the information retrieval field to the relational database context in order to identify potential string matches across relations. We then use this similarity metric to characterize this key aspect of data cleansing and integration as a join between relations on textual attributes, where the similarity of matches exceeds a specified threshold. Computing an exact answer to the text join can be expensive. For query processing efficiency, we propose an approximate, sampling-based approach to the join problem that can be easily and efficiently executed in a standard, unmodified RDBMS. Therefore, the present invention includes a system for string matching across multiple relations in a relational database management system, comprising generating a set of strings from a set of characters, decomposing each string into a subset of tokens, establishing at least two relations within the strings, establishing a similarity threshold for the relations, sampling the at least two relations, correlating the relations for the similarity threshold and returning all of the tokens which meet the criteria of the similarity threshold.

Owner:AMERICAN TELEPHONE & TELEGRAPH CO +1
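
The text join described in the entry above can be illustrated with a minimal sketch. It computes the exact join in plain Python rather than the approximate, sampling-based join inside an RDBMS that the abstract proposes. The q-gram tokenization, the TF-IDF weighting formula, the function names and the sample strings are all assumptions made for illustration.

```python
import math
from collections import Counter


def qgrams(text, q=3):
    """Split a string into overlapping lowercase q-grams (the tokens)."""
    pad = "#" * (q - 1)
    padded = f"{pad}{text.lower()}{pad}"
    return [padded[i:i + q] for i in range(len(padded) - q + 1)]


def tfidf_vectors(strings, q=3):
    """Return one unit-length TF-IDF weight vector (a dict) per string."""
    token_lists = [qgrams(s, q) for s in strings]
    doc_freq = Counter(tok for toks in token_lists for tok in set(toks))
    n = len(strings)
    vectors = []
    for toks in token_lists:
        # Assumed weighting: term frequency times a smoothed inverse document frequency.
        weights = {t: c * math.log(1 + n / doc_freq[t]) for t, c in Counter(toks).items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors


def text_join(left, right, threshold=0.5, q=3):
    """Exact similarity join: every (left, right, score) pair with cosine >= threshold."""
    vectors = tfidf_vectors(left + right, q)
    left_vecs, right_vecs = vectors[:len(left)], vectors[len(left):]
    matches = []
    for l, lv in zip(left, left_vecs):
        for r, rv in zip(right, right_vecs):
            # Vectors are unit length, so the dot product is the cosine similarity.
            score = sum(w * rv.get(t, 0.0) for t, w in lv.items())
            if score >= threshold:
                matches.append((l, r, round(score, 3)))
    return matches


# Hypothetical relations holding organization names.
customers = ["AT&T Corporation", "International Business Machines"]
vendors = ["AT&T Corp.", "Intl Business Machines", "Fudan University"]
matches = text_join(customers, vendors, threshold=0.4)
for left_name, right_name, score in matches:
    print(left_name, "~", right_name, score)
```

The nested loop compares every pair, which is the expensive exact computation the abstract mentions; a sampling-based variant would estimate the scores from a subset of the tokens instead.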

Credit recording system and method based on block chain

Active | CN106485167A | Wide variety of sources | Reduce complexity | Digital data protection | Extensibility | Computer terminal

The invention discloses a credit recording system and method based on a block chain. The credit recording system comprises a plurality of user terminals which are connected to the block chain and used for acquiring and verifying credit data, wherein each user terminal is installed with an encryption module for encrypting target credit data to be transmitted to form encrypted credit data, a communication module for transmitting the encrypted credit data together with the block chain, and a storage module for storing the target credit data of the user terminal and other encrypted credit data transmitted by the block chain. The credit data are wide in sources, independent, reliable and real; the cleaning and sieving complexity of the credit data are lowered greatly; and the credit recording system and method have flexible and diverse application scenes and high extensibility.

Owner:CENTRIN DATA SYST

Industrial big data multidimensional analysis and visualization method based on JSON document structure

Active | CN110618983A | Database management systems | Visual data mining | Interactive graphics | Data set

The invention belongs to the technical field of industrial big data application, and particularly relates to an industrial big data multidimensional analysis and visualization method based on a JSON document structure. The method comprises the following steps: taking JSON as the basic carrier of data, constructing an industrial data mart in parallel with Spark and ElasticSearch by configuring relational database and file system data sources and defining data conversion and data cleaning operations; configuring the overall data analysis process graphically to construct an analysis data set with a multi-dimensional structure, so that repeated association operations on massive data are avoided; and, for a specific data analysis scene, customizing each dimension and calculation index of the data analysis report by visual dragging, based on the pre-constructed multi-dimensional analysis data set, and generating an interactive graphic analysis report. Because the method uses the JSON document format as the carrier of basic data and exploits its advantages in storage and analysis, multi-dimensional analysis structure modeling and user-defined interactive analysis become more convenient and efficient.

Owner:FUDAN UNIV

Big data development standardized systematic classification and command set system

Active | CN106649455A | Web data indexing | Special data processing applications | Relational database | Data acquisition

The invention discloses a big data development standardized systematic classification and command set system. The system comprises a data acquisition module which acquires data in a relational database and a local file and stores the data in a big data platform, a data processing module which cleans the data in the big data platform to be in a defined format according to a user demand and performs statistics and analysis, a data source and SQL engine module which imports and exports the data among the relational database, the local file and the big data platform and is connected to an NOSQL database, a machine learning algorithm module which analyzes correlation among the data in the big data platform, classifies the data and analyzes a new data relationship according to existing correlation among the data, a natural language processing module which processes a natural language in the data of the big data platform, and a search engine module which provides a data retrieval service according to a user request and displays a retrieval result to a user, wherein the natural language processing comprises the execution of article abstracting.

Owner:孙燕群 +1

Data control method based on data platforms

Active | CN103136335A | Improve search speed | Improve operational data analysis capabilities | Special data processing applications | Data control | Data integrity

The invention discloses a data control method based on data platforms. The data control method includes: (1) acquiring data from a plurality of data platforms and integrating the data, wherein the integrated data includes user data of the data platforms, original data of data items, multi-dimensional descriptions of user behavior, multi-dimensional descriptions of the data items, online data and offline data; (2) processing the integrated data with a distributed processing framework and performing a normalization operation, a standardization operation and a data cleaning operation, wherein the normalization operation normalizes numerical data, the standardization operation organizes the data into a structured form, keeps the data integrity, reduces redundancy and increases the uniformity of the data, and the data cleaning operation cleans incomplete data, wrong data and repeated data; and (3) extracting the processed data and displaying the data. The data control method based on the data platforms improves the speed of data searching through a novel data control mode.

Owner:北京百分点科技集团股份有限公司
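
Step (2) of the entry above names three kinds of dirty data (incomplete, wrong, repeated) and a normalization of numerical data. The following is a minimal single-machine sketch of those two operations, not the distributed implementation the abstract describes. The field names, the valid rating range and the choice of min-max scaling are assumptions.

```python
def clean_records(records, required=("user_id", "item_id", "rating"), valid_range=(0.0, 5.0)):
    """Drop incomplete, out-of-range (wrong) and repeated records."""
    low, high = valid_range
    seen, cleaned = set(), []
    for rec in records:
        if any(rec.get(field) is None for field in required):
            continue  # incomplete data
        if not low <= rec["rating"] <= high:
            continue  # wrong data
        key = tuple(rec[field] for field in required)
        if key in seen:
            continue  # repeated data
        seen.add(key)
        cleaned.append(dict(rec))
    return cleaned


def min_max_normalize(records, field="rating"):
    """Rescale a numeric field to [0, 1]; a constant column maps to 0.0."""
    values = [rec[field] for rec in records]
    if not values:
        return []
    low, span = min(values), max(values) - min(values)
    return [{**rec, field: (rec[field] - low) / span if span else 0.0} for rec in records]


# Hypothetical integrated user-behavior data.
raw = [
    {"user_id": 1, "item_id": "a", "rating": 4.0},
    {"user_id": 1, "item_id": "a", "rating": 4.0},   # repeated
    {"user_id": 2, "item_id": "b", "rating": None},  # incomplete
    {"user_id": 3, "item_id": "c", "rating": 9.5},   # wrong: outside the valid range
    {"user_id": 4, "item_id": "d", "rating": 2.0},
    {"user_id": 5, "item_id": "e", "rating": 3.0},
]
cleaned = clean_records(raw)
normalized = min_max_normalize(cleaned)
print([rec["rating"] for rec in normalized])
```

The sketch cleans before it normalizes so that an out-of-range value such as 9.5 cannot distort the minimum and maximum used for scaling; the abstract does not prescribe an order.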

Method and device for building classification forecasting mixed model

Active | CN102567391A | The classification prediction is accurate | High precision | Special data processing applications | Data set | Business forecasting

The invention discloses a method and a device for building a classification forecasting mixed model. The method includes: dividing a sample data set into data sets of different types according to data characteristics, performing data cleaning for the data sets and performing variable selection after data cleaning is finished to generate variable sets of different types, and adopting at least one classification forecasting single model for each variable set to build the classification forecasting mixed model. Through the method and the device, the classification forecasting mixed model is respectively built after data subdivision, and accuracy of classification forecasting is improved.

Owner:CHINA MOBILE GRP GUANGDONG CO LTD

Recommendation system based on graph convolution technology

Active | CN109299373A | Solve the problem of unequal length of interaction sequence | Improve recommendation effect | Digital data information retrieval | Character and pattern recognition | Graph structured data | Dependency relation

A recommendation system based on graph convolution technology comprises a preprocessing module, a heterogeneous graph generation module, a model training module and a recommendation result generation module. The preprocessing module performs data cleaning and format standardization on the interaction records between users and items, generates the interaction sequence for each user and outputs the interaction sequence to the heterogeneous graph generation module. The heterogeneous graph generation module constructs three heterogeneous graphs representing user preferences, dependencies among items and similarities among users according to the user interaction sequence data, and outputs the generated graph structure data to the model training module. The model training module trains the graph convolution model based on the graph structure data and generates a vector representation for each user and item. The recommendation result generation module calculates the user's preference for all items according to the vector representations and generates the final recommendation result. The invention solves the problem that the number of neighbors of each node is not equal, and mines the information of the neighbors of the nodes in the heterogeneous graphs by the convolution operation, so that the recommendation effect is improved.

Owner:SHANGHAI JIAO TONG UNIV

Method and device for cleaning mass data

Active | CN103593352A | Consistent | With specification | Web data indexing | Special data processing applications | Computer module | Data mining

The invention discloses a method and device for cleaning mass data. The method comprises the steps of first configuring data cleaning rule files, obtaining data cleaning rules corresponding to a to-be-cleaned data table according to a table name of the data cleaning rules, automatically generating cleaning codes to perform cleaning, tagging every to-be-cleaned datum in the cleaning process, analyzing which data cleaning rule the data trigger by tag analysis, and accordingly performing corresponding cleaning processing. The device for cleaning the mass data comprises a data rule configuration module, a data cleaning code generation module, an execution module and an analysis module, and the mass data are cleaned through the mass data cleaning method. The mass data can be effectively cleaned, the efficiency is high, dirty data which are cleaned out are reserved in a classified mode, and sources and whereabouts of every dirty datum can be located precisely.

Owner:ADVANCED NEW TECH CO LTD
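
The rule-driven cleaning with per-record tags described in the entry above can be sketched as follows. The abstract generates cleaning code automatically from rule files; this simplified sketch interprets an in-memory rule table directly instead. The table name, the rule names and the `CLEANING_RULES` structure are hypothetical.

```python
import re

EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

# Hypothetical rule configuration, keyed by table name. A rule with a
# "repair" function fixes the row; a rule without one rejects the row.
CLEANING_RULES = {
    "users": [
        {
            "name": "untrimmed_name",
            "check": lambda row: row["name"] != row["name"].strip(),
            "repair": lambda row: {**row, "name": row["name"].strip()},
        },
        {"name": "missing_email", "check": lambda row: not row.get("email")},
        {
            "name": "bad_email_format",
            "check": lambda row: not EMAIL_PATTERN.fullmatch(row["email"]),
        },
    ]
}


def clean_table(table_name, rows, rules=CLEANING_RULES):
    """Apply the table's rules in order, tagging each row with the rules it triggers."""
    clean, dirty_by_rule = [], {}
    for index, row in enumerate(rows):
        row, tags, rejected = dict(row), [], False
        for rule in rules.get(table_name, []):
            if not rule["check"](row):
                continue
            tags.append(rule["name"])
            if "repair" in rule:
                row = rule["repair"](row)
            else:
                # Keep the dirty row, classified by rule, with its source position.
                dirty_by_rule.setdefault(rule["name"], []).append(
                    {"source_index": index, "row": row}
                )
                rejected = True
                break
        if not rejected:
            clean.append({"row": row, "tags": tags})
    return clean, dirty_by_rule


rows = [
    {"name": " Ada ", "email": "ada@example.com"},
    {"name": "Bob", "email": ""},
    {"name": "Eve", "email": "eve-at-example"},
    {"name": "Dan", "email": "dan@example.org"},
]
clean, dirty_by_rule = clean_table("users", rows)
print([entry["row"]["name"] for entry in clean], sorted(dirty_by_rule))
```

Storing the source index next to each rejected row is one simple way to keep the origin of every dirty record traceable, which the abstract lists as a goal.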

Time sequence classification early warning method for storage device

Inactive | CN108052528A | Mitigating the effects of errors | Easy to optimize | Character and pattern recognition | Special data processing applications | Hysteresis | Effective solution

The invention discloses a time sequence classification early warning method for a storage device. The method comprises the steps of collecting storage device parameters in real time; cleaning the data; performing ARIMA time sequence analysis; and performing logistic regression analysis and early warning mechanism output. Against the background of a big data environment, time sequence prediction analysis is performed with an ARIMA model according to historical data and the hard disk SMART information obtained by statistics; the correlation between a SMART eigenvalue and the fault rate of the storage device is analyzed; and an eigenvalue more suitable for a logistic model is selected to perform classification prediction. A machine learning method is adopted for predicting the fault rate of the storage device, so that the problems of classification singleness and low early warning intensity in the final prediction for the storage device are solved. The defects of prior-art disk early warning mechanisms (hysteresis, low accuracy, poor actual early warning effect and difficult application to the big data environment) are overcome, the occurrence probability of each early warning intensity can be predicted, and an effective solution is provided for real-time operation, maintenance and monitoring in a data center environment.

Owner:HUAZHONG UNIV OF SCI & TECH

Big data statistics method and system, computer device and storage medium

Inactive | CN109753531A | Relieve pressure | Fast statistics | Database management systems | Interprogram communication | Message queue | Real time analysis

The embodiment of the invention discloses a big data statistics method and system, a computer device and a storage medium. The method provided by the embodiment of the invention comprises the following steps: reading the binlog of a MySQL database and sequentially putting the log records into a message queue; consuming the message queue through an ETL service, in which the log records in the message queue are extracted, cleaned, converted and loaded to obtain the corresponding business data, and the business data are loaded into the corresponding data warehouse; and carrying out real-time analysis, aggregation, query and offline calculation on the business data through a Spark distributed query engine to obtain the corresponding statistical result. According to the technical scheme, the data are imported into the warehouse incrementally and are cleaned before being stored, and the statistical data are calculated in advance through offline calculation, so that the service system can take the statistical data out directly when needed. The statistical speed is increased and the statistical pressure on the database is reduced.

Owner:SHENZHEN MAPGOO TECH
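
The queue-driven ETL flow in the entry above can be sketched in a few lines. This is a single-process illustration under stated assumptions: an in-memory `queue.Queue` stands in for the message queue, a list stands in for the data warehouse, a dictionary of daily totals stands in for the statistics that the abstract precomputes with Spark, and the log record layout (`op`, `row`, `order_id`, `created_at`, `amount`) is hypothetical.

```python
import queue
from collections import defaultdict


def run_etl(log_queue, warehouse, daily_totals):
    """Drain the queue: extract, clean, transform and load each log record."""
    while True:
        try:
            record = log_queue.get_nowait()             # extract
        except queue.Empty:
            return
        row = record.get("row", {})
        if record.get("op") != "insert" or row.get("amount") is None:
            continue                                     # clean: skip unusable records
        fact = {                                         # transform
            "order_id": row["order_id"],
            "day": row["created_at"][:10],
            "amount": float(row["amount"]),
        }
        warehouse.append(fact)                           # load
        daily_totals[fact["day"]] += fact["amount"]      # precompute the statistic


# Hypothetical log records, in the order they were written.
log_queue = queue.Queue()
for record in [
    {"op": "insert", "row": {"order_id": 1, "created_at": "2019-01-05 10:00:00", "amount": "12.50"}},
    {"op": "insert", "row": {"order_id": 2, "created_at": "2019-01-05 11:30:00", "amount": None}},
    {"op": "delete", "row": {"order_id": 3}},
    {"op": "insert", "row": {"order_id": 4, "created_at": "2019-01-06 09:15:00", "amount": "7.50"}},
    {"op": "insert", "row": {"order_id": 5, "created_at": "2019-01-05 18:00:00", "amount": "2.50"}},
]:
    log_queue.put(record)

warehouse, daily_totals = [], defaultdict(float)
run_etl(log_queue, warehouse, daily_totals)
print(dict(daily_totals))
```

Because the totals are updated as each record is loaded, a reader of `daily_totals` never has to scan the warehouse, which mirrors the abstract's point about taking precomputed statistics out directly.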

Resident trip mode comprehensive judging method based on handset signaling data

Inactive | CN105117789A | Easy way to get | Lower acquisition costs | Detection of traffic movement | Forecasting | Transportation planning | Electric vehicle

The invention discloses a method of comprehensively judging a resident trip mode based on handset signaling data, which belongs to the transport planning and management data analyzing field. A data source is from handset signaling data provided by mobile network service providers. Data cleaning, integrating and position conversion are further performed. A resident trip mode is further judged by mobile space-time path describing and stopover point identifying. The method which can effectively discriminate seven common trip modes including walking, bicycling, routine bus, electric vehicle, self-driving, taxi and rail transit thus acquires trip mode information of residents. A data basis is further provided for fields of special traffic programs, comprehensive traffic programs and intelligent traffic systems of cities, etc.

Owner:SOUTHWEST JIAOTONG UNIV

Malicious traffic detection method, system and apparatus, and computer readable storage medium

Inactive | CN108200030A | Improve discrimination accuracy | Improve discrimination | Data switching networks | Feature learning | Data traffic

The invention discloses a malicious traffic detection method. The method comprises the following steps: establishing malicious and normal data sample libraries from the obtained malicious and normal data traffic samples; executing a data cleaning operation and a preprocessing operation on the data sample libraries in sequence to obtain training data, and constructing a traffic detection model from the training data and a deep learning algorithm; judging whether the data traffic to be measured contains malicious data by using the traffic detection model; and if so, sending alarm information, indicating that the data traffic to be measured belongs to malicious data, via a preset path. Feature learning and training are performed on the malicious and normal data traffic samples through the automatic learning property of the deep learning algorithm, so the feature information extraction operation is completed without consuming precious human resources, thereby improving work efficiency and the precision with which malicious traffic is discriminated. The invention further discloses a malicious traffic detection system and apparatus and a computer readable storage medium, which have the above beneficial effects.

Owner:SANGFOR TECH INC

Medical clinical quality monitoring and evaluation system based on single-disease model

Inactive | CN104766259A | Achieving processing power | True mastery | Data processing applications | Special data processing applications | Statistical analysis | Clinical record

The invention discloses a medical clinical quality monitoring and evaluation system based on a single-disease model, and relates to the field of medical quality management. The system comprises a clinical data integration subsystem, a data cleaning and standardization subsystem, a statistic analysis and evaluation algorithm subsystem and a clinical quality management application subsystem. The clinical data integration subsystem is used for sending collected case data in the original diagnosis and treatment process of single diseases into a clinical record database; the data cleaning and standardization subsystem is used for selecting case data from the clinical record database and processing the data to form a single-disease evaluation database; the statistic analysis and evaluation algorithm subsystem is used for performing index calculation and comprehensive evaluation calculation on the received data; the clinical quality management application subsystem is used for displaying a comprehensive evaluation result obtained after the statistic analysis and evaluation algorithm subsystem performs calculation. Through the system, processing and statistic evaluation of the clinical data in clinical quality management are achieved, and a medical quality manager can truly and objectively master the quality of the diagnosis and treatment process of various diseases in various clinical departments in real time in a full-quantized mode.

Owner:HUAJU MEDICAL ASSESSMENT INFORMATION TECH BEIJING CO LTD

Training method and device of classification model, mobile terminal, and readable storage medium

Inactive | CN108875821A | High precision | Improve performance | Character and pattern recognition | Neural learning methods | Data set | Algorithm

The application relates to a training method and device of a classification model, a mobile terminal, and a computer readable storage medium. The method comprises the following steps: training the classification model on a preset data set until the precision of the classification model meets the standard value, wherein the data in the preset data set carry annotation information; identifying each datum in the preset data set with the trained classification model to acquire its class information; when the class information of a datum and its annotation information are inconsistent, cleaning that datum to acquire the cleaned target data set; and re-training the classification model on the cleaned target data set. The quality of each datum in the target data set is thereby guaranteed in a semi-automatic cleaning way. The data quality can be guaranteed without a multi-stage manual auditing mechanism, the manpower cost is greatly reduced, the data cleaning efficiency is improved, and the precision and the performance of the classification model can be improved by training it on the target data set.

Owner:GUANGDONG OPPO MOBILE TELECOMM CORP LTD
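
The train, compare, clean and retrain loop in the entry above can be sketched with a toy model. Everything concrete here is an assumption: a nearest-centroid classifier stands in for the classification model, "cleaning" is simplified to dropping every sample whose predicted class disagrees with its annotation, and the precision check before cleaning is omitted.

```python
def train_centroids(samples):
    """Fit a nearest-centroid classifier: one mean feature vector per label."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(features, centroids[label])),
    )


def clean_and_retrain(samples):
    """Train, drop samples whose prediction contradicts the annotation, retrain."""
    first_model = train_centroids(samples)
    kept = [(x, y) for x, y in samples if predict(first_model, x) == y]
    return train_centroids(kept), kept


# Hypothetical annotated data set with one mislabelled sample.
samples = [
    ((0.0, 0.1), "cat"), ((0.2, 0.0), "cat"), ((0.1, 0.2), "cat"),
    ((1.0, 0.9), "dog"), ((0.9, 1.1), "dog"), ((1.1, 1.0), "dog"),
    ((0.1, 0.0), "dog"),  # annotation disagrees with its neighbours
]
model, kept = clean_and_retrain(samples)
print(len(samples) - len(kept), "sample(s) removed")
```

In a semi-automatic workflow the disagreeing samples would more plausibly be queued for human review than deleted outright; dropping them keeps the sketch short.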

Mass unstructured distribution network data integration method based on knowledge mapping technology

Active | CN107330125A | Reduce the calculation amount of data fusion | Reduce storage pressure | Special data processing applications | Informatization | Data source

The invention discloses a mass unstructured distribution network data integration method based on knowledge mapping technology. The method includes that a data collection unit collects unstructured distribution network data of each informatization system, and quality analysis and data cleaning processing are performed on the unstructured distribution network data of each informatization system; according to the processed unstructured distribution network data of each informatization system, data local index based on local knowledge mapping is constructed; the data local index based on the local knowledge mapping is sent to a data management center through a big data connector; the data management center constructs data global index based on global knowledge mapping. Collection, quality analysis and data cleaning of distributed multi-source heterogeneous data are advanced to each informatization system, so that data fusion calculation quantity, storage pressure and data scheduling burden of the data management center are lowered; the data global index based on the global knowledge mapping is utilized to integrate data sources, so that convenience is brought to data inquiry and extraction, and workload of the data management center is reduced.

Owner:YUNNAN POWER GRID CO LTD ELECTRIC POWER RES INST

Method and system for integrated machine learning convenient for data analysis personnel to use

Inactive | CN108363714A | Fulfillment requirements | Low technical cost | Character and pattern recognition | Machine learning | Business Personnel | Feature extraction

The invention relates to the technical field of machine learning, and especially relates to a method and a system for integrated machine learning convenient for data analysis personnel to use. The method comprises the following steps: (1) data exploration; (2) data cleaning; (3) feature extraction; (4) feature selection; (5) sampling; (6) model training; (7) model optimization; (8) model combination; (9) model interpretability; (10) natural language processing. The system comprises a data processing module, a feature processing module, a model processing module, and a natural language processing module. The method and the system provide a unified algorithm modeling process for machine learning engineers, students, teachers, and machine learning enthusiasts, so that they can complete the modeling process with 20% of their effort and concentrate the remaining 80% on understanding the business and applying the model, in order to understand the business deeply and better meet the requirements of business personnel for models.

Owner:北京至信普林科技有限公司

Elevator as-required maintenance system and method based on big data analysis

Active | CN108083044A | Improve security | Forecasting | Character and pattern recognition | Mathematical model | Data access

The invention provides an elevator as-required maintenance system based on big data analysis. The elevator as-required maintenance system comprises a data source, a data access module and a data processing module. The data access module obtains data from the data source and conducts parsing and distribution, and the data processing module receives the data distributed by the data access module and conducts storage, modeling and analysis. The data access module comprises a data parsing unit, a data cleaning unit and a data distribution unit. The invention further provides an elevator as-required maintenance method based on big data analysis. Internet of Things equipment is used to collect maintenance-related elevator data and elevator safety operation failure data. By combining the failure situations in the daily use of an elevator with the online status of the Internet of Things equipment, a daily maintenance mode of "Internet of Things plus maintenance according to elevator safety operation requirements" is achieved, a quantitative index mathematical model is built for elevators maintained according to risks and situations, and data support is provided for maintenance reform.

Owner:ZHEJIANG NEW ZAILING TECH CO LTD

Potential customer mining and recommending method

Active | CN110222272A | Judging interest | High precision | Digital data information retrieval | Character and pattern recognition | Feature vector | Social platform

The invention provides a potential customer mining and recommending method, which comprises the following steps: obtaining a user's personal information and social activity information from a social platform, fusing them with locally stored user shopping records, and carrying out data cleaning and screening to obtain data for training and testing a potential customer classification model; constructing a user portrait from the user's personal information, social records, and shopping records, processing the social records and shopping records into feature vectors that a model can use, then training a user interest prediction model and dividing users into potential customers and passers-by; and finally, identifying potential customers and providing them with more targeted commodity pages according to their interests. The method can judge a user's interests while accurately classifying the user. Corresponding products are displayed, or precisely targeted advertisements are placed, according to the judged interests, so that potential customers are converted. Targeted recommendations can also be provided to existing customers, which improves customer stickiness.

Owner:GUANGDONG UNIV OF TECH

A method and a device for constructing a patent data knowledge map

Active | CN109189942A | Data processing applications | Natural language data processing | Data mining | Data format

The invention discloses a method and a device for constructing a patent data knowledge map. The method comprises the following steps: obtaining patent data from an existing patent database, preprocessing the patent data to unify its format, merging patent data of the same type, and performing word segmentation to obtain the word segmentation data of each type of patent data; performing knowledge extraction on the preprocessed patent data, cleaning the word segmentation data of each type of patent data to obtain the corresponding original subject file, extracting keywords to obtain subject words, and constructing a patent subject database for each type of patent data; and defining the entities of the patent data, determining the subjects of the patent data, identifying the entities and subjects of patents according to a general knowledge map, mining the semantic relationships between the entities and the subjects, and constructing the patent data knowledge map.

Owner:SHANDONG UNIV

Marketing analysis data market system

Inactive | CN104731791A | Marketing | Special data processing applications | Analysis data | Data access layer

The invention provides a marketing analysis data market system. The system comprises a data access layer, a data extraction module, a data conversion module, a data cleaning module, a log and alarm sending module, and a data downloading module. The data packets of the data access layer contain office data, external data, and service data, and the models of the system include a logical data model and a physical data model. First, the necessity of designing the marketing data market is analyzed, and ETL data processing, including noise data handling, data consistency, and data quality, is analyzed by discussing a data integration method; various data sources can be reorganized and processed by a data conversion tool. In addition, the physical model of the data market is realized according to the physical table structure of the logical model. Finally, the application prospects of the data market in marketing analysis are discussed.

Owner:DONGYANG AIWEIDE ADVERTISING MEDIA

Heuristic Domain Targeted Table Detection and Extraction Technique

Active | US20190171704A1 | Improve accuracy | Highly accurate targeted table type detection capability | Natural language data processing | Special data processing applications | Scan line | Correlation analysis

A method, system, and apparatus are provided for processing tables embedded within documents wherein a first table header is detected by using semantic groupings of table header terms to identify a minimum number of table header terms in a scanned line of a text document; a potential data zone is extracted by applying white space correlation analysis to a portion of the text document that is adjacent to the first table header; one or more data zone columns from the potential data zone are grouped and aligned with a corresponding header column in the first table header to form a candidate table; data cleansing is performed on the candidate table; and then one or more columns of the candidate table are evaluated using natural language processing to apply a specified table analysis.
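
The header-detection and column-alignment steps in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `HEADER_GROUPS` vocabulary, the `min_groups` threshold, and the rule of splitting columns on runs of two or more spaces (a crude stand-in for white space correlation analysis) are all assumptions made for illustration.

```python
import re

# Hypothetical semantic groupings of header terms; the abstract does not list a vocabulary.
HEADER_GROUPS = {
    "identifier": {"id", "item", "sku", "part"},
    "quantity": {"qty", "quantity", "units"},
    "money": {"price", "amount", "total", "cost"},
}


def header_groups_in(line):
    """Return the names of the semantic groups that have a term in the line."""
    tokens = set(re.findall(r"[a-z]+", line.lower()))
    return {name for name, terms in HEADER_GROUPS.items() if tokens & terms}


def find_header(lines, min_groups=2):
    """Return the index of the first line that hits at least min_groups groups, else None."""
    for index, line in enumerate(lines):
        if len(header_groups_in(line)) >= min_groups:
            return index
    return None


def split_columns(line):
    """Split a line into cells on runs of two or more whitespace characters (assumed rule)."""
    return [cell for cell in re.split(r"\s{2,}", line.strip()) if cell]


def extract_table(lines, min_groups=2):
    """Align the rows directly below the header with the header columns.

    Rows are collected until one has a different number of cells than the header.
    Returns (header, rows) or None when no header line is found.
    """
    start = find_header(lines, min_groups)
    if start is None:
        return None
    header = split_columns(lines[start])
    rows = []
    for line in lines[start + 1:]:
        cells = split_columns(line)
        if len(cells) != len(header):
            break
        rows.append(dict(zip(header, cells)))
    return header, rows
```

For example, given the lines `"Item    Qty    Price"` followed by `"Widget    2    9.50"`, `extract_table` would return the three header names and one row keyed by them. Data cleansing and the natural language evaluation of columns would run on that candidate table afterwards.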

Owner:IBM CORP

Method and system for digging object grade knowledge

Active | CN101231661A | Easy to handle | Flexible mining | Special data processing applications | Process module | Source Data Verification

The invention discloses an object-level information mining system comprising a data collection module, a data cleaning module, a content preprocessing module, and an object correlation search module. The data collection module, which collects data, comprises a web crawler; the data cleaning module, which processes structured data, comprises a data verification module and a deduplication module; the content preprocessing module, which preprocesses unstructured data, comprises a metadata management module and a content analyzer; and the object correlation search module, which analyzes the degree of correlation of the content processed by the content preprocessing module, comprises a correlation degree analyzer. The invention also discloses an object-level information mining method comprising the following steps: collecting information from web pages; cleaning the collected structured data; preprocessing the collected unstructured data; and performing an object correlation search on the preprocessed content.
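
The two parts of the data cleaning module, verification and duplicate removal, can be sketched in a few lines of Python. The abstract gives no rules for either step, so the required fields (`name`, `url`) and the case-insensitive, whitespace-normalized duplicate key below are hypothetical choices, not the patent's.

```python
def is_valid(record, required=("name", "url")):
    """Verification step: every required field must be present and non-blank."""
    for field in required:
        value = record.get(field)
        if value is None or not str(value).strip():
            return False
    return True


def dedup_key(record, key_fields=("name", "url")):
    """Build a comparison key that ignores letter case and extra whitespace."""
    return tuple(" ".join(str(record[field]).lower().split()) for field in key_fields)


def clean_records(records):
    """Drop invalid records, then keep only the first record for each key."""
    seen = set()
    kept = []
    for record in records:
        if not is_valid(record):
            continue
        key = dedup_key(record)
        if key in seen:
            continue
        seen.add(key)
        kept.append(record)
    return kept
```

Keeping the first occurrence is one simple policy; a real system might instead merge duplicates or keep the most recently crawled record.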

Owner:上海估家网络科技有限公司 +1

Power operation and maintenance data cleaning method based on isolation forest algorithm and neural network

Active | CN108776683A | Improve efficiency | Improve accuracy | Data processing applications | Character and pattern recognition | Anomaly detection | Evaluation system

The invention provides a cleaning method for power communication operation and maintenance data, and more particularly a power operation and maintenance data cleaning method based on an isolation forest algorithm and a neural network. The method includes: first, using an improved isolation forest algorithm to construct an isolation forest model (iForest) that solves the target problem; then defining a system for evaluating how the isolation forest algorithm scores abnormal data; and finally training a BP neural network to predict corrected values for the abnormal data attributes detected by the isolation forest. The method optimizes power communication operation and maintenance data cleaning based on the isolation forest algorithm and the neural network: abnormality detection precision is improved, data correction errors are reduced, and the cleaning program is effectively optimized in terms of abnormal-data positioning accuracy, data correction accuracy, training time, resource usage, and the like.

Owner:GUANGDONG POWER GRID CO LTD +1

Advanced data cleansing system and method

Inactive | US20160292325A1 | Improving measurement error estimation | Easy to detect | Testing/monitoring control systems | Volume variation compensation/correction apparatus | Observational error | Computerized system

A cleansing system for improving operation of a plant. A server is coupled to the cleansing system for communicating with the plant via a communication network. A computer system has a web-based platform for receiving and sending plant data related to the operation of the plant over the network. A display device interactively displays the plant data. A data cleansing unit is configured for performing an enhanced data cleansing process for allowing an early detection and diagnosis of the operation of the plant based on at least one environmental factor. The data cleansing unit calculates and evaluates an offset amount representing a difference between a measurement and a simulation for detecting an error of measurement during the operation of the plant based on the plant data.
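
The offset described in the last sentence, the difference between a measurement and a simulation, can be sketched in Python as follows. The abstract does not say how the offset is evaluated, so averaging the differences and comparing the result with a fixed `tolerance` is an assumed rule for illustration, and both function names are hypothetical.

```python
from statistics import mean


def offsets(measured, simulated):
    """Return the pointwise differences measurement minus simulation."""
    if len(measured) != len(simulated):
        raise ValueError("measured and simulated series must have the same length")
    return [m - s for m, s in zip(measured, simulated)]


def evaluate_offset(measured, simulated, tolerance):
    """Average the offsets and flag the sensor when the average exceeds the tolerance.

    The threshold rule is an assumption; the abstract only states that an
    offset amount is calculated and evaluated to detect a measurement error.
    """
    average_offset = mean(offsets(measured, simulated))
    return {"offset": average_offset, "suspect": abs(average_offset) > tolerance}
```

A sustained non-zero average offset suggests a biased instrument rather than random noise, which is why this sketch averages over a window of readings before flagging anything.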

Owner:UOP LLC

Intelligent terminal and stock trend prediction method based LSTM thereof

Inactive | CN106991506A | Avoid errors | Avoid practicality | Finance | Forecasting | Evaluation result | Data set

The invention relates to an intelligent terminal and an LSTM-based stock trend prediction method thereof. The method comprises the steps of: acquiring historical data of a target stock, performing data cleaning and normalization, and dividing the data by time into a training data set and a test data set; performing offline model training on the training data so as to train multiple LSTM neural network models separately; acquiring the list of predicted values that the multiple neural network models output for the training data, and comparing it with the actual stock trend values to calculate the weight each neural network model carries in a combined model; and using the test data of the test data set to evaluate the neural network models in the combined model, thereby adjusting the weight each model carries in the combined model. By using a combined model, the invention avoids the large errors and low practicability of simple prediction with a single LSTM model.
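
The weighting step can be illustrated with a short Python sketch that works on lists of already computed predictions. The abstract does not give the weighting formula, so the inverse mean-absolute-error rule used here is an assumption chosen for illustration, and the function names are hypothetical; training the LSTM models themselves is out of scope.

```python
def mean_abs_error(predicted, actual):
    """Mean absolute difference between one model's predictions and the actual values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def inverse_error_weights(predictions, actual, eps=1e-9):
    """Give each model a weight proportional to 1 / (its error), normalized to sum to 1.

    predictions is a list with one prediction series per model.
    eps avoids division by zero for a model that fits perfectly.
    """
    inverse = [1.0 / (mean_abs_error(series, actual) + eps) for series in predictions]
    total = sum(inverse)
    return [value / total for value in inverse]


def combine(predictions, weights):
    """Weighted sum of the models' predictions at each time step."""
    steps = len(predictions[0])
    return [
        sum(weight * series[t] for weight, series in zip(weights, predictions))
        for t in range(steps)
    ]
```

Under this rule a model whose error is three times larger receives one third of the weight. The adjustment on the test set described in the abstract could reuse the same function with test-set predictions, although the patent may do it differently.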

Owner:SHENZHEN INST OF ADVANCED TECH

Product demand preference characteristic digging and quality evaluation method based on comment information

Pending | CN107133214A | Effective supervision | Natural language data processing | Data mining | Conditional random field | Decision taking

The invention provides a product demand preference characteristic digging and quality evaluation method based on comment information. The method comprises the following steps: 1, crawling data, namely using web crawling technology to collect specified product review information from an e-commerce platform and storing it in a database; 2, performing data preprocessing and product characteristic word extraction, namely cleansing and preprocessing the acquired data, and then extracting product characteristics from the preprocessed data with a BiLSTM-CRF (Bidirectional Long Short-Term Memory-Conditional Random Field) model; and 3, performing product demand preference characteristic digging and quality evaluation. With this method, product quality problems can be identified quickly from customer feedback and customers' demand preferences can be understood, so that companies can make better decisions to satisfy their customers.

Owner:CHINA JILIANG UNIV

Intelligent equipment machine learning safety monitoring system based on user behavior

Inactive | CN106230849A | Protection security | Achieve security | Digital data protection | Platform integrity maintenance | Third party | Monitoring system

The invention discloses an intelligent equipment machine learning safety monitoring system based on user behavior. The system comprises a first-level machine learning model oriented to third-party intelligent equipment user behavior data, and a second-level user behavior machine learning model on the intelligent equipment side based on an MPU memory protection mechanism. The first-level machine learning model uses user behavior data from a third-party cloud platform and, on the basis of two types of data, namely data of the same intelligent equipment type and behavior data of the same individual user, performs data cleaning on both types, determines the data the intelligent equipment needs to use and their correlations, and determines the subject of the intelligent equipment user behavior according to the type of the intelligent equipment. On the intelligent equipment side, the second-level model first uses the memory protection mechanism of the MPU to divide safety protection regions for the safety monitoring model obtained from the first-level machine learning model, so that the monitoring system can effectively protect the security of the intelligent equipment and the user.

Owner:INST OF INFORMATION ENG CAS

Importing and Reconciling Resources From Disjoint Name Spaces to a Common Namespace

Inactive | US20090063562A1 | Database management systems | Digital data processing details | Harmonization | Application software

A namespace exploits individual resource identity attributes of an application to allow the integration of resource instances from applications into a configuration management database (CMDB), prior to any data cleansing or namespace harmonization activities. An approach for incremental reconciliation of resource instances within the CMDB is defined.

Owner:IBM CORP
