576 results about "Data dredging" patented technology

Data dredging (also data fishing, data snooping, data butchery, and p-hacking) is the misuse of data analysis to find patterns in data that can be presented as statistically significant when in fact there is no real underlying effect. This is done by performing many statistical tests on the data and only paying attention to those that come back with significant results, instead of stating a single hypothesis about an underlying effect before the analysis and then conducting a single test for it.
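
The effect described above can be illustrated with a small simulation. This is a minimal sketch, not taken from any of the listed patents: it runs many two-group comparisons on pure noise and counts how many look "significant". The function names, the sample sizes, and the use of a z-test with known unit variance are all assumptions made for illustration.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def dredge(num_tests=200, n=50, alpha=0.05, seed=0):
    """Run num_tests comparisons on pure noise (hypothetical setup).

    Both groups are drawn from the same N(0, 1) distribution, so every
    "significant" result is a false positive.
    """
    rng = random.Random(seed)
    p_values = []
    for _ in range(num_tests):
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        # z statistic for a difference in means with known variance 1
        z = (sum(a) / n - sum(b) / n) / math.sqrt(2 / n)
        p_values.append(two_sided_p(z))
    naive = sum(p < alpha for p in p_values)
    # Bonferroni correction: divide the threshold by the number of tests
    corrected = sum(p < alpha / num_tests for p in p_values)
    return naive, corrected


if __name__ == "__main__":
    naive, corrected = dredge()
    print("significant without correction:", naive)
    print("significant with Bonferroni:", corrected)
```

With a 0.05 threshold, roughly one test in twenty is expected to pass by chance alone, which is why reporting only the passing tests is misleading. The Bonferroni threshold is one common (and conservative) way to account for the number of tests performed.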

Apparatus, systems, and methods for gathering and processing biometric and biomechanical data

Apparatus, systems, and methods are provided for measuring and analyzing movements of a body and for communicating information related to such body movements over a network. In certain embodiments, a system gathers biometric and biomechanical data relating to positions, orientations, and movements of various body parts of a user performed during sports activities, physical rehabilitation, or military or law enforcement activities. The biometric and biomechanical data can be communicated to a local and / or remote interface, which uses digital performance assessment tools to provide a performance evaluation to the user. The performance evaluation may include a graphical representation (e.g., a video), statistical information, and / or a comparison to another user and / or instructor. In some embodiments, the biometric and biomechanical data is communicated wirelessly to one or more devices including a processor, display, and / or data storage medium for further analysis, archiving, and data mining. In some embodiments, the device includes a cellular telephone.
Owner:APPLIED TECH HLDG +1

Statistical modeling methods for determining customer distribution by churn probability within a customer population

Inactive | US20070185867A1 | Avoid erosion | Facilitates efforts to retain high profitability customers | Database querying | Marketing | Data dredging | Customer attrition
A system and method for managing churn among the customers of a business is provided. The system and method provide for an analysis of the causes of customer churn and identify customers who are most likely to churn in the future. Identifying likely churners allows appropriate steps to be taken to prevent customers who are likely to churn from actually churning. The system includes a dedicated data mart, a population architecture, a data manipulation module, a data mining tool and an end user access module for accessing results and preparing preconfigured reports. The method includes adopting an appropriate definition of churn, analyzing historical customer data to identify significant trends and variables, preparing data for data mining, training a prediction model, verifying the results, deploying the model, defining retention targets, and identifying the most responsive targets.
Owner:ACCENTURE GLOBAL SERVICES LTD

Motion capture element

Motion capture element for low power and accurate data capture for use in healthcare compliance, sporting, gaming, military, virtual reality, industrial, retail loss tracking, security, baby and elderly monitoring and other applications for example obtained from a motion capture element and relayed to a database via a mobile phone. System obtains data from motion capture elements, analyzes data and stores data in database for use in these applications and / or data mining, which may be charged for. Enables unique displays associated with the user, such as 3D overlays onto images of the user to visually depict the captured motion data. Ratings, compliance, ball flight path data can be calculated and displayed, for example on a map or timeline or both. Enables performance related equipment fitting and purchase. Includes active and passive identifier capabilities.
Owner:NEWLIGHT CAPITAL LLC

Method and apparatus for data mining to discover associations and covariances associated with data

Data mining techniques are provided which are effective and efficient for discovering useful information from an amorphous collection or data set of records. For example, the present invention provides for the mining of data, e.g., of several or many records, to discover interesting associations between entries of qualitative text, and covariances between data of quantitative numerical types, in records. Although not limited thereto, the invention has particular application and advantage when the data is of a type such as clinical, pharmacogenomic, forensic, police and financial records, which are characterized by many varied entries, since the problem is then said to be one of “high dimensionality” which has posed mathematical and technical difficulties for researchers. This is especially true when considering strong negative associations and negative covariance, i.e., between items of data which may so rarely come together that their concurrence is never seen in any record, yet the fact that this is not expected is of potential great interest.
Owner:IBM CORP

Consistency modeling of healthcare claims to detect fraud and abuse

Transaction-based behavioral profiling, whereby the entity to be profiled is represented by a stream of transactions, is required in a variety of data mining and predictive modeling applications. An approach is described for assessing inconsistency in the activity of an entity, as a way of detecting fraud and abuse, using service-code information available on each transaction. Inconsistency is based on the concept that certain service-codes naturally co-occur more than do others. An assessment is made of activity consistency looking at the overall activity of an individual entity, as well as looking at the interaction of entities. Several approaches for measuring consistency are provided, including one inspired by latent semantic analysis as used in text analysis. While the description is in the context of fraud detection in healthcare, the techniques are relevant to application in other industries and for purposes other than fraud detection.
Owner:FAIR ISAAC & CO INC
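
The idea that certain service codes naturally co-occur more than others can be sketched as follows. This is an illustrative stand-in only, not one of the consistency measures described in the patent: the function names, the pairwise counting scheme, and the sample codes are assumptions.

```python
from collections import Counter
from itertools import combinations


def pair_counts(profiles):
    """Count how many profiles contain each unordered pair of service codes."""
    counts = Counter()
    for codes in profiles:
        for pair in combinations(sorted(set(codes)), 2):
            counts[pair] += 1
    return counts


def consistency(codes, counts, num_profiles):
    """Average co-occurrence rate of the code pairs in one entity's activity.

    Values near 0 mean the entity combines codes that rarely appear together.
    """
    pairs = list(combinations(sorted(set(codes)), 2))
    if not pairs:
        return 1.0  # a single code cannot be inconsistent with itself
    return sum(counts[p] / num_profiles for p in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical historical profiles, one list of service codes per entity
    history = [["A", "B"], ["A", "B"], ["A", "B"], ["C", "D"]]
    counts = pair_counts(history)
    print(consistency(["A", "B"], counts, len(history)))
    print(consistency(["A", "D"], counts, len(history)))
```

An entity whose score falls well below that of its peers would be a candidate for closer review; how to set that cutoff is outside the scope of this sketch.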

Apparatus, systems, and methods for gathering and processing biometric and biomechanical data

Apparatus, systems, and methods are provided for measuring and analyzing movements of a body and for communicating information related to such body movements over a network. In certain embodiments, a system gathers biometric and biomechanical data relating to positions, orientations, and movements of various body parts of a user performed during sports activities, physical rehabilitation, or military or law enforcement activities. The biometric and biomechanical data can be communicated to a local and / or remote interface, which uses digital performance assessment tools to provide a performance evaluation to the user. The performance evaluation may include a graphical representation (e.g., a video), statistical information, and / or a comparison to another user and / or instructor. In some embodiments, the biometric and biomechanical data is communicated wirelessly to one or more devices including a processor, display, and / or data storage medium for further analysis, archiving, and data mining. In some embodiments, the device includes a cellular telephone.
Owner:NIKE INC

Incorporating predictive models within interactive business analysis processes

A Customer Relationship Management (CRM) system that incorporates predictive models. The system is used by business users who are unfamiliar with the art of data mining. The predictive model, which is constructed by a model-building mechanism in a data mining subsystem, accepts the appropriate input attributes, performs calculations against a segment comprised of records, and generates an output attribute.
Owner:TERADATA US

Method and apparatus for efficient and flexible surveillance visualization with context sensitive privacy preserving and power lens data mining

Inactive | US20080198159A1 | Overcome problems | Quickly explore potential abnormalities | Burglar alarm | 3D modelling | Graphics | Data dredging
The surveillance visualization system extracts information from plural cameras to generate a graphical representation of a scene, with stationary entities such as buildings and trees represented by graphical model and with moving entities such as cars and people represented by separate dynamic objects that can be coded to selectively reveal or block the identity of the entity for privacy protection. A power lens tool allows users to specify and retrieve results of data mining operations applied to a metadata store linked with objects in the scene. A distributed model is presented where a grid or matrix is used to define data mining conditions and to present the results in a variety of different formats. The system supports use by multiple persons who can share metadata and data mining queries with one another.
Owner:PANASONIC INTELLECTUAL PROPERTY MANAGEMENT CO LTD

Gesture-based communication and reporting system

The present invention relates to a gesture-based reporting method and system, including a critical results reporting pathway, which is used to communicate critical findings to users according to predetermined methods (i.e., e-mail, facsimile, etc.), and create an electronic auditing trail to document receipt, understanding, bi-directional queries, and track clinical outcomes. Based on a predetermined rule set, predetermined data elements within the structured database could trigger the critical results reporting pathway. There is a quality assurance component to the invention, such that technical deficiencies in an imaging quality can be noted, analyzed, and tracked. There is a workflow and data analysis portion to the invention, wherein workflow is enhanced, and structured data is mapped to a standardized lexicon, such that data mining can be performed. Thus, the present invention extends beyond reporting alone and is a tool to facilitate electronic communication, consultation, education / training, and data mining for quality assurance.
Owner:IMPRIVATA

Data mining framework using a signature associated with an algorithm

A framework is provided that enables data mining algorithms to be plugged into it without any change to algorithm software implementations, while still providing all the standard data mining tasks. It may be implemented by the data source provider. It also then allows for the complete separation of data storage and algorithms. When the user initiates a mining session and picks an algorithm for build task or a model for an apply or test task, the framework may become responsible for preparing a set of “prompts” to the user asking him to provide some expression which is specific to the particular kind of data the user is working with.
Owner:ORACLE INT CORP

Designation of a Characteristic of a Physical Capability by Motion Analysis, Systems and Methods

Motion Analysis is used to classify or rate human capability in a physical domain via a minimized movement and data collection protocol producing a discrete, overall figure of merit of the selected physical capability. The minimal protocol is determined by data mining of a more extensive movement and data collection. Protocols are relevant in medical, sports and occupational applications. Kinematic, kinetic, body type, Electromyography (EMG), Ground Reactive Force (GRF), demographic, and psychological data are encompassed. Resulting protocols are capable of transforming raw data representing specific human motions into an objective rating of a skill or capability related to those motions.
Owner:SELNER ALLEN JOSEPH

Inference control method in a data cube

A system for editing the dimension structure associated with a data cube is revealed. The editing may be used to enforce given criteria on the cube. This includes modifying the cube in order to satisfy regulations requiring researchers to protect information about individuals, such as medical, genealogy and genetics records. The inference control methods disclosed therefore enable safe aggregated datasets to be released to researchers. When combined with information theoretic methods, the invention method of editing the dimension structures may be used to express clearly and discover correlations that exist in the dataset. This mining of the data and editing of the dimension structure allows the user of a simple multidimensional cube viewer to visually verify the patterns discovered.
Owner:DECODE GENETICS EHF

Method for automatic evaluation based on generalized fluent spoken language fluency

Active | CN101740024A | Troubleshoot automated assessment issues | Fast scoring | Speech recognition | Data dredging | Spoken language
The invention relates to a method for automatic evaluation based on generalized fluent spoken language fluency, which comprises the following steps of: acquiring speech data according to different ages and spoken language levels by using a speech input device; adopting an evaluating model based on characteristics of the generalized fluency and the machine learning training fluency; configuring a speech recognition system with corresponding parameters according to scripts of different subjects and genders of enunciators in the speech data; performing quantification on speech speed coherence, content understanding, advanced skills and reconstruction standard characteristics in the speech data to comprehensively extract the characteristics of the fluency from the speech data from the angle of expert assessment and evaluation; and adopting a decision tree method in regression fitting analysis and data mining to detect faults of abnormal fluency and grade and diagnose the fluency. The machine fluency score can reach a level close to that of grading experts, and its correlation index exceeds that of 2 to 3 out of a typical panel of 5 experts; besides, the method is fast, and can be embedded into an automatic spoken language evaluation system as an important module to evaluate fluency indexes in pronunciation quality.
Owner:IFLYTEK CO LTD

System and method for utilizing motion capture data

System and method for utilizing motion capture data for healthcare compliance, sporting, gaming, military, virtual reality, industrial, retail loss tracking, security, baby and elderly monitoring and other applications for example obtained from a motion capture element and relayed to a database via a mobile phone. System obtains data from motion capture elements, analyzes data and stores data in database for use in these applications and / or data mining, which may be charged for. Enables unique displays associated with the user, such as 3D overlays onto images of the user to visually depict the captured motion data. Ratings, compliance, ball flight path data can be calculated and displayed, for example on a map or timeline or both. Enables performance related equipment fitting and purchase. Includes active and passive identifier capabilities.
Owner:NEWLIGHT CAPITAL LLC

Clustering analysis and decision tree algorithm-based truck loading work time prediction model

The invention discloses a clustering analysis and decision tree algorithm-based truck loading work time prediction model. A clustering analysis and decision tree mixed algorithm is introduced, factors influencing inventory control are abstracted out, related historical data serves as a training sample, and finally the truck loading work time can be effectively predicted by using a trained decision tree data model; and the historical data of truck loading is deeply mined by utilizing a data mining technology based on a demand, and an available, easy-to-use and high-accuracy data model is generated. The clustering analysis and the decision tree algorithm are combined and complement each other, so that the accuracy of the data model is improved; an optimization policy is adopted for an original decision tree algorithm under the condition of establishing a simple and accurate data model, so that the calculation amount is reduced and the algorithm efficiency is improved; and through the data model, a relatively accurate time interval of cargo loading can be predicted and used for better manual decision-making.
Owner:WUHAN BAOSTEEL CENT CHINA TRADE

Intelligent ammeter fault real time prediction method based on decision-making tree

Active | CN106054104A | Reflect real-time fault conditions | Electrical measurements | Data dredging | Smart meter
Provided is an intelligent ammeter fault real time prediction method based on a decision-making tree, comprising the steps of: 1, pre-processing intelligent ammeter data of an electricity information acquisition system; 2, according to an intelligent ammeter fault determination model, screening the fault data of intelligent ammeters in the electricity information acquisition system and sending the fault data into an intelligent ammeter fault database; 3, dividing the historical data in the intelligent ammeter fault database into a training set and a test set, employing a decision-making tree algorithm to perform data mining on the training set, and forming an intelligent ammeter fault decision-making tree and a preliminary classification rule; 4, through the data of the test set, performing accuracy assessment on the preliminary classification rule, confirming the preliminary classification rule if the accuracy meets requirements, or else returning to the training set for training again; 5, generating an intelligent ammeter fault real time prediction model according to the finally determined classification rule; and 6, linking an intelligent ammeter real time fault database to the intelligent ammeter fault real time prediction model for real time prediction to obtain intelligent ammeter fault real time prediction results.
Owner:国网新疆电力有限公司营销服务中心 +1

Automatic data explorer that determines relationships among original and derived fields

An automatic data mining tool that characterizes the relationships between different database fields from both structured and unstructured data. It extracts a data model, identifies and categorizes all the data fields, performs pre-processing to deal with unstructured data effectively, and processes the data without human intervention to automatically explore how the fields are related to one another. Prior to the commencement of user-controlled data mining, the present invention goes through all the fields in a database table space in order to establish meaningful relationships between various fields using whatever computer resources are available (i.e. by using "cycle stealing"). This allows the present invention to run in the background and establish relationships between fields even before data mining (DM) begins, and determine redundant, useless, and / or trivial fields without any external guidance. This results in faster, more accurate data mining since these relationships are available before a user begins the process of data mining.
Owner:LOYOLA MARYMOUNT UNIV +1

Urgent collection method based on credit score and urgent collection device thereof

Inactive | CN106952155A | Beneficial technical effect | Improve the effect of collection | Finance | Risk level | Data dredging
The invention discloses an urgent collection method based on the credit score and an urgent collection device thereof. The urgent collection method comprises the steps of collecting urgent collection object credit information, processing the collected credit information, establishing a credit risk scoring model, predicting the overdue debt collection probability and performing the targeted urgent collection strategy. The risk level can be automatically judged according to the urgent collection score result so that the urgent collection strategy of the client can be reasonably determined, the urgent collection effect can be enhanced and the bad debt loss can be reduced. Data mining acts as the analysis technology so that the modeling time can be saved and the data support can be provided for the urgent collection industry. Besides, user information can also be updated in real time, the urgent collection risk score can be more accurately predicted, the most appropriate urgent collection strategy can be performed and the urgent collection task efficiency can be enhanced.
Owner:深圳前海纵腾金融科技服务有限公司

Distributed knowledge data mining device and mining method used for complex network

The invention discloses a distributed knowledge data mining device and method used for a complex network. The distributed knowledge data mining device adopts a distributed computing platform which is composed of a control unit, a computing unit and a man-machine interaction unit, wherein the key innovation is to distribute the computation required by the complex clustering algorithms in the data mining across different servers so as to improve the efficiency of the data mining. For different knowledge data, the degrees of relation and the weights of knowledge data can also be computed by applying different standards, so that a more credible result is obtained. A two-level clustering mode is adopted in the knowledge data mining process; the result of the first-level clustering is relatively rough, but the computing complexity is very low; and the computing complexity of the second-level clustering is relatively high, but the result is more precise. By combining the first-level clustering with the second-level clustering efficiently, the distributed knowledge data mining device improves the time complexity and clustering precision greatly in comparison with the traditional single-level clustering mode. According to the invention, because the network structure and its dynamic evolutionary process are displayed visually and directly, references are provided for prediction in the fields of disciplinary development and hotspot research.
Owner:BEIJING UNIV OF POSTS & TELECOMM

Method and system for detecting unusual events and application thereof in computer intrusion detection

An automated decision engine is utilized to screen incoming alarms using a knowledge-base of decision rules. The decision rules are updated with the assistance of a data mining engine that analyzes historical data. “Normal” alarm events, sequences, or patterns generated by sensors under conditions not associated with unusual occurrences (such as intrusion attacks) are characterized and these characterizations are used to contrast normal conditions from abnormal conditions. By identifying frequent occurrences and characterizing them as “normal” it is possible to easily identify anomalies which would indicate a probable improper occurrence. This provides very accurate screening capability based on actual event data.
Owner:TREND MICRO INC
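
The screening idea in this abstract, characterizing frequent alarm events as "normal" and flagging the rest, can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the patented decision engine: the function names, the `min_share` threshold, and the sample alarm names are hypothetical.

```python
from collections import Counter


def learn_normal(history, min_share=0.05):
    """Treat any alarm type making up at least min_share of history as normal."""
    counts = Counter(history)
    total = len(history)
    return {event for event, c in counts.items() if c / total >= min_share}


def screen(alarms, normal):
    """Keep only incoming alarms that were not characterized as normal."""
    return [a for a in alarms if a not in normal]


if __name__ == "__main__":
    # Hypothetical historical alarm log
    history = ["ping"] * 50 + ["login_fail"] * 45 + ["port_scan"]
    normal = learn_normal(history)
    print(screen(["ping", "port_scan", "ping", "root_shell"], normal))
```

The abstract also mentions sequences and patterns of alarms; this sketch only looks at individual event types.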

Image processing of mass spectrometry data for using at multiple resolutions

A system and method for utilizing an image processing technique to transform raw data collected by a mass spectrometer into a hierarchical data format. The image processing technique may include the use of a wavelet transform. The hierarchical data format provides for using the transformed data at multiple resolutions without data loss for such operations as data mining, matching, and displaying, for example. Further, the transformed data enables higher levels of data compression than generally possible from directly compressing the raw data. Additionally, the transformed data can be used to identify and suppress noise.
Owner:EFECKTA TECH CORP
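
How a wavelet transform yields a hierarchical, multi-resolution representation without data loss can be shown with a one-dimensional Haar transform. This is a minimal sketch, assuming a signal whose length is a power of two; the abstract does not say which wavelet is used, and the function names and sample values are hypothetical.

```python
def haar_decompose(signal):
    """Return (coarsest approximation, detail bands ordered coarsest first).

    Assumes len(signal) is a power of two.
    """
    approx = list(signal)
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Half-differences hold what is lost when each pair is averaged
        details.insert(0, [(a - b) / 2 for a, b in pairs])
        approx = [(a + b) / 2 for a, b in pairs]
    return approx, details


def haar_reconstruct(approx, details, levels=None):
    """Rebuild the signal; pass fewer levels for a lower-resolution view."""
    out = list(approx)
    for band in details[:levels]:
        nxt = []
        for avg, diff in zip(out, band):
            nxt.extend([avg + diff, avg - diff])
        out = nxt
    return out


if __name__ == "__main__":
    raw = [4, 6, 10, 12, 8, 8, 0, 2]  # hypothetical intensity samples
    approx, details = haar_decompose(raw)
    print(haar_reconstruct(approx, details, levels=1))  # coarse view
    print(haar_reconstruct(approx, details))            # full resolution
```

Applying every detail band restores the original samples, while stopping early gives a coarser view of the same data. Small detail coefficients are also natural candidates for compression or noise suppression.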

Product comment analyzing method and system with learning supervising function

The invention belongs to the data mining and natural language processing technical field, specifically a product comment analyzing method and system with learning supervising function. According to the method, multiple categories (product features) are artificially defined aiming at a specific product; firstly, product feature aspect classification is carried out to collected user comments by a machine learning and training classifier successively; then emotion analysis is carried out to the comment texts classified by a training classifier; and finally quantitative comments on each feature of the product are summarized for a user through comprehensively counting the product features related to a great deal of comment texts and corresponding emotional tendencies. The method and the system of the invention are relatively efficient, rapid, simple and convenient; the classified contents are all user concerned contents; the analysis result is provided to the user visually and clearly; and the work of viewing a great deal of comments can be removed.
Owner:FUDAN UNIV

Data mining model interpretation, optimization, and customization using statistical techniques

A system, method, and program product for interpreting, optimizing, and customizing data mining models through the use of statistical techniques that utilize diagnostic measures and statistical significance testing. A data processing system is disclosed that includes a data mining system for mining data from a data warehouse in accordance with a data model, wherein the data model defines how data groups can be partitioned; and a data group analysis system that calculates a set of diagnostic measures and performs statistical significance tests for a defined data group.
Owner:IBM CORP

Database query optimization using clustering data mining

A method and system for optimizing a database query. A database table populated with data is received and scanned. Statistics and single column histograms associated with single columns of the table are determined. Cardinality based on the statistics and histograms is estimated. All possible correlations among multiple columns are determined by performing clustering data mining that partitions data in the table into clusters. Top ranked columns based on the correlations are determined. The difference between the estimated cardinality and a support count of a cluster is determined to exceed a threshold, and in response, multiple column histograms based on the top ranked columns are determined. An optimal query plan based on the multiple column histograms is generated.
Owner:IBM CORP
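
The gap between a single-column cardinality estimate and the true count for correlated columns can be sketched as follows. This is an illustration under stated assumptions: it uses exact value frequencies in place of histograms and the actual matching row count in place of a cluster's support count, and the function names, threshold, and sample table are hypothetical.

```python
from collections import Counter


def estimated_cardinality(rows, predicate):
    """Estimate matching rows assuming the predicate columns are independent."""
    n = len(rows)
    estimate = n
    for col, value in predicate.items():
        freq = Counter(r[col] for r in rows)[value]
        estimate *= freq / n  # multiply per-column selectivities
    return estimate


def actual_count(rows, predicate):
    """Count rows that satisfy every column = value condition."""
    return sum(all(r[c] == v for c, v in predicate.items()) for r in rows)


def needs_multicolumn_histogram(rows, predicate, threshold):
    """Flag the column set when the estimate misses by more than threshold."""
    gap = abs(estimated_cardinality(rows, predicate) - actual_count(rows, predicate))
    return gap > threshold


if __name__ == "__main__":
    rows = [{"make": "Honda", "model": "Civic"}] * 50
    rows += [{"make": "Ford", "model": "Focus"}] * 50
    pred = {"make": "Honda", "model": "Civic"}
    print(estimated_cardinality(rows, pred), actual_count(rows, pred))
    print(needs_multicolumn_histogram(rows, pred, threshold=10))
```

Because `make` and `model` are perfectly correlated in this sample, the independence estimate is half the true count, which is the kind of discrepancy that would justify building a multi-column histogram.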

Mobile data traffic package recommendation algorithm based on user historical data

The invention provides a mobile data traffic package recommendation algorithm based on user historical data according to data mining analysis technology. The mobile data traffic package recommendation algorithm comprises the following steps of: 1) a target user finding period comprising the processes of a, acquiring a processed generated data set which comprises a training set and a prediction set, b, executing a random forest classification algorithm for finding a latent data traffic package improving user as a target user, and c, ending; 2), a data traffic package recommendation period comprising the process of a, acquiring a processed generated prediction set, b, executing a K-means clustering algorithm for obtaining a slightly similar user cluster, c, obtaining the target user obtained in the process 1)-b, d, executing a TopN recommendation algorithm on the target user in a same cluster according to a similarity function of the user, and e, ending. The mobile data traffic package recommendation algorithm is used for finding the latent user with a latent data traffic improvement requirement according to data mining technology and executing a recommended plan on the user. Compared with a traditional method, the mobile data traffic package recommendation algorithm has advantages of higher accuracy, higher efficiency, simple realization, low cost, etc.
Owner:NANJING UNIV

Method for selecting regression test case for clustering with semi-supervised information

The invention discloses a method for selecting a regression test case for clustering with semi-supervised information. The method comprises the following steps: recording the execution coverage information of the test case, generating a function execution profile, and representing the test case in a quantitative form; analyzing the historical test results to obtain the constraint relationship among test cases; and analyzing the test cases with a semi-supervised clustering algorithm to obtain similarities and differences of the execution conditions of the test cases, understand the relation between program behaviors and the test cases, effectively reduce the number of test cases in the regression test stage and maintain sufficiently high error detection capability. According to the invention, the program is understood according to the internal relation of the program behaviors revealed by the test cases based on the data mining technology so that the selection of the test cases is easier and more automatic, the test cases can be used more effectively in regression tests, the test case selection accuracy is promoted, and the regression test efficiency is improved.
Owner:NANJING UNIV

Non-invasive load identification algorithm based on hybrid neural network and ensemble learning

The invention belongs to the data mining and machine learning field and relates to a non-invasive load identification algorithm based on a hybrid neural network and ensemble learning. According to the method, experimental data are processed, so that the format of the data conforms to the input formats of models; after the data are processed, a hybrid neural network model is established; the data are input into the model; the model is trained and tested, identification results are obtained; and voting is performed for the results of three different models based on the idea of ensemble learning, so that a final identification result is obtained. With the method adopted, the feature extraction effect and load identification effect of the hybrid neural network are better than the effects of a traditional neural network; an ensemble learning idea-based method is provided, a plurality of feature subsets are selected from a total feature set so as to train a plurality of base classifiers, and the base classifiers are combined, and therefore, variance can be decreased, and the identification effect of the final identification result can be improved, and the problem of adverse influence of the introduction of harmonic features on an identification effect can be solved.
Owner:NORTH CHINA ELECTRIC POWER UNIV (BAODING)