33 results about "Categorical variable" patented technology

In statistics, a categorical variable is a variable that can take on one of a limited, and usually fixed, number of possible values, assigning each individual or other unit of observation to a particular group or nominal category on the basis of some qualitative property. In computer science and some branches of mathematics, categorical variables are referred to as enumerations or enumerated types. Commonly, each of the possible values of a categorical variable is referred to as a level. The probability distribution associated with a random categorical variable is called a categorical distribution.
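To make the definition above concrete, the following is a minimal sketch of drawing values from a categorical distribution using only the Python standard library. The level names, the probabilities, and the helper `sample_categorical` are assumptions made up for illustration; they do not come from any of the listed patents.

```python
import random
from collections import Counter

# Hypothetical levels and probabilities, chosen only for illustration.
levels = ["red", "green", "blue"]
probs = [0.5, 0.3, 0.2]

def sample_categorical(levels, probs, n, seed=0):
    """Draw n values from a categorical distribution over `levels`."""
    rng = random.Random(seed)  # seeded so repeated runs give the same draws
    return rng.choices(levels, weights=probs, k=n)

draws = sample_categorical(levels, probs, n=1000)
counts = Counter(draws)

# Empirical frequency of each level; for large n these approach `probs`.
freqs = {level: counts[level] / len(draws) for level in levels}
print(freqs)
```

Each draw is one of the fixed levels, and the empirical frequencies approximate the probabilities that define the distribution.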

Method, system and computer program product for visually approximating scattered data using color to represent values of a categorical variable

A method, system, and computer program product for a new data visualization tool for determining distribution weights that represent values of a categorical variable and then mapping a distinct color to each of the weights so as to visually represent the different values of the categorical variable (or data attribute) in a scatter plot. The distinct colors of a splat are based on the distribution of categorical variable values in a corresponding bin, the distribution of which is represented by a vector. The vector contains as many locations as the number of different values for the categorical variable. The value stored in each location is typically a weight or percentage for that particular value of the categorical variable. Each location in the vector is also associated with a distinct color. The coloring of a single splat with multiple colors involves the rendering of each vector by looping through each vector location, and then based on the weight stored in that location, randomly selecting the same percentage of triangles in the splat for the color associated with that vector location. A threshold is used to help reduce confusion and decrease processing time by summing all weights below the threshold and assigning to it a single neutral color. A slider or other controller can be used to vary the value of the threshold.
Owner:MORGAN STANLEY +1
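The weight vector and threshold described in the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the palette, the neutral color `gray`, the sample bin contents, and the function names `bin_weights` and `apply_threshold` are all hypothetical, and the triangle-level rendering of a splat is omitted.

```python
from collections import Counter

NEUTRAL = "gray"  # hypothetical neutral color for pooled low-weight values

def bin_weights(values):
    """Return the fraction of each categorical value within one bin."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

def apply_threshold(weights, palette, threshold):
    """Map weights to colors, pooling weights below `threshold` into NEUTRAL."""
    colored = {}
    pooled = 0.0
    for value, weight in weights.items():
        if weight < threshold:
            pooled += weight  # too small to show on its own
        else:
            colored[palette[value]] = weight
    if pooled > 0:
        colored[NEUTRAL] = pooled
    return colored

# Hypothetical palette and bin contents.
palette = {"A": "red", "B": "blue", "C": "green", "D": "orange"}
bin_values = ["A"] * 12 + ["B"] * 6 + ["C"] * 1 + ["D"] * 1

weights = bin_weights(bin_values)
colored = apply_threshold(weights, palette, threshold=0.1)
print(colored)
```

Raising the threshold pools more values into the neutral color, which corresponds to what the slider mentioned in the abstract would control.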

Method for detecting urea-doped milk based on synchronous-asynchronous two-dimensional near-infrared related spectra

The invention relates to a method for detecting urea-doped milk based on synchronous-asynchronous two-dimensional near-infrared related spectra. The method comprises the following steps: 1, preparing pure milk for experiments and urea-doped milk; 2, scanning the near-infrared spectra of the pure milk and of the urea-doped milk; 3, calculating the normalized synchronous-asynchronous two-dimensional near-infrared related spectrum matrix of the pure milk and that of the urea-doped milk; 4, building a discrimination model with a categorical variable matrix using multi-dimensional partial least squares; 5, scanning an unknown milk sample, calculating its synchronous-asynchronous two-dimensional near-infrared related spectrum matrix, and substituting the matrix into the discrimination model to determine whether urea has been added. The method makes full use of the similarity and difference information of the analyzed system as it changes under external interference, and avoids the effect on the model of the redundant information that arises when only the synchronous or only the asynchronous spectrum matrix is used. The method is simple and scientific, and its analysis efficiency and discrimination accuracy are high.
Owner:天津市浓昇农业科技有限公司

Power transformer defect information data mining method

Active | CN105843210A | Reasonable and effective maintenance strategy | Eliminate omissions | Electric testing/monitoring | Data dredging | Data set
The invention discloses a power transformer defect data mining method. The method includes the following steps: defect attribute screening is performed on the historical defect data set D0 of a power transformer to form a defect data set D1; defect attributes in D1 are filled in or deleted to reduce noise data; new attributes are constructed from the existing attributes of D1, continuous-valued attributes are discretized, and categorical attributes are stratified into reasonable levels, forming a defect data set D2; the correlation between input attributes and target attributes is calculated and uncorrelated attributes are deleted, the remaining attributes forming a defect data set D3; the association relationships between the attributes of the defect data set are calculated using the Apriori algorithm; and effective association rules are extracted, the defect factors of the power transformer are analyzed, and an association rule knowledge base is formed. With this method, power transformer defects can be mined in a multi-dimensional, multi-level manner, the association relationships between attributes can be extracted quickly and conveniently, a basis is provided for power transformer condition evaluation, and the accuracy of condition evaluation can be improved.
Owner:TSINGHUA UNIV +1
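The abstract above names the Apriori algorithm for finding associations between defect attributes. The following is a minimal, unoptimized sketch of Apriori frequent-itemset mining with a rule-confidence helper. The defect records, attribute names, and the 0.4 support threshold are invented for illustration and are not taken from the patent.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]

# Hypothetical, already-discretized defect records.
records = [
    {"age=old", "season=summer", "defect=oil_leak"},
    {"age=old", "season=summer", "defect=oil_leak"},
    {"age=old", "season=winter", "defect=oil_leak"},
    {"age=new", "season=summer", "defect=overheating"},
    {"age=new", "season=winter", "defect=none"},
]

frequent = apriori(records, min_support=0.4)
rule_conf = confidence(frequent, {"age=old"}, {"defect=oil_leak"})
print(rule_conf)
```

In this toy data every record with `age=old` also has `defect=oil_leak`, so the rule has high confidence. A real knowledge base would also filter rules by a minimum confidence, which this sketch leaves out.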

Small sample PolSAR image classification method based on fuzzy label semantic prior

Active | CN110096994A | Ensure consistency | Avoid the problem of increased calculation | Scene recognition | Neural architectures | Small sample | Algorithm
The invention discloses a small-sample PolSAR image classification method based on a fuzzy label semantic prior. The method comprises: preparing a PolSAR image to be classified; obtaining real-valued polarization characteristics as input data of the network; obtaining a sampling matrix that records the positions of the training samples and a sampling label matrix that records the pixel label information at the corresponding positions; using the sampling label matrix to initialize the classification matrix, and building a fully convolutional network (FCN); feeding the real-valued input data, the sampling matrix, the sampling label matrix and the classification matrix to the FCN for training; updating the classification matrix using the prediction result of the FCN, the sampling matrix, the sampling label matrix and the current state of the classification matrix; repeating the operation until the maximum number of iterations is reached; outputting the final classification matrix; and calculating the classification accuracy and a classification result map to complete image classification. According to the invention, the deep fully convolutional network parameters and the label category variables are trained by alternating iteration, which addresses the low PolSAR classification precision obtained with small samples.
Owner:XIDIAN UNIV

Nominal attribute-based continuous type feature construction method

Inactive | CN106897776A | Strong scalability | Suitable for parallelization | Machine learning | Feature extraction | Data set
The invention discloses a nominal attribute-based continuous type feature construction method. The method comprises the following steps: 1) performing data preprocessing; 2) setting a feature construction framework according to business background knowledge; 3) generating a concrete feature construction path; 4) constructing the corresponding features according to the feature construction path and generating a training set; 5) performing feature selection on the training set and constructing a prediction model; 6) saving the relevant data set and the prediction model and terminating the off-line training process; 7) performing preprocessing and feature extraction on the sample data that requires on-line prediction; and 8) using the prediction model obtained through off-line training to predict a sample. The method can be applied not only to a user-item scenario but also to more general classification and regression prediction problems with nominal attributes or categorical variable features. Compared with traditional One-Hot and Dummy coding, the features generated by the method make the differences between samples more obvious and have strong interpretability.
Owner:SOUTH CHINA UNIV OF TECH
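For reference, the two traditional encodings that the abstract above compares against can be sketched as below. This illustrates only One-Hot and Dummy coding, not the patented feature construction method. The sample values and the helper names `one_hot` and `dummy` are assumptions for illustration.

```python
def one_hot(values, levels=None):
    """Encode each value as a 0/1 vector with one column per level."""
    levels = sorted(set(values)) if levels is None else list(levels)
    rows = [[int(v == level) for level in levels] for v in values]
    return levels, rows

def dummy(values, levels=None):
    """Like one-hot, but drop the first level, which becomes the baseline."""
    levels, rows = one_hot(values, levels)
    return levels[1:], [row[1:] for row in rows]

# Hypothetical nominal attribute.
colors = ["red", "green", "blue", "green"]

oh_levels, oh_rows = one_hot(colors)
dm_levels, dm_rows = dummy(colors)
print(oh_levels, oh_rows)
print(dm_levels, dm_rows)
```

With k levels, one-hot coding produces k indicator columns and dummy coding produces k - 1, with the dropped level represented by an all-zero row.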

Unordered categorical variable processing method and device

The invention provides an unordered categorical variable processing method and device. The method comprises the following steps: obtaining an unordered categorical variable set, wherein the set comprises at least two categories of unordered categorical variables and the corresponding dependent variables are binary; for each category in the set, computing the categorical proportion, that is, the proportion of the unordered categorical variables in that category whose dependent variable value equals a target value of the binary variable; and clustering the unordered categorical variable set on the basis of the categorical proportion of each category to obtain a plurality of unordered categorical variable subsets, wherein each subset comprises at least one category of unordered categorical variables and corresponds to an ordered categorical variable. According to the method and device, grouping can be performed without relying on human experience, so that grouping is relatively efficient and the objectivity and correctness of the grouping results are improved.
Owner:GUOXIN YOUE DATA CO LTD
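The idea in the abstract above, grouping unordered categories by the share of observations that hit the target value, can be sketched as follows. The abstract does not say which clustering procedure is used, so the gap-based grouping here is a simple stand-in and an assumption. The data, the `max_gap` parameter, and the function names are hypothetical as well.

```python
from collections import defaultdict

def target_rates(categories, targets, positive=1):
    """Proportion of observations in each category whose target equals `positive`."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for category, y in zip(categories, targets):
        totals[category] += 1
        hits[category] += int(y == positive)
    return {category: hits[category] / totals[category] for category in totals}

def group_by_rate(rates, max_gap):
    """Sort categories by rate and start a new group when adjacent rates
    differ by more than `max_gap` (a stand-in for a real clustering step)."""
    ordered = sorted(rates, key=rates.get)
    groups = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if rates[cur] - rates[prev] > max_gap:
            groups.append([cur])
        else:
            groups[-1].append(cur)
    return groups

# Hypothetical unordered categories with a binary dependent variable.
categories = ["A"] * 4 + ["B"] * 4 + ["C"] * 4 + ["D"] * 4
targets = [1, 0, 0, 0,  0, 1, 0, 0,  1, 1, 1, 0,  1, 1, 1, 1]

rates = target_rates(categories, targets)
groups = group_by_rate(rates, max_gap=0.3)

# Each group becomes one level of an ordered categorical variable.
ordinal_code = {cat: idx for idx, group in enumerate(groups) for cat in group}
print(rates, groups, ordinal_code)
```

Because the groups are sorted by target proportion, their indices have a natural order, which is what lets each subset stand in for a level of an ordered categorical variable.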

Method and system for optimizing operator's mobile service resources

The invention discloses a method and a system for optimizing mobile service resources of an operator. The method includes: collecting the historical dialing data of the operator's customers, the dialing data being a continuous variable; converting the continuous variable into a discrete characteristic variable through chi-square analysis; taking whether the customer has subscribed to a mobile service as a binary classification variable, and building a C4.5 decision tree model of the characteristic variable and the classification variable, wherein, in the decision tree model, the information gain ratio corresponding to each segmentation is calculated and the segmentation threshold with the largest information gain ratio is selected as the optimal segmentation threshold for the attribute; calculating the value of the classification variable according to the decision tree model to obtain a prediction of whether the customer subscribes to the mobile service; and optimizing the operator's mobile service resources according to the prediction result. With the technical solution provided by the invention, customers' demand for mobile services can be obtained efficiently from their current dialing behavior, so as to optimize the deployment of the operator's mobile service resources.
Owner:CHINA TELECOM CORP LTD
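The threshold selection step in the abstract above, picking the split with the largest information gain ratio, can be sketched as follows. This covers only the gain-ratio criterion for a single numeric attribute and a binary label. It is not a full C4.5 implementation, it skips the chi-square discretization step, and the call counts and labels are invented for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels, threshold):
    """Information gain ratio of splitting at `values <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # degenerate split carries no information
    n = len(labels)
    p_left, p_right = len(left) / n, len(right) / n
    gain = entropy(labels) - p_left * entropy(left) - p_right * entropy(right)
    split_info = -(p_left * math.log2(p_left) + p_right * math.log2(p_right))
    return gain / split_info

def best_threshold(values, labels):
    """Try midpoints between adjacent distinct values; keep the best gain ratio."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(candidates, key=lambda t: gain_ratio(values, labels, t))

# Hypothetical monthly call counts and whether the customer subscribed (1) or not (0).
calls = [2, 3, 5, 20, 25, 30]
subscribed = [0, 0, 0, 1, 1, 1]

threshold = best_threshold(calls, subscribed)
print(threshold, gain_ratio(calls, subscribed, threshold))
```

Dividing the gain by the split information penalizes splits that carve off very small partitions, which is the main difference between the gain ratio and plain information gain.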

Standardizing and abstraction system of records measured by a plurality of physical quantities' measuring devices

Processing system for standardization and abstraction of registers measured by measuring devices (1), which comprise processing means of measured registers (5) received for generating processed registers (5a) storable in a storage database (6) of processed registers, characteristics schemes (7a) of separate models of measuring device (1), which comprise at least one module (9) and at least one submodule (10) assigned to the said module (9) on the basis of its functioning mode (17a), each module (9) being allotted to at least one memory position of a measuring device model (1), associated with a single measuring point and assigned to at least one category map (8a) in which a submodule (10) is related to a category variable, with assignment tables (16) of category variables, mapping means (11) with assignment means (12) and transformation means (13) provided for transforming values of each measured register (1a) into values of processed registers (5a), expressed in the pre-established equivalent unit of measurement assigned to the corresponding submodule (10). The assignment means (12) are designed for assigning to each measured register (1a) read in the memory position (18a) corresponding to a submodule (10), the category variable with which the submodule is related in the assignment table (16).
Owner:IBERIA TECH INTEGRATED SOLUTIONS S L U

Data Mining Method for Power Transformer Defect Information

The invention discloses a power transformer defect data mining method. The method includes the following steps: defect attribute screening is performed on the historical defect data set D0 of a power transformer to form a defect data set D1; defect attributes in D1 are filled in or deleted to reduce noise data; new attributes are constructed from the existing attributes of D1, continuous-valued attributes are discretized, and categorical attributes are stratified into reasonable levels, forming a defect data set D2; the correlation between input attributes and target attributes is calculated and uncorrelated attributes are deleted, the remaining attributes forming a defect data set D3; the association relationships between the attributes of the defect data set are calculated using the Apriori algorithm; and effective association rules are extracted, the defect factors of the power transformer are analyzed, and an association rule knowledge base is formed. With this method, power transformer defects can be mined in a multi-dimensional, multi-level manner, the association relationships between attributes can be extracted quickly and conveniently, a basis is provided for power transformer condition evaluation, and the accuracy of condition evaluation can be improved.
Owner:TSINGHUA UNIV +1

A real-time prediction method for urban road traffic accident risk

Inactive | CN104732075B | In line with traffic characteristics | Improve accuracy | Forecasting | Traffic accident | Traffic flow
The invention provides a real-time prediction method for urban road traffic accident risk. For each observation object in the observation set, the method extracts the geometric linear data, the basic historical traffic flow data for the n minutes before a traffic accident occurred, and the historical weather condition data, and from these calculates the traffic flow characteristic parameters for the n minutes before the accident. The traffic flow characteristic parameters and the weather condition data are converted into levels of categorical variables, together with the distribution probability of each level. A real-time prediction model of urban road traffic accidents based on the Poisson distribution is then established, and the determined levels and their distribution probabilities are used to calibrate the model. To predict the traffic accident risk of a target object, it is only necessary to calculate that object's traffic flow characteristic parameters and weather conditions in real time, convert them into categorical variable levels with their distribution probabilities, and apply the calibrated formula.
Owner:SUN YAT SEN UNIV

Method for detection of milk mixed with urea based on synchronous-asynchronous two-dimensional near-infrared correlation spectroscopy

The invention relates to a method for detecting urea-doped milk based on synchronous-asynchronous two-dimensional near-infrared related spectra. The method comprises the following steps: 1, preparing pure milk for experiments and urea-doped milk; 2, scanning the near-infrared spectra of the pure milk and of the urea-doped milk; 3, calculating the normalized synchronous-asynchronous two-dimensional near-infrared related spectrum matrix of the pure milk and that of the urea-doped milk; 4, building a discrimination model with a categorical variable matrix using multi-dimensional partial least squares; 5, scanning an unknown milk sample, calculating its synchronous-asynchronous two-dimensional near-infrared related spectrum matrix, and substituting the matrix into the discrimination model to determine whether urea has been added. The method makes full use of the similarity and difference information of the analyzed system as it changes under external interference, and avoids the effect on the model of the redundant information that arises when only the synchronous or only the asynchronous spectrum matrix is used. The method is simple and scientific, and its analysis efficiency and discrimination accuracy are high.
Owner:天津市浓昇农业科技有限公司

Method and apparatus for high-dimensional market segmentation based on canonical correlation

The invention relates to a market segmentation method, in particular to a high-dimensional market segmentation method and device based on canonical correlation. The method comprises the following steps: S1, establishing a data file of the original variables from a sample; S2, selecting a plurality of index variable groups from the original variables in S1, performing nonlinear canonical correlation analysis, and determining the dimensions of the sample and the canonical correlation coefficient of each dimension, and, when the determined canonical correlation coefficient is greater than a set threshold, obtaining the object score of the sample on each dimension according to the canonical correlation coefficient; S3, classifying the customer group by cluster analysis according to the object scores of the sample; S4, describing the market segmentation result according to the classification result. The invention is a segmentation method based on the relationships among a plurality of index variable groups. It is suitable for market segmentation with multi-dimensional variables, and it quantifies category variables and reduces the dimension of the data. The invention has a wide application range, and its segmentation results are highly reliable.
Owner:云图元睿(上海)科技有限公司