Data analysis method, device, equipment and storage medium
Preprocessing, cluster analysis, and regression tree construction are applied to natural gas sales data to address the large errors caused by manual Excel analysis, enabling more accurate data analysis and enterprise decision support.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- RICHFIT INFORMATION TECH
- Filing Date
- 2024-12-17
- Publication Date
- 2026-06-19
AI Technical Summary
The existing technology of manually analyzing natural gas sales data using Excel spreadsheets results in large errors and poor analysis results.
Data mining algorithms are used to preprocess, cluster, and construct regression trees for natural gas sales data, generating a sales data prediction model and displaying it visually.
It improves the accuracy of natural gas sales data analysis and enhances corporate decision support capabilities, helping companies formulate more reasonable market strategies and marketing tactics.
Figure CN122241404A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of data processing technology, and in particular to a data analysis method, apparatus, device, and storage medium.
Background Technology
[0002] Natural gas is a widely used energy source.
[0003] Currently, in the process of evaluating natural gas sales, the comprehensive evaluation and analysis of natural gas sales data is achieved by manually analyzing Excel spreadsheets. However, manual statistical analysis is prone to large errors, resulting in poor data analysis results for natural gas.
Summary of the Invention
[0004] This application provides a data analysis method, apparatus, device, and storage medium, which can improve the analysis effect of natural gas sales data. The technical solution provided by this application is as follows:
[0005] According to one aspect of the embodiments of this application, a data analysis method is provided, the method comprising:
[0006] Obtain raw data related to natural gas sales;
[0007] The original data is preprocessed to obtain preprocessed data;
[0008] Data mining algorithms are used to analyze the preprocessed data to obtain data analysis results;
[0009] The data analysis results are post-processed to obtain post-processed analysis data, which provides data support for enterprise decision-making.
[0010] In some embodiments, the raw data is categorized by region and type.
[0011] In some embodiments, preprocessing the original data to obtain preprocessed data includes:
[0012] The original data is normalized to obtain the preprocessed data;
[0013] wherein, for data $X_i$ in the original data, the preprocessing formula is:
[0014]
[0015] where $X_i'$ is the data obtained by preprocessing $X_i$, and $\mu$ is the average value of the data in the dimension corresponding to $X_i$.
[0016] In some embodiments, the data mining algorithm employed includes cluster analysis, which includes:
[0017] Based on the natural gas sales scenario, the number of clusters for the preprocessed data is determined to be k;
[0018] Based on the cluster centers of the k clusters of the preprocessed data, calculate the membership degree of each data point of the preprocessed data to its respective cluster;
[0019] Based on the membership degree of each data point in the preprocessed data to its respective cluster, the clusters to which each data point in the preprocessed data belongs are adjusted until the clustering is completed, and the clustering analysis results are obtained. The clustering results include k clusters that have been clustered, and each cluster includes at least one data point.
[0020] In some embodiments, the membership degree is calculated using the following formula:
[0021]
[0022] where $d(x_i, c_k)$ is the distance between data point $x_i$ and cluster center $c_k$, and $\sigma$ is the standard deviation parameter.
[0023] In some embodiments, the distance between the data point and the corresponding cluster center is obtained by calculating the probability that the data point belongs to each Gaussian distribution.
[0024] In some embodiments, the probability of a data point belonging to a Gaussian distribution is calculated using the following formula:
[0025]
[0026] where $x_i$ is a data point, $c_k$ is the cluster center of the cluster to which $x_i$ belongs, and $\Sigma_k$ is the weight of the corresponding parameter.
[0027] In some embodiments, after the cluster analysis is completed, the method further includes:
[0028] From the k clusters, select the data in the first cluster as a subset of data;
[0029] Construct a regression tree based on the aforementioned subset of data;
[0030] Based on the regression tree, a predictive model for the sales data of the natural gas is generated.
[0031] In some embodiments, the method further includes:
[0032] In response to a query operation on natural gas sales-related data, data corresponding to the query operation is obtained from the relevant data, which includes the raw data and the post-processed analytical data.
[0033] The data corresponding to the query operation in the relevant data will be visualized.
[0034] In some embodiments, the query operation refers to a query operation based on a first filtering condition;
[0035] The step of visualizing the data corresponding to the query operation from the relevant data includes:
[0036] The data that meets the first filtering criteria from the relevant data will be visualized.
[0037] According to one aspect of the embodiments of this application, a data analysis apparatus is provided, the apparatus comprising:
[0038] The data acquisition module is used to acquire raw data related to natural gas sales;
[0039] The preprocessing module is used to preprocess the original data to obtain preprocessed data;
[0040] The data analysis module is used to analyze the preprocessed data using data mining algorithms to obtain data analysis results;
[0041] The post-processing module is used to perform data post-processing on the data analysis results to obtain post-processed analysis data, which provides data support for enterprise decision-making.
[0042] In some embodiments, the raw data is categorized by region and type.
[0043] In some embodiments, the preprocessing module is configured to:
[0044] The original data is normalized to obtain the preprocessed data;
[0045] wherein, for data $X_i$ in the original data, the preprocessing formula is:
[0046]
[0047] where $X_i'$ is the data obtained by preprocessing $X_i$, and $\mu$ is the average value of the data in the dimension corresponding to $X_i$.
[0048] In some embodiments, the data analysis module includes:
[0049] The quantity determination submodule is used to determine the number of clusters of the preprocessed data as k based on the natural gas sales scenario;
[0050] The membership calculation submodule is used to calculate the membership degree of each data point in the preprocessed data to its respective cluster based on the cluster centers of the k clusters of the preprocessed data.
[0051] The clustering submodule is used to adjust the cluster to which each data point of the preprocessed data belongs based on the membership degree of each data point to its respective cluster, until the clustering is completed, and obtain the clustering analysis result. The clustering result includes k clusters that have been clustered, and each cluster includes at least one data point.
[0052] In some embodiments, the membership degree is calculated using the following formula:
[0053]
[0054] where $d(x_i, c_k)$ is the distance between data point $x_i$ and cluster center $c_k$, and $\sigma$ is the standard deviation parameter.
[0055] In some embodiments, the distance between the data point and the corresponding cluster center is obtained by calculating the probability that the data point belongs to each Gaussian distribution.
[0056] In some embodiments, the probability of a data point belonging to a Gaussian distribution is calculated using the following formula:
[0057]
[0058] where $x_i$ is a data point, $c_k$ is the cluster center of the cluster to which $x_i$ belongs, and $\Sigma_k$ is the weight of the corresponding parameter.
[0059] In some embodiments, the apparatus further includes:
[0060] The data selection module is used to select data from the first cluster as a data subset from the k clusters;
[0061] A regression tree construction module is used to construct a regression tree based on the data subset;
[0062] The prediction module is used to generate a prediction model for the sales data of the natural gas based on the regression tree.
[0063] In some embodiments, the apparatus further includes:
[0064] The data acquisition module is also used to respond to a query operation on natural gas sales-related data, and to acquire data corresponding to the query operation from the related data, wherein the related data includes the raw data and the post-processed analysis data;
[0065] The visualization module is used to visualize the data in the relevant data that corresponds to the query operation.
[0066] In some embodiments, the query operation refers to a query operation based on a first filter condition; the visualization module is used for:
[0067] The data that meets the first filtering criteria from the relevant data will be visualized.
[0068] According to one aspect of the embodiments of this application, a computer device is provided, the computer device including a processor and a memory, the memory storing a computer program, the computer program being loaded and executed by the processor to implement the above-described data analysis method.
[0069] According to one aspect of the embodiments of this application, a computer-readable storage medium is provided, wherein a computer program is stored in the computer-readable storage medium, the computer program being loaded and executed by a processor to implement the above-described data analysis method.
[0070] According to one aspect of the embodiments of this application, a computer program product is provided, which is loaded and executed by a processor to implement the above-described data analysis method.
[0071] The technical solutions provided in this application embodiment may have the following beneficial effects:
[0072] By processing raw natural gas data and analyzing it using data mining algorithms, an automated analysis system for natural gas sales data can be built. This system provides in-depth understanding of market trends, customer needs, competitor dynamics, and other information, offering data support for corporate decision-making and helping to formulate more reasonable and effective decisions, thereby improving the effectiveness of natural gas sales data analysis.
[0073] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and do not limit this application.
Attached Figure Description
[0074] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0075] Figure 1 is a flowchart of a data analysis method provided in one embodiment of this application;
[0076] Figure 2 is a block diagram of a data analysis apparatus provided in one embodiment of this application;
[0077] Figure 3 is a block diagram of a computer device provided in one embodiment of this application.
Detailed Implementation
[0078] Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description relates to the drawings, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with this application. Rather, they are merely examples of methods consistent with some aspects of this application as detailed in the appended claims.
[0079] The method provided in this application can be executed by a computer device, which refers to an electronic device with data computing, processing, and storage capabilities. This computer device can be a terminal such as a PC (Personal Computer), tablet computer, smartphone, wearable device, or intelligent robot; or it can be a server. The server can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing cloud computing services.
[0080] This application provides a data analysis method, which can also be called a method for building a natural gas sales data analysis system. Based on the data records filled in and obtained through the interface, it analyzes the characteristics of sales business and establishes a sales operation evaluation system from the dimensions of planning, daily specification, and metering business.
[0081] The technical solution of this application will be described and illustrated below through several embodiments.
[0082] Please refer to Figure 1, which illustrates a flowchart of a data analysis method provided in one embodiment of this application. In this embodiment, the method is primarily illustrated by its application to the computer device described above. The method may include at least one of the following steps (110-140).
[0083] Step 110: Obtain raw data related to natural gas sales.
[0084] Natural gas is a combustible gas found in nature and is a fossil fuel. It is stored in porous underground rock formations and includes oilfield gas, gas field gas, coalbed methane, mud volcano gas, and biogenic gas; small amounts are also found in coal seams. Natural gas is a high-quality fuel and chemical feedstock.
[0085] In some embodiments, raw data related to natural gas sales can be collected during historical sales processes. In some embodiments, the raw data is categorized by region and type. In some embodiments, regions are divided into North, South, East, West, etc.; natural gas types can include pipeline gas, CNG (Compressed Natural Gas), LNG (Liquefied Natural Gas), and pipeline-transported gas. By categorizing the raw data by region and type, the corresponding region and type of different data can be clearly identified, making the data analysis results more closely matched to the corresponding region and type, thereby improving the accuracy of the data analysis.
[0086] In some embodiments, the raw data may also include at least one of the following: geographic information, natural gas sales volume, market share, natural gas recoverable reserves, natural gas extraction ratio, natural gas supply ratio, natural gas injection volume, natural gas pipeline inventory, natural gas pipeline self-consumption, pipeline length, natural gas to pipeline capital expenditure ratio, and natural gas price data.
[0087] In some embodiments, raw data is extracted from a large database based on sales operations and represented in the form of concepts, rules, patterns, etc. Since the data sources are diverse, including manually imported Excel tables and automatically connected data, a separate interface needs to be designed for each data source to implement data import.
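The per-source import interfaces described here could be organized as a small registry, sketched below in Python. The source-type names (`csv_export`, `api_json`), the `records` key, and all function names are hypothetical and chosen only for illustration. Reading real Excel workbooks would need a third-party reader, so the sketch assumes the spreadsheet has been exported as CSV text.

```python
import csv
import io
import json
from typing import Callable

# Registry mapping a (hypothetical) source-type name to its import function.
IMPORTERS: dict[str, Callable[[str], list[dict]]] = {}

def register(source_type: str):
    """Decorator that registers an import function for one data source."""
    def wrap(fn):
        IMPORTERS[source_type] = fn
        return fn
    return wrap

@register("csv_export")
def import_csv_text(text: str) -> list[dict]:
    # Manually exported spreadsheet saved as CSV text; values stay strings.
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]

@register("api_json")
def import_json_text(text: str) -> list[dict]:
    # Automatically connected source assumed to return {"records": [...]}.
    return json.loads(text)["records"]

def import_data(source_type: str, payload: str) -> list[dict]:
    """Dispatch to the interface registered for this data source."""
    return IMPORTERS[source_type](payload)
```

Adding a new data source then only requires registering one more function, without touching the dispatch code.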
[0088] Step 120: Preprocess the original data to obtain preprocessed data.
[0089] In some embodiments, a natural gas sales data processing algorithm is used to preprocess the raw data. Because the data sources are diverse, some of the data in reality is dirty and cannot be directly analyzed. Data cleaning (i.e., preprocessing) is required to remove, correct, fill in, and normalize data that may contain numerous null values, incorrect character encoding methods, or duplicates, thereby obtaining preprocessed data.
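The cleaning operations named here (removing duplicates and filling null values) can be illustrated with a minimal Python sketch. The record layout, the field names, and the choice of filling nulls with the field mean are assumptions made for the example; the application does not specify them.

```python
def clean_records(records: list[dict], numeric_field: str) -> list[dict]:
    """Drop exact duplicate rows and fill missing numeric values with the mean."""
    # 1. Remove exact duplicates, keeping the first occurrence of each row.
    seen = set()
    unique = []
    for row in records:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))  # copy so the input is not modified
    # 2. Fill null values with the mean of the observed values (assumed policy).
    observed = [r[numeric_field] for r in unique if r.get(numeric_field) is not None]
    fill = sum(observed) / len(observed) if observed else 0.0
    for r in unique:
        if r.get(numeric_field) is None:
            r[numeric_field] = fill
    return unique
```

Correcting character encodings and invalid values would follow the same pattern, with one pass per rule.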
[0090] In some embodiments, preprocessing the original data to obtain preprocessed data includes: normalizing the original data to obtain the preprocessed data, wherein, for data $X_i$ in the original data, the preprocessing formula is:
[0091]
[0092] where $X_i'$ is the data obtained by preprocessing $X_i$, and $\mu$ is the average value of the data in the dimension corresponding to $X_i$.
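A minimal sketch of the normalization step follows. Because the text defines only the per-dimension mean, the sketch assumes mean-centering by default; dividing by the standard deviation is offered as an optional variant and is an assumption, not something stated in the application. The function name and parameters are hypothetical.

```python
import statistics

def normalize(values: list[float], scale: bool = False) -> list[float]:
    """Center one dimension of the data on its mean; optionally scale it."""
    mu = statistics.fmean(values)          # average value of this dimension
    centered = [x - mu for x in values]
    if not scale:
        return centered
    sigma = statistics.pstdev(values)      # population standard deviation
    if sigma == 0:
        return centered                    # constant column: nothing to scale
    return [c / sigma for c in centered]
```

Each dimension (for example sales volume or price) would be normalized separately so that features on different scales contribute comparably to the distance calculations used in clustering.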
[0093] In some embodiments, preprocessing may include processes such as data feature selection, data standardization, and data subset generation.
[0094] Step 130: Use data mining algorithms to analyze the preprocessed data and obtain the data analysis results.
[0095] In some embodiments, step 130 may also involve creating a model application for the data and forecasting future sales data for natural gas in order to find a more reasonable sales approach.
[0096] Step 140: Perform post-processing on the data analysis results to obtain post-processed analysis data, which provides data support for enterprise decision-making.
[0097] In summary, the technical solution provided in this application, by processing the raw data of natural gas and analyzing the data using data mining algorithms, can gain in-depth understanding of market trends, customer needs, competitor dynamics, and other information, providing data support for enterprise decision-making and helping to formulate more reasonable and effective decisions. This establishes an automated analysis system for natural gas sales data and improves the analysis effect of natural gas sales data.
[0098] In some possible implementations, the method may also include the following steps:
[0099] 1. In response to query operations related to natural gas sales data, retrieve the data corresponding to the query operation from the relevant data, including raw data and post-processed analytical data;
[0100] 2. Visualize the data corresponding to the query operation from the relevant data.
[0101] In some embodiments, the relevant data may also be post-processed analytical data.
[0102] In some embodiments, the post-processed analytical data can be presented to decision-makers through visualization or other means to assist in decision-making, thereby making the analytical results (including the post-processed analytical data) more intuitive and easier to understand.
[0103] In some embodiments, data post-processing includes querying and displaying raw data, sales analysis results, and daily specified forecasts based on charts. The data display can support joint queries based on date, sales type, region, and province, and can also be categorized by daily and monthly data.
[0104] In some embodiments, a query operation refers to a query operation based on a first filtering condition; visualizing the data in the relevant data that corresponds to the query operation includes: visualizing the data in the relevant data that meets the first filtering condition.
[0105] In some embodiments, the relevant data can be filtered using different criteria to select the data that the user wants to view. In some embodiments, the filtering criteria may include at least one of date, sales type, region, and province. In some embodiments, the relevant data may be categorized by daily or monthly data.
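The filtering and daily or monthly categorization described above could look like the following Python sketch. The record fields (`date`, `region`, `sales_type`, `volume`) and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date

def filter_records(records, start=None, end=None, **equals):
    """Keep records dated within [start, end] that match every key in `equals`."""
    kept = []
    for r in records:
        if start is not None and r["date"] < start:
            continue
        if end is not None and r["date"] > end:
            continue
        if all(r.get(k) == v for k, v in equals.items()):
            kept.append(r)
    return kept

def monthly_totals(records, field="volume"):
    """Roll daily records up into {(year, month): total of `field`}."""
    totals = defaultdict(float)
    for r in records:
        totals[(r["date"].year, r["date"].month)] += r[field]
    return dict(totals)
```

A joint query then combines several conditions in one call, for example `filter_records(records, start=date(2024, 1, 1), region="North", sales_type="LNG")`, and the result can be passed to `monthly_totals` before charting.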
[0106] In some embodiments, the relevant data can be visualized using various statistical charts, tables, etc.
[0107] In the above implementation method, query operations and visualization displays make it easier for users to query and understand the data they need, thereby improving the convenience of users querying, understanding and analyzing data.
[0108] In some possible implementations, data mining algorithms are employed, including cluster analysis, which comprises the following steps:
[0109] 1. Based on the natural gas sales scenario, determine the number of clusters for the preprocessed data as k;
[0110] 2. Based on the cluster centers of the k clusters of the preprocessed data, calculate the membership degree of each data point in the preprocessed data to its respective cluster;
[0111] 3. Based on the membership degree of each data point in the preprocessed data to its respective cluster, adjust the cluster to which each data point in the preprocessed data belongs until the clustering is completed, and obtain the clustering analysis results. The clustering results include k clusters that have been clustered, and each cluster includes at least one data point.
[0112] In some embodiments, the formula for calculating membership degree is:
[0113]
[0114] where $d(x_i, c_k)$ is the distance between data point $x_i$ and cluster center $c_k$, and $\sigma$ is the standard deviation parameter.
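One membership function that depends only on a distance and a standard deviation parameter is the Gaussian kernel. The Python sketch below assumes that form, together with Euclidean distance and normalization across clusters; none of these choices is confirmed by the text, and the function name is hypothetical.

```python
import math

def memberships(point, centers, sigma=1.0):
    """Membership of one data point in each cluster, normalized to sum to 1."""
    weights = []
    for c in centers:
        d = math.dist(point, c)  # Euclidean distance between point and center
        weights.append(math.exp(-(d * d) / (2.0 * sigma * sigma)))
    total = sum(weights)
    if total == 0.0:
        # Point is extremely far from every center: fall back to equal shares.
        return [1.0 / len(centers)] * len(centers)
    return [w / total for w in weights]
```

A smaller `sigma` makes memberships sharper (closer to a hard assignment), while a larger `sigma` spreads each point more evenly across clusters.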
[0115] In some embodiments, the distance between a data point and its corresponding cluster center is obtained by calculating the probability that the data point belongs to each Gaussian distribution.
[0116] In some embodiments, the probability of a data point belonging to a Gaussian distribution is calculated using the following formula:
[0117]
[0118] where $x_i$ is a data point, $c_k$ is the cluster center of the cluster to which $x_i$ belongs, and $\Sigma_k$ is the weight of the corresponding parameter.
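A Gaussian density of the kind referred to here can be sketched as follows. The sketch assumes an axis-aligned Gaussian in which each dimension has its own variance. How $\Sigma_k$ enters the source formula cannot be recovered from the text, so the `variances` argument is only a stand-in, and the function name is hypothetical.

```python
import math

def gaussian_density(x, center, variances):
    """Density of point x under an axis-aligned Gaussian centered at `center`."""
    log_p = 0.0
    for xi, ci, var in zip(x, center, variances):
        # Sum per-dimension log densities to avoid underflow in many dimensions.
        log_p += -0.5 * math.log(2.0 * math.pi * var) - (xi - ci) ** 2 / (2.0 * var)
    return math.exp(log_p)
```

Points near the cluster center receive a higher density than distant ones, which is what allows the density to serve as a similarity measure between a point and a cluster.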
[0119] In some embodiments, the cluster center is calculated based on the weighted average of all data points in the corresponding cluster, as follows:
[0120]
[0121] Where N is the total number of data points in the cluster.
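The iterative procedure (compute memberships, update centers, repeat until clustering is complete) can be sketched in Python as below. The sketch assumes a Gaussian-kernel membership, uses the memberships as the weights in the weighted average, and sums over all data points rather than only those currently assigned to the cluster; each of these is an assumption. The stopping tolerance and function name are likewise hypothetical.

```python
import math

def soft_cluster(points, centers, sigma=1.0, tol=1e-9, max_iter=100):
    """Alternate membership and center updates until the centers stop moving."""
    centers = [tuple(c) for c in centers]
    k, dim = len(centers), len(points[0])
    for _ in range(max_iter):
        # Membership of every point in every cluster (Gaussian kernel, assumed).
        u = []
        for p in points:
            w = [math.exp(-math.dist(p, c) ** 2 / (2.0 * sigma ** 2)) for c in centers]
            s = sum(w)
            u.append([wi / s for wi in w] if s > 0 else [1.0 / k] * k)
        # New center of cluster j: membership-weighted average of the points.
        new_centers = []
        for j in range(k):
            total = sum(row[j] for row in u)
            if total == 0.0:
                new_centers.append(centers[j])  # keep a center no point supports
                continue
            new_centers.append(tuple(
                sum(row[j] * p[d] for row, p in zip(u, points)) / total
                for d in range(dim)))
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    # Final hard assignment: each point goes to its nearest center.
    labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
    return centers, labels
```

The number of clusters is fixed by the length of the initial `centers` list, which corresponds to choosing k from the sales scenario before clustering starts.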
[0122] In some embodiments, after the cluster analysis is completed, the method further includes:
[0123] From k clusters, select the data in the first cluster as a subset of data;
[0124] Constructing a regression tree based on a subset of data;
[0125] A predictive model for natural gas sales data is generated based on regression trees.
[0126] In some embodiments, after completing cluster analysis, a subset of data from a single cluster is selected, and a regression tree is constructed using this subset. The generation method may include the following steps:
[0127] 1. After completing the cluster analysis, select the data from a single cluster to generate a subset of data, and use all the data as an initial node;
[0128] 2. Traverse all features and calculate the split index for each feature at each candidate split point;
[0129] 3. Recursively split to form leaf nodes;
[0130] 4. Adjust the tree depth and hyperparameters to optimize the performance of the sales data prediction model;
[0131] 5. For each cluster (i.e., the regression tree), calculate the MSE, RMSE, and MAE evaluation metrics, calculate the importance score of the features on each cluster, and compare them to improve the interpretability of the sales data prediction model.
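Steps 1 to 4 can be illustrated with a compact Python sketch. It assumes that the split index is the summed squared error of the two child nodes and that candidate split points are midpoints between adjacent distinct feature values; the application does not name the index it uses. `max_depth` and `min_samples` stand in for the hyperparameters mentioned in step 4, and all names are hypothetical.

```python
def sse(ys):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def best_split(X, y):
    """Traverse every feature and every candidate split point."""
    best = None  # (feature index, threshold, score)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[2]:
                best = (f, t, score)
    return best

def build_tree(X, y, depth=0, max_depth=3, min_samples=2):
    """Recursively split until a stopping rule turns the node into a leaf."""
    split = None
    if depth < max_depth and len(y) >= min_samples:
        split = best_split(X, y)
    if split is None:
        return {"value": sum(y) / len(y)}  # leaf: mean target of its samples
    f, t, _ = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([X[i] for i in left], [y[i] for i in left],
                           depth + 1, max_depth, min_samples),
        "right": build_tree([X[i] for i in right], [y[i] for i in right],
                            depth + 1, max_depth, min_samples),
    }

def predict(tree, row):
    """Follow the splits from the root down to a leaf and return its value."""
    while "value" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["value"]
```

Calling `build_tree` once per cluster subset yields one tree per cluster, whose errors and feature importances can then be compared as described in step 5.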
[0132] In some embodiments, the sales data prediction model can be a neural network model, which predicts and analyzes the future sales methods and sales effects of natural gas after learning from the raw data.
[0133] After completing the cluster analysis, a subset of data from a single cluster is selected to generate a predictive regression tree. Based on historical data, the calculation method, path, number of leaf nodes, and weights of the regression tree are adjusted. The sales data prediction model based on the regression tree is as follows:
[0134] $$f(x) = \sum_{m=1}^{M} \hat{c}_m \, I(x \in R_m)$$
[0135] where $M$ is the number of leaf nodes, $\hat{c}_m$ is the predicted value of the $m$-th leaf node, and $I(x \in R_m)$ is an indicator function.
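The model just described assigns each input the value of the leaf region that contains it. A minimal Python sketch of that piecewise-constant rule follows, assuming a single feature so that each leaf region can be written as a half-open interval; the region boundaries, leaf values, and function names are illustrative only.

```python
def indicator(x, region):
    """Return 1 if x falls inside the half-open interval [low, high), else 0."""
    low, high = region
    return 1 if low <= x < high else 0

def tree_predict(x, regions, leaf_values):
    """Sum each leaf value times its indicator; exactly one term is non-zero
    when the regions partition the feature axis."""
    return sum(c * indicator(x, r) for r, c in zip(regions, leaf_values))
```

For example, with regions `[(-inf, 6.5), (6.5, inf)]` and leaf values `[5.0, 20.0]`, any input below 6.5 is predicted as 5.0 and any input at or above 6.5 as 20.0.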
[0136] In some embodiments, when constructing a regression tree, the error calculation is adjusted according to the weights of the data points. By weighting, the sales price, natural gas production, and natural gas sales volume data have a greater impact on the model. Therefore, the mean squared error is calculated using the following formula:
[0137]
[0138] where $N$ is the number of data points, $\omega_i$ is the weight of the $i$-th data point, $y_i$ is the true value of the $i$-th data point, and $\hat{y}_i$ is the predicted value of the $i$-th data point.
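A weighted mean squared error built from these quantities can be sketched in Python as follows. The sketch assumes the weighted squared errors are divided by the number of data points N; dividing by the sum of the weights is an equally plausible normalization, and the text does not settle which is intended. The function names are hypothetical.

```python
import math

def weighted_mse(y_true, y_pred, weights):
    """Mean of weight * squared error over all N data points (assumed form)."""
    n = len(y_true)
    return sum(w * (y - p) ** 2 for y, p, w in zip(y_true, y_pred, weights)) / n

def weighted_rmse(y_true, y_pred, weights):
    """Square root of the weighted MSE, in the same units as the target."""
    return math.sqrt(weighted_mse(y_true, y_pred, weights))
```

Giving larger weights to records for sales price, production, or sales volume makes errors on those records count more when a split is evaluated, which is how the weighting increases their influence on the tree.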
[0139] The technical solutions provided in this application can achieve the following effects.
[0140] Analyzing natural gas sales data provides insights into market trends, customer needs, and competitor activities. This facilitates more informed market decisions and the development of more effective market strategies. Natural gas sales data analysis helps businesses understand customer buying behavior, preferences, and habits, enabling them to better meet customer needs, provide personalized services, and improve customer satisfaction.
[0141] Resource optimization: By mining sales data, it's possible to better plan and optimize resource utilization, including production, supply chain, and human resources. This helps improve efficiency, reduce costs, and enhance the company's competitiveness.
[0142] Marketing strategy development: Based on the analysis of sales data, marketing strategies can be developed in a more targeted manner.
[0143] Risk management: Analyzing sales data also helps companies identify potential risks and problems in a timely manner.
[0144] Decision Support: Natural gas sales data analysis systems can provide data support for corporate decision-making. Insights extracted from the data help management make more informed and strategic decisions.
[0145] Efficiency Improvement: By automating data collection and analysis, businesses can improve work efficiency. Employees no longer need to spend a lot of time manually organizing and analyzing data, allowing them to focus more on strategic work.
[0146] Forecasting and Planning: By using historical sales data for trend analysis and forecasting, companies can better plan future business development and adjust production and supply chain plans.
[0147] The effectiveness of a natural gas sales data analysis system lies in helping companies gain a more comprehensive and in-depth understanding of their business situation, thereby better responding to market changes and enhancing their competitiveness.
[0148] The following are embodiments of the apparatus described in this application, which can be used to execute the embodiments of the method described in this application. For details not disclosed in the apparatus embodiments of this application, please refer to the embodiments of the method described in this application.
[0149] Please refer to Figure 2, which illustrates a block diagram of a data analysis apparatus according to an embodiment of this application. The apparatus has the functionality to implement the data analysis method example described above; this functionality can be implemented in hardware or by hardware executing corresponding software. The apparatus can be the computer device described above, or it can be mounted on a computer device. The apparatus 200 may include:
[0150] Data acquisition module 210 is used to acquire raw data related to natural gas sales;
[0151] Preprocessing module 220 is used to preprocess the original data to obtain preprocessed data;
[0152] Data analysis module 230 is used to analyze the preprocessed data using data mining algorithms to obtain data analysis results;
[0153] The post-processing module 240 is used to perform data post-processing on the data analysis results to obtain post-processed analysis data, which provides data support for enterprise decision-making.
[0154] In some embodiments, the raw data is categorized by region and type.
[0155] In some embodiments, the preprocessing module is configured to:
[0156] The original data is normalized to obtain the preprocessed data;
[0157] wherein, for data $X_i$ in the original data, the preprocessing formula is:
[0158]
[0159] where $X_i'$ is the data obtained by preprocessing $X_i$, and $\mu$ is the average value of the data in the dimension corresponding to $X_i$.
[0160] In some embodiments, the data analysis module includes:
[0161] The quantity determination submodule is used to determine the number of clusters of the preprocessed data as k based on the natural gas sales scenario;
[0162] The membership calculation submodule is used to calculate the membership degree of each data point in the preprocessed data to its respective cluster based on the cluster centers of the k clusters of the preprocessed data.
[0163] The clustering submodule is used to adjust the cluster to which each data point of the preprocessed data belongs based on the membership degree of each data point to its respective cluster, until the clustering is completed, and obtain the clustering analysis result. The clustering result includes k clusters that have been clustered, and each cluster includes at least one data point.
[0164] In some embodiments, the membership degree is calculated using the following formula:
[0165]
[0166] where $d(x_i, c_k)$ is the distance between data point $x_i$ and cluster center $c_k$, and $\sigma$ is the standard deviation parameter.
[0167] In some embodiments, the distance between the data point and the corresponding cluster center is obtained by calculating the probability that the data point belongs to each Gaussian distribution.
[0168] In some embodiments, the probability of a data point belonging to a Gaussian distribution is calculated using the following formula:
[0169]
[0170] where $x_i$ is a data point, $c_k$ is the cluster center of the cluster to which $x_i$ belongs, and $\Sigma_k$ is the weight of the corresponding parameter.
[0171] In some embodiments, the apparatus further includes:
[0172] The data selection module is used to select data from the first cluster as a data subset from the k clusters;
[0173] A regression tree construction module is used to construct a regression tree based on the data subset;
[0174] The prediction module is used to generate a prediction model for the sales data of the natural gas based on the regression tree.
[0175] In some embodiments, the apparatus further includes a visualization module 250.
[0176] The data acquisition module is also used to respond to a query operation on natural gas sales-related data, and to acquire data corresponding to the query operation from the related data, including the raw data and the post-processed analysis data.
[0177] The visualization module 250 is used to visualize the data in the relevant data that corresponds to the query operation.
[0178] In some embodiments, the query operation refers to a query operation based on the first filter condition; the visualization module 250 is used for:
[0179] The data that meets the first filtering criteria from the relevant data will be visualized.
[0180] In summary, the technical solution provided in this application, by processing the raw data of natural gas and analyzing the data using data mining algorithms, establishes an automated analysis system for natural gas sales data. This system can provide in-depth understanding of market trends, customer needs, competitor dynamics, and other information, offering data support for enterprise decision-making, helping to formulate more reasonable and effective decisions, and improving the analysis effect of natural gas sales data.
[0181] It should be noted that the apparatus provided in the above embodiments is only illustrated by the division of the above functional modules when implementing its functions. In actual applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus and method embodiments provided in the above embodiments belong to the same concept, and the specific implementation process can be found in the method embodiments, which will not be repeated here.
[0182] Please refer to Figure 3, which illustrates a structural block diagram of a computer device according to an embodiment of this application. The computer device is used to implement the data analysis method provided in the above embodiments. Specifically:
[0183] The computer device 300 includes a CPU (Central Processing Unit) 301, a system memory 304 including RAM (Random Access Memory) 302 and ROM (Read-Only Memory) 303, and a system bus 305 connecting the system memory 304 and the central processing unit 301. The computer device 300 also includes a basic I/O (Input/Output) system 306 that facilitates information transfer between various components within the computer, and a mass storage device 307 for storing the operating system 313, application programs 314, and other program modules 315.
[0184] The basic input/output system 306 includes a display 308 for displaying information and an input device 309 for user input, such as a mouse or keyboard. Both the display 308 and the input device 309 are connected to the central processing unit 301 via an input/output controller 310 connected to the system bus 305. The basic input/output system 306 may also include the input/output controller 310 for receiving and processing input from multiple other devices such as a keyboard, mouse, or electronic stylus. Similarly, the input/output controller 310 also provides output to a display screen, printer, or other types of output devices.
[0185] The mass storage device 307 is connected to the central processing unit 301 via a mass storage controller (not shown) connected to the system bus 305. The mass storage device 307 and its associated computer-readable media provide non-volatile storage for the computer device 300. That is, the mass storage device 307 may include computer-readable media (not shown) such as a hard disk or a CD-ROM (Compact Disc Read-Only Memory) drive.
[0186] Without loss of generality, the computer-readable medium may include computer storage media and communication media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented using any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include RAM, ROM, EPROM (Erasable Programmable Read Only Memory), EEPROM (Electrically Erasable Programmable Read Only Memory), flash memory or other solid-state storage, CD-ROM, DVD (Digital Video Disc) or other optical storage, magnetic tape cassettes, magnetic tape, disk storage, or other magnetic storage devices. Of course, those skilled in the art will recognize that the computer storage media are not limited to the above-mentioned types. The system memory 304 and mass storage device 307 described above can be collectively referred to as memory.
[0187] According to various embodiments of this application, the computer device 300 can also be connected to a remote computer on a network, such as the Internet. That is, the computer device 300 can be connected to a network 312 via a network interface unit 311 connected to the system bus 305, or the network interface unit 311 can be used to connect to other types of networks or remote computer systems (not shown).
[0188] In an exemplary embodiment, a computer-readable storage medium is also provided, wherein a computer program is stored therein, which, when executed by a processor, implements the above-described data analysis method.
[0189] In an exemplary embodiment, a computer program product is also provided, which is loaded and executed by a processor to implement the above-described data analysis method.
[0190] It should be understood that "multiple" as used herein refers to two or more. "And/or" describes the relationship between associated objects and indicates that three relationships can exist. For example, A and/or B can represent: A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
[0191] The above description is merely an exemplary embodiment of this application and is not intended to limit this application. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the protection scope of this application.
Claims
1. A data analysis method, characterized in that the method comprises: obtaining raw data related to natural gas sales; preprocessing the raw data to obtain preprocessed data; analyzing the preprocessed data using data mining algorithms to obtain data analysis results; and post-processing the data analysis results to obtain post-processed analysis data, the post-processed analysis data being used to provide data support for enterprise decision-making.
2. The method according to claim 1, characterized in that the raw data is categorized by region and type.
3. The method according to claim 1, characterized in that the data mining algorithms include cluster analysis, and the cluster analysis comprises: determining, based on the natural gas sales scenario, that the number of clusters for the preprocessed data is k; calculating, based on the cluster centers of the k clusters of the preprocessed data, the membership degree of each data point in the preprocessed data to its respective cluster; and adjusting, based on the membership degree of each data point in the preprocessed data to its respective cluster, the cluster to which each data point in the preprocessed data belongs until the clustering is completed, to obtain cluster analysis results, wherein the cluster analysis results include the k resulting clusters, and each cluster includes at least one data point.
4. The method according to claim 3, characterized in that, after the cluster analysis is completed, the method further comprises: selecting the data in a first cluster of the k clusters as a data subset; constructing a regression tree based on the data subset; and generating, based on the regression tree, a prediction model for the natural gas sales data.
5. The method according to claim 1, characterized in that the method further comprises: in response to a query operation on data related to natural gas sales, obtaining the data corresponding to the query operation from the related data, the related data including the raw data and the post-processed analysis data; and visualizing the data in the related data that corresponds to the query operation.
6. The method according to claim 5, characterized in that the query operation is a query operation based on a first filter condition, and visualizing the data in the related data that corresponds to the query operation comprises: visualizing the data in the related data that meets the first filter condition.
7. A data analysis device, characterized in that the device comprises: a data acquisition module, used to acquire raw data related to natural gas sales; a preprocessing module, used to preprocess the raw data to obtain preprocessed data; a data analysis module, used to analyze the preprocessed data using data mining algorithms to obtain data analysis results; and a post-processing module, used to post-process the data analysis results to obtain post-processed analysis data, the post-processed analysis data being used to provide data support for enterprise decision-making.
8. A computer device, characterized in that the computer device includes a processor and a memory, the memory storing a computer program, which is loaded and executed by the processor to implement the data analysis method according to any one of claims 1 to 6.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, which is loaded and executed by a processor to implement the data analysis method according to any one of claims 1 to 6.
10. A computer program product, characterized in that the computer program product is loaded and executed by a processor to implement the data analysis method according to any one of claims 1 to 6.
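For illustration only, the membership-based clustering recited in claim 3 can be sketched in Python as follows. The claims do not name a specific algorithm; this sketch assumes fuzzy c-means with a fuzzifier of m = 2, computes the membership degree of each point to every cluster, and assigns each point to the cluster with the highest membership. The function names, the evenly spaced initialization of the cluster centers, the stopping tolerance, and the sample points are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def memberships(points: Sequence[Point], centers: Sequence[Point],
                m: float = 2.0) -> List[List[float]]:
    """Membership degree of every point to every cluster (rows sum to 1)."""
    exponent = 2.0 / (m - 1.0)
    result = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point sits exactly on one or more centers.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            result.append([h / sum(hits) for h in hits])
            continue
        inverse = [d ** -exponent for d in dists]
        total = sum(inverse)
        result.append([v / total for v in inverse])
    return result


def update_centers(points: Sequence[Point], u: List[List[float]],
                   m: float = 2.0) -> List[Point]:
    """Recompute each center as the membership-weighted mean of the points."""
    dim = len(points[0])
    centers = []
    for j in range(len(u[0])):
        weights = [row[j] ** m for row in u]
        total = sum(weights)
        centers.append(tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ))
    return centers


def fuzzy_cluster(points: Sequence[Point], k: int, m: float = 2.0,
                  tol: float = 1e-6, max_iter: int = 100):
    """Alternate membership and center updates until the centers stop moving."""
    n = len(points)
    # Assumed initialization: k evenly spaced input points.
    centers = [tuple(points[j * n // k]) for j in range(k)]
    for _ in range(max_iter):
        u = memberships(points, centers, m)
        new_centers = update_centers(points, u, m)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    u = memberships(points, centers, m)
    labels = [max(range(k), key=lambda j: row[j]) for row in u]
    return centers, u, labels


# Hypothetical preprocessed data: two well-separated groups of points.
POINTS = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centers, u, labels = fuzzy_cluster(POINTS, k=2)
```

With this sample, the first three points and the last three points end up in different clusters. The data of one resulting cluster could then be passed to a regression tree learner as the data subset of claim 4; that step is not shown here.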