Data expansion method, apparatus, device, and storage medium for oil samples

By generating a spectral feature space and retaining only the random sample points whose projection distance is less than a threshold, the oil sample data is expanded. This addresses the accuracy and cost problems of property prediction models for complex oil products and enables efficient oil property prediction.

CN122241205A · Pending · Publication Date: 2026-06-19 · CHINA PETROLEUM & CHEMICAL CORP +2

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
CHINA PETROLEUM & CHEMICAL CORP
Filing Date
2024-12-18
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

In the petroleum refining process, complex petroleum products have many chemical property parameters and require a large amount of calculation, resulting in poor calculation accuracy of petroleum property prediction models, high costs, and limited sample sizes, making it difficult to meet analytical needs.

Method used

The near-infrared spectral information of the initial oil samples is acquired and used to generate a spectral feature space. Random sample points are generated in that space, and those whose projection distance is less than a threshold are retained and added to form the final sample set, thus expanding the oil sample data.

Benefits of technology

It reduces the cost of analysis and calculation for complex oil products and improves the accuracy of oil property prediction models.


Abstract

This invention relates to the field of oil product testing and discloses a method, apparatus, device, and storage medium for expanding oil product sample data. The method includes: acquiring near-infrared spectral information of multiple initial oil product samples and generating a corresponding spectral feature space, wherein an initial sample set comprising initial sample points of multiple initial oil product samples is formed within the spectral feature space; generating multiple random sample points based on the distribution area of the initial sample set within the spectral feature space; calculating the projection distance between each random sample point and its projection point on the convex hull region of the initial sample set, and filtering the random sample points based on the projection distance, retaining random sample points whose projection distance is less than a distance threshold; inputting the random oil product samples corresponding to the retained random sample points into the initial sample set, forming a final sample set comprising multiple initial oil product samples and multiple random oil product samples. The method provided by this invention can rapidly expand the number of oil product samples at low cost.

Description

Technical Field

[0001] This invention relates to the field of oil product testing, and in particular to a method, apparatus, equipment, and storage medium for data expansion of oil product samples.

Background Technology

[0002] In the petroleum refining process, the chemical properties of the refined oil products are usually simulated and analyzed by establishing oil property prediction models, thereby enabling the prediction of the properties of the oil products.

[0003] However, in actual petroleum refining processes, some petroleum products have a dozen or even dozens of chemical property parameters, such as the octane number of gasoline and the cetane number of diesel. Simulating and analyzing petroleum products with such complex chemical properties carries a large computational load: the manpower and material costs are very high, the analysis and calculation time is very long, and the number of samples obtained is limited, which makes it difficult to meet the needs of establishing petroleum property prediction models. Consequently, the calculation accuracy of these petroleum product property prediction models is very poor.

Summary of the Invention

[0004] This invention provides a data expansion method, apparatus, device, and storage medium for oil samples to solve the above-mentioned problems.

[0005] In a first aspect, embodiments of the present invention provide a data expansion method for oil samples, comprising: acquiring near-infrared spectral information of multiple initial oil samples; generating a corresponding spectral feature space based on the near-infrared spectral information, wherein an initial sample set including initial sample points of multiple initial oil samples is formed within the spectral feature space; generating multiple random sample points based on the distribution area of the initial sample set within the spectral feature space; calculating the projection distance between each random sample point and its corresponding projection point on the convex hull region of the initial sample set; filtering the random sample points based on the projection distance between each random sample point and its corresponding projection point, retaining random sample points whose projection distance is less than a distance threshold; and inputting the random oil samples corresponding to the retained random sample points into the initial sample set to form a final sample set including multiple initial oil samples and multiple random oil samples.

[0006] The data expansion method for oil samples provided in this invention can generate a corresponding spectral feature space based on the near-infrared spectral information of multiple initial oil samples to obtain an initial sample set including multiple initial sample points. By randomly generating multiple random sample points in the spectral feature space and retaining random sample points whose projection distance is less than a distance threshold, random oil samples corresponding to the retained random sample points are obtained. The random oil samples are then input into the initial sample set to obtain the final sample set, thereby realizing the data expansion of oil samples. This method greatly reduces the analysis and calculation cost of complex oil products while improving the calculation accuracy of the oil property prediction model when simulating and analyzing oil samples.

[0007] Optionally, the step of obtaining near-infrared spectral information of multiple initial oil samples includes: performing near-infrared spectral detection on each initial oil sample to generate a near-infrared spectrum for each initial oil sample; and extracting near-infrared spectral information from the near-infrared spectrum based on principal component analysis.

[0008] Optionally, after generating the corresponding spectral feature space based on near-infrared spectral information, the method further includes: performing clustering processing on the initial sample set to obtain multiple subsets of initial oil samples, each subset including all initial sample points of one initial oil sample; and selecting, from all the subsets, the sparse subsets in which the number of initial sample points is less than a number threshold.

[0009] Optionally, the step of generating multiple random sample points based on the distribution region of the initial sample set in the spectral feature space includes: obtaining the distribution region of the sparse sub-sample set in the spectral feature space based on the distribution region of the initial sample set in the spectral feature space; obtaining the maximum and minimum values of the sparse sub-sample set on the principal component axes of each dimension of the spectral feature space based on the distribution region of the sparse sub-sample set and the number of dimensions of the spectral feature space; obtaining a random generation region for randomly generating random sample points based on the maximum and minimum values; and randomly generating multiple random sample points within the range of the random generation region.

[0010] Optionally, the step of randomly generating multiple random sample points within the random generation region includes: obtaining a first set of judgment formulas [formula images not reproduced in this text], whose terms denote: the sparse subset; the value on principal component axis j of the i-th random sample point generated for the sparse subset; j, the index of the principal component axis of each dimension; f, the number of dimensions of the spectral feature space; n, the number of random sample points; the maximum value and the minimum value on principal component axis j; and w, a random number generated between 0 and 1. With i and j starting from 0, random sample points are randomly generated within the range of the random generation region while i≤n and/or j≤f; otherwise the generation of random sample points is stopped.

[0011] Optionally, the projection distance between each random sample point and its projection point on the convex hull region of the initial sample set is calculated based on the following second set of judgment formulas:

[0012] [formula images not reproduced in this text] The terms of the second set of judgment formulas denote: the convex hull region of the sparse subset; η, an arbitrary point within that convex hull region; the convex combination points in the convex hull region; the squared distance from the i-th random sample point to the convex hull region of the sparse subset; α, the convex combination weight vector corresponding to the convex combination points; the projection point; the projection distance between the random sample point and its projection point; m, the number of sample points in the sparse subset; and the optimal convex combination weights corresponding to each random sample point.

[0013] Optionally, the step of inputting random oil samples corresponding to the retained random sample points into the initial sample set to form a final sample set including multiple initial oil samples and multiple random oil samples includes: obtaining the convex combination points corresponding to the retained random sample points and the property parameters of the corresponding initial oil samples; obtaining the chemical property information of the oil samples corresponding to the retained random sample points based on a formula [formula image not reproduced in this text], in which the property parameters are those corresponding to the convex combination points, α is the convex combination weight corresponding to each convex combination point, k is the number of convex combination points, and the property parameters include the octane number, olefin content, aromatic content, and distillation range of the initial oil sample; and generating, based on the chemical property information, random oil samples corresponding to the retained random sample points and inputting them into the initial sample set to form a final sample set including multiple initial oil samples and multiple random oil samples.

[0014] In a second aspect, embodiments of the present invention provide a data expansion apparatus for oil samples, comprising: an acquisition module for acquiring near-infrared spectral information of multiple initial oil samples; a first generation module for generating a corresponding spectral feature space based on the near-infrared spectral information, wherein an initial sample set including initial sample points of multiple initial oil samples is formed within the spectral feature space; a second generation module for generating multiple random sample points based on the distribution area of the initial sample set within the spectral feature space; a calculation module for calculating the projection distance between each random sample point and the projection point of that random sample point on the convex hull region of the initial sample set; a filtering module for filtering the random sample points based on the projection distance between each random sample point and its corresponding projection point, retaining random sample points whose projection distance is less than a distance threshold; and an expansion module for inputting random oil samples corresponding to the retained random sample points into the initial sample set to form a final sample set including multiple initial oil samples and multiple random oil samples.

[0015] The data expansion apparatus for oil samples provided in this invention can generate a corresponding spectral feature space based on the near-infrared spectral information of multiple initial oil samples to obtain an initial sample set including multiple initial sample points. By randomly generating multiple random sample points in the spectral feature space and retaining random sample points whose projection distance is less than a distance threshold, random oil samples corresponding to the retained random sample points are obtained. The random oil samples are then input into the initial sample set to obtain the final sample set, thereby realizing the data expansion of oil samples. This significantly reduces the analysis and calculation cost of complex oil products while improving the calculation accuracy of the oil property prediction model when simulating and analyzing oil samples.

[0016] In a third aspect, embodiments of the present invention provide a data expansion device for oil samples, comprising: a processor and a memory, wherein the memory stores instructions; the processor invokes the instructions in the memory to cause the processor to execute the data expansion method for oil samples according to any of the foregoing embodiments of the first aspect of the present invention.

[0017] The processor of the data expansion device for oil samples provided in this embodiment of the invention executes the data expansion method for oil samples according to any of the foregoing embodiments of the first aspect of the invention by calling instructions in the memory. This enables the data expansion device to generate a corresponding spectral feature space based on the near-infrared spectral information of multiple initial oil samples, thereby obtaining an initial sample set including multiple initial sample points. By randomly generating multiple random sample points in the spectral feature space and retaining random sample points whose projection distance is less than a distance threshold, the device obtains random oil samples corresponding to the retained random sample points. The random oil samples are then input into the initial sample set to obtain the final sample set, thus realizing the data expansion of oil samples. This significantly reduces the analysis and calculation costs of complex oil products while improving the calculation accuracy of the oil property prediction model when simulating and analyzing oil samples.

[0018] In a fourth aspect, embodiments of the present invention provide a computer-readable storage medium storing instructions that, when executed by a processor, implement the data expansion method for oil samples according to any of the foregoing embodiments of the first aspect of the present invention.

[0019] The instructions stored in the computer-readable storage medium provided in this embodiment of the invention can be called by a processor to execute the data expansion method for oil samples according to any of the foregoing embodiments of the first aspect of the invention. When the instructions stored in the storage medium are called by the processor, a corresponding spectral feature space can be generated based on the near-infrared spectral information of multiple initial oil samples to obtain an initial sample set including multiple initial sample points. By randomly generating multiple random sample points in the spectral feature space and retaining random sample points whose projection distance is less than a distance threshold, random oil samples corresponding to the retained random sample points are obtained. The random oil samples are then input into the initial sample set to obtain the final sample set, thereby realizing the data expansion of oil samples. This greatly reduces the analysis and calculation cost of complex oil products while improving the calculation accuracy of the oil property prediction model when simulating and analyzing oil samples.

Description of the Drawings

[0020] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on the structures shown in these drawings without creative effort.

[0021] Figure 1 is a flowchart of one embodiment of the data expansion method for oil samples of the present invention;
Figure 2 is a flowchart of step S110 in one embodiment of the data expansion method for oil samples of the present invention;
Figure 3 is a flowchart of step S120 in one embodiment of the data expansion method for oil samples of the present invention;
Figure 4 is a flowchart of step S130 in one embodiment of the data expansion method for oil samples of the present invention;
Figure 5 is a schematic diagram of the distribution of random sample points randomly generated in a sparse subsample set in one embodiment of the data expansion method for oil samples of the present invention;
Figure 6 is a schematic diagram comparing the octane number prediction results of oil samples before and after the random generation of multiple random sample points in one embodiment of the data expansion method for oil samples of the present invention;
Figure 7 is a flowchart of step S134 in one embodiment of the data expansion method for oil samples of the present invention;
Figure 8 is a flowchart of step S160 in one embodiment of the data expansion method for oil samples of the present invention;
Figure 9 is a structural block diagram of one embodiment of the data expansion apparatus for oil samples of the present invention;
Figure 10 is a structural block diagram of one embodiment of the data expansion device for oil samples of the present invention.

[0022] Explanation of reference numerals: 201 - Acquisition Module; 202 - First Generation Module; 203 - Second Generation Module; 204 - Calculation Module; 205 - Filtering Module; 206 - Expansion Module; 301 - Processor; 302 - Memory; 303 - Communication interface; 304 - Bus.

Detailed Implementation

[0023] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of the present invention.

[0024] It should be noted that all directional indications in the embodiments of the present invention, such as up, down, left, right, front, back, etc., are only used to explain the relative positional relationship and movement of the components in a specific posture as shown in the attached figure. If the specific posture changes, the directional indication will also change accordingly.

[0025] Furthermore, the use of terms such as "first" and "second" in this invention is for descriptive purposes only and should not be construed as indicating or implying their relative importance or implicitly specifying the number of technical features indicated. Therefore, a feature defined with "first" or "second" may explicitly or implicitly include at least one of that feature. Additionally, the technical solutions of the various embodiments can be combined with each other, but only on the basis of being achievable by those skilled in the art. When the combination of technical solutions is contradictory or impossible to implement, such a combination of technical solutions should be considered non-existent and not within the scope of protection claimed by this invention.

[0026] For ease of understanding, the data expansion method for oil samples according to embodiments of the present invention is described below. As shown in Figure 1, the data expansion method for oil samples in this embodiment of the invention includes steps S110 to S160.

[0027] In step S110, near-infrared spectral information of multiple initial oil samples is obtained.

[0028] As shown in Figure 2, in some optional embodiments, step S110 includes steps S111 to S112.

[0029] In step S111, near-infrared spectroscopy is performed on each initial oil sample to generate a near-infrared spectrum for each initial oil sample.

[0030] In step S112, near-infrared spectral information is extracted from the near-infrared spectrum based on principal component analysis.

[0031] In this embodiment, the near-infrared spectrum of each initial oil sample is processed by principal component analysis. The data of all near-infrared spectra are standardized and used to form a covariance matrix. By calculating the eigenvalues and eigenvectors of the covariance matrix, the near-infrared spectral information is extracted from the near-infrared spectra for the subsequent construction of the spectral feature space.
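The standardize, covariance, eigen-decomposition sequence described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names `standardize`, `covariance`, and `top_component` are hypothetical, the spectra are toy numbers, and only the leading principal component is extracted (by power iteration), whereas a real pipeline would more likely call a linear-algebra library and keep several components.

```python
import math

def standardize(rows):
    """Column-wise z-scores (sample standard deviation, n - 1)."""
    n, f = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(f)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
            for j in range(f)]
    return [[(r[j] - means[j]) / stds[j] for j in range(f)] for r in rows]

def covariance(z):
    """Covariance matrix of already-centred data."""
    n, f = len(z), len(z[0])
    return [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(f)]
            for a in range(f)]

def top_component(cov, iters=200):
    """Leading eigenvalue and eigenvector of a symmetric matrix (power iteration)."""
    f = len(cov)
    v = [1.0 / (j + 1) for j in range(f)]  # arbitrary deterministic start vector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(f)) for a in range(f)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(f))
                     for a in range(f))
    return eigenvalue, v

# Toy "spectra": 4 samples x 2 wavelengths (illustrative numbers only).
spectra = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
z = standardize(spectra)
eigenvalue, axis = top_component(covariance(z))
# Coordinate of each sample along the leading principal component axis.
scores = [sum(r[j] * axis[j] for j in range(len(axis))) for r in z]
```

Each entry of `scores` is the coordinate of one sample on the first principal component axis; repeating the extraction for further components would give the remaining coordinates of the spectral feature space.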

[0032] In step S120, a corresponding spectral feature space is generated based on near-infrared spectral information. Within this spectral feature space, an initial sample set is formed, comprising initial sample points from multiple initial oil samples.

[0033] As shown in Figure 3, in some optional embodiments, after step S120, the data expansion method for oil samples of the present invention further includes steps S121 to S122.

[0034] In step S121, the initial sample set is clustered to obtain multiple initial oil sample subsets, each subset including all initial sample points of one initial oil sample.

[0035] In step S122, a sparse subsample set with a number of initial sample points lower than a number threshold is selected from all subsample sets.

[0036] In this embodiment, the initial sample set is the set of initial sample points of all initial oil samples. Based on the extracted near-infrared spectral information, namely the aforementioned eigenvalues and eigenvectors, a spectral information matrix of the near-infrared spectra is constructed, which establishes the spectral feature space. The projection of each oil sample into the spectral feature space gives the coordinates of its initial sample point.

[0037] Clustering the initial sample set groups similar initial sample points together and thereby divides all initial sample points into subsets. Subsets whose number of initial sample points is below a number threshold are identified as sparse subsets; these are the subsets that require data expansion. The threshold can be 20, 32, 35, 56, 60, or any other integer. For example, with a threshold of 30, a subset with fewer than 30 initial sample points is identified as a sparse subset for subsequent data expansion.
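Selecting the sparse subsets reduces to counting cluster membership. The sketch below assumes that cluster labels have already been produced by some clustering algorithm (the text does not name one); the function name `select_sparse_subsets` and the toy data are hypothetical, and the threshold of 30 simply mirrors the example above.

```python
from collections import defaultdict

def select_sparse_subsets(points, labels, count_threshold):
    """Group points by cluster label and keep clusters below the threshold."""
    subsets = defaultdict(list)
    for point, label in zip(points, labels):
        subsets[label].append(point)
    return {label: members for label, members in subsets.items()
            if len(members) < count_threshold}

# Toy data: cluster "a" has 40 points, cluster "b" only 5.
points = [(float(i), 0.0) for i in range(45)]
labels = ["a"] * 40 + ["b"] * 5
sparse = select_sparse_subsets(points, labels, count_threshold=30)
```

Here only cluster "b" falls below the threshold, so it is the only subset passed on for expansion.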

[0038] In step S130, multiple random sample points are generated based on the distribution area of the initial sample set in the spectral feature space.

[0039] As shown in Figure 4 and Figure 5, in some optional embodiments, step S130 includes steps S131 to S134.

[0040] In step S131, the distribution region of the sparse sub-sample set in the spectral feature space is obtained based on the distribution region of the initial sample set in the spectral feature space.

[0041] In step S132, based on the distribution region of the sparse subsample set and the number of dimensions of the spectral feature space, the maximum and minimum values of the sparse subsample set on the principal component axis of each dimension of the spectral feature space are obtained.

[0042] In step S133, a random generation region for randomly generating random sample points is obtained based on the maximum and minimum values.

[0043] In step S134, multiple random sample points are randomly generated within the range of the random generation area.

[0044] In this embodiment, by determining the distribution region of the sparse subsample set in the spectral feature space and the number of dimensions of the spectral feature space, the maximum and minimum values of the sparse subsample set on each principal component axis in the spectral feature space are calculated, thereby determining the generation range of random sample points, i.e., the random generation region of the aforementioned random sample points.

[0045] When calculating the random generation region of random sample points, the region can be appropriately expanded beyond the maximum and minimum values to compensate for the data sparsity of oil samples, avoid overfitting, and improve the flexibility and accuracy of the subsequent oil property prediction model.

[0046] A range expansion threshold γ is introduced, and the expanded maximum and minimum values are obtained from a pair of formulas [formula images not reproduced in this text], in which the maximum value and the minimum value of the sparse subsample set on principal component axis j appear and 0≤γ≤1.

[0047] By calculating the maximum and minimum values ​​on each principal component axis after expanding the range, a cubic range region is formed in the spectral feature space, and random sample points will be randomly generated within this cubic range region.
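The bounding-region step can be sketched as below. The expansion rule is an assumption: the formulas involving γ are not legible in this text, so the sketch widens each axis by γ times its span on both sides, which is only one plausible reading. The names `axis_bounds` and `expand_bounds` are hypothetical.

```python
def axis_bounds(points):
    """Per-axis (min, max) of a list of equal-length coordinate tuples."""
    f = len(points[0])
    return [(min(p[j] for p in points), max(p[j] for p in points))
            for j in range(f)]

def expand_bounds(bounds, gamma):
    """ASSUMED rule: widen each axis by gamma * span on both sides."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return [(lo - gamma * (hi - lo), hi + gamma * (hi - lo))
            for lo, hi in bounds]

# Toy sparse subset in a 2-dimensional feature space.
sparse_subset = [(0.0, 1.0), (2.0, 3.0), (1.0, 5.0)]
bounds = axis_bounds(sparse_subset)
region = expand_bounds(bounds, gamma=0.5)
```

With γ = 0 the region coincides with the tight bounding box of the sparse subset; larger values let candidates fall further outside it, which the later distance filter then has to prune.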

[0048] The specific process for generating multiple random sample points is as follows. As shown in Figure 7, step S134 includes steps S1341 to S1342.

[0049] In step S1341, the first judgment formula group is obtained [formula images not reproduced in this text]. Its terms denote: the sparse subset; the value on principal component axis j of the i-th random sample point generated for the sparse subset; j, the index of the principal component axis of each dimension; f, the number of dimensions of the spectral feature space; n, the number of random sample points; the maximum value and the minimum value on principal component axis j; and w, a random number generated between 0 and 1.

[0050] In step S1342, with i and j starting from 0, random sample points are randomly generated within the range of the random generation region while i≤n and/or j≤f; otherwise, the generation of random sample points is stopped.

[0051] In this embodiment, when generating random sample points, parameters i and j are incremented, i.e., i = i + 1 and j = j + 1. When generating the value on each principal component axis j in the spectral feature space, the increment lets parameter j traverse the principal component axes in sequence until a random number has been generated on every axis. At that point the coordinates of one random sample point in the spectral feature space are complete, giving a new random sample point.
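The nested traversal over points i and axes j can be sketched as follows, assuming each coordinate is drawn as min + w * (max - min) with w uniform between 0 and 1. That interpolation form is inferred from the definitions of w and the axis extremes rather than read from the formula itself, and `generate_random_points` is a hypothetical name.

```python
import random

def generate_random_points(region, n, seed=None):
    """Draw n points uniformly inside an axis-aligned region.

    region: list of (lo, hi) pairs, one per principal component axis.
    """
    rng = random.Random(seed)
    points = []
    for _ in range(n):                 # index i over random sample points
        coords = []
        for lo, hi in region:          # index j over principal component axes
            w = rng.random()           # w drawn from [0, 1)
            coords.append(lo + w * (hi - lo))
        points.append(tuple(coords))
    return points

region = [(-1.0, 3.0), (-1.0, 7.0)]
candidates = generate_random_points(region, n=100, seed=0)
```

Passing a fixed `seed` makes the candidate set reproducible, which is convenient when comparing different distance thresholds on the same candidates.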

[0052] In step S140, the projection distance between each random sample point and the projection point of the random sample point on the convex hull region of the initial sample set is calculated.

[0053] In some optional embodiments, the projection distance between each random sample point and its projection point on the convex hull region of the initial sample set is calculated based on the following second set of judgment formulas:

[0054] [formula images not reproduced in this text] The terms of the second set of judgment formulas denote: the convex hull region of the sparse subset; η, an arbitrary point within that convex hull region; the convex combination points in the convex hull region; the squared distance from the i-th random sample point to the convex hull region of the sparse subset; α, the convex combination weight vector corresponding to the convex combination points; the projection point; the projection distance between the random sample point and its projection point; m, the number of sample points in the sparse subset; the convex combination weights corresponding to the individual convex combination points; and the optimal convex combination weights corresponding to each random sample point.

[0055] The remaining weight terms are likewise the optimal convex combination weights of the corresponding random sample points, and min denotes minimization of the corresponding formula or parameter.

[0056] Specifically, the optimal solution for the convex combination weight vector α is the optimal convex combination weight vector. The formulas following s.t. (subject to) in the second judgment formula group are the constraints under which the projection distance is minimized.
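For orientation, the projection of a point onto a convex hull is conventionally written as a constrained least-squares problem. The notation below is assumed for illustration only (x_i for the i-th random sample point, p_k for the k-th of the m points of the sparse subset, α_k for the convex combination weights) and is not taken from the patent's own formulas.

```latex
\begin{aligned}
\alpha^{*} &= \arg\min_{\alpha}\ \Bigl\| x_i - \sum_{k=1}^{m} \alpha_k p_k \Bigr\|^{2}
\quad \text{s.t.}\quad \sum_{k=1}^{m} \alpha_k = 1,\ \ \alpha_k \ge 0, \\
\hat{x}_i &= \sum_{k=1}^{m} \alpha_k^{*} p_k, \qquad
d_i = \bigl\| x_i - \hat{x}_i \bigr\| .
\end{aligned}
```

Under this reading, the first line is the minimization with its constraints, the projection point is the convex combination given by the optimal weights, and d_i is the projection distance compared against the distance threshold.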

[0057] In this embodiment, using the cube range region obtained above, the projection distance between each random sample point and the projection point on the convex hull region is calculated according to the second judgment formula group, so as to select suitable valid random sample points from all random sample points.
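One way to compute the projection distance without an external solver is the Frank-Wolfe method, sketched below. The text does not state which solver is used, so the choice of algorithm, the name `project_to_hull`, and the iteration limits are all assumptions; a quadratic-programming library would serve equally well.

```python
import math

def project_to_hull(x, vertices, max_iter=2000, tol=1e-12):
    """Approximate projection of x onto conv(vertices) by Frank-Wolfe.

    Returns (weights, projection, distance); the weights are non-negative
    and sum to 1, so the projection is a convex combination of the vertices.
    """
    m, f = len(vertices), len(x)
    alpha = [0.0] * m
    alpha[0] = 1.0                     # start at the first vertex
    y = list(vertices[0])
    for _ in range(max_iter):
        grad = [2.0 * (y[j] - x[j]) for j in range(f)]
        # Vertex that minimises the linearised objective.
        k = min(range(m),
                key=lambda i: sum(grad[j] * vertices[i][j] for j in range(f)))
        d = [vertices[k][j] - y[j] for j in range(f)]
        gap = -sum(grad[j] * d[j] for j in range(f))  # Frank-Wolfe duality gap
        dd = sum(c * c for c in d)
        if gap <= tol or dd == 0.0:
            break
        # Exact line search for the quadratic objective, clipped to [0, 1].
        t = min(1.0, max(0.0, sum((x[j] - y[j]) * d[j] for j in range(f)) / dd))
        alpha = [(1.0 - t) * a for a in alpha]
        alpha[k] += t
        y = [y[j] + t * d[j] for j in range(f)]
    dist = math.sqrt(sum((x[j] - y[j]) ** 2 for j in range(f)))
    return alpha, y, dist

# Toy hull: a right triangle; the query point lies outside it.
triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
weights, proj, dist = project_to_hull((1.0, 1.0), triangle)
```

For this toy triangle the point (1, 1) projects onto the midpoint of the hypotenuse. The returned weights are what a later step can reuse to blend the property parameters of the initial samples.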

[0058] In step S150, the random sample points are filtered based on the projection distance between each random sample point and the corresponding projection point, and random sample points whose projection distance is less than the distance threshold are retained.

[0059] In this embodiment, by comparing the magnitudes of the projected distances, random sample points with properties similar to those corresponding to the initial oil sample are selected from all random sample points and retained. The remaining random sample points are discarded.

[0060] The aforementioned effective random sample points are random sample points similar to the sample points of the initial oil sample, in order to ensure the correlation between the newly added random sample points and the initial oil sample. The sample set after adding sample points can more accurately reflect the properties and characteristics of the oil.
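The retention rule itself is a simple comparison. The sketch below assumes the projection distances have already been computed; `filter_by_distance` is a hypothetical name, and the distances and the threshold of 0.1 are illustrative only, since the text does not give a threshold value.

```python
def filter_by_distance(candidates, distances, distance_threshold):
    """Keep candidates whose projection distance is strictly below the threshold."""
    return [c for c, d in zip(candidates, distances) if d < distance_threshold]

candidates = [(0.2, 0.2), (1.0, 1.0), (0.6, 0.5)]
distances = [0.0, 0.7071, 0.0707]      # illustrative distances to the hull
retained = filter_by_distance(candidates, distances, distance_threshold=0.1)
```

A candidate inside the convex hull has distance zero and is always kept; the threshold only decides how far outside the hull a candidate may lie.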

[0061] In step S160, the random oil samples corresponding to the retained random sample points are input into the initial sample set to form a final sample set including multiple initial oil samples and multiple random oil samples.

[0062] As shown in Figure 8, in some optional embodiments, step S160 includes steps S161 to S164.

[0063] In step S161, the convex combination points corresponding to the retained random sample points and the property parameters of the corresponding initial oil samples are obtained.

[0064] In step S162, the chemical property information of the oil sample corresponding to each retained random sample point is obtained based on a formula [formula image not reproduced in this text], in which the property parameters are those corresponding to the convex combination points, α is the convex combination weight corresponding to each convex combination point, k is the number of convex combination points, and the property parameters include the octane number, olefin content, aromatic content, and distillation range of the initial oil sample.

[0065] In step S163, based on chemical property information, random oil samples corresponding to the retained random sample points are generated, and the random oil samples are input into the initial sample set to form a final sample set including multiple initial oil samples and multiple random oil samples.

[0066] In this embodiment, the above formula is used to calculate the chemical property information of the oil samples corresponding to the added random oil samples, so as to form random oil samples corresponding to random sample points, and finally form a final sample set including more oil samples, which can be used to build a more accurate oil property prediction model in the future.
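Assuming the elided formula is the weighted sum of the property parameters of the convex combination points with the weights α (which the surrounding definitions suggest but do not confirm), the property assignment can be sketched as follows. The name `blend_properties` and the property values are hypothetical.

```python
def blend_properties(weights, vertex_properties):
    """Weighted sum of property dicts; weights must form a convex combination."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    keys = vertex_properties[0].keys()
    return {key: sum(w * props[key] for w, props in zip(weights, vertex_properties))
            for key in keys}

# Hypothetical property values for two initial samples.
neighbours = [
    {"RON": 92.0, "olefin_pct": 20.0},
    {"RON": 94.0, "olefin_pct": 30.0},
]
synthetic = blend_properties([0.5, 0.5], neighbours)
```

Because the weights are non-negative and sum to 1, every blended property stays between the smallest and largest value among the contributing initial samples.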

[0067] Specifically, this embodiment performs data expansion on desulfurized gasoline samples. The results of data expansion on desulfurized gasoline samples using the data expansion method of this embodiment are shown in Table 1.

[0068] Table 1 compares the prediction deviations of the oil property prediction model built on the final sample set with those of the model built on the initial sample set, where RON is the research octane number of the oil sample and MON is the motor octane number of the oil sample.

[0069]

[0070] As can be seen from Table 1 and Figure 6, expanding the initial sample set with the data expansion method for oil samples provided in this embodiment of the invention yields a final sample set containing more oil samples. An oil property prediction model built on the final sample set has a smaller prediction deviation and predicts oil properties more accurately, which improves the calculation accuracy of the oil property prediction model.

[0071] The data expansion method for oil samples provided in this embodiment of the invention includes: acquiring near-infrared spectral information of multiple initial oil samples; generating a corresponding spectral feature space based on the near-infrared spectral information, wherein an initial sample set is formed within the spectral feature space, comprising initial sample points of multiple initial oil samples; generating multiple random sample points based on the distribution area of the initial sample set within the spectral feature space; calculating the projection distance between each random sample point and its corresponding projection point on the convex hull region of the initial sample set; filtering the random sample points based on the projection distance between each random sample point and its corresponding projection point, retaining random sample points whose projection distance is less than a distance threshold; and inputting the random oil samples corresponding to the retained random sample points into the initial sample set to form a final sample set comprising multiple initial oil samples and multiple random oil samples.

[0072] The data expansion method for oil samples provided in this invention can generate a corresponding spectral feature space based on the near-infrared spectral information of multiple initial oil samples to obtain an initial sample set including multiple initial sample points. By randomly generating multiple random sample points in the spectral feature space and retaining random sample points whose projection distance is less than a distance threshold, random oil samples corresponding to the retained random sample points are obtained. The random oil samples are then input into the initial sample set to obtain the final sample set, thereby realizing the data expansion of oil samples. This method greatly reduces the analysis and calculation cost of complex oil products while improving the calculation accuracy of the oil property prediction model when simulating and analyzing oil samples.
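
The generation and filtering steps summarized above can be sketched as follows. This is an illustrative sketch, not the embodiment's implementation. It assumes that random points are drawn uniformly inside the per-axis minimum/maximum box of the sample points, and it approximates the projection onto the convex hull with a Frank-Wolfe iteration, a technique swapped in here for brevity because the source does not name a solver. The names `projection_distance` and `generate_and_filter`, the tolerance, and the iteration cap are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def projection_distance(
    x: Point, hull_points: List[Point], max_iter: int = 500, tol: float = 1e-10
) -> Tuple[float, List[float]]:
    """Approximate the distance from x to the convex hull of hull_points.

    Returns (distance, weights), where weights are the convex combination
    weights of the approximate projection point.
    """
    m, dim = len(hull_points), len(x)
    # Start from the hull point nearest to x.
    start = min(
        range(m),
        key=lambda j: sum((hull_points[j][d] - x[d]) ** 2 for d in range(dim)),
    )
    weights = [0.0] * m
    weights[start] = 1.0
    p = list(hull_points[start])
    for _ in range(max_iter):
        grad = [p[d] - x[d] for d in range(dim)]  # half the gradient of |p - x|^2
        # Frank-Wolfe vertex: the hull point minimizing the linearized objective.
        v = min(
            range(m),
            key=lambda j: sum(grad[d] * hull_points[j][d] for d in range(dim)),
        )
        step_dir = [hull_points[v][d] - p[d] for d in range(dim)]
        gap = -sum(grad[d] * step_dir[d] for d in range(dim))
        if gap <= tol:
            break  # no vertex improves the objective any further
        norm_sq = sum(c * c for c in step_dir)
        gamma = min(1.0, gap / norm_sq)  # exact line search, clipped to [0, 1]
        p = [p[d] + gamma * step_dir[d] for d in range(dim)]
        weights = [(1.0 - gamma) * w for w in weights]
        weights[v] += gamma
    dist = math.sqrt(sum((p[d] - x[d]) ** 2 for d in range(dim)))
    return dist, weights


def generate_and_filter(
    samples: List[Point], n_random: int, threshold: float, seed: int = 0
) -> List[Tuple[List[float], List[float]]]:
    """Draw n_random points in the bounding box of samples and keep those
    whose projection distance to the convex hull is below threshold."""
    rng = random.Random(seed)
    dim = len(samples[0])
    lows = [min(s[d] for s in samples) for d in range(dim)]
    highs = [max(s[d] for s in samples) for d in range(dim)]
    kept = []
    for _ in range(n_random):
        # One uniform random number in [0, 1) per principal component axis.
        point = [lows[d] + rng.random() * (highs[d] - lows[d]) for d in range(dim)]
        dist, weights = projection_distance(point, samples)
        if dist < threshold:
            kept.append((point, weights))
    return kept
```

The weights returned for each retained point are convex combination weights over the initial sample points, so under the same assumptions they could be reused to assign property parameters to the corresponding random oil sample.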

[0073] In addition to the above method embodiments, the present invention also provides a data expansion device for oil samples. As shown in Figure 9, the device includes: an acquisition module 201, a first generation module 202, a second generation module 203, a calculation module 204, a filtering module 205, and an expansion module 206.

[0074] The acquisition module 201 acquires near-infrared spectral information of multiple initial oil samples. The first generation module 202 generates a corresponding spectral feature space based on the near-infrared spectral information, forming an initial sample set containing initial sample points of the multiple initial oil samples within the spectral feature space. The second generation module 203 generates multiple random sample points based on the distribution area of the initial sample set within the spectral feature space. The calculation module 204 calculates the projection distance between each random sample point and its corresponding projection point on the convex hull region of the initial sample set. The filtering module 205 filters the random sample points based on their projection distances, retaining those with projection distances less than a distance threshold. The expansion module 206 inputs the random oil samples corresponding to the retained random sample points into the initial sample set, forming a final sample set including multiple initial oil samples and multiple random oil samples.
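
The module structure described above can be pictured as six stages wired in sequence. The sketch below is only a schematic, assuming each module is a plain callable; the class name `DataExpansionDevice`, the field names, and the type aliases are hypothetical and do not come from the source.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = Sequence[float]


@dataclass
class DataExpansionDevice:
    """Six modules of the device, each modeled as a callable (an assumption)."""

    acquire: Callable[[], List[Vector]]  # acquisition module 201
    build_feature_space: Callable[[List[Vector]], List[Vector]]  # first generation module 202
    generate_random: Callable[[List[Vector]], List[Vector]]  # second generation module 203
    projection_distance: Callable[[Vector, List[Vector]], float]  # calculation module 204
    threshold: float  # distance threshold used by filtering module 205
    expand: Callable[[List[Vector], List[Vector]], List[Vector]]  # expansion module 206

    def run(self) -> List[Vector]:
        spectra = self.acquire()
        initial_points = self.build_feature_space(spectra)
        candidates = self.generate_random(initial_points)
        # Filtering module 205: keep candidates close to the convex hull.
        retained = [
            c
            for c in candidates
            if self.projection_distance(c, initial_points) < self.threshold
        ]
        return self.expand(initial_points, retained)
```

Modeling the modules as injected callables keeps the sketch independent of any particular spectral preprocessing or distance solver, which the text leaves open.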

[0075] The data expansion method for oil samples implemented by this device is the method set out in paragraph [0071] above, from acquiring the near-infrared spectral information through forming the final sample set.

[0076] The data expansion device for oil samples provided in this embodiment of the invention, by implementing the above method, can generate a corresponding spectral feature space based on the near-infrared spectral information of multiple initial oil samples, thereby obtaining an initial sample set including multiple initial sample points. By randomly generating multiple random sample points in the spectral feature space and retaining random sample points whose projection distance is less than a distance threshold, random oil samples corresponding to the retained random sample points are obtained. The random oil samples are then input into the initial sample set to obtain the final sample set, thus realizing the data expansion of oil samples. This significantly reduces the analysis and calculation cost of complex oil products while improving the calculation accuracy of the oil property prediction model when simulating and analyzing oil samples.

[0077] In addition to the above method embodiments, the present invention also provides a data expansion device for oil samples. As shown in Figure 10, the device includes a processor 301 and a memory 302, wherein the memory 302 stores instructions, and the processor 301 calls the instructions in the memory 302 to execute the data expansion method for oil samples according to any of the foregoing embodiments of the present invention.

[0078] The data expansion method for oil samples executed by this device is the method set out in paragraph [0071] above, from acquiring the near-infrared spectral information through forming the final sample set.

[0079] By implementing the above method, this device achieves the same technical effects as described in paragraph [0076]: the oil sample data is expanded, the analysis and calculation cost of complex oil products is reduced, and the calculation accuracy of the oil property prediction model is improved.

[0080] Furthermore, the data expansion device for oil samples provided in this embodiment of the invention may also include a communication interface 303 and a bus 304, with the processor 301, memory 302 and communication interface 303 electrically connected via the bus 304.

[0081] The memory 302 may include high-speed random access memory (RAM) and may also include non-volatile memory, such as at least one disk storage device. Communication between this system network element and at least one other network element is achieved through at least one communication interface 303 (which can be wired or wireless), over a network such as the Internet, a wide area network, a local area network, or a metropolitan area network. The bus 304 can be an ISA bus, a PCI bus, an EISA bus, or the like, and can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, the bus is shown in Figure 10 as a single double-headed arrow, but this does not mean that there is only one bus or one type of bus.

[0082] Processor 301 may be an integrated circuit chip with signal processing capabilities. In implementation, each step of the above method can be completed by the integrated logic circuitry in the hardware of processor 301 or by instructions in software form. Processor 301 can be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), etc.; it can also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this invention. The general-purpose processor can be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this invention can be directly manifested as execution by a hardware decoding processor, or execution by a combination of hardware and software modules in the decoding processor. The software module can reside in a readily available storage medium in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. This storage medium is located in memory 302. The processor 301 reads the information from memory 302 and, in conjunction with its hardware, completes the steps of the method described in the foregoing embodiments.

[0083] This invention also provides a computer-readable storage medium, which can be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium. The computer-readable storage medium stores instructions that, when executed on a computer, cause the computer to perform the steps of the above-described data expansion method for oil samples.

[0084] The computer-readable storage medium provided in this embodiment of the invention stores data and computer-executable instructions for the above-described data expansion method for oil samples. The data expansion method for oil samples includes: acquiring near-infrared spectral information of multiple initial oil samples; generating a corresponding spectral feature space based on the near-infrared spectral information, wherein an initial sample set including initial sample points of multiple initial oil samples is formed within the spectral feature space; generating multiple random sample points based on the distribution area of the initial sample set within the spectral feature space; calculating the projection distance between each random sample point and its corresponding projection point on the convex hull region of the initial sample set; filtering the random sample points based on the projection distance between each random sample point and its corresponding projection point, retaining random sample points whose projection distance is less than a distance threshold; and inputting the random oil samples corresponding to the retained random sample points into the initial sample set to form a final sample set including multiple initial oil samples and multiple random oil samples.

[0085] The computer-readable storage medium provided in this embodiment of the invention, by implementing the above method, can generate a corresponding spectral feature space based on the near-infrared spectral information of multiple initial oil samples, thereby obtaining an initial sample set including multiple initial sample points. By randomly generating multiple random sample points in the spectral feature space and retaining random sample points whose projection distance is less than a distance threshold, random oil samples corresponding to the retained random sample points are obtained. The random oil samples are then input into the initial sample set to obtain the final sample set, thereby expanding the data of oil samples. This significantly reduces the analysis and calculation cost of complex oil products while improving the calculation accuracy of the oil property prediction model when simulating and analyzing oil samples.

[0086] Those skilled in the art will clearly understand that, for convenience and brevity, the specific working processes of the systems, devices, and units described above correspond to the processes in the foregoing method embodiments and are not repeated here.

[0087] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0088] The above-described embodiments are only used to illustrate the technical solutions of the present invention, and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A data augmentation method for oil samples, characterized in that the method includes: acquiring near-infrared spectral information of multiple initial oil samples; generating a corresponding spectral feature space based on the near-infrared spectral information, wherein an initial sample set including initial sample points of the multiple initial oil samples is formed in the spectral feature space; generating multiple random sample points based on the distribution region of the initial sample set in the spectral feature space; calculating the projection distance between each random sample point and the projection point of that random sample point on the convex hull region of the initial sample set; filtering the random sample points based on the projection distance between each random sample point and the corresponding projection point, and retaining the random sample points whose projection distance is less than a distance threshold; and inputting the random oil samples corresponding to the retained random sample points into the initial sample set to form a final sample set including multiple initial oil samples and multiple random oil samples.

2. The data augmentation method for oil samples according to claim 1, characterized in that the step of acquiring near-infrared spectral information of multiple initial oil samples includes: performing near-infrared spectroscopy on each of the initial oil samples to generate a near-infrared spectrum for each initial oil sample; and extracting the near-infrared spectral information from the near-infrared spectrum by principal component analysis.

3. The data augmentation method for oil samples according to claim 1, characterized in that, after the step of generating the corresponding spectral feature space based on the near-infrared spectral information, the method further includes: performing clustering on the initial sample set to obtain multiple subsample sets of the initial oil samples, each subsample set including all the initial sample points of one type of initial oil sample; and selecting, from among all the subsample sets, the sparse subsample sets in which the number of initial sample points is less than a threshold.

4. The data augmentation method for oil samples according to claim 3, characterized in that the step of generating multiple random sample points based on the distribution region of the initial sample set in the spectral feature space includes: obtaining the distribution region of the sparse subsample set in the spectral feature space based on the distribution region of the initial sample set in the spectral feature space; obtaining, based on the distribution region of the sparse subsample set and the number of dimensions of the spectral feature space, the maximum and minimum values of the sparse subsample set on the principal component axis in each dimension of the spectral feature space; obtaining, based on the maximum and minimum values, the random generation region used to randomly generate the random sample points; and randomly generating multiple random sample points within the random generation region.

5. The data augmentation method for oil samples according to claim 4, characterized in that, The step of randomly generating multiple random sample points within the random generation region includes: Obtain the first set of judgment formulas, which is: ; ; in, The sparse subsample set. For one of the random sample points in the sparse subsample set The values ​​on principal component axis j, where the number of i is equal to the number of n, j is the principal component axis j corresponding to each dimension, f is the number of dimensions of the spectral feature space, and n is the number of random sample points. The maximum value on the principal component axis j. The minimum value on the principal component axis j is given by , and w is a random number generated between 0 and 1. When i=0 and i≤n and / or when j=0 and j≤f, the random sample points are randomly generated within the range of the random generation region; otherwise, the generation of the random sample points is stopped.

6. The data augmentation method for oil samples according to claim 3, characterized in that, The projection distance between each random sample point and its projection point on the convex hull region of the initial sample set is calculated based on the following second set of judgment formulas: ; ; ; ; in, Let η be the convex hull region of the sparse subsample set, and let η be any point within the convex hull region of the sparse subsample set. For the convex combination points corresponding to the convex hull region, For the i-th random sample point The squared value of the distance to the convex hull region of the sparse subset, where α is the convex combination weight vector corresponding to the convex combination point. For the projection point, Let m be the projected distance between the random sample point and the projected point, and m be the number of random sample points in the sparse subset. For the random sample points The corresponding optimal convex combination weights.

7. The data augmentation method for oil samples according to claim 6, characterized in that the step of inputting random oil samples corresponding to the retained random sample points into the initial sample set to form a final sample set including multiple initial oil samples and multiple random oil samples includes: obtaining the convex combination points corresponding to the retained random sample points and the property parameters of the corresponding initial oil samples; obtaining the chemical property information of the oil sample corresponding to each retained random sample point based on the following formula: $y = \sum_{i=1}^{k} \alpha_i y_i$; where $y_1$ to $y_k$ are the property parameters corresponding to the convex combination points, $\alpha_i$ is the convex combination weight of the i-th convex combination point, k is the number of convex combination points, and the property parameters include the octane number, olefin content, aromatic content, and distillation range of the initial oil sample; and generating, based on the chemical property information, the random oil samples corresponding to the retained random sample points, and inputting the random oil samples into the initial sample set to form a final sample set including multiple initial oil samples and multiple random oil samples.

8. A data augmentation device for oil samples, characterized in that the device includes: an acquisition module, used to acquire near-infrared spectral information of multiple initial oil samples; a first generation module, used to generate a corresponding spectral feature space based on the near-infrared spectral information, wherein an initial sample set including initial sample points of the multiple initial oil samples is formed in the spectral feature space; a second generation module, used to generate multiple random sample points based on the distribution area of the initial sample set in the spectral feature space; a calculation module, used to calculate the projection distance between each random sample point and the projection point of that random sample point on the convex hull region of the initial sample set; a filtering module, used to filter the random sample points based on the projection distance between each random sample point and the corresponding projection point, and to retain the random sample points whose projection distance is less than a distance threshold; and an expansion module, used to input the random oil samples corresponding to the retained random sample points into the initial sample set, forming a final sample set including multiple initial oil samples and multiple random oil samples.

9. A data augmentation device for oil samples, characterized in that the device includes a processor and a memory, wherein the memory stores instructions, and the processor invokes the instructions in the memory to cause the device to implement the data expansion method for oil samples as described in any one of claims 1 to 7.

10. A computer-readable storage medium storing instructions thereon, characterized in that, when the instructions are executed by a processor, the data expansion method for oil samples as described in any one of claims 1 to 7 is implemented.