Method for determining an item of data representing a risk of the occurrence of a clogging event in a pumping station, corresponding device and program
A predictive method using data engineering and multivariate time series processing addresses the limitations of current clogging event detection, enabling proactive management of pumping stations to prevent unplanned downtime and costs.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- ELECTRICITE DE FRANCE
- Filing Date
- 2025-11-07
- Publication Date
- 2026-06-25
AI Technical Summary
Current methods for predicting clogging events in pumping stations are inadequate, relying on delayed and complex human expertise or snapshot detection, failing to provide predictive capabilities for sudden or large-scale clogging events.
A method combining data engineering, statistical learning algorithms, and multivariate time series processing to anticipate clogging events by comparing current environmental conditions with historical data, using an electronic device with a processing unit to determine risk data.
Enables predictive diagnostics for clogging events, allowing proactive measures to prevent downtime and reduce costs associated with emergency responses.
Description
[0001] DESCRIPTION
[0002] Title: Method for determining data representative of the risk of occurrence of a clogging event in a pumping station, corresponding device and program.
[0003] Scope of the invention
[0004] The invention lies in the field of prevention and mitigation of clogging risks affecting water pumping stations used for cooling industrial installations.
[0005] Prior Art
[0006] Pumping stations are used in various industrial and environmental fields. They play important roles in diverse industrial sectors, such as nuclear power plants, where they provide large quantities of water used for cooling installations, for example.
[0007] Pumping stations face various unpredictable situations that can affect their operation. One of the main threats is the massive influx of clogging materials (MCM). These sudden events, where a large quantity of clogging material (plant debris, algae, fish, etc.) is carried by the water, can obstruct pumping and filtration systems. MCM is often caused by natural phenomena such as tides, floods, or storms.
[0008] Furthermore, conditions such as streamflow, water level, and wind speed and direction can vary unpredictably. These fluctuations can lead to changes in the amount and nature of clogging agents present in the water, thus increasing the risk of clogging. Extreme weather conditions also pose a threat. Storms, heavy rainfall, floods, and droughts can all affect the amount of debris carried by the water. These extreme weather events increase the risk of clogging by introducing additional debris into pumping systems. Seasonal changes also influence the amount of debris in the water. For example, spring floods can carry plant debris that has accumulated over the winter. Conversely, periods of low flow in summer allow debris to accumulate in waterways, thus increasing the risk of clogging during these low-flow episodes. Human activities also contribute to the presence of clogging agents in the water. Industrial discharges, construction work, and agriculture can introduce debris and sediment into waterways. Finally, biological phenomena can lead to clogging events. The proliferation of certain species, such as algae or jellyfish, can be influenced by environmental factors such as water temperature and nutrient availability. These biological phenomena increase the risk of clogging and require constant monitoring.
[0009] The current approach to detecting and/or managing clogging problems relies primarily on industry-specific rules and human expertise. For example, rules such as "if the river flow rate exceeds X m³/s, then the period is at risk" are applied by teams responsible for managing pumping stations. This approach can be supplemented by forecasts of certain key variables (such as future flow rate or wind speed) provided by specialized organizations like Météo-France or EDF DTG.
[0010] However, no device exists for predicting the massive arrival of clogging material. A technique is described in patent application WO2017186603, but it is limited to a diagnosis with a lead time of 6 hours before the feared event and is complex to implement.
[0011] There are also devices for measuring the presence of clogging agents, such as buoys to detect seaweed (a collection of plant debris) or the use of drones to detect jellyfish swarms. However, these devices do not provide predictive capabilities, but only snapshots of clogging presence. Devices for cleaning and improving the filtration systems of pumping stations also exist, but they cannot anticipate sudden or large-scale clogging events.
[0012] It is therefore necessary to have a solution that allows us to at least partially resolve this lack of medium- to long-term forecasting regarding future clogging situations within pumping stations.
[0013] Summary of the invention
[0014] To overcome at least some of the drawbacks of the prior art, the invention combines data engineering methods, statistical learning algorithms, and multivariate time series processing techniques to anticipate clogging events and provide predictive diagnostics.
[0015] More specifically, the disclosure relates to a process for determining data representing a risk of a clogging event occurring at a pumping station on an industrial site. This process is implemented using an electronic device comprising a memory and a processing unit. The process includes at least one iteration of the following steps:
[0016] obtaining at least one time series Q of current environmental conditions of the pumping station; determining, within historical data in the form of at least one time series X of past environmental conditions, at least a partial correspondence of at least one variable of the time series Q and a corresponding variable of said at least one time series X;
[0017] when it is determined that at least a partial correspondence exists between said at least one variable of the time series Q and said at least one corresponding variable of said at least one time series X, obtaining, using a time series Y of past clogging events of the pumping station, the risk data for the occurrence of a clogging event of the pumping station.
[0018] This approach enables predictive diagnostics based on historical events, allowing pumping station managers to take preventative measures before clogging occurs. This improves the reliability and efficiency of pumping operations, reduces unplanned downtime, and minimizes costs associated with emergency response and cleanup.
[0019] According to a particular characteristic, said at least one time series Q of current environmental conditions and said at least one time series X of past environmental conditions each include at least one time series relating to the external environment (Q_ENV, X_ENV) of the industrial site and at least one time series relating to the internal environment (Q_TRA, X_TRA) of the pumping station and/or the industrial site.
[0020] Thus, the process allows for capturing a comprehensive picture of the factors influencing the risk of clogging. This multivariate approach makes it possible to better identify the interactions between external and internal conditions, thereby improving the accuracy of predictive diagnoses.
[0021] According to a particular characteristic, said at least one time series relating to the internal environment (Q_TRA, X_TRA) of the pumping station includes variables that are independent of decisions and/or operating rules of the industrial site.
[0022] Thus, the process avoids biases introduced by human intervention or operational changes.
[0023] According to a particular feature, the step of obtaining the Q time series of current environmental conditions includes at least one step of completing missing values for at least some variables of the Q time series of current environmental conditions.
[0024] Thus, it is possible to provide a continuous character to data that may be in a discrete form.
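As an illustration of the completion step, the sketch below fills missing samples by linear interpolation, one variable at a time. The text does not specify the completion technique, so linear interpolation, the NaN encoding of missing values and the helper name `fill_missing` are all assumptions made for this example.

```python
import numpy as np

def fill_missing(q):
    """Fill NaN entries of a (length, n_variables) series by linear interpolation.

    Assumption for illustration: missing values are encoded as NaN and each
    variable is completed independently of the others.
    """
    q = np.asarray(q, dtype=float)
    filled = q.copy()
    steps = np.arange(q.shape[0])
    for k in range(q.shape[1]):
        known = ~np.isnan(q[:, k])
        if known.any():
            # np.interp holds the first/last known value beyond the known range.
            filled[:, k] = np.interp(steps, steps[known], q[known, k])
    return filled

# Toy query Q with two variables and one gap in each (arbitrary values).
q = np.array([[1.0, 10.0], [np.nan, 12.0], [3.0, np.nan], [4.0, 16.0]])
q_filled = fill_missing(q)
```

Other completion schemes (forward fill, seasonal averages) would fit the same interface; the choice depends on the sensor and is not dictated by the text.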
[0025] According to a particular characteristic, the determination step includes a first search step that looks, on the basis of a first current time series Q_ENV including at least one variable relating to the external environment of the industrial site, for at least a partial correspondence with a time window W_ENV,i extracted from a first time series X_ENV including at least one corresponding variable, the size of the time window W_ENV,i being at most equal to the size of the first time series Q_ENV.
[0026] Thus, searching in the first time series X_ENV is easier: the data contained in this series is processed in successive windows, which makes it easier to find a match.
[0027] According to a particular characteristic, when at least a partial match is found during the first search step, the determination step includes a second search step that looks, on the basis of a second current time series Q_TRA including at least one variable relating to the internal environment of the industrial site, for at least a partial correspondence with a time window W_TRA,i extracted from a second time series X_TRA including at least one corresponding variable, the size of the time window W_TRA,i being at most equal to the size of the second time series Q_TRA. According to a specific characteristic, the first step of searching for at least a partial match is implemented for a set of time windows. This allows for more efficient data processing, while also enabling the application of computational optimizations that facilitate profile creation. This methodology is also applicable to the second time series X_TRA, where it produces the same effects. In a complementary example implementation, rather than first performing a search on the first time series X_ENV, the first search can be performed on the second time series X_TRA. In other embodiments, a single search is performed on the time series X, which includes both the first time series X_ENV and the second time series X_TRA, the variables composing these series then being combined into a single series. The same applies to Q (combining Q_ENV and Q_TRA), as indicated in the general presentation above, these series being multivariate, for example.
[0028] According to a particular characteristic, at least partial correspondence between two time series is obtained by calculating a minimum distance separating the values of the different variables that correspond within the two time series.
[0029] In another aspect, the invention also relates to a device for determining the risk of a pumping station clogging event. This device comprises a memory and a processing unit. The processing unit is configured to execute at least one iteration of the following steps:
[0030] obtaining at least one multivariate time series of current environmental conditions of the pumping station;
[0031] determination, within historical data presented in the form of at least one multivariate time series of past environmental conditions, of at least a partial correspondence between the evolution of at least one variable of the multivariate time series and a corresponding variable of said at least one time series;
[0032] When it is determined that at least a partial correspondence exists between the evolution of at least one variable in the multivariate time series and at least one corresponding variable in at least one time series, the risk data for the occurrence of a pumping station clogging event is obtained using a time series of past clogging events at the pumping station. In another aspect, the invention also relates to a computer program capable of implementing the described method, as well as to a data storage medium for this computer program.
[0033] The electronic device for determining data representative of the risk of a pumping station clogging event has a computer architecture. It is equipped with one or more processors capable of executing all types of computer programs, from operating systems to application software, written in compiled or interpreted languages. The various components of the device are interconnected by a communication bus. The device may optionally be equipped with a communication system to communicate via protocols such as Bluetooth, Ethernet, or Wi-Fi with other systems and to connect to mobile or fixed telecommunications networks. The device also includes memory components that store the data and programs necessary for its operation. The device is further configured to perform the following operations: obtaining current environmental data via sensors, organized into multivariate time series; searching for partial matches between this current data and historical data, also in the form of time series; and, when a partial match is found, using time series representing past clogging events to determine the risk of a pumping station clogging event.
[0034] Data storage media can be any entity or device capable of storing programs. For example, media can include a storage medium, such as a ROM (e.g., a CD-ROM or a microelectronic circuit ROM), or a magnetic recording medium such as a hard drive, or more commonly, flash memory. Alternatively, media can be transmissible, such as an electrical or optical signal, which can be transmitted via an electrical or optical cable, by radio, or by other means. Programs according to the invention can, in particular, be downloaded from a network such as the Internet. Alternatively, the information storage medium can be an integrated circuit in which the program is incorporated, the circuit being adapted to execute or to be used in the execution of the process in question.
[0035] Brief description of the figures
[0036] Other features and advantages of the invention will become more apparent upon reading the following description of a particular embodiment, given by way of simple illustrative and non-limiting example, and the accompanying drawings, among which:
[0037] [Fig. 1] represents the main steps in the disclosure process;
[0038] [Fig. 2] represents the implementation of a distance profile calculation for an example of implementation;
[0039] [Fig. 3] illustrates a device for implementing the disclosure process.
[0040] Description of a method of implementation
[0041] As previously explained, one objective of this disclosure is to establish a mechanism for determining the risk of a pumping station clogging event. The study of the clogging phenomenon has shown that multiple timescales exist. Periods of high water increase the amount of debris in circulation, as plant debris is carried into the watercourse. During periods of low flow, this accumulated material settles to the bottom of the watercourse. During periods of high flow, some of this material is remobilized, which, during low tides for example, leads to a high concentration of debris in the water column, increasing the risk of clogging. The combination of these factors results in a time lag between the (multiple) causes and the effect (i.e., clogging). The difficulty in predicting such effects may be compounded by the fact that measurements may be remote not only in time but also in space (water flow sensors may be located away from the pumping site itself, for example).
[0042] The inventors developed a method for searching for similarities in past environmental conditions (which had been previously measured), these past environmental conditions having led (or not) to subsequent clogging situations (which had also been documented, i.e., recorded). One of the inventors' motivations is that experts in the field suggest that the clogging phenomenon is influenced by the accumulation of sedimentary deposits forming in certain locations during periods of insufficient water flow (i.e., without flooding in winter / spring). These stored deposits can be released in the event of flooding or high tides the following year, for example.
[0043] This process is implemented iteratively, using current environmental conditions (CEC) data, which are obtained from sensors located within, near, and/or at a distance from one or more pumping stations. Ideally, the current environmental conditions (CEC) data include both environmental data around the pumping station and operational data from the pumping station itself, such as data from internal water circulation or water pressure sensors in filtration or flow devices. The current environmental conditions (CEC) data are organized as multivariate time series (Q), also known as queries. The variables in these multivariate time series represent the evolution of the values measured by the different sensors of environmental conditions (precipitation, water level, wind, strength of currents around the pumping station, and water pressure, water level, opacity within the components of the pumping station, etc.).
[0044] One or more (potentially partial) correspondence searches are performed between these current data (Q) and past historical data (DRCE_X), also in the form of multivariate time series (X), notably by minimizing dissimilarity functions, as explained later. Naturally, the variables of the current environmental conditions (DRCE_Q) and of the historical data (DRCE_X) relate to identical or very similar environmental data.
[0045] When a match (partial or complete) is found, a risk data point for the occurrence of a clogging event can be searched for and/or obtained using time series (Y) that include past clogging (or non-clogging) events. In other words, this process makes it possible to anticipate clogging risks by comparing current environmental conditions Q with similar past situations X, based on a multivariate approach. Thus, by identifying similar environmental conditions in the past, and similar operating conditions (of the pumping station) during these past periods, the disclosure technique provides the pumping station operator with a limited set of known past situations, these situations being as close as possible to the period of interest in terms of medium/long-term clogging risk.
[0046] The iterative process developed by the inventors consists of searching historical data for data similar to or comparable to the query. The query itself can include data spanning a longer or shorter period, depending on operational implementation conditions. For example, the query data can cover a period of one week, one month, one year, or even several years. Events that can influence the size of the query (Q) include, for example, the date of the last cleaning of the pumping station or other events affecting both the pumping station itself and the surrounding waterways.
[0047] In any case, starting with the query data (which can therefore be very numerous and voluminous), it is necessary to search for similar data in the historical data (X). This search can be performed using a sliding window (W), for example, where the size of the window (W) in the historical data (X) is equal to the size of the query (or a fraction thereof). This search can also be performed using a sliding time window, meaning that it is not the quantity of query data (Q) that is used, but the time period covered by the query. This possibility implies, according to the present document, that prior data transformation processing must take place to adapt the input data (X, Q, Y) to this search strategy. Other search possibilities are conceivable, as detailed below.
[0048] If no similarity is found, the window (W_i) is shifted (by one time step or to the next values of the historical data X) and the search is performed on the next portion (W_i+1) of historical data. In case of similarity, a process is carried out that includes searching for events within the series of events (Y) associated with this portion of historical data (X): the presence or absence of a pumping station clogging event possibly associated with the historical data portion is checked, and so on, either until all historical data has been processed or based on a predetermined stopping parameter. Figure 1 illustrates the different steps of the implemented process.
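The sliding-window search described above can be sketched as follows. This is a minimal illustration under stated assumptions: the Euclidean norm is used as the distance, the window has the same length as the query, and the event check looks at the portion of Y aligned with the matching window. The function name `search_history` and the toy values are hypothetical.

```python
import numpy as np

def search_history(x, y, q, threshold):
    """Slide a window of len(q) over X; for each window closer to Q than
    `threshold`, report its start index and whether Y records a clogging
    event over that same portion of the history."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    q = np.asarray(q, dtype=float)
    l = q.shape[0]
    results = []
    for i in range(x.shape[0] - l + 1):      # shift by one time step
        w_i = x[i:i + l]                     # current window W_i
        if np.linalg.norm(q - w_i) < threshold:
            results.append((i, bool(y[i:i + l].any())))
    return results

# Toy univariate history X, binary event series Y, and query Q (arbitrary values).
x = np.array([0.0, 1.0, 2.0, 5.0, 1.0, 2.0, 3.0, 0.0])
y = np.array([0, 0, 0, 0, 0, 0, 1, 0])
q = np.array([1.0, 2.0, 3.0])
matches = search_history(x, y, q, threshold=0.5)
```

A stopping parameter, as mentioned in the text, could be added by breaking out of the loop once a chosen number of matches has been collected.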
[0049] More specifically, the process includes at least one iteration of the following steps:
[0050] a preliminary acquisition step (S01), from at least one sensor (CAP_0, ..., CAP_Z), of data representative of current environmental conditions (DRCE_Q) relating to the environment of the pumping station, this environmental conditions data (DRCE_Q) being organized in the form of at least one multivariate time series (Q);
[0051] at least one search step (S02), within historical data presented in the form of at least one multivariate time series (X) representative of past environmental conditions (DRCE_X), of at least a partial correspondence between the evolution of at least one variable of the multivariate time series (Q) and a corresponding variable of said at least one time series (X);
[0052] when it is determined that at least a partial correspondence exists between the evolution of said at least one variable of the multivariate time series (Q) and of said at least one corresponding variable of said at least one time series (X), obtaining (S03), using a time series (Y) representative of past clogging events of the pumping station, the risk data of occurrence of a clogging event (DRSEC) of the pumping station.
[0053] The determination is therefore carried out in two stages. Depending on the implementation, the search stage (S02) can also be divided into two separate searches, particularly to limit the calculations required. The inventors propose various other computational optimizations to make this method simpler, faster, and more energy-efficient to implement.
[0054] Thus, the proposed method gives the pumping station manager (and, more broadly, the production site manager) sufficiently early warning of a likely clogging event, early in the sense that the manager has time to consider ways to limit the consequences of the likely event, for example by temporarily reducing pump power during the at-risk period without halting production, or possibly by switching to another cooling source. This is achieved by using regularly collected data describing the pumping station's environment (weather data, river flow rate, tidal coefficient, etc.). Using internal data related to the operation of the pumping station (for example, pressure differentials and/or water level differentials in circuits supplying filter drums or in recirculation circuits, these data providing direct or indirect indications of the congestion of the pumping station's internal components and therefore of its capacity to pump the water volumes necessary for effective cooling), the proposed method allows predictive diagnoses to be produced, based on the Y series, over a given future period (for example, a full week), "future" being understood relative to the correspondence found in X with respect to Q. Thus, the proposed method produces an MCM susceptibility diagnosis through pure statistical learning, based on multivariate time series.
[0055] The search step (S02) itself may be preceded by a division of historical and current data into two distinct data types: historical and current data relating to the external environment of the pumping station (Q_ENV and X_ENV) and historical and current data relating to the internal environment of the pumping station (Q_TRA and X_TRA). This division may pre-exist, and is therefore optional.
[0056] Assuming such a breakdown is carried out, the search step (S02) includes:
[0057] the determination of a current window (W_ENV,i) from the historical data (X_ENV), the size of the current window (W_ENV,i) being at most equal to the size of the query (Q_ENV); the calculation of a current distance (d_ENV,i) between the data of the query variables (Q_ENV) and the data of the corresponding variables of the current window (W_ENV,i). When the current distance (d_ENV,i) between the data of the query variables (Q_ENV) and the corresponding window variable data (W_ENV,i) is greater than a predetermined threshold provided as a parameter, the process moves to the next window (W_ENV,i+1). When the current distance (d_ENV,i) is below the predetermined threshold, it is then investigated whether the internal environmental conditions of the pumping station (Q_TRA and X_TRA) are also similar. To do this, this second search includes:
[0058] the determination of a current window (W_TRA,i) from the historical data (X_TRA), the size of the current window (W_TRA,i) being at most equal to the size of the query (Q_TRA); the calculation of a current distance (d_TRA,i) between the data of the query variables (Q_TRA) and the data of the corresponding variables of the current window (W_TRA,i). When the current distance (d_TRA,i) is greater than a predetermined threshold (which is also a process parameter), the process moves to the next external environmental conditions window (W_ENV,i+1). When the current distance (d_TRA,i) is below the predetermined threshold, the possible occurrence of a clogging situation is sought in the time series Y representing past clogging events at the pumping station. In its simplest form, this time series (Y) of clogging events specifies the dates on which these events occurred, and it is a binary series: 1s represent the days (or hours) corresponding to cloggings, while 0s represent the days (or hours) corresponding to the absence of cloggings. Other types of series Y can be considered, such as series including probabilities of clogging occurrence or other data allowing clogging prediction. In one example implementation, a future period (future with respect to the window (W_TRA,i)) is determined, using a parameter of the method, in the time series (Y) of clogging events. For example, when the window (W_TRA,i) spans a period of 1 month, the process looks at what happens one week after this month in the time series (Y). Thus, the risk data for the occurrence of a pumping station clogging event can take several forms depending on the implementation.
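The two-stage search and the look-ahead into Y can be sketched as follows. This is only an illustration under several assumptions not fixed by the text: both queries have the same length, all series share one time axis, the distance is Euclidean, and the risk is summarized as the fraction of matching windows followed by a clogging within `horizon` steps. The names `two_stage_outcomes`, `thr_env`, `thr_tra` and `horizon` are hypothetical.

```python
import numpy as np

def two_stage_outcomes(q_env, x_env, q_tra, x_tra, y, thr_env, thr_tra, horizon):
    """For each window matching first on external (ENV) then on internal (TRA)
    conditions, record whether Y shows a clogging in the following `horizon` steps."""
    q_env, x_env = np.asarray(q_env, dtype=float), np.asarray(x_env, dtype=float)
    q_tra, x_tra = np.asarray(q_tra, dtype=float), np.asarray(x_tra, dtype=float)
    y = np.asarray(y)
    l = q_env.shape[0]
    outcomes = []
    for i in range(x_env.shape[0] - l + 1):
        d_env = np.linalg.norm(q_env - x_env[i:i + l])
        if d_env >= thr_env:
            continue                      # no ENV match: next window W_ENV,i+1
        d_tra = np.linalg.norm(q_tra - x_tra[i:i + l])
        if d_tra >= thr_tra:
            continue                      # ENV matched but TRA did not
        future = y[i + l:i + l + horizon] # period following the matched window
        if future.size:
            outcomes.append(int(future.max()))
    return outcomes

# Toy data (arbitrary values): only the window starting at index 3 matches both stages.
x_env = [0, 1, 2, 0, 1, 2, 9, 9]
x_tra = [5, 5, 5, 1, 1, 1, 0, 0]
y = [0, 0, 0, 0, 0, 0, 1, 0]
outcomes = two_stage_outcomes([0, 1, 2], x_env, [1, 1, 1], x_tra, y,
                              thr_env=0.5, thr_tra=0.5, horizon=2)
risk = float(np.mean(outcomes)) if outcomes else None
```

Testing the cheaper external condition first means the internal distance is only computed for windows that already passed the first threshold, which is the calculation-limiting effect the text attributes to splitting the search.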
[0059] It is understood that the proposed breakdown into external environmental data, internal environmental data, and clogging event data is a specific, non-exhaustive implementation intended to simplify the explanations provided. It is entirely possible to use only a single multivariate time series of historical data X and, within the framework of the steps presented above, to use only certain variables to perform the calculations.
[0060] The following pseudocode illustrates the proposed methodology in an example implementation. This pseudocode performs a calculation for four pumping stations (TRA 1 to 4), which explains the division into four series.
[0061] **Input**: X ∈ R^(m,d), Y ∈ R^m, Q ∈ R^(l,d), T, W ∈ R^d, [t_env,1, ..., t_env,e], [λ_1, ..., λ_e], l_env, l_tra, strategy
**Output**: ŷ ∈ R
1. Required conditions:
   - l > l_env + max_i t_env,i
   - m > l
2. Extract the subsets:
   - λ_env, λ_tra <- extract_subsets(W)
   - X_env, X_tra1, X_tra2, X_tra3, X_tra4 <- extract_subsets(X)
   - Q_env, Q_tra1, Q_tra2, Q_tra3, Q_tra4 <- extract_subsets(Q)
3. Initialize P_env <- zeros_array(m - l + 1)
4. For each i ∈ [1, e]:
   - P_i <- P(Q_env, X_env, l_env, t_env,i, λ_env) × λ_i
   - P_env <- P_env + P_i
5. Extract regions below the threshold:
   - R_env <- extract_regions_bellow(P_env, min(P_env) × T)
6. Initialize the arrays:
   - ŷ <- zeros_array(4)
   - distances <- zeros_array(4)
7. For each slice i ∈ [1, 4]:
   - Initialize P_trai <- infinity_array(4, m - l + 1)
   - For each slice j ∈ [1, 4]:
     - For each region ∈ R_env:
       - P_trai[j, region] <- P(Q_trai, X_traj, region, l_tra, 0, λ_tra)
   - Extract R_trai <- extract_regions_bellow(P_trai, min(P_trai) × T)
   - ŷ[i] <- (1 / |R_trai|) × Σ_{j=1..|R_trai|} max(Y[R_trai[j]])
   - distances[i] <- min(P_trai) × T
8. Select the strategy:
   - If strategy == 'best', return ŷ[argmin(distances)]
   - If strategy == 'max', return max(ŷ)
   - If strategy == 'average', return mean(ŷ)
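The final strategy-selection step of the pseudocode is simple enough to sketch directly. The function name `select_estimate` and the toy values are hypothetical; only the three strategy names come from the pseudocode.

```python
import numpy as np

def select_estimate(y_hat, distances, strategy):
    """Combine the per-slice estimates y_hat into one risk value.

    'best'    -> estimate of the slice with the smallest distance
    'max'     -> most pessimistic estimate
    'average' -> mean of all estimates
    """
    y_hat = np.asarray(y_hat, dtype=float)
    distances = np.asarray(distances, dtype=float)
    if strategy == "best":
        return float(y_hat[np.argmin(distances)])
    if strategy == "max":
        return float(y_hat.max())
    if strategy == "average":
        return float(y_hat.mean())
    raise ValueError(f"unknown strategy: {strategy!r}")

# Toy per-slice estimates and distances (arbitrary values).
y_hat = [0.2, 0.8, 0.5, 0.1]
distances = [3.0, 1.0, 2.0, 4.0]
best = select_estimate(y_hat, distances, "best")
```

Here 'best' trusts the closest historical analogue, whereas 'max' is the conservative choice when a missed clogging is costlier than a false alarm.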
[0072] The notations used in this pseudocode example are as follows. The algorithm takes as input one (or more) history $X \in \mathbb{R}^{m,d}$, where m is the size of the history and d is the number of sensors. As mentioned previously, it also uses a history of the clogging risk score $Y \in \mathbb{R}^{m}$, which can be built from the history of clogging events. Next, a query $Q \in \mathbb{R}^{l,d}$ is considered, where l is the query size (with l < m), representing the data from which we wish to estimate the possible future occurrence of a clogging. In the most common case, Q contains the l latest data emitted by the system (the system being viewed as a set of sensors), in order to provide an evaluation at the present time.
[0073] The query Q typically contains data from the previous N years (for example, 1 or 2 years) to account for phenomena that are sometimes slow and sometimes rapid in their dynamics. In other situations, such as those related to different geographical areas, the query Q may include data spanning shorter periods (from a few months to a few days). To account for these phenomena, as explained previously, the proposed methodology is broken down into two steps:
[0074] 1. Calculation of the environmental profile between Q_ENV and X_ENV, which aims to find situations similar to Q in X in terms of environmental conditions (wind, water height, flow rate, etc.).
[0075] 2. Calculation of slice profiles between Q_TRA and X_TRA, which seeks to identify, within the similar environmental situations of step 1, similar situations in terms of internal operating conditions of the pumping station.
[0076] A very simplified example of calculating a profile between $X \in \mathbb{R}^{m,d}$ and $Q \in \mathbb{R}^{l,d}$ is illustrated in Figure 2. For easier understanding of the example in Figure 2, m = 10, l = 3 and d = 1. The operation is defined such that:
[0077] [Math 1]
$$P(Q, X) = \{p_1, \ldots, p_{m-l+1}\}$$
[0081] with $p_i = d(Q, W_i)$, where $d$ is a dissimilarity function between two vectors (e.g., the Euclidean distance) and $W_i = [x_i, \ldots, x_{i+l-1}] \in \mathbb{R}^{l,d}$ is the subsequence of size l starting at index i in X. A situation $W_i$ from the history is considered similar to Q if $p_i < T \cdot \min P(Q, X)$, with T a tolerance parameter.
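A distance profile of this kind can be sketched in a few lines. The example uses the dimensions quoted for Figure 2 (m = 10, l = 3, d = 1) but arbitrary values, since the figure's data is not reproduced in the text; the Euclidean distance is the dissimilarity suggested by the text, and the function names are hypothetical.

```python
import numpy as np

def distance_profile(q, x):
    """Return P(Q, X): the distance between Q and every window W_i of X."""
    q = np.asarray(q, dtype=float).reshape(len(q), -1)   # shape (l, d)
    x = np.asarray(x, dtype=float).reshape(len(x), -1)   # shape (m, d)
    l, m = q.shape[0], x.shape[0]
    return np.array([np.linalg.norm(q - x[i:i + l]) for i in range(m - l + 1)])

def similar_windows(profile, tolerance):
    """Indices i with p_i < T * min(P). With a strict inequality, T must be
    greater than 1 (and min(P) non-zero) for the best window to be kept."""
    return np.flatnonzero(profile < tolerance * profile.min())

# Arbitrary toy values with m = 10, l = 3, d = 1.
x = [1, 2, 3, 4, 5, 4, 3, 2, 1, 0]
q = [4, 5, 3.5]
profile = distance_profile(q, x)
closest = int(np.argmin(profile))          # 0-based index of the closest window
similar = similar_windows(profile, tolerance=3.5)
```

The profile has m - l + 1 = 8 entries, one per window position.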
[0082] A distance profile can also be calculated with a set of subqueries defined by time constraints $[t_1, \ldots, t_e]$ and a length $l' < l$, subject to the constraint $l' + \max_j t_j < l$. In this configuration, we define:
[0083] [Math 2]
$$p_i = \sum_{j=1}^{e} p_{i,j}$$
[0088] With $p_{i,j}$ the dissimilarity between the following sub-window and subquery:
[0089] [Math 3]
$$W_i^{t_j, l'} = [x_{i+t_j}, \ldots, x_{i+t_j+l'-1}] \in \mathbb{R}^{l',d}$$
$$Q^{t_j, l'} = [q_{t_j}, \ldots, q_{t_j+l'-1}] \in \mathbb{R}^{l',d}$$
[0093] In view of these explanations, in the example in Figure 2, the fourth window (W4) is the one that has the smallest distance from the query Q.
[0094] It is also possible to assign a weight $\lambda_j$ to each subquery to establish its importance in the similarity search, such that:
[0095] [Math 4]
$$p_i = \sum_{j=1}^{e} \lambda_j \, p_{i,j}$$
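The weighted subquery profile can be sketched as follows. Assumptions made for this illustration: offsets are 0-based, the dissimilarity is the Euclidean distance, and the per-subquery weights are passed as a plain list named `weights`; the function name `subquery_profile` and the toy data are hypothetical.

```python
import numpy as np

def subquery_profile(q, x, offsets, sub_len, weights):
    """Weighted sum, over subqueries j, of the distance between the piece of Q
    starting at offset t_j and the piece of window W_i at the same offset."""
    q = np.asarray(q, dtype=float)
    x = np.asarray(x, dtype=float)
    l, m = q.shape[0], x.shape[0]
    if not sub_len + max(offsets) < l:
        raise ValueError("constraint l' + max_j t_j < l is not satisfied")
    profile = np.zeros(m - l + 1)
    for t_j, lam_j in zip(offsets, weights):
        q_sub = q[t_j:t_j + sub_len]                 # subquery of length l'
        for i in range(m - l + 1):
            w_sub = x[i + t_j:i + t_j + sub_len]     # matching piece of W_i
            profile[i] += lam_j * np.linalg.norm(q_sub - w_sub)
    return profile

# Toy data: Q is the slice of X starting at index 2, so window 2 matches exactly.
x = np.arange(10.0)
q = x[2:8]
profile = subquery_profile(q, x, offsets=[0, 3], sub_len=2, weights=[1.0, 0.5])
```

Setting all weights to 1 gives the unweighted sum of subquery distances; a lower weight reduces the influence of the corresponding part of the query on the match.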
[0100] These profile calculations are applicable, as explained above, to the series X_TRA, X_ENV, Q_TRA and Q_ENV when this profile calculation methodology is preferred. Of course, the example in Figure 2 is not representative of reality. In practice, the series are much longer and can represent several thousand values, both for current environmental series and for historical environmental series, and this across several dimensions (multivariate). Processing such quantities of data is therefore not considered feasible mentally or manually, first because of the length of the calculations required, which would take so long that the result would likely be obtained after the clogging event has occurred, which is unacceptable, and second because of the risk of errors. It must be noted, however, that given the length of these series, even the computerized processing of the previously presented calculations may require optimizations, both to increase the speed of the calculations and to reduce the resources allocated to them. For these reasons, as mentioned earlier, computational optimizations can be implemented. Thus, to increase the efficiency of the proposed method, the following optimizations have been successfully implemented:
[0101] Sliding calculation of the mean and standard deviation: instead of recalculating the mean and standard deviation for each subsequence ($W_i$), sliding sums are used. This reduces the time complexity from $O(ml)$ to $O(m)$, where $m$ is the size of the time series and $l$ is the length of the subsequence ($W_i$);
[0102] More specifically, given a time series $X = \{x_1, \dots, x_m\}$ of size $m$ and a length parameter $l$, we wish to extract the mean and standard deviation of all subsequences $W_i$ such that $W_i = \{x_i, \dots, x_{i+l-1}\}$.
[0106] To avoid having to calculate $(\mu_i, \sigma_i)$ independently for each $W_i$, the inventors propose to keep two sliding sums, $S_i$ and $S_i^{(2)}$, such that:
[0107] [Math 5]
[0108] $$S_1 = \sum_{j=1}^{l} x_j, \qquad S_1^{(2)} = \sum_{j=1}^{l} x_j^2$$
[0112] so that
[0113] [Math 6]
[0114] $$S_i = S_{i-1} + x_{i+l-1} - x_{i-1}, \qquad S_i^{(2)} = S_{i-1}^{(2)} + x_{i+l-1}^2 - x_{i-1}^2$$
[0117] and
[0118] [Math 7]
[0119] $$\mu_i = \frac{S_i}{l}, \qquad \sigma_i = \sqrt{\frac{S_i^{(2)}}{l} - \mu_i^2}$$
[0122] This avoids recalculating the sums $S_i$ and $S_i^{(2)}$ from scratch for each $W_i$, because they share $l - 1$ values with the sums of $W_{i-1}$, thus significantly reducing the time complexity.
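As a minimal sketch of the sliding-sum technique described above, the following Python function (NumPy assumed) maintains a running sum and a running sum of squares and derives the mean and standard deviation of every window in a single pass. The function name `sliding_mean_std` and the clamping of small negative variances caused by rounding are choices made for this illustration only.

```python
import numpy as np


def sliding_mean_std(x, l):
    """Mean and (population) standard deviation of every length-l window of x.

    Uses two sliding sums so that each window costs O(1) after the first.
    """
    x = np.asarray(x, dtype=float)
    n_windows = x.size - l + 1
    mu = np.empty(n_windows)
    sigma = np.empty(n_windows)
    s = x[:l].sum()             # sum of the first window
    s2 = (x[:l] ** 2).sum()     # sum of squares of the first window
    for i in range(n_windows):
        if i > 0:
            # Add the value entering the window, remove the one leaving it.
            s += x[i + l - 1] - x[i - 1]
            s2 += x[i + l - 1] ** 2 - x[i - 1] ** 2
        mu[i] = s / l
        # Clamp at zero: rounding can make the variance slightly negative.
        sigma[i] = np.sqrt(max(s2 / l - mu[i] ** 2, 0.0))
    return mu, sigma
```

On very long series, accumulated floating-point error in the running sums may warrant periodically recomputing them from scratch; that refinement is omitted here.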
[0125] Euclidean distance optimization: The Euclidean distance is reformulated to avoid recalculating redundant operations. The sum of squares of the query elements (Q) is calculated only once. The sum of squares of the sliding window elements (W) is calculated using a sliding sum. Cross-correlation is calculated using the convolution theorem, resulting in significant performance gains for large query and sliding window sizes. More specifically, the Euclidean (squared) distance can be expressed as follows:
[0126] [Math 8]
[0127] $$\mathrm{dist}(Q, W) = \sum_{j=1}^{l} (q_j - w_j)^2 = \sum_{j=1}^{l} q_j^2 + \sum_{j=1}^{l} w_j^2 - 2 \sum_{j=1}^{l} q_j w_j$$
[0128] In the context of a distance profile $P(Q, X)$ between a series $X$ and a query $Q$, this way of expressing the operation avoids recalculating many operations when calculating distances with sliding windows (for example, $\mathrm{dist}(Q, W_i)$ and $\mathrm{dist}(Q, W_{i+1})$):
[0131] $\sum_{j=1}^{l} q_j^2$ only needs to be calculated once for all $\mathrm{dist}(Q, W_i)$,
[0133] $\sum_{j=1}^{l} w_j^2$ can be calculated using a sliding sum,
[0136] $\sum_{j=1}^{l} q_j w_j$ is an element of the cross-correlation $Q * X$. Using the convolution theorem, the cross-correlation can be calculated as a point-by-point multiplication in the frequency domain, allowing a significant performance gain on large time series.
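The three terms of this decomposition can be sketched as follows in Python with NumPy, for a univariate series. The function names `sliding_dot_product` and `euclidean_distance_profile` are hypothetical, and the sliding sum of squares is obtained here from a cumulative sum, which is one possible way of implementing it; for a multivariate series the same computation would be repeated per dimension and summed.

```python
import numpy as np


def sliding_dot_product(q, x):
    """Return c[i] = sum_j q[j] * x[i + j] for every window of x, via FFT."""
    q = np.asarray(q, dtype=float)
    x = np.asarray(x, dtype=float)
    l, m = q.size, x.size
    n = m + l - 1  # full linear-convolution length, avoids circular wrap-around
    # Cross-correlation equals convolution with the reversed query.
    full = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(q[::-1], n), n)
    return full[l - 1:m]  # keep only fully overlapping positions


def euclidean_distance_profile(q, x):
    """Squared Euclidean distance between q and every length-l window of x."""
    q = np.asarray(q, dtype=float)
    x = np.asarray(x, dtype=float)
    l = q.size
    qq = np.sum(q ** 2)                              # computed once
    csum = np.concatenate(([0.0], np.cumsum(x ** 2)))
    ww = csum[l:] - csum[:-l]                        # sliding sum of squares
    qw = sliding_dot_product(q, x)                   # cross-correlation term
    # Clamp at zero to absorb small negative values due to rounding.
    return np.maximum(qq + ww - 2.0 * qw, 0.0)
```

The FFT route costs on the order of $m \log m$ operations instead of $ml$ for the direct computation, which is where the gain on large series comes from.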
[0137] Cross-correlation for sliding queries: When calculating distance profiles for sliding queries, the cross-correlation is updated linearly from the previous correlation. This avoids recalculating the entire correlation for each new query.
[0138] Indeed, considering two series $X$, $Y$ of size $m$, when it is decided to calculate the distance profiles $P(X, Q_1), P(X, Q_2), \dots$ with $Q_i = \{y_i, \dots, y_{i+l-1}\}$ extracted from $Y$, a sliding window on $Y$ is used to obtain the queries $Q_i$. It is then possible to avoid recalculating the cross-correlation $Q_i * X$ for each new query.
[0139] Considering the cross-correlation $Q_{i-1} * X = \{c_1, \dots, c_{m-l+1}\}$ calculated for the query $Q_{i-1}$, the cross-correlation for the query $Q_i$ ($Q_i * X = \{c'_1, \dots, c'_{m-l+1}\}$) can be obtained in linear time from $Q_{i-1} * X$ such that:
[0144] [Math 9]
[0145] $$c'_j = c_{j-1} - y_{i-1} \, x_{j-1} + y_{i+l-1} \, x_{j+l-1}$$
[0146] Using this method, it is sufficient to recalculate the first element ($c'_1$) directly each time the query $Q$ is updated. Similarly, it is also possible to parallelize the calculation of the matrix profile of all distance profiles.
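The linear-time update for sliding queries can be illustrated by the following Python sketch (NumPy assumed). The helper names `dot_products` and `update_dot_products` are hypothetical, and indices are 0-based, so the query $Q_i$ corresponds to `y[i:i + l]`.

```python
import numpy as np


def dot_products(q, x):
    """Direct cross-correlation: one dot product per window (reference version)."""
    l = q.size
    return np.array([np.dot(q, x[j:j + l]) for j in range(x.size - l + 1)])


def update_dot_products(c_prev, x, y, i, l):
    """Derive the cross-correlation for query y[i:i+l] from that of y[i-1:i-1+l].

    c_prev holds the dot products of the previous query with every window of x.
    Requires i >= 1 and i + l <= len(y).
    """
    c_new = np.empty_like(c_prev)
    # For j >= 1: drop the product that left both windows, add the one that entered.
    c_new[1:] = c_prev[:-1] - y[i - 1] * x[:-l] + y[i + l - 1] * x[l:]
    # The first element has no predecessor and is recomputed directly.
    c_new[0] = np.dot(y[i:i + l], x[:l])
    return c_new
```

A typical use computes the first cross-correlation once (directly or by FFT) and then calls `update_dot_products` for each subsequent query position.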
[0147] Optimizations for standard normalized distance profiles: The standard normalized Euclidean distance can also be similarly reformulated to include mean and standard deviation terms. This allows the use of methods similar to those used for simple Euclidean distance optimizations. Thus, the distance can be reformulated as follows:
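[0148] [Math 10]
[0149] $$\mathrm{dist}(Q, W) = 2l \left( 1 - \frac{QW - l \, \mu_Q \, \mu_W}{l \, \sigma_Q \, \sigma_W} \right)$$
[0150] where $QW = \sum_{j=1}^{l} q_j w_j$, and $\mu_Q$, $\sigma_Q$, $\mu_W$, $\sigma_W$ are the means and standard deviations of $Q$ and $W$.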
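A possible sketch of the normalized distance profile based on this reformulation is given below in Python with NumPy. The function name `znorm_distance_profile` is hypothetical; for brevity the dot products are obtained with `np.correlate` rather than in the frequency domain, and the sketch assumes that neither the query nor any window is constant (non-zero standard deviations).

```python
import numpy as np


def znorm_distance_profile(q, x):
    """Squared normalized Euclidean distance between q and every window of x."""
    q = np.asarray(q, dtype=float)
    x = np.asarray(x, dtype=float)
    l = q.size
    mu_q, sigma_q = q.mean(), q.std()
    # Window means and standard deviations from cumulative sums.
    c1 = np.concatenate(([0.0], np.cumsum(x)))
    c2 = np.concatenate(([0.0], np.cumsum(x ** 2)))
    mu_w = (c1[l:] - c1[:-l]) / l
    sigma_w = np.sqrt(np.maximum((c2[l:] - c2[:-l]) / l - mu_w ** 2, 0.0))
    # Dot product of q with every window of x.
    qw = np.correlate(x, q, mode="valid")
    return 2.0 * l * (1.0 - (qw - l * mu_q * mu_w) / (l * sigma_q * sigma_w))
```

Because each window is normalized before comparison, a window that differs from the query only by a scale factor and an offset yields a distance close to zero.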
[0151] Further, more advanced optimizations are also possible using graphics processing units (GPUs) to perform the calculations. Optimizations for other distances, such as dynamic time warping, are also feasible and allow for a similar reduction in complexity to that achieved with Euclidean distance. Furthermore, time series indexing methods can be implemented in conjunction with, or as a replacement for, the methods described above.
[0152] Depending on the implementation, the historical data (X) and query data (Q) may also be incomplete, excessive, or insufficient. For example, data may be missing (due to inconsistent time intervals for data collection or faulty sensors). Therefore, preprocessing this data can be crucial in such cases. In industrial sites like those described earlier, there is a significant amount of sensor data. To avoid bias, the inventors selected data that is not influenced by the site's operating decisions or rules. For instance, the activation of washing pumps or changes in the speed of specific mechanisms (such as a filter drum) were excluded.
[0153] According to the present method, the retained variables (data) are completed, for example by linear interpolation when the number of successive missing data points is less than or equal to a predetermined threshold (e.g., a threshold of 3 missing values), and by interpolation via a regression model in other cases. For example, to predict the missing water level over certain intervals, it is possible to use water flow data and the distances from the moon and the sun (data that influence tides). Also, some data may exhibit discrete rather than continuous patterns over certain time intervals (possibly due to interpolation). A moving average can then be applied so that the method of the invention does not re-identify these intervals merely because of their discrete nature.
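The completion step can be sketched as follows in Python with NumPy. This is an illustration under stated assumptions: the function names `fill_gaps` and `moving_average`, the parameter `max_linear_gap`, and the choice of an ordinary least-squares linear model as the regression model are hypothetical and not specified by the description; the explanatory variables (for example water flow and the distances from the moon and the sun) are assumed to be fully observed.

```python
import numpy as np


def fill_gaps(target, features, max_linear_gap=3):
    """Fill NaN runs in target: short runs linearly, long runs by regression.

    target: shape (n,), NaN where missing; features: shape (n, k), no NaN.
    """
    target = np.asarray(target, dtype=float).copy()
    features = np.asarray(features, dtype=float)
    missing = np.isnan(target)
    if not missing.any():
        return target
    idx = np.arange(target.size)
    # Candidate 1: linear interpolation between observed neighbours.
    linear = np.interp(idx, idx[~missing], target[~missing])
    # Candidate 2: least-squares model with intercept, fitted on observed samples.
    A = np.column_stack([np.ones(target.size), features])
    coef, *_ = np.linalg.lstsq(A[~missing], target[~missing], rcond=None)
    regressed = A @ coef
    # Locate runs of consecutive missing values as [start, end) index pairs.
    edges = np.diff(np.concatenate(([0], missing.astype(int), [0])))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    for s, e in zip(starts, ends):
        source = linear if (e - s) <= max_linear_gap else regressed
        target[s:e] = source[s:e]
    return target


def moving_average(x, width):
    """Centered moving average; values near the edges are attenuated by zero padding."""
    return np.convolve(np.asarray(x, dtype=float), np.ones(width) / width, mode="same")
```

The threshold separating the two strategies is a tuning parameter; a longer gap gives linear interpolation less information to work with, which is why a model driven by correlated variables is used instead.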
[0154] In relation to Figure 3, the computerized device (100) for implementing the method for determining the risk of a clogging event occurring at a pumping station includes several interconnected components to ensure the efficient processing of environmental and operational data.
[0155] The device (100) includes a central processing unit (CPU) (110) that executes the computer program instructions. This CPU is connected to main memory (120) that stores the data and programs necessary for the device's operation. The main memory (120) may include random access memory (RAM) modules and non-volatile memory modules (ROM, SSD).
[0156] To accelerate intensive calculations, the device (100) is equipped with graphics coprocessors (GPUs) (130). These GPUs are used to perform parallel calculations, particularly for cross-correlation and convolution operations, thus reducing the processing time of multivariate time series.
[0157] The device (100) also includes interfaces for obtaining data representative of current environmental conditions (DRCE) from the sensing devices (CAP0, CAP1, ..., CAPx) (140). These sensors are, for example, connected to the device via communication interfaces (150) such as USB ports, network interfaces (Ethernet, WiFi), or wireless communication protocols (Bluetooth).
[0158] The captured data are organized into multivariate time series (Q) and stored in a database (160). This database can be hosted locally on a hard disk drive (HDD) or an SSD, or remotely on a cloud server accessible via a network connection.
[0159] The device (100) may also include a dedicated or non-dedicated data processing unit (170) that executes the similarity search and distance profile calculation algorithms. This processing unit may be integrated into the central unit (110) or be a separate module optimized for time series calculations.
Claims
1. Method for determining data representing a risk of occurrence of a clogging event of a pumping station of an industrial site, method implemented by means of an electronic device comprising a memory and a computing unit, method comprising at least one iteration of the following steps: obtaining (S01) at least one time series Q of current environmental conditions (DRCEQ) of the pumping station; determination (S02), within historical data in the form of at least one time series X of past environmental conditions (DRCEX), of at least a partial correspondence of at least one variable of the time series Q and a corresponding variable of said at least one time series X; when it is determined that at least a partial correspondence exists between said at least one variable of the time series Q and said at least one corresponding variable of said at least one time series X, obtaining (S03), using a time series Y of past clogging events of the pumping station, the risk data for the occurrence of a clogging event of the pumping station.
2. A determination method according to claim 1, characterized in that said at least one time series Q of current environmental conditions (DRCEQ) and said at least one time series X of past environmental conditions (DRCEX) each comprise at least one time series relating to the external environment ($Q_{ENV}$, $X_{ENV}$) of the industrial site and at least one time series relating to the internal environment ($Q_{TRA}$, $X_{TRA}$) of the pumping station and/or the industrial site.
3. A determination method according to claim 2, characterized in that said at least one time series relating to the internal environment ($Q_{TRA}$, $X_{TRA}$) of the pumping station includes variables that are independent of decisions and/or operating rules of the industrial site.
4. A determination method according to any one of claims 1 to 3, characterized in that the step of obtaining (S01) the time series Q of current environmental conditions (DRCEQ) comprises at least one step of completing missing values for at least some variables of the time series Q of current environmental conditions (DRCEQ).
5. A determination method according to claim 1, characterized in that the determination step (S02) comprises: a first search step, based on a first current time series $Q_{ENV}$ including at least one variable relating to the external environment of the industrial site, for at least a partial correspondence with a time window $W_{ENV_i}$ extracted from a first time series $X_{ENV}$ including at least one corresponding variable, the size of the time window $W_{ENV_i}$ being at most equal to the size of the first time series $Q_{ENV}$.
6. A determination method according to claim 5, characterized in that when at least a partial match is found during the first search step, a second search step is performed, based on a second current time series $Q_{TRA}$ including at least one variable relating to the internal environment of the industrial site, for at least a partial correspondence with a time window $W_{TRA_i}$ extracted from a second time series $X_{TRA}$ including at least one corresponding variable, the size of the time window $W_{TRA_i}$ being at most equal to the size of the second time series $Q_{TRA}$.
7. A determination method according to claim 5 or 6, characterized in that the first step of searching for at least a partial match is implemented for a set of time windows ($W_{ENV_1}$, ..., $W_{ENV_{i-m+1}}$).
8. A determination method according to any one of claims 1 to 7, characterized in that the at least partial correspondence between two time series is obtained by calculating a minimum distance separating the values of the different variables which correspond within the two time series.
9. Device for determining data representing a risk of occurrence of a clogging event at a pumping station, device comprising a memory and a computing unit, said computing unit being configured to execute at least one iteration of the following steps: obtaining at least one time series Q of current environmental conditions (DRCEQ) of the pumping station; determination, within historical data in the form of at least one time series X of past environmental conditions (DRCEX), of at least a partial correspondence between at least one variable of the time series Q and a corresponding variable of said at least one time series X; when it is determined that at least a partial correspondence exists between said at least one variable of the time series Q and said at least one corresponding variable of said at least one time series X, obtaining, using a time series Y of past clogging events of the pumping station, the risk data for the occurrence of a clogging event of the pumping station.
10. Computer program comprising instructions for the implementation of a method according to any one of claims 1 to 8, when said instructions are executed by a processor of a computer processing circuit.