System and computer-implemented method for detecting non-connected WI-FI devices with random mac address in high-density environments
The system addresses the challenge of counting devices with random MAC addresses in high-density environments by using post-processing techniques to analyze probe request cadences and movement patterns, providing accurate and reliable device counting in large-scale events.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- GALGUS GLOBAL SRL
- Filing Date
- 2025-01-10
- Publication Date
- 2026-06-18
AI Technical Summary
Existing Wi-Fi analytics systems struggle to accurately count and track devices with random MAC addresses, especially in high-density environments, due to MAC randomization and the inability to distinguish between similar fingerprints, leading to unreliable data and inaccurate device counts.
A system and method that uses post-processing of probe request data from multiple access points to determine the number of non-connected Wi-Fi devices with random MAC addresses by analyzing probe request cadences, movement patterns, and rarity of fingerprints, allowing for robust device counting in large-scale events and crowded spaces.
Enables accurate and reliable device counting in high-density environments over extended time windows, improving the precision of Wi-Fi device detection by accounting for anomalies and movement patterns, even when devices with similar fingerprints are present.
Abstract
Description
[0001] SYSTEM AND COMPUTER-IMPLEMENTED METHOD FOR DETECTING NON-CONNECTED WI-FI DEVICES WITH RANDOM MAC ADDRESS IN HIGH-DENSITY ENVIRONMENTS
[0002] DESCRIPTION
[0003] FIELD
[0004] The present invention relates to systems and methods for detecting Wi-Fi devices that employ random MAC address and are not connected to access points. More particularly, the present invention estimates the number of these Wi-Fi devices (non-associated and using random MAC) that are present in overcrowded environments.
[0005] BACKGROUND
[0006] Wi-Fi, based on the IEEE 802.11 standard, is one of the most pervasive technologies in today’s society. There are more than 14 billion Wi-Fi devices in the world, and this trend is only increasing. In recent years, analytics performed on Wi-Fi data have enabled passive device counting, segmentation according to visiting patterns, user density maps, and the location and tracking of Wi-Fi devices, among other services.
[0007] Wi-Fi devices broadcast wireless signals with a certain cadence that can be captured by nearby access points (APs). The access points listen to these signals and report the RSSI (Received Signal Strength Indicator), which indicates the received power. These signals, in the form of management frames called "Probe Request", are used by devices (smartphones, tablets, laptops, etc.) to discover new Wi-Fi networks nearby, as well as to search for those they already know. In these messages, devices announce their capabilities, what they expect to find on nearby networks, the speeds they support, etc. In addition, the probe request frames include an identifier of the sending device, the source MAC address.
[0008] A Wi-Fi analytics system can gather the probe requests using the access point’s radios, and then build statistics with this data (e.g. device identification for counting or tracking). Unfortunately, there are several problems that cause conventional Wi-Fi analytics systems to have very low accuracies and significant distortion in the data, rendering the metrics that are built with them useless. The two fundamental problems are: • Each device, depending on numerous variables, broadcasts frames at a different cadence, and with a different burst pattern. Among these uncontrolled variables are the manufacturer and model of the device, the operating system version, the remaining battery, how close it is to an access point, whether it is in use or not, whether the screen is off or not, and so on.
[0009] • Modern devices falsify their source MAC address, precisely to confuse Wi-Fi analytics systems so that they are unable to count, track, etc. More than 90% of modern smartphones, tablets and laptops exhibit this behaviour. Wi-Fi devices randomly change their unique identifier, making their identification much more difficult. This phenomenon is known as MAC randomization and has a critical impact on the analytics that can be done with Wi-Fi.
[0010] Faced with these problems, most Wi-Fi analytics systems choose to either perform analytics only (i) on all MAC addresses, regardless of whether they are random or not, or (ii) on the small percentage of devices that do not randomise their MAC address. In the first case, as many devices are continuously changing their MAC, the system could be counting 1200 devices when there are actually 90, for instance. In the second case, the system is throwing away the vast majority of information (for instance, it could be counting 5 devices when there are actually 90), since these devices with real MAC address are usually old and therefore not representative of the current corpus of commercial devices.
[0011] Neither of these two options provides robust and reliable data against the MAC randomization problem. And this behaviour will only increase in the coming years until devices with static (real) MACs disappear. There is a need for a technology capable of building stable and robust identifiers of the devices around it, as MAC addresses are no longer reliable for applications such as counting, location, tracking, segmentation, etc.
[0012] Patent document WO2021104657-A1 solves this problem by performing a cluster analysis on a time series of the sequence numbers included in the header of the probe request frames. The invention works well when Wi-Fi devices use the sequence numbers of the probe request frames sequentially (e.g. sequence numbers: 256, 257, 258, etc.). However, many Wi-Fi devices currently send random sequence numbers (e.g. sequence numbers: 256, 122, 3, etc.), and for these Wi-Fi devices the disclosed cluster analysis would not work properly. Patent document WO2023241815-A1 solves this problem by unambiguously identifying Wi-Fi devices that are not connected to a Wi-Fi network and employ MAC address randomization, even when they use random sequence numbers in their probe request frames. The invention disclosed therein generates a fingerprint of each Wi-Fi device and performs, for each fingerprint, an N-dimensional cluster analysis on the averaged RSSI measurements received by each access point, obtaining at least one cluster per fingerprint and identifying a Wi-Fi device for each different cluster.
[0013] However, when facing high-density environments with tens or hundreds of thousands of users in an area covered by multiple access points, the N-dimensional cluster analysis of patent document WO2023241815-A1 is not prepared to perform a count of user devices for the following reasons:
a) It requires that several access points receive each probe request simultaneously. It is designed to work in small areas (offices, classrooms, coffee shops, theaters, supermarkets, etc.) wherein the access points are very close to each other and multiple access points receive each probe request (these access points are used to separate the RSSI measurements of the probe requests by clustering in the signal space). However, in large events (an entire town, Holy Week processions, fairs, etc.) hundreds or even thousands of access points would be required to separate the user devices by clustering, which is not economically viable.
b) It works well with low density of user devices (tens, hundreds), but when there are many more people together (thousands of people simultaneously next to an access point), it cannot distinguish the different clusters in the signal space, since many people could appear with very similar signals.
c) It counts devices in a very short term (1 or 2-minute time windows), but it does not work for longer time windows (of hours, days or weeks), since it is not prepared to take into account whether a fingerprint that disappears and is later seen again is the same user device or corresponds to a different user device. It works for instantaneous counts but not for long-term counts.
[0014] Therefore, the invention disclosed in WO2023241815-A1 performs instant counts in small areas with high density of access points (usually rooms, interiors or buildings) and low density of people. The present invention solves these problems, providing robust and reliable device count in large-scale events and overcrowded spaces.
[0015] SUMMARY
[0016] The present invention refers to a system and a computer-implemented method for detecting non-connected Wi-Fi devices with random MAC address in high-density environments that solves the aforementioned problems.
[0017] Unlike the prior art, the present invention is designed to cover large areas (entire cities, towns, fairgrounds, etc.), where user devices may disappear and be later detected as the users move. The access points are preferably arranged far from each other so that they do not receive the same probe requests simultaneously and with a small number of access points a lot of ground can be covered. In addition, the invention is also designed to work in large and crowded events where there are a lot of people together. Moreover, the invention can perform device counts on wide time windows (hours, days).
[0018] The invention addresses the analysis of non-connected Wi-Fi devices, mainly smartphones, adding a series of post-processing steps to provide more robust and reliable device count in large-scale events, where fingerprinting alone falls short. Specifically, it relies on post-processing of analytical data from non-connected WiFi devices (probe request signals) to improve the calculated device count in high density environments (tens or hundreds of thousands of users) or over extended time windows (hours or days). The invention aims to solve the masking problem produced in large crowds, wherein several individuals are likely to own identical devices (brand, model, OS version, installed applications, etc.), which means that a single fingerprint can mask several devices belonging to different people. The post-processing involves statistical analysis based on reasonable assumptions about anomalies in probe request cadences, movement patterns of individuals near access points, the rarity or exclusivity of their smartphones, or the length of the observation window.
[0019] The computer-implemented method comprises the following steps performed by a detection unit:
[0020] - Receiving a probe request dataset including a timestamp and a fingerprint of each probe request sent by Wi-Fi devices with random MAC address and captured by a plurality of access points during an interrogation time. The plurality of access points may be arranged, for instance, at an installation (e.g. a shopping centre, a school, a beach).
[0021] - Computing, for each fingerprint and access point, a number of probe requests captured during a time window of the interrogation time, thereby obtaining a cadence of probe requests.
[0022] - Obtaining, from the probe request dataset and for each fingerprint, an average cadence of probe requests, the average cadence being an average number of probe requests per time period. The average cadence of probe requests may be obtained from the probe requests captured during the interrogation time.
[0023] - Computing, for each fingerprint and access point, a ratio between the corresponding cadence of probe requests and the corresponding average cadence of probe requests, thereby obtaining a cadence ratio.
[0024] - Determining, for each fingerprint, an access point with the highest cadence ratio.
[0025] - Determining, for each fingerprint, a first number of Wi-Fi devices, the first number of Wi-Fi devices being the highest cadence ratio.
[0026] - Computing, for each fingerprint, a second number of Wi-Fi devices, the second number of Wi-Fi devices being a dragging coefficient multiplied by a sum of the cadence ratio of each access point different from the access point with the highest cadence ratio.
[0027] - Obtaining, for each fingerprint, an aggregate number of Wi-Fi devices associated to the fingerprint as a sum of the first number and second number of Wi-Fi devices.
[0028] - Determining a number of non-connected Wi-Fi devices with random MAC address during the time window as a sum of the aggregate number of Wi-Fi devices of each fingerprint.
[0029] The method may further comprise receiving, by the plurality of access points, probe requests sent by non-connected Wi-Fi devices with random MAC address; generating a fingerprint associated to each probe request; and sending probe request data including the generated fingerprint and a timestamp associated to each probe request to the detection unit. The probe request data may include additional information, such as the MAC address and an RSSI measurement of the probe request and an identifier of the access point.
[0030] In an embodiment, the dragging coefficient is a function of the duration of the time window, preferably decreasing with the duration of the time window. In an embodiment, the dragging coefficient is $d = e^{-T/a}$, T being the time window and a being a positive real number.
[0031] In an embodiment, the method further comprises retrieving, from a database, a fingerprint rarity coefficient associated to each fingerprint in the time window, the fingerprint rarity coefficient being representative of a degree of uniqueness of the fingerprint, wherein the dragging coefficient is a function of the fingerprint rarity coefficient. The dragging coefficient preferably decreases with the degree of uniqueness of the fingerprint.
[0032] In an embodiment, the dragging coefficient is $d_j = e^{-T/(a\,p_j)}$, T being the time window, a being a positive real number, and $p_j$ being defined as $p_j = 10^{[\ldots]}$, $R_j$ being the fingerprint rarity coefficient.
[0034] The dragging coefficient is preferably comprised within the range [0,1]. In an embodiment, the dragging coefficient is 1.
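As a rough illustration of a dragging coefficient that decreases with the duration of the time window and stays within the range [0,1], the sketch below assumes an exponential decay of the form d = exp(-T/a). The functional form as coded, the function name `dragging_coefficient`, the choice of minutes as the unit and the sample value of a are assumptions made for illustration only.

```python
import math


def dragging_coefficient(window_minutes: float, a: float) -> float:
    """Assumed exponential-decay form: d = exp(-T / a), with a > 0."""
    if a <= 0:
        raise ValueError("a must be a positive real number")
    if window_minutes < 0:
        raise ValueError("the time window cannot be negative")
    return math.exp(-window_minutes / a)


# d equals 1 for a zero-length window and shrinks as the window grows.
for minutes in (0, 10, 60):
    print(minutes, round(dragging_coefficient(minutes, a=30.0), 3))
```

Under this assumption, a larger value of a makes the coefficient decay more slowly, so contributions from secondary access points are retained for longer windows.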
[0035] Another aspect of the present invention refers to a system for detecting non-connected Wi-Fi devices with random MAC address in high-density environments. The system comprises a detection unit with a memory and a processing unit configured to perform the steps of the method. The system may further comprise the plurality of access points, each access point being configured to receive probe requests sent by non-connected Wi-Fi devices with random MAC address; generate a fingerprint associated to each probe request; and send the generated fingerprint and a timestamp associated to each probe request to the detection unit.
[0036] A further aspect of the present invention refers to a non-transitory computer-readable medium for detecting non-connected Wi-Fi devices with random MAC address in high-density environments, comprising executable programming instructions stored thereon that, when executed by a processor, cause the processor to carry out the steps of the method.
[0037] The present invention allows long-term device counts (although it can also perform instantaneous counts) and is prepared to work with low density of access points where the access points are remote to each other (such as towns, cities, fairs, beaches) and with high density of people (although it also works when there are few people).
[0038] BRIEF DESCRIPTION OF THE DRAWINGS
[0039] A series of drawings which aid in better understanding the invention and which are expressly related with an embodiment of the said invention, presented as a non-limiting example thereof, are very briefly described below.
[0040] Figure 1 represents an embodiment of a system for detecting non-connected Wi-Fi devices with random MAC address in high-density environments.
[0041] Figure 2 depicts another embodiment of the system, including only the detection unit.
[0042] Figure 3 represents a flow diagram of a computer-implemented method of detecting non-connected Wi-Fi devices with random MAC address in high-density environments.
[0043] Figure 4 shows the internal data processing performed by the processing unit according to an embodiment.
[0044] Figure 5 is a table showing the (partial) content of an exemplary probe request dataset.
[0045] Figure 6 is the table of Figure 5 additionally including the cadences obtained for the captured probe requests.
[0046] Figure 7 is a table including the cadence ratio of each fingerprint (rows) and access point (columns).
[0047] Figure 8 is the table of Figure 7 with additional information.
[0048] Figure 9 is a table similar to that of Figure 8, obtained for a longer time window.
Figure 10 shows the relationship between the width of the analysis window and the dragging coefficient for different values of the parameter a.
[0049] Figure 11 is a table showing the total contributions, for each fingerprint, in the number of Wi-Fi devices, when considering the dragging coefficient.
[0050] Figure 12 is a table similar to that of Figure 11, obtained for a longer time window.
[0051] Figure 13 is a table similar to that of Figure 11, obtained for a higher value of the parameter a.
[0052] Figure 14 shows an exemplary database containing fingerprint rarity coefficients for each fingerprint.
[0053] Figure 15 shows a table similar to that of Figure 9, with an added column including the fingerprint rarity coefficient of each fingerprint.
[0054] Figure 16 is a graph showing the dependence of the rarity weight with respect to the fingerprint rarity coefficient.
[0055] Figure 17 shows the relationship between the width of the analysis window, the rarity weight and the dragging coefficient, for different values of the parameter a.
[0056] Figure 18 is a table showing the contributions that are affected by the values of the rarity weight and the dragging coefficient.
[0057] DETAILED DESCRIPTION
[0058] Figure 1 depicts an exemplary embodiment of a system 1 for detecting non-connected Wi-Fi devices 2 with random MAC address in high-density environments with hundreds or thousands of user Wi-Fi devices 2, such as smartphones or laptops, a few of which are shown in the figure for illustrative purpose. The Wi-Fi devices 2 to be detected are not yet associated or connected to an access point, and are periodically (e.g., every few seconds) sending probe requests 3 using a random MAC address instead of the real MAC address of the device. The access points are preferably separated by a distance such that they never simultaneously receive the same probe request 3 (their coverage areas 12 do not overlap).
[0059] In this embodiment, the system 1 comprises a detection unit 4 and a plurality of access points 5 (AP1, AP2, ..., APN) which receive the probe requests 3 of the Wi-Fi devices 2 with random MAC address that are within range. The access points 5 may also capture probe requests 3 from other Wi-Fi devices transmitting their real MAC address, but the unambiguous detection of these devices is straightforward and will not be considered in this embodiment. The access points 5 are configured to generate a fingerprint associated to each acquired probe request 3 and send probe request data 6, including the generated fingerprint and a timestamp associated to each acquired probe request 3, to the detection unit 4. The probe request data 6 is preferably sent periodically (e.g. each minute, hour or day), although it may be sent continuously as soon as a predetermined number of probe requests 3 (e.g., 1 or 1000) are processed.
[0060] The detection unit 4 (e.g. a computer or a server) includes a memory 7, which stores on a probe request database 8 the probe request data 6 received from the different access points 5, and a processing unit 9 (e.g. a processor) configured to obtain, by accessing the probe request database 8, a probe request dataset 10 including probe request data 6 having a timestamp comprised within an interrogation time. For instance, the probe request database 8 may include probe request data 6 acquired during several days, whereas the probe request dataset 10 may include probe request data 6 with a timestamp corresponding to a specific date and/or hour (i.e. the interrogation time). The interrogation time may be such that the query includes the whole content of the probe request database 8. The processing unit 9 is configured to determine, by analysing the probe request dataset 10, a number 11 of Wi-Fi devices 2 which are not connected to any access point and have random MAC address during a particular time window T of the interrogation time, the time window T being the span of time during which the count of Wi-Fi devices is performed.
[0061] Figure 2 depicts another embodiment of the system 1, which in this case only comprises the detection unit 4. The probe request database 8, which is stored on a storage device 13 external to the system 1, such as a hard drive, includes P instances corresponding to timestamps 14 {t0, t1, ..., tP}, ordered by increasing date and time, each instance further including a fingerprint 15 associated to the corresponding probe request 3, an access point identifier 16 (AP_id, which identifies the access point 5 that captured the corresponding probe request 3) and, optionally, additional information (depicted with dashed lines) such as a MAC address 17 and an RSSI measurement 18 (a signal strength indicator) of the corresponding probe request 3. The processing unit 9 retrieves from the probe request database 8 the probe request data 6 having a timestamp 14 comprised within an interrogation time Δt.
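The record layout and the retrieval by interrogation time described above can be sketched as follows. This is only an illustrative sketch: the class name `ProbeRecord`, the field names, the function `select_interrogation_time` and the half-open interval convention are assumptions, and the sample values in the usage lines are made up.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ProbeRecord:
    timestamp: datetime               # timestamp 14
    fingerprint: str                  # fingerprint 15
    ap_id: int                        # access point identifier 16
    mac: Optional[str] = None         # optional MAC address 17
    rssi_dbm: Optional[float] = None  # optional RSSI measurement 18


def select_interrogation_time(records: Iterable[ProbeRecord],
                              start: datetime,
                              end: datetime) -> List[ProbeRecord]:
    """Return the records with start <= timestamp < end, ordered by time."""
    selected = [r for r in records if start <= r.timestamp < end]
    return sorted(selected, key=lambda r: r.timestamp)


# Made-up records: two on the queried day, one on the following day.
database = [
    ProbeRecord(datetime(2024, 1, 1, 14, 11, 23), "fp-b", 25),
    ProbeRecord(datetime(2024, 1, 1, 14, 11, 14), "fp-a", 8),
    ProbeRecord(datetime(2024, 1, 2, 9, 0, 0), "fp-a", 8),
]
dataset = select_interrogation_time(
    database, datetime(2024, 1, 1), datetime(2024, 1, 2))
print(len(dataset))  # 2
```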
[0062] The retrieved probe request data 6 forms the probe request dataset 10 that will be analysed by the processing unit 9, following the process depicted in Figure 3, to determine the number 11 of Wi-Fi devices 2, which are non-connected and with random MAC address, during a specific time window T of the interrogation time Δt, wherein T < Δt. Figure 3 is a flow diagram of a computer-implemented method of detecting non-connected Wi-Fi devices with random MAC address in high-density environments according to an embodiment. The method 100 comprises the following steps:
[0063] - Receiving 102 a probe request dataset 10 including a timestamp 14 and a fingerprint 15 of each probe request 3 sent by Wi-Fi devices 2 with random MAC address and captured by a plurality of access points 5 during an interrogation time Δt.
[0064] - Computing 104, for each fingerprint 15 and access point 5, a number of probe requests captured during a time window T of the interrogation time Δt, thereby obtaining a cadence 105 of probe requests.
[0065] - Obtaining 106, from the probe request dataset 10 and for each fingerprint 15, an average cadence 107 of probe requests, the average cadence 107 being an average number of probe requests per time period.
[0066] - Computing 108, for each fingerprint 15 and access point 5, a ratio between the corresponding cadence 105 of probe requests and the corresponding average cadence 107 of probe requests, thereby obtaining a cadence ratio 109.
[0067] - Determining 110, for each fingerprint 15, an access point 5 with the highest cadence ratio 109.
[0068] - Determining 112, for each fingerprint 15, a first number 113 of Wi-Fi devices, the first number 113 of Wi-Fi devices being the highest cadence ratio 109.
[0069] - Computing 114, for each fingerprint 15, a second number 115 of Wi-Fi devices, the second number 115 of Wi-Fi devices being a dragging coefficient d multiplied by a sum of the cadence ratio 109 of each access point different from the access point m_j with the highest cadence ratio 109.
[0070] - Obtaining 116, for each fingerprint 15, an aggregate number 117 of Wi-Fi devices associated to the fingerprint 15 as a sum of the first number 113 and second number 115 of Wi-Fi devices.
[0071] - Determining 118 a number 11 of non-connected Wi-Fi devices 2 with random MAC address during the time window T as a sum of the aggregate number 117 of Wi-Fi devices of each fingerprint 15.
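The steps 102 to 118 listed above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the claimed implementation: the function name `count_devices`, the tuple layout of the probe requests, cadences expressed per minute, and an average cadence supplied as a precomputed dictionary are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def count_devices(probes, window, avg_cadence, d):
    """Estimate the number of devices in a time window.

    probes: iterable of (timestamp_s, fingerprint, ap_id) tuples (step 102).
    window: (start_s, end_s), the time window T in seconds, half-open.
    avg_cadence: dict fingerprint -> average probe requests per minute
                 (step 106, assumed to be precomputed from the dataset).
    d: dragging coefficient, expected within [0, 1].
    """
    start, end = window
    minutes = (end - start) / 60.0
    # Step 104: probe requests per fingerprint and access point within T.
    counts = defaultdict(lambda: defaultdict(int))
    for ts, fingerprint, ap_id in probes:
        if start <= ts < end:
            counts[fingerprint][ap_id] += 1
    per_fingerprint = {}
    for fingerprint, by_ap in counts.items():
        # Step 108: cadence ratio = cadence / average cadence.
        ratios = {ap: (n / minutes) / avg_cadence[fingerprint]
                  for ap, n in by_ap.items()}
        # Steps 110 and 112: access point with the highest ratio.
        best_ap = max(ratios, key=ratios.get)
        first = ratios[best_ap]
        # Step 114: dragged contribution of the remaining access points.
        second = d * sum(r for ap, r in ratios.items() if ap != best_ap)
        # Step 116: aggregate number for this fingerprint.
        per_fingerprint[fingerprint] = first + second
    # Step 118: total over all fingerprints.
    return sum(per_fingerprint.values()), per_fingerprint


# Made-up data: 42 probes at AP2 and 10 at AP3 within a 5-minute window.
probes = ([(i * 7, "J45", "AP2") for i in range(42)]
          + [(i * 30, "J45", "AP3") for i in range(10)])
total, detail = count_devices(probes, (0, 300), {"J45": 2.0}, d=0.5)
print(round(total, 2))  # 4.7
```

In the usage lines, the cadence ratio is 4.2 at AP2 and 1.0 at AP3, so the fingerprint contributes 4.2 + 0.5 × 1.0 = 4.7 devices; with d = 1 the same data would yield 5.2.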
[0072] Figure 4 depicts a block diagram of the probe request data processing performed by the processing unit 9, according to an embodiment, that unmasks how many Wi-Fi devices 2 are hiding under the same fingerprint 15 in high user density Wi-Fi environments. The processing unit 9 receives the probe request dataset 10 including the probe request data 6 transmitted by the access points 5. The probe requests coming from the 5 GHz band are kept, whereas the probe requests from the 2.4 GHz band are discarded (or not even acquired) to avoid duplications in the device counts. The 5 GHz band is preferred over the 2.4 GHz band since the former contains more parameters (information elements) to generate the fingerprints.
[0073] During a preprocessing 20 of the received data, the processing unit 9 applies burst aggregation 21, aggregating the bursts of probe requests (similar probe requests that only vary in RSSI and arrive within a short time, e.g., in less than 5 seconds) into one. That is, if it receives three of the same probe requests almost simultaneously, they are combined into one (with an average RSSI, and keeping one of the probe timestamps or using an average timestamp). Optionally, thresholding 22 is applied to the probe requests, such that those having an RSSI measurement 18 lower than a desired threshold (for example, -95 dBm) are filtered out. This way, probe requests from very distant devices are removed (such devices are not located in the vicinity of the access point that captured the probe request).
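The preprocessing described above (burst aggregation followed by optional thresholding) might be sketched as follows. The function names, the dictionary keys, the rule that a burst is extended while consecutive probes are less than 5 seconds apart, and the choice of keeping the first timestamp of each burst are assumptions made for this illustration.

```python
def aggregate_bursts(probes, max_gap_s=5.0):
    """Merge probes with the same fingerprint and access point that arrive
    less than max_gap_s after the previous probe of the same burst."""
    ordered = sorted(probes,
                     key=lambda p: (p["fingerprint"], p["ap_id"], p["ts"]))
    bursts = []
    for p in ordered:
        last = bursts[-1] if bursts else None
        if (last is not None
                and last["fingerprint"] == p["fingerprint"]
                and last["ap_id"] == p["ap_id"]
                and p["ts"] - last["last_ts"] < max_gap_s):
            last["rssi_values"].append(p["rssi"])
            last["last_ts"] = p["ts"]
        else:
            bursts.append({"fingerprint": p["fingerprint"],
                           "ap_id": p["ap_id"],
                           "ts": p["ts"],        # first timestamp is kept
                           "last_ts": p["ts"],
                           "rssi_values": [p["rssi"]]})
    # One record per burst, carrying the average RSSI of its probes.
    return [{"fingerprint": b["fingerprint"], "ap_id": b["ap_id"],
             "ts": b["ts"],
             "rssi": sum(b["rssi_values"]) / len(b["rssi_values"])}
            for b in bursts]


def apply_threshold(probes, min_rssi_dbm=-95.0):
    """Drop probes whose RSSI is lower than the threshold."""
    return [p for p in probes if p["rssi"] >= min_rssi_dbm]


# Made-up data: a burst of three probes plus one weak, isolated probe.
raw = [
    {"ts": 0.0, "fingerprint": "fp-a", "ap_id": 25, "rssi": -60.0},
    {"ts": 0.4, "fingerprint": "fp-a", "ap_id": 25, "rssi": -62.0},
    {"ts": 0.9, "fingerprint": "fp-a", "ap_id": 25, "rssi": -64.0},
    {"ts": 20.0, "fingerprint": "fp-a", "ap_id": 25, "rssi": -97.0},
]
clean = apply_threshold(aggregate_bursts(raw))
print(len(clean), clean[0]["rssi"])  # 1 -62.0
```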
[0074] An exemplary table showing 59 instances of probe requests contained in a probe request dataset 10 is shown in Figure 5 (the probe request dataset 10 includes many more probe requests; only the probe requests corresponding to a few seconds, with timestamps from 14:11:14 to 14:11:23, are represented for illustrative purposes), wherein each row corresponds to a different probe request and the columns include the following fields: “id” is the index of the probe request in the table; “datetime” contains the date and time (up to millisecond resolution) of capture; “fingerprint” is the identifier built with the characteristics received in the probe request; “mac_probe” is the MAC address of the Wi-Fi device 2 that sent the probe request; “mac_type” determines whether it is a random or real MAC; “ap_id” is the identifier of the access point 5 that acquired the probe request; “rssi” is the received signal power which represents how close the Wi-Fi device 2 is with regard to the access point 5.
[0075] Assuming that in an embodiment the probe request dataset 10 may contain probe requests 3 from Wi-Fi devices 2 with random MAC address (i.e. showing a false MAC address in the probe requests) and Wi-Fi devices without random MAC address (i.e. showing always the real MAC address of the device), during a basic processing 23 of the preprocessed data, the probe requests are split 24 into two dataframes or tables:
[0076] - A first table for those probe requests belonging to Wi-Fi devices 2 that randomize their MAC (random MAC 25). They are easily identified because the second character of their MAC address field is one of these four: 2 / 6 / A / E. These devices are usually modern smartphones or tablets, and their identifier will be the fingerprint, instead of the MAC address. Their MAC addresses are irrelevant, as they are fake, random (“mac_type” = random).
[0077] - A second table for those probe requests belonging to Wi-Fi devices that do not randomize their MAC address (real MAC 26). That is, they do not have 2 / 6 / A / E in the second character of the MAC address. These devices are usually old or very low-end smartphones, or fixed devices (PCs, cameras, IoT sensors and actuators, etc). Their identifier is the MAC address. Their fingerprint is irrelevant, because the MAC address is reliable (“mac_type” = real). There are notably fewer probe requests with real MAC than probe requests with random MAC. In the example, only 6 out of the total 59 probe requests display the real MAC.
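The split into the two tables can be sketched as below. The helper names `is_random_mac` and `split_by_mac_type` are hypothetical, the field names follow the columns of the exemplary table, and the MAC addresses in the usage lines are made up.

```python
def is_random_mac(mac: str) -> bool:
    """True when the second hexadecimal character is 2, 6, A or E, i.e.
    the locally administered bit is set on a unicast address."""
    return len(mac) >= 2 and mac[1].upper() in ("2", "6", "A", "E")


def split_by_mac_type(probes):
    """Return (random_mac_table, real_mac_table) as two lists."""
    random_mac, real_mac = [], []
    for p in probes:
        target = random_mac if is_random_mac(p["mac_probe"]) else real_mac
        target.append(p)
    return random_mac, real_mac


# Made-up probe requests: two randomized MACs sharing one fingerprint
# and one real MAC.
probes = [
    {"mac_probe": "da:a1:19:4f:22:01", "fingerprint": "fp-a"},
    {"mac_probe": "5e:10:33:aa:bb:cc", "fingerprint": "fp-a"},
    {"mac_probe": "3c:5a:b4:12:34:56", "fingerprint": "fp-z"},
]
random_table, real_table = split_by_mac_type(probes)
# Random-MAC devices are identified by fingerprint, real ones by MAC.
print(len({p["fingerprint"] for p in random_table}))  # 1
print(len({p["mac_probe"] for p in real_table}))      # 1
```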
[0078] A basic estimation may be performed during the basic processing 23 to provide a count of the unique devices in both dataframes (first table and second table).
[0079] First, the unique fingerprints of the probe requests within the first table (random MAC) are counted. However, we are not sure that each different fingerprint corresponds to a different Wi-Fi device. In the example, 23 different fingerprints were counted. The fingerprints may also be counted by access point, the result being:
[0080] - “ap_id”=8: 2 unique fingerprints.
[0081] - “ap_id”=18: 3 unique fingerprints.
[0082] - “ap_id”=25: 8 unique fingerprints.
[0083] - “ap_id”=26: 9 unique fingerprints.
[0084] - “ap_id”=28: 13 unique fingerprints.
[0085] Secondly, the unique MAC addresses of the probe requests within the second table (real MAC) are counted. In this case, we do have the guarantee that each different MAC address corresponds to a different Wi-Fi device. Therefore, this count will not need to be post-processed. In the example, 6 unique MACs were counted. If the count is performed by access point, the result would be:
[0086] - “ap_id”=8: 0 unique MACs.
[0087] - “ap_id”=18: 2 unique MACs.
[0088] - “ap_id”=25: 3 unique MACs.
[0089] - “ap_id”=26: 1 unique MAC.
[0090] - “ap_id”=28: 0 unique MACs.
[0091] Normally the count of unique fingerprints (first table) will be significantly higher than the count of unique MAC address (second table), because as the years go by devices with real MAC are no longer used.
[0092] Both counts may be summed, providing a first (conservative) estimate of the number of devices that have been listened to. In the example, a first conservative estimate would indicate that there are 29 different Wi-Fi devices (23 unique fingerprints + 6 real MAC addresses) in the vicinity of the access points. However, if we estimate the devices using the count by access points, it would turn out:
[0093] - “ap_id”=8: 2 total unique.
[0094] - “ap_id”=18: 5 total unique.
[0095] - “ap_id”=25: 11 total unique.
[0096] - “ap_id”=26: 10 total unique.
[0097] - “ap_id”=28: 13 total unique.
The conservative estimate (29 unique devices) would be below the real number because it assumes that each different fingerprint corresponds to a single device, which is almost true in environments with low density of people. As the density of people increases (hundreds, thousands, tens of thousands or more) or the number of hours (or days) of the analysis window increases, this assumption becomes more and more inaccurate.
[0098] The count of the Wi-Fi devices corresponding to the unique fingerprints should be improved because the same fingerprint can (and usually does) mask several devices from different people but with the same characteristics (in some cases, more than 600 features are used) that make up their fingerprint. When analysing very short time windows (in the order of seconds), and assuming that the access points are far enough from each other, it is impossible for the same fingerprint corresponding to the same Wi-Fi device to be seen at two different access points, because the user of the Wi-Fi device has not had enough time to move to an adjacent access point.
[0099] At the other extreme, a bold estimate would be to add up the count of each of the access points, since they are far apart and the analysis window is very small (less than 10 seconds). In this case, it would result in more Wi-Fi devices (41 unique devices) than the initial conservative estimate.
[0100] The actual number of devices, without additional post-processing, will be between the conservative estimate (underestimation) and the bold estimate (overestimation). That is, between 29 and 41. In this particular case, being a short time window and access points far from each other, it is very likely that the real value is closer to the bold estimate because if two equal fingerprints in different access points are seen almost simultaneously, they will correspond to different devices. However, this assumption will not be valid in high-density device environments or when analysing long time windows.
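The two bounds can be reproduced from the counts given in the example. The short sketch below only restates that arithmetic; the variable names are chosen for illustration.

```python
# Per-access-point counts taken from the example above.
unique_fingerprints_by_ap = {8: 2, 18: 3, 25: 8, 26: 9, 28: 13}
unique_macs_by_ap = {8: 0, 18: 2, 25: 3, 26: 1, 28: 0}

# Global counts over all access points, also from the example.
global_unique_fingerprints = 23
global_unique_macs = 6

# Conservative estimate: every fingerprint or real MAC counted once.
conservative = global_unique_fingerprints + global_unique_macs

# Bold estimate: per-access-point counts added up.
bold = (sum(unique_fingerprints_by_ap.values())
        + sum(unique_macs_by_ap.values()))

print(conservative, bold)  # 29 41
```

The gap between both values (12 devices) comes from fingerprints that were seen at more than one access point: the per-access-point fingerprint counts add up to 35, whereas only 23 distinct fingerprints exist overall.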
[0101] Fingerprints 15 are normally built using more than 600 characteristics of the Wi-Fi device that sent the probe request (maximum power, supported transmission rates, number of antennas, etc.). These characteristics change depending on the brand and model of the device, the operating system version and even the apps installed. With smartphones it is possible to have very common models that are simultaneously in the same environment. If two people have the same features on their smartphones, they will exhibit the same fingerprint and there will be a “collision”. These collisions are unlikely when there is no high density of people or the analysis time window is not large, but when there are thousands of people over several days, the probability of a collision is no longer negligible.
[0102] The present invention proposes an advanced post-processing 27 to improve the count estimate of Wi-Fi devices. The advanced post-processing 27 will only process probe requests with random MAC 25, as they are those that may be masking several users with similar devices under the same fingerprint.
[0103] In the advanced post-processing 27, the processing unit 9 analyses how anomalous the rate of probe requests is for a given fingerprint (cadence anomalies analysis 28). For instance, if a single smartphone with a certain fingerprint emits 10 probe requests per minute, and suddenly in a 1-minute window we receive 80 probe requests from that fingerprint, there is likely more than one such device (perhaps 8 identical smartphones). That fingerprint is said to "contribute" 8 at that time. The post-processing includes:
[0104] - Calculating the cadence 105, i.e. the number of probe requests per time period, during the time window T (e.g., 1 min, 2 min, 5 min, etc.), for each fingerprint in the vicinity of each access point. For example, a certain fingerprint #J45 appears in 42 different probe requests between 22:00 and 22:05 next to access point AP2; the cadence 105 of probe requests for that fingerprint (#J45) and access point (AP2) is 8.4 probe requests / minute.
[0105] - Obtaining a baseline of the usual cadence of each fingerprint (average cadence 107). This reference is inherent to the fingerprint, and therefore it is calculated by using all cadences calculated for that fingerprint, regardless of the access point. The average cadence 107 is calculated and updated as long as the system is working; e.g., if the system has been working for months, it will already be quite stable and will hardly change. The average cadence 107 is a property of each fingerprint that stabilizes over time. It may be stored on a database and acquired by a database query. The average cadence 107 may instead be calculated using the probe request dataset 10. For example, assuming that during each minute from t=1 min to t=10 min the following cadences have been observed for a particular fingerprint:
[0106] That is, during the first minute (from t=0 min to t=1 min) 5 probe requests have been acquired, during the second minute (from t=1 min to t=2 min) 4 probe requests have been acquired, and so on. For the calculation of the average cadence 107, those cadences that are 0 (at t=3, t=4, t=8, t=9, t=10) will not be taken into account. In this example the average cadence 107 would be (5+4+10+4+7) / 5 samples = 6 probe requests / sample (those samples in which the fingerprint has not been received, i.e. cadence = 0, do not contribute to calculating the average cadence 107 since over time they would drop it to almost zero). That is, it is normal for that fingerprint to emit about 6 probe requests / minute (since the duration of the sample is in this case 1 minute) when there is only one device.
[0107] For example, fingerprint #J45 has an average cadence of 6 probe requests every 5 minutes (i.e. 1.2 probe requests / minute) during the interrogation time Δt. This would be the normal cadence for that fingerprint #J45 when there is a single device in the vicinity of any access point.
[0108] - Calculating the anomalies as the increase of the cadence with respect to that baseline (cadence ratio 109). For example: fingerprint #J45 is showing a cadence of 8.4 probe requests / minute in the time window T (between 22:00 and 22:05) next to access point AP2, whereas the average cadence of that fingerprint #J45 is 1.2 probe requests / minute. The cadence ratio is 8.4 / 1.2 = 7, i.e. the cadence is 7 times higher than expected for a single device. A first estimation would tell us that this fingerprint contributes 7 users in that period of time T at access point AP2. This cadence ratio 109 can have decimals. In many access points the cadences 105 of a given fingerprint will normally be zero, and therefore the cadence ratios 109 for these access points will also be zero (the access point did not see any activity from that fingerprint in the time window T). However, if at any time the cadence ratio is lower than 1 but greater than 0, it is forced to be 1.
[0109] If the maximum rate increase observed for a particular fingerprint next to a particular access point is, for instance, x7, we can estimate that at any given time there have been at most 7 people with exactly the same smartphone next to that access point, even if some of them have spent the whole day, and others just a few minutes passing by. At some instant, 7 devices coincidentally exhibited the same fingerprint next to that access point.
[0110] Let Qij be the cadence 105 of the j-th fingerprint 15 in the i-th access point 5 during the time window T. Let Bj be the average cadence 107 (or reference cadence or baseline) of the j-th fingerprint 15 over time. The cadence ratio 109, Cij, is defined as:

$$C_{ij} = \frac{Q_{ij}}{B_j}$$
[0111] The cadence ratio 109 acts as an estimator of the different devices that are masked under the j-th fingerprint 15 in the i-th access point 5 during the time window T.
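The cadence, baseline and cadence ratio described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of the invention: the function names are hypothetical, and the positions of the non-zero samples (minutes 1, 2, 5, 6 and 7) are an assumption, since only their values are given in the worked example.

```python
from statistics import mean


def cadence(probe_count: int, window_minutes: float) -> float:
    """Cadence Q_ij: probe requests per minute for one fingerprint at one access point."""
    return probe_count / window_minutes


def average_cadence(samples: list[float]) -> float:
    """Baseline B_j: mean of the non-zero cadence samples of one fingerprint."""
    non_zero = [s for s in samples if s > 0]
    if not non_zero:
        raise ValueError("fingerprint was never observed")
    return mean(non_zero)


def cadence_ratio(q_ij: float, b_j: float) -> float:
    """Cadence ratio C_ij = Q_ij / B_j, with ratios between 0 and 1 forced up to 1."""
    if q_ij == 0:
        return 0.0  # the access point saw no activity from this fingerprint
    return max(1.0, q_ij / b_j)


# Per-minute samples from the worked example (ordering of non-zero values assumed).
samples = [5, 4, 0, 0, 10, 4, 7, 0, 0, 0]
print(average_cadence(samples))           # 6
print(cadence(42, 5))                     # 8.4
print(round(cadence_ratio(8.4, 1.2), 2))  # 7.0
```

Zero samples are excluded from the baseline, as stated in the description, so that long periods without sightings do not drag the average towards zero.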
[0112] In addition, the same fingerprint could be present (with different cadences) simultaneously in different access points. For example, between 23:00 and 23:05 the fingerprint #J45 brings 7 devices next to access point AP3 and 4 devices next to access point AP8 (i.e., in that time span that fingerprint masks 11 devices). A basic processing would have estimated a single device, instead of the 11 that were masked under the same fingerprint and obtained with the advanced post-processing (same smartphone features: there were 11 people with the same brand and model of smartphone simultaneously in the control area).
[0113] The cadences 105 obtained for the captured probe requests and the average cadence 107 are shown in table 34 of Figure 6, this table corresponding to the instances of Figure 5 with random MAC and further adding three columns for the cadence 105, average cadence 107 and cadence ratio 109. In this example the time window T for the calculation of the cadence 105 is 2 minutes. It can be seen how the fingerprint 2489DE93A77537AB (highlighted in the figure), which is a fairly common fingerprint, shows a cadence ratio 109 of 1 for access points (ap_id) AP26, AP25 and AP8, and simultaneously a cadence ratio 109 of 4.98 next to access point AP28; i.e., that fingerprint will have different contributions to the total sum in each access point, the contributions being defined by the respective cadence ratio 109. These counts indicate that there is only one device next to each of access points AP26, AP25 and AP8, and five devices next to access point AP28. A total of eight devices are masked under the same fingerprint during time window T. By adding up the contributions of each fingerprint seen in the vicinity of each access point, the estimated number of Wi-Fi devices next to each access point is obtained. As this is an instantaneous count (i.e., if we see the same fingerprint next to two different access points, they will be two different devices), to get the number of Wi-Fi devices 11 with random MAC, the contribution (cadence ratio 109) of each access point is added, obtaining a number of 74.78 devices (which is rounded up to 75 devices, although rounding down may also be applied):
[0114] - “ap_id”=8: 2 devices.
[0115] - “ap_id”=18: 5.50 devices.
[0116] - “ap_id”=25: 12.10 devices.
[0117] - “ap_id”=26: 10.49 devices.
[0118] - “ap_id”=28: 45.69 devices.
[0119] The number of Wi-Fi devices 11 obtained using the cadence ratio of the advanced post-processing 27 will be higher than or equal to the number obtained with a basic processing, since each fingerprint contributes one or more devices to the total count. Of all the access points, there is one (access point AP28) that was significantly underestimated, as the same fingerprint was masking several devices next to it: instead of the 13 devices that would be obtained with a basic processing, 45.69 devices (rounded up to 46) are detected using the cadence ratio of the advanced post-processing 27. Finally, a sum 31 of the Wi-Fi devices with random MAC 25 and the Wi-Fi devices with real MAC 26 is carried out, obtaining a total number of Wi-Fi devices 32 (including real MAC and random MAC devices).
[0120] The count using advanced post-processing is able to obtain a more precise number of Wi-Fi devices when it takes into account increases in fingerprint cadence with respect to the average cadence of a single device.
[0121] The shorter the time window T, the more precise the count obtained using only cadence anomalies analysis 28. A small analysis window (e.g., less than 10 seconds) allows applying the following assumption: if the processing unit observes the same fingerprint simultaneously on several access points, they must necessarily be different devices, because there has not been enough time for them to move between adjacent access points. However, this will not necessarily be true as the time window T of analysis is increased. For instance, when analysing a 24-hour time window, the same fingerprint may correspond to a single device moving between access points or to different devices.
[0122] Back to Figure 4, an additional data processing may be optionally applied in the advanced post-processing 27 to obtain a more precise count estimation in longer time windows T (minutes, hours or days). This analysis (time window analysis 29) will take into account the duration of the time window T. A problem involved when considering longer time windows T is that if a same fingerprint is seen on two different access points at different times, said fingerprint may correspond to either a single device that has moved between the access points or two different devices.
[0123] In this case, the contributions of each fingerprint over time (the time window) and space (distance between access points) can be modulated, so that some contributions will be greater than others. The following features can thus be considered:
[0124] • The width of the time window T. If the same fingerprint is seen on three different access points in a short span of time (e.g. the same minute), it is very likely to be three different devices. If it is seen over the course of several hours, it may be the same device that has moved or different devices (or a combination of both cases).
[0125] • In addition, the distance between access points also has an influence. For instance, if the access points are 1 kilometer apart and a fingerprint has left access point AP1 and is detected 5 minutes later on access point AP2, it is unlikely that the device has had time to get from one place to the other; it is more likely that the device has left AP1 and another person with the same device has appeared by chance next to AP2.
[0126] The analysis is performed on a fingerprint basis, one by one, determining the contribution of each fingerprint to the total count. The time window T in which the count of Wi-Fi devices is carried out is preferably expressed in minutes. In the table 35 of Figure 7 the analysis is performed by fingerprint and by access point, to determine the access point having the maximum contribution. In this example, almost all the fingerprints have reached their maximum contribution in access point AP28.
[0127] The access point showing the maximum contribution (that is, the maximum number of devices under the same fingerprint that each access point listened to, being defined by the cadence ratio 109) in each case has been underlined in the table 35. For each fingerprint and in the entire time window T, the time window analysis 29 includes:
1. Calculate the access point that has the highest contribution during the whole time window T. It is at that access point that, at some point, the fingerprint contributed the most users. Under the conservative assumption that all users seen have moved between access points (no new users have entered), this maximum contribution would be the definitive count for that fingerprint. In the table 36 of Figure 8, this is the “MAX” column.
[0128] 2. Calculate the sum of the maximum contributions of the other access points during the entire time window T, shown in the “SUM” column of table 36. During the time window T, many fingerprints have only been seen next to a single access point (SUM column = 0). This will no longer occur, in general, as the time window T increases.
[0129] 3. Under the bold assumption that no user has moved between access points (static users, users who were close to a certain access point but never close to another access point), the sum of all the contributions (MAX + SUM columns) would be the final count for that fingerprint.
[0130] In the example of Figure 8, a conservative estimate would give 76 devices in total (the sum of the MAX column), while a bolder estimate would give 83 devices in total (the sum of SUM and MAX columns). Since this analysis only looks at a 3-second window, the two estimates nearly coincide (the SUM column contributes practically nothing and there are few repeated fingerprints in different access points).
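The MAX and SUM columns, and the conservative and bold totals derived from them, can be illustrated with the following Python sketch. It assumes the cadence ratios are already available as a mapping from fingerprint to per-access-point ratios. The function names and the second fingerprint are hypothetical; only the ratios of fingerprint 2489DE93A77537AB are taken from the example of Figure 6.

```python
def max_and_sum(ratios_by_ap: dict[str, float]) -> tuple[str, float, float]:
    """Return (access point with the highest ratio, MAX, SUM over the other access points)."""
    best_ap = max(ratios_by_ap, key=ratios_by_ap.get)
    max_ratio = ratios_by_ap[best_ap]
    sum_others = sum(r for ap, r in ratios_by_ap.items() if ap != best_ap)
    return best_ap, max_ratio, sum_others


def estimate_bounds(table: dict[str, dict[str, float]]) -> tuple[float, float]:
    """Conservative total (sum of MAX) and bold total (sum of MAX + SUM)."""
    conservative = 0.0
    bold = 0.0
    for ratios_by_ap in table.values():
        _, max_ratio, sum_others = max_and_sum(ratios_by_ap)
        conservative += max_ratio
        bold += max_ratio + sum_others
    return conservative, bold


table = {
    "2489DE93A77537AB": {"AP8": 1.0, "AP25": 1.0, "AP26": 1.0, "AP28": 4.98},
    "HYPOTHETICAL_FP": {"AP18": 2.0},  # illustrative fingerprint seen at one access point only
}
low, high = estimate_bounds(table)
print(max_and_sum(table["2489DE93A77537AB"]))
print(round(low, 2), round(high, 2))
```

A fingerprint seen at a single access point has SUM = 0, so it contributes the same amount to both totals.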
[0131] The table 37 of Figure 9 is an example with a longer time window T (over 1 minute instead of 3 seconds). The table 37 grows both in the number of access points (columns) observing probe requests and in the number of fingerprints (rows) being detected. A conservative estimate would count 253 devices in total (the aggregation of the MAX column), while a bolder estimate would count 495 devices in total (the aggregation of the MAX + SUM columns). In general, the MAX column will not increase much as the duration of the time window T grows, but the SUM column may grow considerably, since as time goes by users will move between access points or new users will appear and contribute to the total sum. Therefore, an intermediate solution between these two estimates should be reached:
[0132] • A conservative estimate assumes that if a fingerprint is seen on several access points, it is the same device that has moved. It becomes more consistent as the duration of the time window T increases, since the users will have more time to move between access points.
[0133] • A bold estimate assumes that if a fingerprint is seen on several access points, they are different devices. It will have more consistency as the duration of the time window T decreases, so that users are not given time for them to shift between access points.
[0134] The advanced post-processing combines the MAX and SUM columns to fit the time window T, using the following parameters:
- Cij is the maximum contribution of the j-th fingerprint in the i-th access point, that is, the maximum cadence ratio 109 for that fingerprint and access point.
- mj = arg max_i {Cij} is the access point that has shown the maximum contribution of the j-th fingerprint.
[0135] - a is an aggressiveness factor which measures how aggressively the method assumes that the same fingerprint seen in several access points corresponds to different users rather than to the same user moving between access points. It is a positive real number that usually adopts a value between 1 and 100.
[0136] - d is a dragging coefficient that represents how the access points other than mj (i.e. the access points that do not show the highest cadence ratio for a given fingerprint) contribute to the total count. d adopts a value between 0 and 1. It is a common property of the set of access points and depends on the time window T of analysis. In an embodiment, d is defined as:
[0137] Figure 10 shows how the dragging coefficient d decreases with respect to the duration of the time window T (expressed in minutes, from 1 minute to 1 day) for different values of a = 2, 10, 30, 100.
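The worked examples in this description report d = 0.8 for T = 5 minutes and a = 10, d = 0.46 for T = 60 minutes and a = 10, and d = 0.98 for T = 5 minutes and a = 100. The sketch below uses one functional form that happens to reproduce these three values, d = exp(-sqrt(T) / a). This form is purely an assumption made for illustration and should not be read as the definition used in the embodiment; the function name is likewise hypothetical.

```python
import math


def dragging_coefficient(window_minutes: float, aggressiveness: float) -> float:
    """Assumed form d = exp(-sqrt(T) / a); fitted to three quoted values, not taken from the text."""
    if window_minutes <= 0 or aggressiveness <= 0:
        raise ValueError("T and a must be positive")
    return math.exp(-math.sqrt(window_minutes) / aggressiveness)


for t, a in [(5, 10), (60, 10), (5, 100)]:
    print(f"T={t} min, a={a}: d={dragging_coefficient(t, a):.2f}")
# T=5 min, a=10: d=0.80
# T=60 min, a=10: d=0.46
# T=5 min, a=100: d=0.98
```

Under this assumed form d stays within (0, 1], decreases as T grows and increases with a, which matches the qualitative behaviour described for Figure 10.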
[0138] Therefore, the total contribution Kj of the j-th fingerprint, taking into account the possible movement of the devices through several access points, would be:

$$K_j = C_{m_j} + d \cdot \sum_{\substack{i=1 \\ i \neq m_j}}^{N} C_{ij}$$

[0139] In this formula, Cij is the cadence ratio 109 for the i-th access point and the j-th fingerprint, Cmj being the cadence ratio 109 of access point mj defined before (the access point with the highest cadence ratio 109 for the j-th fingerprint). The first term Cmj is therefore the highest cadence ratio of fingerprint j and corresponds to the first number 113 of Wi-Fi devices in Figure 3 and the column MAX in Figure 9, whereas the second term (d multiplied by the sum of Cij over i ≠ mj) corresponds to the second number 115 of Wi-Fi devices in Figure 3 and the column SUM in Figure 9, N being the total number of access points and i ≠ mj denoting the N-1 access points different from the access point mj. The total contribution Kj is the aggregate number 117 of Wi-Fi devices in Figure 3.
[0141] This formula includes both conservative and bold estimation:
[0142] • If d = 0: Kj = Cmj. That is, the MAX column in Figure 9. This is the conservative estimation (underestimation), in which it is assumed that if a fingerprint is seen in several access points, it is the same device that has moved between the access points.
[0143] • If d = 1: $K_j = \sum_{i=1}^{N} C_{ij}$. That is, the MAX + SUM columns in Figure 9. This is the bold estimation (overestimation), in which it is assumed that if a fingerprint is seen in several access points, they are always different devices. This assumption works well for short time windows T.
[0144] For short time windows T (low values of T), the dragging coefficient d is likely to be high (bold estimate). That is, the best dragging coefficient d that explains those values of MAX and SUM is a high d (because there has been simultaneity of devices under the same fingerprint). The longer the time window T is (i.e. the higher the value of T), the more likely it is that there is not so much simultaneity (there have been movements of the users). That is, the dragging coefficient d that best explains the values of MAX and SUM is a low d (conservative estimate). An example of total contributions Kj for each fingerprint j is shown in table 38 of Figure 11, for T = 5 minutes and an aggressiveness a = 10. In this case the fingerprints are ordered by contribution Kj from highest to lowest. The fingerprint contributing the most to the total count is 2489DE93A77537AB (highlighted in the figure). For this fingerprint, a maximum of almost 5 users were observed in access point AP28 while in the other access points there were almost 15 users exhibiting that same fingerprint. In this case d = 0.8, which results in a contribution of Kj = 16.9 for this fingerprint. That is, it is estimated that in those 5 minutes there were approximately 17 users in the control area within the range of all the access points. All M fingerprints have contributed to the total sum K:

$$K = \sum_{j=1}^{M} K_j$$

[0145] The total sum K corresponds to the number 11 of Wi-Fi devices of Figure 3 and amounts in this example to 446.9 unique users (the table shown in Figure 11 is incomplete; there are many other fingerprints with Kj = 1).
[0146] The table 39 in Figure 12 shows what happens as the time window T is increased (T = 60 minutes), keeping the aggressiveness at a = 10. In this case, d = 0.46, showing a compromise between conservative and bold estimation, since some of the fingerprints seen in various access points will correspond to users who have moved, and some will not.
[0147] The table 40 in Figure 13 shows what happens if the estimation is much more aggressive (a = 100), keeping the original T = 5 minutes. In this case it is practically considered that there has been no mobility of the users (d = 0.98) and therefore if the same fingerprint has been seen in two different places, they must be different users masked under the same characteristics (Kj is almost the same as MAX + SUM). This example can be repeated with a longer time window (T = 60 minutes) to verify that with high values of a (a = 100), the count becomes largely insensitive to the increase of the time window, showing d > 0.9 for almost any T.
[0148] Back to Figure 4, the count estimation can optionally be further improved by applying rarity analysis 30, taking into account that some fingerprints are rare and, when seen in several access points, it is most likely that the fingerprint corresponds to a single user that has moved rather than several different users. The rarity of the fingerprint indicates its uniqueness or traceability, since the rarer it is the greater the probability that a fingerprint in several places (not simultaneously) corresponds to a single device moving. Fingerprints 15 are generated using a set of features of the Wi-Fi device. Those devices sharing the same set of features used to generate the fingerprint will have the same fingerprint. Therefore, depending on the particular set of features, there will be common fingerprints and other fingerprints that are more rarely found.
[0149] A rarity (or "uniqueness") coefficient for each fingerprint, stored in a database, can be used to modulate the rarity effect on the count estimate. For example, a mobile phone with very unique characteristics will have a very rare fingerprint, and that makes it very traceable: if that fingerprint is seen in two distant locations over a considerable period of time, it is likely to be the same device that has moved, as it would be rare for two people to be in the premises with such a rare mobile phone (this would be unreliable if it were a more common fingerprint).
[0150] Figure 14 shows a part of a fingerprint rarity coefficients database 41. In an embodiment, the value of the fingerprint rarity coefficients Rj can vary between 0 (very common fingerprint) and 10000 (extremely rare). In a real environment with a high density of devices, most fingerprints will be common (e.g., Rj < 5000), which makes them less traceable. Uncommon fingerprints (e.g., Rj > 8000) provide very valuable information about users' movements through the control area. The fingerprint rarity coefficients Rj are preferably updated periodically (e.g., each month, to take into account the newer smartphone models and adapt to the changes in the rarity of each model as time goes by) and adjusted according to the following process:
[0151] - For each fingerprint, performing a count of how many times the fingerprint has been seen in the access points 5 (which may include other access points, e.g., different Wi-Fi networks around the world). This value (the number of occurrences of fingerprint j) is rj.
[0152] - Calculating Rj with the following formula:
[0153] R< = 10000
[0154] J That is, the more times the fingerprint has been seen, the lower the fingerprint rarity coefficient Rj will be, the most common fingerprint having the lowest Rj (a value of 0).
[0155] - Storing each fingerprint and its corresponding fingerprint rarity coefficient Rj in a database (fingerprint rarity coefficients database 41 ).
[0156] Figure 15 shows a table 42, corresponding to the table 37 of Figure 9 with an added column, this new column including the fingerprint rarity coefficients Rj for each fingerprint (row) retrieved from the fingerprint rarity coefficients database 41. The fingerprint rarity coefficients Rj are used to modulate the dragging coefficient d, which in this embodiment depends on each fingerprint.
[0157] In an embodiment, a rarity weight pj is calculated as:

$$p_j = \frac{10}{\ln(R_j + 1)}$$
[0160] Figure 16 shows how the rarity weight pj depends on the value of the fingerprint rarity coefficient Rj, according to this formula. Subsequently, the dragging coefficient dj for the j-th fingerprint is calculated as:
[0161] Figure 17 shows how the dragging coefficient dj for the j-th fingerprint decreases with respect to the duration of the time window T (from 1 minute to 1 day), for different values of a.
[0162] Figure 18 depicts a table 43 of contributions including the rarity weight pj and the dragging coefficient dj obtained using the duration of the time window T, the aggressiveness factor a and the rarity weight pj (for T = 5 min and a = 10). Rarer fingerprints cause more limited drags (low dj). That is, more common fingerprints have a larger drag, causing access points that did not reach the maximum to contribute more to the total contribution. This takes into account the fact that more common fingerprints seen at multiple points are more explainable by multiple devices, and that on the other hand rarer fingerprints tend not to mask multiple devices if seen at multiple points. The overall effect is that the final contribution is slightly higher. The total count amounts to 464.69 devices, compared to the 446.9 devices obtained without taking rarity into account.
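The rarity weight introduced earlier, read as pj = 10 / ln(Rj + 1), can be evaluated with the short Python sketch below. The function name is hypothetical, and the handling of Rj = 0 (where ln(1) = 0 makes the quotient undefined) as an unbounded weight is an assumption, since the description does not state how that case is treated.

```python
import math


def rarity_weight(rarity_coefficient: float) -> float:
    """Rarity weight p_j = 10 / ln(R_j + 1) for a rarity coefficient R_j in [0, 10000]."""
    if not 0 <= rarity_coefficient <= 10000:
        raise ValueError("R_j must lie in [0, 10000]")
    if rarity_coefficient == 0:
        # ln(1) = 0: returning infinity here is an assumption, not stated in the text.
        return math.inf
    return 10 / math.log(rarity_coefficient + 1)


for r_j in (100, 5000, 8000, 10000):
    print(r_j, round(rarity_weight(r_j), 3))
```

The weight decreases monotonically as the fingerprint becomes rarer and stays above 1 even for Rj = 10000, so common fingerprints receive the largest weights.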
[0163] The sum 31 obtains a total number of Wi-Fi devices 32 (including real MAC and random MAC devices) in the access point environment during the desired analysis time window T, having unmasked how many devices are hiding under the same fingerprints.
[0164] Instead of using specific formulas to obtain the different parameters (e.g., dragging coefficient d, rarity weight pj), an artificial neural network may be employed. The real MACs (which are perfectly traceable and countable) could be used to train these neural networks. Once trained, these calculations would be performed without the need for an analytical formula.
[0165] To summarize, the present invention for device count estimation of non-connected Wi-Fi devices focuses on overcoming the limitations of fingerprinting in high-density events (tens or hundreds of thousands of people) and long time windows (hours or days) through statistical post-processing. The invention aims to unmask how many devices exhibit the same fingerprint, either simultaneously or over time, taking into account possible movements between access points, by considering the cadence recorded with respect to what is considered normal for a given fingerprint, the duration of the time window T of analysis, as well as the rarity or uniqueness of the fingerprints themselves in contributing more or less to the total count.
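The overall flow for random-MAC probe requests, from captured records to a device count for one time window T, can be sketched end to end as follows. This is a simplified illustration under several assumptions: the record layout (access point, fingerprint), the function name, the sample data and the fixed value of d are all hypothetical, and the baselines are taken as given rather than learned over time.

```python
from collections import Counter, defaultdict


def estimate_random_mac_devices(probes, window_minutes, baselines, d):
    """Estimate the number of random-MAC devices seen during one time window T.

    probes: iterable of (ap_id, fingerprint) pairs captured within the window.
    baselines: fingerprint -> average cadence in probe requests per minute.
    d: dragging coefficient in [0, 1].
    """
    counts = Counter(probes)  # (ap_id, fingerprint) -> number of probe requests
    ratios = defaultdict(list)  # fingerprint -> cadence ratios, one per access point
    for (ap_id, fingerprint), n in counts.items():
        cadence = n / window_minutes
        # Ratios between 0 and 1 are forced up to 1 (at least one device was seen).
        ratios[fingerprint].append(max(1.0, cadence / baselines[fingerprint]))
    total = 0.0
    for fp_ratios in ratios.values():
        highest = max(fp_ratios)  # first number: highest cadence ratio
        dragged = d * (sum(fp_ratios) - highest)  # second number: other access points
        total += highest + dragged  # aggregate number for this fingerprint
    return total


# Hypothetical 5-minute window with two fingerprints and two access points.
probes = [("AP2", "J45")] * 12 + [("AP3", "J45")] * 2 + [("AP3", "X1")] * 3
baselines = {"J45": 1.2, "X1": 0.6}
print(estimate_random_mac_devices(probes, 5, baselines, d=0.5))  # 3.5
```

Devices with real MAC addresses would be counted separately and added to this figure, as in the sum 31 described above.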
Claims
1. A computer-implemented method of detecting non-connected Wi-Fi devices with random MAC address in high-density environments, the method (100) comprising: receiving (102), by a detection unit (4), a probe request dataset (10) including a timestamp (14) and a fingerprint (15) of each probe request (3) sent by Wi-Fi devices (2) with random MAC address and captured by a plurality of access points (5) during an interrogation time (Δt); computing (104), for each fingerprint (15) and access point (5), a number of probe requests captured during a time window (T) of the interrogation time (Δt), thereby obtaining a cadence (105) of probe requests; obtaining (106), for each fingerprint (15), an average cadence (107) of probe requests, the average cadence (107) being an average number of probe requests per time period; computing (108), for each fingerprint (15) and access point (5), a ratio between the corresponding cadence (105) of probe requests and the corresponding average cadence (107) of probe requests, thereby obtaining a cadence ratio (109); determining (110), for each fingerprint (15), an access point with the highest cadence ratio (109); determining (112), for each fingerprint (15), a first number (113) of Wi-Fi devices, the first number (113) of Wi-Fi devices being the highest cadence ratio (109); computing (114), for each fingerprint (15), a second number (115) of Wi-Fi devices, the second number (115) of Wi-Fi devices being a dragging coefficient (d) multiplied by a sum of the cadence ratio (109) of each access point different from the access point with the highest cadence ratio (109); obtaining (116), for each fingerprint (15), an aggregate number (117) of Wi-Fi devices associated to the fingerprint (15) as a sum of the first number (113) and second number (115) of Wi-Fi devices; and determining (118) a number (11) of non-connected Wi-Fi devices (2) with random MAC address during the time window (T) as a sum of the aggregate number (117) of Wi-Fi devices of each fingerprint (15).
2. The computer-implemented method of claim 1, further comprising: receiving, by the plurality of access points (5), probe requests (3) sent by non-connected Wi-Fi devices (2) with random MAC address; generating a fingerprint (15) associated to each probe request (3); and sending probe request data (6) including the generated fingerprint (15) and a timestamp (14) associated to each probe request (3) to the detection unit (4).
3. The computer-implemented method of any preceding claim, wherein the dragging coefficient (d) is a function of the duration of the time window (T).
4. The computer-implemented method of claim 3, wherein the dragging coefficient (d) decreases with the duration of the time window (T).
5. The computer-implemented method of any preceding claim, further comprising retrieving, from a database, a fingerprint rarity coefficient (Rj) associated to each fingerprint (15) in the time window (T), the fingerprint rarity coefficient (Rj) being representative of a degree of uniqueness of the fingerprint (15); and wherein the dragging coefficient (d) is a function of the fingerprint rarity coefficient (Rj).
6. The computer-implemented method of claim 5, wherein the dragging coefficient (d) decreases with the degree of uniqueness of the fingerprint (15).
7. The computer-implemented method of any preceding claim, wherein the dragging coefficient (d) is comprised within the range [0, 1].
8. The computer-implemented method of any preceding claim, wherein the average cadence (107) of probe requests is obtained from the probe requests (3) captured during the interrogation time (Δt).
9. A system for detecting non-connected Wi-Fi devices with random MAC address in high-density environments, the system (1) comprising a detection unit (4) with a memory (7) and a processing unit (9) configured to: receive a probe request dataset (10) including a timestamp (14) and a fingerprint (15) of each probe request (3) sent by Wi-Fi devices (2) with random MAC address and captured by a plurality of access points (5) during an interrogation time (Δt); compute, for each fingerprint (15) and access point (5), a number of probe requests captured during a time window (T) of the interrogation time (Δt), thereby obtaining a cadence (105) of probe requests; obtain, for each fingerprint (15), an average cadence (107) of probe requests, the average cadence (107) being an average number of probe requests per time period; compute, for each fingerprint (15) and access point (5), a ratio between the corresponding cadence (105) of probe requests and the corresponding average cadence (107) of probe requests, thereby obtaining a cadence ratio (109); determine, for each fingerprint (15), an access point with the highest cadence ratio (109); determine, for each fingerprint (15), a first number (113) of Wi-Fi devices, the first number (113) of Wi-Fi devices being the highest cadence ratio (109); compute, for each fingerprint (15), a second number (115) of Wi-Fi devices, the second number (115) of Wi-Fi devices being a dragging coefficient (d) multiplied by a sum of the cadence ratio (109) of each access point different from the access point with the highest cadence ratio (109); obtain, for each fingerprint (15), an aggregate number (117) of Wi-Fi devices associated to the fingerprint (15) as a sum of the first number (113) and second number (115) of Wi-Fi devices; and determine a number (11) of non-connected Wi-Fi devices (2) with random MAC address during the time window (T) as a sum of the aggregate number (117) of Wi-Fi devices of each fingerprint (15).
10. The system of claim 9, further comprising the plurality of access points (5), each access point (5) being configured to: receive probe requests (3) sent by non-connected Wi-Fi devices (2) with random MAC address; generate a fingerprint (15) associated to each probe request (3); and send probe request data (6) including the generated fingerprint (15) and a timestamp (14) associated to each probe request (3) to the detection unit (4).
11. The system of any of claims 9 to 10, wherein the dragging coefficient (d) decreases with the duration of the time window (T).
12. The system of any of claims 9 to 11, wherein the processing unit (9) is further configured to retrieve, from a database, a fingerprint rarity coefficient (Rj) associated to each fingerprint (15) in the time window (T), the fingerprint rarity coefficient (Rj) being representative of a degree of uniqueness of the fingerprint (15); and wherein the dragging coefficient (d) is a function of the fingerprint rarity coefficient (Rj).
13. The system of claim 12, wherein the dragging coefficient (d) decreases with the degree of uniqueness of the fingerprint (15).
14. The system of any of claims 9 to 13, wherein the processing unit (9) is configured to obtain the average cadence (107) of probe requests from the probe requests (3) captured during the interrogation time (Δt).
15. A non-transitory computer-readable medium for detecting non-connected Wi-Fi devices with random MAC address in high-density environments, comprising executable programming instructions stored thereon that, when executed by a processor, cause the processor to carry out the steps of the method of any of claims 1 to 8.