A method for distinguishing indoor and outdoor state of user based on mobile phone signaling data

By using a method for identifying the indoor and outdoor status of users based on mobile signaling data and employing a random forest model, the high data cost and collection difficulties in existing technologies are resolved. This enables large-scale analysis of the indoor and outdoor status of users, supporting urban public services and planning.

CN116723468B · Active · Publication Date: 2026-06-26 · INST OF GEOGRAPHICAL SCI & NATURAL RESOURCE RES CAS

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
INST OF GEOGRAPHICAL SCI & NATURAL RESOURCE RES CAS
Filing Date
2023-07-13
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing methods for determining the indoor and outdoor status of users rely on mobile terminal sensor data, resulting in high data costs and the inability to collect data on a large scale, making it difficult to achieve a comprehensive determination of the indoor and outdoor status of users in cities.

Method used

A user indoor/outdoor status discrimination method based on mobile signaling data is adopted, including mobile signaling data preprocessing, potential indoor trajectory point identification, primary connection base station type identification and signaling handover feature extraction, building coverage and height extraction within the service range of macrocell base stations, random forest model training and parameter tuning, and the random forest model is used to discriminate the user indoor/outdoor status.

Benefits of technology

It reduces data acquisition costs, enables large-scale and comprehensive identification of users' indoor and outdoor status, and can analyze the temporal distribution of users' indoor and outdoor status, which is helpful for urban public services and planning.



Abstract

The application discloses a user indoor and outdoor state discrimination method based on mobile phone signaling data, comprising the following steps: mobile phone signaling data preprocessing; potential indoor trajectory point identification; potential indoor trajectory point main connection base station type identification and signaling switching feature extraction; building coverage and height extraction in the service range of a macro cellular base station; random forest model training and parameter optimization for discriminating the indoor and outdoor state of a user at a potential indoor trajectory point; indoor and outdoor state discrimination of the user and time distribution representation. The application can break through the limitations of small application range and difficult data collection of traditional methods, can comprehensively discriminate the indoor and outdoor state of a user on a large scale, can analyze the time distribution of the indoor and outdoor state of the user, and is helpful for reasonably constructing city public services and developing city planning.

Description

Technical Field

[0001] This invention relates to a method for determining indoor and outdoor status, and more particularly to a method for determining the indoor and outdoor status of users based on mobile phone signaling data.

Background Technology

[0002] With the acceleration of urbanization, more and more people are living in cities, relying more on urban public services, and demanding higher levels of rational urban planning. Understanding the indoor and outdoor conditions of people, as well as the time distribution of people indoors and outdoors throughout the day, is fundamental to urban public services and urban planning, and is of great significance.

[0003] Currently, most common methods for determining a user's indoor / outdoor status are implemented from the perspective of mobile devices. These methods rely on data collected by the mobile terminal's sensors for analysis. For example, patent document CN110472644A determines a user's indoor / outdoor status based on GPS feature information collected from the mobile terminal; patent document CN107655564A determines a user's indoor / outdoor status based on light intensity, magnetic field strength, and cellular signal strength data collected from the mobile terminal's light detector, geomagnetic detector, and base station signal detector; and patent document CN114091542A determines a user's indoor / outdoor status based on the Wi-Fi signal strength collected from the mobile terminal and the number of visible satellites detected by the GPS module.

[0004] However, the above methods all suffer from high data costs and cannot collect data on a large scale, making it difficult to make a comprehensive assessment of the indoor and outdoor status of users in the city.

Summary of the Invention

[0005] To address the shortcomings of the aforementioned technologies, this invention provides a method for determining the indoor and outdoor status of users based on mobile phone signaling data. This method reduces the cost of acquiring the data and solves the current problem of not being able to conduct large-scale and comprehensive analysis of the indoor and outdoor status of users in cities.

[0006] To solve the above technical problems, the technical solution adopted by the present invention is: a method for determining the indoor and outdoor status of users based on mobile phone signaling data, which mainly includes the following processing steps:

[0007] S1. Mobile signaling data preprocessing;

[0008] S2. Potential indoor trajectory point identification;

[0009] S3. Identification of primary connected base station type and extraction of signaling handover features for potential indoor trajectory points;

[0010] S4. Building coverage and height extraction within the service range of macrocell base stations;

[0011] S5. Random forest model training and parameter tuning to determine the indoor and outdoor states of a user when the user is located at a potential indoor trajectory point;

[0012] S6. Determine the user's indoor and outdoor status and represent the time distribution.

[0013] Furthermore, in S1, the specific method for preprocessing mobile signaling data is as follows: erroneous records in the acquired mobile signaling data are removed, including but not limited to missing and duplicate records in the data.

[0014] Furthermore, in S2, the process of identifying potential indoor trajectory points is as follows: noise points are identified based on the speed, distance and number of visits between trajectory points, and the noise points are aggregated with the neighboring non-noise points, retaining the latitude and longitude, cell number and type of the base station originally connected to them; after the noise points are aggregated, trajectory sequences are extracted based on the location of each trajectory point, and potential indoor trajectory points are identified based on the number of trajectory points contained in the trajectory sequence.

[0015] Furthermore, in S3, the method for identifying the primary connected base station type is as follows: based on the connection of macro and micro cells in the potential indoor trajectory point sequence extracted in step S2, the primary connected base station cell type of the potential indoor trajectory points in the sequence is determined, and they are divided into macro cell trajectory points and micro cell trajectory points;

[0016] If one or more points in the trajectory sequence of a potential indoor trajectory point were originally connected to a microcell base station, then the primary connected base station type of that trajectory point is microcell, and that trajectory point is a microcell trajectory point; otherwise, the primary connected base station cell type of that sequence is macrocell, and that trajectory point is a macrocell trajectory point.
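The rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the labels "macro" and "micro" and the function names are hypothetical, and each sequence is assumed to be a list of the original cell types of its trajectory points.

```python
def primary_base_station_type(cell_types):
    """Return "micro" if any point in the sequence was originally
    connected to a microcell base station, otherwise "macro"."""
    return "micro" if any(t == "micro" for t in cell_types) else "macro"


def split_sequences(sequences):
    """Split sequences (lists of cell types) into macrocell and microcell groups."""
    macro, micro = [], []
    for seq in sequences:
        target = micro if primary_base_station_type(seq) == "micro" else macro
        target.append(seq)
    return macro, micro
```

A single microcell connection is enough to label the whole sequence as microcell, which reflects the "one or more points" wording of the rule.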

[0017] Furthermore, in S3, the signaling handover features extracted from each trajectory point include the inter-cell handover frequency of macrocell base stations, the handover frequency between macrocell base stations, the inter-cell handover frequency of microcell base stations, the handover frequency between microcell base stations, and the proportion of microcell base stations.

[0018] Furthermore, the specific methods for extracting each signaling handover feature are as follows:

[0019] (1) Inter-cell handover frequency of macrocell base stations: Calculate the handover frequency between macrocell base stations connected to each trajectory point in the trajectory sequence of potential indoor trajectory points:

[0020] $$\mathrm{Switch\_Marcocell}_i = \frac{S_{marcocell}(Trj_i)}{N(Trj_i)}$$

[0021] where $\mathrm{Switch\_Marcocell}_i$ represents the inter-cell handover frequency of macrocell base stations for the i-th potential indoor trajectory point; $Trj_i$ represents the trajectory sequence containing the i-th potential indoor trajectory point; $N(Trj_i)$ represents the number of trajectory points contained in the trajectory sequence; and $S_{marcocell}(Trj_i)$ represents the number of changes in the macrocell base station cell connected to adjacent trajectory points in the trajectory sequence;

[0022] (2) Macrocell base station handover frequency: Calculate the handover frequency between the macrocell base station locations connected to each trajectory point in the trajectory sequence containing the potential indoor trajectory point:

[0023] $$\mathrm{Switch\_Marco}_i = \frac{S_{marco}(Trj_i)}{N(Trj_i)}$$

[0024] where $\mathrm{Switch\_Marco}_i$ represents the handover frequency between macrocell base stations for the i-th potential indoor trajectory point; $Trj_i$ represents the trajectory sequence containing the i-th potential indoor trajectory point; $N(Trj_i)$ represents the number of trajectory points contained in the trajectory sequence; and $S_{marco}(Trj_i)$ represents the number of times the location of the macrocell base station connected to adjacent trajectory points in the trajectory sequence changes;

[0025] (3) Inter-cell handover frequency of microcell base stations: Calculate the handover frequency between microcell base stations connected to each trajectory point in the trajectory sequence of potential indoor trajectory points:

[0026] $$\mathrm{Switch\_Smallcell}_i = \frac{S_{smallcell}(Trj_i)}{N(Trj_i)}$$

[0027] where $\mathrm{Switch\_Smallcell}_i$ represents the inter-cell handover frequency of microcell base stations for the i-th potential indoor trajectory point; $Trj_i$ represents the trajectory sequence containing the i-th potential indoor trajectory point; $N(Trj_i)$ represents the number of trajectory points contained in the trajectory sequence; and $S_{smallcell}(Trj_i)$ represents the number of changes in the microcell base station cell connected to adjacent trajectory points in the trajectory sequence;

[0028] (4) Handover frequency between microcell base stations: Calculate the handover frequency between the microcell base stations connected to each trajectory point in the trajectory sequence of potential indoor trajectory points:

[0029] $$\mathrm{Switch\_Small}_i = \frac{S_{small}(Trj_i)}{N(Trj_i)}$$

[0030] where $\mathrm{Switch\_Small}_i$ represents the handover frequency between microcell base stations for the i-th potential indoor trajectory point; $Trj_i$ represents the trajectory sequence containing the i-th potential indoor trajectory point; $N(Trj_i)$ represents the number of trajectory points contained in the trajectory sequence; and $S_{small}(Trj_i)$ represents the number of times the location of the microcell base station connected to adjacent trajectory points in the trajectory sequence changes;

[0031] (5) Proportion of microcell base stations: Calculate the ratio between the number of trajectory points connected to microcell base stations in the trajectory sequence of potential indoor trajectory points and the total number of trajectory points:

[0032] $$\mathrm{Ratio\_Small}_i = \frac{N_{smallcell}(Trj_i)}{N(Trj_i)}$$

[0033] where $\mathrm{Ratio\_Small}_i$ represents the proportion of microcell base stations for the i-th potential indoor trajectory point; $Trj_i$ represents the trajectory sequence containing the i-th potential indoor trajectory point; $N(Trj_i)$ represents the number of trajectory points contained in the trajectory sequence; and $N_{smallcell}(Trj_i)$ represents the number of trajectory points in the trajectory sequence that are connected to microcell base stations.
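As a rough illustration, the five signaling handover features could be computed for one trajectory sequence as follows. This is a sketch under several assumptions that the text does not confirm: each frequency is taken as a change count divided by the number of trajectory points N(Trj_i), a change is counted only between adjacent points that both use the same base station type, and the dictionary keys (`cell_type`, `cell_id`, `bs_loc`) and feature names are hypothetical.

```python
def count_changes(seq, cell_type, key):
    """Count adjacent pairs that both use `cell_type` and differ in `key`."""
    return sum(
        1
        for a, b in zip(seq, seq[1:])
        if a["cell_type"] == b["cell_type"] == cell_type and a[key] != b[key]
    )


def handover_features(seq):
    """Compute the five handover features for a non-empty, time-ordered sequence.

    Each point is a dict holding the ORIGINAL connection data:
    cell_type ("macro" or "micro"), cell_id, and bs_loc (lon, lat).
    """
    n = len(seq)
    n_micro = sum(1 for p in seq if p["cell_type"] == "micro")
    return {
        # cell number changes among macrocell connections
        "switch_macrocell": count_changes(seq, "macro", "cell_id") / n,
        # base station location changes among macrocell connections
        "switch_macro": count_changes(seq, "macro", "bs_loc") / n,
        "switch_smallcell": count_changes(seq, "micro", "cell_id") / n,
        "switch_small": count_changes(seq, "micro", "bs_loc") / n,
        # share of points connected to microcell base stations
        "ratio_small": n_micro / n,
    }
```

The cell-level and station-level features differ only in the key being compared: two cells of the same base station share a location, so a cell change does not always imply a station change.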

[0034] Furthermore, in S4, the specific methods for extracting building coverage and height within the service range of macrocell base stations include:

[0035] Based on macrocell base stations, Thiessen polygons are generated, and the range of the Thiessen polygons where each macrocell base station is located is taken as its service range.

[0036] Calculate the building coverage and average building height within the service area of each macrocell base station;

[0037] The calculation results are matched with the macrocell trajectory points in step S3 using the base station cell number, and used as the base station building features of the macrocell trajectory points; the base station building features of the microcell trajectory points are not calculated.
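The building features of S4 can be approximated without constructing polygon geometry, because a location lies in a station's Thiessen polygon exactly when that station is its nearest one. The sketch below relies on that property and on several simplifying assumptions that are not part of the source: planar coordinates in metres, axis-aligned rectangular building footprints, a sampling grid to estimate coverage, and buildings assigned to a station by their centroid. All names are hypothetical.

```python
import math
from collections import defaultdict


def nearest_station(x, y, stations):
    """Return the id of the station closest to (x, y), i.e. its Thiessen cell."""
    return min(
        stations,
        key=lambda sid: math.hypot(x - stations[sid][0], y - stations[sid][1]),
    )


def building_features(stations, buildings, bounds, step=10.0):
    """Estimate building coverage and mean building height per station.

    stations:  {station_id: (x, y)}
    buildings: list of (xmin, ymin, xmax, ymax, height)
    bounds:    (xmin, ymin, xmax, ymax) of the study area
    """
    total, built = defaultdict(int), defaultdict(int)
    x0, y0, x1, y1 = bounds
    for i in range(int((x1 - x0) / step)):
        for j in range(int((y1 - y0) / step)):
            x, y = x0 + (i + 0.5) * step, y0 + (j + 0.5) * step
            sid = nearest_station(x, y, stations)
            total[sid] += 1
            if any(bx0 <= x <= bx1 and by0 <= y <= by1
                   for bx0, by0, bx1, by1, _ in buildings):
                built[sid] += 1
    heights = defaultdict(list)
    for bx0, by0, bx1, by1, h in buildings:
        sid = nearest_station((bx0 + bx1) / 2, (by0 + by1) / 2, stations)
        heights[sid].append(h)
    return {
        sid: {
            "coverage": built[sid] / total[sid] if total[sid] else 0.0,
            "mean_height": sum(heights[sid]) / len(heights[sid]) if heights[sid] else 0.0,
        }
        for sid in stations
    }
```

With real data, a GIS library would normally compute exact polygon areas and intersections; the grid estimate only shows how the two features relate to the service range.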

[0038] Furthermore, in S5, the specific method for training and parameter tuning of the random forest model is as follows: using existing mobile phone signaling data labeled as indoor or outdoor, potential indoor trajectory points are identified through steps S1-S2, their features are extracted through steps S3-S4, and these features are input into the random forest model for training. The model accuracy is evaluated using k-fold cross-validation, and the model parameters are then optimized.

[0039] Furthermore, in S6, the method for determining the user's indoor / outdoor status is as follows: take mobile signaling data of any user at any time period; identify the user's potential indoor trajectory points through steps S1-S2; if the trajectory point is not a potential indoor trajectory point, then the point is an outdoor trajectory point; if the trajectory point is a potential indoor trajectory point, then extract the main serving base station cell type, signaling handover features, and base station building features through steps S3-S4, and input them into the random forest model trained in step S5 to determine whether it is an outdoor or indoor trajectory point.
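The decision flow of S6 can be sketched as below. The trained random forest is represented by a generic callable `predict_indoor`, and the dictionary keys are hypothetical. The duration calculation is an added assumption for illustration: the time between two consecutive points is attributed to the state of the earlier point, which the text does not specify.

```python
def classify_points(points, predict_indoor):
    """Label each trajectory point as "indoor" or "outdoor".

    points: list of dicts with a boolean "potential_indoor" flag and a
            "features" entry (used only for potential indoor points).
    predict_indoor: callable(features) -> bool, standing in for the
            trained random forest model.
    """
    labels = []
    for p in points:
        if not p["potential_indoor"]:
            labels.append("outdoor")  # non-potential points are outdoor by rule
        elif predict_indoor(p["features"]):
            labels.append("indoor")
        else:
            labels.append("outdoor")
    return labels


def state_durations(times, labels):
    """Sum the seconds spent in each state (assumed attribution rule)."""
    durations = {"indoor": 0, "outdoor": 0}
    for i in range(len(times) - 1):
        durations[labels[i]] += times[i + 1] - times[i]
    return durations
```

Only potential indoor trajectory points reach the model, so the classifier never has to handle points that the movement-based screening in S2 has already ruled out.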

[0040] This invention discloses a method for determining the indoor and outdoor status of users based on mobile phone signaling data. It can determine the indoor and outdoor status of users within a certain period of time and the duration of the corresponding status based on the user's mobile phone signaling data. This invention overcomes the limitations of traditional methods, such as limited applicability and difficulty in data collection. It can determine the indoor and outdoor status of users on a large scale and comprehensively, and analyze the temporal distribution of the user's indoor and outdoor status, which is helpful for the rational construction of urban public services and the development of urban planning.

Description of the Drawings

[0041] Figure 1 is a schematic diagram of the overall process of the present invention.

[0042] Figure 2 is a schematic diagram of the identification of potential indoor trajectory points.

[0043] Figure 3 is a schematic diagram of the specific process for determining the indoor and outdoor status of users.

Detailed Implementation

[0044] The present invention will now be described in further detail with reference to the accompanying drawings and specific embodiments.

[0045] As shown in Figure 1, the user indoor / outdoor status determination method based on mobile phone signaling data disclosed in this invention mainly includes the following processing steps:

[0046] S1. Mobile signaling data preprocessing;

[0047] S2. Potential indoor trajectory point identification;

[0048] S3. Identification of primary connected base station type and extraction of signaling handover features for potential indoor trajectory points;

[0049] S4. Building coverage and height extraction within the service range of macrocell base stations;

[0050] S5. Random forest model training and parameter tuning to determine the indoor and outdoor states of a user when the user is located at a potential indoor trajectory point;

[0051] S6. Determine the user's indoor and outdoor status and represent the time distribution.

[0052] The user indoor / outdoor status discrimination method based on mobile phone signaling data disclosed in this invention can determine the user's indoor / outdoor status and the duration of the corresponding status based on the user's mobile phone signaling data over a period of time.

[0053] With reference to the schematic diagram in Figure 3, which illustrates the specific process of determining the user's indoor / outdoor status, the processing steps of the user indoor / outdoor status determination method disclosed in this invention are explained in detail below:

[0054] First, the technical terms used are explained. Base station and base station cell: in a mobile communication network, a base station is a wireless device that provides wireless communication services; it transmits and receives radio waves to communicate with user terminals (such as mobile phones), thereby enabling wireless voice, data, and video communication services. A base station cell is a sub-area within the coverage area of a base station, also simply called a "cell". Each base station can be divided into multiple cells, and each cell has a unique identifier called a cell ID.

[0055] S1. Mobile signaling data preprocessing: Preprocess the acquired user mobile signaling data to remove missing and duplicate records.

[0056] The purpose of this step is to remove erroneous records from mobile signaling data in order to improve the accuracy of determining the user's indoor and outdoor status.

[0057] First, obtain the user's mobile signaling data for one day, which can be represented as:

[0058] $$T = \{(l_1, ci_1, ci_1\_type, t_1), (l_2, ci_2, ci_2\_type, t_2), \ldots, (l_n, ci_n, ci_n\_type, t_n)\}$$

[0059] where $(l_n, ci_n, ci_n\_type, t_n)$ represents a trajectory point, which is generated only when the user's mobile phone interacts with a base station; $l_n$ represents the latitude and longitude of the base station to which the user's mobile phone is connected at the time of the interaction; $ci_n$ represents the cell number of the base station to which the user's mobile phone is connected; $ci_n\_type$ represents the type of the connected base station cell; and $t_n$ represents the time at which the interaction occurred.

[0060] Then, erroneous records in the mobile signaling data, such as records duplicated within the same time period or records with incomplete fields, are removed.
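A minimal sketch of this preprocessing step is given below. It assumes a record layout that mirrors the trajectory point tuple (base station longitude and latitude, cell number, cell type, time) and treats only exact repeats as duplicates. The `Record` type and the `preprocess` function are hypothetical names, not part of the source.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    lon: Optional[float]      # base station longitude
    lat: Optional[float]      # base station latitude
    cell_id: Optional[str]    # base station cell number
    cell_type: Optional[str]  # "macro" or "micro"
    t: Optional[int]          # interaction time, seconds


def preprocess(records):
    """Drop records with missing fields and exact duplicates, then sort by time."""
    seen = set()
    cleaned = []
    for r in records:
        if any(v is None for v in r):
            continue  # incomplete record
        if r in seen:
            continue  # duplicate record
        seen.add(r)
        cleaned.append(r)
    cleaned.sort(key=lambda r: r.t)
    return cleaned
```

Sorting by time is included because the later steps compare adjacent trajectory points and therefore need a time-ordered sequence.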

[0061] S2. Potential Indoor Trajectory Point Identification: Noise points are identified based on the speed, distance, and number of visits between trajectory points. These noise points are then aggregated with neighboring non-noise points, retaining the original base station latitude and longitude, cell number, and base station type. After noise point aggregation, trajectory sequences are extracted based on the location of each trajectory point, and potential indoor trajectory points are identified according to the number of trajectory points contained in the trajectory sequence.

[0062] The purpose of this step is to make a preliminary judgment on the user's indoor / outdoor status based on their movement behavior. When a user moves at high speed, their phone will only interact with nearby base stations once, thus generating only one trajectory point. When a user moves slowly or is stationary, their phone will interact with nearby base stations multiple times, thus generating multiple trajectory points. However, a user can only move at high speed when outdoors. Therefore, the number of trajectory points at a user's location in the phone's signaling data can be used to make a preliminary judgment on the user's indoor / outdoor status.

[0063] To obtain the trajectory points of a user at a certain location from mobile signaling data, it is necessary to extract the trajectory sequence from the mobile signaling data. Before extracting the trajectory sequence, it is also necessary to aggregate the noise points in the mobile signaling data to improve the accuracy of the extracted trajectory sequence. In this invention, the aggregation of noise points in mobile signaling data is based on three aspects processed sequentially: the speed between trajectory points, the distance between them, and the number of times the trajectory points are visited.

[0064] First, based on the speed between trajectory points: calculate the speed between adjacent trajectory points in the user's mobile phone signaling data. If the speed exceeds the speed threshold ΔV (set to 120 km/h in this invention), the trajectory point is marked as a noise point, and the base station latitude and longitude of the adjacent non-noise trajectory point are assigned to the noise point as its new base station latitude and longitude, while the original base station latitude and longitude, base station cell number, and base station type of the noise point are retained.

[0065] Second, based on the distance between trajectory points: noise points are aggregated by distance in two steps. (1) Aggregating noise points that are adjacent in both space and time. Calculate the distance between adjacent trajectory points. If the distance between two adjacent points is less than the distance threshold Δd1 (set to 150 m in this invention), the points are regarded as belonging to a noise sequence. For each noise sequence, the trajectory point with the highest number of user visits is taken as the center point of the sequence. If the distance between the center points of two adjacent noise sequences is less than Δd1, the two sequences are merged. After all noise sequences have been found and merged, the base station latitude and longitude of each sequence's center point are assigned to the other points in the sequence as their new base station latitude and longitude, while the original base station latitude and longitude, base station cell number, and base station type of each point are retained. (2) Aggregating noise points that are adjacent in space but not in time. Calculate the distance between the center points of the noise sequences obtained in the previous step. If the distance between two center points is less than Δd2 (set to 150 m in this invention), the two sequences are merged, and the trajectory point with the highest number of user visits is taken as the new center point. The base station latitude and longitude of the new center point are assigned to the other points in the merged sequence as their new base station latitude and longitude, while the original base station latitude and longitude, base station cell number, and base station type of each point are retained.

[0066] Third, based on the number of visits to trajectory points: First, calculate the number of visits to each trajectory point. If a trajectory point appears multiple times consecutively within a certain period, then the number of visits to that trajectory point within that period is only counted as two. After the calculation is complete, perform the following steps:

[0067] (1) Find the trajectory point A with the highest number of visits by the user in a day, and then find the noise sequences related to trajectory point A. If such a noise sequence exists, execute step (2); otherwise, execute step (4).

[0068] (2) Among all the noise sequences related to trajectory point A, find the trajectory point B with the highest number of visits besides trajectory point A, assign the base station latitude and longitude of A to B as its new base station latitude and longitude, but at the same time, it is necessary to retain the original base station latitude and longitude, base station cell number and base station type of trajectory point B.

[0069] (3) Recalculate the number of visits to each trajectory point and return to step (1);

[0070] (4) Noise point aggregation ends.
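The first of the three aggregation stages above, the speed-based one, can be sketched as follows. This is an illustration under assumptions: the speed is measured from the most recent non-noise point, distances use the haversine formula, and the dictionary keys and function names are hypothetical. The distance-based and visit-count-based stages are not covered here.

```python
import math

SPEED_THRESHOLD_KMH = 120.0  # the speed threshold ΔV stated in the text


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two lon/lat points."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def aggregate_speed_noise(points):
    """Mark speed-based noise points and snap them to the last non-noise point.

    points: time-ordered dicts with "lon", "lat", "t" (seconds).
    Original fields are kept; "new_lon"/"new_lat" hold the aggregated location.
    """
    out = []
    anchor = None  # most recent non-noise point
    for p in points:
        q = dict(p, new_lon=p["lon"], new_lat=p["lat"], is_noise=False)
        if anchor is not None:
            hours = (q["t"] - anchor["t"]) / 3600.0
            km = haversine_km(anchor["lon"], anchor["lat"], q["lon"], q["lat"])
            if hours > 0 and km / hours > SPEED_THRESHOLD_KMH:
                q["is_noise"] = True
                q["new_lon"], q["new_lat"] = anchor["lon"], anchor["lat"]
        if not q["is_noise"]:
            anchor = q
        out.append(q)
    return out
```

Writing the aggregated coordinates to separate fields keeps the original base station data intact, which the later feature extraction in S3 depends on.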

[0071] After noise point aggregation is completed, trajectory sequence extraction begins, merging trajectory points that are temporally adjacent and share the same location into a single trajectory sequence. If a trajectory sequence contains more than one trajectory point, the sequence is considered a potential indoor trajectory point sequence, and the trajectory points in that sequence are considered potential indoor trajectory points; otherwise, the trajectory points in that sequence are considered outdoor trajectory points.
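Trajectory sequence extraction and the labeling of potential indoor points can be sketched as below. The sketch assumes, in line with the reasoning in [0062] that fast movement leaves only a single trajectory point at a location, that a sequence needs at least two points to count as potentially indoor. The `min_points` parameter, the dictionary keys, and the function names are hypothetical.

```python
from itertools import groupby


def extract_sequences(points):
    """Group time-ordered points that share the same aggregated location."""
    key = lambda p: (p["new_lon"], p["new_lat"])
    return [list(group) for _, group in groupby(points, key=key)]


def label_potential_indoor(points, min_points=2):
    """Attach a sequence id and a potential-indoor flag to every point."""
    labeled = []
    for seq_id, seq in enumerate(extract_sequences(points)):
        potential = len(seq) >= min_points
        for p in seq:
            labeled.append(dict(p, seq_id=seq_id, potential_indoor=potential))
    return labeled
```

Because `groupby` only merges consecutive items, a location visited at two separate times of day yields two separate sequences, matching the requirement that merged points be temporally adjacent.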

[0072] Figure 2(a) shows one day of a user's mobile signaling data. Noise points are first aggregated, and trajectory sequences are then extracted to identify potential indoor trajectory points. First, noise points are aggregated based on the speed between adjacent trajectory points: a noise sequence Seq.1 is found based on the speed threshold, and an enlarged view of it is shown in Figure 2(b). The user's speed from point m1 to point m'1 exceeds the speed threshold ΔV, so point m'1 is a noise point. Point m1 is its neighboring non-noise point, so the base station latitude and longitude of point m1 are used as the new base station latitude and longitude of point m'1, while the original base station latitude and longitude, base station cell number, and base station type of point m'1 are retained. Next, noise points are aggregated based on the distance between trajectory points: based on the distance threshold Δd1, the spatially and temporally adjacent noise sequences Seq.2 and Seq.3 are identified, and enlarged views of them are shown in Figure 2(c). The center point of the noise sequence Seq.2 is m2, and the center point of Seq.3 is m3. The base station latitude and longitude of each point in these sequences are reassigned: the new base station latitude and longitude of each point in Seq.2 are those of the center point m2, and the new base station latitude and longitude of each point in Seq.3 are those of the center point m3. The original base station latitude and longitude, cell number, and base station type of each trajectory point are retained. Based on the distance threshold Δd2, the spatially adjacent but temporally non-adjacent noise sequence Seq.2_3 is found, and an enlarged view of it is shown in Figure 2(d). Since the number of visits to point m2 is greater than that to point m3, m2 is the new center point of the noise sequence Seq.2_3. The new base station latitude and longitude of each point in this sequence are those of the center point m2, while the original base station latitude and longitude, base station cell number, and base station type of each trajectory point are retained. Finally, noise points are aggregated based on the number of visits to trajectory points: among all trajectory points, the point with the highest number of visits is m4, which has been visited 9 times. The noise sequences related to m4 are Seq.4 and Seq.5, and enlarged views of them are shown in Figure 2(e). In these two noise sequences, the point with the highest number of visits other than m4 is m'4. Therefore, the base station latitude and longitude of m4 are assigned to m'4 as its new base station latitude and longitude. At this point, no noise points remain in the user's mobile signaling data, and noise point aggregation is complete. Trajectory sequence extraction then begins, and the extraction results are shown in Figure 2(f), where T.1 to T.7 are the extracted trajectory sequences. The trajectory sequences T.1, T.4, T.5, T.6, and T.7 each contain more than one trajectory point, so the points in these sequences are potential indoor trajectory points; the trajectory points in T.2 and T.3 are outdoor trajectory points.

[0073] S3. Identification of primary connected base station type and extraction of signaling handover features for potential indoor trajectory points: Based on the connection status of macrocells and microcells in the potential indoor trajectory point sequence extracted in step S2, the primary connected base station cell type of the potential indoor trajectory points in the sequence is determined, classifying them into macrocell trajectory points and microcell trajectory points. Then, the signaling handover features of each trajectory point are extracted, including the inter-cell handover frequency of macrocell base stations, the handover frequency between macrocell base stations, the inter-cell handover frequency of microcell base stations, the handover frequency between microcell base stations, and the proportion of microcell base stations.

[0074] The purpose of this step is to extract signaling handover features of potential indoor trajectory points, in order to prepare for using a random forest model to determine the user's indoor and outdoor status.

[0075] Based on their function, base stations can be divided into two categories: macrocell base stations and microcell base stations. Macrocell base stations are typically used to provide wide-area wireless communication services, handling the majority of communication demand within cities, and are usually established near roads or on building rooftops. Microcell base stations are small wireless communication base stations, typically used to provide localized wireless communication services to compensate for the insufficient service capabilities of macrocell base stations in certain scenarios; they are usually established indoors or in densely populated areas. Therefore, obtaining the type of base station a user is connected to is helpful in determining the user's indoor / outdoor status.

[0076] For the potential indoor trajectory points identified in step S2, the primary connected base station type of each trajectory point is determined based on the macrocell and microcell base stations originally connected to all trajectory points in its trajectory sequence. If one or more points in the trajectory sequence were originally connected to a microcell base station, then the primary connected base station type of the trajectory point is microcell, and the trajectory point is a microcell trajectory point; otherwise, the primary connected base station cell type of the sequence is macrocell, and the trajectory point is a macrocell trajectory point.

[0077] When a user is indoors, due to building obstructions, their mobile phone typically maintains a stable connection to a single base station or cell for an extended period, without switching between base stations or cells. However, when a user is outdoors, in an open environment without building obstructions, their mobile phone usually switches between multiple base stations or cells. Therefore, the signaling handover status between base stations or cells differs between indoor and outdoor environments. Further extracting the user's signaling handover characteristics will be beneficial for distinguishing between indoor and outdoor user status.

[0078] Further extraction of signaling handover characteristics includes the following five aspects:

[0079] (1) Inter-cell handover frequency of macrocell base stations: Calculate the handover frequency between macrocell base stations connected to each trajectory point in the trajectory sequence of potential indoor trajectory points:

[0080] $$\mathrm{Switch\_Marcocell}_i = \frac{S_{marcocell}(Trj_i)}{N(Trj_i)}$$

[0081] where $\mathrm{Switch\_Marcocell}_i$ represents the inter-cell handover frequency of macrocell base stations for the i-th potential indoor trajectory point; $Trj_i$ represents the trajectory sequence containing the i-th potential indoor trajectory point; $N(Trj_i)$ represents the number of trajectory points contained in the trajectory sequence; and $S_{marcocell}(Trj_i)$ represents the number of changes in the macrocell base station cell connected to adjacent trajectory points in the trajectory sequence;

[0082] (2) Macrocell base station handover frequency: Calculate the handover frequency between the macrocell base station locations connected to each trajectory point in the trajectory sequence containing the potential indoor trajectory point:

[0083] $$\mathrm{Switch\_Marco}_i = \frac{S_{marco}(Trj_i)}{N(Trj_i)}$$

[0084] where $\mathrm{Switch\_Marco}_i$ is the handover frequency between macrocell base stations for the i-th potential indoor trajectory point, $Trj_i$ is the trajectory sequence containing the i-th potential indoor trajectory point, $N(Trj_i)$ is the number of trajectory points contained in the trajectory sequence, and $S_{marco}(Trj_i)$ is the number of times the location of the macrocell base station connected to adjacent trajectory points in the trajectory sequence changes;

[0085] (3) Inter-cell handover frequency of microcell base stations: Calculate the handover frequency between the microcell base station cells connected to each trajectory point in the trajectory sequence containing the potential indoor trajectory point:

[0086] $$\mathrm{Switch\_Smallcell}_i = \frac{S_{smallcell}(Trj_i)}{N(Trj_i)}$$

[0087] where $\mathrm{Switch\_Smallcell}_i$ is the inter-cell handover frequency of microcell base stations for the i-th potential indoor trajectory point, $Trj_i$ is the trajectory sequence containing the i-th potential indoor trajectory point, $N(Trj_i)$ is the number of trajectory points contained in the trajectory sequence, and $S_{smallcell}(Trj_i)$ is the number of changes in the microcell base station cells connected to adjacent trajectory points in the trajectory sequence;

[0088] (4) Handover frequency between microcell base stations: Calculate the handover frequency between the microcell base station locations connected to each trajectory point in the trajectory sequence containing the potential indoor trajectory point:

[0089] $$\mathrm{Switch\_Small}_i = \frac{S_{small}(Trj_i)}{N(Trj_i)}$$

[0090] where $\mathrm{Switch\_Small}_i$ is the handover frequency between microcell base stations for the i-th potential indoor trajectory point, $Trj_i$ is the trajectory sequence containing the i-th potential indoor trajectory point, $N(Trj_i)$ is the number of trajectory points contained in the trajectory sequence, and $S_{small}(Trj_i)$ is the number of times the location of the microcell base station connected to adjacent trajectory points in the trajectory sequence changes;

[0091] (5) Proportion of microcell base stations: Calculate the ratio between the number of trajectory points connected to microcell base stations in the trajectory sequence of potential indoor trajectory points and the total number of trajectory points:

[0092] $$\frac{N(Trj_i)_{smallcell}}{N(Trj_i)}$$

[0093] where this ratio is the proportion of microcell base stations for the i-th potential indoor trajectory point, $Trj_i$ is the trajectory sequence containing the i-th potential indoor trajectory point, $N(Trj_i)$ is the number of trajectory points contained in the trajectory sequence, and $N(Trj_i)_{smallcell}$ is the number of trajectory points in the trajectory sequence that are connected to microcell base stations.

[0094] If the trajectory point is a macrocell trajectory point, only its macrocell inter-cell handover frequency and its handover frequency between macrocell base stations are calculated; the microcell inter-cell handover frequency and the handover frequency between microcell base stations of the trajectory point are set to 2, and the proportion of microcell base stations is set to 0. If the trajectory point is a microcell trajectory point, only its microcell inter-cell handover frequency, its handover frequency between microcell base stations, and its proportion of microcell base stations are calculated; the macrocell inter-cell handover frequency and the handover frequency between macrocell base stations of the trajectory point are set to 2.
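The five features and the placeholder rules above can be illustrated with a short Python sketch. It rests on several assumptions that are not confirmed by the text: each frequency is taken as the number of changes between adjacent points divided by the number of points in the sequence, each feature is computed over the points connected to the relevant base station type, and the dictionary keys (`bs_type`, `cell_id`, `site`) and output names are hypothetical.

```python
UNUSED = 2  # placeholder the text assigns to handover features that are not calculated


def count_changes(values):
    """Number of adjacent pairs whose values differ."""
    return sum(1 for a, b in zip(values, values[1:]) if a != b)


def handover_features(sequence, primary_type):
    """sequence: list of dicts with hypothetical keys 'bs_type' ('macro' or
    'micro'), 'cell_id', and 'site' (base station latitude, longitude).
    primary_type: 'macrocell' or 'microcell', as determined in step S3."""
    n = len(sequence)
    if n == 0:
        raise ValueError("a trajectory sequence must contain at least one point")
    macro = [p for p in sequence if p["bs_type"] == "macro"]
    micro = [p for p in sequence if p["bs_type"] == "micro"]

    def freq(points, key):
        # Assumed definition: changes between adjacent points divided by N(Trj_i).
        return count_changes([p[key] for p in points]) / n

    if primary_type == "macrocell":
        return {
            "switch_macrocell": freq(macro, "cell_id"),
            "switch_macro": freq(macro, "site"),
            "switch_smallcell": UNUSED,
            "switch_small": UNUSED,
            "small_ratio": 0.0,
        }
    return {
        "switch_macrocell": UNUSED,
        "switch_macro": UNUSED,
        "switch_smallcell": freq(micro, "cell_id"),
        "switch_small": freq(micro, "site"),
        "small_ratio": len(micro) / n,
    }


macro_seq = [
    {"bs_type": "macro", "cell_id": "A1", "site": (39.90, 116.40)},
    {"bs_type": "macro", "cell_id": "A2", "site": (39.90, 116.40)},
    {"bs_type": "macro", "cell_id": "A2", "site": (39.90, 116.40)},
    {"bs_type": "macro", "cell_id": "B1", "site": (39.91, 116.41)},
]
features = handover_features(macro_seq, "macrocell")
print(features["switch_macrocell"], features["switch_macro"])  # 0.5 0.25
```

In the example, two cell changes and one site change over four points give 0.5 and 0.25, while the three microcell features take their placeholder values.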

[0095] S4. Building coverage and height extraction within the service range of macrocell base stations: Based on macrocell base stations, generate Thiessen polygons, and use the area of the Thiessen polygon where each macrocell base station is located as its service range. Calculate the building coverage and average building height within the service range of each macrocell base station. The calculation results are matched with the macrocell trajectory points in step S3 using the base station cell number, and used as the base station building features for the macrocell trajectory points; the base station building features for microcell trajectory points are not calculated.

[0096] The purpose of this step is to extract the base station building features of macrocell trajectory points, in order to prepare for the next step of using a random forest model to determine the user's indoor and outdoor status.

[0097] Radio wave propagation models posit that building density and height both affect wireless propagation. When a user is located in an area with varying building density and height, even if their indoor and outdoor conditions are the same, their mobile phone's connection to the base station will differ, resulting in different signaling handover characteristics. Therefore, extracting the building features surrounding the base station is beneficial for better identifying the user's indoor and outdoor conditions using signaling handover features. Since microcell base stations are typically built inside buildings, the surrounding building features are usually consistent. Even when a user's mobile phone is in the same indoor or outdoor condition within or around different buildings with microcell base stations, the connection status will not change significantly. Therefore, it is sufficient to extract the building features surrounding macrocell base stations to help determine the user's indoor and outdoor conditions.

[0098] Generally, the service range of a base station in a city is determined using a Thiessen polygon generated from the base station's location: the area of the Thiessen polygon where each base station is located is the service range of that base station. In reality, the service range of a microcell base station usually includes only the interior of the building where it is located, and a macrocell base station near the microcell base station can also serve some users inside that building. If microcell and macrocell base stations were considered together when generating the Thiessen polygons, the service range of the macrocell base stations would therefore be biased. For this reason, the Thiessen polygons are generated from the locations of the macrocell base stations only, and the area of the Thiessen polygon where each macrocell base station is located is taken as its service range. The building characteristics within this service range are then calculated, including: (1) Building height: the average building height within the service range of each macrocell base station; (2) Building coverage: the building coverage of the macrocell base station, based on the building area within the service range of each macrocell base station.

[0099] $$BC_i = \frac{\mathrm{Area}(Bts_{i\_building})}{\mathrm{Area}(Bts_i)}$$

[0100] where $Bts_i$ is the i-th macrocell base station, $BC_i$ is the building coverage within the service range of the i-th macrocell base station, $\mathrm{Area}(Bts_i)$ is the area of the service range of this macrocell base station, and $\mathrm{Area}(Bts_{i\_building})$ is the area of buildings within the service range of this base station.

[0101] After calculating the building coverage and height of all macrocell base stations, the calculation results are matched with the macrocell trajectory points based on the location of the macrocell base stations and the base station latitude and longitude of the macrocell trajectory points identified in step S3. This is used as the building feature of the base station. The building coverage and height values of the microcell trajectory points are both set to -1.
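As a rough illustration of step S4, the sketch below approximates the Thiessen polygons on a raster rather than constructing them geometrically: a location belongs to the Thiessen polygon of the macrocell base station nearest to it. The raster input, planar coordinates, function names, and the area-weighted average height are all assumptions made for this example; a GIS workflow on vector building footprints would likely differ.

```python
import math


def building_features(stations, height_grid, cell_size=1.0):
    """stations: {station_id: (x, y)} macrocell sites in planar coordinates.
    height_grid[r][c]: building height of a raster cell, 0 where there is no building."""
    stats = {sid: {"cells": 0, "built": 0, "height_sum": 0.0} for sid in stations}
    for r, row in enumerate(height_grid):
        for c, h in enumerate(row):
            x, y = (c + 0.5) * cell_size, (r + 0.5) * cell_size
            # A raster cell falls in the Thiessen polygon of its nearest station.
            nearest = min(stations, key=lambda sid: math.dist((x, y), stations[sid]))
            s = stats[nearest]
            s["cells"] += 1
            if h > 0:
                s["built"] += 1
                s["height_sum"] += h
    result = {}
    for sid, s in stats.items():
        coverage = s["built"] / s["cells"] if s["cells"] else 0.0
        avg_height = s["height_sum"] / s["built"] if s["built"] else 0.0
        result[sid] = {"coverage": coverage, "avg_height": avg_height}
    return result


def point_building_features(primary_type, station_id, table):
    """Microcell trajectory points receive -1 for both features, as in the text."""
    if primary_type == "microcell":
        return {"coverage": -1, "avg_height": -1}
    return table[station_id]


grid = [
    [10, 0, 0, 0],
    [20, 0, 30, 30],
]
table = building_features({"A": (1.0, 1.0), "B": (3.0, 1.0)}, grid)
print(table["A"])  # {'coverage': 0.5, 'avg_height': 15.0}
print(table["B"])  # {'coverage': 0.5, 'avg_height': 30.0}
```

Here the left four raster cells are nearest to station A and the right four to station B, so each service range is half built-up.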

[0102] S5. Random forest model training and parameter tuning for determining indoor and outdoor states when a user is located at a potential indoor trajectory point: Using existing mobile signaling data with indoor and outdoor labels, potential indoor trajectory points are identified through steps S1-S2, and the features of the potential indoor trajectory points are extracted through steps S3-S4. These features are then input into the random forest model for training, and the model accuracy is evaluated using k-fold cross-validation, on the basis of which the model parameters are optimized.
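The k-fold evaluation mentioned above can be sketched in plain Python. The `fit` and `predict` callables are placeholders for whatever random forest implementation is used, and a majority-class model stands in for it in the example; in practice the samples would normally be shuffled before splitting, and `k` is assumed not to exceed the number of samples.

```python
from collections import Counter


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for f in range(k):
        size = n_samples // k + (1 if f < n_samples % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_accuracy(X, y, k, fit, predict):
    """fit(X_train, y_train) -> model; predict(model, X_test) -> list of labels.
    Returns the mean accuracy over the k held-out folds."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict(model, [X[i] for i in test_idx])
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / k


# Stand-in model (not a random forest): always predicts the majority training label.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model, X_test):
    return [model] * len(X_test)


X = [[i] for i in range(10)]
y = [1, 1, 1, 1, 1, 1, 1, 1, 2, 2]  # 1 = indoor, 2 = outdoor
print(cross_val_accuracy(X, y, 5, fit_majority, predict_majority))  # 0.8
```

Parameter tuning then amounts to repeating this evaluation for each candidate parameter setting and keeping the setting with the highest mean accuracy.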

[0103] The purpose of this step is to train a random forest model using mobile signaling data with user indoor / outdoor location tags, thereby enabling the determination of the user's indoor / outdoor status based on multiple features of potential indoor trajectory points.

[0104] For mobile signaling data with user indoor / outdoor location tags, potential indoor trajectory points are identified through steps S1-S2, and signaling handover features and base station building features of the potential indoor trajectory points are calculated through steps S3-S4, constructing a training dataset containing N training samples:

[0105] $$T=\{(x_i,y_i)\mid i=1,2,\ldots,N\}$$

[0106] where $x_i=(x_{i1},x_{i2},\ldots,x_{in})$ is the input feature vector of the i-th potential indoor trajectory point, which includes the signaling handover features and base station building features of the trajectory point, and $y_i$ is the indoor / outdoor state of the user at this trajectory point, which takes one of two values: (1) indoor; (2) outdoor. The training dataset is input into the random forest model for training. The formula for the random forest is:

[0107]

[0108] where $RF(x)$ is the final discrimination result and $F_i(x)$ is the discrimination result of the i-th decision tree; ntree and mtry are model parameters, representing the number of decision trees in the random forest and the number of features randomly selected by each decision tree, respectively. The model can be retrained and updated as the training dataset accumulates, thereby improving discrimination accuracy.
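For illustration, the way the individual tree results $F_i(x)$ are combined into $RF(x)$ can be sketched as a majority vote, which is the usual rule for classification forests; whether the formula in this document has exactly this form is an assumption. The trees below are hypothetical hand-written threshold rules standing in for trained decision trees, and ntree corresponds to the length of the list.

```python
from collections import Counter

INDOOR, OUTDOOR = 1, 2  # label encoding taken from the training set definition


def rf_predict(trees, x):
    """trees: list of callables F_i(x) -> label. Returns the most frequent vote;
    on a tie the label that was voted first is returned."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]


# Hypothetical stand-in trees; x = (handover frequency, building coverage).
trees = [
    lambda x: INDOOR if x[0] < 0.3 else OUTDOOR,
    lambda x: INDOOR if x[1] > 0.5 else OUTDOOR,
    lambda x: OUTDOOR,
]
print(rf_predict(trees, (0.1, 0.6)))  # 1
print(rf_predict(trees, (0.8, 0.2)))  # 2
```

The mtry parameter affects only how each tree is grown during training, so it does not appear in the voting step.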

[0109] S6. User's indoor / outdoor status determination and time distribution calculation:

[0110] Based on the random forest model trained in the previous step, the indoor and outdoor states of a user when they are located at a potential indoor trajectory point can be determined.

[0111] As shown in Figure 3, given the mobile phone signaling data of any user for any time period, preprocessing is first performed through step S1, and the potential indoor trajectory points in the user's mobile phone signaling data are then identified through step S2. If a trajectory point is not a potential indoor trajectory point, the point is an outdoor trajectory point.

[0112] If the trajectory point is a potential indoor trajectory point, its primary connected base station type, signaling handover features, and base station building features are extracted through steps S3-S4. Finally, the features are input into the random forest model trained in step S5 to further determine whether it is an outdoor trajectory point or an indoor trajectory point.
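The two-branch decision described above can be sketched as follows. The `potential_indoor` flag and the `extract_features` and `model_predict` callables are hypothetical placeholders for the outputs of step S2, steps S3-S4, and the trained model of step S5, respectively.

```python
OUTDOOR, INDOOR = "outdoor", "indoor"


def classify_point(point, extract_features, model_predict):
    """point: dict with hypothetical key 'potential_indoor' (bool) set by step S2."""
    if not point["potential_indoor"]:
        # Points that are not potential indoor points are labelled outdoor directly.
        return OUTDOOR
    return model_predict(extract_features(point))


# Stand-ins for feature extraction and the trained model.
def demo_features(point):
    return [point["switch_macrocell"]]


def demo_model(features):
    return INDOOR if features[0] < 0.3 else OUTDOOR


print(classify_point({"potential_indoor": False}, demo_features, demo_model))
# outdoor
print(classify_point({"potential_indoor": True, "switch_macrocell": 0.1},
                     demo_features, demo_model))
# indoor
```

Only potential indoor trajectory points reach the model, which keeps feature extraction limited to the points that need it.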

[0113] Finally, for all the discrimination results, the time distribution of the user's indoor and outdoor locations throughout the day is calculated based on the time attributes of each trajectory point.
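One possible way to compute such a time distribution is to group the classified trajectory points by hour of day and report the share of indoor and outdoor points in each hour. Counting points rather than weighting by dwell time, and the hourly bin size, are assumptions made for this sketch.

```python
from collections import defaultdict
from datetime import datetime


def hourly_distribution(labelled_points):
    """labelled_points: iterable of (timestamp, label), label in {'indoor', 'outdoor'}.
    Returns {hour: {'indoor': share, 'outdoor': share}} for hours that contain points."""
    counts = defaultdict(lambda: {"indoor": 0, "outdoor": 0})
    for ts, label in labelled_points:
        counts[ts.hour][label] += 1
    result = {}
    for hour, c in sorted(counts.items()):
        total = c["indoor"] + c["outdoor"]
        result[hour] = {k: v / total for k, v in c.items()}
    return result


points = [
    (datetime(2023, 7, 13, 8, 5), "indoor"),
    (datetime(2023, 7, 13, 8, 40), "outdoor"),
    (datetime(2023, 7, 13, 9, 10), "indoor"),
]
print(hourly_distribution(points))
# {8: {'indoor': 0.5, 'outdoor': 0.5}, 9: {'indoor': 1.0, 'outdoor': 0.0}}
```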

[0114] Therefore, the method disclosed in this invention for judging the indoor and outdoor status of users based on mobile phone signaling data makes use of noise points, specifically in the following ways: (1) in step S2, noise points are not eliminated as in traditional methods but are aggregated, and the original information of each trajectory point is retained during aggregation, laying the foundation for the subsequent extraction of signaling handover features of potential indoor trajectory points; (2) in step S3, the signaling handover features of potential indoor trajectory points are extracted, so that noise points help judge the indoor and outdoor status of users. Before extracting the features of potential indoor trajectory points, this invention classifies them into macrocell and microcell trajectory points, which improves the accuracy of the final judgment result. Step S4 of this invention uses building features within the service range of the base station to assist in judging the indoor and outdoor status of users, and when extracting the service range it generates Thiessen polygons from macrocell base stations only, so that the extracted macrocell base station service range is closer to its true value.

[0115] Compared with existing technologies, it has the following technological advantages:

[0116] (1) The indoor and outdoor status of users is determined based on low-cost and easily obtainable mobile signaling data, which solves the problem of difficulty in obtaining data by traditional methods;

[0117] (2) Mobile signaling data can record the daily trajectory points of most people in the city. Using mobile signaling data to determine the indoor and outdoor status of users provides a large-scale and comprehensive method for studying the indoor and outdoor status of users, which solves the problem that traditional methods can only analyze a small number of people.

[0118] (3) The model of the present invention has high flexibility and can be continuously trained and optimized with the accumulation of data samples, thereby continuously improving the discrimination accuracy.

[0119] The above embodiments are not intended to limit the present invention, and the present invention is not limited to the examples given above. Any changes, modifications, additions or substitutions made by those skilled in the art within the scope of the technical solution of the present invention are also within the protection scope of the present invention.

Claims

1. A method for determining the indoor / outdoor status of users based on mobile phone signaling data, characterized in that: The main processing steps include the following: S1. Mobile signaling data preprocessing; S2. Potential indoor trajectory point identification; In S2, the process of identifying potential indoor trajectory points is as follows: noise points are identified based on the speed, distance and number of visits between trajectory points; noise points are aggregated with nearby non-noise points, and the latitude and longitude, cell number and type of the base station originally connected to them are retained; after the noise points are aggregated, trajectory sequences are extracted based on the location of each trajectory point, and potential indoor trajectory points are identified based on the number of trajectory points contained in the trajectory sequence; S3. Identification of primary connected base station type and extraction of signaling handover features for potential indoor trajectory points; In S3, the signaling handover features of each trajectory point are extracted, including the inter-cell handover frequency of macrocell base stations, the handover frequency between macrocell base stations, the inter-cell handover frequency of microcell base stations, the handover frequency between microcell base stations, and the proportion of microcell base stations; S4. Building coverage and height extraction within the service range of macrocell base stations; In S4, the specific methods for extracting building coverage and height within the service range of macrocell base stations are as follows: based on macrocell base stations, Thiessen polygons are generated, and the range of the Thiessen polygon where each macrocell base station is located is taken as its service range; the building coverage and average building height within the service range of each macrocell base station are calculated; the calculation results are matched with the macrocell trajectory points in step S3 using the base station cell number, and used as the base station building features of the macrocell trajectory points; the base station building features of the microcell trajectory points are not calculated; S5. Random forest model training and parameter tuning to determine the indoor and outdoor states of a user when the user is located at a potential indoor trajectory point; S6. Determine the user's indoor and outdoor status and represent the time distribution.

2. The method for determining the indoor / outdoor status of users based on mobile phone signaling data according to claim 1, characterized in that: In S1, the specific method for preprocessing mobile signaling data is as follows: erroneous records in the acquired mobile signaling data are removed, including but not limited to missing and duplicate records in the data.

3. The method for determining the indoor / outdoor status of users based on mobile phone signaling data according to claim 1, characterized in that: In S3, the method for identifying the primary connected base station type is as follows: based on the connection of macro and micro cells in the potential indoor trajectory point sequence extracted in step S2, the primary connected base station cell type of the potential indoor trajectory points in the sequence is determined, and they are divided into macro cell trajectory points and micro cell trajectory points. If one or more points in the trajectory sequence of a potential indoor trajectory point were originally connected to a microcell base station, then the primary connected base station type of that trajectory point is microcell, and that trajectory point is a microcell trajectory point; otherwise, the primary connected base station cell type of that sequence is macrocell, and that trajectory point is a macrocell trajectory point.

4. The method for determining the indoor / outdoor status of users based on mobile phone signaling data according to claim 3, characterized in that: The specific methods for extracting the signaling handover features are as follows:

(1) Inter-cell handover frequency of macrocell base stations: Calculate the handover frequency between the macrocell base station cells connected to each trajectory point in the trajectory sequence containing the potential indoor trajectory point: $$\mathrm{Switch\_Marcocell}_i = \frac{S_{marcocell}(Trj_i)}{N(Trj_i)}$$ where $\mathrm{Switch\_Marcocell}_i$ is the inter-cell handover frequency of macrocell base stations for the i-th potential indoor trajectory point, $Trj_i$ is the trajectory sequence containing the i-th potential indoor trajectory point, $N(Trj_i)$ is the number of trajectory points contained in the trajectory sequence, and $S_{marcocell}(Trj_i)$ is the number of changes in the macrocell base station cells connected to adjacent trajectory points in the trajectory sequence;

(2) Macrocell base station handover frequency: Calculate the handover frequency between the macrocell base station locations connected to each trajectory point in the trajectory sequence containing the potential indoor trajectory point: $$\mathrm{Switch\_Marco}_i = \frac{S_{marco}(Trj_i)}{N(Trj_i)}$$ where $\mathrm{Switch\_Marco}_i$ is the handover frequency between macrocell base stations for the i-th potential indoor trajectory point, $Trj_i$ is the trajectory sequence containing the i-th potential indoor trajectory point, $N(Trj_i)$ is the number of trajectory points contained in the trajectory sequence, and $S_{marco}(Trj_i)$ is the number of times the location of the macrocell base station connected to adjacent trajectory points in the trajectory sequence changes;

(3) Inter-cell handover frequency of microcell base stations: Calculate the handover frequency between the microcell base station cells connected to each trajectory point in the trajectory sequence containing the potential indoor trajectory point: $$\mathrm{Switch\_Smallcell}_i = \frac{S_{smallcell}(Trj_i)}{N(Trj_i)}$$ where $\mathrm{Switch\_Smallcell}_i$ is the inter-cell handover frequency of microcell base stations for the i-th potential indoor trajectory point, $Trj_i$ is the trajectory sequence containing the i-th potential indoor trajectory point, $N(Trj_i)$ is the number of trajectory points contained in the trajectory sequence, and $S_{smallcell}(Trj_i)$ is the number of changes in the microcell base station cells connected to adjacent trajectory points in the trajectory sequence;

(4) Handover frequency between microcell base stations: Calculate the handover frequency between the microcell base station locations connected to each trajectory point in the trajectory sequence containing the potential indoor trajectory point: $$\mathrm{Switch\_Small}_i = \frac{S_{small}(Trj_i)}{N(Trj_i)}$$ where $\mathrm{Switch\_Small}_i$ is the handover frequency between microcell base stations for the i-th potential indoor trajectory point, $Trj_i$ is the trajectory sequence containing the i-th potential indoor trajectory point, $N(Trj_i)$ is the number of trajectory points contained in the trajectory sequence, and $S_{small}(Trj_i)$ is the number of times the location of the microcell base station connected to adjacent trajectory points in the trajectory sequence changes;

(5) Proportion of microcell base stations: Calculate the ratio between the number of trajectory points connected to microcell base stations and the total number of trajectory points in the trajectory sequence containing the potential indoor trajectory point: $$\frac{N(Trj_i)_{smallcell}}{N(Trj_i)}$$ where this ratio is the proportion of microcell base stations for the i-th potential indoor trajectory point, $Trj_i$ is the trajectory sequence containing the i-th potential indoor trajectory point, $N(Trj_i)$ is the number of trajectory points contained in the trajectory sequence, and $N(Trj_i)_{smallcell}$ is the number of trajectory points in the trajectory sequence that are connected to microcell base stations.

5. The method for determining the indoor / outdoor status of users based on mobile phone signaling data according to claim 1, characterized in that: In S5, the specific method for training and parameter tuning of the random forest model is as follows: using existing mobile phone signaling data with indoor and outdoor labels, potential indoor trajectory points are identified through steps S1-S2, and the features of the potential indoor trajectory points are extracted through steps S3-S4. These features are then input into the random forest model for training. The model accuracy is evaluated using k-fold cross-validation, and the model parameters are then optimized.

6. The method for determining the indoor / outdoor status of users based on mobile phone signaling data according to claim 1, characterized in that: In S6, the method for determining the indoor / outdoor status of a user is as follows: take mobile signaling data of any user at any time period; identify potential indoor trajectory points of the user through steps S1-S2; if the trajectory point is not a potential indoor trajectory point, then the point is an outdoor trajectory point; if the trajectory point is a potential indoor trajectory point, then extract the cell type of the primary serving base station, signaling handover features, and base station building features through steps S3-S4, and input them into the random forest model trained in step S5 to determine whether it is an outdoor or indoor trajectory point.