A method and system for monitoring access to privacy data based on communication logs of a vehicle network
By analyzing vehicle network communication logs, the types of session results and low-increment suffixes are identified, which solves the problem of insufficient analysis of session process differences in vehicle networks and enables refined monitoring and identification of redundant behaviors in the process of accessing private data.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Application (China)
- Current Assignee / Owner
- NANCHANG TRANSPORTATION COLLEGE
- Filing Date
- 2026-05-20
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies struggle to make detailed comparisons of differences in access processes during historical sessions in the context of connected vehicles, especially lacking specialized analysis of additional access processes at the end of a session, resulting in insufficient monitoring capabilities for privacy data access.
By collecting communication log data from the vehicle network system, extracting session result features to construct signatures, identifying individual sufficient access boundaries and merging them into group sufficient boundaries, identifying low-increment suffixes, and monitoring the redundancy features of the sessions to be detected, the privacy data access process can be analyzed at the process level.
It enhances the monitoring capabilities of the privacy data access process, can identify additional access behaviors after the boundary, and provides refined monitoring and access behavior optimization support.
Figure CN122247756A_ABST
Description
Technical Field
[0001] This invention relates to the field of privacy data access behavior analysis technology, specifically to a privacy data access monitoring method and system based on vehicle network communication logs.
Background Technology
[0002] In connected vehicle systems, there is a continuous data interaction process between vehicle terminals, mobile applications, cloud platforms, in-vehicle gateways, and in-vehicle control units. Data related to vehicle location, trajectory, operating status, remote control records, and account identification are typically requested, transmitted, processed, and returned through multi-node communication links. Because this data can directly or indirectly characterize vehicle usage status, driving activities, and user behavior characteristics, monitoring its access process is of great significance for protecting privacy in connected vehicle systems.
[0003] Monitoring data access behavior typically involves analyzing communication logs, API call records, or interaction audit information to identify abnormal access, excessive access, or unexpected data interaction behaviors. This method can effectively monitor obviously abnormal access in general scenarios. However, in the context of connected vehicles, a single data access is usually not fully represented by a single log record but corresponds to a session process formed by continuous interactions between multiple communication nodes. For a large amount of historical communication logs, different sessions may achieve the same or similar final results, but their process length, number of interaction rounds, and node participation are inconsistent. Some technologies, when utilizing this historical session information, focus more on the characteristics of a single record, local statistical features, or overall anomalies, lacking a more detailed comparative basis for sessions with similar results but different processes, especially lacking specialized analysis of the additional access processes in the later stages of the session.
[0004] Therefore, how to more effectively characterize the differences in access processes in historical sessions based on vehicle network communication logs, and further improve the monitoring capabilities of privacy data access processes, remains an issue that needs to be addressed in existing technologies.
Summary of the Invention
[0005] To address the aforementioned technical issues, this invention proposes a privacy data access monitoring method and system based on vehicle network communication logs. By utilizing the process progression patterns of similar result sessions in historical session logs, the sufficient boundaries for the formation of access results are identified, and on this basis, access expansion behavior is monitored, thereby achieving more targeted analysis of the privacy data access process.
[0006] To achieve the above objectives, the present invention provides the following technical solution. The first aspect of this invention provides a method for monitoring privacy data access based on vehicle network communication logs, comprising:
Collecting communication log data generated by multiple communication nodes in the vehicle network system during data interaction, and extracting multiple access sessions from the communication log data;
Extracting multiple session result features for each access session to construct a session result signature, and classifying the access sessions based on the session result signatures to generate multiple session result type sets;
Constructing the access path sequence of each access session, extracting the individual sufficient access boundaries of the access sessions in each session result type set, and merging them to generate the group sufficient boundary of each session result type set;
Extracting, based on the individual sufficient access boundary, the suffix sequence of each access session in the session result type set, and identifying the low-increment suffixes among the suffix sequences;
Acquiring a session to be detected in the vehicle network system, identifying the target session result type to which it belongs, identifying its access redundancy characteristics based on the group sufficient boundary and the low-increment suffixes, and generating the redundant access monitoring result of the session to be detected.
[0007] Preferably, extracting the individual sufficient access boundaries of the access sessions in the session result type set and merging them to generate the group sufficient boundary of each session result type set includes:
Based on the access path sequence, constructing multiple prefix subsequences for each access session in the session result type set, and extracting multiple node access state features for each prefix subsequence to construct a node access state vector;
Calculating, based on the node access state vectors, the access state matching parameters between each prefix subsequence and the other prefix subsequences in the session result type set, and identifying, based on the access state matching parameters, multiple nearest neighbor samples of each prefix subsequence of each access session;
Constructing, for each prefix subsequence, a nearest neighbor sample set containing its nearest neighbor samples, determining the result matching group of the prefix subsequence within the nearest neighbor sample set based on the session result signature, and calculating the boundary reference index of each prefix subsequence based on the sample sizes of the nearest neighbor sample set and of the result matching group;
Determining, based on the boundary reference index, the candidate individual subsequence among the prefix subsequences of the access session, and recording the node access state vector of the candidate individual subsequence as the individual sufficient access boundary of the access session;
Merging the multiple individual sufficient access boundaries in the nearest neighbor sample set to obtain the group sufficient boundary of the session result type set.
[0008] Preferably, identifying low-increment suffixes among multiple suffix sequences includes: removing the candidate individual subsequence from the access path sequence of the access session to obtain the suffix sequence of the access session; calculating the result increment parameter of the suffix sequence based on the boundary reference index of the candidate individual subsequence; and determining, based on the result increment parameters, multiple low-increment suffixes from the multiple suffix sequences.
[0009] Preferably, the target session result type to which the session to be detected belongs is identified, and the access redundancy characteristics of the session to be detected are identified based on the sufficient boundary of the group and multiple low-increment suffixes, generating the redundant access monitoring results of the session to be detected, including: Construct the target session result signature and target access path sequence of the session to be detected. Based on the target session result signature, match the session to be detected with multiple session result types, determine the session result type that matches the session to be detected best, and record it as the target session result type. Extract multiple candidate prefix sequences from the target access path sequence, construct the target node access state vector for each candidate prefix sequence, and identify the individual boundary nodes of the session to be detected based on the group sufficient boundary of the target session result type and the multiple target node access state vectors. Based on individual boundary nodes, the suffix sequence to be detected is extracted from the target access path sequence. Based on multiple low-increment suffixes of the target session result type, the access behavior expansion analysis of the session to be detected is performed to generate redundant access monitoring results of the session to be detected.
[0010] Preferably, the extended analysis of access behavior for the session to be detected based on multiple low-increment suffixes of the target session result type includes: Extract multiple suffix access state features for each low-increment suffix and construct a suffix access state vector. Determine the suffix baseline parameters for each suffix access state feature based on the multiple suffix access state vectors. Construct a low-increment state baseline for the target session result type based on the multiple suffix baseline parameters. Extract the incremental state sequence of the session to be detected based on individual boundary nodes, construct the target incremental structure vector corresponding to the incremental state sequence, extract multiple incremental structure offset features of the session to be detected with respect to the target incremental structure vector based on the low incremental state baseline, and generate the redundant access monitoring results of the session to be detected.
[0011] Preferably, identifying individual boundary nodes of the session to be detected based on the group sufficient boundary of the target session result type and the access state vectors of multiple target nodes includes: Multiple target node access state vectors are matched with the group sufficient boundary of the target session result type. The boundary matching distance between each target node access state vector and the group sufficient boundary is calculated. The communication node containing the target node access state vector with the smallest boundary matching distance is selected as the individual boundary node of the session to be detected.
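The boundary matching in the preceding paragraph can be illustrated with a minimal sketch. It assumes the boundary matching distance is a Euclidean distance and that each target node access state vector is a plain numeric tuple; the function name `find_boundary_position`, the vector layout, and the sample values are hypothetical and are not taken from the application.

```python
import math

def find_boundary_position(prefix_vectors, group_boundary):
    """Return (index, distance) of the prefix vector closest to the group boundary."""
    # Assumption: Euclidean distance stands in for the boundary matching distance.
    distances = [math.dist(vec, group_boundary) for vec in prefix_vectors]
    best = min(range(len(distances)), key=distances.__getitem__)
    return best, distances[best]

# Hypothetical layout: (requests, responses, nodes covered, duration, repeats).
target_vectors = [
    (1, 0, 2, 0.0, 0),
    (2, 0, 3, 0.25, 0),
    (2, 1, 3, 0.75, 0),
    (2, 2, 3, 1.0, 0),
    (3, 2, 3, 1.5, 1),
]
group_boundary = (2, 2, 3, 1.25, 0)

position, distance = find_boundary_position(target_vectors, group_boundary)
# The records after `position` would form the suffix sequence to be detected.
```

In this sketch the fourth candidate prefix is the closest to the group sufficient boundary, so the communication node reached at that position would be treated as the individual boundary node.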
[0012] A second aspect of the present invention provides a privacy data access monitoring system based on vehicle network communication logs, used to implement the aforementioned privacy data access monitoring method based on vehicle network communication logs, comprising:
The communication log data acquisition module, used to collect communication log data generated by multiple communication nodes in the vehicle network system during data interaction, and extract multiple access sessions from the communication log data;
The session result type identification module, used to extract multiple session result features for each access session to construct a session result signature, and classify multiple access sessions based on the session result signature to generate multiple session result type sets;
The sufficient boundary generation module, used to construct the access path sequence for each access session, extract the individual sufficient access boundaries of multiple access sessions in the session result type set, and merge them to generate the group sufficient boundary for each session result type set;
The low-increment suffix extraction module, used to extract the suffix sequence of each access session in the session result type set based on the individual sufficient access boundary, and to identify low-increment suffixes among multiple suffix sequences;
The redundant access monitoring module, used to acquire the session to be detected in the vehicle network system, identify the target session result type to which the session to be detected belongs, identify the access redundancy characteristics of the session to be detected based on the group sufficient boundary and multiple low-increment suffixes, and generate the redundant access monitoring result of the session to be detected.
[0013] The present invention has the following beneficial effects: This invention performs process-oriented analysis on historical access sessions in vehicle network communication logs, extracts the result features of access sessions, constructs session result signatures, classifies historical access sessions to generate multiple session result type sets, and then identifies individual sufficient access boundaries based on access path sequences of similar historical access sessions and merges them to obtain group sufficient boundaries. On this basis, it further extracts the suffix sequence after the boundary and identifies low-increment suffixes to form a low-increment state baseline under the target session result type. When monitoring the session to be detected, it first identifies the target session result type to which it belongs, then determines its boundary position based on the group sufficient boundary, and compares the incremental state after the boundary with the low-increment state baseline to extract incremental structure offset features. This enables targeted monitoring of additional access behaviors after the boundary during privacy data access. By distinguishing between the access process necessary to form the target result and the additional access process that continues after the sufficient state is reached, this invention can improve the ability to identify redundant behaviors in privacy data access and provide support for refined monitoring and optimization of privacy data access processes.
Attached Figure Description
[0014] Figure 1 is a flowchart of a privacy data access monitoring method based on vehicle network communication logs according to an embodiment of the present invention.
[0015] Figure 2 is a schematic structural diagram of a privacy data access monitoring system based on vehicle network communication logs according to an embodiment of the present invention.
Detailed Implementation
[0016] To enable those skilled in the art to better understand the technical solutions of this invention, the technical solutions of the embodiments of this invention will be clearly and completely described below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are only for explaining the invention and are not intended to limit the invention.
[0017] The first aspect of this invention provides a method for monitoring privacy data access based on vehicle network communication logs. Referring to Figure 1, the method includes the following steps:
Step S1: Collect communication log data generated by multiple communication nodes in the vehicle network system during data interaction, and extract multiple access sessions from the communication log data.
[0018] In this embodiment, the communication nodes may include, but are not limited to, mobile terminal application nodes, cloud service nodes, vehicle communication terminal nodes, in-vehicle controller nodes, and message forwarding nodes. The communication log data is used to record the interaction behavior between the communication nodes during the data request, data forwarding, data processing, and data return processes.
[0019] In some implementations, communication log data may include one or more of the following information: log time information, request initiator identifier, request receiver identifier, message type identifier, request / response direction identifier, status information, message length information, fragmentation information, retransmission information, and connection status information. This information comprehensively reflects the formation method and execution path of a data interaction process in a connected vehicle environment.
[0020] It should be noted that the privacy data access in the embodiments of the present invention is not limited to the access behavior of directly reading user identity information, but may also include the process of requesting, transmitting, processing and returning data associated with the vehicle or vehicle user. For example, vehicle location data, operating status data, remote control records, account association information and interaction records can all be used as privacy data access objects in the embodiments of the present invention.
[0021] After completing the collection of communication log data, multiple access sessions are extracted from the communication log data. Each access session represents a complete data access request and its execution process. For example, in a specific scenario, a user initiates a request to query the vehicle's location through a mobile terminal application. Upon receiving this request, the cloud service forwards the location query instruction to the in-vehicle communication terminal. The in-vehicle communication terminal then interacts with the in-vehicle positioning module to obtain the vehicle's location, and subsequently returns the location result to the cloud, which in turn returns it to the mobile terminal application. During this process, multiple communication log records may be generated between the mobile terminal, the cloud, the in-vehicle communication terminal, and the in-vehicle modules. These continuously generated log records around the same access target constitute an access session. Similarly, access requests such as querying vehicle status, reading remote control records, and obtaining fault information can also correspond to multiple access sessions in the historical communication logs.
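As a rough illustration of session extraction, the sketch below groups log records that concern the same access target and splits them whenever the records fall silent for too long. The application does not specify a splitting rule, so the record keys `time` and `target`, the `max_gap` parameter, and the node names are all assumptions made for this example.

```python
from collections import defaultdict

def extract_sessions(records, max_gap=5.0):
    """Group log records into access sessions.

    Assumption: each record is a dict with hypothetical keys "time" (seconds)
    and "target" (identifier of the access target). The gap rule is also an
    assumption, not something stated in the source.
    """
    by_target = defaultdict(list)
    for rec in sorted(records, key=lambda r: r["time"]):
        by_target[rec["target"]].append(rec)

    sessions = []
    for recs in by_target.values():
        current = [recs[0]]
        for rec in recs[1:]:
            # Start a new session when the silence between records is too long.
            if rec["time"] - current[-1]["time"] > max_gap:
                sessions.append(current)
                current = []
            current.append(rec)
        sessions.append(current)
    return sessions

logs = [
    {"time": 0.0, "target": "location", "src": "app", "dst": "cloud"},
    {"time": 0.4, "target": "location", "src": "cloud", "dst": "tbox"},
    {"time": 0.9, "target": "location", "src": "tbox", "dst": "cloud"},
    {"time": 30.0, "target": "location", "src": "app", "dst": "cloud"},
    {"time": 0.5, "target": "status", "src": "app", "dst": "cloud"},
]
sessions = extract_sessions(logs)
```

Here the three early location records form one session, the later location request forms a second, and the status query forms a third.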
[0022] Step S2: Extract multiple session result features for each access session to construct a session result signature, and classify multiple access sessions based on the session result signature to generate multiple session result type sets.
[0023] In this embodiment, multiple access sessions are categorized at the result level so that access sessions with the same or similar final results can be placed in a unified comparison scope in subsequent analysis. Although different access sessions may differ in execution path, number of interaction rounds, and process length, some of them will have consistent or nearly consistent final output results.
[0024] For each access session, multiple session result features are extracted to construct a session result signature. These features primarily characterize the final result pattern presented by the access session at its end, and may include, but are not limited to, final response status information, final output size information, final response latency information, final output organization method information, and final output node information. Specifically, the final response status information characterizes the response status at the end of the access session, such as successful return, failure return, timeout termination, or abnormal termination; the final output size information characterizes the data volume of the final returned data, such as the final return message length, the final output record count, and the final output byte size; the final response latency information characterizes the duration from session initiation to final result output; the final output organization method information characterizes the specific output method of the final result, such as whether it is output in one go or in multiple rounds, i.e., the specific output rounds; and the final output node information characterizes which communication node outputs the final result. After obtaining these session result features, they are combined to form the session result signature corresponding to the access session.
[0025] In some implementations, to avoid excessive sensitivity when directly comparing continuous numerical features, information such as the final output size and final response latency can be standardized. For example, this can be achieved using interval mapping. First, based on the distribution of corresponding numerical features across a large number of historical access sessions, continuous values are divided into several intervals, and then the original values are mapped to the corresponding interval identifiers. For discrete result features, such as final response status information, session end status information, and final output node information, they can be directly used for session result signature construction.
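The interval mapping described above can be sketched as follows. The interval edges, the feature key names, and the tuple layout of the signature are hypothetical; in practice the edges would be derived from the distribution of historical access sessions.

```python
from bisect import bisect_right

def to_interval(value, edges):
    """Map a continuous value to the identifier of the interval it falls in."""
    return bisect_right(edges, value)

def result_signature(session, size_edges, latency_edges):
    """Build a session result signature: discrete features first, then interval ids."""
    return (
        session["final_status"],
        session["final_node"],
        session["output_rounds"],
        to_interval(session["output_bytes"], size_edges),
        to_interval(session["latency_ms"], latency_edges),
    )

# Hypothetical interval edges (bytes and milliseconds).
SIZE_EDGES = [256, 1024, 4096]
LATENCY_EDGES = [100, 500, 2000]

a = {"final_status": "ok", "final_node": "cloud", "output_rounds": 1,
     "output_bytes": 300, "latency_ms": 120}
b = {"final_status": "ok", "final_node": "cloud", "output_rounds": 1,
     "output_bytes": 900, "latency_ms": 480}

sig_a = result_signature(a, SIZE_EDGES, LATENCY_EDGES)
sig_b = result_signature(b, SIZE_EDGES, LATENCY_EDGES)
```

Sessions `a` and `b` differ in raw size and latency but fall in the same intervals, so they receive the same signature.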
[0026] After obtaining the session result signatures for each access session, the multiple access sessions are classified based on these signatures, generating multiple sets of session result types. Access sessions within each set of session result types share the same or nearly identical final result patterns. That is, within the same set, although different access sessions may not have completely identical processes, their corresponding final access results are consistent or similar at the observable level. Specifically, access sessions with completely identical session result signatures can be directly grouped into the same set of session result types; for access sessions with slight differences in some result features but still belonging to similar result patterns, they can be grouped into the same set of session result types under a preset approximate matching rule. For example, when several access sessions have completely identical discrete result features, but their continuous numerical features belong to adjacent intervals, they can be considered as approximately identical result patterns and grouped into the same set of result types.
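One possible reading of the approximate matching rule is sketched below: discrete result features must be identical, while interval identifiers may differ by at most one. The greedy assignment to the first matching group and the parameter `n_discrete` are assumptions of this example; because adjacency is not transitive, other grouping strategies could yield different sets.

```python
def approx_match(sig_a, sig_b, n_discrete=3):
    """Signatures are tuples: n_discrete discrete features, then interval ids."""
    if sig_a[:n_discrete] != sig_b[:n_discrete]:
        return False
    # Continuous features may sit in the same or an adjacent interval.
    return all(abs(x - y) <= 1 for x, y in zip(sig_a[n_discrete:], sig_b[n_discrete:]))

def group_by_signature(signatures):
    """Greedily place each session in the first group whose representative matches."""
    groups = []  # list of (representative signature, [session indices])
    for idx, sig in enumerate(signatures):
        for rep, members in groups:
            if approx_match(rep, sig):
                members.append(idx)
                break
        else:
            groups.append((sig, [idx]))
    return [members for _, members in groups]

sigs = [
    ("ok", "cloud", 1, 1, 1),
    ("ok", "cloud", 1, 2, 1),       # adjacent size interval: same group as the first
    ("ok", "cloud", 1, 3, 1),       # two intervals away from the first: new group
    ("timeout", "cloud", 1, 1, 1),  # different discrete feature: new group
]
groups = group_by_signature(sigs)
```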
[0027] The above method can transform multiple historical access sessions from a mixed distribution based on process to a grouping based on session result type, so that subsequent analysis is no longer performed uniformly on all historical sessions, but is carried out within the same set of session result types.
[0028] Step S3: Construct the access path sequence for each access session, extract the individual sufficient access boundaries of multiple access sessions in the session result type set and merge them to generate the group sufficient boundary for each session result type set.
[0029] In this embodiment, this step identifies, for the access sessions within each session result type set, the sufficient access boundary at which the corresponding result can be considered formed. That is, for multiple access sessions within the same session result type set, although their final results are the same or nearly the same, not all sessions need to undergo an access process of exactly the same length or structure to form that result. Therefore, the process progression patterns of historical access sessions are analyzed to determine how far the process typically needs to progress under that session result type to be considered sufficient to form the result.
[0030] In some implementations, extracting and fusing individual sufficient access boundaries from multiple access sessions within a set of session result types to generate a group sufficient boundary for each set of session result types specifically includes: Based on the access path sequence, construct multiple prefix subsequences for each access session in the session result type set, and extract multiple node access state features for each prefix subsequence to construct a node access state vector.
[0031] Specifically, the access path sequence of an access session is formed by arranging the log records within the access session in chronological order. For example, after a user initiates a data query request, the interaction process among the multiple communication nodes that complete the request is analyzed, and a corresponding access path sequence is constructed from the information flow relationships in that process and the communication nodes involved, representing the complete progression path of the access session from start to finish. On this basis, the access path sequence is progressively segmented to form multiple prefix subsequences. Each prefix subsequence represents the local process state of the access session from its start up to a certain communication node. The node access state features of each prefix subsequence are extracted to represent the access progression state already formed under that prefix subsequence. These features may include, but are not limited to, the cumulative number of requests, the cumulative number of responses, the number of communication nodes covered, the prefix duration, and the number of repeated accesses.
[0032] For example, the cumulative number of requests is the number of request-type log records that have occurred from the start of the access session to the current position; the cumulative number of responses is the number of response-type log records that have occurred from the start of the access session to the current position; the number of covered communication nodes is the number of communication nodes participating in data interaction in the current prefix subsequence, which reflects the communication scope that the access process has expanded to; and the prefix duration is the length of time between the start time of the access session and the time at the current position. Those skilled in the art can use appropriate feature information according to actual needs to characterize the access progress state. By combining the multiple node access state features, a prefix subsequence that involves the interaction order between multiple communication nodes is transformed into a comparable state representation, namely a node access state vector.
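The construction of prefix subsequences and node access state vectors can be sketched as a single pass over the access path sequence. The record keys, the vector layout, and in particular the definition of a repeated access (a sender, receiver, and direction combination already seen in the session) are assumptions of this example, since the text leaves that feature open.

```python
def prefix_state_vectors(path):
    """Return one node access state vector per prefix of an access path sequence.

    Each record is a dict with hypothetical keys "time", "src", "dst", "kind"
    ("request" or "response"). Vector layout:
    (requests, responses, nodes covered, prefix duration, repeated accesses).
    """
    vectors = []
    requests = responses = repeats = 0
    nodes, seen = set(), set()
    start = path[0]["time"]
    for rec in path:
        if rec["kind"] == "request":
            requests += 1
        else:
            responses += 1
        nodes.update((rec["src"], rec["dst"]))
        key = (rec["src"], rec["dst"], rec["kind"])
        # Assumption: a repeated access is a hop/direction combination seen before.
        if key in seen:
            repeats += 1
        seen.add(key)
        vectors.append((requests, responses, len(nodes), rec["time"] - start, repeats))
    return vectors

path = [
    {"time": 0.0, "src": "app", "dst": "cloud", "kind": "request"},
    {"time": 0.25, "src": "cloud", "dst": "tbox", "kind": "request"},
    {"time": 0.75, "src": "tbox", "dst": "cloud", "kind": "response"},
    {"time": 1.0, "src": "cloud", "dst": "app", "kind": "response"},
    {"time": 1.5, "src": "app", "dst": "cloud", "kind": "request"},
]
vectors = prefix_state_vectors(path)
```

Each element of `vectors` corresponds to the prefix subsequence ending at that record; the last record repeats the opening request, which is why only the final vector has a nonzero repeat count.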
[0033] Calculate the access state matching parameters of each prefix subsequence with the other prefix subsequences in the session result type set based on the node access state vector, and identify multiple nearest neighbor samples of each access session with respect to each prefix subsequence based on the access state matching parameters.
[0034] Specifically, the access state matching parameter quantifies the proximity of two prefix subsequences in terms of access progress state. A similarity or distance metric can be constructed from the differences between the state features of the two node access state vectors, thereby determining which prefix subsequences are the nearest neighbor samples of the current prefix subsequence. For example, the Euclidean distance between the two vectors can be calculated as the access state matching parameter. Any prefix subsequence whose access state matching parameter with the current prefix subsequence is less than a preset state matching threshold is recorded as a nearest neighbor sample of the current prefix subsequence.
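A minimal sketch of the threshold-based neighbor search with Euclidean distance follows. The vectors and the threshold value are hypothetical; features measured on different scales would usually be normalized first, which is omitted here for brevity.

```python
import math

def nearest_neighbors(current, candidates, threshold):
    """Return indices of candidate vectors whose distance to `current` is below threshold."""
    return [i for i, vec in enumerate(candidates) if math.dist(current, vec) < threshold]

# Hypothetical layout: (requests, responses, nodes covered, duration, repeats).
current = (2, 1, 3, 0.75, 0)
others = [
    (2, 1, 3, 1.0, 0),   # nearly the same progress state
    (2, 2, 3, 1.0, 0),   # one more response
    (6, 5, 4, 3.0, 2),   # a much longer prefix
]
neighbors = nearest_neighbors(current, others, threshold=1.5)
```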
[0035] Construct a nearest neighbor sample set for each prefix subsequence, including multiple nearest neighbor samples. Determine the result matching group of the prefix subsequence in the nearest neighbor sample set based on the session result signature. Calculate the boundary reference index of each prefix subsequence based on the number of samples in the nearest neighbor sample set and the result matching group.
[0036] Specifically, the result matching group of a prefix subsequence consists of those samples in its nearest neighbor sample set whose complete access session has a session result signature identical to that of the current access session. The boundary reference index of each prefix subsequence is calculated from the sample size of the nearest neighbor sample set and the sample size of the result matching group (i.e., the number of access sessions each contains): it is the ratio of the sample size of the result matching group to the sample size of the nearest neighbor sample set. This index characterizes whether the access progression state of the current prefix subsequence reliably points to the result type of the current access session among similar historical samples.
[0037] As a preferred implementation, some prefix subsequences have nearest neighbor sample sets with very few samples, which can yield a large ratio from a small sample size. To account for this, the nearest neighbor sample set with the largest sample size among those of the prefix subsequences of the access session is selected as a reference, its sample size is used as the baseline, and the ratio of the sample size of the result matching group to this baseline is used as the boundary reference index. In this case, the larger the boundary reference index, the more reliably the current access progression state points to the result type of the current access session among similar samples.
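The two ways of computing the boundary reference index can be compared in a short sketch. Signatures are reduced to single letters, and the function name and the `use_max_baseline` flag are hypothetical.

```python
def boundary_reference_indices(neighbor_sigs_per_prefix, own_sig, use_max_baseline=False):
    """Compute one boundary reference index per prefix subsequence.

    neighbor_sigs_per_prefix[i] lists the full-session result signatures of the
    nearest neighbor samples of prefix i (hypothetical input format).
    """
    sizes = [len(sigs) for sigs in neighbor_sigs_per_prefix]
    matches = [sum(1 for s in sigs if s == own_sig) for sigs in neighbor_sigs_per_prefix]
    baseline = max(sizes, default=0)
    indices = []
    for size, match in zip(sizes, matches):
        # Plain variant divides by the set's own size; the other by the largest set.
        denom = baseline if use_max_baseline else size
        indices.append(match / denom if denom else 0.0)
    return indices

own = "A"
neighbor_sigs = [
    ["A", "B", "B", "A", "B", "A", "B", "B"],  # prefix 1: 3 matches out of 8
    ["A", "A", "A", "B", "A", "A"],            # prefix 2: 5 matches out of 6
    ["A"],                                     # prefix 3: 1 match out of 1
]
plain = boundary_reference_indices(neighbor_sigs, own)
scaled = boundary_reference_indices(neighbor_sigs, own, use_max_baseline=True)

largest_plain = max(range(len(plain)), key=plain.__getitem__)
largest_scaled = max(range(len(scaled)), key=scaled.__getitem__)
```

With the plain ratio, the third prefix scores highest on the strength of a single neighbor; with the largest set as baseline, the second prefix scores highest, which is the small-sample effect the preferred variant is meant to dampen.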
[0038] Finally, based on the boundary reference index, the prefix subsequence with the largest boundary reference index among the multiple prefix subsequences contained in the access session is selected as the candidate individual subsequence of the access session. The node access state vector of the candidate individual subsequence is recorded as the individual sufficient access boundary of the access session, which characterizes that the access session has sufficient conditions to form the final result once it has advanced to that access state. On this basis, multiple individual sufficient access boundaries from the nearest neighbor sample set are merged to obtain the group sufficient boundary of the session result type set.
[0039] Specifically, robust fusion can be performed on the multiple individual sufficient access boundaries belonging to the same session result type set to extract the most representative boundary state. For example, median state fusion can be used: for each node access state feature, the median of that feature across the individual sufficient access boundaries is taken, and these medians form a vector that serves as the group sufficient boundary of the session result type set. When the access progression state of an access session in the set is close to the group sufficient boundary, the session can be considered to have essentially reached the sufficient state required to form the corresponding result.
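Median state fusion amounts to a feature-wise median over the individual boundaries, as in this minimal sketch. The boundary vectors are hypothetical examples.

```python
from statistics import median

def fuse_boundaries(individual_boundaries):
    """Take the median of each state feature across the individual boundaries."""
    return tuple(median(column) for column in zip(*individual_boundaries))

# Hypothetical layout: (requests, responses, nodes covered, duration, repeats).
boundaries = [
    (2, 2, 3, 1.0, 0),
    (3, 2, 3, 1.5, 0),
    (2, 3, 4, 1.25, 1),
]
group_boundary = fuse_boundaries(boundaries)
```

The median is less sensitive than the mean to a few sessions with unusually late boundaries, which is one reason it suits a robust fusion step.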
[0040] Step S4: Extract the suffix sequence of each access session in the session result type set based on the individual sufficient access boundary, and identify the low-increment suffixes among multiple suffix sequences.
[0041] In this embodiment, it is considered that, under the same session result type, some historical access sessions continue with several subsequent interactions after reaching the sufficient access boundary, although these interactions contribute little to the formation of the final result. It is also considered that similar historical samples often include access sessions with earlier boundaries, shorter subsequent processes, and lower overall costs. Therefore, this embodiment analyzes the subsequent access processes of historical access sessions after they reach the sufficient access boundary in each session result type set, and identifies low-increment suffixes that contribute little to the final result.
[0042] In some implementations, identifying low-increment suffixes among multiple suffix sequences specifically includes: The result increment parameter of the suffix sequence is calculated based on the boundary reference index of the candidate individual subsequence, and multiple low increment suffixes are determined from multiple suffix sequences based on the result increment parameter.
[0043] Specifically, for any access session in the session result type set, the candidate individual subsequence is removed from the access path sequence of the access session so as to separate the access fragments that follow the boundary, which yields the suffix sequence of the access session. This suffix sequence characterizes the subsequent access process that continues after the access session reaches the individual sufficient access boundary. The result increment parameter characterizes the additional improvement in the certainty of the final result brought by the suffix sequence relative to the candidate individual subsequence.
[0044] Since the boundary reference index already reflects the credibility of the candidate individual subsequence in guiding the current result type in historical similar samples, the boundary reference index of the candidate individual subsequence can be regarded as the result certainty at the boundary moment. Furthermore, the result completion degree of the complete access session is used as the final result state reference. In this embodiment, the result completion degree of the complete access session is preferably 1. The difference between the result completion degree of the complete access session and the boundary reference index is calculated as the result increment parameter corresponding to the suffix sequence. The smaller the result increment parameter, the lower the additional contribution of the suffix sequence to the final result formation, even though the suffix sequence still has access behavior. Therefore, multiple low-increment suffixes can be selected from multiple suffix sequences. For example, suffix sequences with result increment parameters less than a preset increment threshold are identified as low-increment suffixes.
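A minimal sketch of this calculation is given below, assuming the boundary reference index of each session's candidate individual subsequence is already available in a dictionary keyed by a session identifier. The identifiers, the function names, and the threshold value used in the example are hypothetical.

```python
from typing import Dict, List


def result_increment(
    boundary_reference_index: float, completion_degree: float = 1.0
) -> float:
    """Result increment parameter of a suffix sequence.

    It is the result completion degree of the complete session (1 in this
    embodiment) minus the boundary reference index at the boundary.
    """
    return completion_degree - boundary_reference_index


def low_increment_suffixes(
    boundary_indices: Dict[str, float], increment_threshold: float
) -> List[str]:
    """Return the ids of sessions whose suffix is a low-increment suffix."""
    return [
        session_id
        for session_id, index in boundary_indices.items()
        if result_increment(index) < increment_threshold
    ]
```

For example, with boundary reference indices of 0.9, 0.5, and 0.8 and a preset increment threshold of 0.25, the suffixes of the first and third sessions would be identified as low-increment suffixes, because their increments (about 0.1 and 0.2) fall below the threshold.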
[0045] Step S5: Obtain the session to be detected in the vehicle network system, identify the target session result type to which the session to be detected belongs, identify the access redundancy characteristics of the session to be detected based on the group sufficient boundary and the multiple low-increment suffixes, and generate the redundant access monitoring results of the session to be detected.
[0046] In this embodiment, the analysis results of multiple historical access sessions of the vehicle network system are applied to the session to be detected in order to monitor redundant access behavior. Whether the access of the current session to be detected is redundant cannot be judged in isolation from historically similar access processes. Instead, its session result type is determined first, and its post-boundary access process is then compared against the group sufficient boundary and the multiple low-increment suffixes corresponding to that session result type. The focus is on whether obvious additional access processes remain after the session has met the sufficient conditions required for the same result.
[0047] In some implementations, the above-mentioned identification of the target session result type to which the session to be detected belongs and the access redundancy characteristics of the session to be detected, and the generation of redundancy access monitoring results for the session to be detected, specifically includes: Construct the target session result signature and target access path sequence of the session to be detected. Based on the target session result signature, match the session to be detected with multiple session result types, determine the session result type that best matches the session to be detected, and record it as the target session result type.
[0048] Specifically, based on the aforementioned consistent analysis method for multiple access sessions, a target session result signature and a target access path sequence for the session to be detected are constructed. This process first acquires the communication log data of the session to be detected, extracts multiple session result features to construct the target session result signature, and generates a target access path sequence representing the complete access process based on the temporal interaction relationships between multiple communication nodes in the session to be detected.
[0049] To match the session to be detected with the multiple session result types, a result signature template corresponding to each session result type is first constructed from the session result signatures of the multiple access sessions in the session result type set. Specifically, for each continuous numerical feature, the interval identifiers of that feature across the access sessions are sorted, and the median identifier is taken as the statistical feature of that continuous numerical feature. For discrete result features, since each discrete result feature takes only one state within a session result type set, that state is used directly. Thus, based on the discrete result features and the statistical features of the continuous numerical features, a result signature template corresponding to each session result type is generated.
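As a hedged sketch of the template construction described above, a session result signature is assumed here to be a pair consisting of a tuple of discrete states and a tuple of integer interval identifiers. This representation, the function name, and the use of the lower median for even sample counts (so that the template always holds an actual interval identifier) are assumptions for illustration only.

```python
from statistics import median_low
from typing import Sequence, Tuple

# Assumed signature layout: (discrete states, interval ids of continuous features)
Signature = Tuple[Tuple[str, ...], Tuple[int, ...]]


def build_result_template(signatures: Sequence[Signature]) -> Signature:
    """Build the result signature template of one session result type set."""
    if not signatures:
        raise ValueError("a session result type set cannot be empty")
    discrete = signatures[0][0]
    # Within one set, every discrete result feature has a single state.
    if any(sig[0] != discrete for sig in signatures):
        raise ValueError("discrete result features differ within the set")
    # Median interval identifier per continuous feature (column-wise).
    columns = zip(*(sig[1] for sig in signatures))
    continuous = tuple(median_low(column) for column in columns)
    return discrete, continuous
```

For three signatures with interval identifiers (2, 3), (3, 3), and (2, 5) and the same discrete state, the resulting template holds the interval identifiers (2, 3).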
[0050] Based on this, the target session result signature is compared with the multiple result signature templates to determine the session result type whose template is consistent, or most nearly consistent, with the target session result signature. This type is then used as the target session result type to which the session to be detected belongs. Specifically, if no session result type has a fully consistent template, a session result type whose discrete result features are completely identical to those of the session to be detected and whose continuous numerical features fall in the same or adjacent intervals is also regarded as a matching result type. In this case, the session result type with the smallest number of continuous numerical features falling in adjacent intervals is used as the target session result type, thus limiting the session to be detected to a set of historical sessions with the same or nearly identical final results for comparison.
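The matching rule above can be sketched as follows. The signature layout (a tuple of discrete states plus a tuple of integer interval identifiers), the type identifiers, and the treatment of "adjacent" as an identifier difference of exactly 1 are assumptions made for this example. An exactly consistent template is handled as the case with zero adjacent features.

```python
from typing import Dict, Optional, Tuple

# Assumed signature layout: (discrete states, interval ids of continuous features)
Signature = Tuple[Tuple[str, ...], Tuple[int, ...]]


def match_result_type(
    target: Signature, templates: Dict[str, Signature]
) -> Optional[str]:
    """Return the id of the best matching session result type, or None."""
    target_discrete, target_intervals = target
    best_type: Optional[str] = None
    best_adjacent: Optional[int] = None
    for type_id, (discrete, intervals) in templates.items():
        # Discrete result features must be completely identical.
        if discrete != target_discrete or len(intervals) != len(target_intervals):
            continue
        gaps = [abs(a - b) for a, b in zip(target_intervals, intervals)]
        # Every continuous feature must be in the same or an adjacent interval.
        if any(gap > 1 for gap in gaps):
            continue
        adjacent = sum(1 for gap in gaps if gap == 1)
        if best_adjacent is None or adjacent < best_adjacent:
            best_type, best_adjacent = type_id, adjacent
    return best_type
```

Returning `None` when no template qualifies is a choice of this sketch. The embodiment does not state how a session that matches no result type is handled.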
[0051] Building upon this, in the same way as described above for extracting prefix subsequences and constructing node access state vectors, multiple candidate prefix sequences are extracted from the target access path sequence, and a target node access state vector is constructed for each candidate prefix sequence. These target node access state vectors are then matched against the group sufficient boundary of the target session result type. For example, the Euclidean distance between each target node access state vector and the group sufficient boundary of the target session result type is calculated as the boundary matching distance. The communication node corresponding to the target node access state vector with the smallest boundary matching distance is selected as the individual boundary node of the session to be detected. The individual boundary node indicates that when the session to be detected progresses to this node, it has essentially reached the sufficient access state required to form the target session result type.
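The boundary matching step above can be illustrated with the following sketch, which assumes one target node access state vector per candidate prefix sequence, listed in path order, so that the returned position identifies the individual boundary node. The function name and the example vectors are hypothetical, and no feature scaling is applied because the embodiment does not specify one.

```python
import math
from typing import Sequence, Tuple


def find_individual_boundary(
    prefix_vectors: Sequence[Sequence[float]],
    group_boundary: Sequence[float],
) -> Tuple[int, float]:
    """Return (position, distance) of the prefix closest to the group boundary.

    prefix_vectors[i] is the target node access state vector of the i-th
    candidate prefix sequence. The boundary matching distance is Euclidean.
    """
    if not prefix_vectors:
        raise ValueError("at least one candidate prefix sequence is required")
    distances = [math.dist(vector, group_boundary) for vector in prefix_vectors]
    best = min(range(len(distances)), key=lambda i: distances[i])
    return best, distances[best]
```

For candidate vectors [1, 10], [3, 40], [4, 52], and [6, 90] and a group sufficient boundary of [4, 50], the third prefix is the closest, with a boundary matching distance of 2.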
[0052] After identifying individual boundary nodes, the suffix sequence to be detected is extracted from the target access path sequence based on the individual boundary nodes. Based on multiple low-increment suffixes of the target session result type, the access behavior expansion analysis of the session to be detected is performed to generate the redundant access monitoring results of the session to be detected.
[0053] For the above-mentioned access behavior expansion analysis process, firstly, multiple suffix access state features of each low-increment suffix are extracted and suffix access state vectors are constructed. Based on the multiple suffix access state vectors, the suffix baseline parameters of each suffix access state feature are determined. Based on the multiple suffix baseline parameters, a low-increment state baseline of the target session result type is constructed.
[0054] Specifically, each low-increment suffix represents a low-contribution additional process that, under a specific session result type, typically persists after historical access sessions have reached the sufficient access state. To further characterize the structure of these low-increment suffixes, multiple suffix access state features can be extracted from each low-increment suffix to characterize the additional access strength and scope of the post-boundary process, such as the number of suffix requests, the number of suffix responses, the suffix request byte size, the suffix response byte size, the suffix session duration, and the number of repeated suffix accesses. Like the aforementioned node access state features, these suffix access state features describe the resource consumption of the access session after the individual sufficient access boundary. By statistically analyzing the suffix access state features, the mean or median of each suffix access state feature is selected as the suffix baseline parameter. In this embodiment, the mean is taken as an example: the mean of each suffix access state feature is calculated as the suffix baseline parameter. Based on the suffix baseline parameters of the multiple suffix access state features, a low-increment state baseline for the target session result type is constructed to characterize the low-increment resource consumption state that historical access sessions typically exhibit after reaching the sufficient boundary under the target session result type.
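A minimal sketch of the baseline construction follows, using the mean as in this embodiment. The feature key names and the fixed feature order are assumptions chosen to mirror the six example features listed above.

```python
from statistics import fmean
from typing import Dict, Sequence

# Assumed feature order of every suffix access state vector.
SUFFIX_FEATURES = (
    "requests",
    "responses",
    "request_bytes",
    "response_bytes",
    "duration_s",
    "repeat_accesses",
)


def low_increment_baseline(
    suffix_vectors: Sequence[Sequence[float]],
) -> Dict[str, float]:
    """Mean of each suffix access state feature over all low-increment suffixes."""
    if not suffix_vectors:
        raise ValueError("at least one low-increment suffix is required")
    if any(len(v) != len(SUFFIX_FEATURES) for v in suffix_vectors):
        raise ValueError("each vector must hold one value per suffix feature")
    columns = zip(*suffix_vectors)
    return {name: fmean(column) for name, column in zip(SUFFIX_FEATURES, columns)}
```

Replacing `fmean` with `statistics.median` would give the median variant that the paragraph above also permits.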
[0055] Extract the incremental state sequence of the session to be detected based on individual boundary nodes, construct the target incremental structure vector corresponding to the incremental state sequence, extract multiple incremental structure offset features of the session to be detected with respect to the target incremental structure vector based on the low incremental state baseline, and generate the redundant access monitoring results of the session to be detected.
[0056] Specifically, for the session to be detected, after identifying its individual boundary nodes, the access path segments of the target access path sequence after the individual boundary nodes are taken as the post-boundary incremental process, and this post-boundary incremental process is represented as an incremental state sequence. Then, multiple suffix access state features corresponding to the incremental state sequence are extracted to construct the target incremental structure vector corresponding to the incremental state sequence. Based on the low incremental state baseline, multiple incremental structure offset features of the session to be detected with respect to the target incremental structure vector are extracted to characterize the degree of deviation of the post-boundary process of the session to be detected relative to the historical low incremental appended process.
[0057] Specifically, each component of the target incremental structure vector can be compared with the suffix baseline parameter of the corresponding feature in the low-increment state baseline to analyze whether access behavior expands significantly in the post-boundary stage of the session to be detected. To this end, an offset value is calculated for each suffix access state feature, that is, the increase of the corresponding component of the target incremental structure vector over the suffix baseline parameter of that feature in the low-increment state baseline. If the offset values of all suffix access state features of the session to be detected are less than the offset threshold, the post-boundary process of the session to be detected is consistent or approximately consistent with the normal low-increment additional process in similar historical sessions. If the offset value of any suffix access state feature is not less than the offset threshold, the session to be detected still has significant additional access behavior after reaching the sufficient state. Suffix access state features with offset values not less than the offset threshold are recorded as incremental structure offset features. The redundant access monitoring results of the session to be detected are generated based on the incremental structure offset features.
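The offset comparison above can be sketched as follows. Because the suffix access state features have different units, this example assumes one offset threshold per feature; the embodiment only refers to "the offset threshold", so the per-feature thresholds, the dictionary layout, and the shape of the returned result are all assumptions for illustration.

```python
from typing import Dict


def incremental_offset_features(
    target: Dict[str, float],
    baseline: Dict[str, float],
    thresholds: Dict[str, float],
) -> Dict[str, float]:
    """Return the features whose offset is not less than its threshold.

    The offset of a feature is the increase of the target incremental
    structure vector component over the suffix baseline parameter.
    """
    flagged: Dict[str, float] = {}
    for name, base_value in baseline.items():
        offset = target[name] - base_value
        if offset >= thresholds[name]:
            flagged[name] = offset
    return flagged


def redundant_access_result(
    target: Dict[str, float],
    baseline: Dict[str, float],
    thresholds: Dict[str, float],
) -> Dict[str, object]:
    """Assemble a simple monitoring result from the flagged features."""
    flagged = incremental_offset_features(target, baseline, thresholds)
    return {"redundant": bool(flagged), "offset_features": flagged}
```

For instance, a session whose post-boundary response bytes exceed the baseline by 2000 against a threshold of 1000 would be reported as redundant with `response_bytes` as its incremental structure offset feature, while a request count only 1 above the baseline against a threshold of 2 would not be flagged.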
[0058] This invention focuses on the post-boundary additional process of the session to be detected after it reaches the sufficient access state corresponding to the target session result type, and compares this additional process with historical low-increment additional processes to achieve targeted monitoring of redundant access behavior. It identifies unnecessary data interaction extensions, node expansions, and additional data transmission or repeated access that may exist in the access behavior, thereby providing data support for refined monitoring of the privacy data access process of the Internet of Vehicles, optimization of access flow, and analysis of potential risks.
[0059] A second aspect of this invention provides a privacy data access monitoring system based on vehicle network communication logs. Referring to Figure 2, the system includes: a communication log data acquisition module, used to collect communication log data generated by multiple communication nodes in the vehicle network system during data interaction, and to extract multiple access sessions from the communication log data; a session result type identification module, used to extract multiple session result features for each access session to construct a session result signature, and to classify the multiple access sessions based on the session result signature to generate multiple session result type sets; a sufficient boundary generation module, used to construct the access path sequence for each access session, and to extract the individual sufficient access boundaries of the multiple access sessions in the session result type set and fuse them to generate the group sufficient boundary for each session result type set; a low-increment suffix extraction module, used to extract the suffix sequence of each access session in the session result type set based on the individual sufficient access boundary, and to identify low-increment suffixes among the multiple suffix sequences; and a redundant access monitoring module, used to acquire the session to be detected in the vehicle network system, identify the target session result type to which the session to be detected belongs, identify the access redundancy characteristics of the session to be detected based on the group sufficient boundary and the multiple low-increment suffixes, and generate the redundant access monitoring results of the session to be detected.
[0060] The above are merely specific embodiments of the present invention, enabling those skilled in the art to understand or implement the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art. Parts not described in detail in this specification are prior art known to those skilled in the art.
Claims
1. A method for monitoring privacy data access based on vehicle network communication logs, characterized in that it includes: collecting communication log data generated by multiple communication nodes in the vehicle network system during data interaction, and extracting multiple access sessions from the communication log data; extracting multiple session result features for each access session to construct a session result signature, and classifying the multiple access sessions based on the session result signature to generate multiple session result type sets; constructing the access path sequence for each access session, and extracting the individual sufficient access boundaries of the multiple access sessions in the session result type set and merging them to generate the group sufficient boundary for each session result type set; based on the individual sufficient access boundary, extracting the suffix sequence of each access session in the session result type set, and identifying the low-increment suffixes among the multiple suffix sequences; and acquiring the session to be detected in the vehicle network system, identifying the target session result type to which the session to be detected belongs, identifying the access redundancy characteristics of the session to be detected based on the group sufficient boundary and the multiple low-increment suffixes, and generating the redundant access monitoring results of the session to be detected.
2. The method for monitoring privacy data access based on vehicle network communication logs according to claim 1, characterized in that extracting the individual sufficient access boundaries of the multiple access sessions in the session result type set and merging them to generate the group sufficient boundary for each session result type set includes: based on the access path sequence, constructing multiple prefix subsequences for each access session in the session result type set, and extracting multiple node access state features for each prefix subsequence to construct a node access state vector; calculating the access state matching parameters between each prefix subsequence and the other prefix subsequences in the session result type set based on the node access state vector, and identifying multiple nearest neighbor samples of each access session with respect to each prefix subsequence based on the access state matching parameters; constructing a nearest neighbor sample set for each prefix subsequence, the nearest neighbor sample set including the multiple nearest neighbor samples; determining the result matching group of the prefix subsequence in the nearest neighbor sample set based on the session result signature; calculating the boundary reference index of each prefix subsequence based on the number of samples in the nearest neighbor sample set and in the result matching group; based on the boundary reference index, determining the candidate individual subsequence among the multiple prefix subsequences included in the access session; recording the node access state vector of the candidate individual subsequence as the individual sufficient access boundary of the access session; and merging the multiple individual sufficient access boundaries in the nearest neighbor sample set to obtain the group sufficient boundary of the session result type set.
3. The method for monitoring privacy data access based on vehicle network communication logs according to claim 2, characterized in that identifying the low-increment suffixes among the multiple suffix sequences includes: removing the candidate individual subsequence from the access path sequence of the access session to obtain the suffix sequence of the access session; calculating the result increment parameter of the suffix sequence based on the boundary reference index of the candidate individual subsequence; and determining multiple low-increment suffixes from the multiple suffix sequences based on the result increment parameter.
4. The method for monitoring privacy data access based on vehicle network communication logs according to claim 3, characterized in that identifying the target session result type to which the session to be detected belongs, identifying the access redundancy characteristics of the session to be detected based on the group sufficient boundary and the multiple low-increment suffixes, and generating the redundant access monitoring results of the session to be detected includes: constructing the target session result signature and the target access path sequence of the session to be detected; based on the target session result signature, matching the session to be detected with the multiple session result types, determining the session result type that best matches the session to be detected, and recording it as the target session result type; extracting multiple candidate prefix sequences from the target access path sequence, constructing the target node access state vector for each candidate prefix sequence, and identifying the individual boundary node of the session to be detected based on the group sufficient boundary of the target session result type and the multiple target node access state vectors; based on the individual boundary node, extracting the suffix sequence to be detected from the target access path sequence; and based on the multiple low-increment suffixes of the target session result type, performing access behavior expansion analysis of the session to be detected to generate the redundant access monitoring results of the session to be detected.
5. The method for monitoring privacy data access based on vehicle network communication logs according to claim 4, characterized in that performing the access behavior expansion analysis of the session to be detected based on the multiple low-increment suffixes of the target session result type includes: extracting multiple suffix access state features for each low-increment suffix and constructing a suffix access state vector; determining the suffix baseline parameter of each suffix access state feature based on the multiple suffix access state vectors; constructing a low-increment state baseline for the target session result type based on the multiple suffix baseline parameters; and extracting the incremental state sequence of the session to be detected based on the individual boundary node, constructing the target incremental structure vector corresponding to the incremental state sequence, extracting multiple incremental structure offset features of the session to be detected with respect to the target incremental structure vector based on the low-increment state baseline, and generating the redundant access monitoring results of the session to be detected.
6. The method for monitoring privacy data access based on vehicle network communication logs according to claim 4, characterized in that identifying the individual boundary node of the session to be detected based on the group sufficient boundary of the target session result type and the multiple target node access state vectors includes: matching the multiple target node access state vectors with the group sufficient boundary of the target session result type; calculating the boundary matching distance between each target node access state vector and the group sufficient boundary; and selecting the communication node corresponding to the target node access state vector with the smallest boundary matching distance as the individual boundary node of the session to be detected.
7. A privacy data access monitoring system based on vehicle network communication logs, characterized in that the system is used to implement the privacy data access monitoring method based on vehicle network communication logs as described in any one of claims 1-6, and includes: a communication log data acquisition module, used to collect communication log data generated by multiple communication nodes in the vehicle network system during data interaction, and to extract multiple access sessions from the communication log data; a session result type identification module, used to extract multiple session result features for each access session to construct a session result signature, and to classify the multiple access sessions based on the session result signature to generate multiple session result type sets; a sufficient boundary generation module, used to construct the access path sequence for each access session, and to extract the individual sufficient access boundaries of the multiple access sessions in the session result type set and fuse them to generate the group sufficient boundary for each session result type set; a low-increment suffix extraction module, used to extract the suffix sequence of each access session in the session result type set based on the individual sufficient access boundary, and to identify low-increment suffixes among the multiple suffix sequences; and a redundant access monitoring module, used to acquire the session to be detected in the vehicle network system, identify the target session result type to which the session to be detected belongs, identify the access redundancy characteristics of the session to be detected based on the group sufficient boundary and the multiple low-increment suffixes, and generate the redundant access monitoring results of the session to be detected.