Safe interaction method, device, and electronic device based on ride-hailing services

The method acquires video streams, audio streams, and location data from ride-hailing users, performs multi-dimensional risk assessment with a lightweight emotion recognition model, dynamically adjusts the interface, and encrypts uploaded data. This reduces the complexity of interaction in ride-hailing emergencies and enables efficient safety assistance and risk response.

CN122245045A · Pending · Publication Date: 2026-06-19 · BEIJING BAILONG MAYUN TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Application (China)
Current Assignee / Owner: BEIJING BAILONG MAYUN TECH CO LTD
Filing Date: 2026-02-04
Publication Date: 2026-06-19

Abstract

This application relates to the field of pattern recognition technology and discloses a safe interaction method, device, and electronic device based on ride-hailing services. The method includes: acquiring a user's video stream, audio stream, and location data; determining the user's safety risk level based on the video stream, audio stream, and location data; dynamically adjusting the interactive elements and modes of the terminal device's front-end user interface according to the safety risk level; if the safety risk level exceeds a preset threshold or a user triggers a safety request, encrypting the driving process data and uploading it to a safety service platform, and notifying emergency contacts or the police. The technical solution of this application solves the problems of complex interaction processes and low operational efficiency in emergency situations found in related technologies, improving the convenience and efficiency of the interaction process.

Description

Technical Field

[0001] This application relates to the field of pattern recognition technology, specifically to a safe interaction method, device, and electronic device based on ride-hailing services.

Background Technology

[0002] With the increasing popularity of ride-hailing services, travel safety has become a growing concern.

[0003] In related technologies, on the front-end system of ride-hailing clients, when a safety issue occurs, the server sends out an "identifier." Upon receiving the "identifier," the front-end changes the interface to drive user interaction, such as displaying a button that, when clicked, allows for quick contact with an "emergency contact," or automatically triggers the contact process. However, for different emergency situations, the interaction schemes are complex and inefficient.

[0004] In other words, the relevant technologies suffer from complex interaction processes and low operational efficiency in emergency situations.

Summary of the Invention

[0005] This application provides a safe interaction method, device, and electronic device based on ride-hailing services to solve the problems of complex interaction processes and low operational efficiency in emergency situations in related technologies.

[0006] In a first aspect, this application provides a safe interaction method based on ride-hailing services, applied to a terminal device, the method comprising:

[0007] Acquiring the user's video stream, audio stream, and location data; determining the user's safety risk level based on the video stream, audio stream, and location data; dynamically adjusting the interactive elements and modes of the front-end user interface of the terminal device according to the safety risk level; and, if the safety risk level exceeds a preset threshold or the user triggers a safety assistance request, encrypting the driving process data, uploading it to the safety service platform, and notifying emergency contacts or the police.

[0008] In an optional implementation, determining the user's safety risk level based on the video stream, audio stream, and location data includes: performing facial emotion analysis on the video stream based on a lightweight emotion recognition model to obtain an emotion risk value; performing speech emotion analysis on the audio stream based on the lightweight emotion recognition model to obtain a voice risk value; performing an environmental risk assessment based on the location data to obtain an environmental risk value; and determining the safety risk level based on the emotion risk value, the voice risk value, and the environmental risk value.

[0009] In an optional implementation, dynamically adjusting the interactive elements and modes of the terminal device's front-end user interface according to the safety risk level includes: if the safety risk level is below the low-risk threshold, hiding the safety help control in the front-end user interface; if the safety risk level is greater than or equal to the low-risk threshold and below the high-risk threshold, adjusting the interface color tone, displaying a reassuring message, and triggering low-frequency vibration; and if the safety risk level is greater than or equal to the high-risk threshold, switching to full-screen safety mode, starting local audio and video recording, and triggering high-frequency vibration.

[0010] In an optional implementation, encrypting the driving process data and uploading it to the safety service platform includes: establishing an encrypted communication channel with the safety service platform via WebSocket; generating a data digest for the data to be uploaded, which includes keyframes from the video stream, audio stream segments, location data, and the safety risk level; and encrypting the data digest using a preset encryption algorithm and uploading it to the safety service platform through the encrypted communication channel.

[0011] In an optional implementation, performing facial emotion analysis on the video stream based on the lightweight emotion recognition model to obtain the emotion risk value includes: splitting the video stream into frames to obtain multiple images; performing face mesh detection on each image using the facial landmark detection model in the lightweight emotion recognition model to identify a preset number of facial feature points; calculating, based on the positional changes of the facial feature points, feature parameters that characterize facial muscle movement; and calculating and outputting the emotion risk value based on the degree of matching between the feature parameters and preset emotional features.

[0012] In an optional implementation, performing speech emotion analysis on the audio stream based on the lightweight emotion recognition model to obtain the voice risk value includes: preprocessing the audio stream to extract acoustic features and form an acoustic feature sequence; inputting the acoustic feature sequence into the LSTM neural network in the lightweight emotion recognition model; analyzing, with the LSTM neural network, the temporal variation patterns of tone, speed, and energy in the acoustic feature sequence to identify negative emotions of a preset category; and calculating and outputting the voice risk value based on the identification results and the intensity of the negative emotions.

[0013] In a second aspect, this application provides a safety interaction device based on ride-hailing services, applied to a terminal device, the device comprising: an acquisition module, used to acquire the user's video stream, audio stream, and location data; a risk level determination module, used to determine the user's safety risk level based on the video stream, audio stream, and location data; an interface adjustment module, used to dynamically adjust the interactive elements and modes of the front-end user interface of the terminal device according to the safety risk level; and an encrypted alarm module, used to encrypt the driving process data, upload it to the safety service platform, and notify emergency contacts or the police if the safety risk level is higher than a preset threshold or the user triggers a safety assistance request.

[0014] In a third aspect, this application provides an electronic device, including a memory and a processor that are communicatively connected to each other. The memory stores computer instructions, and the processor executes the computer instructions to perform the safe interaction method based on ride-hailing services described in the first aspect or any corresponding embodiment.

[0015] In a fourth aspect, this application provides a computer-readable storage medium storing computer instructions for causing a computer to execute the safe interaction method based on ride-hailing services described in the first aspect or any corresponding embodiment.

[0016] In a fifth aspect, this application provides a computer program product, including computer instructions for causing a computer to execute the safe interaction method based on ride-hailing services described in the first aspect or any corresponding embodiment.

[0017] The safe interaction method based on ride-hailing services proposed in this application achieves the following beneficial technical effects compared to existing technologies: By acquiring user video streams, audio streams, and location data, the system provides a comprehensive and synchronous data foundation for risk assessment through multi-dimensional and all-round real-time perception of user status. Based on these data, the system determines the user's security risk level, achieving quantitative assessment and automated identification of complex security risks. According to the security risk level, the system dynamically adjusts the interactive elements and modes of the terminal device's front-end user interface, achieving adaptive and intelligent guidance. If the security risk level exceeds a preset threshold or a user triggers a safety request, the system encrypts and uploads the driving process data to the safety service platform, and notifies emergency contacts or the police, constructing an automated emergency response closed loop from risk perception to emergency linkage. This improves the convenience and efficiency of the interaction process.

Brief Description of the Drawings

[0018] To more clearly illustrate the technical solutions in the specific embodiments of this application or the prior art, the drawings used in the description of the specific embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this application. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.

[0019] Figure 1 is a schematic diagram of an application scenario according to an embodiment of this application;
Figure 2 is a flowchart of a safe interaction method based on ride-hailing services according to an embodiment of this application;
Figure 3 is a flowchart of another safe interaction method based on ride-hailing services according to an embodiment of this application;
Figure 4 is a schematic diagram of a system architecture for safe interaction based on ride-hailing services according to an embodiment of this application;
Figure 5 is a structural block diagram of a ride-hailing-based safety interaction device according to an embodiment of this application;
Figure 6 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of this application.

Detailed Description

[0020] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0021] It is understood that before using the technical solutions disclosed in the various embodiments of this application, users should be informed of the types, scope of use, and usage scenarios of the personal information involved in this application in an appropriate manner in accordance with relevant laws and regulations, and user authorization should be obtained.

[0022] As an optional application scenario in the embodiments of this application, as shown in Figure 1, application 101 is installed on terminal device 110, and user 130 can interact with application 101 through terminal device 110 and/or an access device of terminal device 110.

[0023] For example, application 101 can be any application that provides question-and-answer related services. For instance, application 101 could be a question-and-answer interactive application, such as a text-to-text application or an image-to-text application. In the application scenario shown in Figure 1, if application 101 is active, terminal device 110 can display the interface 102 of application 101. The interface 102 may include various pages that application 101 can provide, such as interactive pages, settings pages, and query pages.

[0024] In some embodiments, terminal device 110 is communicatively connected to server 120 to provide services to application 101. Terminal device 110 may be a mobile terminal, fixed terminal, or portable terminal, etc., including but not limited to mobile phones, desktop computers, laptop computers, multimedia tablets, e-book devices, gaming devices, or any combination thereof, including accessories and peripherals of these devices or any combination thereof. In some embodiments, terminal device 110 may also support any type of interface, and server 120 may be various types of computing systems or servers capable of providing computing power, including but not limited to mainframes, edge computing nodes, computing devices in cloud environments, etc.

[0025] It should be noted that Figure 1 is merely an example of an application scenario and does not limit the scope of protection of this application.

[0026] The embodiments of this application will be described below with reference to the accompanying drawings. It should be understood that the pages shown in the drawings are merely examples, and various page designs are possible in practice. The various graphic elements on the page may have different arrangements and different visual representations, one or more elements may be omitted or replaced, and one or more other elements may also be present; no limitations are imposed on the embodiments of this application. Furthermore, the embodiments are primarily described below with reference to terminal device 110. It should be understood that the actions described relative to terminal device 110 can be performed by application 101 on terminal device 110, or can be performed by application 101 in conjunction with its server (e.g., server 120).

[0027] This application provides a safe interaction method for ride-hailing services. By analyzing in-vehicle video, audio, and location data in real time, it automatically assesses the safety risk level of drivers and passengers and dynamically adjusts the user interface accordingly (such as highlighting safety buttons and activating full-screen warnings). When the risk is too high, it automatically triggers local audio and video recording, encrypted uploading, and emergency response by linking emergency contacts / police. This solution achieves real-time and automated perception and handling of trip safety risks, significantly simplifies the operation process in emergency situations while protecting user privacy, and improves the efficiency and reliability of safe interaction.

[0028] The embodiments of this application are applicable to the client-side front-end systems of various Software as a Service (SaaS) travel platforms, such as ride-hailing, carpooling, and taxi services.

[0029] According to an embodiment of this application, a safe interaction method based on ride-hailing services is provided. It should be noted that the steps shown in the flowcharts of the accompanying drawings can be executed in a computer system, for example as a set of computer-executable instructions. Although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in a different order.

[0030] This embodiment provides a safe interaction method based on ride-hailing services, which can be used on mobile terminal devices such as mobile phones and tablets. Figure 2 is a flowchart of a safe interaction method based on ride-hailing services according to an embodiment of this application. As shown in Figure 2, the process includes the following steps:

Step S201: Obtain the user's video stream, audio stream, and location data.

[0031] Specifically, video streams refer to continuous image data captured in real time by cameras, such as the facial expressions and body movements of drivers and passengers. Audio streams refer to continuous sound data captured in real time by microphones, such as conversation content, tone of voice, and unusual noises. Location data refers to real-time geographical location and movement trajectory information obtained through a positioning module, such as latitude and longitude coordinates, real-time speed, and direction of travel, which can be provided through navigation software or vehicle terminals.

[0032] The system utilizes the terminal device's camera, microphone, and positioning module to simultaneously collect video, audio, and location information during the journey. Real-time acquisition of multi-source data provides complete and synchronized sensory input data for subsequent risk analysis, enabling comprehensive real-time monitoring of the in-vehicle environment and user status.

[0033] Step S202: Determine the user's security risk level based on the video stream, audio stream, and location data.

[0034] Specifically, the safety risk level refers to a quantitative level reflecting the degree of abnormality in a user's emotions and behaviors during the trip, based on a comprehensive analysis of video, audio, and location information. For example, there are three levels: low, medium, and high risk.

[0035] By loading a lightweight emotion recognition model on the front end, facial emotion analysis is performed on the video stream (such as recognizing mouth and eyebrow movements), and voice emotion analysis is performed on the audio stream (such as recognizing abnormal tone and speech rate). This, combined with location data, determines whether behavior is abnormal (such as prolonged stays outside preset areas). Finally, the data is fused and calculated to quantify the safety risk level. This achieves real-time, automated identification and quantification of travel safety risks, providing a precise basis for subsequent targeted interactions and responses.

[0036] Step S203: Dynamically adjust the interactive elements and modes of the front-end user interface of the terminal device according to the security risk level.

[0037] Specifically, the front-end user interface refers to the application interface that users directly see and interact with on their device screen, such as the page displaying maps, order information, and function buttons in a ride-hailing trip. Interactive elements refer to components on the front-end user interface that are operable or perceptible to users, such as buttons (e.g., an "SOS" emergency help button), prompts (e.g., reassuring text), colors, animations, etc. Modes refer to specific display and interaction states that the interface switches to adapt to different scenarios, such as normal mode, risk warning mode, or full-screen safety mode.

[0038] Based on the determined risk level, the system automatically matches and executes preset interface adjustment strategies. For example, when the risk increases, the main color of the interface can be changed to a warm color to attract attention, and the "SOS" button can be enlarged and accompanied by a pulsating animation to guide the operation; when the risk reaches the highest level, it automatically enters full-screen safety mode, hides unnecessary information, and starts recording audio and video. This achieves intelligent contextualized and adaptive interface interaction, enabling intuitive and efficient guidance for users to focus on safety status and simplify the path to help when risks emerge or erupt, significantly reducing the cognitive load and operational complexity for users in emergency situations.

[0039] Step S204: If the safety risk level is higher than the preset threshold or the user triggers a safety assistance request, encrypt the driving process data, upload it to the safety service platform, and notify emergency contacts or the police.

[0040] Specifically, a preset threshold refers to a pre-defined numerical limit used to determine whether a risk has reached a level requiring an emergency response. For example, a risk level of 70 points (out of 100). A security service platform refers to a cloud-based service system responsible for receiving and processing emergency events and coordinating subsequent responses. Examples include the platform's customer service security center or automatic alarm dispatch system.

[0041] When a security risk level is determined to exceed a preset threshold (e.g., high risk) or when a user actively triggers a security assistance request, the system automatically encrypts relevant driving data (such as audio and video clips and location trajectories during the risk period) and uploads it to the cloud security service platform via a secure channel (e.g., WebSocket protocol). Upon receiving the data, the platform can automatically or manually verify it and immediately notify the user's preset emergency contacts or contact the police via an Application Programming Interface (API). This achieves an automated and standardized emergency response loop from risk identification to external coordination, ensuring that high-risk events are handled promptly, effectively, and with complete evidence, greatly improving response speed and reliability in emergency situations.
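The trigger condition and the digest step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function names, the JSON field names, and the choice of SHA-256 are hypothetical, the threshold of 70 is the example value from the text, and the preset encryption algorithm and the WebSocket transport are deliberately left out because the text does not specify them.

```python
import hashlib
import json
from typing import Dict, List


def should_escalate(score: float, user_requested: bool, threshold: float = 70.0) -> bool:
    """Escalate when the user asks for help or the score is above the threshold."""
    return user_requested or score > threshold


def build_digest_record(
    keyframes: List[bytes],
    audio_segment: bytes,
    location: Dict[str, float],
    risk_level: str,
) -> str:
    """Summarise the data to upload as a JSON string (hypothetical layout).

    Binary media is represented by SHA-256 hashes; encrypting this record and
    sending it over the secure channel are outside the scope of the sketch.
    """
    record = {
        "keyframes_sha256": [hashlib.sha256(frame).hexdigest() for frame in keyframes],
        "audio_sha256": hashlib.sha256(audio_segment).hexdigest(),
        "location": location,
        "risk_level": risk_level,
    }
    # sort_keys gives a stable byte representation for later encryption or signing.
    return json.dumps(record, sort_keys=True)
```

A caller would check `should_escalate(...)` first and only then build the record, encrypt it with whatever algorithm the deployment presets, and send it over the established channel.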

[0042] The safety interaction method based on ride-hailing services provided in this embodiment acquires the user's video stream, audio stream, and location data, enabling multi-dimensional and comprehensive real-time perception of the user's status, providing a comprehensive and synchronous data foundation for risk assessment. Based on the video stream, audio stream, and location data, the method determines the user's safety risk level, completing the quantitative assessment and automated identification of complex safety risks. According to the safety risk level, the method dynamically adjusts the interactive elements and modes of the terminal device's front-end user interface, achieving adaptive and intelligent guidance of the interactive interface. If the safety risk level exceeds a preset threshold or a user triggers a safety request, the method encrypts the driving process data and uploads it to the safety service platform, notifying emergency contacts or the police, thus constructing an automated emergency response closed loop from risk perception to emergency linkage. This improves the convenience and operational efficiency of the interaction process.

[0043] This embodiment provides a safe interaction method based on ride-hailing services, which can be used on the aforementioned mobile terminals, such as mobile phones and tablets. Figure 3 is a flowchart of another safe interaction method based on ride-hailing services according to an embodiment of this application. As shown in Figure 3, the process includes the following steps:

Step S301: Obtain the user's video stream, audio stream, and location data.

[0044] For details, see step S201 of the embodiment shown in Figure 2; the description is not repeated here.

[0045] Step S302: Determine the user's security risk level based on the video stream, audio stream, and location data.

[0046] Specifically, step S302 includes: Step S3021: Perform facial emotion analysis on the video stream based on a lightweight emotion recognition model to obtain an emotion risk value.

[0047] Specifically, a lightweight emotion recognition model refers to an optimized machine learning model that can run efficiently on user terminal devices (such as mobile phones) to identify the emotions corresponding to facial expressions. Its characteristics include small model file size (e.g., less than 2MB) and low computational resource consumption. For example, a neural network model based on the TensorFlow.js framework that integrates facial landmark detection functionality. An emotion risk score is a score calculated by analyzing facial expression features (such as the state of the corners of the mouth, eyebrows, and eyes) to quantify the degree of abnormal user emotions (such as anger, fear, and tension). For example, a value between 0 and 100, with higher scores indicating more negative or agitated emotions and greater safety risks.

[0048] The system loads a pre-built lightweight model and performs frame-by-frame processing on the video stream. For each frame, the model first detects the face and locates dozens of key feature points. Then, based on the geometric relationship and motion pattern of these feature points (such as downturned corners of the mouth and furrowed brows), it determines the emotion category and intensity of the current expression and finally outputs a quantified emotion risk value.
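The mapping from feature points to an emotion risk value can be sketched as below. This is a simplified illustration, assuming that landmark detection has already produced named points: the landmark names, the neutral reference frame, the two geometric features, the `full_scale` constant, and the equal weighting are all hypothetical and stand in for the model's learned matching against preset emotional features.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]  # (x, y) in normalized image coordinates; y grows downward


def _clamp01(value: float) -> float:
    return max(0.0, min(1.0, value))


def _mouth_corner_y(lm: Dict[str, Point]) -> float:
    return (lm["mouth_left"][1] + lm["mouth_right"][1]) / 2.0


def _inner_brow_gap(lm: Dict[str, Point]) -> float:
    return abs(lm["brow_inner_right"][0] - lm["brow_inner_left"][0])


def facial_features(current: Dict[str, Point], neutral: Dict[str, Point]) -> Dict[str, float]:
    """Feature parameters from landmark displacement relative to a neutral frame."""
    face_height = neutral["chin"][1] - neutral["forehead"][1]
    if face_height <= 0:
        raise ValueError("neutral frame must have the chin below the forehead")
    # Positive when the mouth corners have moved down (downturned mouth).
    droop = (_mouth_corner_y(current) - _mouth_corner_y(neutral)) / face_height
    # Positive when the inner brows have moved together (furrowed brows).
    furrow = (_inner_brow_gap(neutral) - _inner_brow_gap(current)) / face_height
    return {"mouth_droop": droop, "brow_furrow": furrow}


def emotion_risk(features: Dict[str, float], full_scale: float = 0.05) -> float:
    """Map the two features to a 0-100 score; full_scale is an assumed saturation point."""
    droop = _clamp01(features["mouth_droop"] / full_scale)
    furrow = _clamp01(features["brow_furrow"] / full_scale)
    return 100.0 * (droop + furrow) / 2.0
```

Normalizing by face height keeps the features comparable when the passenger sits nearer to or farther from the camera.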

[0049] Step S3022: Perform speech emotion analysis on the audio stream based on a lightweight emotion recognition model to obtain a speech risk value.

[0050] Specifically, the voice risk score is a rating calculated by analyzing voice signal characteristics (such as tone, speed, and energy) to quantify the degree of negative emotions (such as anger, fear, and tension) or abnormal states contained in speech. For example, a value between 0 and 100, with a higher score indicating a greater risk of emotional distress in the speech.

[0051] The system calls the speech analysis module within the same lightweight model to preprocess the acquired audio stream and extract key acoustic features (such as Mel-frequency cepstral coefficients). Subsequently, these feature sequences are input into a specific neural network within the model (such as a Long Short-Term Memory (LSTM) network) to analyze their patterns of change over time, thereby identifying negative emotions or abnormal speech features of a preset category, and quantifying the degree of matching between speech features and negative emotions into a speech risk value.
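The preprocessing stage can be illustrated by splitting the audio into overlapping frames and computing one feature pair per frame. The sketch below is a simplified stand-in: it uses short-time energy and zero-crossing rate instead of Mel-frequency cepstral coefficients, it does not include the LSTM, and the function names and the frame sizes (400 samples with a hop of 160, which would be 25 ms and 10 ms at an assumed 16 kHz sample rate) are illustrative choices rather than values from the application.

```python
from typing import List, Tuple


def frame_signal(samples: List[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split a signal into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len] for i in range(0, len(samples) - frame_len + 1, hop)]


def frame_features(frame: List[float]) -> Tuple[float, float]:
    """Return (short-time energy, zero-crossing rate) for one frame of length >= 2."""
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return energy, crossings / (len(frame) - 1)


def acoustic_feature_sequence(
    samples: List[float], frame_len: int = 400, hop: int = 160
) -> List[Tuple[float, float]]:
    """Build the per-frame feature sequence that a temporal model would consume."""
    return [frame_features(f) for f in frame_signal(samples, frame_len, hop)]
```

The resulting list is ordered in time, which is the property a recurrent network such as an LSTM relies on when it looks for changes in energy or pitch-related features across frames.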

[0052] Step S3023: Conduct an environmental risk assessment based on the location data to obtain the environmental risk value.

[0053] Specifically, the environmental risk score is a value calculated by analyzing location data to quantify whether there are anomalies in the travel environment or user behavior patterns. For example, a value between 0 and 100, with a higher score indicating that the environment deviates more from normal expectations and the potential risk is greater.

[0054] The system analyzes location data from positioning modules such as the Global Positioning System (GPS) or the BeiDou Navigation Satellite System in real time. By judging behavioral patterns such as abnormal displacement within a short period of time (e.g., moving more than 500 meters within 5 minutes at an abnormal speed), prolonged stagnation in unconventional areas (e.g., remote roads), or significant deviation from the planned route, the system quantifies the degree of environmental anomalies and outputs an environmental risk value. This achieves real-time, automated anomaly detection of the physical environment and user behavior patterns, providing crucial environmental and behavioral dimension data for overall safety risk assessment and effectively supplementing potential risk clues beyond emotion recognition.
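Two of the behavioral patterns named above, deviation from the planned route and prolonged stagnation, can be quantified as in the following sketch. It is an illustration under assumptions: the function names, the saturation limits (1000 m and 600 s), the waypoint-based deviation measure, and the rule of taking the larger of the two components are hypothetical. Only the haversine great-circle formula is standard.

```python
import math
from typing import List, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in meters between two coordinates."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def route_deviation_m(position: LatLon, planned_route: List[LatLon]) -> float:
    """Distance to the nearest planned waypoint (a coarse deviation measure)."""
    return min(haversine_m(position, waypoint) for waypoint in planned_route)


def environmental_risk(
    position: LatLon,
    planned_route: List[LatLon],
    stationary_s: float,
    max_deviation_m: float = 1000.0,  # assumed saturation point
    max_stationary_s: float = 600.0,  # assumed saturation point
) -> float:
    """Return a 0-100 score driven by the worse of deviation and stagnation."""
    deviation = min(1.0, route_deviation_m(position, planned_route) / max_deviation_m)
    stagnation = min(1.0, stationary_s / max_stationary_s)
    return 100.0 * max(deviation, stagnation)
```

Measuring deviation against discrete waypoints is coarse; a production system would more likely project the position onto the route polyline.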

[0055] Step S3024: Determine the safety risk level based on the emotional risk value, the voice risk value, and the environmental risk value.

[0056] Specifically, a predefined fusion strategy is adopted, for example the weighted summation formula $R = w_1 E + w_2 V + w_3 C$, where $E$, $V$, and $C$ are the emotion, voice, and environmental risk values and $w_1$, $w_2$, and $w_3$ are their weights. A comprehensive risk score $R$ is calculated from the three risk values, and the final risk level is determined based on a preset score range (e.g., 0-30 is low, 31-70 is medium, and 71-100 is high). This achieves effective fusion and decision-making of multimodal risk information, overcoming the potential bias or misjudgment that may exist in single-dimensional judgments. This makes the final safety risk level assessment more comprehensive, accurate, and reliable, providing a precise and unified basis for subsequent differentiated intelligent responses.
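The weighted summation and the score ranges can be written out as a short sketch. The weights (0.4, 0.3, 0.3) and the function names are assumptions chosen for illustration; the application does not state weight values. The level boundaries follow the example ranges given above.

```python
from typing import Tuple


def fuse_risk(
    emotion: float,
    voice: float,
    environment: float,
    weights: Tuple[float, float, float] = (0.4, 0.3, 0.3),  # assumed weights
) -> float:
    """Weighted sum of the three 0-100 risk values; weights must sum to 1."""
    w1, w2, w3 = weights
    if abs(w1 + w2 + w3 - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1 so the score stays within 0-100")
    return w1 * emotion + w2 * voice + w3 * environment


def risk_level(score: float) -> str:
    """Map a comprehensive score to a level using the example ranges from the text."""
    if score <= 30:
        return "low"
    if score <= 70:
        return "medium"
    return "high"
```

For instance, risk values of 80, 60, and 40 with these assumed weights give 32 + 18 + 12 = 62, which falls in the medium range.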

[0057] Step S303: Dynamically adjust the interactive elements and modes of the front-end user interface of the terminal device according to the security risk level.

[0058] Specifically, step S303 includes: Step S3031: If the security risk level is less than the low risk threshold, then hide the security help control in the front-end user interface.

[0059] Specifically, a low-risk threshold refers to a pre-set numerical limit used to classify low-risk levels. For example, a comprehensive risk level score below 30 points (out of 100). A safety help control refers to an operable component on the user interface specifically designed to trigger emergency help functionality. For example, a prominent button labeled "SOS" or "Emergency Help".

[0060] When the overall safety risk level assessed is below the preset low-risk threshold, the current trip is deemed safe, and there is no need to highlight emergency functions. Therefore, the system automatically hides or minimizes "safety assistance controls" (such as the SOS button) in the user interface. This achieves intelligent display control of interface elements, keeping the interface simple when safety is assured, avoiding unnecessary elements from interfering with the user, and optimizing the user experience under normal circumstances.

[0061] Step S3032: If the safety risk level is greater than or equal to the low-risk threshold and less than the high-risk threshold, adjust the interface color tone, display a reassuring prompt message, and trigger low-frequency vibration.

[0062] Specifically, a high-risk threshold refers to a pre-set numerical limit used to classify high-risk levels. For example, a risk level comprehensive score of 70 or higher (out of 100). Interface color scheme refers to the dominant color attribute in the user interface (UI), such as the overall background and the color tendency of main areas (e.g., cool or warm colors).

[0063] When the assessed overall safety risk level falls into the medium-risk range (above the low-risk threshold but below the high-risk threshold), the system automatically executes a multi-sensory, gentle warning strategy. This includes adjusting the interface's main color scheme to a warmer color (e.g., light yellow) to attract attention, displaying reassuring text (e.g., "For help, please feel free to click the button below"), and triggering a short, low-frequency vibration of the device (e.g., lasting 200ms). This achieves non-intrusive, tiered warnings and guidance, effectively alerting users to potential risks and providing reassurance while avoiding excessive panic. Furthermore, it improves user efficiency in stressful situations by simplifying the operation path (e.g., making the help button more prominent).

[0064] Step S3033: If the security risk level is greater than or equal to the high risk threshold, switch to full-screen security mode, start local audio and video recording, and trigger high-frequency vibration.

[0065] Specifically, full-screen security mode refers to a specific interface state that takes over the entire device screen display and focuses on security emergency handling. In this mode, all application interface elements are hidden except for core security functions.

[0066] When the security risk level is determined to reach or exceed the preset high-risk threshold, the user interface is immediately and forcibly switched to full-screen security mode. This mode automatically performs three core operations: 1. Hides all interface elements unrelated to the current security situation (such as order details and advertisements), retaining only the most concise help guidance and key information; 2. Automatically starts local audio and video recording of the in-vehicle environment in the background, beginning a fixed-duration (e.g., the most recent 10 minutes) loop recording to preserve key evidence; 3. Triggers the device to vibrate with high intensity and continuity (e.g., for 1 second, repeated 3 times) to strongly alert the user. This achieves interface cleanup, automatic evidence collection, and strong warning linkage in emergency situations, minimizing operational interference, ensuring automatic preservation of on-site evidence, and driving immediate user attention and response through strong multi-sensory prompts, providing direct support for subsequent possible alarms and investigations.
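As an illustrative, non-limiting sketch, the loop recording described above can be modeled as a time-windowed buffer that discards segments older than the retention window. The names `RecordedChunk` and `LoopRecorder` are hypothetical, and the chunk payload merely stands in for an encoded audio/video segment; the application does not specify how segments are stored.

```typescript
// One recorded segment; `data` stands in for an encoded audio/video chunk (assumption).
interface RecordedChunk {
  timestampMs: number;
  data: Uint8Array;
}

// Keeps only the chunks that fall inside the most recent `windowMs` milliseconds.
class LoopRecorder {
  private chunks: RecordedChunk[] = [];
  private readonly windowMs: number;

  // Default window: 10 minutes, matching the example given in the text.
  constructor(windowMs: number = 10 * 60 * 1000) {
    this.windowMs = windowMs;
  }

  // Append a new chunk and drop everything older than the retention window.
  push(chunk: RecordedChunk): void {
    this.chunks.push(chunk);
    const cutoff = chunk.timestampMs - this.windowMs;
    this.chunks = this.chunks.filter((c) => c.timestampMs >= cutoff);
  }

  // Return a copy of the retained chunks, oldest first.
  snapshot(): RecordedChunk[] {
    return [...this.chunks];
  }
}
```

In a browser, chunks of this kind could be fed from a recorder's data callback; that wiring is omitted here.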

[0067] In step S304, if the safety risk level is higher than the preset threshold or a user is detected to have triggered a safety assistance request, the driving process data will be encrypted and uploaded to the safety service platform, and emergency contacts or the police will be notified.

[0068] For details, see step S204 of the embodiment shown in Figure 2, which will not be repeated here.

[0069] This implementation method automatically determines the level of safety risk by collecting in-vehicle video, audio, and location data in real time and using a lightweight front-end emotion recognition model for multimodal analysis. Based on this, the user interface is dynamically adjusted (such as tiered warnings, activation of full-screen safety mode, and automatic evidence collection). In cases of high risk, encrypted uploads and external linkage responses are triggered. This achieves real-time automated perception of travel safety risks, contextualized interactive guidance, and efficient emergency closed-loop, significantly improving the timeliness and operational efficiency of safety interactions while protecting user privacy.

[0070] In some optional implementations, the step S304 above, which involves encrypting the driving process data and uploading it to the security service platform, includes the following steps: Step a1: Establish an encrypted communication channel with the security service platform via WebSocket.

[0071] Step a2: Generate a data digest for the data to be uploaded, which includes keyframes of the video stream, audio stream segments, location data, and security risk levels.

[0072] Step a3: After encrypting the data digest using a preset encryption algorithm, upload it to the security service platform through the encrypted communication channel.

[0073] Specifically, when an emergency report needs to be initiated, the system first establishes a persistent, encrypted communication channel with the cloud security service platform via the WebSocket protocol, based on protocols such as Transport Layer Security (TLS). Next, the system processes the core data to be reported (such as key frames in video, abnormal audio segments, real-time location, and risk level) to generate a unique data digest (such as a hash value) representing the batch of data. Finally, the system encrypts the data digest using a preset encryption algorithm (such as Advanced Encryption Standard (AES)) and uploads it to the security service platform through the established encrypted channel. WebSocket refers to a protocol that enables full-duplex communication over a single Transmission Control Protocol (TCP) connection.
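As a minimal sketch of the digest and encryption steps only, the following example hashes a report and encrypts the resulting digest. Several choices are assumptions not stated in this application: SHA-256 as the hash, AES-256-GCM as the AES mode, the Node.js `node:crypto` module as the runtime (a browser front end would use an equivalent facility), and all function and field names. Establishing the WebSocket channel and key exchange are omitted.

```typescript
import { createCipheriv, createDecipheriv, createHash, randomBytes } from "node:crypto";

// Encrypted digest plus the values the receiver needs to decrypt it.
interface EncryptedDigest {
  iv: Buffer;
  ciphertext: Buffer;
  authTag: Buffer;
}

// Hash the report to a hex digest. JSON.stringify depends on key order,
// so both sides must serialize the report identically (assumption).
function digestReport(report: object): string {
  return createHash("sha256").update(JSON.stringify(report)).digest("hex");
}

// Encrypt the digest with AES-256-GCM under a 32-byte key and a fresh 12-byte IV.
function encryptDigest(digestHex: string, key: Buffer): EncryptedDigest {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(digestHex, "utf8"), cipher.final()]);
  return { iv, ciphertext, authTag: cipher.getAuthTag() };
}

// Decrypt on the receiving side; throws if the data or tag was tampered with.
function decryptDigest(box: EncryptedDigest, key: Buffer): string {
  const decipher = createDecipheriv("aes-256-gcm", key, box.iv);
  decipher.setAuthTag(box.authTag);
  return Buffer.concat([decipher.update(box.ciphertext), decipher.final()]).toString("utf8");
}
```

An authenticated mode such as GCM is used in this sketch because it lets the receiver detect tampering as well as keep the digest confidential.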

[0074] This implementation method, by establishing a dedicated encrypted channel, generating data digests, and transmitting them encrypted, enables secure, lightweight, and real-time reporting of critical risk data. It ensures the security of data transmission, preventing information theft or tampering; by uploading data digests instead of the complete original data, it significantly reduces network transmission load, improves reporting speed, and further protects user privacy; and it provides a reliable and efficient flow of evidence for rapid analysis and decision-making on the cloud platform.

[0075] In some optional implementations, step S3021 above includes: Step b1: Perform frame segmentation on the video stream to obtain multiple frames of images.

[0076] Step b2: Using the face landmark detection model in the lightweight emotion recognition model, face mesh detection is performed on each frame of the multi-frame images to identify a preset number of facial feature points.

[0077] Step b3: Calculate the feature parameters representing facial muscle movement based on the positional changes of facial feature points.

[0078] Step b4: Calculate and output the emotional risk value based on the matching degree between the feature parameters and the preset emotional features.

[0079] Specifically, the video stream is first segmented into multiple frames based on time intervals. Then, a lightweight model's facial landmark detection component (such as MediaPipe Face Mesh) is used to perform face mesh detection on each frame, locating a preset number (e.g., 68) of facial feature points (such as the corners of the eyes and mouth). Next, feature parameters reflecting facial muscle movement are calculated based on the displacement of these feature points in consecutive frames. Finally, by matching these motion parameters with a preset emotion feature template, a quantified emotion risk value is calculated and output.
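Steps b3 and b4 can be sketched as follows, purely by way of example. The mean landmark displacement is one possible motion parameter, and cosine similarity against a template is an assumed matching measure; the application does not specify either. All names (`Point`, `meanDisplacement`, `emotionRiskValue`) are hypothetical, and the landmark coordinates are assumed to have already been produced by the landmark detection model.

```typescript
interface Point {
  x: number;
  y: number;
}

// Mean Euclidean displacement of corresponding landmarks between two frames.
function meanDisplacement(prev: Point[], curr: Point[]): number {
  if (prev.length === 0 || prev.length !== curr.length) {
    throw new Error("frames must contain the same, non-zero number of landmarks");
  }
  let total = 0;
  prev.forEach((p, i) => {
    const q = curr[i];
    total += Math.hypot(q.x - p.x, q.y - p.y);
  });
  return total / prev.length;
}

// Cosine similarity between two equal-length feature vectors (0 if either is all zeros).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Map the matching degree to a 0-100 risk value; negative similarity counts as no match.
function emotionRiskValue(features: number[], template: number[]): number {
  return Math.round(100 * Math.max(0, cosineSimilarity(features, template)));
}
```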

[0080] This implementation method enables fine-grained, dynamic, and quantitative analysis of facial expressions in video streams. It ensures real-time and continuous analysis through frame-by-frame processing; efficiently and accurately locates facial feature points on the terminal using a local lightweight model; captures genuine emotions more reliably by analyzing the dynamic changes of feature points rather than static images; and ultimately outputs an objectively quantified emotional risk value, providing a crucial and reliable basis for overall risk assessment.

[0081] In some optional implementations, step S3022 above includes: Step c1 involves preprocessing the audio stream to extract acoustic features, forming an acoustic feature sequence.

[0082] Step c2: Input the acoustic feature sequence into the LSTM neural network in the lightweight emotion recognition model.

[0083] Step c3 involves using an LSTM neural network to analyze the temporal variation patterns of tone, speed, and energy in the acoustic feature sequence to identify negative emotions of a preset category.

[0084] Step c4: Calculate and output the voice risk value based on the recognition results and intensity of negative emotions.

[0085] Specifically, the audio stream is first preprocessed (e.g., noise reduction, frame segmentation) and key acoustic features (e.g., Mel-Frequency Cepstral Coefficients (MFCCs)) are extracted to form a feature sequence arranged in chronological order. This sequence is then input into a Long Short-Term Memory (LSTM) neural network within a lightweight model. The LSTM neural network analyzes the temporal changes and patterns of parameters such as tone, rate of speech, and energy in the feature sequence to identify predefined negative emotion categories (e.g., anger, fear), and calculates and outputs a quantified speech risk value based on the identification results and intensity.
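The preprocessing in step c1 can be illustrated, in a non-limiting way, by frame segmentation followed by a per-frame energy measure. The sketch below computes root-mean-square (RMS) energy as a simple stand-in for one acoustic feature; MFCC extraction and the LSTM network itself are not shown. The function names and the frame and hop sizes are assumptions.

```typescript
// Split a sample sequence into fixed-size frames advanced by `hopSize` samples.
// Trailing samples that do not fill a whole frame are dropped.
function frameSignal(samples: number[], frameSize: number, hopSize: number): number[][] {
  if (frameSize <= 0 || hopSize <= 0) {
    throw new Error("frameSize and hopSize must be positive");
  }
  const frames: number[][] = [];
  for (let start = 0; start + frameSize <= samples.length; start += hopSize) {
    frames.push(samples.slice(start, start + frameSize));
  }
  return frames;
}

// Root-mean-square energy of one frame.
function rmsEnergy(frame: number[]): number {
  let sumSquares = 0;
  for (const s of frame) {
    sumSquares += s * s;
  }
  return Math.sqrt(sumSquares / frame.length);
}

// Chronologically ordered energy sequence, one value per frame.
function energySequence(samples: number[], frameSize: number, hopSize: number): number[] {
  return frameSignal(samples, frameSize, hopSize).map(rmsEnergy);
}
```

A sequence of this kind, ordered in time, is the shape of input that a recurrent model such as an LSTM consumes.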

[0086] This implementation utilizes an LSTM neural network to model the temporal dynamics of speech, enabling accurate and real-time analysis of complex emotional changes in speech. Processing is completed locally on the terminal, ensuring speech privacy and low-latency analysis. Temporal modeling more reliably captures emotional fluctuations and abnormal patterns in speech; and it outputs objectively quantified risk values, providing a stable and effective voice dimension input for integrated risk assessment.

[0087] Figure 4 is a schematic diagram of a system architecture for safe interaction based on ride-hailing services, according to an embodiment of this application. As shown in the figure, the system architecture includes a data acquisition layer, a computing layer, an interaction presentation layer, and a security service layer. The data acquisition layer collects multi-source data in real time, specifically including: 1. Camera data acquisition: The browser calls the MediaDevices.getUserMedia API to request user authorization to access the camera, captures video streams in real time (frame rate ≥ 25 fps), and temporarily stores video frames in a local cache in binary data format.

[0088] 2. Microphone data acquisition: Similarly, the audio stream is obtained through the getUserMedia API, and speech is recorded in real time (sampling rate 44.1 kHz) using the MediaRecorder interface. The original audio waveform data is extracted simultaneously for subsequent speech analysis.

[0089] 3. Location data acquisition: Call the Geolocation API with enableHighAccuracy: true to obtain high-precision location information (error ≤ 10 meters), update latitude and longitude data every 3 seconds, and record timestamps.

[0090] 4. Data output: The collected video frames, audio waveforms, and latitude and longitude data are packaged into structured data (JSON format), labeled with data type and collection time, and then transmitted to the computing layer.
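By way of example only, the structured data passed from the data acquisition layer to the computing layer might take the following shape. The field names (`type`, `collectedAt`, `frameBase64`, and so on) and the helper functions are hypothetical; the text above only requires that each record carry a data type and a collection time and that the batch be serialized as JSON.

```typescript
// Each record is labeled with its data type and collection time (ISO 8601 string).
type SensorRecord =
  | { type: "video"; collectedAt: string; frameBase64: string }
  | { type: "audio"; collectedAt: string; samples: number[] }
  | { type: "location"; collectedAt: string; latitude: number; longitude: number };

interface SensorBatch {
  records: SensorRecord[];
}

// Build a location record from one position fix.
function locationRecord(latitude: number, longitude: number, at: Date): SensorRecord {
  return { type: "location", collectedAt: at.toISOString(), latitude, longitude };
}

// Serialize a batch of records to JSON for hand-off to the computing layer.
function packBatch(records: SensorRecord[]): string {
  const batch: SensorBatch = { records };
  return JSON.stringify(batch);
}
```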

[0091] The computing layer (local processing) performs risk level assessment, specifically including: 1. Face detection and analysis: Load the pre-trained MediaPipe Face Mesh model, perform facial landmark detection (68 feature points) on each frame of the video stream, identify facial features (such as the curvature of the corners of the mouth, the height of the brow bone, etc.), and output the emotional risk value E (0-100; the higher the value, the more agitated the emotion).

[0092] 2. Speech analysis: Preprocess the audio data by extracting MFCCs as features, input them into the trained LSTM neural network model, identify changes in tone and speed in the speech, determine whether emotions such as anger or fear are present, and output a speech risk value V (0-100).

[0093] 3. Location- and behavior-assisted judgment: Based on GPS data, if the user's location is detected to have moved more than 500 meters within a short period (within 5 minutes) at an abnormal speed (e.g., ≥ 60 km/h, not in a car scenario), or if the user is in an uncommonly used area, an environmental risk value C (0-100) is output.

[0094] 4. Risk calculation: The comprehensive risk level R (0-100 points) is calculated using the dynamic weighting formula R = 0.4E + 0.3V + 0.3C, and the result is transmitted to the interaction presentation layer in real time.
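The movement check and the weighted combination described above can be sketched as follows. This is an illustration under stated assumptions: the haversine great-circle formula and a mean Earth radius of 6,371,000 m are assumed for the distance calculation, input values are clamped to 0-100, and all names are hypothetical. How the environmental risk value C is derived from the movement check is not specified in this application, so C is taken as an input here.

```typescript
interface Fix {
  latitude: number;
  longitude: number;
  timestampMs: number;
}

const EARTH_RADIUS_M = 6371000; // mean Earth radius (assumption)

// Great-circle distance between two fixes, in meters (haversine formula).
function haversineMeters(a: Fix, b: Fix): number {
  const toRad = (deg: number): number => (deg * Math.PI) / 180;
  const dLat = toRad(b.latitude - a.latitude);
  const dLon = toRad(b.longitude - a.longitude);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.latitude)) * Math.cos(toRad(b.latitude)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(h));
}

// True if the user moved more than 500 m within 5 minutes at 60 km/h or faster.
function isAbnormalMovement(a: Fix, b: Fix): boolean {
  const dtMs = b.timestampMs - a.timestampMs;
  if (dtMs <= 0 || dtMs > 5 * 60 * 1000) return false;
  const distanceM = haversineMeters(a, b);
  const speedKmh = distanceM / 1000 / (dtMs / 3600000);
  return distanceM > 500 && speedKmh >= 60;
}

// Comprehensive risk level R = 0.4E + 0.3V + 0.3C, with each input clamped to 0-100.
function comprehensiveRisk(e: number, v: number, c: number): number {
  const clamp = (x: number): number => Math.min(100, Math.max(0, x));
  return 0.4 * clamp(e) + 0.3 * clamp(v) + 0.3 * clamp(c);
}
```

For instance, E = 50, V = 60, and C = 70 give R = 20 + 18 + 21 = 59, which lies in the medium-risk range.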

[0095] The interaction presentation layer provides dynamic responses and user interaction, specifically: 1. Adaptive adjustment of UI controls: (1) When R < 30 (low risk): display the normal interface and hide security-related controls; (2) When 30 ≤ R < 70 (medium risk): highlight the safety help button (such as "SOS") and pop up a reassuring message (such as "If you need help, you can click on emergency contact") at the bottom of the interface; (3) When R ≥ 70 (high risk): trigger a full-screen red warning, automatically start local audio and video recording (looping over the most recent 10 minutes of data), and pop up an emergency contact list.

[0096] 2. Tactile feedback trigger: Calling the navigator.vibrate() interface triggers a short vibration (200 ms) for medium-risk situations and a continuous vibration (1 s × 3 times) for high-risk situations, enhancing user perception.

[0097] 3. User operation processing: If the user clicks the emergency contact button, a trigger command is generated; if the user clicks cancel, the current alert is terminated. The command is transmitted to the security service layer as a Boolean value (true / false).
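The tiered response of the interaction presentation layer can be sketched as a pure mapping from the risk score to a set of UI actions. This is a non-limiting illustration: the type and field names are hypothetical, the 300 ms pause between high-risk vibration pulses is an assumption, and, following steps S3032 and S3033, a score equal to the high-risk threshold is treated as high risk. The vibration pattern is expressed as alternating vibrate and pause durations in milliseconds, the form accepted by navigator.vibrate().

```typescript
type RiskTier = "low" | "medium" | "high";

interface UiResponse {
  tier: RiskTier;
  showSosControl: boolean;
  fullScreenSecurityMode: boolean;
  startLocalRecording: boolean;
  // Alternating vibrate/pause durations in milliseconds; empty means no vibration.
  vibrationPattern: number[];
}

// Map a risk score (0-100) to the UI actions for its tier.
function selectUiResponse(risk: number, lowThreshold = 30, highThreshold = 70): UiResponse {
  if (risk < lowThreshold) {
    return {
      tier: "low",
      showSosControl: false,
      fullScreenSecurityMode: false,
      startLocalRecording: false,
      vibrationPattern: [],
    };
  }
  if (risk < highThreshold) {
    return {
      tier: "medium",
      showSosControl: true,
      fullScreenSecurityMode: false,
      startLocalRecording: false,
      vibrationPattern: [200], // one short pulse
    };
  }
  return {
    tier: "high",
    showSosControl: true,
    fullScreenSecurityMode: true,
    startLocalRecording: true,
    vibrationPattern: [1000, 300, 1000, 300, 1000], // 1 s x 3, with assumed 300 ms pauses
  };
}
```

In a browser that supports the Vibration API, the returned pattern could be passed directly as `navigator.vibrate(response.vibrationPattern)`.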

[0098] The security service layer (cloud) provides security response and services, specifically including: 1. Encrypted data transmission: When a high-risk instruction is received or an emergency contact is triggered by a user, the WebSocket encrypted channel (using the TLS 1.3 protocol) is activated to encrypt locally recorded audio and video clips, the risk level, and the real-time location before transmitting them to the cloud security center.

[0099] 2. Security center processing: (1) After the cloud receives the data, the system automatically pushes early warning information to the security center console, displaying user information, risk level, and real-time location; (2) A manual reviewer confirms within 5 minutes whether it is a real emergency. If confirmed, it is marked as threat intelligence, the police are contacted (via the 110 alarm platform API), and the preset emergency contacts are notified simultaneously.

[0100] 3. Data storage and compliance: Audio and video data from high-risk events are encrypted (AES-256 encryption), stored in the data lake, and retained for 30 days (in compliance with data security regulations). Low / medium-risk data is automatically deleted after 24 hours of local storage.

[0101] 4. External interface: Send an SMS message containing the user's location and risk level to emergency contacts via an SMS API. The content template is: "[Security Alert] User XXX is currently in a high-risk state, location: XXX. Please contact XXX for assistance."
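The retention rule and the SMS template above can be illustrated with two small helpers. This is a sketch only: the function names are hypothetical, and the meaning of the third placeholder in the template (here called `contact`) is an assumption, since the text leaves it as XXX.

```typescript
type EventTier = "low" | "medium" | "high";

const HOUR_MS = 60 * 60 * 1000;
const DAY_MS = 24 * HOUR_MS;

// High-risk data is retained for 30 days; low / medium-risk data for 24 hours.
function retentionMs(tier: EventTier): number {
  return tier === "high" ? 30 * DAY_MS : 24 * HOUR_MS;
}

// True once a stored record has outlived its retention period and should be deleted.
function isExpired(tier: EventTier, storedAtMs: number, nowMs: number): boolean {
  return nowMs - storedAtMs > retentionMs(tier);
}

// Fill the alert template quoted in the text with concrete values.
function formatAlertSms(userName: string, location: string, contact: string): string {
  return (
    `[Security Alert] User ${userName} is currently in a high-risk state, ` +
    `location: ${location}. Please contact ${contact} for assistance.`
  );
}
```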

[0102] For the closed-loop process of the above system architecture, if the risk is eliminated (the user cancels the command or the manual review finds it to be a false alarm), the system closes the encrypted channel, the UI returns to normal, and vibration and audio / video recording stop; if it is a real emergency, the incident handling result is recorded and archived in the log after the police or emergency contact responds.

[0103] Through this implementation, the aforementioned system architecture achieves fully automated closed-loop management of trip safety risks through four layers of collaboration: data collection, local computing, dynamic interaction, and cloud linkage. Its technical advantages are: front-end localized multimodal analysis ensures real-time risk identification and user privacy security; risk-driven dynamic interface adjustments significantly improve the intuitiveness and operational efficiency in emergency situations; and the encrypted reporting, manual review, and external linkage mechanisms of cloud services construct an efficient and compliant emergency response and handling closed loop, thereby comprehensively improving the proactive early warning capabilities and safety assurance level of the ride-hailing trip safety monitoring system.

This embodiment also provides a safety interaction device based on ride-hailing services. This device is used to implement the above embodiments and preferred embodiments, and details already described will not be repeated. As used below, the term "module" can be a combination of software and / or hardware that implements a predetermined function. Although the device described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

[0104] This embodiment provides a safe interaction device based on ride-hailing services, applied to a terminal device. As shown in Figure 5, the device includes: an acquisition module 501, used to acquire the user's video stream, audio stream, and location data; a risk level determination module 502, used to determine the user's security risk level based on the video stream, audio stream, and location data; an interface adjustment module 503, used to dynamically adjust the interactive elements and modes of the front-end user interface of the terminal device according to the security risk level; and an encrypted alarm module 504, used to encrypt and upload the driving process data to the safety service platform and notify emergency contacts or the police if the security risk level is higher than a preset threshold or if the user triggers a safety request.

[0105] In some alternative implementations, the risk level determination module 502 includes: The emotion risk value determination unit is used to perform facial emotion analysis on video streams based on a lightweight emotion recognition model to obtain emotion risk values; The speech risk value determination unit is used to perform speech emotion analysis on the audio stream based on a lightweight emotion recognition model to obtain a speech risk value; The environmental risk value determination unit is used to conduct environmental risk assessment based on location data and obtain environmental risk values. The safety risk level determination unit is used to determine the safety risk level based on emotional risk value, voice risk value, and environmental risk value.

[0106] In some alternative implementations, the interface adjustment module 503 includes: The low-risk processing unit is used to hide the safety help control in the front-end user interface if the safety risk level is less than the low-risk threshold. The medium-risk processing unit is used to adjust the interface color tone and display a reassuring prompt message and trigger low-frequency vibration if the safety risk level is greater than or equal to the low-risk threshold and less than the high-risk threshold. The high-risk processing unit is used to switch to full-screen security mode, start local audio and video recording, and trigger high-frequency vibration if the security risk level is greater than or equal to the high-risk threshold.

[0107] In some alternative implementations, the encrypted alarm module 504 includes: The encrypted channel establishment unit is used to establish an encrypted communication channel with the security service platform via WebSocket; The data digest generation unit is used to generate a data digest from the data to be uploaded, which includes keyframes of the video stream, audio stream segments, location data, and security risk levels; The encrypted upload unit is used to encrypt the data digest using a preset encryption algorithm and then upload it to the security service platform through the encrypted communication channel.

[0108] In some optional implementations, the emotional risk value determination unit includes: The frame-segmentation processing subunit is used to segment the video stream into frames to obtain multiple frames of images. The face detection subunit is used to perform face mesh detection on each frame of multiple images using the face key point detection model in the lightweight emotion recognition model, and identify a preset number of facial feature points. The facial feature calculation subunit is used to calculate feature parameters representing facial muscle movement based on the positional changes of facial feature points. The emotion risk calculation subunit is used to calculate and output the emotion risk value based on the matching degree between the feature parameters and the preset emotion features.

[0109] In some optional implementations, the voice risk value determination unit includes: The acoustic feature extraction subunit is used to preprocess the audio stream to extract acoustic features and form an acoustic feature sequence. The input subunit is used to input the acoustic feature sequence into the LSTM neural network in the lightweight emotion recognition model; The emotion recognition subunit is used to analyze the temporal variation patterns of tone, speech rate and energy in the acoustic feature sequence through an LSTM neural network to identify negative emotions of a preset category. The voice risk calculation subunit is used to calculate and output the voice risk value based on the recognition results and intensity of negative emotions.

[0110] The ride-hailing-based safety interaction device provided in this application can execute the ride-hailing-based safety interaction method provided in any embodiment of this application, and has the corresponding functional modules and beneficial effects for executing the method. Further functional descriptions of the above modules and units are the same as those in the corresponding embodiments described above, and will not be repeated here.

[0111] Figure 6 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application.

[0112] Referring to Figure 6 in detail, the diagram shows a structure suitable for implementing the electronic device of the embodiments of this application. The electronic device may include a processor (e.g., a central processing unit, graphics processor, etc.) 601, which can perform various appropriate actions and processes according to a program stored in read-only memory (ROM) 602 or a program loaded from storage device 608 into random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the electronic device. The processor 601, ROM 602, and RAM 603 are interconnected via a bus 604. An input / output (I / O) interface 605 is also connected to the bus 604.

[0113] Typically, the following devices can be connected to the I / O interface 605: input devices 606 including, for example, touchscreens, touchpads, keyboards, mice, cameras, microphones, accelerometers, gyroscopes, etc.; output devices 607 including, for example, liquid crystal displays (LCDs), speakers, vibrators, etc.; storage devices 608 including, for example, magnetic tapes, hard disks, etc.; and communication devices 609. The communication device 609 allows the electronic device to communicate wirelessly or by wire with other devices to exchange data. Although Figure 6 shows an electronic device with various devices, it should be understood that it is not required to implement or have all of the devices shown; more or fewer devices may alternatively be implemented or provided.

[0114] Specifically, according to embodiments of this application, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments of this application include a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via the communication device 609, or installed from the storage device 608, or installed from the ROM 602. When the computer program is executed by the processor 601, it performs the functions defined in the ride-hailing-based safe interaction method of embodiments of this application.

[0115] The electronic device shown in Figure 6 is merely an example and should not impose any limitation on the functionality and scope of use of the embodiments of this application.

[0116] This application also provides a computer-readable storage medium. The methods described in this application can be implemented in hardware or firmware, or implemented as recordable on a storage medium, or implemented as computer code downloaded via a network and originally stored on a remote storage medium or a non-transitory machine-readable storage medium and then stored on a local storage medium. Thus, the methods described herein can be processed by software stored on a storage medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware. The storage medium can be a magnetic disk, optical disk, read-only memory, random access memory, flash memory, hard disk, or solid-state drive, etc.; further, the storage medium can also include combinations of the above types of memory. It is understood that computers, processors, microprocessor controllers, or programmable hardware include storage components capable of storing or receiving software or computer code. When the software or computer code is accessed and executed by the computer, processor, or hardware, the safe interaction method based on ride-hailing services shown in the above embodiments is implemented.

[0117] A portion of this application can be applied as a computer program product, such as computer program instructions, which, when executed by a computer, can invoke or provide the methods and / or technical solutions according to this application through the operation of the computer. Those skilled in the art will understand that the forms in which computer program instructions exist in a computer-readable medium include, but are not limited to, source files, executable files, installation package files, etc. Correspondingly, the ways in which computer program instructions are executed by a computer include, but are not limited to: the computer directly executing the instructions, or the computer compiling the instructions and then executing the corresponding compiled program, or the computer reading and executing the instructions, or the computer reading and installing the instructions and then executing the corresponding installed program. Here, the computer-readable medium can be any available computer-readable storage medium or communication medium accessible to a computer.

[0118] Although embodiments of this application have been described in conjunction with the accompanying drawings, those skilled in the art can make various modifications and variations without departing from the spirit and scope of this application, and all such modifications and variations fall within the scope defined by the appended claims.

Claims

1. A safe interaction method based on ride-hailing services, characterized in that, Applied to a terminal device, the method includes: Acquire user's video stream, audio stream, and location data; The user's security risk level is determined based on the video stream, the audio stream, and the location data; Based on the security risk level, dynamically adjust the interactive elements and modes of the front-end user interface of the terminal device; If the security risk level is higher than a preset threshold or if the user triggers a security request, the driving data will be encrypted and uploaded to the security service platform, and emergency contacts or the police will be notified.

2. The method according to claim 1, characterized in that, Determining the user's security risk level based on the video stream, the audio stream, and the location data includes: A lightweight emotion recognition model is used to perform facial emotion analysis on the video stream to obtain an emotion risk value; Based on the lightweight emotion recognition model, speech emotion analysis is performed on the audio stream to obtain a speech risk value; An environmental risk assessment is performed based on the location data to obtain an environmental risk value. The safety risk level is determined based on the emotional risk value, the voice risk value, and the environmental risk value.

3. The method according to claim 1, characterized in that, Based on the security risk level, dynamically adjust the interactive elements and modes of the front-end user interface of the terminal device, including: If the security risk level is less than the low risk threshold, then the security help control is hidden in the front-end user interface; If the safety risk level is greater than or equal to the low risk threshold and less than the high risk threshold, the interface color tone is adjusted and a reassuring prompt message is displayed, and low-frequency vibration is triggered. If the security risk level is greater than or equal to the high risk threshold, switch to full-screen security mode, start local audio and video recording, and trigger high-frequency vibration.

4. The method according to claim 1, characterized in that, The process of encrypting and uploading driving data to the security service platform includes: An encrypted communication channel with the security service platform is established via WebSocket; Generate a data digest for the data to be uploaded, which includes keyframes of the video stream, audio stream segments, location data, and security risk level; After the data digest is encrypted using a preset encryption algorithm, it is uploaded to the security service platform through the encrypted communication channel.

5. The method according to claim 2, characterized in that, The process of performing facial emotion analysis on the video stream based on a lightweight emotion recognition model to obtain an emotion risk value includes: The video stream is segmented into frames to obtain multiple frames of images; Using the facial landmark detection model in the lightweight emotion recognition model, facial mesh detection is performed on each frame of the multi-frame images to identify a preset number of facial feature points. Based on the positional changes of the facial feature points, the characteristic parameters representing facial muscle movement are calculated; Based on the matching degree between the feature parameters and the preset emotional features, the emotional risk value is calculated and output.

6. The method according to claim 2, characterized in that, The process of performing speech emotion analysis on the audio stream based on the lightweight emotion recognition model to obtain a speech risk value includes: The audio stream is preprocessed to extract acoustic features, forming an acoustic feature sequence; The acoustic feature sequence is input into the LSTM neural network in the lightweight emotion recognition model; The LSTM neural network is used to analyze the temporal variation patterns of tone, speech rate, and energy in the acoustic feature sequence to identify negative emotions of a preset category. Based on the recognition results and intensity of the negative emotions, the voice risk value is calculated and output.

7. A safety interaction device based on ride-hailing services, characterized in that, Applied to a terminal device, the device includes: The acquisition module is used to acquire the user's video stream, audio stream, and location data; The risk level determination module is used to determine the user's security risk level based on the video stream, the audio stream, and the location data. The interface adjustment module is used to dynamically adjust the interactive elements and modes of the front-end user interface of the terminal device according to the security risk level. The encrypted alarm module is used to encrypt and upload the driving process data to the security service platform and notify emergency contacts or the police if the security risk level is higher than a preset threshold or if the user triggers a safety request.

8. An electronic device, characterized in that, It includes: A memory and a processor that are interconnected, wherein the memory stores computer instructions, and the processor executes the computer instructions to perform the safe interaction method based on ride-hailing services as described in any one of claims 1 to 6.

9. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores computer instructions for causing the computer to execute the safe interaction method based on ride-hailing services as described in any one of claims 1 to 6.

10. A computer program product, characterized in that, It includes computer instructions for causing a computer to execute the safe interaction method based on ride-hailing services as described in any one of claims 1 to 6.