A multi-scene blind guiding assistance and danger detection method

By using multimodal perception and scenario-based intelligent guidance, and by generating 3D point clouds using YOLO models and depth cameras for obstacle detection and hazard warning, the semantic understanding and real-time safety issues of guide devices in complex environments have been solved, enabling safe navigation and target recognition for blind people.

CN122244790A · Pending · Publication Date: 2026-06-19 · JILIN UNIVERSITY

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: JILIN UNIVERSITY
Filing Date: 2026-03-18
Publication Date: 2026-06-19

IPC: G06V20/52; G08B31/00; G08B7/06; G08B21/04; A61F9/08; G06V10/82; G06V10/764; G06N3/0464; G06V10/12

AI Technical Summary

⚠Technical Problem

Existing guide devices for the blind lack semantic understanding capabilities in complex environments, are unable to identify obstacle types and environmental characteristics, and lack real-time hazard detection, leading to safety hazards for blind people while they are moving around.

⚗Method used

Employing multimodal perception and scene-based intelligent guidance, the system trains an image dataset using a YOLO model, combines color and depth images acquired by a depth camera to generate 3D point clouds for obstacle detection, performs scene discrimination and hazard detection, and provides multimodal feedback output.

🎯Benefits of technology

It enables safe navigation and target recognition in complex environments, provides scenario-specific path guidance and real-time hazard warnings, and improves the safety and navigation accuracy of blind people traveling.

✦ Generated by Eureka AI based on patent content.

Abstract

This invention relates to a multi-scenario guidance and hazard detection method for the blind, belonging to the field of computer vision and intelligent assistive device technology. It includes the steps of data acquisition and YOLO model training, visual information processing, hazard detection, scene discrimination, and multimodal feedback output. The invention uses a hat equipped with binocular cameras to acquire images and calculate distances, runs the algorithms on a Jetson Nano chip, and rapidly conveys information through headphones and vibrating wristbands worn on both hands. The invention dynamically applies different algorithms for recognition and analysis to meet the differing latency and accuracy requirements of everyday scenarios, providing guidance for the blind while monitoring for dangerous objects in real time.

Description

Technical Field

[0001] This invention belongs to the field of computer vision and intelligent assistive device technology, and in particular relates to a method for guiding and assisting blind people and detecting dangers in multiple scenarios.

Background Technology

[0002] Visually impaired individuals face difficulties in environmental perception and obstacle avoidance when traveling independently. Traditional guide canes have a detection range limited to the ground and lack semantic information, while guide dogs face issues of high cost and limited accessibility. Although various electronic assistive products have emerged on the market, significant technical limitations still exist in practical applications.

[0003] The first type is electronic canes based on ultrasonic or radar ranging. These devices indicate the distance to obstacles through vibration or beeping, but their core drawback is the lack of semantic understanding. The devices can only mechanically provide feedback that "there is an object in front," and cannot identify whether the obstacle is a pedestrian, vehicle, or static facility. They also cannot perceive environmental features such as zebra crossings or shop entrances, and therefore cannot support decision-making for blind people in complex environments.

[0004] The second category is intelligent guide glasses based on image recognition. Although these devices have semantic recognition capabilities, they suffer from three major drawbacks. First, they lack processing mechanisms for special scenarios: general algorithms cannot adapt to complex scenarios such as crossing intersections or finding specific entrances, resulting in low guidance accuracy. Second, they lack a persistent hazard detection mechanism: the devices focus on responding to user recognition commands and neglect continuous monitoring of sudden hazards, creating safety blind spots. Third, voice broadcasting is not sufficiently real-time: delays in recognition and broadcasting lead to late warnings, failing to meet the immediate hazard avoidance needs of blind people while walking.

Summary of the Invention

[0005] To address the aforementioned problems, this invention provides a method for guiding and assisting blind people in various scenarios and for hazard detection. Through multimodal perception and scenario-based intelligent guidance, it enables blind people to navigate safely and identify targets in complex environments.

[0006] The technical solution adopted in this invention is as follows:

[0007] A method for providing guidance and hazard detection for the blind in multiple scenarios includes the following steps:

[0008] S1. Data Acquisition and YOLO Model Training: Collect, in the field, a dataset of 1920*1080 resolution images of scenes common in the daily lives of blind people, with 1000 images per scene, and train the YOLOv8 object detection model.

[0009] S2, Visual Information Processing: The depth camera simultaneously acquires color and depth images of the environment and performs YOLO target recognition;

[0010] S3 Hazard Detection: Based on the depth image obtained in S2, generate a 3D point cloud, perform hazard detection, determine whether there are obstacles, pedestrians or other potential hazards in the environment, and generate graded early warning information when a hazard is detected;

[0011] S4. Scene discrimination: Based on the visual information processing results of S2, perform scene discrimination to determine whether the current scene is a staircase guidance scene, a zebra crossing scene, a last-ten-meters store entry assistance scene, or a clothing search assistance scene.

[0012] S5. If the scene is a staircase guidance scene, then perform a line detection operation to identify the number of steps of the staircase.

[0013] S6. If the scene is a zebra crossing scene, perform Canny edge detection and Hough line transform operations to extract the edge and line features of the zebra crossing to provide crossing direction guidance.

[0014] S7. If the scenario is to assist in entering the store within the last ten meters, then perform YOLO recognition to locate the store entrance and surrounding signs to provide path guidance.

[0015] S8. If the scenario involves helping a blind person find clothes, then execute the AI-assisted search operation to locate the target clothing and provide location guidance;

[0016] S9, Multimodal Feedback Output: The scenario-based guidance information from S5 to S8 and the danger warning information from S3 are output to the user through headphones or a vibration wristband.

[0017] Furthermore, step S2 specifically includes the following steps:

[0018] S21. Real-time data acquisition: Continuously acquire image data using a depth camera.

[0019] S22, YOLO Target Recognition: Input a color image into the YOLO model, and output the category, location, and confidence level of the target in the image.

[0020] Furthermore, step S3 specifically includes the following steps:

[0021] S31. Obtain the 3D coordinates of the hazardous objects identified by YOLO using the depth map;

[0022] S32. Use DeepSORT to maintain the cross-frame identity of hazardous objects to obtain stable trajectory numbers and corresponding detection box sequences;

[0023] S33. For each confirmed trajectory ID output by S32, establish a 3D Kalman filter, use the three-dimensional observation (X, Y, Z) calculated from the depth as the measurement input, and output smooth three-dimensional position and velocity for collision time TTC and future position prediction.

[0024] S34. Calculate the relative radial velocity and the time to collision (TTC), and determine the hazard level according to the TTC value:

[0025] (1) Safety: TTC > 5 seconds, or the target is not approaching; no warning required;

[0026] (2) Low risk: 3 seconds < TTC ≤ 5 seconds, voice prompt;

[0027] (3) Medium risk: 1.5 seconds < TTC ≤ 3 seconds, both wristbands vibrate once simultaneously;

[0028] (4) High risk: TTC ≤ 1.5 seconds, continuous vibration of the wristbands;

[0029] Furthermore, step S4 specifically includes the following steps:

[0030] S41. Feature Extraction: Extract scene features from YOLO recognition results and 3D point cloud data, including the number of lines, edge density, and target category distribution;

[0031] S42, Scene Classification: Input the extracted scene features into the scene discrimination model and output the current scene category.

[0032] Furthermore, in step S5, the staircase guidance operation specifically includes:

[0033] S51. Perform grayscale conversion and Gaussian blur preprocessing on the image;

[0034] S52. Perform Canny edge detection to extract the staircase edge contour and calculate the number of staircase steps;

[0035] S53. Detect potential straight lines for each pixel using Hough transform, then perform density clustering on the resulting straight line clusters according to their features, and merge straight lines from the same cluster.

[0036] S54. Count the number of stair steps;

[0037] S55. The location of the staircase and the number of steps are announced to the user via voice.

[0038] Furthermore, in step S6, guiding pedestrians across the zebra crossing specifically includes:

[0039] S61. The preprocessing, edge detection, and line detection processes are the same as those in S51, S52, and S53.

[0040] S62. Calculate the two zebra crossing vertical line clusters from the results of S61.

[0041] S63. Calculate the current position information of the zebra crossing relative to the blind person;

[0042] S64. The location information of the zebra crossing relative to the blind person is read aloud to the user via voice.

[0043] Furthermore, in step S7, the YOLO recognition operation specifically includes:

[0044] S71. Data source filtering and preprocessing;

[0045] S72, Character Recognition and Comparison;

[0046] S73. Based on the information obtained in S72, guide information is provided through vibration feedback from the wristband.

[0047] Furthermore, in step S8, the AI-assisted clothing search operation specifically includes:

[0048] S81. Perform clothing recognition on the collected wardrobe images, determine the position and feature information of each garment from left to right, and announce the serial number, type, color and applicable scenario of each garment in turn by voice.

[0049] S82. Receive clothing selection instructions input by the blind person, and match and locate the target clothing from the clothing;

[0050] S83. Provide detailed information about the target clothing and usage suggestions to the blind person via voice.

[0051] Furthermore, in step S9, the feedback output specifically includes:

[0052] S91: Converts non-urgent, precise guidance information into voice prompts and outputs them through headphones;

[0053] S92. The danger warning information and emergency guidance information are converted into vibration signals and output through the vibration wristband. Different types of warnings correspond to different vibration modes.

[0054] Furthermore, this invention uses a hat with a dual-lens camera to capture images and calculate distances, runs the algorithms on a Jetson Nano chip, and rapidly conveys information through headphones and vibrating wristbands worn on both hands.

[0055] Beneficial effects:

[0056] 1. This invention dynamically employs different algorithms for identification and analysis to address the varying latency and accuracy requirements of different scenarios in daily life, providing guidance for the blind and simultaneously monitoring dangerous objects in real time.

[0057] 2. This invention identifies common scenes in daily life and automatically routes them to appropriate algorithms for processing, including YOLO object detection, PIDNet semantic segmentation, 3D point cloud, integrated LLM, PaddleOCR text recognition, Canny edge detection, and Hough transform. In addition, a real-time hazard detection and early warning algorithm is designed, which integrates YOLO, DeepSORT, and Kalman filtering to achieve hazard detection, trajectory prediction, and hazard level classification, with different feedback methods for different hazard levels.

Attached Figure Description

[0058] Figure 1 is the overall flowchart of the present invention.

Detailed Implementation

[0059] The present invention will now be described in more detail with reference to the accompanying drawings, and specific embodiments will be provided:

[0060] Example 1: Overall Process of the Invention

[0061] As shown in Figure 1, the method for guiding and assisting the blind in various scenarios and detecting hazards provided by this invention includes the following steps in its overall implementation process:

[0062] S1. Data Acquisition and YOLO Model Training: Collect high-quality image datasets of common scenes in the lives of blind people, such as zebra crossings, corridors, shop surroundings, and clothing storage. After performing data augmentation processing such as image scaling, random cropping, and flipping on the datasets, they are used to train the YOLOv8 object detection model.

[0063] S2. Visual Information Processing: Simultaneously acquire color and depth images of the environment using a depth camera, perform YOLOv8 target recognition on the color images, and output the category, location, and confidence level of the targets in the images;

[0064] S3 Hazard Detection: Combining the visual information processing results of S2, a three-dimensional point cloud is generated to detect, predict the trajectory of, and determine the hazard level of potential dangerous targets such as obstacles and moving objects in the environment. If a hazard is detected, a corresponding graded early warning information is generated.

[0065] S4. Scene Recognition: Extract scene features based on the visual information processing results of S2, perform intelligent scene recognition, and determine whether the current scene is a staircase guidance scene, a zebra crossing scene, a last-ten-meters store entry assistance scene, or a clothing search assistance scene.

[0066] S5. Stairwell guidance processing: If the current scenario is stairwell guidance, perform image preprocessing, edge detection and line detection operations to identify the number of stairs and location information;

[0067] S6. Zebra crossing guidance processing: If the current scenario is crossing a zebra crossing, perform Canny edge detection and Hough line transform line detection operations to extract the edge and line features of the zebra crossing and provide crossing direction guidance for the blind.

[0068] S7. Store entrance guidance processing: If the current scenario is assisting the last ten meters into the store, perform YOLO target localization combined with OCR text recognition to locate the store entrance and surrounding signs, providing path guidance for the blind.

[0069] S8. Clothing Search Assistance: If the current scenario is to help a blind person find clothing, execute the AI visual model to assist in the search, match and locate the position of the target clothing, calculate the direction and distance from the user to the target clothing, and provide location guidance.

[0070] S9. Multimodal Feedback Output: The scenario-based guidance information from S5 to S8 and the danger warning information from S3 are output to the user in a multimodal manner through headphone voice broadcast and wristband vibration. Scenario-based guidance information is delivered primarily by headphone voice broadcast, while danger warning information is delivered, depending on its severity, by headphone voice prompts or by wristband vibration warnings in different modes.

[0071] Example 2: Data Acquisition and YOLO Model Training

[0072] S11. Data Collection: Collect high-quality image datasets of common scenes in the lives of blind people, such as zebra crossings, corridors, shop surroundings, and clothing storage. Divide the data into training sets and test sets.

[0073] S12. Data preprocessing: Convert images from different sources to the RGB color space required by the YOLO model; standardize the size by cropping and scaling each image to 640*640 pixels; normalize the pixel values from the integer range [0, 255] to the floating-point range [0, 1].

[0074] S13. Data labeling: Assign different class labels to different objects in the collected color image information (zebra crossing ID=0, corridor ID=1, shop ID=2, wardrobe clothing ID=3, other dangerous objects ID>=4), and obtain the object set origin_objects.

[0075] S14. Train the YOLOv8 model on the self-collected dataset.

[0076] Example 3 Visual Information Processing

[0077] S2. Visual Information Preprocessing: Acquire information and perform target detection using YOLOv8.

[0078] S21. Continuously acquire image data and depth information using a depth camera.

[0079] S22, YOLOv8 object detection, specifically includes the following steps:

[0080] S221. Image preprocessing: Adjust the input RGB image to 640×640 pixels and perform normalization processing, normalizing the pixel value range from [0, 255] to [0, 1].

[0081] S222, Feature extraction: Use the YOLOv8 backbone network CSPDarknet to extract multi-scale feature maps;

[0082] S223, Feature fusion: Multi-scale features are fused through the Feature Pyramid Network (FPN) and the Path Aggregation Network (PAN).

[0083] S224. Object Detection: Target detection is performed on the fused feature map, outputting the center coordinates, width, height, target category, and confidence score of each bounding box.

[0084] S225. Post-processing: Non-maximum suppression (NMS) is applied to the detection results, retaining those with a confidence score greater than 0.5.

[0085] S226. Subsequent algorithm processing distinguishes between dangerous objects and special objects identified in the scene, employing dynamic hazard detection algorithms and automatic routing based on the scene to different processing algorithms respectively.
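The post-processing in S225 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes boxes in corner format (x1, y1, x2, y2) rather than the center/width/height format output in S224, it is class-agnostic, and the IoU threshold of 0.45 is a hypothetical value. Only the 0.5 confidence threshold comes from the text.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), an assumed corner format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float],
        conf_thresh: float = 0.5, iou_thresh: float = 0.45) -> List[int]:
    """Return indices of the kept detections, highest score first."""
    # Drop low-confidence detections, then visit the rest in descending score order.
    order = sorted((i for i, s in enumerate(scores) if s > conf_thresh),
                   key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any box already kept.
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (40, 40, 50, 50)]
scores = [0.9, 0.8, 0.7, 0.3]
kept = nms(boxes, scores)
```

In this example the second box overlaps the first heavily and is suppressed, and the fourth box falls below the confidence threshold.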

[0086] Example 4 Hazard Detection Model

[0087] S3. Hazard Detection: Hazard detection is performed on the dangerous objects filtered from the detection results.

[0088] S31. Obtaining the three-dimensional coordinates of the target area, specifically including the following steps:

[0089] S311, Depth Map Acquisition: Use a depth camera to acquire a depth map corresponding to the RGB image;

[0090] S312. Target region depth extraction: Based on the bounding box coordinates obtained by YOLOv8 detection in S22, extract the depth information within the corresponding bounding box in the depth map, calculate the coordinates of the center point of the bounding box, and obtain the depth value Z of that point.

[0091] S313, Depth value filtering: Performs median filtering on the depth values extracted within the bounding box to eliminate depth measurement noise;

[0092] S314. Coordinate Transformation: Using the camera intrinsic parameter matrix, 2D pixel coordinates and depth values are converted into 3D spatial coordinates. The transformation formula is as follows:

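[0093] $X = \dfrac{(u - c_x)\,Z}{f_x}$

[0094] $Y = \dfrac{(v - c_y)\,Z}{f_y}$

[0095] where $(u, v)$ are the pixel coordinates of the bounding box center, $(c_x, c_y)$ are the coordinates of the camera's principal point, $f_x$ and $f_y$ are the camera focal length parameters, and $(X, Y, Z)$ are the target's three-dimensional coordinates in the camera coordinate system.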
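Steps S312 to S314 can be sketched with the standard pinhole back-projection. The function names, the nested-list depth map, the treatment of zero as "no measurement", and the intrinsic values in the usage lines are all assumptions made for illustration. The sketch reduces the box to a single median depth, which is one simple reading of the median filtering in S313.

```python
from statistics import median
from typing import Sequence, Tuple


def box_depth(depth_map: Sequence[Sequence[float]],
              box: Tuple[int, int, int, int]) -> float:
    """Median of the valid depth values inside the box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    values = [depth_map[v][u]
              for v in range(y1, y2) for u in range(x1, x2)
              if depth_map[v][u] > 0]  # assumption: 0 means no depth measurement
    if not values:
        raise ValueError("no valid depth inside the bounding box")
    return median(values)


def back_project(u: float, v: float, z: float,
                 fx: float, fy: float, cx: float, cy: float
                 ) -> Tuple[float, float, float]:
    """Convert pixel (u, v) with depth z into camera coordinates (X, Y, Z)."""
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return (x, y, z)


# Hypothetical 4x4 depth map in meters with one missing reading.
depth_map = [[2.0, 2.0, 2.0, 2.0],
             [2.0, 0.0, 2.1, 2.0],
             [2.0, 1.9, 2.0, 2.0],
             [2.0, 2.0, 2.0, 2.0]]
z = box_depth(depth_map, (0, 0, 4, 4))
point = back_project(420, 240, z, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

With these example intrinsics, a target 100 pixels right of the principal point at 2 m depth lies about 0.4 m to the right of the optical axis.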

[0096] S32. DeepSORT cross-frame identity maintenance, obtaining stable trajectory numbers and corresponding detection box sequences (containing 2D motion model KF and Hungarian matching), specifically includes the following steps:

[0097] S321. Appearance feature extraction: The ResNet-50 network is used to extract appearance features from the target image within each detection bounding box, resulting in a 128-dimensional feature vector.

[0098] S322, Motion prediction: For an existing trajectory, use a Kalman filter to predict the position of the target in the current frame;

[0099] S323. Data association: Calculate the cost matrix between the detection results and existing trajectories. The cost includes Mahalanobis distance and cosine distance. The comprehensive cost calculation formula is as follows:

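[0100] $c_{i,j} = \lambda_1\, d_{\mathrm{M}}(i, j) + \lambda_2\, d_{\cos}(i, j)$

[0101] where $d_{\mathrm{M}}(i, j)$ is the Mahalanobis distance and $d_{\cos}(i, j)$ is the cosine distance between trajectory $i$ and detection $j$, and $\lambda_1$ and $\lambda_2$ are the weighting coefficients;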

[0102] S324, Hungarian algorithm matching: The Hungarian algorithm is used to solve the optimal matching problem and the detection results are assigned to existing trajectories.

[0103] S325. Trajectory Management: For successfully matched detection results, update the status of the corresponding trajectory; for unmatched detection results, create a new trajectory and assign a new ID; for consecutive unmatched trajectories, delete them.
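The data association in S323 and S324 can be sketched as below. Several simplifications are assumed and are not from the patent: the Mahalanobis term uses a diagonal covariance, the weights are hypothetical, tracks and detections are plain dictionaries with made-up keys, and the optimal assignment is found by exhaustive search over permutations as a small-scale stand-in for the Hungarian algorithm (both return a minimum-cost matching, but exhaustive search is only practical for a handful of objects).

```python
import math
from itertools import permutations
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; assumes non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def mahalanobis_sq(z: Sequence[float], mean: Sequence[float],
                   var: Sequence[float]) -> float:
    """Squared Mahalanobis distance under an assumed diagonal covariance."""
    return sum((zi - mi) ** 2 / vi for zi, mi, vi in zip(z, mean, var))


def cost_matrix(tracks: List[Dict], detections: List[Dict],
                w_motion: float = 0.5, w_appearance: float = 0.5
                ) -> List[List[float]]:
    """cost[t][d] = weighted motion cost + weighted appearance cost."""
    return [[w_motion * mahalanobis_sq(d["pos"], t["mean"], t["var"])
             + w_appearance * cosine_distance(d["feature"], t["feature"])
             for d in detections] for t in tracks]


def assign(cost: List[List[float]]) -> List[Tuple[int, int]]:
    """Minimum-cost (track, detection) pairs by exhaustive search."""
    n_tracks, n_dets = len(cost), len(cost[0])
    if n_tracks > n_dets:
        raise ValueError("sketch assumes no more tracks than detections")
    best = min(permutations(range(n_dets), n_tracks),
               key=lambda perm: sum(cost[t][perm[t]] for t in range(n_tracks)))
    return list(enumerate(best))


tracks = [{"mean": (0.0, 0.0), "var": (1.0, 1.0), "feature": (1.0, 0.0)},
          {"mean": (10.0, 10.0), "var": (1.0, 1.0), "feature": (0.0, 1.0)}]
detections = [{"pos": (10.5, 10.0), "feature": (0.0, 1.0)},
              {"pos": (0.2, 0.0), "feature": (1.0, 0.0)}]
matches = assign(cost_matrix(tracks, detections))
```

Here the detections arrive in the opposite order to the tracks, and the matching pairs track 0 with detection 1 and track 1 with detection 0. A full tracker would additionally gate matches by a maximum cost and handle unmatched tracks and detections as described in S325.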

[0104] S33. For each confirmed trajectory ID output by S32, establish a 3D Kalman filter that takes the three-dimensional observation (X, Y, Z) back-projected from depth as its measurement input and outputs smoothed 3D position and velocity for TTC and future position prediction. The filter does not participate in ID association decisions. This specifically includes the following steps:

[0105] S331. State-space modeling: Define the target's state vector, including three-dimensional position and three-dimensional velocity.

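[0106] S332. State transition model: A constant-velocity motion model is adopted, and the state transition equation is:

[0107] $\mathbf{x}_k = \mathbf{F}\,\mathbf{x}_{k-1} + \mathbf{w}_{k-1}$

[0108] where $\mathbf{F}$ is the state transition matrix and $\mathbf{w}_{k-1}$ is the process noise;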

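[0109] S333. Observation model: The observation equation is:

[0110] $\mathbf{z}_k = \mathbf{H}\,\mathbf{x}_k + \mathbf{v}_k$

[0111] where $\mathbf{H}$ is the observation matrix and $\mathbf{v}_k$ is the measurement noise;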

[0112] S334, Prediction Step: Based on the posterior estimate of the previous time step, perform state prediction and calculate the prior state estimate and prior error covariance.

[0113] S335, Update step: Calculate the Kalman gain based on the current observations, and update the state estimate and error covariance;

[0114] S336. Future location prediction: Based on the current state estimate, predict the future location.
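The predict and update cycle of S331 to S336 can be sketched as follows. The sketch assumes that process and measurement noise are uncorrelated across axes, in which case the six-state constant-velocity filter decouples into three independent two-state filters, one per axis. The class names, noise values, and initial covariance are hypothetical.

```python
from typing import Sequence, Tuple


class AxisKalman:
    """Constant-velocity Kalman filter for one axis; state is [position, velocity]."""

    def __init__(self, p0: float, q: float = 1.0, r: float = 0.05) -> None:
        self.p, self.v = p0, 0.0
        self.P = [[1.0, 0.0], [0.0, 10.0]]  # assumed initial error covariance
        self.q, self.r = q, r               # assumed process / measurement noise

    def predict(self, dt: float) -> None:
        """Prior state and covariance: x = F x, P = F P F^T + Q."""
        self.p += self.v * dt
        P, q = self.P, self.q
        p00 = P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1]
        p01 = P[0][1] + dt * P[1][1]
        p10 = P[1][0] + dt * P[1][1]
        self.P = [[p00 + q * dt ** 4 / 4, p01 + q * dt ** 3 / 2],
                  [p10 + q * dt ** 3 / 2, P[1][1] + q * dt ** 2]]

    def update(self, z: float) -> None:
        """Posterior state and covariance given a position measurement z."""
        P = self.P
        s = P[0][0] + self.r                 # innovation variance
        k0, k1 = P[0][0] / s, P[1][0] / s    # Kalman gain
        y = z - self.p                       # innovation
        self.p += k0 * y
        self.v += k1 * y
        self.P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
                  [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]


class Track3D:
    """Three per-axis filters fed with (X, Y, Z) observations."""

    def __init__(self, xyz: Sequence[float]) -> None:
        self.axes = [AxisKalman(c) for c in xyz]

    def step(self, xyz: Sequence[float], dt: float) -> None:
        for kf, z in zip(self.axes, xyz):
            kf.predict(dt)
            kf.update(z)

    def velocity(self) -> Tuple[float, ...]:
        return tuple(kf.v for kf in self.axes)

    def predict_position(self, horizon: float) -> Tuple[float, ...]:
        """Extrapolate the current estimate `horizon` seconds ahead."""
        return tuple(kf.p + kf.v * horizon for kf in self.axes)


# A target approaching along Z at 1 m/s, observed every 0.1 s.
track = Track3D((0.0, 0.0, 5.0))
for i in range(1, 21):
    track.step((0.0, 0.0, 5.0 - 0.1 * i), dt=0.1)
```

After these twenty noise-free observations the estimated Z velocity settles close to -1 m/s, and `track.predict_position(1.0)` extrapolates the target to roughly 2 m ahead.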

[0115] S34. Collision time calculation and hazard level classification, specifically including the following steps:

[0116] S341. Self-motion compensation: Obtain the subject's own position $\mathbf{p}_s$ and velocity $\mathbf{v}_s$; denote the target's position as $\mathbf{p}_t$ and its velocity as $\mathbf{v}_t$;

[0117] S342. Relative distance calculation: Calculate the relative distance between the detected target and the subject, $d = \lVert \mathbf{p}_t - \mathbf{p}_s \rVert$;

[0118] S343. Radial velocity calculation: Calculate the radial approach velocity of the target relative to the subject, i.e., the velocity component along the line connecting them:

[0119] $v_r = -\dfrac{(\mathbf{p}_t - \mathbf{p}_s) \cdot (\mathbf{v}_t - \mathbf{v}_s)}{d}$

[0120] S344. Time-to-collision (TTC) calculation: When $v_r > 0$, calculate the time to collision:

[0121] $\mathrm{TTC} = \dfrac{d}{v_r}$

[0122] S345. Hazard classification, based on TTC values, divides hazard levels into four levels:

[0123] (1) Safety: TTC > 5 seconds, or the target is not approaching (radial approach velocity not positive); no warning required;

[0124] (2) Low risk: 3 seconds < TTC ≤ 5 seconds, voice prompt;

[0125] (3) Medium risk: 1.5 seconds < TTC ≤ 3 seconds, both wristbands vibrate once simultaneously;

[0126] (4) High risk: TTC ≤ 1.5 seconds, continuous vibration of the wristbands;

[0127] S346. Warning Information Output: The system outputs the target ID, three-dimensional coordinates, speed, TTC value, and danger level, and provides warning prompts through appropriate means.
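Steps S341 to S345 can be sketched as follows. The sign convention (radial approach speed positive when the target is closing in) and the function names are assumptions. The thresholds of 5, 3, and 1.5 seconds are taken from the hazard levels listed above.

```python
import math
from typing import Sequence, Tuple


def distance_and_approach_speed(p_self: Sequence[float], v_self: Sequence[float],
                                p_obj: Sequence[float], v_obj: Sequence[float]
                                ) -> Tuple[float, float]:
    """Relative distance and radial approach speed (positive when closing in)."""
    rel_p = [a - b for a, b in zip(p_obj, p_self)]
    rel_v = [a - b for a, b in zip(v_obj, v_self)]
    d = math.sqrt(sum(c * c for c in rel_p))
    if d == 0.0:
        return 0.0, 0.0
    v_r = -sum(p * v for p, v in zip(rel_p, rel_v)) / d
    return d, v_r


def time_to_collision(d: float, v_r: float) -> float:
    """TTC in seconds; infinite when the target is not approaching."""
    return d / v_r if v_r > 0 else math.inf


def hazard_level(ttc: float) -> str:
    """Map a TTC value to the four levels described in the text."""
    if ttc > 5.0:
        return "safe"      # no warning
    if ttc > 3.0:
        return "low"       # voice prompt
    if ttc > 1.5:
        return "medium"    # both wristbands vibrate once
    return "high"          # continuous wristband vibration


# User walks forward at 1 m/s; an object 6 m ahead moves toward the user at 1 m/s.
d, v_r = distance_and_approach_speed((0, 0, 0), (0, 0, 1), (0, 0, 6), (0, 0, -1))
ttc = time_to_collision(d, v_r)
level = hazard_level(ttc)
```

In this example the closing speed is 2 m/s over 6 m, giving a TTC of 3 seconds, which falls in the medium-risk band.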

[0128] Example 5 Scene Recognition

[0129] S4. Scene Discrimination: Scene discrimination is performed on the basis of the YOLO target recognition results. Scenes are mutually exclusive. First, determine whether the user has activated the "Find Store" function via voice command.

[0130] S41. If so, the current scene is the last-ten-meters store entry assistance scene.

[0131] S42. If not, filter the original object set origin_objects so that it contains only scene-class objects, and make a judgment according to the remaining objects:

[0132] S421. Announce to the user by voice, for each remaining object, "{left front / right front / left side / right side} {depth} {stairs / zebra crossing / wardrobe}", where depth is the result of median filtering and averaging the depth map within the detection box. A timer […] is allocated for the voice broadcast.

[0133] S422. Preferentially select the object with the smallest depth as the current scene.

[0134] Example 6: Staircase Guidance

[0135] Step S5 specifically includes the following steps:

[0136] S51. Preprocessing:

[0137] The original image is cropped according to the detection bounding box, and the cropped image is converted to grayscale.

[0138] The grayscale image is then smoothed by convolution with a Gaussian kernel.

[0139] S52. Edge Detection: Use Canny edge detection to highlight the edge features of the stairs, and perform a closing operation on the detection results to filter out the interference of messy lines.

[0140] S53. Line detection: The potential lines at each pixel are detected by Hough transform. The resulting line clusters are then clustered by density according to features (slope, intercept), and lines in the same cluster are merged (using the line features of the center point).

[0141] The clustering operation is as follows:

[0142] 1. Convert each straight line obtained from the Hough transform to the slope-intercept form $y = kx + b$ and record its features […], where […] is the height of the image.

[0143] 2. Normalize the features […] to obtain […]:

[0144] where […].

[0145] 3. Feed the normalized features into DBSCAN, with the parameters set to […].
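The density clustering and merging of S53 can be sketched as follows. The sketch assumes each line is already described by a normalized (slope, intercept) pair, since the normalization formula and the DBSCAN parameter values are not recoverable from the text; `eps` and `min_pts` below are hypothetical. The DBSCAN routine is a small pure-Python version, and merged lines are taken as the mean feature of each cluster.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Line = Tuple[float, float]  # assumed normalized (slope, intercept) features


def dbscan(points: Sequence[Line], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    labels: List[Optional[int]] = [None] * n

    def neighbors(i: int) -> List[int]:
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1               # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster      # noise reached from a core point joins as border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:       # only core points expand the cluster
                queue.extend(nb)
    return [lab if lab is not None else -1 for lab in labels]


def merge_lines(lines: Sequence[Line], eps: float = 0.05,
                min_pts: int = 2) -> List[Line]:
    """Replace each cluster of near-duplicate lines by its mean feature."""
    groups: Dict[int, List[Line]] = {}
    for line, lab in zip(lines, dbscan(lines, eps, min_pts)):
        if lab >= 0:
            groups.setdefault(lab, []).append(line)
    return [(sum(k for k, _ in g) / len(g), sum(b for _, b in g) / len(g))
            for _, g in sorted(groups.items())]


# Two stair edges detected several times each, plus one stray line.
lines = [(0.0, 0.10), (0.01, 0.11), (0.0, 0.12),
         (0.0, 0.50), (0.02, 0.51), (0.9, 0.9)]
labels = dbscan(lines, eps=0.05, min_pts=2)
merged = merge_lines(lines)
```

The stray line is labeled as noise and dropped, and the five remaining detections collapse into two merged lines, one per stair edge.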

[0146] S54. Line statistics:

[0147] Iterate through the set of lines, count the number of lines whose slopes fall in the same region, and obtain an initial set of counts […].

[0148] Set a minimum valid threshold Tmin, and remove the counts smaller than Tmin to obtain the candidate set:

[0149] […]

[0150] If […], then […];

[0151] If […], calculate the mean […] and the standard deviation […] of the set, and keep only the counts within the range […], obtaining:

[0152] […]

[0153] Finally, take […];

[0154] S55. Finally, […] is fed back to the user via voice broadcast.

[0155] Example 7: Zebra Crossing Guidance

[0156] Step S6 mainly includes the following steps:

[0157] S61. Preprocessing, edge detection, and line detection (same as S51, S52, and S53). This yields a set of line clusters $C = \{C_1, C_2, \dots, C_n\}$, where $C_i$ is the $i$-th cluster and contains multiple straight lines.

[0158] S62. Based on the clustering results of S61, select the two clusters of zebra crossing vertical lines, as follows:

[0159] S621. Compute the mean slope and the size of each line cluster:

[0160] $\bar{k}_i = \dfrac{1}{|C_i|} \sum_{l \in C_i} k_l, \quad n_i = |C_i|$;

[0161] S622. Cluster $\{\bar{k}_i\}$ with K-means, with the parameter $K = 2$ (two clusters, one for each of the two slope families, horizontal and vertical, seen in a zebra crossing). Note that the core logic of K-means is adjusted here so that the cluster size $n_i$ acts as a weight, and the distance formula becomes:

[0162] $d(\bar{k}_i, c) = n_i \,\lvert \bar{k}_i - c \rvert^2$

[0163] where $c$ is the cluster center. The larger $n_i$ is, the greater the penalty for deviating from $c$, so the cluster centers are drawn toward the larger line clusters. Finally, according to the K-means result, the set $C$ is divided into two sets, $S_1$ and $S_2$.

[0164] S623. Calculate […] to obtain the set of zebra crossing vertical lines:

[0165] […]

[0166] From this set, extract the two largest line clusters as the families of vertical lines, and obtain the slope of the zebra crossing vertical lines […].
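The size-weighted two-cluster K-means described in S622 can be sketched as below. The exact distance formula is not legible in the text, so the sketch assumes a squared slope difference weighted by the line-cluster size. The initialization at the smallest and largest slope and the iteration limit are likewise assumptions.

```python
from typing import List, Sequence, Tuple


def weighted_two_means(slopes: Sequence[float], sizes: Sequence[int],
                       iters: int = 20) -> Tuple[List[float], Tuple[List[int], List[int]]]:
    """Split mean slopes into two groups; larger line clusters pull the centers harder."""
    centers = [min(slopes), max(slopes)]          # assumed initialization
    groups: Tuple[List[int], List[int]] = ([], [])
    for _ in range(iters):
        groups = ([], [])
        for i, k in enumerate(slopes):
            # Weighted squared distance to each center (assumed form).
            d0 = sizes[i] * (k - centers[0]) ** 2
            d1 = sizes[i] * (k - centers[1]) ** 2
            groups[0 if d0 <= d1 else 1].append(i)
        new_centers = []
        for g, idx in enumerate(groups):
            w = sum(sizes[i] for i in idx)
            # The size-weighted mean minimizes the weighted squared distance.
            new_centers.append(sum(sizes[i] * slopes[i] for i in idx) / w
                               if w > 0 else centers[g])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, groups


# Mean slopes of five line clusters and the number of lines in each.
slopes = [0.0, 0.05, -0.02, 2.0, 2.2]
sizes = [5, 3, 2, 8, 2]
centers, groups = weighted_two_means(slopes, sizes)
```

In this example the second center lands at 2.04 rather than the unweighted mean of 2.1, because the line cluster with eight lines outweighs the one with two. Note that the weight affects where the centers settle, not which center a given slope is assigned to.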

[0167] S64. Verification:

[0168] S641. Line cluster validity check: If the slope variance within the cluster satisfies […], the result is valid; otherwise, the calculation is invalid.

[0169] S642. Parallelism validity check: If the relative error […] satisfies […], the result is valid; otherwise, the calculation is invalid. Here, […] is the mean absolute slope and […] is the relative error threshold.

[0170] S65. Only when the data is valid, feedback is provided to the user according to […] and the YOLO detection box:

[0171] S651. Inclination: wristband vibration indicates the direction to rotate:

[0172] If […], the zebra crossing is horizontal, turning is meaningless, and there is no vibration.

[0173] If […]:

[0174] If […], the zebra crossing runs from the bottom left to the top right, and the right wristband vibrates to guide the user to turn right;

[0175] If […], the zebra crossing runs from the bottom right to the top left, and the left wristband vibrates to guide the user to turn left;

[0176] If […] (viewed from above), the zebra crossing is directly facing the user, and both wristbands vibrate simultaneously;

[0177] If the data is invalid, the wristbands do not vibrate;

[0178] S652. Detection box: voice announcement of the lateral (left / right) position:

[0179] Calculate the areas of the zebra crossing detection box that fall in the left and right halves of the image, […], with the area threshold […].

[0180] If […], the voice prompt reads "right side {depth}";

[0181] If […], the voice prompt reads "left side {depth}";

[0182] Otherwise, no announcement is made.

[0183] Special case: When […], the announcement is "across zebra crossing ahead at {depth}", indicating that the zebra crossing is horizontal.

[0184] Example 8: The last ten meters into the store

[0185] The specific steps of process S7 are as follows:

[0186] S71, Data Source Filtering: Filtering In the collection Get a collection of objects The objects include, but are not limited to: store entrance signs, store logos, signs on both sides of the door frame, shop windows, billboards, etc.

[0187] S72, Preprocessing: For Create a list of related dictionaries for "object-text-confidence". The dictionary element is configured as follows:

[0188]

[0189] Set threshold .

[0190] S73, Character Recognition and Comparison: Traversal For each element, call right The text within the detection box is recognized, and the recognition results are written to... ,Will Compare with the user-specified string and return the confidence score. (If identification fails, the confidence level is set to zero).

[0191] S74. Results Analysis: Filter out middle big The part that is obtained .

[0192] like If empty, it means that the target store has been identified;

[0193] like If it is not empty, it means that the target store has been identified, and the value is retrieved. Medium confidence level The largest element ,get .

[0194] S75, Feedback: feedback is provided according to the detection box of the selected element, using the area of the detection box in the left half of the image, its area in the right half, the area threshold, and the relative value of the left and right areas.

[0195] a) If the relative value indicates that the target lies to the right, the voice announces "right front {depth}" while the right wristband vibrates;

[0196] b) If the relative value indicates that the target lies to the left, the voice announces "left front {depth}" while the left wristband vibrates;

[0197] c) If neither side dominates, the voice announces "ahead {depth}" while both wristbands vibrate simultaneously.

[0198] The vibration logic of the wristbands follows cases a) to c): right wristband only, left wristband only, or both simultaneously.
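Steps S72 to S75 can be sketched in Python as follows. This is only an illustration under stated assumptions: the character recognizer is passed in as a hypothetical callable `recognize_text`, string similarity is computed with the standard library's `difflib.SequenceMatcher` (the text does not name the comparison method), and the relative value is assumed to be (right - left) / (right + left) with a hypothetical `balance` cutoff.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Optional, Tuple


def build_records(objects: List[str],
                  recognize_text: Callable[[str], Optional[str]],
                  target: str) -> List[Dict]:
    """S72-S73: one object-text-confidence dictionary per detected object."""
    records = []
    for obj in objects:
        try:
            text = recognize_text(obj) or ""
        except Exception:
            text = ""  # recognition failed
        # ASSUMPTION: similarity ratio in [0, 1] serves as the confidence.
        conf = (SequenceMatcher(None, text.lower(), target.lower()).ratio()
                if text else 0.0)
        records.append({"object": obj, "text": text, "confidence": conf})
    return records


def select_target(records: List[Dict], threshold: float) -> Optional[Dict]:
    """S74: keep records above the threshold and return the best one."""
    candidates = [r for r in records if r["confidence"] > threshold]
    if not candidates:
        return None  # target store not identified
    return max(candidates, key=lambda r: r["confidence"])


def direction_feedback(left_area: float, right_area: float, depth_m: float,
                       balance: float = 0.2) -> Optional[Tuple[str, bool, bool]]:
    """S75: return (voice prompt, vibrate_left, vibrate_right), or None."""
    total = left_area + right_area
    if total <= 0:
        return None  # invalid data: no feedback
    # ASSUMPTION: relative value in [-1, 1]; positive means "more to the right".
    relative = (right_area - left_area) / total
    if relative > balance:
        return (f"right front {depth_m:.1f}", False, True)
    if relative < -balance:
        return (f"left front {depth_m:.1f}", True, False)
    return (f"ahead {depth_m:.1f}", True, True)


if __name__ == "__main__":
    ocr = {"sign_a": "Starbucks Coffee", "sign_b": "Pharmacy"}
    best = select_target(build_records(list(ocr), ocr.get, "Starbucks"), 0.6)
    print(best, direction_feedback(100.0, 900.0, 4.0))
```

Returning the two vibration flags alongside the prompt keeps the voice and wristband outputs in step, which matches the paired behavior described in cases a) to c).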

[0199] Example 9: AI-assisted clothing search

[0200] The specific steps of process S8 are as follows:

[0201] S81. Obtain the current image information, call the GPT-5.3 multimodal large language model through the official API, and pass in the current image data and user-defined prompts (including questions about clothing style, color, etc.).

[0202] S82. Convert the obtained text results into speech using a TTS model, and announce the serial number, type, color, and applicable scenarios of each garment in turn.

[0203] S83. Continuously listen for user commands. When the wake-up command "recognize clothes" is given, listen to the user's speech and record it as common.

[0204] S84. Call the GPT-5.3 multimodal large language model through the official API, passing in the current image data together with the recorded common data.

[0205] S85. Provide detailed information about the target clothing and usage suggestions to the blind person via voice.
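The control flow of process S8 can be sketched in Python with the external services injected as callables. All four callables (`capture_image`, `ask_model`, `speak`, `listen`) are hypothetical placeholders for the camera, the multimodal model API, the TTS model, and the speech recognizer; no real API signature is implied. The convention that `listen` returns `None` when input ends is also an assumption, added so that the loop can terminate.

```python
from typing import Callable, Optional


def clothing_assistant(capture_image: Callable[[], bytes],
                       ask_model: Callable[[bytes, str], str],
                       speak: Callable[[str], None],
                       listen: Callable[[], Optional[str]],
                       overview_prompt: str,
                       wake_phrase: str = "recognize clothes") -> Optional[str]:
    """Run one overview-then-question round of the clothing search dialog."""
    # Describe every garment in the current image and read the result aloud.
    overview = ask_model(capture_image(), overview_prompt)
    speak(overview)

    # Wait for the wake-up command; ignore any other speech.
    while True:
        heard = listen()
        if heard is None:
            return None  # input ended before the wake-up command
        if wake_phrase in heard.lower():
            break

    # Record the user's request that follows the wake-up command.
    request = listen()
    if request is None:
        return None

    # Query the model again with a fresh image and the request, then answer.
    answer = ask_model(capture_image(), request)
    speak(answer)
    return answer


if __name__ == "__main__":
    script = iter(["hello", "Recognize clothes", "the blue shirt"])
    clothing_assistant(
        capture_image=lambda: b"image-bytes",
        ask_model=lambda image, prompt: f"(model reply to: {prompt})",
        speak=print,
        listen=lambda: next(script, None),
        overview_prompt="List each garment with number, type, color, and occasion.",
    )
```

Injecting the services keeps the dialog logic testable without network access or audio hardware.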

Claims

1. A method for guiding and assisting blind people and detecting hazards in multiple scenarios, comprising the following steps: S1, data acquisition and YOLO model training: collect, in the field, a dataset of 1920×1080 resolution images of scenes common in the daily lives of blind people, with 1000 images per scene, and train a YOLOv8 object detection model; S2, visual information processing: a depth camera simultaneously acquires color and depth images of the environment, and YOLO target recognition is performed; S3, hazard detection: based on the depth image obtained in S2, generate a 3D point cloud, perform hazard detection to determine whether obstacles, pedestrians or other potential hazards are present in the environment, and generate graded warning information when a hazard is detected; S4, scene discrimination: based on the visual information processing results of S2, determine whether the current scene is a staircase guidance scene, a zebra crossing scene, a last-ten-meters store entry scene, or a clothing search scene; S5, if the scene is a staircase guidance scene, perform a line detection operation to identify the number of steps of the staircase; S6, if the scene is a zebra crossing scene, perform Canny edge detection and Hough line transform operations to extract the edge and line features of the zebra crossing and provide crossing direction guidance; S7, if the scene is a last-ten-meters store entry scene, perform YOLO recognition to locate the store entrance and surrounding signs and provide path guidance; S8, if the scene is a clothing search scene, perform the AI-assisted search operation to locate the target clothing and provide location guidance; S9, multimodal feedback output: the scenario-based guidance information from S5 to S8 and the hazard warning information from S3 are output to the user through headphones or a vibration wristband.

2. The method for guiding and assisting blind people and detecting hazards in multiple scenarios according to claim 1, characterized in that step S2 specifically includes the following steps: S21, real-time data acquisition: continuously acquire image data using the depth camera; S22, YOLO target recognition: input the color image into the YOLO model, and output the category, location, and confidence of each target in the image.

3. The method for guiding and assisting blind people and detecting hazards in multiple scenarios according to claim 1, characterized in that step S3 specifically includes the following steps: S31, obtain the 3D coordinates of the hazardous objects identified by YOLO using the depth map; S32, use DeepSORT to maintain the cross-frame identity of the hazardous objects, obtaining stable trajectory IDs and the corresponding detection box sequences; S33, for each confirmed trajectory ID output by S32, establish a 3D Kalman filter that takes the three-dimensional observation (X, Y, Z) calculated from the depth as its measurement input and outputs a smoothed three-dimensional position and velocity for computing the time to collision (TTC) and predicting future positions; S34, calculate the relative radial velocity and the time to collision TTC, and determine the hazard level from the TTC value: (1) safe: TTC > 5 seconds or the corresponding relative radial velocity condition holds, no warning required; (2) low risk: 3 seconds < TTC ≤ 5 seconds, voice prompt; (3) medium risk: 1.5 seconds < TTC ≤ 3 seconds, both wristbands vibrate once simultaneously; (4) high risk: TTC ≤ 1.5 seconds, the wristbands vibrate continuously.
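The grading in S34 can be illustrated with a short Python sketch. It assumes that positions and velocities are expressed in the camera frame with the user at the origin, that TTC is the distance divided by the closing speed, and that a non-approaching object is treated as having infinite TTC (an assumed reading of the radial velocity condition in level (1)). The function names are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def radial_velocity(position: Vec3, velocity: Vec3) -> float:
    """Velocity component along the line of sight; negative means approaching."""
    x, y, z = position
    vx, vy, vz = velocity
    dist = math.sqrt(x * x + y * y + z * z)
    if dist == 0.0:
        return 0.0
    return (x * vx + y * vy + z * vz) / dist


def time_to_collision(position: Vec3, velocity: Vec3) -> float:
    """Distance divided by closing speed, in seconds."""
    x, y, z = position
    dist = math.sqrt(x * x + y * y + z * z)
    v_r = radial_velocity(position, velocity)
    if v_r >= 0.0:
        return math.inf  # ASSUMPTION: an object that is not approaching is safe
    return dist / -v_r


def hazard_level(ttc: float) -> str:
    """Map a TTC value to the four levels listed in S34."""
    if ttc > 5.0:
        return "safe"    # no warning
    if ttc > 3.0:
        return "low"     # voice prompt
    if ttc > 1.5:
        return "medium"  # both wristbands vibrate once
    return "high"        # wristbands vibrate continuously


if __name__ == "__main__":
    ttc = time_to_collision((0.0, 0.0, 6.0), (0.0, 0.0, -1.5))
    print(ttc, hazard_level(ttc))
```

In practice the position and velocity would come from the per-track Kalman filter of S33, so the thresholds act on smoothed estimates rather than raw depth readings.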

4. The method for guiding and assisting blind people and detecting hazards in multiple scenarios according to claim 1, characterized in that step S4 specifically includes the following steps: S41, feature extraction: extract scene features from the YOLO recognition results and the 3D point cloud data, including the number of lines, the edge density, and the target category distribution; S42, scene classification: input the extracted scene features into the scene discrimination model and output the current scene category.

5. The method for guiding and assisting blind people and detecting hazards in multiple scenarios according to claim 1, characterized in that, in step S5, the staircase guidance operation specifically includes: S51, perform grayscale conversion and Gaussian blur preprocessing on the image; S52, perform Canny edge detection to extract the staircase edge contours used to calculate the number of steps; S53, detect candidate straight lines with the Hough transform, perform density clustering on the resulting lines according to their features, and merge the lines belonging to the same cluster; S54, count the number of steps; S55, announce the location of the staircase and its number of steps to the user via voice.
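The merging and counting of S53 and S54 can be sketched in Python as follows. The sketch assumes the Hough lines are already available as (rho, theta) pairs in the common convention where theta near π/2 denotes a horizontal line, and it replaces the density clustering with a simple one-dimensional gap-based grouping on rho, which is a simplification rather than the clustering method of the claim. The parameter names `max_tilt_deg` and `rho_gap` and the rule "one merged line per step edge" are assumptions.

```python
import math
from typing import List, Tuple


def count_steps(lines: List[Tuple[float, float]],
                max_tilt_deg: float = 15.0,
                rho_gap: float = 12.0) -> Tuple[int, List[float]]:
    """Merge near-horizontal Hough lines and count the merged lines.

    lines: (rho, theta) pairs, rho in pixels and theta in radians.
    Returns the number of merged lines and their mean rho values.
    """
    # Keep only lines close to horizontal (theta close to 90 degrees).
    rhos = sorted(rho for rho, theta in lines
                  if abs(math.degrees(theta) - 90.0) <= max_tilt_deg)

    # Group sorted rho values: a gap larger than rho_gap starts a new cluster.
    clusters: List[List[float]] = []
    for rho in rhos:
        if clusters and rho - clusters[-1][-1] <= rho_gap:
            clusters[-1].append(rho)
        else:
            clusters.append([rho])

    # Merge each cluster into a single representative line.
    merged = [sum(cluster) / len(cluster) for cluster in clusters]
    return len(merged), merged


if __name__ == "__main__":
    half_pi = math.pi / 2
    detected = [(100, half_pi), (104, half_pi + 0.02), (150, half_pi),
                (153, half_pi - 0.01), (200, half_pi), (120, 0.0)]
    print(count_steps(detected))
```

Merging before counting matters because a single step edge usually produces several nearly identical Hough lines, which would otherwise inflate the step count.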

6. The method for guiding and assisting blind people and detecting hazards in multiple scenarios according to claim 1, characterized in that step S6, guiding the user across the zebra crossing, specifically includes: S61, the preprocessing, edge detection, and line detection processes are the same as those in S51, S52, and S53; S62, calculate the two zebra crossing vertical line clusters from the results of S61; S63, calculate the current position of the zebra crossing relative to the blind person; S64, announce the position of the zebra crossing relative to the blind person to the user via voice.

7. The method for guiding and assisting blind people and detecting hazards in multiple scenarios according to claim 1, characterized in that, in step S7, the YOLO recognition operation specifically includes: S71, data source filtering and preprocessing; S72, character recognition and comparison; S73, based on the information obtained in S72, provide guidance information through vibration feedback from the wristband.

8. The method for guiding and assisting blind people and detecting hazards in multiple scenarios according to claim 1, characterized in that, in step S8, the AI-assisted clothing search operation specifically includes: S81, perform clothing recognition on the collected wardrobe images, determine the position and feature information of each garment from left to right, and announce the serial number, type, color and applicable scenario of each garment in turn by voice; S82, receive the clothing selection instruction input by the blind person, and match and locate the target clothing among the recognized garments; S83, provide detailed information about the target clothing and usage suggestions to the blind person via voice.

9. The method for guiding and assisting blind people and detecting hazards in multiple scenarios according to claim 1, characterized in that, in step S9, the feedback output specifically includes: S91, convert non-urgent, precise guidance information into voice prompts and output them through headphones; S92, convert the hazard warning information and urgent guidance information into vibration signals and output them through the vibration wristband, with different types of warnings corresponding to different vibration modes.

10. The method for guiding and assisting blind people and detecting hazards in multiple scenarios according to claim 1, characterized in that images are captured and distances are calculated using a hat equipped with binocular cameras, the algorithms are run on a Jetson Nano chip, and information is delivered promptly via headphones and vibrating wristbands worn on both hands.