Image detection method and device, electronic equipment and storage medium

CN115331185B · Active · Publication Date: 2026-06-26 · MOORE THREADS TECH CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: MOORE THREADS TECH CO LTD
Filing Date: 2022-09-14
Publication Date: 2026-06-26


IPC: G06V20/54; G06V20/70; G06V10/26; G06V10/82; G06V20/62



AI Technical Summary

Technical Problem

Existing technologies are insufficient to effectively regulate the problem of electric bicycles illegally riding on sidewalks, especially in scenarios where there are no clear lane markings, which threatens pedestrian safety.

Method used

By performing semantic segmentation and target recognition on the image, it is determined whether the non-motorized vehicle is on the sidewalk and whether the driver is driving. Combining the semantic segmentation results and target recognition results, it is determined whether there is a violation.

Benefits of technology

It enables real-time, effective, and low-cost automatic monitoring of non-motorized vehicles on sidewalks, making it suitable for widespread application and reducing the cost and coverage limitations of manual monitoring.

✦ Generated by Eureka AI based on patent content.


Abstract

The present disclosure relates to the technical field of computer vision, and in particular to an image detection method and device, an electronic device, and a storage medium. The method comprises: performing semantic segmentation on an obtained to-be-processed image to obtain a semantic segmentation result that includes a sidewalk region, and performing target recognition on the to-be-processed image according to the semantic segmentation result to obtain a target recognition result; and, in a case where the target recognition result comprises at least one traffic participant region and / or at least one non-motor vehicle region, determining, according to the semantic segmentation result and the target recognition result, a detection result representing whether there is a violation on the sidewalk. The present disclosure enables automatic supervision of non-motor vehicles on sidewalks that is real-time, effective, low-cost, and suitable for wide deployment.


Description

Technical Field

[0001] This disclosure relates to the field of computer vision technology, and in particular to an image detection method and apparatus, electronic device and storage medium.

Background Technology

[0002] With the widespread adoption of electric bicycles and the rapid growth of the food delivery industry, electric bicycles have gradually become the preferred mode of transportation for short-distance trips. However, drivers of electric bicycles often prioritize convenience and take shortcuts by riding directly onto sidewalks. Sidewalks are crowded, and some pedestrians look down at their phones, fail to observe road conditions, and cannot react promptly. Electric bicycles entering sidewalks often travel at high speed, sometimes even against traffic, making it difficult to avoid pedestrians in time. This results in numerous collisions between electric bicycles and pedestrians and poses a significant threat to pedestrian safety. This phenomenon is increasingly becoming a pressing regulatory issue that large cities need to address.

[0003] Currently, some megacities have implemented manual monitoring that prohibits riding non-motorized vehicles on sidewalks and allows drivers to pass only by dismounting and pushing their vehicles. However, manual monitoring requires significant manpower, its effectiveness is difficult to ensure, and its coverage is limited. Drivers often dismount and push their bicycles only at manually monitored intersections or within the monitor's line of sight, and resume riding on the sidewalk once they have passed. Automated monitoring of this problem is still in its infancy, and a mature solution does not yet exist.

Summary of the Invention

[0004] This disclosure proposes an image detection technology solution.

[0005] According to one aspect of this disclosure, an image detection method is provided, comprising: acquiring an image to be processed; performing semantic segmentation on the image to be processed to obtain a semantic segmentation result, the semantic segmentation result including a sidewalk area; performing target recognition on the image to be processed based on the semantic segmentation result to obtain a target recognition result; and, if the target recognition result includes at least one traffic participant area and / or at least one non-motorized vehicle area, determining a detection result based on the semantic segmentation result and the target recognition result, the detection result being used to characterize whether there is a violation on the sidewalk.

[0006] In one possible implementation, determining the detection result based on the semantic segmentation result and the target recognition result includes: determining whether the non-motorized vehicle area is located within the sidewalk area, and determining whether a driver is driving the non-motorized vehicle corresponding to the non-motorized vehicle area; if the non-motorized vehicle area is located within the sidewalk area and a driver is driving the non-motorized vehicle, the detection result is that a violation has occurred.

[0007] In one possible implementation, determining whether the non-motorized vehicle area is located within the pedestrian walkway area includes: determining the wheel area of the non-motorized vehicle area based on the non-motorized vehicle area; determining whether the wheel area is located within the pedestrian walkway area; and if the wheel area is located within the pedestrian walkway area, the non-motorized vehicle area is located within the pedestrian walkway area.

[0008] In one possible implementation, determining whether the wheel area is located within the pedestrian walkway area includes: determining a prediction threshold based on the overlapping area of the wheel area and the pedestrian walkway area; and determining that the non-motorized vehicle area is located within the pedestrian walkway area if the prediction threshold is greater than a preset judgment threshold.
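The overlap test described in this implementation can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes the value derived from the overlapping area is the share of wheel-box pixels labelled as sidewalk, and the names `wheel_overlap_ratio`, `wheel_on_sidewalk`, and the default `judgment_threshold` of 0.5 are hypothetical.

```python
from typing import List, Tuple

# (x_min, y_min, x_max, y_max) in pixels, with the max edges exclusive.
Box = Tuple[int, int, int, int]


def wheel_overlap_ratio(wheel_box: Box, sidewalk_mask: List[List[bool]]) -> float:
    """Fraction of wheel-box pixels that are labelled as sidewalk.

    Normalising by the clipped wheel-box area is an assumption; the text only
    says the value is based on the overlapping area.
    """
    x0, y0, x1, y1 = wheel_box
    height = len(sidewalk_mask)
    width = len(sidewalk_mask[0]) if height else 0
    # Clip the box to the image so a partly out-of-frame wheel does not raise.
    x0, x1 = max(0, x0), min(width, x1)
    y0, y1 = max(0, y0), min(height, y1)
    area = max(0, x1 - x0) * max(0, y1 - y0)
    if area == 0:
        return 0.0
    overlap = sum(sidewalk_mask[y][x] for y in range(y0, y1) for x in range(x0, x1))
    return overlap / area


def wheel_on_sidewalk(
    wheel_box: Box,
    sidewalk_mask: List[List[bool]],
    judgment_threshold: float = 0.5,  # hypothetical preset value
) -> bool:
    """True when the overlap-based value exceeds the preset judgment threshold."""
    return wheel_overlap_ratio(wheel_box, sidewalk_mask) > judgment_threshold
```

The comparison is strict ("greater than"), as in the text, so a wheel that straddles the sidewalk boundary exactly at the threshold is not counted as being on the sidewalk.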

[0009] In one possible implementation, determining whether the wheel area is located within the pedestrian walkway area includes: determining the relative position of the wheel area and the non-motorized vehicle area, the relative size of the wheel area and the non-motorized vehicle area, and the proportion of the overlapping area of the wheel area and the pedestrian walkway area to the non-motorized vehicle area; and inputting these three quantities into a trained binary classification model to obtain the classification result of the binary classification model; wherein the categories of the classification result include: the wheel area is located within the pedestrian walkway area, or the wheel area is not located within the pedestrian walkway area.
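As an illustration of the classifier inputs named above, the following sketch builds a feature vector from a wheel box, a vehicle box, and a sidewalk mask. The exact definitions are assumptions: relative position is taken as the wheel-box center expressed in vehicle-box coordinates, relative size as an area ratio, and the box format and function names are hypothetical. The trained binary classification model is represented only as a callable supplied by the caller.

```python
from typing import Callable, List, Sequence, Tuple

# (x_min, y_min, x_max, y_max) in pixels, with the max edges exclusive.
Box = Tuple[int, int, int, int]


def box_area(box: Box) -> int:
    x0, y0, x1, y1 = box
    return max(0, x1 - x0) * max(0, y1 - y0)


def wheel_features(
    wheel_box: Box, vehicle_box: Box, sidewalk_mask: List[List[bool]]
) -> List[float]:
    """Return [rel_x, rel_y, rel_size, overlap_share] for one wheel box."""
    vx0, vy0, vx1, vy1 = vehicle_box
    wx0, wy0, wx1, wy1 = wheel_box
    vehicle_area = box_area(vehicle_box)
    if vehicle_area == 0:
        raise ValueError("vehicle box must have positive area")
    # Relative position: wheel center measured from the vehicle box corner,
    # normalised by the vehicle box width and height (assumed definition).
    rel_x = ((wx0 + wx1) / 2 - vx0) / (vx1 - vx0)
    rel_y = ((wy0 + wy1) / 2 - vy0) / (vy1 - vy0)
    # Relative size: wheel area as a share of vehicle area (assumed definition).
    rel_size = box_area(wheel_box) / vehicle_area
    # Wheel pixels labelled sidewalk, as a proportion of the vehicle area.
    overlap = sum(
        1
        for y in range(max(0, wy0), min(len(sidewalk_mask), wy1))
        for x in range(max(0, wx0), min(len(sidewalk_mask[0]), wx1))
        if sidewalk_mask[y][x]
    )
    return [rel_x, rel_y, rel_size, overlap / vehicle_area]


def classify_wheel(
    features: Sequence[float], model: Callable[[Sequence[float]], bool]
) -> bool:
    """Apply a trained binary classifier (any callable) to the features."""
    return model(features)
```

Any binary classifier that accepts a fixed-length feature vector could stand in for `model`; the text does not specify which model family is used.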

[0010] In one possible implementation, determining whether a driver is driving a non-motorized vehicle corresponding to the non-motorized vehicle area includes: determining a first center point of the non-motorized vehicle area and a second center point of at least one traffic participant area near the non-motorized vehicle area; finding a matching center point of the first center point from the at least one second center point, wherein the matching center point is the second center point that is closest to the first center point in the horizontal direction and in spatial distance; and determining that a driver is driving the non-motorized vehicle if the vertical coordinate value of the first center point is less than the vertical coordinate value of the matching center point.
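The center-point matching described above can be sketched as follows. Two points are assumptions rather than statements of the disclosure. First, "closest in the horizontal direction and in spatial distance" is read here as ordering candidates by horizontal offset and then by Euclidean distance. Second, the vertical coordinate is assumed to increase upward, which is what makes "vehicle center lower than driver center" equivalent to the less-than comparison in the text; with image row indices that grow downward the comparison would flip. The function names are hypothetical, and pre-filtering of participants "near" the vehicle is left to the caller.

```python
import math
from typing import Optional, Sequence, Tuple

# (x, y) with y increasing upward (assumption, see the note above).
Point = Tuple[float, float]


def box_center(box: Tuple[float, float, float, float]) -> Point:
    """Center of an (x_min, y_min, x_max, y_max) box."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def match_center(first: Point, seconds: Sequence[Point]) -> Optional[Point]:
    """Pick the participant center closest to the vehicle center.

    Candidates are ranked by horizontal offset, then by Euclidean distance.
    This is one possible reading of the matching rule.
    """
    if not seconds:
        return None
    return min(seconds, key=lambda p: (abs(p[0] - first[0]), math.dist(p, first)))


def has_driver(vehicle_center: Point, participant_centers: Sequence[Point]) -> bool:
    """True when the matched participant center lies above the vehicle center."""
    match = match_center(vehicle_center, participant_centers)
    return match is not None and vehicle_center[1] < match[1]
```

A person pushing the vehicle typically stands beside it, so the matched center is either farther away horizontally or not clearly above the vehicle center; the vertical comparison is what separates riding from pushing in this sketch.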

[0011] In one possible implementation, semantic segmentation of the image to be processed to obtain a semantic segmentation result includes: dividing the image to be processed into N image sub-blocks, where N is an integer greater than 1; determining an input sequence with positional encoding based on the N image sub-blocks and the positional information of each image sub-block; inputting the input sequence into an encoder for encoding processing to obtain an encoded sequence with semantic context information; inputting the encoded sequence and class embedding information into a decoder for decoding processing to obtain a semantic segmentation result; and performing target recognition on the image to be processed based on the semantic segmentation result to obtain a target recognition result, including: inputting the semantic segmentation result into a target recognition network for target recognition processing to obtain a target recognition result for the image to be processed.

[0012] In one possible implementation, after determining the detection result based on the semantic segmentation result and the target recognition result, the method further includes: determining violation status information of the violation, the violation status information including at least one of speeding, driving against traffic, illegally carrying passengers, and not wearing a helmet; and / or performing facial recognition on the traffic participant with the violation to determine the identity information of the traffic participant with the violation; and / or performing license plate recognition on the non-motorized vehicle with the violation to determine the license plate information of the non-motorized vehicle with the violation; and uploading at least one of the violation status information, the identity information, and the license plate information to a database.

[0013] According to one aspect of this disclosure, an image detection apparatus is provided, comprising: an acquisition module for acquiring an image to be processed; a semantic segmentation module for performing semantic segmentation on the image to be processed to obtain a semantic segmentation result, wherein the semantic segmentation result includes a pedestrian walkway area; a target recognition module for performing target recognition on the image to be processed based on the semantic segmentation result to obtain a target recognition result; and a detection module for determining a detection result based on the semantic segmentation result and the target recognition result, wherein the detection result is used to characterize whether there is a violation on the pedestrian walkway, provided that the target recognition result includes at least one traffic participant area and / or at least one non-motorized vehicle area.

[0014] In one possible implementation, the detection module is used to: determine, based on the semantic segmentation result and the target recognition result, whether the non-motorized vehicle area is located within the pedestrian walkway area, and determine whether a driver is driving the non-motorized vehicle corresponding to the non-motorized vehicle area; if the non-motorized vehicle area is located within the pedestrian walkway area and a driver is driving the non-motorized vehicle, the detection result is that there is a violation.

[0015] In one possible implementation, determining whether the non-motorized vehicle area is located within the pedestrian walkway area includes: determining the wheel area of the non-motorized vehicle area based on the non-motorized vehicle area; determining whether the wheel area is located within the pedestrian walkway area; and if the wheel area is located within the pedestrian walkway area, the non-motorized vehicle area is located within the pedestrian walkway area.

[0016] In one possible implementation, determining whether the wheel area is located within the pedestrian walkway area includes: determining a prediction threshold based on the overlapping area of the wheel area and the pedestrian walkway area; and determining that the non-motorized vehicle area is located within the pedestrian walkway area if the prediction threshold is greater than a preset judgment threshold.

[0017] In one possible implementation, determining whether the wheel area is located within the pedestrian walkway area includes: determining the relative position of the wheel area and the non-motorized vehicle area, the relative size of the wheel area and the non-motorized vehicle area, and the proportion of the overlapping area of the wheel area and the pedestrian walkway area to the non-motorized vehicle area; and inputting these three quantities into a trained binary classification model to obtain the classification result of the binary classification model; wherein the categories of the classification result include: the wheel area is located within the pedestrian walkway area, or the wheel area is not located within the pedestrian walkway area.

[0018] In one possible implementation, determining whether a driver is driving a non-motorized vehicle corresponding to the non-motorized vehicle area includes: determining a first center point of the non-motorized vehicle area and a second center point of at least one traffic participant area near the non-motorized vehicle area; finding a matching center point of the first center point from the at least one second center point, wherein the matching center point is the second center point that is closest to the first center point in the horizontal direction and in spatial distance; and determining that a driver is driving the non-motorized vehicle if the vertical coordinate value of the first center point is less than the vertical coordinate value of the matching center point.

[0019] In one possible implementation, the semantic segmentation module is used to: segment the image to be processed into N image sub-blocks, where N is an integer greater than 1; determine an input sequence with positional encoding based on the N image sub-blocks and the positional information of each image sub-block; input the input sequence into an encoder for encoding processing to obtain an encoded sequence with semantic context information; input the encoded sequence and class embedding information into a decoder for decoding processing to obtain a semantic segmentation result; the target recognition module is used to: input the semantic segmentation result into a target recognition network for target recognition processing to obtain the target recognition result of the image to be processed.

[0020] In one possible implementation, the device further includes an uploading module, configured to, after determining the detection result based on the semantic segmentation result and the target recognition result, determine violation status information of the violation, the violation status information including at least one of speeding, driving against traffic, illegally carrying passengers, and not wearing a helmet, and / or, perform facial recognition on the traffic participant with the violation to determine the identity information of the traffic participant with the violation, and / or, perform license plate recognition on the non-motorized vehicle with the violation to determine the license plate information of the non-motorized vehicle with the violation; and upload at least one of the violation status information, the identity information, and the license plate information to a database.

[0021] According to one aspect of this disclosure, an electronic device is provided, comprising: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the instructions stored in the memory to perform the method described above.

[0022] According to one aspect of this disclosure, a computer-readable storage medium is provided that stores computer program instructions thereon, which, when executed by a processor, implement the above-described method.

[0023] In this embodiment of the disclosure, the acquired image to be processed can be semantically segmented to obtain a semantic segmentation result, and based on the semantic segmentation result, target recognition can be performed on the image to be processed to obtain a target recognition result. If the target recognition result includes at least one traffic participant area and / or at least one non-motorized vehicle area, the detection result used to characterize whether there is a violation on the sidewalk can be determined based on the semantic segmentation result and the target recognition result. This enables automatic monitoring of non-motorized vehicle driving violations on sidewalks that is real-time, effective, low-cost, and suitable for large-scale deployment.

[0024] It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only, and are not intended to limit this disclosure. Other features and aspects of this disclosure will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.

Attached Figure Description

[0025] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this disclosure and, together with the specification, serve to illustrate the technical solutions of this disclosure.

[0026] Figure 1 shows a flowchart of an image detection method according to an embodiment of the present disclosure.

[0027] Figure 2 shows a schematic diagram of a target recognition network according to an embodiment of the present disclosure.

[0028] Figure 3 shows a schematic diagram of the target recognition result according to an embodiment of the present disclosure.

[0029] Figure 4 shows a schematic diagram of determining the detection result according to an embodiment of the present disclosure.

[0030] Figure 5 shows a block diagram of an image detection apparatus according to an embodiment of the present disclosure.

[0031] Figure 6 shows a block diagram of an electronic device according to an embodiment of the present disclosure.

[0032] Figure 7 shows a block diagram of another electronic device according to an embodiment of the present disclosure.

Detailed Implementation

[0033] Various exemplary embodiments, features, and aspects of this disclosure will now be described in detail with reference to the accompanying drawings. The same reference numerals in the drawings denote elements that have the same or similar functions. Although various aspects of the embodiments are shown in the drawings, they are not necessarily drawn to scale unless specifically indicated otherwise.

[0034] The term “exemplary” as used herein means “serving as an example, embodiment, or illustration.” Any embodiment illustrated herein as “exemplary” is not necessarily to be construed as superior to or better than other embodiments.

[0035] In this document, the term "and / or" merely describes an association between related objects and indicates that three relationships can exist. For example, A and / or B can represent three cases: A alone, both A and B, and B alone. Furthermore, the term "at least one" in this document means any one of multiple elements or any combination of at least two of them. For example, including at least one of A, B, and C can mean including any one or more elements selected from the set consisting of A, B, and C.

[0036] Furthermore, to better illustrate this disclosure, numerous specific details are set forth in the following detailed description. Those skilled in the art will understand that this disclosure can be practiced without certain specific details. In some instances, methods, means, components, and circuits well known to those skilled in the art have not been described in detail in order to highlight the main points of this disclosure.

[0037] Among related technologies, visual methods can be used to detect whether non-motorized vehicles are speeding, driving against traffic, or crossing lane lines during normal driving. However, most of these technologies rely on lane line detection and segmentation to determine the driving range of non-motorized vehicles, and are therefore only applicable to non-motorized vehicles driving in non-motorized vehicle lanes (which have lane lines).

[0038] Considering that sidewalks often lack prominent features such as lane lines and their texture information is often limited, the above methods cannot be directly applied to the detection of illegal driving on sidewalks.

[0039] In view of this, in order to be applicable to the detection of illegal driving on sidewalks, this disclosure provides an image detection method that can perform semantic segmentation on the acquired image to be processed to obtain a semantic segmentation result, and perform target recognition on the image to be processed based on the semantic segmentation result to obtain a target recognition result. If the target recognition result includes at least one traffic participant area and / or at least one non-motorized vehicle area, the detection result used to characterize whether there is illegal behavior on the sidewalk can be determined based on the semantic segmentation result and the target recognition result. This enables automatic supervision of illegal driving of non-motorized vehicles on sidewalks that is real-time, effective, low-cost, and suitable for large-scale deployment.

[0040] Figure 1 shows a flowchart of an image detection method according to an embodiment of the present disclosure. As shown in Figure 1, the image detection method includes: in step S11, acquiring the image to be processed.

[0041] In step S12, semantic segmentation is performed on the image to be processed to obtain semantic segmentation results, which include the sidewalk area.

[0042] In step S13, target recognition is performed on the image to be processed based on the semantic segmentation result to obtain the target recognition result.

[0043] In step S14, if the target identification result includes at least one traffic participant area and / or at least one non-motorized vehicle area, a detection result is determined based on the semantic segmentation result and the target identification result, the detection result being used to characterize whether there is a violation on the sidewalk.

[0044] In one possible implementation, the image detection method can be executed by an electronic device such as a terminal device or a server. The terminal device can be a user equipment (UE), mobile device, user terminal, terminal, cellular phone, cordless phone, personal digital assistant (PDA), handheld device, computing device, in-vehicle device, wearable device, etc. The method can be implemented by a processor calling computer-readable instructions stored in memory. Alternatively, the method can be executed by a server.

[0045] In one possible implementation, the image detection method can be implemented by a processor calling computer-readable instructions stored in memory. In one example, the processor can be a general-purpose processor such as a central processing unit (CPU), a graphics processing unit (GPU), or an application-specific integrated circuit (ASIC), or it can be an artificial intelligence processor, such as an artificial intelligence (AI) chip, for example, a neural processing unit (NPU).

[0046] In one possible implementation, in step S11, an image to be processed containing the sidewalk is acquired. This image to be processed can be any image acquired by the electronic device. The electronic device can acquire images of the current scene to obtain one or more images to be processed. Alternatively, the electronic device can select one or more image frames from multiple image frames included in a video file as images to be processed. In some implementations, the electronic device can acquire one or more images to be processed from other devices.

[0047] Preferably, in order to reduce costs and improve the real-time performance of detection, cameras pre-installed for urban road monitoring can be used to collect one or more image frames from the road video stream in real time as images to be processed.

[0048] After obtaining the image to be processed in step S11, semantic segmentation can be performed on the image to be processed in step S12 to obtain the semantic segmentation result.

[0049] For example, the image to be processed can be input into a trained semantic segmentation network for semantic segmentation. Based on semantic information, the network performs pixel-level image classification of the input image, predicting the semantic category label for each pixel from a pre-defined label set (e.g., including sidewalks, roads, buildings, etc.). This links each pixel in the image to its corresponding category label, resulting in the semantic segmentation result. In this semantic segmentation result, the region comprised of pixels with the category label "sidewalk" is defined as the sidewalk region.
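The step from per-pixel class predictions to a sidewalk region can be sketched as follows. This is a minimal illustration assuming the network emits one score per label for every pixel and that the highest score determines the label, which is a common convention but is not stated in the text. The label set `LABELS` and the function names are hypothetical.

```python
from typing import List

# Hypothetical pre-defined label set; the text lists these only as examples.
LABELS = ("sidewalk", "road", "building")


def label_map(scores: List[List[List[float]]]) -> List[List[int]]:
    """scores[y][x] holds one score per label; keep the best label per pixel."""
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


def sidewalk_mask(
    labels: List[List[int]], sidewalk_id: int = LABELS.index("sidewalk")
) -> List[List[bool]]:
    """Boolean mask of the pixels whose category label is 'sidewalk'."""
    return [[label == sidewalk_id for label in row] for row in labels]
```

The resulting mask is the sidewalk region in the sense of this paragraph: the set of pixels whose category label is "sidewalk".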

[0050] For example, a semantic segmentation network can be a neural network based on an encoder-decoder structure. The encoder can be a pre-trained classification network, such as a fully convolutional network (FCN), a deep residual network (ResNet), etc.; the decoder is used to semantically project the discriminative features (lower resolution) learned by the encoder onto the pixel space (higher resolution) to obtain denser classification.

[0051] Considering that semantic segmentation not only has discriminative power at the pixel level, but also a mechanism to project discriminative features learned by the encoder at different stages onto the pixel space (i.e., mapping back to the original image size), different architectures can employ different mechanisms (e.g., skip connections, pyramid pooling, etc.) as part of the decoder, and this disclosure does not limit this.

[0052] After obtaining the semantic segmentation result in step S12, target recognition can be performed on the image to be processed in step S13 based on the semantic segmentation result to obtain the target recognition result.

[0053] For example, in scenarios where there are no traffic participants or non-motorized vehicles on the sidewalk, the target recognition result may not include a traffic participant area or a non-motorized vehicle area; in scenarios where there are traffic participants and non-motorized vehicles on the sidewalk, the target recognition result may include at least one traffic participant area and / or at least one non-motorized vehicle area. Traffic participants may include pedestrians, drivers, passengers, and other persons who have a direct or indirect relationship with traffic, and non-motorized vehicles may include bicycles, electric bicycles, tricycles, etc.

[0054] For example, the semantic segmentation result can be input into a trained target recognition network for target recognition processing to obtain the target recognition result of the image to be processed. The target recognition network can include at least one of the following: Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Deep Neural Networks (DNN), YOLO (You Only Look Once) network based on deep learning, Residual Networks (ResNets), Back Propagation Neural Networks (BP), Backbone Neural Networks, etc.

[0055] For example, when the target recognition network includes a convolutional neural network, a lightweight network (MobileNet) can be selected as the base model of the convolutional neural network, and other network structures can be added on top of MobileNet to form the convolutional neural network. Because MobileNet is small and processes data quickly, it trains relatively fast. Moreover, the trained target recognition network is also small and fast, making it more suitable for deployment on embedded devices.

[0056] It should be understood that the network structure of the target recognition network described above is only an example, and may include multiple convolutional layers, multiple pooling layers, multiple fully connected layers, etc. The specific construction method and structure of the network can be determined according to the actual situation, and the above example does not constitute a limitation on the embodiments of this disclosure.

[0057] In step S14, if the target identification result includes at least one traffic participant area and / or at least one non-motorized vehicle area, the detection result can be determined based on the semantic segmentation result and the target identification result. For example, the sidewalk area included in the semantic segmentation result and the traffic participant area and non-motorized vehicle area included in the target identification result can be analyzed. By analyzing the relationship between these areas, it can be determined whether a driver is driving a non-motorized vehicle on the sidewalk, and a detection result used to characterize whether there is a violation on the sidewalk can be obtained.

[0058] It should be understood that if the target identification result obtained in step S13 does not include the traffic participant area and the non-motorized vehicle area, it means that there are no traffic participants or non-motorized vehicles on the sidewalk and there will be no violation. In this case, in order to improve efficiency and reduce the consumption of computing resources, more in-depth calculations can be omitted (e.g., step S14 can be omitted), and the detection result that there is no violation on the sidewalk can be directly determined.
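The control flow of steps S12 to S14, including the shortcut described in this paragraph, can be sketched as follows. The segmentation, recognition, and judging stages are passed in as callables because their internals are described elsewhere; the `Recognition` container and the function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


@dataclass
class Recognition:
    """Hypothetical container for the target recognition result."""
    participants: List[Box] = field(default_factory=list)
    vehicles: List[Box] = field(default_factory=list)


def detect_violation(
    image: Any,
    segment: Callable[[Any], Any],
    recognize: Callable[[Any, Any], Recognition],
    judge: Callable[[Any, Recognition], bool],
) -> bool:
    """Return True when a violation on the sidewalk is detected."""
    segmentation = segment(image)                 # step S12
    recognition = recognize(image, segmentation)  # step S13
    # Shortcut: with neither participants nor vehicles there can be no
    # violation, so the more expensive analysis of step S14 is skipped.
    if not recognition.participants and not recognition.vehicles:
        return False
    return judge(segmentation, recognition)       # step S14
```

Keeping the stages as parameters makes the early return easy to verify in isolation: when the recognition result is empty, `judge` is never called.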

[0059] Thus, through steps S11 to S14, real-time, effective, low-cost, and widely applicable automatic monitoring of non-motorized vehicles driving illegally on sidewalks can be achieved.

[0060] The image detection method according to the embodiments of this disclosure will be described in detail below.

[0061] After obtaining the image to be processed in step S11, the obtained image to be processed can be input into the semantic segmentation network for semantic segmentation processing in step S12 to obtain the semantic segmentation result of the image to be processed.

[0062] In one possible implementation, the image to be processed can be input into a semantic segmentation network for semantic segmentation processing to obtain the semantic segmentation result of the image to be processed. The semantic segmentation network includes an encoder and a decoder.

[0063] In related technologies, semantic segmentation algorithms are typically based on fully convolutional networks (FCNs) combined with an encoder-decoder structure. Because these algorithms rely on convolutional neural networks, their processing tends to focus on local interactions, which weakens their ability to capture the global contextual information of the image being processed. Pedestrian walkway scenes are particularly complex, especially since they often lack obvious features such as zebra crossings and lane markings. Without strong global contextual information to assist in judgment, the segmentation results will be inaccurate.

[0064] Embodiments of this disclosure may employ a semantic segmentation network that does not require a convolutional neural network. This semantic segmentation network can be a self-attention-based neural network structure for sequences, combined with an encoder-decoder structure to segment images. It can capture global interaction information between scene elements, and in pedestrian walkway scenes, the segmentation performance is significantly better than methods based on fully convolutional networks.

[0065] The semantic segmentation network of this disclosure embodiment will be described by way of example from the encoding stage and the decoding stage respectively.

[0066] For example, the image to be processed can be segmented into N image sub-blocks, where N is an integer greater than 1; based on the N image sub-blocks and the position information of each image sub-block, an input sequence with position encoding is determined; the input sequence is input into an encoder for encoding processing to obtain an encoded sequence with semantic context information; the encoded sequence and class embedding information are input into a decoder for decoding processing to obtain a semantic segmentation result.

[0067] During the encoding stage, the image to be processed can be divided into several image sub-blocks and flattened into an image sequence.

[0068] For example, an image of size H×W×C can be divided into N image sub-blocks of size P×P×C, where H and W are the height and width of the image to be processed, respectively, P is the side length of each sub-block, C is the number of channels, and N is the number of sub-blocks. That is: $N = (H \times W) / P^{2}$.

[0069] Each of these N image sub-blocks can be flattened into a one-dimensional vector, and the positional code carrying the positional information of the corresponding image sub-block can be added to that vector. That is, the N image sub-blocks are mapped to an input sequence of embedded patches with positional codes, $z_0 \in \mathbb{R}^{N \times d}$, where N is the number of elements in the input sequence $z_0$ and d is the length of each element.
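The splitting and embedding described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes NumPy, and the function name `image_to_sequence`, the random linear projection, and the random position codes are hypothetical stand-ins for parameters that would be learned in practice.

```python
import numpy as np

def image_to_sequence(image, patch, d, rng):
    """Split an H x W x C image into N = (H*W)/P^2 sub-blocks and embed them."""
    H, W, C = image.shape
    assert H % patch == 0 and W % patch == 0, "H and W must be multiples of P"
    rows, cols = H // patch, W // patch
    # (rows, P, cols, P, C) -> (rows, cols, P, P, C) -> (N, P*P*C)
    patches = image.reshape(rows, patch, cols, patch, C).transpose(0, 2, 1, 3, 4)
    flat = patches.reshape(rows * cols, patch * patch * C)
    # Hypothetical stand-ins for learned parameters: a linear projection
    # to length d and one position code per sub-block.
    projection = rng.standard_normal((patch * patch * C, d)) * 0.02
    position = rng.standard_normal((rows * cols, d)) * 0.02
    return flat @ projection + position  # z0 with shape (N, d)

rng = np.random.default_rng(0)
z0 = image_to_sequence(np.zeros((64, 96, 3)), patch=16, d=32, rng=rng)
print(z0.shape)  # (24, 32), since N = (64 * 96) / 16^2 = 24
```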

[0070] Then, an encoder can be used to map the position-encoded input sequence $z_0 \in \mathbb{R}^{N \times d}$ to an encoded sequence $z_L \in \mathbb{R}^{N \times d}$ with rich semantic context information.

[0071] The encoder can consist of L layers, each comprising a multi-head self-attention (MSA) block and a multilayer perceptron (MLP) block (e.g., a two-layer point-wise MLP block). Layer normalization (LN) can be applied before each block, and a skip connection can be applied after each block to connect the input and output. The encoding logic can be represented as follows:

[0072] $a_{i-1} = \mathrm{MSA}(\mathrm{LN}(z_{i-1})) + z_{i-1}$

[0073] $z_i = \mathrm{MLP}(\mathrm{LN}(a_{i-1})) + a_{i-1}$ (1)

[0074] In formula (1), i denotes the i-th iteration, i∈{1,…,L}; $z_{i-1}$ represents the input sequence of the i-th encoder layer and $z_i$ represents its output sequence; MSA represents the multi-head self-attention block, MLP represents the multilayer perceptron block, and LN represents the layer normalization function.
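As a rough illustration of formula (1), one encoder layer can be written in NumPy as below. This is a sketch under stated assumptions: the weight names (`wq`, `wk`, `wv`, `wo`, `w1`, `w2`), the ReLU activation, the number of heads, and the random initialization are all hypothetical, and the attention scores are scaled by the per-head dimension as in common practice.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def msa(x, wq, wk, wv, wo, heads):
    """Multi-head self-attention over a sequence x of shape (N, d)."""
    n, d = x.shape
    dh = d // heads
    # Project, then split the feature axis into heads: (heads, N, dh).
    q, k, v = [(x @ w).reshape(n, heads, dh).transpose(1, 0, 2) for w in (wq, wk, wv)]
    weights = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))  # (heads, N, N)
    merged = (weights @ v).transpose(1, 0, 2).reshape(n, d)
    return merged @ wo

def mlp(x, w1, w2):
    """Two-layer point-wise MLP (ReLU is an assumed activation)."""
    return np.maximum(x @ w1, 0.0) @ w2

def encoder_layer(z_prev, p, heads=4):
    # a_{i-1} = MSA(LN(z_{i-1})) + z_{i-1}
    a = msa(layer_norm(z_prev), p["wq"], p["wk"], p["wv"], p["wo"], heads) + z_prev
    # z_i = MLP(LN(a_{i-1})) + a_{i-1}
    return mlp(layer_norm(a), p["w1"], p["w2"]) + a

rng = np.random.default_rng(0)
n, d = 24, 32
shapes = {"wq": (d, d), "wk": (d, d), "wv": (d, d), "wo": (d, d),
          "w1": (d, 4 * d), "w2": (4 * d, d)}
params = {name: rng.standard_normal(shape) * 0.02 for name, shape in shapes.items()}
z = rng.standard_normal((n, d))
for _ in range(2):  # L = 2 layers; weights are shared here only for brevity
    z = encoder_layer(z, params)
print(z.shape)  # (24, 32)
```

The output keeps the shape N×d of the input, which is what allows the layers to be stacked L times.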

[0075] The multi-head self-attention block can be represented as:

[0076]

[0077] In formula (2), d represents the length of the input sequence, and Q, K, and V are three weight matrices: the query matrix Q represents the feature matrix learned from the training samples, the K matrix represents the feature matrix of the input sequence, and the V matrix is equal to the K matrix. Similarity is calculated from the query matrix Q and the K matrix and then converted into a probability distribution by the normalized exponential function Softmax, in which positions with larger probability values correspond to the parts where the two are more similar. Multiplying this probability distribution by the V matrix then weights the V matrix by the attention weight distribution, thereby changing the distribution of the V matrix itself.
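Formula (2) itself is not reproduced in this text. A form consistent with the description above (similarity of Q and K, normalization by Softmax, weighting of V, and a dependence on d) is the widely used scaled dot-product attention. It is given here as an assumption, not as the original formula:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{Softmax}\!\left(\frac{Q K^{T}}{\sqrt{d}}\right) V
```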

[0078] In the decoding stage, linear operations can be used to decode the block sequence from the encoding stage into a segmentation map. That is, $z_L \in \mathbb{R}^{N \times d}$ is transformed by a point-wise linear layer with instance normalization into $z_{Lin} \in \mathbb{R}^{N \times K}$. The sequence $z_{Lin} \in \mathbb{R}^{N \times K}$ is then reconstructed into a two-dimensional feature map and upsampled to the original image size to obtain the semantic segmentation result $s \in \mathbb{R}^{H \times W \times K}$, where H is the height of the semantic segmentation result, W is the width of the semantic segmentation result, and K is the number of semantic category labels.

[0079] During the decoding stage, a set of K learnable class embeddings $cls = [cls_1, \ldots, cls_K] \in \mathbb{R}^{K \times d}$ can be introduced. Each class embedding can be randomly initialized and assigned a semantic class, and is used to generate the class mask. The class embeddings cls can then be processed jointly with the encoded sequence $z_L$ and input into the mask decoder, which computes the scalar product of the normalized encoded sequence $z_M$ and the class embeddings c to generate a mask sequence $\mathrm{Mask}(z_M, c)$ for the K classes, that is:

[0080] $\mathrm{Mask}(z_M, c) = z_M c^{T}$ (3)

[0081] That is, the i-th mask sequence in $\mathrm{Mask}(z_M, c)$ represents the probability that a pixel belongs to the i-th category. This mask sequence is further reconstructed into a two-dimensional feature map, and then upsampled to the original image size using bilinear interpolation to obtain pixel-level classification results.
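The mask computation of formula (3) and the subsequent reshaping can be sketched as follows. This is an illustrative sketch only: the function name `decode_masks` is hypothetical, the softmax over classes is an assumption made to turn scores into probabilities, and nearest-neighbour upsampling is swapped in for the bilinear interpolation named in the text to keep the example short.

```python
import numpy as np

def decode_masks(z_m, c, grid_hw, patch):
    """Turn patch features into a per-pixel class map via Mask(z_M, c) = z_M c^T."""
    rows, cols = grid_hw
    scores = z_m @ c.T                                  # (N, K) mask sequence
    scores = np.exp(scores - scores.max(axis=1, keepdims=True))
    probs = scores / scores.sum(axis=1, keepdims=True)  # assumed softmax over K classes
    grid = probs.reshape(rows, cols, -1)                # two-dimensional feature map
    # Nearest-neighbour upsampling keeps the sketch short; the text uses
    # bilinear interpolation for this step.
    full = grid.repeat(patch, axis=0).repeat(patch, axis=1)  # (H, W, K)
    return full.argmax(axis=-1)                         # class index per pixel

# Two sub-blocks side by side, two classes; each sub-block aligns with one class.
z_m = np.array([[1.0, 0.0], [0.0, 1.0]])
c = np.array([[5.0, 0.0], [0.0, 5.0]])
labels = decode_masks(z_m, c, grid_hw=(1, 2), patch=2)
print(labels)
```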

[0082] In practical applications, the ADE20K dataset can be used for training. Its training set includes 20,210 images and 150 semantic category labels, which makes it possible to obtain pixel-level accurate segmentation results for sidewalks.

[0083] Thus, the sidewalk segmentation algorithm of this disclosure can avoid using convolutional neural networks for segmentation, and instead use neural network structures from the field of natural language processing (e.g., self-attention-based neural network structures for sequences) to obtain contextual semantic information, which significantly improves the robustness and accuracy of segmentation in sidewalk scenarios.

[0084] After obtaining the semantic segmentation result in step S12, the semantic segmentation result can be input into the trained target recognition network in step S13 for target recognition processing to obtain the target recognition result of the image to be processed.

[0085] Figure 2 shows a schematic diagram of a target recognition network according to an embodiment of the present disclosure. As shown in Figure 2, the target recognition network can be based on the YOLOv4 method. Specifically, the backbone network adopts the CSPDarknet53 structure, which is built on the Cross-Stage Partial Network (CSP) and the Darknet53 network (a part of the YOLOv3 network); the neck uses a Spatial Pyramid Pooling (SPP) structure and a Path Aggregation Network (PAN) structure; and the head uses a YOLO detection head, such as the YOLOv3 detection head.

[0086] It should be understood that this disclosure merely takes the target recognition network shown in Figure 2 as an example. The target recognition network may include at least one convolutional layer, at least one pooling layer (downsampling), and at least one connection layer. This disclosure does not limit the specific structure of the target recognition network.

[0087] For example, the target recognition network in the initial state can be trained based on the loss function to obtain the target recognition network in the target state.

[0088] For example, the loss function CIOU_Loss can be expressed as:

[0089]

[0090] In formula (4), one distance term represents the Euclidean distance between the center point of the predicted region (the region output by the target recognition network, such as a region containing a traffic participant or a non-motorized vehicle) and the center point of the real region (the labeled region in the training samples); the other distance term represents the diagonal length of the smallest enclosing region that contains both the predicted region and the real region; IOU (Intersection over Union) represents the degree of overlap between the predicted region and the real region; and v is a parameter that measures how consistent the aspect ratio of the predicted region is with that of the real region, specifically:

[0091] $v = \frac{4}{\pi^{2}} \left[ \arctan\frac{w_{gt}}{h_{gt}} - \arctan\frac{w_{p}}{h_{p}} \right]^{2}$ (5)

[0092] In formula (5), $w_p$, $h_p$ and $w_{gt}$, $h_{gt}$ represent the width and height of the predicted region and the width and height of the real region, respectively.
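Because formula (4) is not reproduced in this text, the sketch below uses the commonly published form of the CIoU loss, 1 − IOU + (center distance)² / (enclosing diagonal)² + αv with α = v / ((1 − IOU) + v). That form is an assumption and may differ in detail from formula (4). The box layout `(x_min, y_min, x_max, y_max)` and the function name `ciou_loss` are also hypothetical.

```python
import math

def ciou_loss(pred, truth):
    """CIoU loss for two boxes given as (x_min, y_min, x_max, y_max)."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = truth
    wp, hp = px2 - px1, py2 - py1
    wg, hg = gx2 - gx1, gy2 - gy1
    inter_w = max(0.0, min(px2, gx2) - max(px1, gx1))
    inter_h = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = inter_w * inter_h
    iou = inter / (wp * hp + wg * hg - inter)
    # Squared Euclidean distance between the two center points.
    center_sq = ((px1 + px2 - gx1 - gx2) / 2) ** 2 + ((py1 + py2 - gy1 - gy2) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both regions.
    diag_sq = (max(px2, gx2) - min(px1, gx1)) ** 2 + (max(py2, gy2) - min(py1, gy1)) ** 2
    # Aspect-ratio consistency term, as in formula (5).
    v = 4 / math.pi ** 2 * (math.atan(wg / hg) - math.atan(wp / hp)) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0
    return 1 - iou + center_sq / diag_sq + alpha * v

print(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes
print(ciou_loss((0, 0, 1, 1), (2, 0, 3, 1)))  # disjoint boxes, same shape
```

For the disjoint pair the IOU is zero, yet the loss still reflects how far apart the centers are, which is the property the following paragraph relies on.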

[0093] In this way, building on the IOU-based loss function, this loss function resolves three problems: the ambiguity that arises when the predicted region and the real region do not overlap; the difficulty of measuring the distance between center points when predicted regions lie inside the same real region; and the neglect of the aspect ratio of the predicted region when predicted regions of equal area lie inside the same real region.

[0094] Furthermore, the DIOU_NMS method is used for non-maximum suppression. This method iteratively takes the prediction region with the highest confidence and computes its DIOU with each of the other prediction regions (DIOU being an overlap measure that considers both the overlapping area and the distance between the center points of the two regions). Prediction regions with a large DIOU (i.e., a large intersection) are filtered out. This helps in detecting overlapping targets and is particularly suitable for sidewalks crowded with pedestrians and vehicles.
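The suppression loop described above can be sketched as follows. This is a minimal illustration assuming boxes in `(x1, y1, x2, y2)` form, the usual definition of DIOU as IOU minus the normalized squared center distance, and a hypothetical threshold of 0.5. The function names are illustrative only.

```python
def diou(a, b):
    """IOU minus normalized squared center distance for boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    center_sq = ((a[0] + a[2] - b[0] - b[2]) / 2) ** 2 + ((a[1] + a[3] - b[1] - b[3]) / 2) ** 2
    diag_sq = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    return inter / union - center_sq / diag_sq

def diou_nms(boxes, scores, threshold=0.5):
    """Return indices of the kept boxes, highest confidence first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    while order:
        best = order.pop(0)
        kept.append(best)
        # Drop remaining boxes whose DIOU with the kept box exceeds the threshold.
        order = [i for i in order if diou(boxes[best], boxes[i]) <= threshold]
    return kept

boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 0, 30, 10)]
scores = [0.9, 0.8, 0.7]
print(diou_nms(boxes, scores))  # [0, 2]: the second box nearly duplicates the first
```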

[0095] Figure 3 shows a schematic diagram of target recognition results according to an embodiment of the present disclosure. As shown in Figure 3, the target recognition result may include at least one traffic participant area and / or at least one non-motorized vehicle area. It should be understood that in scenarios where there are no traffic participants or non-motorized vehicles on the sidewalk, the target recognition result may also include neither a traffic participant area nor a non-motorized vehicle area (not shown in Figure 3); this disclosure does not limit this.

[0096] In Figure 3, each traffic participant area and each non-motorized vehicle area can be labeled using a rectangular prediction box, or using other shapes (such as ellipses), coordinates, edge lines, color markings, etc. This disclosure does not impose specific restrictions on the methods used to label each traffic participant area and each non-motorized vehicle area.

[0097] Furthermore, where the target identification result includes at least one traffic participant area and / or at least one non-motorized vehicle area, to display the target identification result more accurately, the target identification result may also include a first confidence level corresponding to each traffic participant area and / or a second confidence level corresponding to each non-motorized vehicle area. The first confidence level indicates the probability of belonging to a traffic participant area, and the second confidence level indicates the probability of belonging to a non-motorized vehicle area.

[0098] For example, a label can also be placed at the top of each box. The characters on the left of the label represent the area category, such as "bicycle" for bicycles among non-motorized vehicles, "motorbike" for electric vehicles among non-motorized vehicles, and "person" for traffic participants. The number on the right is the confidence level corresponding to the area category, which can range from 0 to 1.

[0099] If the target recognition result obtained in step S13 does not include the traffic participant area and the non-motorized vehicle area, step S14 can be skipped, and the detection result that there is no violation in the pedestrian walkway can be directly determined.

[0100] Otherwise, proceed to step S14, that is, if the target recognition result includes at least one traffic participant area and / or at least one non-motorized vehicle area, determine the detection result used to characterize whether there is a violation on the sidewalk based on the semantic segmentation result and the target recognition result.

[0101] Considering that both speeding and driving against traffic on sidewalks pose significant safety threats to pedestrians, the embodiments of this disclosure can identify a driver's act of driving a non-motorized vehicle on a sidewalk as a violation. That is, once two conditions are met: first, the non-motorized vehicle is within the sidewalk area, and second, the driver is driving the non-motorized vehicle, it can be judged as a violation.

[0102] Thus, to determine the detection result used to characterize whether there is a violation on the sidewalk, the semantic segmentation result and the target recognition result can be used to analyze whether a driver is driving a non-motorized vehicle on the sidewalk in the image to be processed. The detection result is either that there is a violation on the sidewalk or that there is no violation on the sidewalk.

[0103] In one possible implementation, step S14 may include: determining, based on the semantic segmentation result and the target recognition result, whether the non-motorized vehicle area is located within the pedestrian area, and determining whether a driver is driving the non-motorized vehicle corresponding to the non-motorized vehicle area; if the non-motorized vehicle area is located within the pedestrian area and a driver is driving the non-motorized vehicle, the detection result is that a violation has occurred. Otherwise, the detection result is that no violation has occurred.

[0104] For example, Figure 4 shows a schematic diagram of determining the detection result according to an embodiment of this disclosure. As shown in Figure 4, the semantic segmentation result of the image to be processed includes a pedestrian area A, and the target recognition result includes three traffic participant areas, namely traffic participant area B1, traffic participant area B2, and traffic participant area B3, and three non-motorized vehicle areas, namely non-motorized vehicle area C1, non-motorized vehicle area C2, and non-motorized vehicle area C3.

[0105] The detection result can be determined by judging whether each non-motorized vehicle area is located within the pedestrian area, and by judging whether a driver is driving the non-motorized vehicle corresponding to that non-motorized vehicle area. Analysis of the areas in Figure 4 shows that non-motorized vehicle area C1 is located within pedestrian area A, and the driver in traffic participant area B3 is driving the non-motorized vehicle in non-motorized vehicle area C1. Therefore, the detection result corresponding to Figure 4 is that a violation has occurred.

[0106] For example, in order to improve detection efficiency, during the process of determining the detection result, it can be determined in parallel whether the non-motorized vehicle area is located within the pedestrian area and whether a driver is driving the non-motorized vehicle corresponding to the non-motorized vehicle area. If both of these conditions are met, that is, if the non-motorized vehicle area is located within the pedestrian area and a driver is driving the non-motorized vehicle, the detection result is determined to be a violation; otherwise, the detection result is no violation.

[0107] Alternatively, to reduce the consumption of hardware resources, it can be determined first whether each non-motorized vehicle zone is located within the pedestrian zone. Only if a non-motorized vehicle zone is located within the pedestrian zone will it be further determined whether a driver is operating a non-motorized vehicle corresponding to that zone. If a driver is operating a non-motorized vehicle corresponding to that zone within the pedestrian zone, the detection result will be determined as a violation.

[0108] In this way, if no non-motorized vehicle area is located within the pedestrian area, the detection result can be directly determined as no violation, without needing to determine whether a driver is driving the non-motorized vehicle corresponding to any non-motorized vehicle area.

[0109] Similarly, to reduce the consumption of hardware resources, it can be first determined whether a driver is operating a non-motorized vehicle corresponding to the designated non-motorized vehicle area. Only if a driver is operating a non-motorized vehicle corresponding to the designated non-motorized vehicle area will the system further determine whether the non-motorized vehicle area is located within the pedestrian area. If the non-motorized vehicle area is located within the pedestrian area, the detection result will be determined as a violation.

[0110] In this way, if no driver is driving the non-motorized vehicle corresponding to the non-motorized vehicle area, the non-motorized vehicle is not being ridden. The detection result can be directly determined as no violation, without needing to determine whether the non-motorized vehicle area is within the pedestrian area.

[0111] By using the above method, based on the semantic segmentation results and target recognition results, and by using the two judgment conditions of "determining whether the non-motorized vehicle area is located within the pedestrian area" and "determining whether there is a driver driving the non-motorized vehicle corresponding to the non-motorized vehicle area", the detection results can be determined efficiently and accurately, which is conducive to real-time and effective automatic supervision of illegal driving of non-motorized vehicles on the pedestrian sidewalk.
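The sequential variant of this two-condition judgment can be sketched as below. It is an illustration under assumptions: `on_sidewalk` and `has_driver` are hypothetical caller-supplied predicates standing in for the two judgments, and the per-area values in the usage example are made up for demonstration rather than read from Figure 4.

```python
def detect_violation(vehicle_areas, on_sidewalk, has_driver):
    """Return True if any non-motorized vehicle area satisfies both conditions.

    on_sidewalk and has_driver are hypothetical predicates supplied by the
    caller; each takes one vehicle area and returns True or False.
    """
    for area in vehicle_areas:
        # Short-circuit: the second check runs only if the first one passes.
        if on_sidewalk(area) and has_driver(area):
            return True
    return False

# Made-up judgments for three vehicle areas.
areas = ["C1", "C2", "C3"]
inside = {"C1": True, "C2": False, "C3": False}
driven = {"C1": True, "C2": True, "C3": False}
print(detect_violation(areas, inside.get, driven.get))  # True
print(detect_violation([], inside.get, driven.get))     # False: nothing detected
```

Swapping the order of the two predicates in the `and` expression gives the other sequential variant; which order is cheaper depends on the cost of each judgment in the application scenario.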

[0112] It should be understood that this disclosure does not impose specific restrictions on the method of determining the detection results. Different methods for determining the detection results can be selected according to different application scenarios. For example, the number of traffic participants and / or the number of non-motorized vehicle areas included in the target recognition results can be used to determine whether to adopt a parallel judgment method or a method of first judging one condition and then judging another condition if it is true.

[0113] The following sections will elaborate on the two judgment conditions: "whether the non-motorized vehicle area is located within the pedestrian area" and "whether there is a driver driving a non-motorized vehicle corresponding to the non-motorized vehicle area".

[0114] In one possible implementation, determining whether the non-motorized vehicle area is located within the pedestrian walkway area may include steps SA1 to SA3: In step SA1, the wheel area of the non-motorized vehicle area is determined based on the non-motorized vehicle area.

[0115] In step SA2, it is determined whether the wheel area is located within the pedestrian walkway area.

[0116] In step SA3, if the wheel area is located within the pedestrian walkway area, the non-motorized vehicle area is also located within the pedestrian walkway area.

[0117] For example, in step SA1, the wheel area can be determined from the non-motorized vehicle area based on the characteristics of the wheel (such as shape characteristics and position characteristics). This disclosure does not limit the specific method for determining the wheel area.

[0118] After determining the wheel area in step SA1, it can be determined in step SA2 whether the wheel area is located within the pedestrian walkway area.

[0119] For example, step SA2 may include: determining a prediction threshold based on the overlapping area of the wheel area and the sidewalk area; and determining that the non-motorized vehicle area is located within the sidewalk area if the prediction threshold is greater than a preset judgment threshold.

[0120] For example, the overlapping area between the wheel area and the sidewalk area can be directly determined as the prediction threshold; or the ratio of the overlapping area to the wheel area can be determined as the prediction threshold. This disclosure does not limit the method of determining the prediction threshold.

[0121] Once the prediction threshold is determined, it can be compared with a preset judgment threshold to determine whether the non-motorized vehicle area is located within the pedestrian area. If the prediction threshold is greater than the judgment threshold, the non-motorized vehicle area is located within the pedestrian area; if the prediction threshold is less than or equal to the judgment threshold, the non-motorized vehicle area is not located within the pedestrian area. The judgment threshold can be set empirically, and this disclosure does not impose any restrictions on it.
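The ratio-based variant of this comparison can be sketched as follows. The sketch assumes the sidewalk area is available as a boolean pixel mask and the wheel area as an end-exclusive box `(x1, y1, x2, y2)`. The function name and the judgment threshold of 0.5 are hypothetical.

```python
def wheel_on_sidewalk(sidewalk_mask, wheel_box, judgment_threshold=0.5):
    """sidewalk_mask[y][x] is True for sidewalk pixels; wheel_box is end-exclusive."""
    x1, y1, x2, y2 = wheel_box
    total = (x2 - x1) * (y2 - y1)
    overlap = sum(1 for y in range(y1, y2) for x in range(x1, x2) if sidewalk_mask[y][x])
    prediction = overlap / total  # ratio of overlapping area to wheel area
    return prediction > judgment_threshold

# 4 x 6 mask whose bottom two rows are sidewalk.
mask = [[False] * 6, [False] * 6, [True] * 6, [True] * 6]
print(wheel_on_sidewalk(mask, (1, 1, 3, 4)))  # 4 of 6 wheel pixels overlap: True
print(wheel_on_sidewalk(mask, (1, 0, 3, 3)))  # 2 of 6 wheel pixels overlap: False
```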

[0122] This method can determine whether the wheel area is within the pedestrian walkway area. It is simple and easy to implement.

[0123] Furthermore, to more accurately determine whether the wheel area is located within the pedestrian walkway area, a trained binary classification model can be used. This binary classification model includes, for example, Support Vector Machines (SVMs) and neural network-based binary classification models; this disclosure does not impose specific restrictions on the types of binary classification models.

[0124] Modeling and analysis can be performed in advance using manually labeled training sample images. This involves using a certain number of sample images that have already undergone sidewalk segmentation (including sidewalk areas) and target recognition (including non-motorized vehicle areas and sidewalk areas). The non-motorized vehicle wheel portion is manually labeled on these sample images to obtain the wheel region. A positive or negative label is then assigned to the wheel to indicate whether it is within the sidewalk area; for example, a positive label indicates the wheel is within the sidewalk area, and a negative label indicates the wheel is not. During the labeling process, the wheel should be placed as completely as possible in the center of the labeling frame.

[0125] Then, determine the relative position between the wheel area and the non-motorized vehicle area. For example, you can first determine the reference point of the non-motorized vehicle area (such as the vertex of the lower left corner of the non-motorized vehicle area), and determine the difference between the coordinates of the center point of the wheel area and the reference point as the relative position (x', y') between the two.

[0126] In addition, determine the relative size of the wheel area and the non-motorized vehicle area, for example, the ratio of the width and height of the wheel area to the width and height of the non-motorized vehicle detection area (w', h').

[0127] In addition, determine the proportion of the overlapping area between the wheel area and the sidewalk area to the non-motorized vehicle area, for example, the proportion λ of the overlapping pixels between the wheel area and the sidewalk area to the total number of pixels in the wheel area.

[0128] Then, these five parameters, namely the relative position (x', y'), the relative size (w', h'), and the proportion λ, can be used as inputs to the binary classification model in its initial state to obtain the first classification result.
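Assembling the five input parameters can be sketched as below. The sketch assumes boxes in `(x1, y1, x2, y2)` form with the image origin at the upper-left corner, so that the lower-left vertex of the vehicle box is `(x1, y2)`. The function name `wheel_features` and the sample numbers are hypothetical, and λ is taken relative to the wheel area, following the example given in the text.

```python
def wheel_features(vehicle_box, wheel_box, overlap_pixels):
    """Build [x', y', w', h', lambda] from boxes given as (x1, y1, x2, y2)."""
    vx1, vy1, vx2, vy2 = vehicle_box
    wx1, wy1, wx2, wy2 = wheel_box
    ref_x, ref_y = vx1, vy2                 # lower-left vertex of the vehicle box
    rel_x = (wx1 + wx2) / 2 - ref_x         # wheel center relative to the reference point
    rel_y = (wy1 + wy2) / 2 - ref_y
    rel_w = (wx2 - wx1) / (vx2 - vx1)       # wheel width / vehicle width
    rel_h = (wy2 - wy1) / (vy2 - vy1)       # wheel height / vehicle height
    lam = overlap_pixels / ((wx2 - wx1) * (wy2 - wy1))
    return [rel_x, rel_y, rel_w, rel_h, lam]

features = wheel_features((10, 20, 50, 60), (12, 50, 22, 60), overlap_pixels=80)
print(features)  # [7.0, -5.0, 0.25, 0.25, 0.8]
```

The resulting vector is what would be passed to the binary classification model, both when training it and when applying it.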

[0129] The initial binary classification model can be iteratively trained based on the loss function, the first classification result, and manually labeled positive and negative tags, allowing the binary classification model to learn from the sample images and obtain a trained binary classification model. This trained binary classification model can be used to determine whether the wheels of non-motorized vehicles are located within the pedestrian walkway. Here, the binary classification model can select an appropriate algorithm during training and classification based on the specific application scenario, and this disclosure does not impose specific restrictions in this regard.

[0130] After training the binary classification model, it can be applied in step SA2 to determine whether the wheel area is located within the sidewalk area.

[0131] For example, step SA2 may include SA21 and SA22: In step SA21, the relative position of the wheel area and the non-motorized vehicle area, the relative size of the wheel area and the non-motorized vehicle area, and the proportion of the overlapping area of the wheel area and the sidewalk area to the non-motorized vehicle area are determined.

[0132] For example, based on the wheel area and non-motorized vehicle area determined in the previous steps, the relative position of the wheel area and the non-motorized vehicle area can be determined. For example, the reference point of the non-motorized vehicle area (such as the vertex of the lower left corner of the non-motorized vehicle area) can be determined first, and the difference between the coordinates of the center point of the wheel area and the reference point can be determined as the relative position (x”, y”) of the two.

[0133] Based on the wheel area and non-motorized vehicle area determined in the previous steps, the relative size of the wheel area and the non-motorized vehicle area can be determined, for example, the ratio of the width and height of the wheel area to the width and height of the non-motorized vehicle detection area (w”, h”).

[0134] Based on the wheel area and sidewalk area determined in the previous steps, the proportion of the overlapping area between the wheel area and the sidewalk area to the non-motorized vehicle area can be determined. For example, the proportion λ' of the overlapping pixels between the wheel area and the sidewalk area to the total number of pixels in the wheel area.

[0135] In step SA22, the relative position of the wheel area and the non-motorized vehicle area, the relative size of the wheel area and the non-motorized vehicle area, and the proportion of the overlapping area of the wheel area and the sidewalk area to the non-motorized vehicle area are input into the trained binary classification model to obtain the classification result of the binary classification model; wherein, the categories of the classification result include wheel area being located within the sidewalk area and wheel area not being located within the sidewalk area.

[0136] For example, the relative positions (x”, y”) and sizes (w”, h”) of the wheel area and the non-motorized vehicle area, as well as the proportion (λ’) of the overlapping area between the wheel area and the sidewalk area to the non-motorized vehicle area, can be input into a trained binary classification model to obtain a first classification result. This first classification result can be used to indicate whether the wheel area is located within the sidewalk area.

[0137] This method allows for a more accurate determination of whether the wheel area is within the pedestrian walkway area.

[0138] After step SA2 has determined whether the wheel area is within the pedestrian walkway area, step SA3 proceeds as follows: if the wheel area is within the pedestrian walkway area, it can be determined that the non-motorized vehicle area is within the pedestrian walkway area; if the wheel area is not within the pedestrian walkway area, it can be determined that the non-motorized vehicle area is not within the pedestrian walkway area.

[0139] By using steps SA1 to SA3, the determination of whether a non-motorized vehicle area is located within the pedestrian area can be transformed into a simple determination of whether the wheel area is located within the pedestrian area, thus achieving efficient and rapid determination of whether a non-motorized vehicle area is located within the pedestrian area.

[0140] The above describes the method for "determining whether a non-motorized vehicle area is located within a pedestrian area". The following section will elaborate on "determining whether a driver is driving a non-motorized vehicle corresponding to the non-motorized vehicle area".

[0141] On a sidewalk, there are three possible relationships between road users and non-motorized vehicles: a person is on a non-motorized vehicle, a person is next to (near) a non-motorized vehicle, and a person is far away from a non-motorized vehicle.

[0142] When a person is riding a non-motorized vehicle, the corresponding image features show that the centers of gravity of the person and the non-motorized vehicle are relatively close, that their positions in the X-axis direction (i.e., the horizontal direction) differ only slightly, that the Y-axis (i.e., vertical) coordinate of the person's center of gravity is smaller than that of the non-motorized vehicle's center of gravity, and that the difference between the two centers of gravity in the Y-axis direction is relatively large.

[0143] When a person is next to a non-motorized vehicle, the corresponding image features will show that the center of gravity of the person and the non-motorized vehicle are relatively close, and the position of their center of gravity in the Y-axis direction is not much different.

[0144] When a person is far away from a non-motorized vehicle, the corresponding image features show that the center of gravity distance between the person and the non-motorized vehicle is relatively far.

[0145] It should be understood that the image coordinate system can be selected with the upper left corner as the origin, the vertical downward direction as the positive Y-axis, and the horizontal rightward direction as the positive X-axis. This disclosure only uses this image coordinate system as an example and does not impose specific restrictions on the origin of the image coordinate system or the specific directions of the coordinate axes.
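The three relationships described above can be distinguished with a few comparisons on the two center points, as in the sketch below. All thresholds (`near`, `dx_max`, `dy_min`, in pixels), the function name, and the label strings are hypothetical. The sketch uses the example coordinate system with the origin at the upper-left corner and Y increasing downward.

```python
import math

def relation(person_center, vehicle_center, near=80.0, dx_max=20.0, dy_min=25.0):
    """Classify as 'riding', 'beside' or 'far'; all thresholds are hypothetical."""
    px, py = person_center
    vx, vy = vehicle_center
    if math.hypot(px - vx, py - vy) > near:
        return "far"
    # Riding: similar X, and the person's center clearly above the vehicle's
    # (a smaller Y value, since Y grows downward).
    if abs(px - vx) <= dx_max and vy - py >= dy_min:
        return "riding"
    return "beside"

print(relation((100, 60), (102, 100)))   # riding
print(relation((150, 98), (100, 100)))   # beside
print(relation((400, 100), (100, 100)))  # far
```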

[0146] Therefore, by comparing the traffic participant area with the non-motorized vehicle area, using the center of gravity of the traffic participant area in place of the center of gravity of the person and the center of gravity of the non-motorized vehicle area in place of the center of gravity of the non-motorized vehicle, it can be determined whether a driver is driving the non-motorized vehicle.

[0147] In one possible implementation, determining whether a driver is driving a non-motorized vehicle corresponding to the non-motorized vehicle area may include steps SB1 to SB3: In step SB1, determining a first center point of the non-motorized vehicle area and a second center point of at least one traffic participant area near the non-motorized vehicle area.

[0148] For example, the geometric center point (or centroid, center of gravity) of the non-motorized vehicle area can be determined as the first center point, and the geometric center point (or centroid, center of gravity) of at least one traffic participant near the non-motorized vehicle area can be determined as its corresponding second center point.

[0149] Here, a traffic participant area near the non-motorized vehicle area can be a traffic participant area within a preset radius centered on the first center point, or a traffic participant area that touches or partially overlaps the non-motorized vehicle area. This disclosure does not impose specific restrictions on what counts as near the non-motorized vehicle area, which can be set according to the actual application scenario.

[0150] In step SB2, a matching center point for the first center point is found from at least one second center point. The matching center point is the second center point that is closest to the first center point in the horizontal direction and in spatial distance.

[0151] For example, suppose the coordinates of a first center point are $(x1, y1)$, and its corresponding N second center points are $(x2_1, y2_1) \sim (x2_N, y2_N)$. For the first center point $(x1, y1)$ and each second center point $(x2_k, y2_k)$, $k \in [1, N]$, the horizontal distance $|x1 - x2_k|$ and the spatial distance $[(x1 - x2_k)^2 + (y1 - y2_k)^2]^{0.5}$ can be calculated separately.

[0152] From the N second center points $(x2_1, y2_1) \sim (x2_N, y2_N)$, a second center point that has both the smallest horizontal distance and the smallest spatial distance is searched for. If such a center point exists, it can be used as the matching center point, indicating that there may be a driver driving the non-motorized vehicle in the non-motorized vehicle area corresponding to the first center point. If no second center point has both the smallest horizontal distance and the smallest spatial distance, it indicates that there is no driver driving the non-motorized vehicle in the non-motorized vehicle area corresponding to the first center point.
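A minimal sketch of the search in step SB2 is given below. It assumes center points are plain `(x, y)` tuples; the function name `find_matching_center` is hypothetical. A second center point is returned only if it attains both the smallest horizontal distance and the smallest spatial distance; otherwise the result is `None`.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def find_matching_center(first: Point,
                         seconds: Sequence[Point]) -> Optional[Point]:
    """Return the second center point that is closest to `first` both
    horizontally and in Euclidean distance, or None if no point is both."""
    if not seconds:
        return None
    x1, y1 = first
    horizontal = [abs(x1 - x2) for x2, _ in seconds]
    spatial = [math.hypot(x1 - x2, y1 - y2) for x2, y2 in seconds]
    min_h, min_s = min(horizontal), min(spatial)
    for point, h, s in zip(seconds, horizontal, spatial):
        if h == min_h and s == min_s:  # both minima attained by the same point
            return point
    return None
```

For instance, with a first center point `(10, 20)` and second center points `(11, 10)` and `(30, 20)`, the point `(11, 10)` has both the smallest horizontal distance and the smallest spatial distance and is returned. If a third point `(10.5, 50)` is added, the horizontal minimum and the spatial minimum fall on different points, so no matching center point exists.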

[0153] It should be understood that, in the process of finding the matching center point, the horizontal distance and spatial distance between each second center point and the first center point can be calculated in parallel; or the horizontal distance between each second center point and the first center point can be calculated first, and then the spatial distance between each second center point and the first center point can be calculated; or the spatial distance between each second center point and the first center point can be calculated first, and then the horizontal distance between each second center point and the first center point can be calculated; this disclosure does not impose any specific limitations on this.

[0154] In step SB3, if the vertical coordinate value of the first center point is less than the vertical coordinate value of the matching center point, it is determined that the driver is driving the non-motorized vehicle.

[0155] For example, suppose the coordinates of a first center point are $(x1, y1)$, and the coordinates of the matching center point are $(x_o, y_o)$. If the vertical coordinate value $y1$ of the first center point is less than the vertical coordinate value $y_o$ of the matching center point, it can be determined that the driver corresponding to the matching center point $(x_o, y_o)$ is driving the non-motorized vehicle corresponding to the first center point $(x1, y1)$; if the vertical coordinate value $y1$ of the first center point is greater than or equal to the vertical coordinate value $y_o$ of the matching center point, it can be determined that there is no driver driving the non-motorized vehicle corresponding to the first center point $(x1, y1)$.
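The comparison in step SB3 can be sketched as a single predicate. The sketch below applies the stated condition literally (first center point's vertical coordinate less than that of the matching center point); which physical direction this corresponds to depends on the coordinate axis convention chosen, which this disclosure does not restrict. The function name is hypothetical.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]


def driver_is_driving(first: Point, matching: Optional[Point]) -> bool:
    """Apply the step SB3 comparison as literally stated.

    Returns False when no matching center point was found in step SB2.
    """
    if matching is None:
        return False
    y1 = first[1]
    y_o = matching[1]
    return y1 < y_o  # greater than or equal means no driver is driving
```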

[0156] This method analyzes the traffic participants near a non-motorized vehicle and identifies the traffic participant whose second center point is closest to the first center point of the non-motorized vehicle area in the horizontal direction, whose vertical coordinate is smaller than that of the first center point of the non-motorized vehicle area, and whose center point has the smallest Euclidean distance to the first center point. A traffic participant and a non-motorized vehicle exhibiting this matching relationship can be identified as a driver driving that non-motorized vehicle. This method is simple and convenient, and can accurately and quickly determine whether a driver is driving the non-motorized vehicle corresponding to the designated non-motorized vehicle area.

[0157] Thus, if both of the above judgment conditions are met simultaneously—"whether the non-motorized vehicle area is located within the pedestrian area" and "whether a driver is driving a non-motorized vehicle corresponding to the non-motorized vehicle area"—that is, if the non-motorized vehicle area is located within the pedestrian area and a driver is driving a non-motorized vehicle, the detection result is that a violation has occurred. Otherwise, the detection result is that no violation has occurred.
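The combination of the two judgment conditions amounts to a logical AND per non-motorized vehicle. A minimal sketch, assuming each vehicle's two conditions have already been evaluated and are passed as a pair of booleans (the function name and input layout are hypothetical):

```python
from typing import List, Sequence, Tuple


def violating_vehicles(conditions: Sequence[Tuple[bool, bool]]) -> List[int]:
    """Indices of vehicles for which both conditions hold.

    Each entry is (vehicle_area_in_sidewalk_area, driver_is_driving).
    """
    return [
        idx
        for idx, (on_sidewalk, has_driver) in enumerate(conditions)
        if on_sidewalk and has_driver
    ]
```

A parked vehicle on the sidewalk (first condition only) or a ridden vehicle off the sidewalk (second condition only) therefore yields no violation.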

[0158] After determining the detection result of whether a violation exists in step S14, the method further includes: determining the violation status information of the violation, the violation status information including at least one of speeding, driving against traffic, illegally carrying passengers, and not wearing a helmet, and / or performing facial recognition on the traffic participant with the violation to determine the identity information of the traffic participant with the violation, and / or performing license plate recognition on the non-motorized vehicle with the violation to determine the license plate information of the non-motorized vehicle with the violation; and uploading at least one of the violation status information, the identity information, and the license plate information to the database.

[0159] For example, violation status information of violations can be determined based on various image recognition methods (such as neural network-based image recognition methods), namely: speeding, driving against traffic, illegally carrying passengers, not wearing a helmet, etc.

[0160] For example, for a non-motorized vehicle that has violated regulations, multiple consecutive frames of images of the non-motorized vehicle can be acquired to determine the vehicle's movement trajectory, and the vehicle can be judged whether it is speeding or driving against the flow of traffic based on the movement trajectory.
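One way the trajectory-based judgment might look is sketched below. It assumes a track of `(timestamp_seconds, x, y)` samples of the vehicle center across consecutive frames, a known `metres_per_pixel` scale, and an `allowed_direction` vector for the lane; all of these names and inputs are assumptions for illustration and are not defined by this disclosure.

```python
import math
from typing import Sequence, Tuple

Sample = Tuple[float, float, float]  # assumed: (timestamp_seconds, x, y)


def average_speed(track: Sequence[Sample], metres_per_pixel: float) -> float:
    """Average speed in metres per second along the tracked path."""
    if len(track) < 2:
        raise ValueError("at least two samples are required")
    path_px = sum(
        math.hypot(x1 - x0, y1 - y0)
        for (_, x0, y0), (_, x1, y1) in zip(track, track[1:])
    )
    elapsed = track[-1][0] - track[0][0]
    if elapsed <= 0:
        raise ValueError("timestamps must increase")
    return path_px * metres_per_pixel / elapsed


def is_against_traffic(track: Sequence[Sample],
                       allowed_direction: Tuple[float, float]) -> bool:
    """True if the net displacement points against the allowed direction."""
    dx = track[-1][1] - track[0][1]
    dy = track[-1][2] - track[0][2]
    ax, ay = allowed_direction
    return dx * ax + dy * ay < 0  # negative dot product: opposite direction
```

Speeding could then be judged by comparing `average_speed(...)` with a speed limit chosen for the scene; that limit is likewise an application-specific assumption.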

[0161] For example, image analysis can be performed on non-motorized vehicle areas and traffic participant areas where violations occur, and the number of traffic participants on non-motorized vehicles can be used to determine whether passengers are being illegally carried.

[0162] For example, in areas where traffic participants have committed violations, it can be determined whether a traffic participant is wearing a helmet by identifying whether a helmet is present on the head area of the traffic participant.

[0163] For example, a face region can be segmented from the area of traffic participants who have committed traffic violations. Visual features, pixel statistical features, face image transformation coefficient features, and face image algebraic features of this face region are extracted. Based on these features, face recognition is performed to determine the identity information of the traffic participants who have committed traffic violations.

[0164] The face recognition methods may include feature-based recognition algorithms, appearance-based recognition algorithms, template-based recognition algorithms, and recognition algorithms using neural networks, etc. This disclosure does not limit the types of face recognition algorithms.

[0165] For example, a license plate area can be segmented from the non-motorized vehicle area where violations occur. Character and color recognition are performed on the license plate area to determine the license plate information of the non-motorized vehicle involved in the violation, namely, the license plate information package.

[0166] The license plate recognition method may include edge-based license plate recognition method, color-based license plate recognition method, and machine learning-based license plate recognition method. This disclosure does not limit the type of license plate recognition method.

[0167] Having obtained the violation status information, the identity information, and the license plate information, at least one of the violation status information, the identity information, and the license plate information can be uploaded to a local or remote database (e.g., in the cloud, on a server).
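The upload step can be illustrated with a local database. The sketch below uses Python's standard `sqlite3` module; the table name `violations`, its columns, and the function names are hypothetical, and a remote database would use a different client.

```python
import sqlite3
from typing import Optional


def open_violation_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a database with a hypothetical `violations` table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS violations ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "status TEXT, identity TEXT, plate TEXT)"
    )
    return conn


def upload_violation(conn: sqlite3.Connection,
                     status: Optional[str] = None,
                     identity: Optional[str] = None,
                     plate: Optional[str] = None) -> int:
    """Store at least one of the three pieces of information; return row id."""
    if status is None and identity is None and plate is None:
        raise ValueError("at least one field is required")
    cursor = conn.execute(
        "INSERT INTO violations (status, identity, plate) VALUES (?, ?, ?)",
        (status, identity, plate),
    )
    conn.commit()
    return cursor.lastrowid
```

Fields that were not determined are stored as NULL, which mirrors the "at least one of" wording above.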

[0168] This approach facilitates collaboration with relevant regulatory departments to obtain driver information and violation status, further building a non-motorized vehicle driver credit database, and realizing an automated regulatory system that is "digitally regulated, traceable in violation, and cost-based in violation."

[0169] In summary, this disclosure provides an image detection method applicable to detecting violations on sidewalks. It performs semantic segmentation on the acquired image to be processed, obtaining a semantic segmentation result. Based on this semantic segmentation result, it performs target recognition on the image to be processed, obtaining a target recognition result. Then, if the target recognition result includes at least one traffic participant area and / or at least one non-motorized vehicle area, it determines whether the judgment conditions "whether the non-motorized vehicle area is located within the sidewalk area" and "whether there is a driver driving a non-motorized vehicle corresponding to the non-motorized vehicle area" are met based on the semantic segmentation result and the target recognition result. This determines the detection result used to characterize whether there is a violation on the sidewalk. If the non-motorized vehicle area is located within the sidewalk area and a driver is driving a non-motorized vehicle, the detection result is that a violation exists; otherwise, the detection result is that no violation exists.

[0170] In this way, real-time, effective, low-cost, and widely applicable automated monitoring of non-motorized vehicles driving illegally on sidewalks is achieved.

[0171] It is understood that the various method embodiments mentioned above in this disclosure can be combined with each other to form combined embodiments without violating the principle and logic. Due to space limitations, this disclosure will not elaborate further. Those skilled in the art will understand that, in the above methods of the specific embodiments, the specific execution order of each step should be determined by its function and possible internal logic.

[0172] In addition, this disclosure also provides an image detection apparatus, an electronic device, a computer-readable storage medium, and a program, all of which can be used to implement any of the image detection methods provided in this disclosure. The corresponding technical solutions and descriptions are described in the corresponding section of the method and will not be repeated here.

[0173] Figure 5 shows a block diagram of an image detection apparatus according to an embodiment of the present disclosure. As shown in Figure 5, the device includes: an acquisition module 51 for acquiring an image to be processed; a semantic segmentation module 52 for performing semantic segmentation on the image to be processed to obtain a semantic segmentation result, wherein the semantic segmentation result includes a pedestrian walkway area; a target recognition module 53 for performing target recognition on the image to be processed based on the semantic segmentation result to obtain a target recognition result; and a detection module 54 for determining, when the target recognition result includes at least one traffic participant area and / or at least one non-motorized vehicle area, a detection result based on the semantic segmentation result and the target recognition result, wherein the detection result is used to characterize whether there is a violation on the pedestrian walkway.
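The data flow between the four modules can be sketched as a simple composition of callables. The class name, field names, and callable signatures below are assumptions made for illustration; the actual modules may be implemented in hardware, software, or a combination thereof.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ImageDetectionApparatus:
    acquire: Callable[[], Any]            # acquisition module 51
    segment: Callable[[Any], Any]         # semantic segmentation module 52
    recognize: Callable[[Any, Any], Any]  # target recognition module 53
    detect: Callable[[Any, Any], bool]    # detection module 54

    def run(self) -> bool:
        """Run the modules in order and return the detection result."""
        image = self.acquire()
        segmentation = self.segment(image)
        # Target recognition is performed based on the segmentation result.
        targets = self.recognize(image, segmentation)
        return self.detect(segmentation, targets)
```

Passing the modules in as callables keeps the sketch independent of any particular segmentation or recognition network.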

[0174] In one possible implementation, the detection module 54 is used to: determine whether the non-motorized vehicle area is located within the pedestrian area based on the semantic segmentation result and the target recognition result, and to determine whether a driver is driving the non-motorized vehicle corresponding to the non-motorized vehicle area; if the non-motorized vehicle area is located within the pedestrian area and a driver is driving the non-motorized vehicle, the detection result is that there is a violation.

[0175] In one possible implementation, determining whether the non-motorized vehicle area is located within the pedestrian walkway area includes: determining the wheel area of the non-motorized vehicle area based on the non-motorized vehicle area; determining whether the wheel area is located within the pedestrian walkway area; and if the wheel area is located within the pedestrian walkway area, the non-motorized vehicle area is located within the pedestrian walkway area.

[0176] In one possible implementation, determining whether the wheel area is located within the pedestrian walkway area includes: determining a prediction threshold based on the overlapping area of the wheel area and the pedestrian walkway area; and determining that the non-motorized vehicle area is located within the pedestrian walkway area if the prediction threshold is greater than a preset judgment threshold.
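A minimal sketch of this threshold comparison follows. It assumes, for illustration only, that the prediction value is the fraction of wheel-area pixels that fall on the pedestrian walkway mask, that the wheel area is an integer pixel box with exclusive upper bounds, and that the judgment threshold defaults to 0.5; none of these specifics is fixed by this disclosure.

```python
from typing import Sequence, Tuple

# Assumed layout: integer pixel bounds (x_min, y_min, x_max, y_max), max exclusive.
PixelBox = Tuple[int, int, int, int]


def overlap_ratio(wheel_box: PixelBox,
                  sidewalk_mask: Sequence[Sequence[int]]) -> float:
    """Fraction of wheel-box pixels labelled as pedestrian walkway."""
    x_min, y_min, x_max, y_max = wheel_box
    total = (x_max - x_min) * (y_max - y_min)
    if total <= 0:
        raise ValueError("wheel box is empty")
    inside = sum(
        1
        for y in range(y_min, y_max)
        for x in range(x_min, x_max)
        if sidewalk_mask[y][x]
    )
    return inside / total


def wheel_on_sidewalk(wheel_box: PixelBox,
                      sidewalk_mask: Sequence[Sequence[int]],
                      judgment_threshold: float = 0.5) -> bool:
    """Compare the prediction value with the preset judgment threshold."""
    return overlap_ratio(wheel_box, sidewalk_mask) > judgment_threshold
```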

[0177] In one possible implementation, determining whether the wheel area is located within the pedestrian walkway area includes: determining the relative position of the wheel area and the non-motorized vehicle area, the relative size of the wheel area and the non-motorized vehicle area, and the proportion of the overlapping area of the wheel area and the pedestrian walkway area to the non-motorized vehicle area; inputting the relative position of the wheel area and the non-motorized vehicle area, the relative size of the wheel area and the non-motorized vehicle area, and the proportion of the overlapping area of the wheel area and the pedestrian walkway area to the non-motorized vehicle area into a trained binary classification model to obtain the classification result of the binary classification model; wherein, the categories of the classification result include whether the wheel area is located within the pedestrian walkway area or whether the wheel area is not located within the pedestrian walkway area.
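The three inputs to the binary classification model could be assembled into a feature vector as sketched below. The concrete definitions used here (wheel center offset normalized by the vehicle box size as relative position, area ratio as relative size) are assumptions; this disclosure does not fix them, and the trained classifier itself is not reproduced.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)


def box_area(box: Box) -> float:
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def wheel_features(wheel: Box, vehicle: Box, overlap_area: float) -> List[float]:
    """Build [rel_x, rel_y, rel_size, overlap_share] for a binary classifier.

    `overlap_area` is the area shared by the wheel area and the pedestrian
    walkway area, computed elsewhere.
    """
    vehicle_area = box_area(vehicle)
    if vehicle_area <= 0:
        raise ValueError("vehicle box is empty")
    width = vehicle[2] - vehicle[0]
    height = vehicle[3] - vehicle[1]
    wheel_cx = (wheel[0] + wheel[2]) / 2.0
    wheel_cy = (wheel[1] + wheel[3]) / 2.0
    rel_x = (wheel_cx - vehicle[0]) / width    # relative position, horizontal
    rel_y = (wheel_cy - vehicle[1]) / height   # relative position, vertical
    rel_size = box_area(wheel) / vehicle_area  # relative size
    overlap_share = overlap_area / vehicle_area
    return [rel_x, rel_y, rel_size, overlap_share]
```

The resulting vector would then be passed to the trained binary classification model, whose two output categories are "wheel area located within the pedestrian walkway area" and "wheel area not located within the pedestrian walkway area".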

[0178] In one possible implementation, determining whether a driver is driving a non-motorized vehicle corresponding to the non-motorized vehicle area includes: determining a first center point of the non-motorized vehicle area and a second center point of at least one traffic participant area near the non-motorized vehicle area; finding a matching center point of the first center point from the at least one second center point, wherein the matching center point is the second center point that is closest to the first center point in the horizontal direction and in spatial distance; and determining that a driver is driving the non-motorized vehicle if the vertical coordinate value of the first center point is less than the vertical coordinate value of the matching center point.

[0179] In one possible implementation, the semantic segmentation module 52 is used to: segment the image to be processed into N image sub-blocks, where N is an integer greater than 1; determine an input sequence with positional encoding based on the N image sub-blocks and the positional information of each image sub-block; input the input sequence into an encoder for encoding processing to obtain an encoded sequence with semantic context information; input the encoded sequence and class embedding information into a decoder for decoding processing to obtain a semantic segmentation result; the target recognition module 53 is used to: input the semantic segmentation result into a target recognition network for target recognition processing to obtain the target recognition result of the image to be processed.
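The first two operations of the semantic segmentation module (splitting into N image sub-blocks and building an input sequence with positional encoding) can be sketched as follows. The sketch assumes a single-channel image given as nested lists, square sub-blocks, and a sinusoidal positional encoding added to each flattened sub-block; this disclosure does not specify the encoding, so the sinusoidal form is an assumption, and the encoder and decoder are not reproduced.

```python
import math
from typing import List, Sequence


def split_into_patches(image: Sequence[Sequence[float]],
                       patch: int) -> List[List[float]]:
    """Split an H x W image into flattened patch x patch sub-blocks, row-major."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    return [
        [float(image[top + dy][left + dx])
         for dy in range(patch) for dx in range(patch)]
        for top in range(0, height, patch)
        for left in range(0, width, patch)
    ]


def sinusoidal_encoding(position: int, dim: int) -> List[float]:
    """Sinusoidal encoding: sine on even indices, cosine on odd indices."""
    return [
        math.sin(position / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(position / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]


def build_input_sequence(image: Sequence[Sequence[float]],
                         patch: int) -> List[List[float]]:
    """Add each sub-block's positional encoding to its flattened pixel values."""
    patches = split_into_patches(image, patch)
    dim = patch * patch
    return [
        [value + pos for value, pos in zip(vector, sinusoidal_encoding(idx, dim))]
        for idx, vector in enumerate(patches)
    ]
```

For a 4 x 4 image and 2 x 2 sub-blocks this yields N = 4 sequence elements of length 4, which would then be fed to the encoder.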

[0180] In one possible implementation, the device further includes an uploading module, configured to, after determining the detection result based on the semantic segmentation result and the target recognition result, determine violation status information of the violation, the violation status information including at least one of speeding, driving against traffic, illegally carrying passengers, and not wearing a helmet, and / or, perform facial recognition on the traffic participant with the violation to determine the identity information of the traffic participant with the violation, and / or, perform license plate recognition on the non-motorized vehicle with the violation to determine the license plate information of the non-motorized vehicle with the violation; and upload at least one of the violation status information, the identity information, and the license plate information to a database.

[0181] This method is specifically tied to the internal structure of computer systems and can solve the technical problem of how to improve hardware computing efficiency or execution performance (including reducing data storage, reducing data transmission, and increasing hardware processing speed), thereby achieving technical effects that improve the internal performance of computer systems in accordance with natural laws.

[0182] In some embodiments, the functions or modules of the apparatus provided in this disclosure can be used to perform the methods described in the above method embodiments. The specific implementation can be referred to the description of the above method embodiments, and for the sake of brevity, it will not be repeated here.

[0183] This disclosure also proposes a computer-readable storage medium storing computer program instructions that, when executed by a processor, implement the above-described method. The computer-readable storage medium can be volatile or non-volatile.

[0184] This disclosure also proposes an electronic device, including: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to invoke the instructions stored in the memory to execute the above-described method.

[0185] This disclosure also provides a computer program product, including computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code, wherein when the computer-readable code is run in a processor of an electronic device, the processor in the electronic device performs the above-described method.

[0186] Electronic devices can be provided as terminals, servers, or other forms of devices.

[0187] Figure 6 illustrates a block diagram of an electronic device 800 according to an embodiment of the present disclosure. For example, the electronic device 800 may be a user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, an in-vehicle device, a wearable device, or other terminal devices.

[0188] Referring to Figure 6, the electronic device 800 may include one or more of the following components: processing component 802, memory 804, power supply component 806, multimedia component 808, audio component 810, input / output interface 812, sensor component 814, and communication component 816.

[0189] Processing component 802 typically controls the overall operation of electronic device 800, such as operations associated with display, telephone calls, data communication, camera operation, and recording operations. Processing component 802 may include one or more processors 820 to execute instructions to complete all or part of the steps of the methods described above. Furthermore, processing component 802 may include one or more modules to facilitate interaction between processing component 802 and other components. For example, processing component 802 may include a multimedia module to facilitate interaction between multimedia component 808 and processing component 802.

[0190] Memory 804 is configured to store various types of data to support the operation of electronic device 800. Examples of this data include instructions for any application or method operating on electronic device 800, contact data, phonebook data, messages, pictures, videos, etc. Memory 804 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk.

[0191] Power supply component 806 provides power to various components of electronic device 800. Power supply component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to electronic device 800.

[0192] Multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touchscreen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundaries of the touch or swipe action but also the duration and pressure associated with the touch or swipe operation. In some embodiments, multimedia component 808 includes a front-facing camera and / or a rear-facing camera. When the electronic device 800 is in an operating mode, such as a shooting mode or a video mode, the front-facing camera and / or the rear-facing camera may receive external multimedia data. Each front-facing camera and rear-facing camera may be a fixed optical lens system or have focal length and optical zoom capabilities.

[0193] Audio component 810 is configured to output and / or input audio signals. For example, audio component 810 includes a microphone (MIC) configured to receive external audio signals when electronic device 800 is in an operating mode, such as call mode, recording mode, and voice recognition mode. The received audio signals may be further stored in memory 804 or transmitted via communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.

[0194] Input / output interface 812 provides an interface between processing component 802 and peripheral interface modules, such as keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to, home buttons, volume buttons, power buttons, and lock buttons.

[0195] Sensor component 814 includes one or more sensors for providing state assessments of various aspects of electronic device 800. For example, sensor component 814 may detect the on / off state of electronic device 800, the relative positioning of components such as the display and keypad of electronic device 800, changes in position of electronic device 800 or a component of electronic device 800, the presence or absence of user contact with electronic device 800, orientation or acceleration / deceleration of electronic device 800, and temperature changes of electronic device 800. Sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. Sensor component 814 may also include an optical sensor, such as a complementary metal-oxide-semiconductor (CMOS) or charge-coupled device (CCD) image sensor, for use in imaging applications. In some embodiments, sensor component 814 may also include an accelerometer, gyroscope, magnetometer, pressure sensor, or temperature sensor.

[0196] Communication component 816 is configured to facilitate wired or wireless communication between electronic device 800 and other devices. Electronic device 800 can access wireless networks based on communication standards, such as Wi-Fi, 2G, 3G, 4G, LTE, 5G, or combinations thereof. In one exemplary embodiment, communication component 816 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, communication component 816 also includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID), Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

[0197] In an exemplary embodiment, the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components to perform the methods described above.

[0198] In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, such as a memory 804 including computer program instructions that can be executed by a processor 820 of an electronic device 800 to perform the above-described method.

[0199] Figure 7 shows a block diagram of an electronic device 1900 according to an embodiment of the present disclosure. For example, the electronic device 1900 may be provided as a server or a terminal device. Referring to Figure 7, the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources represented by memory 1932 for storing instructions, such as application programs, that can be executed by the processing component 1922. The application programs stored in memory 1932 may include one or more modules, each corresponding to a set of instructions. Furthermore, the processing component 1922 is configured to execute instructions to perform the methods described above.

[0200] Electronic device 1900 may also include a power supply component 1926 configured to perform power management of electronic device 1900, a wired or wireless network interface 1950 configured to connect electronic device 1900 to a network, and an input / output interface 1958. Electronic device 1900 can operate on an operating system stored in memory 1932, such as the Microsoft server operating system (Windows Server™), Apple's graphical user interface-based operating system (Mac OS X™), the multi-user, multi-process computer operating system (Unix™), the free and open-source Unix-like operating system (Linux™), the open-source Unix-like operating system (FreeBSD™), or similar.

[0201] In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, such as a memory 1932 including computer program instructions that can be executed by a processing component 1922 of an electronic device 1900 to perform the above-described method.

[0202] This disclosure can be a system, method, and / or computer program product. A computer program product may include a computer-readable storage medium having computer-readable program instructions loaded thereon for causing a processor to implement various aspects of this disclosure.

[0203] Computer-readable storage media can be tangible devices capable of holding and storing instructions for use by an instruction execution device. Computer-readable storage media can be, for example, (but not limited to) electrical storage devices, magnetic storage devices, optical storage devices, electromagnetic storage devices, semiconductor storage devices, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile disc (DVD), memory sticks, floppy disks, mechanically encoded devices, such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing. The computer-readable storage media used herein are not to be construed as transitory signals themselves, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., light pulses through fiber optic cables), or electrical signals transmitted through wires.

[0204] The computer-readable program instructions described herein can be downloaded from computer-readable storage media to various computing / processing devices, or downloaded via a network, such as the Internet, local area network, wide area network, and / or wireless network, to an external computer or external storage device. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and / or edge servers. A network adapter card or network interface in each computing / processing device receives the computer-readable program instructions from the network and forwards them to the computer-readable storage media in the respective computing / processing device.

[0205] Computer program instructions used to perform the operations of this disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, status setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, C++, etc., and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer via any type of network—including a local area network (LAN) or a wide area network (WAN)—or may be connected to an external computer (e.g., via the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as programmable logic circuitry, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), is personalized by utilizing the status information of the computer-readable program instructions to implement various aspects of this disclosure.

[0206] Various aspects of this disclosure are described herein with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this disclosure. It should be understood that each block of the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer-readable program instructions.

[0207] These computer-readable program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine such that, when executed by the processor of the computer or other programmable data processing apparatus, they create means for implementing the functions / actions specified in one or more blocks of the flowchart and / or block diagram. These computer-readable program instructions can also be stored in a computer-readable storage medium that causes a computer, programmable data processing apparatus, and / or other device to operate in a particular manner; thus, the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions for implementing aspects of the functions / actions specified in one or more blocks of the flowchart and / or block diagram.

[0208] Computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, so that the instructions executed on the computer, other programmable data processing apparatus, or other device implement the functions / actions specified in one or more blocks of the flowchart and / or block diagram.

[0209] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of instructions that contains one or more executable instructions for implementing a specified logical function. In some alternative implementations, the functions noted in the blocks may occur in a different order from that shown in the drawings. For example, two consecutive blocks may actually be executed substantially in parallel, or they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, may be implemented using a dedicated hardware-based system that performs the specified function or action, or using a combination of dedicated hardware and computer instructions.

[0210] The computer program product can be implemented through hardware, software, or a combination thereof. In one alternative embodiment, the computer program product is embodied in a computer storage medium; in another alternative embodiment, the computer program product is embodied in a software product, such as a software development kit (SDK).

[0211] The description of the various embodiments above tends to emphasize the differences between them. For their identical or similar aspects, the embodiments may be referred to one another, and for the sake of brevity these aspects are not repeated here.

[0212] Those skilled in the art will understand that, in the methods of the specific implementations described above, the order in which the steps are written does not imply a strict execution order and does not constitute any limitation on the implementation process. The specific execution order of the steps should be determined by their functions and possible internal logic.

[0213] If the technical solution of this application involves personal information, the product using this technical solution has clearly informed the user of the personal information processing rules and obtained the user's voluntary consent before processing the personal information. If the technical solution of this application involves sensitive personal information, the product using this technical solution has obtained the user's separate consent before processing the sensitive personal information, and also meets the requirement of "express consent". For example, at personal information collection devices such as cameras, clear and prominent signs are set up to inform individuals that they have entered the scope of personal information collection and that personal information will be collected; if an individual voluntarily enters the collection scope, the individual is deemed to have agreed to the collection of their personal information. Alternatively, on the personal information processing device, where the personal information processing rules are communicated through clear signs or notices, authorization is obtained from the individual through a pop-up message or by asking the individual to upload their personal information. The personal information processing rules may include information such as the personal information processor, the purpose of the personal information processing, the processing method, and the types of personal information processed.

[0214] The various embodiments of this disclosure have been described above. These descriptions are exemplary, not exhaustive, and this disclosure is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those skilled in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen to best explain the principles of the embodiments, their practical application, or their technical improvement over technologies available in the market, or to enable others skilled in the art to understand the embodiments disclosed herein.

Claims

1. An image detection method, characterized in that the method is used to detect illegal driving on sidewalks and comprises: obtaining an image to be processed; performing semantic segmentation on the image to be processed to obtain a semantic segmentation result, the semantic segmentation result including a sidewalk area; performing target recognition on the image to be processed based on the semantic segmentation result to obtain a target recognition result; and, in a case where the target recognition result includes at least one traffic participant area and / or at least one non-motorized vehicle area, determining a detection result based on the semantic segmentation result and the target recognition result; wherein the detection result is used to characterize whether there is a violation on the sidewalk; the violation includes the presence of a driver operating a non-motorized vehicle within the sidewalk area; the driver operating the non-motorized vehicle is determined based on a first center point of the non-motorized vehicle area and a second center point of at least one traffic participant area near the non-motorized vehicle area; the vertical coordinate value of the first center point is less than the vertical coordinate value of a matching center point; and the matching center point is the second center point that is closest to the first center point in the horizontal direction and in spatial distance.

2. The method according to claim 1, characterized in that determining the detection result based on the semantic segmentation result and the target recognition result comprises: determining, based on the semantic segmentation result and the target recognition result, whether the non-motorized vehicle area is located within the sidewalk area and whether a driver is driving the non-motorized vehicle corresponding to the non-motorized vehicle area; and, if the non-motorized vehicle area is located within the sidewalk area and a driver is driving the non-motorized vehicle, determining that the detection result indicates a violation.

3. The method according to claim 2, characterized in that determining whether the non-motorized vehicle area is located within the sidewalk area comprises: determining, based on the non-motorized vehicle area, a wheel area of the non-motorized vehicle area; determining whether the wheel area is located within the sidewalk area; and, when the wheel area is located within the sidewalk area, determining that the non-motorized vehicle area is located within the sidewalk area.

4. The method according to claim 3, characterized in that determining whether the wheel area is located within the sidewalk area comprises: determining a prediction threshold based on the overlapping area between the wheel area and the sidewalk area; and, if the prediction threshold is greater than a preset judgment threshold, determining that the non-motorized vehicle area is located within the sidewalk area.
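
The overlap test in this claim can be sketched in Python as follows. This is only an illustration under stated assumptions: the claim does not say how the prediction threshold is computed from the overlapping area, so the sketch assumes it is the fraction of the wheel box covered by sidewalk pixels. The names `overlap_ratio` and `wheel_on_sidewalk`, the box layout `(x_min, y_min, x_max, y_max)` with exclusive maxima, the binary mask representation, and the default judgment threshold of 0.5 are all hypothetical.

```python
from typing import List, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max), maxima exclusive.
Box = Tuple[int, int, int, int]


def overlap_ratio(wheel_box: Box, sidewalk_mask: List[List[int]]) -> float:
    """Fraction of the wheel box that is covered by sidewalk pixels.

    sidewalk_mask[y][x] is 1 where the segmentation labels a pixel as
    sidewalk and 0 elsewhere (an assumed representation).
    """
    x_min, y_min, x_max, y_max = wheel_box
    box_area = (x_max - x_min) * (y_max - y_min)
    if box_area <= 0:
        return 0.0
    height = len(sidewalk_mask)
    width = len(sidewalk_mask[0]) if height else 0
    # Clip the box to the mask so out-of-image parts count as non-sidewalk.
    x0, x1 = max(0, x_min), min(width, x_max)
    y0, y1 = max(0, y_min), min(height, y_max)
    overlap = sum(
        sidewalk_mask[y][x] for y in range(y0, y1) for x in range(x0, x1)
    )
    return overlap / box_area


def wheel_on_sidewalk(
    wheel_box: Box,
    sidewalk_mask: List[List[int]],
    judgment_threshold: float = 0.5,  # hypothetical preset value
) -> bool:
    """True when the overlap-based value exceeds the preset threshold."""
    return overlap_ratio(wheel_box, sidewalk_mask) > judgment_threshold
```

Normalizing by the wheel box area keeps the value in the range 0 to 1, which makes a single preset threshold usable across wheels of different apparent sizes; other normalizations would fit the claim wording equally well.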

5. The method according to claim 3, characterized in that determining whether the wheel area is located within the sidewalk area comprises: determining the relative position of the wheel area and the non-motorized vehicle area, the relative size of the wheel area and the non-motorized vehicle area, and the proportion of the overlapping area of the wheel area and the sidewalk area to the non-motorized vehicle area; and inputting the relative position, the relative size, and the proportion into a trained binary classification model to obtain a classification result of the binary classification model, wherein the classification result includes the categories of the wheel area being located within the sidewalk area and the wheel area not being located within the sidewalk area.
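
A minimal Python sketch of how the three inputs named in this claim might be assembled into a feature vector is given below. The claim does not define the exact formulas, so the choices here are assumptions: relative position is taken as the wheel center's offset inside the vehicle box, normalized by the vehicle box width and height; relative size is the ratio of the two box areas; and the proportion is the wheel/sidewalk overlap area divided by the vehicle box area. The names `wheel_features` and `wheel_in_sidewalk` are hypothetical, and the trained binary classification model is represented only as an arbitrary callable, since the source does not specify its type.

```python
from typing import Callable, Sequence, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def wheel_features(
    wheel_box: Box, vehicle_box: Box, overlap_area: float
) -> Tuple[float, float, float, float]:
    """Build (rel_x, rel_y, rel_size, overlap_fraction) for the classifier.

    overlap_area is the area shared by the wheel area and the sidewalk
    area, computed elsewhere from the segmentation result.
    """
    vx0, vy0, vx1, vy1 = vehicle_box
    wx0, wy0, wx1, wy1 = wheel_box
    v_w, v_h = vx1 - vx0, vy1 - vy0
    if v_w <= 0 or v_h <= 0:
        raise ValueError("vehicle box must have positive width and height")
    vehicle_area = v_w * v_h
    wheel_area = max(0.0, wx1 - wx0) * max(0.0, wy1 - wy0)
    # Relative position: wheel center offset, normalized to the vehicle box.
    rel_x = ((wx0 + wx1) / 2.0 - vx0) / v_w
    rel_y = ((wy0 + wy1) / 2.0 - vy0) / v_h
    # Relative size and overlap proportion, both relative to the vehicle box.
    return (rel_x, rel_y, wheel_area / vehicle_area, overlap_area / vehicle_area)


def wheel_in_sidewalk(
    features: Sequence[float],
    classifier: Callable[[Sequence[float]], bool],
) -> bool:
    """Delegate the two-class decision to an already trained model."""
    return bool(classifier(features))
```

Passing the model in as a callable keeps the sketch independent of any particular library; any trained two-class model that maps the feature vector to a yes/no answer could be plugged in.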

6. The method according to claim 2, characterized in that determining whether a driver is driving the non-motorized vehicle corresponding to the non-motorized vehicle area comprises: determining a first center point of the non-motorized vehicle area and a second center point of at least one traffic participant area near the non-motorized vehicle area; finding, from the at least one second center point, a matching center point of the first center point; and, if the vertical coordinate value of the first center point is less than the vertical coordinate value of the matching center point, determining that a driver is driving the non-motorized vehicle.
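
The matching step in this claim can be illustrated with a short Python sketch. It is only one reading of the claims: boxes are assumed to be `(x_min, y_min, x_max, y_max)` tuples, the helper names `center`, `find_matching_center`, and `is_being_driven` are hypothetical, and a match is assumed to exist only when the same second center point is nearest both horizontally and by Euclidean distance. The direction of the vertical axis is not fixed here; the comparison is applied literally to whatever coordinate values are passed in.

```python
import math
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max)
Point = Tuple[float, float]


def center(box: Box) -> Point:
    """Center point of an axis-aligned box."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def find_matching_center(first: Point, seconds: List[Point]) -> Optional[Point]:
    """Return the second center point nearest to `first` both horizontally
    and in Euclidean distance, or None if no single point is nearest on
    both criteria (an assumed reading of the claim wording)."""
    if not seconds:
        return None
    nearest_horizontal = min(seconds, key=lambda p: abs(p[0] - first[0]))
    nearest_spatial = min(seconds, key=lambda p: math.dist(p, first))
    if nearest_horizontal == nearest_spatial:
        return nearest_horizontal
    return None


def is_being_driven(vehicle_box: Box, participant_boxes: List[Box]) -> bool:
    """Apply the claimed comparison: the vehicle's center has a smaller
    vertical coordinate value than the matching participant's center."""
    first = center(vehicle_box)
    match = find_matching_center(first, [center(b) for b in participant_boxes])
    return match is not None and first[1] < match[1]
```

Returning `None` when the two nearest-point criteria disagree is a conservative choice made for this sketch; an implementation could instead combine the two distances into a single score.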

7. The method according to claim 1, characterized in that performing semantic segmentation on the image to be processed to obtain the semantic segmentation result comprises: dividing the image to be processed into N image sub-blocks, where N is an integer greater than 1; determining, based on the N image sub-blocks and the position information of each image sub-block, an input sequence with position encoding; inputting the input sequence into an encoder for encoding processing to obtain an encoded sequence with semantic context information; and inputting the encoded sequence and class embedding information into a decoder for decoding to obtain the semantic segmentation result; and performing target recognition on the image to be processed based on the semantic segmentation result to obtain the target recognition result comprises: inputting the semantic segmentation result into a target recognition network for target recognition processing to obtain the target recognition result of the image to be processed.
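
The first two steps of this claim, dividing the image into N sub-blocks and pairing each sub-block with its position, can be sketched in Python as follows. This is an illustrative sketch only: it assumes a single-channel image stored as a list of rows, square non-overlapping sub-blocks, and a row-major ordering. The function name `split_into_patches` is hypothetical. The sketch stops at attaching a `(row, col)` grid position to each flattened sub-block; how that position is turned into an encoding, and the encoder and decoder themselves, are not specified here and are left out.

```python
from typing import List, Tuple

Image = List[List[float]]  # assumed single-channel image, one list per row
Token = Tuple[Tuple[int, int], List[float]]  # ((row, col), flattened pixels)


def split_into_patches(image: Image, patch: int) -> List[Token]:
    """Divide an image into non-overlapping patch x patch sub-blocks.

    Returns the sub-blocks in row-major order, each flattened to a vector
    and paired with its (row, col) position in the patch grid.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    if patch <= 0 or height % patch or width % patch:
        raise ValueError("image dimensions must be multiples of the patch size")
    sequence: List[Token] = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            flat = [
                image[y][x]
                for y in range(top, top + patch)
                for x in range(left, left + patch)
            ]
            sequence.append(((top // patch, left // patch), flat))
    return sequence
```

For a 4 x 4 image and `patch=2`, the function yields N = 4 sub-blocks of four pixels each, ordered left to right and top to bottom.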

8. The method according to any one of claims 1-7, characterized in that, after determining the detection result based on the semantic segmentation result and the target recognition result, the method further comprises: determining violation status information, the violation status information including at least one of the following: speeding, driving against traffic, illegally carrying passengers, and not wearing a helmet; and / or performing facial recognition on a traffic participant who has committed a violation to determine identity information of the traffic participant who has committed the violation; and / or performing license plate recognition on a non-motorized vehicle involved in a violation to determine license plate information of the non-motorized vehicle involved in the violation; and uploading at least one of the violation status information, the identity information, and the license plate information to a database.

9. An image detection device, characterized in that the device is used to detect illegal driving on sidewalks and comprises: an acquisition module configured to acquire an image to be processed; a semantic segmentation module configured to perform semantic segmentation on the image to be processed to obtain a semantic segmentation result, the semantic segmentation result including a sidewalk area; a target recognition module configured to perform target recognition on the image to be processed based on the semantic segmentation result to obtain a target recognition result; and a detection module configured to determine a detection result based on the semantic segmentation result and the target recognition result when the target recognition result includes at least one traffic participant area and / or at least one non-motorized vehicle area; wherein the detection result is used to characterize whether there is a violation on the sidewalk; the violation includes the presence of a driver operating a non-motorized vehicle within the sidewalk area; the driver operating the non-motorized vehicle is determined based on a first center point of the non-motorized vehicle area and a second center point of at least one traffic participant area near the non-motorized vehicle area; the vertical coordinate value of the first center point is less than the vertical coordinate value of a matching center point; and the matching center point is the second center point that is closest to the first center point in the horizontal direction and in spatial distance.

10. An electronic device, characterized by comprising: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to invoke the instructions stored in the memory to execute the method according to any one of claims 1 to 8.

11. A computer-readable storage medium storing computer program instructions thereon, characterized in that the computer program instructions, when executed by a processor, implement the method according to any one of claims 1 to 8.