Stranger identification method and device, and electronic device
By extracting and matching the clothing and body shape features of target pedestrians, the problem of low accuracy in facial feature recognition in existing technologies has been solved, enabling fast and accurate stranger identification and early warning.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- BEIJING SUNNIWELL DIGITAL S&T CO LTD
- Filing Date
- 2023-07-10
- Publication Date
- 2026-06-16
Abstract
Description
Technical Field
[0001] This application relates to the field of computer vision, and more particularly to a method, device and electronic device for stranger identification.
Background Technology
[0002] In the field of computer vision, facial recognition is commonly used to determine whether a person entering a designated area is a stranger who is not permitted to enter. This method of facial recognition requires acquiring subtle facial features, such as the specific location and shape of facial organs like the eyes, mouth, and nose. However, this type of facial recognition method for identifying strangers is susceptible to problems if the person's face is obscured, their back is to the image acquisition device, or the image captured by the device is blurry. In such cases, it may fail to accurately identify whether the person entering the designated area is a stranger.
[0003] For scenarios where it is necessary to accurately identify whether people entering a designated area are strangers and to issue targeted warnings, existing technologies for stranger identification based on facial features suffer from low accuracy.
Summary of the Invention
[0004] In view of this, embodiments of this application provide a stranger identification method, apparatus, and electronic device to solve the problem of low stranger identification accuracy in the prior art.
[0005] In a first aspect, embodiments of this application provide a stranger identification method, wherein the method includes:
[0006] Acquire a target image of a specified area captured by a target image acquisition device, wherein the target image contains pedestrians entering the specified area;
[0007] Based on the image to be detected, the shape features of the target pedestrian are extracted using a preset feature extraction model. The shape features include at least the clothing features and body shape features of the target pedestrian.
[0008] The physical characteristics of the target pedestrian are matched with physical characteristics in a preset pedestrian feature database. Based on the matching results, it is determined whether the target pedestrian is a stranger.
[0009] In conjunction with the first aspect, in a second possible embodiment, the extraction of the target pedestrian's external features includes:
[0010] Based on the image data acquired by the target image acquisition device, the target rectangular area image of each pedestrian in the image data is output, wherein each target rectangular area image contains first identification information for distinguishing different target pedestrians;
[0011] The images of each target rectangular region are input into a preset feature extraction model to extract the feature vectors of a preset length corresponding to each target pedestrian, and a second identification information is added to the feature vectors of the preset length corresponding to each target pedestrian to distinguish the target pedestrian.
[0012] Based on the correspondence between each of the first identification information and each of the second identification information, the shape feature vector belonging to the same target pedestrian is determined as the shape feature of the target pedestrian.
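The correspondence step above can be illustrated with a minimal sketch. It assumes the second identification information is a key attached to each feature vector and that the correspondence is available as a mapping from second identifiers to first identifiers. The function name `collect_shape_features` and the data layout are hypothetical and are not taken from the application.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Mapping, Sequence, Tuple


def collect_shape_features(
    tagged_vectors: Iterable[Tuple[Hashable, Sequence[float]]],
    second_to_first: Mapping[Hashable, Hashable],
) -> Dict[Hashable, List[List[float]]]:
    """Group feature vectors by the pedestrian they belong to.

    tagged_vectors: (second_id, feature_vector) pairs from the extraction model.
    second_to_first: assumed correspondence from second IDs to first IDs.
    """
    features = defaultdict(list)
    for second_id, vector in tagged_vectors:
        first_id = second_to_first.get(second_id)
        if first_id is None:
            continue  # assumption: vectors without a known crop are ignored
        features[first_id].append(list(vector))
    return dict(features)
```

Vectors that share a first identifier end up in the same list, which can then be treated as the shape feature of that pedestrian.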
[0013] In conjunction with the second possible embodiment of the first aspect, in the third possible embodiment, the image data acquired by the target image acquisition device includes historically acquired sample images, and the shape features in the preset pedestrian feature database are predetermined by the following method:
[0014] Using the sample images and the preset feature extraction model, the shape feature vectors of each target pedestrian contained in each sample image are extracted;
[0015] Based on the shape feature vectors of each target pedestrian, a feature vector library is constructed to obtain the preset pedestrian feature library.
[0016] In conjunction with the first aspect, in a fourth possible embodiment, the step of matching the physical features of the target pedestrian with physical features in a preset pedestrian feature database, and determining whether the target pedestrian is a stranger based on the matching result, includes:
[0017] Determine the cosine similarity between the shape features of the target pedestrian and each shape feature in the preset pedestrian feature library, and determine the number of target cosine similarities that meet the preset cosine similarity threshold requirement among each cosine similarity.
[0018] If the number of target cosine similarities is less than a preset statistical threshold, then the target pedestrian is determined to be a stranger.
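The matching rule in this embodiment can be sketched as follows. The sketch assumes that "meeting the threshold requirement" means a cosine similarity greater than or equal to the threshold. The default values `0.8` and `1` are illustrative placeholders, not values from the application, and the function names are hypothetical.

```python
import math
from typing import Iterable, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector matches nothing
    return dot / (norm_a * norm_b)


def is_stranger(
    query: Sequence[float],
    library: Iterable[Sequence[float]],
    similarity_threshold: float = 0.8,  # placeholder value
    count_threshold: int = 1,           # placeholder value
) -> bool:
    """Return True when too few library entries are similar to the query."""
    matches = sum(
        1 for ref in library
        if cosine_similarity(query, ref) >= similarity_threshold
    )
    return matches < count_threshold
```

With `count_threshold = 1`, a pedestrian is treated as known as soon as one library entry is similar enough. A larger value requires several agreeing entries before the pedestrian is accepted.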
[0019] In conjunction with the first aspect, in a fifth possible embodiment, the method further includes:
[0020] Pedestrian detection is performed on each frame of image data to be detected acquired by the target image acquisition device, and the detection results are output. The detection results of the target frame image data include: the target rectangular area image occupied by the target pedestrian in the target frame image, and the timestamp of the target frame.
[0021] Based on the pedestrian detection results of each frame of image data, the earliest appearance time of the target pedestrian and the target rectangular area image occupied by the target pedestrian in each frame of image data are determined.
[0022] Taking the earliest appearance time of the target pedestrian as the starting point, obtain the largest target rectangular area image in each target rectangular area within a preset time period thereafter;
[0023] The step of extracting the shape features of the target pedestrian based on the image to be detected and using a preset feature extraction model includes:
[0024] The image of the largest target rectangular region is input into the preset feature extraction model to extract the shape features of the target pedestrian.
[0025] In conjunction with the first possible embodiment or the fourth possible embodiment of the first aspect, in the sixth possible embodiment, the method further includes:
[0026] If the target pedestrian is a stranger, an early warning signal is output based on a preset early warning strategy.
[0027] In a second aspect, embodiments of this application provide a stranger identification device, wherein the device includes:
[0028] The acquisition module is used to acquire a target image of a specified area captured by the target image acquisition device, wherein the target image contains pedestrians entering the specified area;
[0029] The feature extraction module is used to extract the shape features of the target pedestrian based on the image to be detected using a preset feature extraction model, wherein the shape features include at least: the clothing features of the target pedestrian and the body shape features of the target pedestrian;
[0030] The stranger identification module is used to match the physical features of the target pedestrian with the physical features in a preset pedestrian feature database, and determine whether the target pedestrian is a stranger based on the matching results.
[0031] In conjunction with the second aspect, in a second possible embodiment, the feature extraction module is specifically used for:
[0032] Based on the image data acquired by the target image acquisition device, the target rectangular area image of each pedestrian in the image data is output, wherein each target rectangular area image contains first identification information for distinguishing different target pedestrians;
[0033] The images of each target rectangular region are input into a preset feature extraction model to extract the feature vectors of a preset length corresponding to each target pedestrian, and a second identification information is added to the feature vectors of the preset length corresponding to each target pedestrian to distinguish the target pedestrian.
[0034] Based on the correspondence between each of the first identification information and each of the second identification information, the shape feature vector belonging to the same target pedestrian is determined as the shape feature of the target pedestrian.
[0035] In conjunction with the second possible embodiment of the second aspect, in the third possible embodiment, the image data acquired by the target image acquisition device includes historically acquired sample images, and the feature extraction module is specifically used for:
[0036] Using the sample images and the preset feature extraction model, the shape feature vectors of each target pedestrian contained in each sample image are extracted;
[0037] Based on the shape feature vectors of each target pedestrian, a feature vector library is constructed to obtain the preset pedestrian feature library.
[0038] In conjunction with the second aspect, in the fourth possible embodiment, the stranger identification module is specifically used to determine the cosine similarity between the shape features of the target pedestrian and each shape feature in the preset pedestrian feature library, and to determine the number of target cosine similarities that meet the preset cosine similarity threshold requirement among each cosine similarity.
[0039] If the number of target cosine similarities is less than a preset statistical threshold, then the target pedestrian is determined to be a stranger.
[0040] In conjunction with the second aspect, in a fifth possible embodiment, the device further includes:
[0041] The pedestrian detection module is used to perform pedestrian detection based on each frame of image data to be detected acquired by the target image acquisition device, and output the detection results. The detection results of the target frame image data include: the target rectangular area image occupied by the target pedestrian in the target frame image, and the timestamp of the target frame.
[0042] The determination module is used to determine the earliest appearance time of the target pedestrian and the target rectangular area image occupied by the target pedestrian in each frame of image data based on the pedestrian detection results of each frame of image data.
[0043] The largest target rectangular region image acquisition module is used to acquire the largest target rectangular region image in each of the target rectangular regions within a preset time period, starting from the earliest appearance time of the target pedestrian.
[0044] The feature extraction module is also used to input the image of the largest target rectangular region into the preset feature extraction model to extract the shape features of the target pedestrian.
[0045] In conjunction with the first or fourth possible embodiments of the second aspect, in a sixth possible embodiment, the apparatus further includes:
[0046] The early warning module is used to output an early warning signal based on a preset early warning strategy if the target pedestrian is a stranger.
[0047] In a third aspect, embodiments of this application provide an electronic device, wherein the electronic device includes:
[0048] A processor; and
[0049] A memory storing a program,
[0050] The program includes instructions that, when executed by the processor, cause the processor to perform the stranger identification method according to the first aspect.
[0051] In a fourth aspect, embodiments of this application provide a non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the stranger identification method according to the first aspect.
[0052] The beneficial effects of this application are:
[0053] This application provides a method, apparatus, and electronic device for stranger identification. The method includes: acquiring a target image of a designated area captured by a target image acquisition device; extracting physical features such as clothing and body shape of a target pedestrian using a preset feature extraction model based on the target image; matching the physical features of the target pedestrian with physical features in a preset pedestrian feature database; and determining whether the target pedestrian is a stranger based on the matching result. The embodiments of this application quickly extract large features of the target pedestrian, such as clothing and body shape. These large physical features occupy a large image area and are less likely to be occluded, so stranger identification based on them is more distinct and less prone to error. Such identification can quickly determine whether a person entering the designated area is a stranger and issue an early warning, addressing the low accuracy of stranger identification in existing technologies.
Attached Figure Description
[0054] Further details, features, and advantages of this application are disclosed in the following description of exemplary embodiments in conjunction with the accompanying drawings, in which:
[0055] Figure 1 is a possible flowchart of the stranger identification method provided in an embodiment of this application;
[0056] Figure 2 is a possible flowchart of the stranger identification method provided in an embodiment of this application;
[0057] Figure 3 is a possible schematic diagram of the pedestrian detection method provided in an embodiment of this application;
[0058] Figure 4 is a schematic diagram of a possible process for extracting the external features of a target pedestrian provided in an embodiment of this application;
[0059] Figure 5 is a schematic diagram of a possible process for extracting external features provided in an embodiment of this application;
[0060] Figure 6 is a schematic diagram of a possible shape feature vector search provided in an embodiment of this application;
[0061] Figure 7 is a schematic diagram of a possible external shape feature provided in an embodiment of this application;
[0062] Figure 8 is a schematic diagram of a possible feature matching process provided in an embodiment of this application;
[0063] Figure 9 is a schematic diagram of a possible logical structure of the stranger identification device provided in an embodiment of this application;
[0064] Figure 10 is a schematic diagram of a possible logical structure of an electronic device provided in an embodiment of this application.
Detailed Implementation
[0065] Embodiments of this application will now be described in more detail with reference to the accompanying drawings. While some embodiments of this application are shown in the drawings, it should be understood that this application can be implemented in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided to provide a more thorough and complete understanding of this application. It should be understood that the drawings and embodiments of this application are for illustrative purposes only and are not intended to limit the scope of protection of this application.
[0066] It should be understood that the steps described in the method embodiments of this application may be performed in different orders and/or in parallel. Furthermore, the method embodiments may include additional steps and/or omit some of the steps shown. The scope of this application is not limited in this respect.
[0067] The term "comprising" and its variations as used herein are open-ended, meaning "including but not limited to". The term "based on" means "at least partially based on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Definitions of other terms will be given in the following description. It should be noted that the concepts of "first", "second", etc., mentioned in this application are used only to distinguish different devices, modules, or units, and are not intended to limit the order of functions performed by these devices, modules, or units or their interdependencies.
[0068] It should be noted that the terms "a" and "a plurality of" used in this application are illustrative rather than restrictive, and those skilled in the art should understand that, unless otherwise expressly indicated in the context, they should be understood as "one or more".
[0069] As described in the background section, in the field of computer vision, in video security scenarios targeting a specific area, facial feature recognition is typically performed on images captured by security equipment in that area to determine the identity of the person being filmed and whether that person is already on the whitelist. If not, the person is determined to be a stranger; if so, the person is determined to be a non-stranger.
[0070] This method of stranger identification based on facial recognition technology typically requires extracting a large number of subtle facial features, such as the distance between the pupils of the eyes and the distance between the nose and mouth, for accurate identification. However, for an image of a designated area captured by an image acquisition device, the face usually occupies only a small area, and the similarity between facial features is extremely high, meaning the differences between features are small. Therefore, to accurately identify people entering a designated area, it is usually necessary to use a large number of captured images, extract a large number of subtle facial features, and then compare these extracted facial features to obtain accurate identification results. However, if the facial area in the image captured by the image acquisition device is blurry, or if the facial area is occluded, or if the face is facing away from the image acquisition device, the accuracy of stranger identification based on facial recognition will be low in scenarios where accurate identification of whether a person entering a designated area is a stranger is required.
[0071] For example, in one possible application scenario, the stranger identification method provided in this application embodiment may be applied to hazardous work spaces. Such hazardous work spaces include machine shops, construction sites, etc. Taking a machine shop as an example, machine shops typically refuse entry to unauthorized personnel from a safety perspective to prevent individuals without professional operating skills from entering the workshop and operating related equipment, which could lead to personal injury or property damage. For such application scenarios, it is necessary to accurately identify and warn personnel entering the scene, thereby reducing losses caused by unauthorized personnel entering such areas. However, the technical solution for stranger identification based on facial recognition requires facial recognition of personnel entering the scene. As can be seen from the above-described implementation principle of facial recognition, if the face of the person entering the scene is obscured, or if the facial area in the image is blurry, the solution for stranger identification and warning based on facial recognition is unlikely to quickly warn of unauthorized personnel entering the scene, resulting in a low accuracy rate for stranger identification.
[0072] In view of this, embodiments of this application provide a stranger identification method, which can be applied to any electronic device with image recognition capabilities, including but not limited to personal mobile terminals, image acquisition devices, computers, or servers. As shown in Figure 1, the stranger identification method provided in this application includes the following steps:
[0073] S11. Acquire the image of the specified area to be detected acquired by the target image acquisition device;
[0074] The image to be detected contains pedestrians entering the designated area.
[0075] S12. Based on the image to be detected, extract the shape features of the target pedestrian using a preset feature extraction model;
[0076] The physical characteristics of the target pedestrian include: the pedestrian's clothing characteristics and the pedestrian's body shape characteristics.
[0077] S13. Match the physical features of the target pedestrian with the physical features in the preset pedestrian feature database, and determine whether the target pedestrian is a stranger based on the matching results.
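The flow of steps S11 to S13 can be outlined with a short skeleton. The extraction model and the library matcher are passed in as callables because this application does not fix their implementation at this point. The name `run_identification` and both parameter names are hypothetical.

```python
from typing import Callable, Iterable, List, Sequence

Feature = Sequence[float]


def run_identification(
    images: Iterable[object],
    extract_features: Callable[[object], Feature],
    matches_library: Callable[[Feature], bool],
) -> List[bool]:
    """Return one stranger flag per image to be detected."""
    flags = []
    for image in images:                       # S11: images already acquired
        features = extract_features(image)     # S12: clothing and body-shape features
        flags.append(not matches_library(features))  # S13: no match means stranger
    return flags
```

Keeping the two callables separate reflects the structure of the method: the feature extraction model and the matching rule can be replaced independently.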
[0078] When faced with the need to identify strangers in specific scenarios, this application abandons the traditional facial feature recognition approach and instead selects features such as the target pedestrian's clothing and body shape as identification features. Compared to facial features, these features occupy a larger area in the image, are more distinct, and are less likely to be occluded. Therefore, by using this application's embodiment, large features such as the target pedestrian's clothing and body shape are quickly extracted. These large shape features occupy a large image area, making stranger identification based on them more obvious and reducing the possibility of errors. Stranger identification based on these large shape features can quickly determine whether a person entering a designated area is a stranger and issue an alert, thus improving the low accuracy of stranger identification in existing technologies.
[0079] To clearly illustrate steps S11 to S13 above, the specific implementation details of each step will be explained below:
[0080] In step S11, the target image acquisition device refers to a device capable of converting the received light signal into an image. It can be any type of image acquisition device, including personal mobile terminal cameras, bullet cameras, and PTZ cameras. The target image acquisition device can be a single image acquisition device or multiple image acquisition devices. The designated area refers to an area set by the user that requires video surveillance using one or more image acquisition devices. In this embodiment, the designated area can be an area requiring specific permissions to enter, including: spaces for storing important property, hazardous work areas, etc.
[0081] In step S11, the image to be detected in the specified area acquired by the target image acquisition device can be an image to be detected acquired in real time by the target image acquisition device, or an image to be detected acquired historically by the target image acquisition device. The number of images to be detected is not limited; that is, in this embodiment, the image to be detected can be a single frame or multiple frames. When the image to be detected is a multi-frame image, the temporal order of each frame can be determined based on the timestamps of each frame.
[0082] When performing step S11, the image data acquired by the target image acquisition device can be obtained in real time, or the image data acquired by the target image acquisition device can be obtained from a server or other storage device. The specific method of acquiring the image data to be detected can be flexibly selected according to the actual situation, and this application does not make specific limitations.
[0083] In step S11, the image to be detected contains a pedestrian appearing in a designated area. This can mean the pedestrian is completely exposed within the field of view of the target image acquisition device, or that the degree to which the pedestrian is exposed within the field of view of the target image acquisition device is greater than a preset exposure level. For example, the target image acquisition device could capture the entire target pedestrian or only a portion of the target pedestrian.
[0084] In one possible embodiment, the image to be detected may not contain pedestrians appearing in the specified area. In this case, a pedestrian detection algorithm can be used to detect pedestrians in the image to be detected acquired by the target image acquisition device, and images containing pedestrians appearing in the specified area can be selected as target detection images. In this embodiment, the image to be detected may be image data cropped from the original image data acquired by the target image acquisition device, containing only the target pedestrians. Specifically, as shown in Figure 2, the following steps can be used to obtain an image containing only the target pedestrian:
[0085] S111. Perform pedestrian detection on each frame of the image data to be detected acquired by the target image acquisition device, and output the detection results;
[0086] The detection results of the target frame image data include: the target rectangular region occupied by the target pedestrian in the target frame image, and the timestamp of the target frame.
[0087] S112. Based on the pedestrian detection results of each frame of image data, determine the earliest appearance time of the target pedestrian and the target rectangular area image occupied by the target pedestrian in each frame of image data.
[0088] S113. Taking the earliest appearance time of the target pedestrian as the starting point, obtain the image of the largest target rectangular region in each target rectangular region within the preset time period thereafter.
[0089] The image of the largest target rectangular region can be used as input to the preset feature extraction model, and then step S12 is executed to extract the shape features of the target pedestrian.
[0090] In this embodiment of the application, pedestrian detection is performed on each frame of the image to be detected acquired by the target image acquisition device. At this time, the image to be detected is the original image data acquired by the target image acquisition device without size transformation or image processing.
[0091] Based on the embodiments of this application, when performing step S12, the image to be detected mentioned in step S12 is the largest target rectangular region image obtained after processing by the above steps S111, S112 and S113.
[0092] In this embodiment of the application, when performing step S111, pedestrian detection is performed on each frame of the image data to be detected. Specifically, a target recognition model can be applied to the image data acquired by the target image acquisition device to identify the bounding rectangle of each pedestrian's location, as shown in Figure 3. By identifying the position of the pedestrian's bounding rectangle in image coordinates and then outputting the values of the pixels contained in that rectangular region, the image of the area occupied by the target pedestrian can be cropped from the original image. In this embodiment, identification information can also be added to each rectangular region. As shown in Figure 3, each box represents a target pedestrian, and the number in the upper left corner of the rectangle represents the pedestrian's identification information. For specific target recognition models, please refer to existing target detection technologies; this application will not elaborate further.
[0093] Since the cropped rectangular area image only contains pedestrians and is only a part of the original image, the time when the pedestrian appeared is no longer included in this part of the image. When there are many cropped rectangular area images, it is impossible to accurately distinguish the specific appearance time of each pedestrian. Therefore, in this embodiment, a timestamp can be added to each cropped rectangular area image containing a pedestrian, based on the original timestamp of the original image where the pedestrian is located. The added timestamp is consistent with the original timestamp of the original image. That is, in step S111, the timestamp of the target frame is used to characterize the image timestamp of the rectangular area image occupied by each target pedestrian in the target frame, and also to characterize the time that each target pedestrian was exposed to the specified area.
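A minimal sketch of cropping a rectangular region and carrying the frame timestamp over to the crop is given below. It assumes the frame is a row-major grid of pixel values and the box is given as `(x, y, width, height)` in image coordinates. The `PedestrianCrop` record and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class PedestrianCrop:
    pedestrian_id: int
    timestamp: float                 # copied from the original frame
    box: Tuple[int, int, int, int]   # (x, y, width, height), assumed layout
    pixels: List[List[int]]


def crop_pedestrian(
    frame: Sequence[Sequence[int]],
    frame_timestamp: float,
    pedestrian_id: int,
    box: Tuple[int, int, int, int],
) -> PedestrianCrop:
    """Cut the rectangular region out of the frame and keep the frame's timestamp."""
    x, y, width, height = box
    pixels = [list(row[x:x + width]) for row in frame[y:y + height]]
    return PedestrianCrop(pedestrian_id, frame_timestamp, box, pixels)
```

Because the crop carries the same timestamp as its source frame, the appearance time of each pedestrian can still be recovered after the original frame is discarded.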
[0094] For the same target pedestrian, when performing step S112, the rectangular area images of the target pedestrian at different timestamps can be obtained based on the target rectangular area images occupied by the same target pedestrian in each frame image identified in multiple frames.
[0095] In this embodiment, after a first target pedestrian is detected in the nth frame image, the detection result of the (n-1)th frame image acquired by the same target image acquisition device is obtained from the database, and it is determined whether the first target pedestrian appears multiple times within the specified area. If the detection result of the (n-1)th frame image also includes the first target pedestrian, it can be determined that the first target pedestrian appears multiple times within the specified area. In that case, the nth frame image data and the (n-1)th frame image data can be output for use in step S113. If the detection result of the (n-1)th frame image does not include the first target pedestrian, the first target pedestrian first appeared in the specified area at the time node corresponding to the nth frame.
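The frame-to-frame comparison can be sketched as follows. The sketch assumes the detector assigns identifiers that stay stable across frames, which this passage implies but does not specify, and that frames arrive in time order. The function name is hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def earliest_appearance(
    frames: Iterable[Tuple[float, Set[int]]],
) -> Dict[int, float]:
    """Map each pedestrian ID to the timestamp of the first frame it appears in.

    frames: (timestamp, detected_ids) pairs in time order.
    """
    first_seen: Dict[int, float] = {}
    previous_ids: Set[int] = set()
    for timestamp, current_ids in frames:
        # IDs absent from frame n-1 are candidates for a first appearance in frame n.
        for pedestrian_id in current_ids - previous_ids:
            # setdefault keeps the earliest time if a pedestrian leaves and returns.
            first_seen.setdefault(pedestrian_id, timestamp)
        previous_ids = set(current_ids)
    return first_seen
```

For example, with frames at 0.0 s, 0.5 s, 1.0 s and 1.5 s containing IDs `{1}`, `{1, 2}`, `{2}` and `{1, 2, 3}`, pedestrian 1 keeps its first timestamp of 0.0 s even though it is absent at 1.0 s.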
[0096] When performing step S112, the earliest appearance time of the target pedestrian is determined based on the pedestrian detection results of each frame of image data. That is, the earliest appearance time of the target pedestrian is determined based on the timestamp of each target frame, which is the earliest time node of the image data in which the target pedestrian was detected.
[0097] Based on this, when executing step S113, starting from the earliest time node, the largest target rectangular area image is acquired within a preset time period. The largest target rectangular area image refers to the image data within the target rectangular area with the largest pixel size. The preset time period can be a time period set based on experience or a time period determined based on historical experimental results. For example, the preset time period can be 10 seconds. In essence, when executing step S113, it acquires the original image frame with the largest proportion of the target pedestrian in the entire image frame among all frames acquired by the target image acquisition device within 10 seconds after the appearance of the target pedestrian, and a partial image of the area occupied by the target pedestrian in that original frame. If the target image acquisition device is a fixed-focus image acquisition device, then the largest target rectangular area image is the partial image data of the image occupied by the target pedestrian in the image acquired by the target image acquisition device when the target pedestrian is closest to the image acquisition device.
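Step S113 can be sketched as a selection over the detections of one pedestrian. The sketch assumes "largest pixel size" means the largest box area (width times height) and uses the 10-second example period as the default. The function name and the `(timestamp, box)` layout are hypothetical.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]   # (x, y, width, height), assumed layout
Detection = Tuple[float, Box]     # (frame timestamp in seconds, box)


def largest_box_in_window(
    detections: List[Detection],
    window_seconds: float = 10.0,
) -> Optional[Detection]:
    """Pick the largest-area detection within the window after first appearance."""
    if not detections:
        return None
    start = min(timestamp for timestamp, _ in detections)
    in_window = [
        d for d in detections if start <= d[0] <= start + window_seconds
    ]
    # Area is width * height; ties resolve to the earliest listed detection.
    return max(in_window, key=lambda d: d[1][2] * d[1][3])
```

A detection recorded after the window closes is ignored even if its box is larger, which bounds how long the system waits before it extracts features.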
[0098] By using the embodiments of this application, pedestrian detection is performed on multiple frames of images acquired by the target image acquisition device, and then the target area image with the largest area occupied by the target pedestrian is obtained as the input of the feature extraction model. That is, the feature extraction is performed using the largest target area image, which makes the extracted features more accurate and helps to improve the accuracy of subsequent stranger identification.
[0099] In one possible embodiment, when performing step S12, the original image to which the image with the largest target rectangular area occupied by the target pedestrian belongs can be obtained, and the coordinates of the rectangular area occupied by the target pedestrian in the original image can be obtained. Then, the original image and the coordinates of the rectangular area are input into a preset feature extraction model to extract the shape features of the target pedestrian.
[0100] In another possible embodiment, when performing step S12, the target rectangular area image with the largest area can be selected based on the coordinates of the rectangular area occupied by the target pedestrian in the original image, and then the target rectangular area image with the largest area can be input into a preset feature extraction model to extract the shape features of the target pedestrian.
[0101] In step S12, the preset feature extraction model can be a pre-designed neural network model for extracting specified features from an image. Specifically, in this embodiment, the preset feature extraction model is a CNN (Convolutional Neural Network) model. The specific type, number of layers, weights of each layer, and other parameters of the convolutional network model can be flexibly designed according to different actual application scenarios, and this application does not impose strict limitations. For example, when performing step S12, as shown in Figure 4, the images of each target rectangular region can be input into the convolutional neural network to obtain the corresponding feature vectors.
[0102] Specifically, in one possible embodiment, as shown in Figure 5, step S12 is implemented through the following steps:
[0103] S121. Based on the image data acquired by the target image acquisition device, output the target rectangular area image occupied by each pedestrian in the image data, wherein each target rectangular area image contains first identification information used to distinguish different target pedestrians;
[0104] S122. Input the image of each target rectangular region into the preset feature extraction model, extract the feature vector of preset length corresponding to each target pedestrian, and add second identification information to the feature vector of preset length corresponding to each target pedestrian to distinguish the target pedestrian.
[0105] S123. Based on the correspondence between each first identification information and each second identification information, determine the shape feature vector belonging to the same target pedestrian as the shape feature of the target pedestrian.
[0106] Specifically, when performing step S121, the target rectangular region image occupied by the target pedestrian can be cropped from the original image data with reference to the descriptions of steps S111 to S112 above. For example, as shown in Figure 3, each target rectangular region image contains first identification information used to distinguish different target pedestrians; in Figure 3, the first identifier of the rectangular area occupied by the woman making a phone call in the center of the image is 35.
[0107] When performing step S122, as shown in Figure 4, the target rectangular region image occupied by pedestrian number 35 in Figure 3 is input into a convolutional neural network, and the output is a feature vector of a preset length. A longer preset length carries more information and yields a more accurate result, but at the cost of longer processing time and lower efficiency. A shorter preset length means less data to process but a less accurate result. In this embodiment, based on practical experience, the preset length can be chosen as 256, i.e., a feature vector of length 256 is output. This length is sufficient for the target rectangular region images occupied by most pedestrians and can capture the features visible to the human eye.
[0108] Since there are cases where a preset feature extraction model processes multiple target rectangular region images in parallel, in this embodiment of the application, when performing step S122, a second identification information for distinguishing the target pedestrian can be added to the feature vector of a preset length corresponding to each target pedestrian.
[0109] There should be a one-to-one correspondence between the first identification information and the second identification information. Therefore, when executing step S123, the feature vectors corresponding to the target pedestrian can be determined based on the correspondence between the first identification information and the second identification information.
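The bookkeeping in steps S121 to S123 can be sketched as follows. This is an illustrative sketch under stated assumptions: `extract_feature` is a hypothetical stand-in for the preset CNN (it uses a pixel histogram only so that the example runs without a trained model), and copying the first identifier into the second identifier is one simple way to obtain the one-to-one correspondence described above.

```python
import numpy as np

FEATURE_LEN = 256  # preset feature length named in the embodiment


def extract_feature(crop):
    """Hypothetical stand-in for the CNN: maps one crop to a fixed-length vector."""
    hist, _ = np.histogram(crop, bins=FEATURE_LEN, range=(0, 256))
    return hist.astype(np.float32)


def extract_shape_features(crops_with_first_id):
    """crops_with_first_id: list of (first_id, crop) pairs, i.e. the output of S121."""
    # S122: extract one vector per crop and tag it with second identification
    # information. Here the second id is a copy of the first id (an assumption).
    tagged_vectors = [
        {"second_id": first_id, "vector": extract_feature(crop)}
        for first_id, crop in crops_with_first_id
    ]
    # S123: use the id correspondence to attribute each vector to its pedestrian.
    first_ids = {first_id for first_id, _ in crops_with_first_id}
    return {
        item["second_id"]: item["vector"]
        for item in tagged_vectors
        if item["second_id"] in first_ids
    }
```

Because every vector carries its own identifier, the mapping back to the pedestrian does not depend on the order in which a parallel model returns its outputs.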
[0110] In step S12, the physical characteristics of the target pedestrian refer to features that occupy a proportion of the target pedestrian's body greater than a preset proportion threshold. The physical characteristics of the target pedestrian include at least the target pedestrian's clothing characteristics and body shape characteristics. For example, as shown in Figure 7, the clothing characteristics of the target pedestrian can include items that occupy a large proportion of the pedestrian's overall body, such as safety helmets, gloves, work vests, and anti-static shoes. If the regulation clothing has a specified color, the clothing characteristics can also include the color of the clothing. The body shape characteristics of the target pedestrian can include height, weight, and the like.
[0111] The target body shape feature can be chosen as the body shape feature that is most prevalent among the individuals on the whitelist of the specified area. For example, if the individuals on the whitelist are all relatively thin, weight can be selected as the body shape feature; if they are all relatively tall, height can be selected as the body shape feature for identification. In one possible embodiment, the physical characteristics of the target pedestrian can be extracted by jointly considering several salient features of the individuals on the whitelist of the specified area.
[0112] In this embodiment, the process of extracting features using a preset feature extraction model to obtain the shape feature vector involves assigning values to the neurons of the input layer of a convolutional neural network model using the image data of each pixel in the target rectangular region image. After calculation by multiple layers of convolutional neural networks, a preset length of numerical values is output. Then, a feature vector is constructed based on this preset length of numerical values. Taking a preset length of 256 as an example, the output result is as follows:
[0113] Shape feature vector = [x1, x2, x3, ..., x256]
[0114] In this embodiment, the physical characteristics of the target pedestrian are determined based on the common clothing features or general body shape characteristics of staff in a designated area. The pedestrian features used for stranger identification are therefore built from the common clothing or general body shape characteristics of that area. Physical characteristics are generally conspicuous, occupy a large proportion of the image, and are less likely to be completely occluded. Traditional face recognition methods need a large number of images to determine whether an intruder is a stranger when the face is occluded. By contrast, the feature extraction method provided in this embodiment can extract the corresponding physical characteristics from fewer images, which improves the efficiency and accuracy of stranger identification.
[0115] Based on steps S121 to S123 above, in one possible embodiment, sample images historically acquired by the target image acquisition device can be used to perform steps S121 to S123, extracting the shape feature vectors of whitelisted individuals in a specified area from the historical sample images. Then, based on the extracted shape feature vectors, a feature vector library is constructed to obtain a preset pedestrian feature library for subsequent matching. In this embodiment, sample images refer to historically acquired images in which the shape features of each person appearing in the image meet the preset shape feature requirements.
[0116] By using the embodiments of this application, a large number of physical feature vectors that meet the requirements for entering the designated area can be extracted from a large number of sample images of individuals who meet the requirements for entering the designated area, and then a feature library for matching can be constructed based on these large number of physical feature vectors. This allows for rapid determination of whether a target pedestrian entering the designated area is a stranger based on feature vector matching, which helps to improve the accuracy of stranger identification.
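Constructing the preset pedestrian feature library from whitelist feature vectors can be sketched as follows. This is a minimal sketch, not the applicant's implementation: the function name is hypothetical, and storing L2-normalized rows is a design choice assumed here (it lets later cosine similarity be computed as a plain dot product); the source does not specify the storage format.

```python
import numpy as np


def build_feature_library(vectors):
    """Stack whitelist shape feature vectors into an (N, D) array of unit-length rows."""
    lib = np.asarray(vectors, dtype=np.float32)
    if lib.ndim != 2:
        raise ValueError("expected a list of equal-length feature vectors")
    # Normalize each row; the clip guards against division by zero for all-zero rows.
    norms = np.linalg.norm(lib, axis=1, keepdims=True)
    return lib / np.clip(norms, 1e-12, None)
```

In practice each row would be a length-256 vector extracted from a historical sample image of a person permitted to enter the designated area.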
[0117] Based on this, step S13 can be achieved through the following steps:
[0118] S131. Determine the cosine similarity between the shape features of the target pedestrian and each shape feature in the preset pedestrian feature database, and determine, among the cosine similarities, the number of target cosine similarities that meet the preset cosine similarity threshold requirement.
[0119] S132. If the number of target cosine similarities is less than the preset statistical threshold, the target pedestrian is determined to be a stranger.
[0120] In this embodiment, the feature vector obtained using the preset feature extraction model is a mathematical vector. Specifically, the shape features extracted from the target pedestrian in the image to be detected are a feature vector of a preset length, and the feature vectors extracted from the preset pedestrian feature library from historical sample images are also individual feature vectors of a preset length. Therefore, when executing step S131, the cosine similarity between the target pedestrian's shape feature vector and the shape feature vectors in the preset pedestrian feature library can be determined by calculating the cosine similarity between the feature vectors. Based on the calculated cosine similarities, the number of target cosine similarities that meet the preset cosine similarity threshold requirement is counted.
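For reference, the quantities used in steps S131 and S132 can be written out as follows. The symbols are notation introduced here for illustration and do not appear in the source: $x$ is the shape feature vector of the target pedestrian, $f_j$ is the $j$-th of $N$ vectors in the preset pedestrian feature library, $n$ is the preset length (256 in the example), $\tau$ is the preset cosine similarity threshold, and $K$ is the preset statistical threshold.

```latex
\[
\operatorname{sim}(x, f_j)
  = \frac{x \cdot f_j}{\lVert x \rVert \, \lVert f_j \rVert}
  = \frac{\sum_{i=1}^{n} x_i f_{j,i}}
         {\sqrt{\sum_{i=1}^{n} x_i^{2}} \, \sqrt{\sum_{i=1}^{n} f_{j,i}^{2}}}
\]
\[
C = \bigl|\{\, j \in \{1, \dots, N\} : \operatorname{sim}(x, f_j) \ge \tau \,\}\bigr|,
\qquad
\text{the target pedestrian is a stranger if } C < K .
\]
```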
[0121] For example, as shown in Figure 7, the feature vector is used to perform vector retrieval on the pedestrian feature database. Alternatively, as shown in Figure 8, if the left image corresponds to a feature vector of length 256 extracted from the target pedestrian in the image to be detected, that feature vector is compared with each feature vector in the preset pedestrian feature library, and the number of cosine similarities that reach 0.8 is counted. If that number is greater than 3, the pedestrian in the left image is not a stranger.
[0122] When performing step S132, if the number of target cosine similarities is less than a preset statistical threshold, then the target pedestrian is determined to be a stranger. Both the preset statistical threshold and the preset cosine similarity threshold can be set based on practical experience, and this application does not impose specific limitations.
[0123] By using the embodiments of this application, the cosine similarity can be calculated by comparing the appearance features of the target pedestrian with the appearance features in the preset pedestrian feature database, and the number of target cosine similarities that meet the preset cosine similarity threshold requirement can be counted. The larger the number of target cosine similarities, the higher the similarity between the target pedestrian and the people on the whitelist who are allowed to enter the designated area. In this way, it is possible to quickly determine whether the target pedestrian is a stranger, which helps to improve the accuracy of stranger identification.
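The matching in steps S131 and S132 can be sketched as follows. This is a minimal sketch under stated assumptions: the function name and return shape are hypothetical, and the default values 0.8 and 3 are taken from the Figure 8 example rather than being required values. The comparison follows step S132 (a count below the statistical threshold means stranger).

```python
import numpy as np


def is_stranger(query, library, sim_threshold=0.8, count_threshold=3):
    """Return (stranger_flag, number_of_matches) for one query feature vector."""
    q = np.asarray(query, dtype=np.float64)
    lib = np.asarray(library, dtype=np.float64)
    # S131: cosine similarity between the query and every library vector.
    # The small constant avoids division by zero for all-zero vectors.
    sims = lib @ q / (np.linalg.norm(lib, axis=1) * np.linalg.norm(q) + 1e-12)
    n_hits = int(np.sum(sims >= sim_threshold))
    # S132: too few sufficiently similar whitelist vectors means a stranger.
    return n_hits < count_threshold, n_hits
```

Counting matches instead of relying on a single best match makes the decision less sensitive to one accidental near-duplicate in the library, at the cost of requiring several samples per permitted appearance.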
[0124] In one possible embodiment, after step S13 identifies the target pedestrian as a stranger, a stranger warning can be issued through the following steps:
[0125] If the target pedestrian is a stranger, an early warning signal will be output based on the preset early warning strategy.
[0126] The preset early warning strategies include voice warnings, signal light warnings, and warning information prompts. In this embodiment, a voice warning signal can be sent to the stranger to alert them to the unauthorized entry and prompt them to leave. In another possible embodiment, an intrusion warning message can be sent to the security system to remind security personnel to handle the intrusion. The specific preset early warning strategy can be set flexibly according to the actual scenario, and this application does not impose strict limitations. With the embodiments of this application, early warnings can be issued based on the stranger identification results obtained by this application, which addresses the low accuracy of existing stranger early warning technologies.
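A configurable early warning strategy of this kind can be sketched as a simple dispatch table. Everything in this sketch is hypothetical: the strategy names, the message texts, and the functions are placeholders for whatever loudspeaker, signal light, or security system interface a deployment actually uses.

```python
from typing import Callable, Dict, List


def voice_warning(pedestrian_id: int) -> str:
    # Placeholder for driving a loudspeaker near the designated area.
    return f"voice: pedestrian {pedestrian_id}, unauthorized entry, please leave"


def light_warning(pedestrian_id: int) -> str:
    # Placeholder for switching on a warning light.
    return f"light: warning light on for pedestrian {pedestrian_id}"


def security_message(pedestrian_id: int) -> str:
    # Placeholder for pushing an intrusion message to the security system.
    return f"message: stranger {pedestrian_id} reported to security"


STRATEGIES: Dict[str, Callable[[int], str]] = {
    "voice": voice_warning,
    "light": light_warning,
    "security_message": security_message,
}


def issue_warnings(pedestrian_id: int, stranger: bool,
                   strategy_names: List[str]) -> List[str]:
    """Run each configured strategy, but only when the pedestrian is a stranger."""
    if not stranger:
        return []
    return [STRATEGIES[name](pedestrian_id) for name in strategy_names]
```

Keeping the strategies in a table lets a site enable any combination of warnings through configuration alone.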
[0127] In a second aspect, embodiments of this application provide a stranger identification device. As shown in Figure 9, the stranger identification device 900 includes the following components:
[0128] The acquisition module 901 is used to acquire the image to be detected of a specified area acquired by the target image acquisition device, wherein the image to be detected contains pedestrians entering the specified area;
[0129] The feature extraction module 902 is used to extract the shape features of the target pedestrian based on the image to be detected and using a preset feature extraction model. The shape features include at least the clothing features and body shape features of the target pedestrian.
[0130] The stranger identification module 903 is used to match the physical features of the target pedestrian with the physical features in the preset pedestrian feature database, and determine whether the target pedestrian is a stranger based on the matching results.
[0131] In one possible embodiment, the feature extraction module 902 is specifically used for:
[0132] Based on the image data acquired by the target image acquisition device, the target rectangular area image of each pedestrian in the image data is output. Each target rectangular area image contains first identification information used to distinguish different target pedestrians.
[0133] The images of each target rectangular region are input into a preset feature extraction model to extract a feature vector of preset length corresponding to each target pedestrian. The feature vector of preset length corresponding to each target pedestrian contains second identification information used to distinguish the target pedestrian.
[0134] Based on the correspondence between each first identification information and each second identification information, the shape feature vector belonging to the same target pedestrian is determined as the shape feature of the target pedestrian.
[0135] In one possible embodiment, the image data acquired by the target image acquisition device includes historically acquired sample images, and the feature extraction module 902 is specifically used for:
[0136] Using sample images and a pre-defined feature extraction model, the shape feature vectors of each target pedestrian contained in each sample image are extracted.
[0137] Based on the shape feature vectors of each target pedestrian, a feature vector library is constructed to obtain a preset pedestrian feature library.
[0138] In one possible embodiment, the stranger recognition module 903 is specifically used to determine the cosine similarity between the shape features of the target pedestrian and each shape feature in the preset pedestrian feature database, and to determine, among the cosine similarities, the number of target cosine similarities that meet the preset cosine similarity threshold requirement.
[0139] If the number of target cosine similarities is less than the preset statistical threshold, the target pedestrian is determined to be a stranger.
[0140] In one possible embodiment, the device 900 further includes:
[0141] The pedestrian detection module 904 is used to perform pedestrian detection based on each frame of image data to be detected acquired by the target image acquisition device, and output the detection results. The detection results of the target frame image data include: the target rectangular area image occupied by the target pedestrian in the target frame image, and the timestamp of the target frame.
[0142] The determination module 905 is used to determine the earliest appearance time of the target pedestrian and the target rectangular region image occupied by the target pedestrian in each frame of image data based on the pedestrian detection results of each frame of image data.
[0143] The largest target rectangular region image acquisition module 906 is used to acquire the largest target rectangular region image in each target rectangular region within a preset time period, starting from the earliest appearance time of the target pedestrian.
[0144] The feature extraction module 902 is also used to input the image of the largest target rectangular region into the preset feature extraction model to extract the shape features of the target pedestrian;
[0145] The early warning module 907 is used to output an early warning signal based on a preset early warning strategy if the target pedestrian is determined to be a stranger based on the matching results.
[0146] The names of the messages or information exchanged between multiple devices in the embodiments of this application are for illustrative purposes only and are not intended to limit the scope of these messages or information.
[0147] An exemplary embodiment of this application also provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor. The memory stores a computer program executable by the at least one processor, the computer program being executed by the at least one processor to cause the electronic device to perform a method according to an embodiment of this application.
[0148] An exemplary embodiment of this application also provides a non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a computer's processor, is used to cause the computer to perform a method according to an embodiment of this application.
[0149] An exemplary embodiment of this application also provides a computer program product, including a computer program, wherein, when executed by a computer's processor, the computer program is used to cause the computer to perform a method according to an embodiment of this application.
[0150] Referring to Figure 10, a structural block diagram of an electronic device 1000 that can serve as a server or client of this application will now be described; it is an example of a hardware device that can be applied to various aspects of this application. The electronic device is intended to represent various forms of digital electronic computer devices, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device can also represent various forms of mobile devices, such as personal digital processors, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely examples and are not intended to limit the implementation of the application described and / or claimed herein.
[0151] As shown in Figure 10, the electronic device 1000 includes a computing unit 1001, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 1002 or a computer program loaded into a random access memory (RAM) 1003 from a storage unit 1008. The RAM 1003 may also store various programs and data required for the operation of the device 1000. The computing unit 1001, ROM 1002, and RAM 1003 are interconnected via a bus 1004. An input / output (I / O) interface 1005 is also connected to the bus 1004.
[0152] Multiple components in the electronic device 1000 are connected to the I / O interface 1005, including an input unit 1006, an output unit 1007, a storage unit 1008, and a communication unit 1009. The input unit 1006 can be any type of device capable of inputting information to the electronic device 1000. The input unit 1006 can receive input digital or character information and generate key signal inputs related to user settings and / or function control of the electronic device. The output unit 1007 can be any type of device capable of presenting information and may include, but is not limited to, a display, speaker, video / audio output terminal, vibrator, and / or printer. The storage unit 1008 may include, but is not limited to, magnetic disks and optical disks. The communication unit 1009 allows the electronic device 1000 to exchange information / data with other devices through computer networks such as the Internet and / or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers, and / or chipsets, such as Bluetooth™ devices, WiFi devices, WiMax devices, cellular communication devices, and / or the like.
[0153] The computing unit 1001 can be various general-purpose and / or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 1001 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various special-purpose artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 1001 performs the various methods and processes described above. For example, in some embodiments, the aforementioned stranger identification method can be implemented as a computer software program tangibly contained in a machine-readable medium, such as storage unit 1008. In some embodiments, part or all of the computer program can be loaded and / or installed on the electronic device 1000 via ROM 1002 and / or communication unit 1009. In some embodiments, the computing unit 1001 can be configured to perform the aforementioned stranger identification method by any other suitable means (e.g., by means of firmware).
[0154] The program code used to implement the methods of this application may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing device, such that when executed by the processor or controller, the functions / operations specified in the flowcharts and / or block diagrams are implemented. The program code may be executed entirely on a machine, partially on a machine, as a standalone software package partially on a machine and partially on a remote machine, or entirely on a remote machine or server.
[0155] In the context of this application, a machine-readable medium can be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. Machine-readable media can be, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fibers, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
[0156] As used in this application, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, device, and / or apparatus (e.g., disk, optical disk, memory, programmable logic device (PLD)) for providing machine instructions and / or data to a programmable processor, including machine-readable media that receive machine instructions as machine-readable signals. The term "machine-readable signal" refers to any signal for providing machine instructions and / or data to a programmable processor.
[0157] To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having: a display device for displaying information to the user (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor); and a keyboard and pointing device (e.g., a mouse or trackball) through which the user provides input to the computer. Other types of devices can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form (including sound input, voice input, or tactile input).
[0158] The systems and technologies described herein can be implemented in computing systems that include backend components (e.g., as a data server), or computing systems that include middleware components (e.g., an application server), or computing systems that include frontend components (e.g., a user computer with a graphical user interface or web browser through which a user can interact with implementations of the systems and technologies described herein), or any combination of such backend, middleware, or frontend components. The components of the system can be interconnected via digital data communication of any form or medium (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.
[0159] Computer systems can include clients and servers. Clients and servers are generally located far apart and typically interact through communication networks. Client-server relationships are created by computer programs running on the respective computers and having a client-server relationship with each other.
Claims
1. A method for stranger identification, characterized in that the method includes: acquiring an image to be detected of a designated area captured by a target image acquisition device, wherein the image to be detected contains pedestrians entering the designated area, and the designated area includes a hazardous work space; extracting, based on the image to be detected and using a preset feature extraction model, the shape features of the target pedestrian, wherein the shape features include at least the clothing features and body shape features of the target pedestrian, the clothing features include the features of the safety helmet, gloves, work vest, and anti-static shoes and the color features of clothing, and the body shape features of the target pedestrian are the most common features among the body shape features of people on the whitelist of the designated area; matching the shape features of the target pedestrian with the shape features in a preset pedestrian feature database; and determining, based on the matching results, whether the target pedestrian is a stranger.
2. The method according to claim 1, characterized in that, The extraction of the physical features of the target pedestrian includes: Based on the image data acquired by the target image acquisition device, the target rectangular area image of each pedestrian in the image data is output, wherein each target rectangular area image contains first identification information for distinguishing different target pedestrians; The images of each target rectangular region are input into a preset feature extraction model to extract the feature vectors of a preset length corresponding to each target pedestrian, and a second identification information is added to the feature vectors of the preset length corresponding to each target pedestrian to distinguish the target pedestrian. Based on the correspondence between each of the first identification information and each of the second identification information, the shape feature vector belonging to the same target pedestrian is determined as the shape feature of the target pedestrian.
3. The method according to claim 2, characterized in that, The image data acquired by the target image acquisition device includes historically acquired sample images, and the shape features in the preset pedestrian feature database are determined in advance through the following method: Using the sample images and the preset feature extraction model, the shape feature vectors of each target pedestrian contained in each sample image are extracted; Based on the shape feature vectors of each target pedestrian, a feature vector library is constructed to obtain the preset pedestrian feature library.
4. The method according to claim 1, characterized in that, The step of matching the physical features of the target pedestrian with physical features in a preset pedestrian feature database, and determining whether the target pedestrian is a stranger based on the matching results, includes: Determine the cosine similarity between the shape features of the target pedestrian and each shape feature in the preset pedestrian feature library, and determine the number of target cosine similarities that meet the preset cosine similarity threshold requirement among each cosine similarity. If the number of target cosine similarities is less than a preset statistical threshold, then the target pedestrian is determined to be a stranger.
5. The method according to claim 1, characterized in that, The method further includes: Pedestrian detection is performed on each frame of image data to be detected acquired by the target image acquisition device, and the detection results are output. The detection results of the target frame image data include: the target rectangular area image occupied by the target pedestrian in the target frame image, and the timestamp of the target frame. Based on the pedestrian detection results of each frame of image data, the earliest appearance time of the target pedestrian and the target rectangular area image occupied by the target pedestrian in each frame of image data are determined. Taking the earliest appearance time of the target pedestrian as the starting point, obtain the largest target rectangular area image in each target rectangular area within a preset time period thereafter; The step of extracting the shape features of the target pedestrian based on the image to be detected using a preset feature extraction model includes: The image of the largest target rectangular region is input into the preset feature extraction model to extract the shape features of the target pedestrian.
6. The method according to claim 1 or 4, characterized in that, The method further includes: If the target pedestrian is a stranger, an early warning signal is output based on a preset early warning strategy.
7. A stranger identification device, characterized in that the device includes: an acquisition module, used to acquire an image to be detected of a designated area captured by the target image acquisition device, wherein the image to be detected contains pedestrians entering the designated area, and the designated area includes a hazardous work space; a feature extraction module, used to extract the shape features of the target pedestrian based on the image to be detected using a preset feature extraction model, wherein the shape features include at least the clothing features and body shape features of the target pedestrian, the clothing features include the features of the safety helmet, gloves, work vest, and anti-static shoes and the color features of clothing, and the body shape features of the target pedestrian are the most common body shape features among the people on the whitelist of the designated area; and a stranger identification module, used to match the shape features of the target pedestrian with the shape features in a preset pedestrian feature database, and determine whether the target pedestrian is a stranger based on the matching results.
8. The device according to claim 7, characterized in that the feature extraction module is specifically used for: outputting, based on the image data acquired by the target image acquisition device, the target rectangular area image of each pedestrian in the image data, wherein each target rectangular area image contains first identification information for distinguishing different target pedestrians; inputting each target rectangular area image into the preset feature extraction model to extract a feature vector of a preset length corresponding to each target pedestrian, and adding second identification information to the feature vector of the preset length corresponding to each target pedestrian to distinguish the target pedestrian; and determining, based on the correspondence between each piece of first identification information and each piece of second identification information, the shape feature vectors belonging to the same target pedestrian as the shape features of that target pedestrian; the image data acquired by the target image acquisition device includes historically acquired sample images, and the feature extraction module is specifically used for: extracting, using the sample images and the preset feature extraction model, the shape feature vectors of each target pedestrian contained in each sample image; and constructing a feature vector library based on the shape feature vectors of each target pedestrian to obtain the preset pedestrian feature database; the stranger identification module is specifically used to determine the cosine similarity between the shape features of the target pedestrian and each shape feature in the preset pedestrian feature database, to determine the number of target cosine similarities that meet a preset cosine similarity threshold requirement among the cosine similarities, and, if the number of target cosine similarities is less than a preset statistical threshold, to determine that the target pedestrian is a stranger; the device further includes: a pedestrian detection module, used to perform pedestrian detection on each frame of image data to be detected acquired by the target image acquisition device, and to output detection results, wherein the detection result for a target frame of image data includes the target rectangular area image occupied by the target pedestrian in the target frame image and the timestamp of the target frame; a determination module, used to determine the earliest appearance time of the target pedestrian and the target rectangular area image occupied by the target pedestrian in each frame of image data based on the pedestrian detection results of each frame of image data; a largest target rectangular area image acquisition module, used to acquire, starting from the earliest appearance time of the target pedestrian, the largest target rectangular area image among the target rectangular area images within a preset time period thereafter; the feature extraction module is also used to input the largest target rectangular area image into the preset feature extraction model to extract the shape features of the target pedestrian; and an early warning module, used to output an early warning signal based on a preset early warning strategy if the target pedestrian is a stranger.
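The matching rule recited for the stranger identification module (count the database entries whose cosine similarity meets a threshold, then compare that count with a statistical threshold) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, "meets the threshold requirement" is assumed to mean greater than or equal to the threshold, and zero-length vectors are assumed to have similarity 0.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def is_stranger(
    query: Sequence[float],
    feature_db: Sequence[Sequence[float]],
    similarity_threshold: float,
    count_threshold: int,
) -> bool:
    """Return True when fewer than `count_threshold` database vectors
    reach `similarity_threshold` against the query vector."""
    matches = sum(
        1
        for reference in feature_db
        if cosine_similarity(query, reference) >= similarity_threshold
    )
    return matches < count_threshold
```

Requiring several matches rather than a single one makes the decision less sensitive to one outlier entry in the database; the concrete threshold values are left open by the claim and would need to be tuned per deployment.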
9. An electronic device, comprising: a processor; and a memory storing a program, wherein the program includes instructions that, when executed by the processor, cause the processor to perform the method according to any one of claims 1-6.
10. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method according to any one of claims 1-6.