A processing method and electronic device

By acquiring image sequences and detecting human pose data from multiple image frames, and utilizing convolutional neural networks for segmentation and dynamic adjustment of detection timing, the problem of misjudgment in sleep state detection by electronic devices is solved, achieving accurate state recognition and resource conservation.

CN122244949A · Pending · Publication Date: 2026-06-19 · SMARTER SILICON (SHANGHAI) TECH CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: SMARTER SILICON (SHANGHAI) TECH CO LTD
Filing Date: 2026-03-27
Publication Date: 2026-06-19


IPC: G06V40/20; G06V10/26; G06V10/82; G06V10/14; G06N3/0464


Similar Technology Patents

Building construction monitoring method
CN119251761A
A construction monitoring method
CN119251761B
Specific action detection method and device, equipment and storage medium
CN116434325A
High-altitude operation lifeline early warning method
CN119068412A
Human body direction determination method and device, screen control method and device and related equipment
CN115909418A


AI Technical Summary

⚠Technical Problem

Existing electronic devices rely on fixed triggering rules for sleep state detection, resulting in a high false alarm rate and difficulty in distinguishing whether a user is temporarily away or has truly entered a sleep state.

⚗Method used

By acquiring image sequences, detecting human pose data in multiple image frames, segmenting image frames using convolutional neural networks, determining visible human body areas and occluded areas, and dynamically adjusting detection timing by combining key point localization and contour information, the target state is identified based on human pose data, and corresponding working mode control commands are triggered.

🎯Benefits of technology

It achieves accurate identification of user status, reduces misjudgment, improves the intelligence level of equipment and user experience, saves resources, adapts to detection under multiple lighting conditions, and dynamically optimizes detection timing.

✦ Generated by Eureka AI based on patent content.


Abstract

This application discloses a processing method and an electronic device, relating to the field of computer vision technology, including: acquiring an image sequence; detecting multiple image frames in the image sequence to determine human pose data; determining whether a target object is in a target state based on the human pose data; and triggering a control command associated with the working mode corresponding to the target state in response to the target object being in a target state.


Description

Technical Field

[0001] This application relates to the field of image processing technology, and in particular to a processing method and an electronic device.

Background Technology

[0002] In existing intelligent control solutions for electronic devices, the state of the detected object is determined by fixed triggering rules, resulting in a high rate of misjudgment. For example, in sleep state detection scenarios, a rule that infers sleep from the duration of user inactivity cannot readily distinguish a user who has temporarily left from one who has truly fallen asleep, which easily leads to misjudgments.

Summary of the Invention

[0003] In view of the above problems, this application provides a processing method and an electronic device, the specific solution of which is as follows:

[0004] The first aspect of this application provides a processing method, including:

[0005] Obtain image sequences;

[0006] Detect multiple image frames in the image sequence to determine human pose data;

[0007] Based on the human posture data, determine whether the target object is in the target state;

[0008] In response to the target object being in the target state, a control command associated with the working mode corresponding to the target state is triggered.

[0009] In one possible implementation, detecting multiple image frames in the image sequence to determine human pose data includes:

[0010] A convolutional neural network is invoked to segment the image frame, resulting in a mask image of the image frame. Each pixel in the mask image is marked with a different pixel value to indicate the visible human body area, the human body occlusion area, and the background.

[0011] Based on the mask image and the image frame, determine the key point location information of the visible human body area and the human body outline of the occluded human body area;

[0012] The key point positioning information includes the coordinates of at least one key point on the human body; the human posture data includes key point positioning information and human body contour.

[0013] In one possible implementation, determining whether the target object is in a target state based on the human posture data includes:

[0014] Obtain a first inter-frame distance between corresponding key points in the image frame and in its adjacent preceding image frame;

[0015] Obtain a second inter-frame distance between the human body contour in the image frame and the human body contour in its adjacent preceding image frame;

[0016] In response to both the first inter-frame distance and the second inter-frame distance satisfying a preset state condition in a target number of consecutive inter-frame detections, the target object is determined to be in the target state.
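The check in [0014] to [0016] can be sketched as follows. This is only an illustration under stated assumptions: the application does not name a distance metric or a concrete state condition, so the mean Euclidean distance, the "both distances stay at or below a limit" condition, and the names `keypoint_distance`, `is_in_target_state`, `max_first`, `max_second`, and `required_count` are all hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def keypoint_distance(prev: Sequence[Point], curr: Sequence[Point]) -> float:
    """Mean Euclidean distance between corresponding key points (assumed metric)."""
    if len(prev) != len(curr) or not prev:
        raise ValueError("key point lists must be non-empty and of equal length")
    return sum(math.dist(p, c) for p, c in zip(prev, curr)) / len(prev)


def is_in_target_state(
    first_distances: Sequence[float],
    second_distances: Sequence[float],
    max_first: float,
    max_second: float,
    required_count: int,
) -> bool:
    """Return True once both distances stay within their limits for
    `required_count` consecutive inter-frame comparisons."""
    streak = 0
    for d1, d2 in zip(first_distances, second_distances):
        if d1 <= max_first and d2 <= max_second:
            streak += 1
            if streak >= required_count:
                return True
        else:
            streak = 0  # any violation resets the consecutive count
    return False
```

Resetting the streak on any violation is one reading of "consecutive"; a tolerance for isolated noisy frames would be a different design choice that the text neither requires nor excludes.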

[0017] In one possible implementation, based on the mask image and the image frame, determining the key point location information of the visible human body region and the human body contour of the occluded human body region includes:

[0018] Based on pixel values, the ratio of the number of pixels in the visible human body region to the total number of pixels in the mask image is calculated.

[0019] If the ratio is not less than the detection accuracy threshold, the key point location information of the visible human body region and the human body contour of the human body occlusion region are determined based on the mask image and the image frame.

[0020] In one possible implementation, determining the key point location information of the visible human body region based on the mask image and the image frame includes:

[0021] The visible human body region in the image frame is defined by the mask image to obtain a visible human body image;

[0022] The human pose recognition model is invoked to obtain the key point localization information based on the visible human image.

[0023] In one possible implementation, determining the human body contour of the occluded region based on the mask image and the image frame includes:

[0024] The human body occlusion area in the image frame is defined by the mask image to obtain a human body occlusion image;

[0025] The human body contour is obtained by adaptive filtering of the grayscale image of the occluded human body.

[0026] In one possible implementation, acquiring the image sequence includes:

[0027] Detect the current light intensity in the current environment;

[0028] In response to the current light intensity being less than the target light intensity threshold, a first type of image frame acquired by the first image sensor is obtained;

[0029] In response to the current light intensity being not less than the target light intensity threshold, a second type of image frame acquired by the second image sensor is obtained;

[0030] The image sequence includes a first type of image frames and / or a second type of image frames arranged in chronological order.

[0031] In one possible implementation, acquiring the image sequence includes:

[0032] In response to the arrival of a detection timing, the image sequence is acquired, wherein the detection timing is determined based on the target state features of the target object being photographed.

[0033] In one possible implementation, the control instructions include:

[0034] At least one of the following: a first control instruction for controlling an electronic device to enter the operating mode corresponding to the target state, or a second control instruction for controlling an associated device of the electronic device to enter the operating mode corresponding to the target state.

[0035] A second aspect of this application provides an electronic device, comprising: an image acquisition module, a posture detection module, a state recognition module, and a mode control module, wherein: the image acquisition module is used to acquire an image sequence; the posture detection module is used to detect multiple image frames in the image sequence to determine human posture data; the state recognition module is used to determine whether a target object is in a target state based on the human posture data; and the mode control module is used to trigger a control command associated with a working mode corresponding to the target state in response to the target object being in the target state.

[0036] Using the above technical solutions, the present application provides a processing method and electronic device that acquires an image sequence; detects multiple image frames in the image sequence to determine human posture data; determines whether the target object is in a target state based on the human posture data; and triggers a control command associated with the working mode corresponding to the target state in response to the target object being in a target state.

Attached Figure Description

[0037] The above and other features, advantages, and aspects of the embodiments of this disclosure will become more apparent when taken in conjunction with the accompanying drawings and the following detailed description. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic, and that components and elements are not necessarily drawn to scale.

[0038] Figure 1 is a flowchart illustrating a processing method provided in this application;

[0039] Figure 2 is a flowchart illustrating another processing method provided in this application;

[0040] Figure 3 is a flowchart illustrating another processing method provided in this application;

[0041] Figure 4 is a flowchart illustrating another processing method provided in this application;

[0042] Figure 5 is a flowchart illustrating another processing method provided in this application;

[0043] Figure 6a is a flowchart illustrating the specific implementation of a processing method provided in this application;

[0044] Figure 6b is a schematic diagram illustrating the effect of pre-detection and segmentation provided in an embodiment of this application;

[0045] Figure 7 is a schematic diagram of the structure of an electronic device provided in this application.

Detailed Implementation

[0046] The embodiments of this application are described below with reference to the accompanying drawings. The terminology used in the implementation section of this application is for explaining specific embodiments only and is not intended to limit the scope of this application.

[0047] The embodiments of this application will now be described with reference to the accompanying drawings. Those skilled in the art will recognize that, with technological advancements and the emergence of new scenarios, the technical solutions provided in the embodiments of this application are equally applicable to similar technical problems.

[0048] The terms "first," "second," etc., used in the specification, claims, and accompanying drawings of this application are used to distinguish similar elements and are not necessarily used to describe a specific order or sequence. It should be understood that such terms are interchangeable where appropriate; this is merely a way of distinguishing elements with the same properties in the description of embodiments of this application. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion, so that a process, method, system, product, or apparatus that comprises a series of units is not necessarily limited to those units, but may include other units not explicitly listed or inherent to those processes, methods, products, or apparatuses.

[0049] Referring to Figure 1, a flowchart of a processing method provided in an embodiment of this application is shown. As shown in Figure 1, the processing method may include steps 101 to 104, which are described in detail below.

[0050] Step 101: Obtain the image sequence.

[0051] In this embodiment, an electronic device (such as a smartphone or tablet) acquires an image sequence through an image sensor, wherein the image sequence includes multiple image frames arranged in a time sequence.

[0052] In one alternative embodiment, the image frames can be acquired by a camera built into the electronic device. For example, in response to detecting that the user has not touched the screen for a long time, or in response to reaching a preset detection period, the electronic device initiates a sleep detection process, controls the camera to capture images, and obtains an image sequence.

[0053] Step 102: Detect multiple image frames in the image sequence to determine human pose data.

[0054] In this embodiment, multiple image frames in the image sequence are detected separately to determine human pose data. The human pose data is used to characterize the pose information of the target user. For example, the human pose data includes the position information of human key points and human contour information.

[0055] Optionally, before performing human pose data detection, human feature detection can be performed on the image frame to determine whether a human target, i.e., the human features of the target user, exists in the image frame. If a human target is detected, the human pose data detection continues. If no human target is detected, the human pose data detection ends, avoiding the need to perform subsequent, more complex human pose data detection in unmanned scenarios, thereby saving computing resources and power consumption.

[0056] Step 103: Based on human posture data, determine whether the target object is in the target state.

[0057] In this embodiment, the target state can be any preset state. Optionally, the preset states include a sleep state, a static state, and a movement state. Specifically, human posture data can intuitively reflect the degree of physical activity and posture change characteristics of the target subject. Different states correspond to different posture behaviors. For example, when a user is asleep, physical activity is significantly reduced, and the posture tends to be stable and maintained for a relatively long time. When a user is static, the change in body posture is minimal, exhibiting a relatively static characteristic. When a user is in a movement state, the posture exhibits a periodic change pattern, such as walking, running, and other movement patterns.

[0058] Based on this, by analyzing human posture data, it is possible to identify whether a user is in a target state. For example, if the human posture data of multiple consecutive image frames shows a stable posture that is maintained for a relatively long time, it is determined that the target subject is in a sleeping state; if the human posture data shows a characteristic of maintaining the same posture for a long time, it is determined that the target subject is in a stationary state; if the human posture data shows a characteristic of periodic changes, it is determined that the target subject is in a moving state.

[0059] Step 104: In response to the target object being in the target state, trigger the control command associated with the working mode corresponding to the target state.

[0060] In this embodiment, the working mode corresponding to the target state includes at least one type. Different working modes can have different mode characteristics. For example, the working modes corresponding to the sleep state include sleep mode, low power mode, and hibernation mode with different power consumption levels. As another example, the working modes corresponding to the exercise state include exercise monitoring mode and health data recording mode with different functional configurations. The exercise monitoring mode is used to monitor exercise data in real time, and the health data recording mode is used to record real-time exercise data.

[0061] In this embodiment, upon determining that the target object is in a target state, a control command associated with the corresponding working mode of the target state is triggered. This control command is used to control the relevant device to enter the working mode corresponding to the target state. For example, when the target object is determined to be in a sleep state, the electronic device triggers a control command associated with the working mode corresponding to the sleep state. This control command can be used to control the electronic device to enter sleep mode, low-power mode, or hibernation mode to save power. As another example, when the target object is determined to be in a movement state, the electronic device triggers a control command associated with the working mode corresponding to the movement state. This control command can be used to control the electronic device to enter the exercise monitoring mode and the health data recording mode.

[0062] As can be seen from the above technical solutions, the processing method provided in this application collects image sequences, detects and determines human posture data in the image frames, determines whether the target object is in a target state based on the human posture data, and triggers control commands associated with the working mode corresponding to the target state in response to the target object being in a target state. Since human posture data can intuitively reflect the user's activity state, analyzing human posture data can accurately identify whether the user has entered a target state, thereby realizing intelligent perception of the user's target state. Furthermore, when the target object is in a target state, triggering control commands associated with the working mode corresponding to the target state can promptly switch the device or associated devices to a working mode that matches the target state, avoiding resource waste. Compared with state recognition methods based on fixed rules, this application uses real-time human posture data as the basis for state recognition, which can effectively avoid misjudgment and improve the intelligence level of the device and the user experience.
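Steps 101 to 104, together with the optional human pre-check of [0055], can be sketched as a small pipeline. This is a hedged outline rather than the applicant's implementation: the detectors are injected as callables, and the names `run_detection`, `has_human`, `detect_pose`, `classify_state`, and the `commands` mapping are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence


def run_detection(
    frames: Sequence[object],
    has_human: Callable[[object], bool],
    detect_pose: Callable[[object], object],
    classify_state: Callable[[Sequence[object]], Optional[str]],
    commands: Dict[str, str],
) -> Optional[str]:
    """Return the control command for the recognised state, or None."""
    poses: List[object] = []
    for frame in frames:  # step 101 supplied the frames
        if not has_human(frame):
            # [0055]: no human target, so skip the costlier pose detection
            return None
        poses.append(detect_pose(frame))  # step 102
    state = classify_state(poses)  # step 103
    if state is None:
        return None
    # step 104: look up the command tied to the state's working mode
    return commands.get(state)
```

For example, `commands` might map a hypothetical `"sleep"` state to a low-power command string, while states without an entry trigger nothing.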

[0063] Based on the above embodiments, see Figure 2, which is a flowchart of another processing method provided in an embodiment of this application. Figure 2 illustrates the specific implementation process of step 101, namely obtaining the image sequence. As shown in Figure 2, the processing method may include steps 201 to 204, which are described in detail below.

[0064] Step 201: In response to the arrival of the detection time, trigger the acquisition of the image sequence.

[0065] In this embodiment, the detection timing is determined based on the target state features of the target object being photographed. The target state features characterize the historical behavioral habits of that object in relation to the target state. Taking the target state as a sleep state as an example, the target state features may include sleep time features and sleep light intensity features. The sleep time features may include the user's historical average sleep time, and the sleep light intensity features may include the user's historical average sleep light intensity.

[0066] Specifically, the detection timing is dynamically determined based on the degree of matching between the current environmental information and the target state features. For example, the electronic device acquires the current time and current light intensity, then calculates the time difference between the current time and the sleep time feature and the light intensity difference between the current light intensity and the sleep light intensity feature. Based on the time difference and the light intensity difference, it determines the detection cycle, i.e., the interval between two adjacent detections. The detection cycle length is positively correlated with the light intensity difference; that is, the greater the difference between the current light intensity and the user's usual sleep light intensity, the longer the detection cycle. The detection cycle length likewise grows with the time difference; that is, the closer the current time is to the user's usual sleep time, the shorter the detection cycle. In summary, increasing the detection frequency when the user is likely to fall asleep and decreasing it at other times ensures timely detection while saving computing resources and power.
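One way to turn the two differences into a detection cycle is sketched below. The application gives no formula, so the linear blend, the equal weighting, and every constant and name here (`detection_interval_s`, `min_interval_s`, `max_interval_s`, `lux_scale`) are assumptions. The sketch follows the stated behaviour that the cycle shortens as the current time approaches the usual sleep time and as the light level approaches the usual sleep light level.

```python
def circular_minutes_apart(now_min: int, usual_min: int) -> int:
    """Shortest distance in minutes between two times of day (wraps at midnight)."""
    diff = abs(now_min - usual_min) % 1440
    return min(diff, 1440 - diff)


def detection_interval_s(
    now_min: int,
    usual_sleep_min: int,
    now_lux: float,
    usual_sleep_lux: float,
    min_interval_s: float = 60.0,    # assumed shortest cycle
    max_interval_s: float = 1800.0,  # assumed longest cycle
    lux_scale: float = 200.0,        # assumed lux difference that saturates
) -> float:
    """Interval between two detections; grows with both differences."""
    # Normalise each difference to the range 0..1.
    time_term = circular_minutes_apart(now_min, usual_sleep_min) / 720.0
    light_term = min(abs(now_lux - usual_sleep_lux) / lux_scale, 1.0)
    mismatch = (time_term + light_term) / 2.0  # equal weighting is an assumption
    return min_interval_s + mismatch * (max_interval_s - min_interval_s)
```

Times are expressed as minutes since midnight so that a usual sleep time of 23:50 and a current time of 00:10 count as 20 minutes apart rather than almost a full day.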

[0067] It is understandable that target state features are not limited to sleep time features and sleep light intensity features, but can also include other features related to user behavior habits, such as the user's historical activity patterns and differences in work and rest on weekdays and holidays. These features can be used to dynamically adjust the detection timing to achieve personalized intelligent detection.

[0068] Step 202: Detect the current light intensity of the current environment.

[0069] In this embodiment, the ambient light intensity of the current environment is detected by an ambient light sensor. The ambient light sensor can be integrated into the electronic device to sense the lighting conditions of the surrounding environment in real time.

[0070] Step 203: In response to the current light intensity being less than the target light intensity threshold, acquire the first type of image frame collected by the first image sensor.

[0071] In this embodiment, the detected current light intensity is compared with a preset target light intensity threshold. If the current light intensity is less than the target light intensity threshold, it indicates that the ambient light is dim. In this case, a first image sensor is selected to acquire an image frame, which is recorded as a first type of image frame. Optionally, the first image sensor is an infrared light sensor, and the corresponding first type of image frame is an infrared image. The infrared light sensor can still effectively image under low light or even no light conditions, ensuring that clear user images can be acquired at night or in low-light environments.

[0072] Step 204: In response to the current light intensity being no less than the target light intensity threshold, acquire the second type of image frame collected by the second image sensor.

[0073] In this embodiment, if the current light intensity is not less than the target light intensity threshold, it indicates that the ambient light is sufficient. A second image sensor is then selected to acquire image frames, which are denoted as second-type image frames. Optionally, the second image sensor is a visible light image sensor, and the corresponding second-type image frames are visible light images. Visible light image sensors can provide images with rich colors and clear details in daylight or well-lit environments.

[0074] In summary, the image sequence includes first-type image frames and / or second-type image frames arranged in chronological order. Under different lighting conditions, the image sequence can be composed of a single type of image frame or a mixture of the two types of image frames, depending on the changes in lighting conditions during the detection process.
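Steps 202 to 204 amount to a threshold switch between two sensors. The sketch below assumes the sensors are exposed as zero-argument callables; `acquire_frame`, `acquire_sequence`, `read_infrared`, and `read_visible` are hypothetical names, and the `"infrared"` and `"visible"` tags follow the optional sensor types given in [0071] and [0073].

```python
from typing import Callable, List, Sequence, Tuple

Frame = Tuple[str, object]  # (frame type, image payload)


def acquire_frame(
    lux: float,
    threshold: float,
    read_infrared: Callable[[], object],
    read_visible: Callable[[], object],
) -> Frame:
    """Pick the sensor from the current light intensity."""
    if lux < threshold:
        return ("infrared", read_infrared())  # first type of image frame
    return ("visible", read_visible())        # second type: lux >= threshold


def acquire_sequence(
    lux_readings: Sequence[float],
    threshold: float,
    read_infrared: Callable[[], object],
    read_visible: Callable[[], object],
) -> List[Frame]:
    """Build a chronological sequence that may mix both frame types."""
    return [
        acquire_frame(lux, threshold, read_infrared, read_visible)
        for lux in lux_readings
    ]
```

A reading exactly equal to the threshold selects the second sensor, matching the "not less than" wording of step 204.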

[0075] It should be noted that the first and second image sensors can be built-in sensor components of the electronic device or external image acquisition devices independent of the electronic device. In an IoT home environment, the external image acquisition device can be a camera integrated into a smart camera, smart doorbell, smart peephole, or smart home appliance (such as a smart refrigerator or smart TV), or a security monitoring device deployed in the home environment. The electronic device obtains the image data it acquires by establishing a communication connection with the external device. Furthermore, the image sensor can be in a continuous acquisition state, with the electronic device directly acquiring its output image frames, or the electronic device can trigger the image sensor to acquire data when detection is needed; this application does not limit this.

[0076] As can be seen from the above technical solutions, the processing method provided in this application embodiment detects the light intensity of the current environment and adaptively selects an infrared light sensor or a visible light sensor for image acquisition, ensuring that effective image data can be obtained under all weather and multi-light conditions, providing high-quality image sequences for subsequent human posture data detection, and significantly improving environmental adaptability and detection reliability.

[0077] Furthermore, by dynamically determining the detection timing based on the historical behavioral habits of the target subject, the detection frequency is increased during the time when the user is likely to fall asleep, and decreased during the non-sleep time. This achieves on-demand allocation of detection resources. Compared with the fixed-cycle detection method, the adaptive detection timing determination method not only ensures the timeliness of detection, but also effectively saves computing resources and power consumption, thereby improving the intelligence level of the device and the user experience.

[0078] Based on this, the processing method provided in this application further includes: after determining that the target object is in the target state, updating the target state features of the target object. Specifically, when the electronic device determines that the target object has entered the target state, it records detection information such as the detection time and the current ambient light intensity, and fuses the recorded detection information with the stored target state features to update the historical habit data of the target object, thereby updating the target state features of the target object. For example, the detection time is included in the historical sleep time series, and the average sleep time is recalculated; the light intensity detected is included in the historical sleep light intensity series, and the average sleep light intensity is recalculated. The updated target state features will be used for the dynamic determination of the next detection timing, thus achieving continuous adaptive optimization of the target state features.
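The feature update in [0078] can be illustrated with an incremental mean, which gives the same result as appending to the historical series and recomputing the average. The `SleepFeatures` container and `update_features` function are hypothetical, and the sketch assumes times are stored as minutes on a scale that does not wrap at midnight.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SleepFeatures:
    mean_sleep_min: float  # historical average sleep time, in minutes
    mean_sleep_lux: float  # historical average sleep light intensity
    samples: int           # number of detections folded in so far


def update_features(
    features: SleepFeatures, detected_min: float, detected_lux: float
) -> SleepFeatures:
    """Fold one new detection into the running averages."""
    n = features.samples + 1
    return SleepFeatures(
        # new_mean = old_mean + (x - old_mean) / n
        mean_sleep_min=features.mean_sleep_min
        + (detected_min - features.mean_sleep_min) / n,
        mean_sleep_lux=features.mean_sleep_lux
        + (detected_lux - features.mean_sleep_lux) / n,
        samples=n,
    )
```

A plain arithmetic mean treats 23:50 and 00:10 as far apart; a real implementation would need to handle that wrap, for instance by measuring minutes relative to noon. That refinement is not described in the application.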

[0079] In summary, this application can continuously learn and adapt to changes in user behavior and habits. Taking sleep as an example, as the seasons change or the user's sleep schedule is adjusted, the user's sleep time and preferred light intensity may change. Through real-time updates after each successful detection, the target state characteristics can reflect the user's latest habits in a timely manner, thereby dynamically adjusting the detection timing and making the intelligent sleep function more and more accurate with use, thus improving the long-term user experience of the device.

[0080] Further, see Figure 3, which is a flowchart of another processing method provided in an embodiment of this application. Figure 3 illustrates the specific implementation process of step 102, namely detecting multiple image frames in the image sequence to determine human pose data. As shown in Figure 3, the processing method may include steps 301 to 302, which are described in detail below.

[0081] Step 301: Call the convolutional neural network to segment the image frames and obtain the mask image of the image frames.

[0082] In this embodiment, each pixel in the mask image is marked with a different pixel value to indicate the visible human body area, the human body occlusion area, and the background.

[0083] Optionally, the convolutional neural network is a pre-trained lightweight convolutional neural network (such as MobileNet V3). Lightweight convolutional neural networks (such as MobileNet V3) have the characteristics of low computational cost and fast inference speed, making them suitable for deployment on resource-constrained mobile devices.

[0084] Specifically, the electronic device uses a lightweight convolutional neural network to perform image segmentation on the image frame. This convolutional neural network infers from the image frame to obtain a mask image corresponding to it. The pixel value of each pixel in the mask image identifies the region category to which that pixel belongs in the original image frame. For example, a pixel with a value of 1 is marked as a visible human body region, i.e., an unobstructed part of the human body (such as a face or an arm exposed by a blanket); a pixel with a value of 2 is marked as an occluded human body region, i.e., a part of the human body obscured by an object (such as a blanket); and a pixel with a value of 0 is marked as a background region, i.e., a scene region that does not belong to the human body.
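The label convention in [0084] (0 for background, 1 for visible human body, 2 for occluded human body) can be illustrated with a mask held as a nested list. The segmentation network itself is out of scope here; the sketch only shows how a mask with those values might be consumed, and `region_pixel_counts` and `extract_region` are hypothetical helper names.

```python
from collections import Counter
from typing import Dict, List

BACKGROUND, VISIBLE, OCCLUDED = 0, 1, 2  # pixel values from the example in [0084]
Grid = List[List[int]]


def region_pixel_counts(mask: Grid) -> Dict[int, int]:
    """Count how many pixels carry each of the three labels."""
    counts = Counter(value for row in mask for value in row)
    return {label: counts.get(label, 0) for label in (BACKGROUND, VISIBLE, OCCLUDED)}


def extract_region(frame: Grid, mask: Grid, label: int, fill: int = 0) -> Grid:
    """Keep frame pixels whose mask value equals `label`; blank the rest."""
    return [
        [pixel if m == label else fill for pixel, m in zip(frame_row, mask_row)]
        for frame_row, mask_row in zip(frame, mask)
    ]
```

Calling `extract_region` with `VISIBLE` yields an image suitable for key point localization, and calling it with `OCCLUDED` yields the input for contour extraction.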

[0085] Step 302: Based on the mask image and image frame, determine the key point location information of the visible human body area and the human body contour of the occluded human body area.

[0086] In this embodiment, the human posture data includes key point localization information and human body contour. Key point localization information includes the coordinates of at least one key human body point, such as the positions of limb joints, fingers, facial contours, and eyes, used to accurately represent the specific posture of the user's visible parts. The human body contour is used to represent the shape features of occluded body parts, reflecting the user's overall posture.

[0087] It should be noted that different techniques can be used to detect visible human body regions and human body occlusion regions. For example, key points in visible human body regions can be located using pose estimation models, while human body contours in occlusion regions can be extracted using image processing methods.

[0088] As can be seen from the above technical solutions, existing human pose detection schemes typically run pose estimation models directly on the entire image, resulting in high computational cost and limited detection accuracy in occluded scenes. By contrast, the processing method provided in this application segments image frames by calling a convolutional neural network, dividing the image into visible human body regions, occluded human body regions, and background. Based on the image frames and mask images, keypoint localization is performed in the visible human body regions, and contour extraction is performed in the occluded human body regions. By introducing a lightweight convolutional neural network for pre-segmentation, the scope of human pose detection is reduced from the entire image to a specific region, reducing computational cost. Furthermore, the complex pose detection task is broken down into specialized detection methods targeting different region characteristics, improving detection accuracy and robustness.

[0089] Furthermore, in an optional embodiment, before determining the key point location information and human body contour, a step of judging the effectiveness of human body detection is included. Specifically, based on the pixel values of each pixel in the mask image, the number of pixels in the visible human body region is counted, and the ratio of this number to the total number of pixels in the mask image is calculated. This ratio reflects the area proportion of the visible human body region in the entire image. If the ratio is not less than a preset detection accuracy threshold, it indicates that there is a sufficient area of effective human body region in the image frame to support subsequent pose data detection, and the steps of determining key point location information and human body contour continue. If the ratio is less than the preset detection accuracy threshold, it indicates that the visible human body region in the image frame is too small, which may be due to reasons such as the user being too far away, camera angle deviation, or the user not entering the shooting range. At this time, it is determined that no effective target is detected in the current image frame, the image frame is recorded as invalid, the current detection process ends, and the detection is restarted at the next detection opportunity.
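
The validity check described above can be sketched as follows. This is a minimal illustration, assuming the mask is a nested list of integer labels in which 1 marks the visible human body region (the encoding used in the embodiments of this application); the function names and the example threshold values are hypothetical.

```python
VISIBLE = 1  # mask label for the visible human body region


def visible_ratio(mask):
    """Return the fraction of mask pixels labelled as visible human body."""
    total = sum(len(row) for row in mask)
    if total == 0:
        return 0.0
    visible = sum(1 for row in mask for value in row if value == VISIBLE)
    return visible / total


def frame_is_valid(mask, threshold):
    """A frame is valid when the ratio is not less than the threshold."""
    return visible_ratio(mask) >= threshold


# 3 visible pixels out of 12 in total.
example_mask = [
    [0, 0, 1, 1],
    [0, 2, 2, 1],
    [0, 2, 2, 0],
]
```

With this example mask, a frame passes a threshold of 0.25 and is recorded as invalid for any larger threshold.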

[0090] Optionally, if multiple consecutive detections result in a ratio less than the detection accuracy threshold, the electronic device can issue a prompt to guide the user to adjust the device position or shooting angle. For example, it can display a pop-up window on the screen saying "No valid target detected, please adjust the camera position," or remind the user via voice announcement. Simultaneously, the electronic device can record an error log, noting the time and frequency of invalid detections for user review or subsequent analysis.

[0091] It should be noted that step 302, namely determining the key point localization information of the visible human body region and the human body contour of the occluded human body region based on the mask image and image frame, includes various methods. For example, in one optional embodiment, the detection process of key point localization information and human body contour is executed sequentially. In another optional embodiment, in order to improve processing efficiency, the detection process of determining the key point localization information of the visible human body region and the human body contour of the occluded human body region based on the mask image and image frame is executed in parallel. Specifically, the electronic device can allocate different detection processes to different processing units (such as the neural network processing unit NPU for key point localization and the central processing unit CPU for contour extraction) or different sub-threads for parallel processing, thereby further improving detection efficiency.
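
The parallel variant can be sketched with two worker threads. This is only an illustration of the control flow: `locate_keypoints` and `extract_contour` are hypothetical stand-ins for the pose model and the contour extraction, and on a real device the speed-up would come from dispatching the two tasks to different processing units rather than from Python threads.

```python
from concurrent.futures import ThreadPoolExecutor


def locate_keypoints(visible_image):
    """Hypothetical stand-in for the pose model: non-zero pixels as (x, y)."""
    return [(x, y) for y, row in enumerate(visible_image)
            for x, value in enumerate(row) if value]


def extract_contour(occluded_image):
    """Hypothetical stand-in for contour extraction: non-zero pixels as (x, y)."""
    return [(x, y) for y, row in enumerate(occluded_image)
            for x, value in enumerate(row) if value]


def detect_pose(visible_image, occluded_image):
    """Run both detection paths concurrently and return both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        keypoints = pool.submit(locate_keypoints, visible_image)
        contour = pool.submit(extract_contour, occluded_image)
        return keypoints.result(), contour.result()
```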

[0092] Referring to Figure 4, Figure 4 is a flowchart illustrating another processing method provided in an embodiment of this application. Figure 4 illustrates a specific method for executing the key point localization and human body contour detection processes in parallel. As shown in Figure 4, the method specifically includes:

[0093] Step 401: Define the visible human body region in the image frame using a mask image to obtain a visible human body image.

[0094] In this embodiment, a mask image is used to define the visible human body region of the original image frame. That is, only the part marked as the visible human body region in the mask image is retained, and the human body occlusion area and background are filtered out, thereby obtaining a visible human body image.

[0095] In one optional embodiment, the specific implementation method for obtaining a visible human body image by defining the visible human body region in an image frame using a mask image is as follows:

[0096] First, a first binary mask is generated based on the mask image. This first binary mask maps pixels with a value of 1 (corresponding to the visible human body area) in the mask image to 1, and pixels with a value of 0 (corresponding to the background) and a value of 2 (corresponding to the human body occlusion area) to 0. Therefore, in the first binary mask, only pixels corresponding to the visible human body area have a value of 1, while all other areas have a value of 0.

[0097] Then, a bitwise AND operation is performed between the first binary mask and the original image frame, pixel by pixel. Pixels in the image frame corresponding to a value of 1 in the first binary mask remain unchanged, while pixels corresponding to a value of 0 in the first binary mask are set to 0. The resulting image is the visible human body image, which retains only the visible human body area from the original image frame, while the background and the human body occlusion area are filtered out.
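
A minimal sketch of these two operations is given below, assuming an 8-bit grayscale frame stored as a nested list. Because a literal AND with the value 1 would keep only the lowest bit of each pixel, the sketch expands mask value 1 to 0xFF before the AND, which is an implementation assumption; the function names are hypothetical.

```python
def binary_mask(mask, label):
    """Map pixels equal to `label` to 1 and all other pixels to 0."""
    return [[1 if value == label else 0 for value in row] for row in mask]


def apply_binary_mask(frame, bmask):
    """Bitwise AND each 8-bit pixel with 0xFF (mask 1) or 0x00 (mask 0)."""
    return [
        [pixel & (0xFF if keep else 0x00) for pixel, keep in zip(f_row, m_row)]
        for f_row, m_row in zip(frame, bmask)
    ]


frame = [[10, 20, 30], [40, 50, 60]]
mask = [[0, 1, 2], [1, 1, 0]]

first_mask = binary_mask(mask, 1)                 # visible human body region
visible_image = apply_binary_mask(frame, first_mask)
```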

[0098] Step 402: Call the human pose recognition model to obtain key point localization information based on the visible human body image.

[0099] In this embodiment, a pre-trained human pose recognition model is invoked. The visible human body image is used as input, and the coordinates of the human body key points in the visible human body image are obtained through inference by the human pose recognition model. This key point localization information accurately represents the specific pose of the user's visible parts.

[0100] Specifically, a human pose recognition model (such as OpenPose) is invoked to perform inference on the visible human body image. The human pose recognition model takes the visible human body image as input and outputs the coordinates of key points in the visible human body image, including the positions of limb joints, fingers, facial contours, and eyes. This human pose recognition model can run on a neural network processing unit (NPU), utilizing the parallel computing capabilities of the NPU to accelerate the inference process and ensure real-time detection on mobile devices.

[0101] Step 403: Define the human body occlusion area in the image frame using a mask image to obtain the human body occlusion image.

[0102] In this embodiment, a mask image is used to define the human body occlusion area of the original image frame. Only the part marked as the human body occlusion area in the mask image is retained, and the visible human body area and background are filtered out, thereby obtaining the human body occlusion image.

[0103] In one optional embodiment, the specific implementation method for obtaining the human body occlusion image by defining the human body occlusion area in the image frame using a mask image is as follows:

[0104] First, a second binary mask is generated based on the mask image. This second binary mask maps pixels with a value of 2 (corresponding to the human body occlusion area) in the mask image to 1, and pixels with a value of 0 (corresponding to the background) and a value of 1 (corresponding to the visible human body area) to 0. Therefore, in the second binary mask, only the pixels corresponding to the human body occlusion area have a value of 1, while the pixels in the remaining areas have a value of 0.

[0105] Then, the second binary mask is bitwise ANDed with the original image frame, that is, a logical AND operation is performed on each corresponding pixel. The pixels in the original image frame corresponding to the pixel value of the second binary mask being 1 remain unchanged, while the pixels corresponding to the pixel value of the second binary mask being 0 are set to 0.

[0106] Furthermore, the largest bounding box of valid pixels in the human occlusion region is calculated, and this bounding box is used to crop the bitwise ANDed image to obtain the human occlusion image. Valid pixels refer to pixels with a value of 1 on the second binary mask. Thus, the cropping operation further reduces the image size, removes invalid blank areas, and reduces the amount of data required for subsequent processing.
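
The masking and cropping of the human body occlusion area can be sketched in one pass. The sketch assumes a nested-list grayscale frame and the label 2 for the occlusion area, as in the embodiments; keeping a pixel or zeroing it is equivalent to the bitwise AND with the second binary mask. The function name is hypothetical.

```python
OCCLUDED = 2  # mask label for the human body occlusion area


def occluded_image(frame, mask):
    """Zero everything outside the occlusion area, then crop to its bounding box."""
    rows = [r for r, row in enumerate(mask) if OCCLUDED in row]
    cols = [c for row in mask for c, value in enumerate(row) if value == OCCLUDED]
    if not rows:
        return []  # no occluded pixels in this frame
    top, bottom = min(rows), max(rows)
    left, right = min(cols), max(cols)
    return [
        [frame[r][c] if mask[r][c] == OCCLUDED else 0
         for c in range(left, right + 1)]
        for r in range(top, bottom + 1)
    ]


frame = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
mask = [[0, 0, 0, 0], [0, 2, 2, 1], [0, 0, 2, 1], [0, 0, 0, 0]]
cropped = occluded_image(frame, mask)
```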

[0107] Step 404: Perform adaptive filtering on the grayscale image of the occluded human body to obtain the human body contour.

[0108] In this embodiment, the human body occlusion image undergoes grayscale conversion and adaptive filtering to extract the human body contour. The adaptive filtering dynamically adjusts the filtering parameters based on image quality, effectively removing noise while preserving edge details, resulting in a clearer human body contour.

[0109] Specifically, the occluded human image is first converted to grayscale. Then, the OpenCV library is used to perform adaptive spatial filtering on the grayscale image to obtain the human contour. The filtering parameters (such as the filtering radius) can be dynamically adjusted according to the current ambient light intensity. The filtering radius is negatively correlated with the current light intensity; the higher the light intensity, the smaller the filtering radius to retain more details, and the lower the light intensity, the larger the filtering radius to enhance noise reduction. Thus, the adaptive mechanism can adapt to differences in image quality under different lighting conditions, improving the robustness of contour extraction.
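
The negative correlation between light intensity and filtering radius can be illustrated as follows. This application does not specify the mapping or the filter, so the linear mapping, the lux range, the radius bounds, and the use of a plain mean (box) filter are all assumptions made for the sketch; only the smoothing step is shown, not the contour extraction itself.

```python
def filter_radius(lux, lux_min=0.0, lux_max=300.0, r_min=1, r_max=5):
    """Map light intensity to a radius that shrinks as the light gets stronger."""
    clamped = max(lux_min, min(lux, lux_max))
    fraction = (clamped - lux_min) / (lux_max - lux_min)
    return round(r_max - fraction * (r_max - r_min))


def box_filter(gray, radius):
    """Mean filter over a (2 * radius + 1) square window, clipped at the borders."""
    height, width = len(gray), len(gray[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                gray[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            row.append(sum(window) // len(window))
        out.append(row)
    return out
```

In dim light the larger radius averages over more neighbours and suppresses more noise; in bright light the smaller radius preserves more edge detail.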

[0110] In summary, steps 401-402 illustrate a specific method for determining key point localization information of visible human body regions based on mask images and image frames, and steps 403-404 illustrate a specific method for determining the human body contour of occluded regions based on mask images and image frames.

[0111] As can be seen from the above technical solution, this application divides the human body region into visible human body regions and human body occlusion regions, adopts appropriate technical means for the characteristics of the two types of regions, and executes the two detection paths in parallel, significantly improving detection efficiency by utilizing the heterogeneous computing resources of electronic devices. Furthermore, this application combines neural networks with traditional image processing. For visible human body regions, it utilizes the high precision advantage of deep learning to locate key points through a human pose recognition model. For occlusion regions, it leverages the low computational cost and strong robustness to occlusion scenes of traditional image processing methods, using adaptive filtering for contour extraction. This overcomes the shortcomings of using only neural networks to detect occluded user scenes and also compensates for the high false positive rate of traditional image processing alone. While ensuring detection efficiency, it significantly improves the robustness and accuracy of detection, enabling it to adapt to more complex and diverse detection scenarios.

[0112] Further, referring to Figure 5, Figure 5 is a flowchart illustrating another processing method provided in an embodiment of this application. Figure 5 illustrates the specific implementation process of step 103, which involves determining whether the target object is in a target state based on human posture data. As shown in Figure 5, a processing method in the embodiment of this application may include steps 501 to 503, which are described in detail below.

[0113] Step 501: Obtain the first inter-frame distance between corresponding key points of the image frame and its adjacent preceding image frame.

[0114] In this embodiment, based on the key point coordinate information in the human posture data, the distance between corresponding human body key points in the current image frame and its adjacent preceding image frame is calculated as the first inter-frame distance. It can be understood that the first inter-frame distance is used to characterize the degree of positional change of key body parts between two adjacent detections.

[0115] In one optional embodiment, the first inter-frame distance can be the average point-to-point distance based on Euclidean distance. Specifically, the coordinate set of human body key points in the current image frame and the coordinate set of corresponding key points in the adjacent preceding image frame are obtained. The Euclidean distance between each pair of corresponding key points is calculated, and the distances of all key point pairs are averaged to obtain the average point-to-point distance, which is used as the first inter-frame distance.

[0116] It should be noted that the average point-to-point distance is only one optional distance measurement method. In other embodiments, the maximum point-to-point distance (i.e., the maximum value of the distances between all keypoint pairs) or other distance measurement methods can also be used as the first inter-frame distance. This application does not limit this.
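
Both the average and the maximum point-to-point distance can be sketched in a few lines. The sketch assumes that the two key point sets are equal-length lists of (x, y) tuples in corresponding order; the function names are hypothetical.

```python
import math


def keypoint_distances(current, previous):
    """Euclidean distance between each pair of corresponding key points."""
    if not current or len(current) != len(previous):
        raise ValueError("key point sets must be non-empty and of equal length")
    return [math.dist(p, q) for p, q in zip(current, previous)]


def mean_point_distance(current, previous):
    """Average point-to-point distance (first inter-frame distance)."""
    distances = keypoint_distances(current, previous)
    return sum(distances) / len(distances)


def max_point_distance(current, previous):
    """Maximum point-to-point distance, an alternative measure."""
    return max(keypoint_distances(current, previous))


previous_points = [(0, 0), (1, 1)]
current_points = [(3, 4), (1, 1)]
```

The maximum is more sensitive to a single moving joint, while the average is more tolerant of jitter in one key point.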

[0117] Step 502: Obtain the second inter-frame distance between the human body contours of the image frame and its adjacent preceding image frame.

[0118] In this embodiment, the degree of difference in the human body contour between the current image frame and its adjacent preceding image frame is calculated as the second inter-frame distance. The second inter-frame distance is used to characterize the degree of shape change of the human body contour between two adjacent detections.

[0119] In one optional embodiment, the second inter-frame distance can be the average symmetric surface distance. Specifically, for each point on the human body contour of the current image frame, the shortest distance to the human body contour of the adjacent preceding image frame is calculated; likewise, for each point on the previous contour, the shortest distance to the current contour is calculated. Then, the average of all these shortest distances is calculated to obtain the average symmetric surface distance, which is used as the second inter-frame distance.

[0120] It should be noted that the average symmetric surface distance is only one optional distance metric. In other embodiments, other distance metrics such as shape context distance and Hausdorff distance can also be used as the second inter-frame distance. This application does not limit this.
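
The average symmetric surface distance and the Hausdorff distance can be sketched side by side. The sketch assumes that each contour is a non-empty list of (x, y) points and uses a brute-force nearest-point search, which is adequate for illustration but quadratic in the number of points; the function names are hypothetical.

```python
import math


def _nearest(point, contour):
    """Shortest Euclidean distance from `point` to any point of `contour`."""
    return min(math.dist(point, q) for q in contour)


def average_symmetric_surface_distance(a, b):
    """Mean of all shortest distances from a to b and from b to a."""
    d_ab = [_nearest(p, b) for p in a]
    d_ba = [_nearest(q, a) for q in b]
    return (sum(d_ab) + sum(d_ba)) / (len(d_ab) + len(d_ba))


def hausdorff_distance(a, b):
    """Largest of all shortest distances in either direction."""
    d_ab = [_nearest(p, b) for p in a]
    d_ba = [_nearest(q, a) for q in b]
    return max(max(d_ab), max(d_ba))


contour_prev = [(0, 0), (1, 0)]
contour_curr = [(0, 1), (1, 1), (2, 1)]
```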

[0121] Step 503: In response to the first inter-frame distance and the second inter-frame distance both meeting the preset state conditions in a target number of consecutive inter-frame detections, determine that the target object is in the target state.

[0122] In this embodiment, the state conditions are preset based on the state characteristics of the target state. When the first inter-frame distance and the second inter-frame distance both meet the state conditions in a target number of consecutive inter-frame detections, it indicates that the posture change characteristics of the target object remain stable within a preset time period and conform to the state characteristics corresponding to the target state. Therefore, it is determined that the target object is in the target state.

[0123] Taking a sleep detection scenario as an example, the first and second inter-frame distances obtained in each frame interval are compared with corresponding preset thresholds to determine whether the preset state conditions are met. The state conditions include: the first inter-frame distance is less than the first threshold, and the second inter-frame distance is less than the second threshold. When a target number of consecutive frame intervals meet the state conditions, the target object is determined to be in the target state. Specifically, if the current frame interval meets the state conditions, it is determined that the target object has remained stationary during the current detection interval, and the stationary count is incremented by 1. If the stationary count reaches the preset target number (e.g., 20 consecutive times), it is determined that the user has entered a sleep state. If any frame interval fails to meet the state conditions before the target number is reached, it is determined that the user has not remained stationary, the stationary count is reset to zero, and counting restarts. Therefore, this continuous multiple-judgment mechanism effectively avoids misjudgments caused by brief periods of user stillness, improving the accuracy of sleep state determination.
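
The consecutive-count mechanism can be sketched as a small state holder. The class name and the threshold values used in the usage example are hypothetical; the default target number of 20 follows the example given in this embodiment.

```python
class StillnessCounter:
    """Counts consecutive frame intervals that meet both state conditions."""

    def __init__(self, first_threshold, second_threshold, target_count=20):
        self.first_threshold = first_threshold
        self.second_threshold = second_threshold
        self.target_count = target_count
        self.count = 0

    def update(self, first_distance, second_distance):
        """Feed one frame interval; return True once the target count is reached."""
        still = (first_distance < self.first_threshold
                 and second_distance < self.second_threshold)
        if still:
            self.count += 1
        else:
            self.count = 0  # any movement restarts the count
        return self.count >= self.target_count
```

For instance, with a target count of 3, two still intervals followed by one moving interval reset the count, and three further still intervals are then needed before `update` returns True.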

[0124] In one optional embodiment, the target state is divided into multiple stages according to its degree, and detection is performed stage by stage. Taking sleep detection as an example, sleep states can be divided into a light resting state, a light sleep state, and a deep sleep state, with each stage corresponding to a different duration requirement. Specifically, if the user is detected to satisfy the state conditions (i.e., to remain stationary) for 10 consecutive minutes, the user is determined to have entered the light resting state; for 15 consecutive minutes, the light sleep state; and for 20 consecutive minutes, the deep sleep state. Thus, the hierarchical identification of the target state is achieved by accumulating the duration of satisfying the state conditions.
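
The staged identification amounts to a threshold lookup on the accumulated duration. The sketch below uses the 10, 15, and 20 minute values from this embodiment; the stage labels and the function name are hypothetical.

```python
# (minimum consecutive minutes, stage), ordered from deepest to lightest.
STAGES = [
    (20, "deep sleep"),
    (15, "light sleep"),
    (10, "light rest"),
]


def sleep_stage(stationary_minutes):
    """Return the deepest stage whose duration requirement is met, else None."""
    for minimum_minutes, stage in STAGES:
        if stationary_minutes >= minimum_minutes:
            return stage
    return None
```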

[0125] As can be seen from the above technical solution, this application achieves comprehensive perception of human posture changes by simultaneously acquiring the first inter-frame distance between key points of the human body and the second inter-frame distance between human body contours. This allows for a comprehensive assessment of the user's state from two dimensions: changes in key point position and changes in contour shape. Specifically, the key point distance can sensitively capture minute movements of the user's limbs, while the contour distance reflects the stability of the user's overall posture, thus reducing the misjudgment rate caused by relying on a single dimension for state determination. Furthermore, by requiring the state conditions to be met in a target number of consecutive inter-frame detections, the accuracy and reliability of target state determination are improved.

[0126] In one optional embodiment, the control instruction includes at least one of a first control instruction for controlling the electronic device to enter the operating mode corresponding to the target state, or a second control instruction for controlling the associated device of the electronic device to enter the operating mode corresponding to the target state.

[0127] In this embodiment, the first control command is used to control the electronic device itself to enter the working mode corresponding to the target state, and the second control command is used to control the associated devices of the electronic device to enter the working mode corresponding to the target state. Specifically, when it is determined that the target object is in the target state, the electronic device outputs the first control command to control the electronic device itself to switch to a working mode matching the target state, such as adjusting the power consumption level, changing the operating state, or executing a preset function. At the same time, the electronic device outputs the second control command to control the associated devices to switch to a working mode matching the target state. The associated devices can be smart home devices or wearable devices pre-registered and bound to the electronic device, and the associated devices and the electronic device communicate via wired or wireless communication.

[0128] For example, after determining that the target object is in a sleep state, a first control command is output to control the electronic device to enter a low-power mode or shut down to save power. A second control command is output, which controls the smart speaker to shut down or the smart lamp to turn off via Bluetooth. In summary, by triggering multiple control commands associated with the sleep mode, the electronic device itself and all associated devices can work together to enter sleep mode, creating a comfortable sleep environment for the user.

[0129] It should be noted that when the target state is divided into multiple stages according to the degree of the state, the control commands associated with the working mode corresponding to the target state can also be responded to in a hierarchical manner according to the stage of the target state.

[0130] Taking sleep detection as an example, when the system determines that the user has entered a light resting state, it triggers the corresponding working mode, i.e., the low-power mode. When the system determines that the user has entered a light sleep state, it triggers the hibernation mode, and when it determines that the user has entered a deep sleep state, it triggers the sleep mode. In each case, the control commands associated with that state are activated. This association of state stages with response levels enables phased identification and response to target states, improving device control accuracy and intelligence.

[0131] Based on the above embodiments, the processing method provided in this application can be specifically applied to sleep detection scenarios (i.e., the target state is sleep). Considering that users of electronic devices (such as smartphones and tablets) may enter a sleep state while using electronic devices, causing the device to run continuously, wasting power resources, and even affecting the user's sleep quality, it is necessary to use sleep detection to regulate the sleep state of smart devices. In existing sleep detection solutions, most solutions require the camera to be facing the user's face to detect eye status, which is not applicable to home sleep scenarios where the user is lying on their side, prone, or with obstructed vision. Furthermore, existing sleep detection solutions use fixed detection methods, without combining the user's own habits and historical data, and without dynamically adjusting the detection sensors and parameters according to the environment during detection, resulting in a high false positive rate and poor adaptability. Based on this, the processing method provided in this application determines whether the user has entered a sleep state by detecting the user's joint coordinate group and human body contour. After the user falls asleep, the user's sleep detection data is recorded to adjust the parameters for the next detection and adjust the home appliances associated with the device (such as turning off the table lamp, raising the air conditioner temperature, etc.). Finally, the electronic device automatically enters a low-power state or shuts down.

[0132] In a sleep monitoring scenario, before a user prepares to fall asleep, the electronic device is placed on a nearby table or bedside table to ensure that the camera's field of view covers the user. Figure 6a is a flowchart illustrating the specific implementation of a processing method provided in this application. As shown in Figure 6a, the method specifically includes:

[0133] 1. Detection trigger.

[0134] Specifically, when the system detects that the user has not touched the screen for an extended period, the upper-level framework of the device's operating system calls the camera hardware abstraction layer (HAL) to initiate the sleep detection process. The camera HAL calls the ambient light sensor to detect the current ambient light intensity ρ. If ρ is less than a threshold T1, an infrared light sensor is selected as the image sensor (ISP sensor) for taking a picture; otherwise, a visible light sensor is used. Based on this, the dynamic selection of dual sensors can adapt to the different sleep scenarios of different users. That is, some users prefer a lit environment, while others prefer a dark environment. A single sensor cannot meet all scenarios. Almost all electronic devices have visible light sensors, and infrared light sensors are easy to integrate and inexpensive.

[0135] Initialize the counter variable num = 0, which records the current detection round.

[0136] 2. Image frame pre-detection and segmentation.

[0137] The camera HAL calls upon the selected image sensor (ISP sensor) to capture a pre-detection map Ppre.

[0138] The camera HAL calls the Neural Processing Unit Hardware Abstraction Layer (NPU HAL) through an interface, passing the image data descriptor of the pre-detection map Ppre to the NPU HAL.

[0139] The NPU HAL sends messages to the Neural Processing Unit (NPU) subsystem through the NPU driver, invoking the MobileNet V3 model deployed locally on the electronic device to complete a one-frame inference task on Ppre and obtain the human body segmentation mask of Ppre. MobileNet V3 is a lightweight object segmentation model that divides Ppre into three parts, corresponding to three pixel values on the mask: a pixel value of 1 marks unobstructed human body parts, such as the face or arms outside a blanket; a pixel value of 2 marks obstructed body parts, such as the torso covered by a blanket; and a pixel value of 0 marks the background.

[0140] The NPU HAL performs pixel statistics on the unobstructed human body portion of the mask, obtaining the ratio a of the number of pixels in the unobstructed human body portion to the total number of pixels. If a is less than the OpenPose detection accuracy threshold, the NPU HAL reports to the camera HAL that no valid target was captured, the current sleep detection round ends, and the next round starts after a 10-minute wait. This avoids performing the subsequent, more complex detections in unoccupied scenes. If a is not less than the OpenPose detection accuracy threshold, the process enters the parallel branch detection.

[0141] 3. Human posture data detection.

[0142] Two sub-threads are created to process the pre-detected and segmented image in parallel: sub-thread 1 detects the unobstructed human body parts, and sub-thread 2 detects the occluded human body parts. Combining the results of the two sub-threads yields more accurate detection results, and executing them in parallel improves detection efficiency.

[0143] Figure 6b is a schematic diagram illustrating the effect of pre-detection and segmentation provided in an embodiment of this application. With reference to Figure 6a and Figure 6b, the detection process for unobstructed human body parts, i.e., visible human body regions, using sub-thread 1 includes:

[0144] The NPU HAL performs a bitwise AND operation on the mask and Ppre, retaining only the unobstructed human body parts, i.e., the areas in the mask with a pixel value of 1, and filling the rest with 0. The result is denoted as Pfast. Only the effective detection area of Ppre is retained, so that the hardware acceleration function of the neural network processing unit can be used in subsequent steps to improve the detection frame rate, while the restricted detection range yields higher accuracy.

[0145] The NPU HAL calls the NPU subsystem to run inference with the OpenPose model, which completes a one-frame inference task on Pfast and returns the joint coordinate set arr detected in Pfast. This joint coordinate set includes the coordinates of key points such as limb joints, fingers, facial contours, and eyes.

[0146] In this embodiment, the detection process for occluded body parts, i.e., occluded areas of the human body, using sub-thread 2 includes:

[0147] The NPU HAL calculates the largest bounding box of the occluded body portion in the mask, crops Ppre with this bounding box, and records the result as Psmall.

[0148] The NPU HAL calls the OpenCV open-source computer vision library to apply adaptive spatial filtering to the grayscale image of Psmall. It then performs a bitwise AND operation between Psmall and the mask, retaining only the occluded body parts, that is, the areas in the mask with a pixel value of 2, and filling the rest with 0. The result is denoted as Pbody. It should be noted that the filtering radius r is related to the current illumination intensity ρ: the larger ρ is, the smaller r is. Considering that traditional image detection techniques are more suitable for images with simple contours and few details, retaining only the occluded body parts can eliminate other interference and improve detection accuracy.

[0149] In summary, the approximate range of the human body occlusion area is first calculated based on the mask image (e.g., using the bounding box of the region with a pixel value of 2 in the mask image). The image is then cropped to this bounding box to obtain a sub-image. Finally, a bitwise AND operation is performed between the cropped sub-image and its corresponding mask sub-region. This pre-cropping significantly reduces the amount of data processed in the subsequent bitwise AND operation, further improving computational efficiency.

[0150] It should be noted that the bitwise AND operation and the cropping operation can also be executed in the opposite order, with the AND operation first and the cropping operation second. As shown in Figure 6b, the human body occlusion area is first separated from the original image using a bitwise AND operation. At this point, the image size is the same as the original image, but most areas (the visible human body area and the background) have been set to 0. Then, the largest bounding box of the effective pixels (areas with non-zero pixel values) is calculated, and the image is cropped to this bounding box to obtain a smaller image of the occluded human body. Therefore, by performing the bitwise AND operation at the original image size, followed by cropping, all invalid areas can be accurately removed.

[0151] After completing one human posture data detection, the NPU HAL returns the descriptors of the joint coordinate set arr and the human body contour Pbody to the camera HAL.

[0152] 4. Stationary and sleep state determination.

[0153] After the camera HAL receives the descriptors of the joint coordinate set arr and the human body contour Pbody, it increments the count value num by 1 and checks whether num is equal to 1. If it is, this is the first detection, and the second detection is initiated after a wait of t minutes. If not, the camera HAL retrieves the joint coordinate set and human body contour returned by the NPU HAL in the previous detection, and calculates the Euclidean distance d between the two adjacent joint coordinate sets arr (the first inter-frame distance) and the inter-frame difference sum of the human body contour Pbody (the second inter-frame distance).

[0154] If d and sum are less than thresholds T2 and T3 respectively, the user is considered to have remained stationary between two consecutive detections. It is then checked whether num is greater than 3. If num is not greater than 3 (i.e., the user has not yet been found stationary in 3 consecutive detections), the next human posture data detection is performed after waiting t minutes. If num is greater than 3 (i.e., the user has been stationary in 3 consecutive detections), the user has met the sleep determination condition, and the device enters the sleep preparation mode.

[0155] If d and sum are not both less than their corresponding thresholds, the user did not remain still between two adjacent detections, and the device waits 10 minutes before entering the next round of sleep detection.
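
The decision logic of this step can be sketched as a pure function over the counter num and the two distances. The action labels and function names are hypothetical, and resetting num to 0 when a new round starts is an assumption based on the counter being initialized at the detection trigger.

```python
WAIT_T = "wait t minutes, then detect again"
RESTART = "wait 10 minutes, then start a new round"
SLEEP = "enter sleep preparation mode"


def after_detection(num, d, contour_diff, t2, t3):
    """Return (updated num, action) after one pose detection has completed."""
    num += 1
    if num == 1:
        return num, WAIT_T  # first detection: nothing to compare against yet
    if d < t2 and contour_diff < t3:
        return (num, SLEEP) if num > 3 else (num, WAIT_T)
    return 0, RESTART  # assumption: the counter restarts with the new round


def simulate(measurements, t2, t3):
    """Feed (d, contour_diff) pairs until the round ends; return the actions."""
    num, actions = 0, []
    for d, contour_diff in measurements:
        num, action = after_detection(num, d, contour_diff, t2, t3)
        actions.append(action)
        if action != WAIT_T:
            break
    return actions
```

Under this reading, a round needs four detections (three stationary comparisons) before the sleep preparation mode is entered.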

[0156] In this embodiment, the selection of the detection interval t is related to the current detection time, the current light intensity, and the user's historical habits. It is obtained by looking up a table, and Table 1 is the lookup table for the detection interval t:

[0157] Table 1

[0158] [Table 1 is not reproduced in this text.]

[0159] In Table 1, ΔT is the current detection time minus the user's average sleep onset time (unit: minutes), and Δρ is the current light intensity minus the user's average light intensity at sleep onset (unit: lux).
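
A lookup of this kind can be sketched as follows. The entries of Table 1 are not available in this text, so every bin edge and interval value below is a hypothetical placeholder chosen only to show the mechanism; the function name `lookup_interval` is likewise an assumption. Times are expressed as minutes since midnight, and wraparound at midnight is ignored for simplicity.

```python
from bisect import bisect_right

# Hypothetical bin edges; these are NOT the values of Table 1.
DELTA_T_EDGES = [-60, 0, 60]  # minutes relative to the average sleep onset time
DELTA_RHO_EDGES = [0, 50]     # lux relative to the average sleep light intensity

# Hypothetical detection intervals t in minutes.
# Rows follow the delta-T bins, columns follow the delta-rho bins.
INTERVAL_TABLE = [
    [10, 15, 20],
    [5, 8, 10],
    [3, 5, 8],
    [2, 3, 5],
]


def lookup_interval(now_min, light_lux, avg_onset_min, avg_light_lux):
    """Pick the detection interval t from the placeholder table."""
    delta_t = now_min - avg_onset_min
    delta_rho = light_lux - avg_light_lux
    row = bisect_right(DELTA_T_EDGES, delta_t)
    col = bisect_right(DELTA_RHO_EDGES, delta_rho)
    return INTERVAL_TABLE[row][col]


# Detection at 23:00 under 10 lux, for a user who usually falls asleep at 23:00 under 10 lux.
t = lookup_interval(23 * 60, 10, 23 * 60, 10)
```

The placeholder values merely illustrate one plausible design, in which detections become more frequent as the current time and light intensity approach or pass the user's usual sleep conditions.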

[0160] Table 2 shows the user's sleep habits:

[0161] Table 2

[0162] [Table 2 is not reproduced in this text.]

[0163] Table 2 records the user's average sleep onset time and average sleep light intensity. To protect user privacy, the user's sleep habit table is stored locally on the device and updated after each successful detection of the user falling asleep.
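
One way to maintain such averages is an incremental mean, sketched below. The source does not say how the averages are computed or stored, so the dictionary layout, the field names, and the function name `update_habits` are assumptions; persisting the result to local storage is left out.

```python
def update_habits(habits, onset_min, light_lux):
    """Fold one confirmed sleep onset into the running averages.

    habits is a dict with keys count, avg_onset_min and avg_light_lux
    (hypothetical layout), or None when no history exists yet.
    """
    if habits is None:
        habits = {"count": 0, "avg_onset_min": 0.0, "avg_light_lux": 0.0}
    n = habits["count"] + 1
    return {
        "count": n,
        # Incremental mean: new_avg = old_avg + (sample - old_avg) / n
        "avg_onset_min": habits["avg_onset_min"] + (onset_min - habits["avg_onset_min"]) / n,
        "avg_light_lux": habits["avg_light_lux"] + (light_lux - habits["avg_light_lux"]) / n,
    }


habits = update_habits(None, 1380, 10)    # first night: 23:00 at 10 lux
habits = update_habits(habits, 1400, 20)  # second night: 23:20 at 20 lux
```

After these two updates the sketch holds an average onset of 1390 minutes (23:10) and an average light intensity of 15 lux.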

[0164] 5. Enter the working mode corresponding to the sleep state.

[0165] Specifically, the electronic device performs the following three steps:

[0166] Switch connected home appliances to sleep mode via Bluetooth, for example by turning off table lamps, adjusting the air conditioner temperature, and setting alarms.

[0167] Record the information of this detection: update the detection time and light intensity in the user habit table so that the detection interval t can be selected dynamically for the next sleep detection, and record the user's joint coordinate set arr to support the user health management function of the electronic device.

[0168] Close all applications running on the electronic device, and enter low-power mode or shut down.

[0169] As can be seen from the above technical solutions, the processing method provided in the embodiments of this application has the following technical effects in the sleep detection scenario:

[0170] This application uses adaptive selection between two image sensors to ensure effective image acquisition around the clock and under varied lighting conditions. A lightweight convolutional neural network pre-segments the image, dividing the human body region into unoccluded and occluded areas. A detection method suited to each type of region is applied, namely keypoint localization for unoccluded areas and contour extraction for occluded areas, and the two paths execute in parallel, significantly improving detection efficiency and robustness. By recording the user's historical sleep habits and using them to dynamically adjust detection parameters, sleep detection is personalized. The user's stationary state is determined from pose changes across multiple consecutive frames, combined with adaptive thresholds for sleep determination, effectively avoiding false alarms. Once the user is confirmed to be asleep, dual control, namely the device's own sleep and linkage with associated home appliances, creates a comfortable sleep environment while saving power. This application therefore significantly improves the accuracy, adaptability, and user experience of intelligent sleep in electronic devices.

[0171] The above describes a processing method provided by an embodiment of this application. The following will describe the related apparatus for performing the above processing method.

[0172] This application also provides an electronic device. Figure 7 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application. As shown in Figure 7, the electronic device 700 includes: an image acquisition module 701, a pose detection module 702, a state recognition module 703, and a mode control module 704, wherein:

[0173] The image acquisition module is used to acquire image sequences;

[0174] The pose detection module is used to detect multiple image frames in the image sequence to determine human pose data;

[0175] The state recognition module is used to determine whether the target object is in the target state based on the human posture data;

[0176] The mode control module is used to trigger control commands associated with the working mode corresponding to the target state in response to the target object being in the target state.
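
The cooperation of the four modules can be sketched as a simple pipeline. This is an illustrative sketch under stated assumptions: each module is modeled as a plain callable, and the class name `ElectronicDevice`, its field names, and the method `run_once` are hypothetical rather than part of the described device.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass
class ElectronicDevice:
    acquire_images: Callable[[], Sequence[Any]]  # image acquisition module 701
    detect_pose: Callable[[Sequence[Any]], Any]  # pose detection module 702
    is_target_state: Callable[[Any], bool]       # state recognition module 703
    trigger_mode: Callable[[], None]             # mode control module 704

    def run_once(self) -> bool:
        """Run one pass; return True if the control command was triggered."""
        frames = self.acquire_images()
        pose_data = self.detect_pose(frames)
        if self.is_target_state(pose_data):
            self.trigger_mode()
            return True
        return False


# Stub modules used only to exercise the data flow.
triggered = []
device = ElectronicDevice(
    acquire_images=lambda: ["frame_1", "frame_2"],
    detect_pose=lambda frames: {"frame_count": len(frames)},
    is_target_state=lambda pose: pose["frame_count"] >= 2,
    trigger_mode=lambda: triggered.append("sleep_mode"),
)
device.run_once()
```

Modeling the modules as injected callables keeps the sketch independent of any particular camera, NPU, or appliance interface.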

[0177] It should be noted that the specific functions of each hardware module can be found in the above embodiments, and will not be repeated in this embodiment.

[0178] It should be further noted that the electronic device includes at least one processor and a memory connected to the processor, wherein: the memory is used to store a computer program; the processor is used to execute the computer program to enable the electronic device to implement the processing method of the first aspect or any implementation thereof described above. The electronic device in the embodiments of this application may include, but is not limited to, mobile terminals such as mobile phones, laptops, PDAs (personal digital assistants), and PADs (tablet computers), and fixed terminals such as desktop computers. Optionally, the electronic device may include a processing device (e.g., a central processing unit, a graphics processing unit, etc.), which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) or a program loaded from a storage device into a random access memory (RAM). When the electronic device is powered on, the RAM also stores various programs and data required for the operation of the electronic device. The processing device, ROM, and RAM are interconnected via a bus. Input/output (I/O) interfaces are also connected to the bus.

[0179] Typically, the following devices can be connected to an I/O interface: input devices such as touchscreens, touchpads, keyboards, mice, cameras, microphones, accelerometers, gyroscopes, etc.; output devices such as liquid crystal displays (LCDs), speakers, vibrators, etc.; storage devices such as memory cards, hard drives, etc.; and communication devices. Communication devices allow the electronic device to communicate with other devices, wirelessly or by wire, to exchange data.

[0180] This application also provides a computer program product including computer-readable instructions, which, when executed on an electronic device, cause the electronic device to implement any of the processing methods provided in this application.

[0181] This application also provides a computer-readable storage medium that carries one or more computer programs. When the one or more computer programs are executed by an electronic device, the electronic device can implement any of the processing methods provided in this application.

[0182] It should be noted that the human image data and human posture data involved in the embodiments of this application are all legally collected through electronic devices with the user's knowledge and authorization. The relevant data are only used locally for status detection and parameter updates, and are not used for identity recognition or sharing with third parties.

[0183] It should also be noted that the device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. In addition, in the device embodiment drawings provided in this application, the connection relationship between modules indicates that they have a communication connection, which can be implemented as one or more communication buses or signal lines.

[0184] Through the above description of the embodiments, those skilled in the art can clearly understand that this application can be implemented by means of software plus necessary general-purpose hardware, or it can be implemented by special-purpose hardware including application-specific integrated circuits, special-purpose CPUs, special-purpose memory, special-purpose components, etc. Generally, any function performed by a computer program can be easily implemented by corresponding hardware, and the specific hardware structure used to implement the same function can also be diverse, such as analog circuits, digital circuits, or special-purpose circuits. However, for this application, software program implementation is more often the preferred implementation method. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product is stored in a readable storage medium, such as a computer floppy disk, USB flash drive, mobile hard disk, ROM, RAM, magnetic disk, or optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, training equipment, or network device, etc.) to execute the methods described in the various embodiments of this application.

[0185] In the above embodiments, implementation can be achieved, in whole or in part, through software, hardware, firmware, or any combination thereof. When implemented in software, it can be implemented, in whole or in part, as a computer program product.

[0186] The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of this application are generated. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center via wired (e.g., coaxial cable, fiber optic, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave, etc.) means. The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a training device or data center, that integrates one or more available media. The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), or semiconductor media (e.g., solid-state drives (SSDs)).

Claims

1. A processing method, comprising: obtaining an image sequence; detecting multiple image frames in the image sequence to determine human pose data; determining, based on the human pose data, whether a target object is in a target state; and in response to the target object being in the target state, triggering a control command associated with the working mode corresponding to the target state.

2. The processing method according to claim 1, wherein detecting multiple image frames in the image sequence to determine human pose data comprises: invoking a convolutional neural network to segment the image frame to obtain a mask image of the image frame, wherein pixels in the mask image are marked with different pixel values to indicate the visible human body region, the human body occlusion region, and the background; and determining, based on the mask image and the image frame, key point localization information of the visible human body region and a human body contour of the human body occlusion region; wherein the key point localization information includes the coordinates of at least one human body key point, and the human pose data includes the key point localization information and the human body contour.

3. The processing method according to claim 2, wherein determining whether the target object is in the target state based on the human pose data comprises: obtaining a first inter-frame distance between corresponding key points of the image frame and its adjacent preceding image frame; obtaining a second inter-frame distance between the human body contours of the image frame and the adjacent preceding image frame; and in response to both the first inter-frame distance and the second inter-frame distance satisfying a preset state condition in a target number of consecutive inter-frame detections, determining that the target object is in the target state.

4. The processing method according to claim 2, wherein determining the key point localization information of the visible human body region and the human body contour of the human body occlusion region based on the mask image and the image frame comprises: calculating, based on the pixel values, the ratio of the number of pixels in the visible human body region to the total number of pixels in the mask image; and if the ratio is not less than a detection accuracy threshold, determining, based on the mask image and the image frame, the key point localization information of the visible human body region and the human body contour of the human body occlusion region.

5. The processing method according to claim 2 or 4, wherein determining the key point localization information of the visible human body region based on the mask image and the image frame comprises: delimiting the visible human body region in the image frame using the mask image to obtain a visible human body image; and invoking a human pose recognition model to obtain the key point localization information based on the visible human body image.

6. The processing method according to claim 2 or 4, wherein determining the human body contour of the human body occlusion region based on the mask image and the image frame comprises: delimiting the human body occlusion region in the image frame using the mask image to obtain a human body occlusion image; and obtaining the human body contour by adaptive filtering of a grayscale image of the human body occlusion image.

7. The processing method according to claim 1, wherein obtaining the image sequence comprises: detecting the current light intensity of the current environment; in response to the current light intensity being less than a target light intensity threshold, obtaining a first type of image frame acquired by a first image sensor; and in response to the current light intensity being not less than the target light intensity threshold, obtaining a second type of image frame acquired by a second image sensor; wherein the image sequence includes first-type image frames and/or second-type image frames arranged in chronological order.

8. The processing method according to claim 1 or 7, wherein obtaining the image sequence comprises: acquiring the image sequence in response to the arrival of a detection timing, wherein the detection timing is determined based on target state features of the target object being photographed.

9. The processing method according to claim 1, wherein the control command includes at least one of the following: a first control instruction for controlling an electronic device to enter the working mode corresponding to the target state, or a second control instruction for controlling an associated device of the electronic device to enter the working mode corresponding to the target state.

10. An electronic device, comprising an image acquisition module, a pose detection module, a state recognition module, and a mode control module, wherein: the image acquisition module is used to acquire an image sequence; the pose detection module is used to detect multiple image frames in the image sequence to determine human pose data; the state recognition module is used to determine whether the target object is in the target state based on the human pose data; and the mode control module is used to trigger a control command associated with the working mode corresponding to the target state in response to the target object being in the target state.