Image recognition-based method and device for identifying mild cognitive impairment of the elderly

The method acquires image sequences of an elderly person, performs object recognition and image extraction on them, and generates object description information using image recognition. This addresses the difficulty existing approaches have in identifying mild cognitive impairment in a timely manner, and enables effective identification of mild cognitive impairment with a reduced associated risk.

CN120998468B · Active · Publication Date: 2026-06-26 · BEIJING HUIZHI GONGCHUANG TECH DEV CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
BEIJING HUIZHI GONGCHUANG TECH DEV CO LTD
Filing Date
2025-08-20
Publication Date
2026-06-26

Figure CN120998468B_ABST (abstract drawing)

Abstract

Embodiments of the present disclosure disclose an image recognition-based mild cognitive impairment recognition method and device for the elderly. A specific embodiment of the method comprises: acquiring an initial image sequence for a to-be-recognized object; performing object recognition on an initial image in the initial image sequence to obtain a target position sequence; performing image extraction on the initial image sequence according to the target position sequence to obtain a target image sequence; generating an object description information set corresponding to the to-be-recognized object according to a pre-trained object description information generation model and the target image sequence; generating mild cognitive impairment evaluation information for the to-be-recognized object according to the object description information set; and synchronizing the mild cognitive impairment evaluation information to a smart elderly care platform. The embodiment realizes effective recognition of changes in mild cognitive impairment of a patient.

Description

Technical Field

[0001] The embodiments disclosed herein relate to the fields of computer technology and mild cognitive impairment assessment, and specifically to a method and apparatus for identifying mild cognitive impairment in the elderly based on image recognition.

Background Technology

[0002] Mild cognitive impairment (MCI) refers to a decline in memory or other cognitive functions that does not affect basic activities of daily living. It lies between normal cognition and dementia. Patients with mild cognitive impairment show a decline in one or more cognitive functions without obvious signs of dementia. Incomplete statistics indicate a high prevalence of mild cognitive impairment among the elderly, and the condition carries a risk of progressing to Alzheimer's disease and other conditions. At present, mild cognitive impairment is diagnosed primarily through medical means.

[0003] However, mild cognitive impairment develops over a long period. Conventional medical diagnosis based on regular follow-up visits is not very effective at identifying mild cognitive impairment during the window between visits. This makes it difficult to detect changes in a patient's mild cognitive impairment in a timely manner, and thereby increases the risk of developing Alzheimer's disease and other diseases.

Summary of the Invention

[0004] The summary portion of this disclosure is intended to provide a brief overview of the concepts, which will be described in detail in the detailed description portion. This summary portion is not intended to identify key or essential features of the claimed technical solutions, nor is it intended to limit the scope of the claimed technical solutions.

[0005] Some embodiments of this disclosure propose a method and apparatus for identifying mild cognitive impairment in the elderly based on image recognition, in order to solve the technical problems mentioned in the background section above.

[0006] In a first aspect, some embodiments of this disclosure provide a method for identifying mild cognitive impairment in the elderly based on image recognition, applied to a smart elderly care platform. The method includes: acquiring an initial image sequence for an object to be identified, wherein the object to be identified is an object pre-marked as requiring mild cognitive impairment identification and whose corresponding age is greater than a preset age; performing object recognition on the initial images in the initial image sequence to obtain a target position sequence, wherein the target position represents the sequence position of the initial image containing the object to be identified within the initial image sequence; performing image extraction on the initial image sequence based on the target position sequence to obtain a target image sequence, wherein the target images correspond to target positions; generating an object description information set corresponding to the object to be identified based on a pre-trained object description information generation model and the target image sequence, wherein the object description information includes: predicted action type, identified action type, and emotion description information, wherein the emotion description information represents the emotional changes of the object to be identified; generating mild cognitive impairment evaluation information for the object to be identified based on the object description information set; and synchronizing the mild cognitive impairment evaluation information to the smart elderly care platform.

[0007] In a second aspect, some embodiments of this disclosure provide an image recognition-based device for identifying mild cognitive impairment in the elderly. The device includes the following units:
- an acquisition unit configured to acquire an initial image sequence for an object to be identified, wherein the object to be identified is an object pre-marked as requiring mild cognitive impairment identification and whose corresponding age is greater than a preset age;
- an object recognition unit configured to perform object recognition on the initial images in the initial image sequence to obtain a target position sequence, wherein the target position represents the sequence position, within the initial image sequence, of the initial image containing the object to be identified;
- an image extraction unit configured to perform image extraction on the initial image sequence according to the target position sequence to obtain a target image sequence, wherein each target image corresponds to a target position;
- a first generation unit configured to generate an object description information set corresponding to the object to be identified based on a pre-trained object description information generation model and the target image sequence, wherein the object description information includes a predicted action type, an identified action type, and emotion description information, and the emotion description information characterizes the emotional changes of the object to be identified;
- a second generation unit configured to generate mild cognitive impairment evaluation information for the object to be identified based on the object description information set; and
- a synchronization unit configured to synchronize the mild cognitive impairment evaluation information to the smart elderly care platform.

[0008] In a third aspect, some embodiments of this disclosure provide an electronic device. It includes one or more processors and a storage device on which one or more programs are stored. When the one or more programs are executed by the one or more processors, the one or more processors implement the method described in any implementation of the first aspect above.

[0009] In a fourth aspect, some embodiments of this disclosure provide a computer-readable medium on which a computer program is stored. When executed by a processor, the program implements the method described in any implementation of the first aspect above.

[0010] The above embodiments of this disclosure have the following beneficial effect: the image recognition-based method for identifying mild cognitive impairment in the elderly, according to some embodiments of this disclosure, effectively identifies changes in a patient's mild cognitive impairment.

Such changes cannot currently be identified in a timely manner because mild cognitive impairment develops over a long period. Conventional diagnosis based on regular follow-up visits is ineffective at identifying mild cognitive impairment within the window between visits. This makes it difficult to detect changes promptly, thereby increasing the risk of developing Alzheimer's disease and other conditions. For example, suppose a patient undergoes medical diagnosis at time points T1 and T2. If the patient's mild cognitive impairment changes within the window between T1 and T2, the change is difficult to detect promptly, and the longer the window, the greater the risk of deterioration.

On this basis, the method according to some embodiments of this disclosure first acquires an initial image sequence for the object to be identified. The object to be identified is an object pre-marked as requiring mild cognitive impairment identification and whose age is greater than a preset age. Second, object recognition is performed on the initial images in the initial image sequence to obtain a target position sequence. Each target position represents the sequence position, within the initial image sequence, of an initial image containing the object to be identified.
Next, image extraction is performed on the initial image sequence based on the target position sequence to obtain a target image sequence, where each target image corresponds to a target position.

In practice, the object to be identified is not stationary, so some images may not contain it. Because mild cognitive impairment develops over a long period, a large number of images must be processed. To avoid wasting computational resources, object recognition is used to determine the target positions and thereby filter out invalid image frames.

Then, an object description information set corresponding to the object to be identified is generated based on a pre-trained object description information generation model and the target image sequence. The object description information includes a predicted action type, an identified action type, and emotion description information, where the emotion description information represents the emotional changes of the object to be identified. In practice, patients with mild cognitive impairment often show a decline in one or more cognitive functions. This disclosure therefore generates an object description information set for the object to be identified along both behavioral and emotional dimensions, and this set changes over time.

Next, mild cognitive impairment evaluation information for the object to be identified is generated from this object description information set, so that the evaluation is produced automatically from both dimensions. Finally, the mild cognitive impairment evaluation information is synchronized to the smart elderly care platform.
This enables effective identification of changes in a patient's mild cognitive impairment.

Description of the Drawings

[0011] The above and other features, advantages, and aspects of the embodiments of this disclosure will become more apparent from the accompanying drawings and the following detailed description. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic, and elements are not necessarily drawn to scale.

[0012] Figure 1 is a flowchart of some embodiments of the image recognition-based method for identifying mild cognitive impairment in the elderly according to the present disclosure;

[0013] Figure 2 is a schematic diagram illustrating the process of generating object description information;

[0014] Figure 3 is a schematic structural diagram of some embodiments of the image recognition-based device for identifying mild cognitive impairment in the elderly according to the present disclosure;

[0015] Figure 4 is a schematic structural diagram of an electronic device suitable for implementing some embodiments of the present disclosure.

Detailed Description

[0016] Embodiments of this disclosure will now be described in more detail with reference to the accompanying drawings. Although some embodiments of this disclosure are shown in the drawings, this disclosure can be implemented in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough and complete understanding of this disclosure. The accompanying drawings and embodiments of this disclosure are for illustrative purposes only and are not intended to limit the scope of protection of this disclosure.

[0017] It should also be noted that, for ease of description, only the parts relevant to the invention are shown in the accompanying drawings. Unless otherwise specified, the embodiments and features described in this disclosure can be combined with each other.

[0018] It should be noted that the concepts of "first" and "second" mentioned in this disclosure are used only to distinguish different devices, modules or units, and are not used to limit the order of functions performed by these devices, modules or units or their interdependencies.

[0019] It should be noted that the terms "a" and "a plurality of" used in this disclosure are illustrative rather than restrictive, and those skilled in the art should understand that, unless otherwise expressly indicated in the context, they should be understood as "one or more".

[0020] The names of messages or information exchanged between multiple devices in the embodiments of this disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.

[0021] The collection, storage, and use of the initial image sequences of the individuals to be identified as described in this disclosure have been carried out by the relevant organizations or individuals in accordance with obligations including, but not limited to, conducting personal information security impact assessments, informing the personal information subjects, and obtaining their prior authorization and consent before execution. In particular, since the individuals to be identified may have one or more cognitive functions that are impaired, the relevant organizations or individuals have also fulfilled the aforementioned obligations to their legal guardians.

[0022] This disclosure will now be described in detail with reference to the accompanying drawings and embodiments.

[0023] Referring to Figure 1, a flow 100 of some embodiments of an image recognition-based method for identifying mild cognitive impairment in older adults according to the present disclosure is shown. The method includes the following steps:

[0024] Step 101: Obtain the initial image sequence for the object to be identified.

[0025] In some embodiments, the execution subject (e.g., a computing device) of the image recognition-based method for identifying mild cognitive impairment in the elderly can acquire an initial image sequence of the object to be identified via a wired or wireless connection.

[0026] The subjects to be identified are those pre-marked as needing identification for mild cognitive impairment, whose corresponding age is greater than a preset age. Specifically, the preset age can be set based on statistically obtained incidence rates of mild cognitive impairment in different age groups. For example, if a survey shows that the incidence rate of mild cognitive impairment in the elderly is 25%–56%, then the preset age could be set to 50 years old. Correspondingly, to improve the coverage of mild cognitive impairment identification, the preset age can also be lowered.

[0027] The initial image sequence can be acquired by a camera positioned within the activity area of the object to be identified.

[0028] Example 1: Taking an elderly person living alone as the subject to be identified, multiple cameras can be set up in their residence to collect the initial image sequence mentioned above.

[0029] Example 2: taking an elderly person who does not live alone as the object to be identified, multiple cameras can be set up within their activity area (e.g., residence) to collect the initial image sequence. Other people may also be present in the activity area. To keep data collection effective and targeted, the person to be identified can therefore wear a positioning device (e.g., a positioning bracelet). This positioning device embeds an Ultra Wide Band (UWB) transmitter, and the camera can be equipped with at least two UWB receivers that exchange signals with the transmitter.

The difference between the arrival times of the signal at the two UWB receivers is used to locate the person to be identified. The camera is then controlled to perform operations including, but not limited to, rotation and zoom. The aim is that the initial image contains only the person to be identified, and that whenever this person is within the camera's acquisition range, they appear in the frame (in the initial image) as fully as possible.
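The bearing calculation behind this two-receiver positioning can be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation. It assumes far-field geometry, two receivers mounted a known baseline apart on the camera, and arrival times taken on a common clock. The function `tdoa_bearing_deg` and its parameters are hypothetical. With only two receivers, the time difference yields a direction relative to the camera rather than a full position, which is enough to steer rotation.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s, propagation speed of the UWB signal


def tdoa_bearing_deg(t_a: float, t_b: float, baseline_m: float) -> float:
    """Hypothetical helper: bearing of the UWB tag seen from receivers A and B.

    Far-field assumption: the path difference r_A - r_B equals
    baseline * sin(theta). Theta is measured from the perpendicular bisector
    of the A-B baseline and is positive towards receiver B.
    """
    if baseline_m <= 0:
        raise ValueError("baseline must be positive")
    path_difference = SPEED_OF_LIGHT * (t_a - t_b)  # metres
    ratio = path_difference / baseline_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp timing noise into asin's domain
    return math.degrees(math.asin(ratio))


# Tag straight ahead of the camera: the signal reaches both receivers together.
print(tdoa_bearing_deg(0.0, 0.0, 0.2))  # 0.0
```

With a 0.2 m baseline, the largest possible time difference is 0.2 / c, about 0.67 ns. This approach therefore relies on sub-nanosecond timestamping at the receivers.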

[0030] In practice, for example, the aforementioned computing device can be deployed within the activity area corresponding to the object to be identified. In this case, the computing device can acquire an initial image sequence via a wired or wireless connection with the camera, achieving local data processing by deploying it within the activity area. Alternatively, the computing device can be deployed remotely, in which case it can acquire an initial image sequence via a wireless connection with the camera. Crucially, during the transmission of the initial image sequence, the initial images need to undergo appropriate encryption to ensure data transmission security.

[0031] It should be noted that the aforementioned wireless connection methods may include, but are not limited to, 3G / 4G / 5G connections, WiFi connections, Bluetooth connections, WiMAX connections, Zigbee connections, UWB (Ultra Wide Band) connections, and other currently known or future wireless connection methods.

[0032] It should be noted that the aforementioned computing devices can be either hardware or software. When the computing device is hardware, it can be implemented as a distributed cluster consisting of multiple servers or terminal devices, or as a single server or a single terminal device. When the computing device is software, it can be installed on the hardware devices listed above. It can be implemented as, for example, multiple software programs or software modules used to provide distributed services, or as a single software program or software module. No specific limitations are made here.

[0033] Step 102: Perform object recognition on the initial images in the initial image sequence to obtain the target position sequence.

[0034] In some embodiments, the aforementioned execution entity may perform object recognition on the initial images in the initial image sequence to obtain the target position sequence.

[0035] Each target position represents the sequence position, within the initial image sequence, of an initial image containing the object to be identified.

[0036] In practice, an object detection model such as Tiny-YOLO (You Only Look Once) can be used to perform object recognition on the initial image sequence to obtain the target position sequence.

[0037] In some optional implementations of some embodiments, the execution entity performs object recognition on the initial images in the initial image sequence to obtain the target position sequence through the following steps:

[0038] Step S1: Determine whether there is a change in the camera angle of the target camera within the target time period.

[0039] The target camera is the camera that acquires the initial image sequence, and the target time period is the acquisition time period corresponding to the initial image sequence. The target camera may be equipped with an IMU (Inertial Measurement Unit) module, thus allowing for monitoring of changes in the target camera's angle.

[0040] In practice, first, whenever the target camera changes angle, a record consisting of the angle change value monitored by the IMU module and the corresponding timestamp can be generated and stored. Then, by comparing the angle change values contained in the records whose timestamps fall within the target time period, it can be determined whether the target camera angle changed within the target time period.
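The timestamp comparison in step S1 might look like the following sketch. The `AngleRecord` schema and the `tolerance_deg` parameter are assumptions for illustration; the tolerance is used here to ignore small IMU jitter. The text only states that angle change values and timestamps are recorded and compared against the target time period.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class AngleRecord:
    """One IMU record: when the camera moved and by how much (hypothetical schema)."""
    timestamp: float     # seconds
    angle_change: float  # degrees


def angle_changed(records: Iterable[AngleRecord], start: float, end: float,
                  tolerance_deg: float = 0.0) -> bool:
    """Return True if any record inside [start, end] reports a change above the tolerance."""
    return any(
        start <= r.timestamp <= end and abs(r.angle_change) > tolerance_deg
        for r in records
    )


records = [AngleRecord(5.0, 12.0), AngleRecord(20.0, -0.3)]
print(angle_changed(records, 0.0, 10.0))                      # True
print(angle_changed(records, 10.0, 30.0, tolerance_deg=1.0))  # False
```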

[0041] Step S2: In response to determining that the camera angle has not changed, acquire the target background image.

[0042] The aforementioned target background image is a static background image taken at the current camera angle corresponding to the aforementioned target camera.

[0043] In practice, during initial calibration, the target camera captures and caches static background images at different angles. To reduce the number of cached images, assume the capture range of the target camera is α and a single image taken at a fixed angle covers a range of β. Calibration then proceeds as follows:
1. Capture α / β static background images, rounding α / β up when it is not an integer.
2. Stitch these static background images together into an overall static background image covering the capture range α.
3. Crop the target background image for the target camera's current angle from the overall static background image.
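A minimal sketch of the calibration arithmetic in this step follows. It assumes a pan-only camera, non-overlapping tiles of equal height, grayscale images stored as nested lists, and a pan angle measured from the left edge of the capture range α. `tiles_needed`, `stitch` and `crop_view` are hypothetical names. Real panoramas usually need overlap and alignment, which this sketch omits.

```python
import itertools
import math
from typing import List

Image = List[List[int]]  # grayscale image as rows of pixel values


def tiles_needed(alpha_deg: float, beta_deg: float) -> int:
    """Number of fixed-angle background shots covering the range alpha, rounded up."""
    return math.ceil(alpha_deg / beta_deg)


def stitch(tiles: List[Image]) -> Image:
    """Concatenate equally tall tiles left to right (assumes pan-only motion, no overlap)."""
    height = len(tiles[0])
    if any(len(t) != height for t in tiles):
        raise ValueError("all tiles must have the same height")
    return [list(itertools.chain.from_iterable(t[r] for t in tiles)) for r in range(height)]


def crop_view(panorama: Image, pan_deg: float, beta_deg: float, view_width: int) -> Image:
    """Cut out the view whose left edge lies pan_deg from the left edge of the range."""
    total_width = len(panorama[0])
    offset = round(pan_deg * view_width / beta_deg)         # degrees -> pixels
    offset = max(0, min(offset, total_width - view_width))  # keep the crop in bounds
    return [row[offset:offset + view_width] for row in panorama]


tiles = [[[1, 1], [1, 1]], [[2, 2], [2, 2]], [[3, 3], [3, 3]]]
print(tiles_needed(170, 60))                   # 3
print(crop_view(stitch(tiles), 30, 60, 2))     # [[1, 2], [1, 2]]
```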

[0044] Step S3: Determine the image difference between each initial image in the above initial image sequence and the above target background image to generate a difference image and obtain a difference image sequence.

[0045] In practice, the difference image is obtained by subtracting the target background image from the initial image. Since both images are captured by the target camera, their image specifications are identical. For each pixel in the initial image, the image difference is determined using the following formula:

[0046]

[0047] Here, i and j are pixel indices. P_s, P_t and P_d denote pixel values in the initial image, the target background image and the difference image, respectively, so that P_s(i,j), P_t(i,j) and P_d(i,j) are the values of the pixel in the i-th row and j-th column of each image. The formula also uses a weighting coefficient with a value range of [0, 1] and a default value of 0.9. σ denotes the noise threshold; since pixel values range over [0, 255], the default value of σ can be 5.
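The sketch below shows a noise-thresholded absolute difference as one assumed form of step S3. It omits the weighting coefficient, because the surrounding text does not state how that coefficient enters the formula. It also assumes that differences at or below σ are zeroed as noise. `difference_image` is a hypothetical name, and this is not presented as the formula of the disclosure.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of pixel values in [0, 255]


def difference_image(initial: Image, background: Image, sigma: int = 5) -> Image:
    """Noise-thresholded absolute difference between a frame and its background.

    Differences at or below sigma are treated as sensor noise and set to 0
    (assumption). The weighting coefficient of the disclosure is not modelled.
    """
    if len(initial) != len(background) or any(
        len(a) != len(b) for a, b in zip(initial, background)
    ):
        raise ValueError("images must have identical dimensions")
    result: Image = []
    for row_s, row_t in zip(initial, background):
        diffs = (abs(s - t) for s, t in zip(row_s, row_t))
        result.append([d if d > sigma else 0 for d in diffs])
    return result


print(difference_image([[10, 100], [50, 0]], [[8, 50], [50, 20]]))  # [[0, 50], [0, 20]]
```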

[0048] Step S4: Select the difference images that satisfy the selection condition from the difference image sequence as candidate images to obtain a candidate image sequence.

[0049] The selection condition is that the number of pixels with non-zero values in the difference image is greater than a preset number. Setting this preset number provides an initial screening for living subjects in the initial image. For example, if the initial image has size H×W, the preset number can be set to (H×W) / 4.
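Step S4's selection condition can be sketched as below, assuming difference images are nested lists of pixel values. The helper name `select_candidates` is hypothetical. Each candidate keeps its sequence position so that a later match can be recorded as a target position.

```python
from typing import List, Tuple

Image = List[List[int]]


def select_candidates(diff_images: List[Image]) -> List[Tuple[int, Image]]:
    """Keep difference images with more than (H*W)/4 non-zero pixels.

    Each candidate is returned with its position in the sequence so that the
    later matching step can record it as a target position.
    """
    candidates = []
    for position, img in enumerate(diff_images):
        height = len(img)
        width = len(img[0]) if height else 0
        nonzero = sum(1 for row in img for value in row if value != 0)
        if nonzero > (height * width) / 4:
            candidates.append((position, img))
    return candidates


moving = [[1, 1, 1, 1], [1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]  # 5 non-zero pixels
quiet = [[0] * 4 for _ in range(4)]                                 # 0 non-zero pixels
border = [[1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]  # exactly 4, not > 4
print([p for p, _ in select_candidates([moving, quiet, border])])  # [0]
```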

[0050] Step S5: For each candidate image in the above candidate image sequence, perform the following processing steps:

[0051] Step S51: Determine whether the candidate image contains a non-background object.

[0052] Since the purpose of step S51 is to determine whether the candidate image contains a person, methods such as facial recognition and human body recognition can be used to determine whether the candidate image contains a non-background object. Specifically, a model such as FaceNet can be used for facial recognition of the candidate image. Alternatively, YOLO-v5 (You Only Look Once-version 5) can be used for human body recognition of the candidate image.

[0053] Step S52: In response to the presence of a non-background object in the candidate image, perform object feature extraction on the candidate image to obtain the target object features.

[0054] In practice, for example, suppose the FaceNet model is used for facial recognition on a candidate image and a non-background object is detected. Multiple fully connected layers connected in series can then be appended to the FaceNet model to map the facial features it extracts into a one-dimensional feature vector, which serves as the target object feature. Similarly, suppose the Tiny-YOLO model is used for human body recognition on a candidate image and a non-background object is detected. Multiple fully connected layers connected in series can then be appended to the Tiny-YOLO model to map the body features it extracts into a one-dimensional feature vector, which serves as the target object feature.

In this disclosure, the Tiny-YOLO model is preferred for constructing target object features. As a person moves, the face may not appear in the candidate image, whereas the body occupies a larger proportion of the frame. Target object features are therefore recognized and extracted with a higher success rate.
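The projection head described above (several fully connected layers in series mapping backbone features to a one-dimensional vector) can be sketched in plain Python. The function names, the flattening of the feature map, and the toy weights are assumptions; in practice the weights would be learned. Since the text does not mention activation functions, none are added here.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Layer = Tuple[Matrix, List[float]]  # (weights, bias) of one fully connected layer


def fully_connected(x: Sequence[float], weights: Matrix, bias: Sequence[float]) -> List[float]:
    """y = W x + b for a single dense layer."""
    if any(len(row) != len(x) for row in weights):
        raise ValueError("weight rows must match the input length")
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def project_to_vector(feature_map: Matrix, layers: List[Layer]) -> List[float]:
    """Flatten a (hypothetical) backbone feature map and apply the layers in series."""
    x = [v for row in feature_map for v in row]
    for weights, bias in layers:
        x = fully_connected(x, weights, bias)
    return x


toy_layers = [
    ([[1, 0, 0, 0], [0, 0, 0, 1]], [0, 0]),  # 4 -> 2
    ([[1, 1]], [0.5]),                       # 2 -> 1
]
print(project_to_vector([[1, 2], [3, 4]], toy_layers))  # [5.5]
```

Without activations, stacked fully connected layers compose into a single affine map. A practical head would therefore usually insert a nonlinearity between layers; that choice is left open here.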

[0055] Step S53: Perform object feature matching between the above target object features and the pre-stored object features corresponding to the above object to be identified to obtain the object feature matching degree.

[0056] Specifically, object features corresponding to the object to be identified can be collected and stored in advance. In particular, the object features corresponding to the object to be identified are also one-dimensional feature vectors.

[0057] In practice, the feature similarity between the target object features and the pre-stored object features corresponding to the above-mentioned objects to be identified can be determined by calculating the cosine similarity, which is used as the object feature matching degree.
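The cosine-similarity matching degree can be computed as in this sketch. Cosine similarity is undefined when either vector is all zeros; returning 0.0 in that case is an assumption.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two 1-D feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: an all-zero vector matches nothing
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```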

[0058] Step S54: In response to the object feature matching degree being greater than the preset matching degree threshold, the image position of the above candidate image in the above initial image sequence is determined as the target position.

[0059] Optionally, the camera angle of the target camera may change at least once within the target time period. In that case, the initial image sequence can be divided at each camera angle change into multiple initial image groups, with each group covering the interval associated with one camera angle change. Steps S2 to S5 above are then repeated for each initial image group.
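Step S54 and the optional per-angle grouping can be sketched as follows. The function names, the use of per-frame timestamps, and the rule that a frame captured exactly at a change time joins the following group are all assumptions. The matching threshold in the usage example is a hypothetical value.

```python
import bisect
from typing import List, Sequence, Tuple


def group_by_angle_changes(frame_times: Sequence[float],
                           change_times: Sequence[float]) -> List[List[int]]:
    """Split frame indices into groups separated by camera angle changes.

    A frame captured exactly at a change time is assigned to the group that
    follows the change (an arbitrary convention).
    """
    changes = sorted(change_times)
    groups: List[List[int]] = [[] for _ in range(len(changes) + 1)]
    for index, t in enumerate(frame_times):
        groups[bisect.bisect_right(changes, t)].append(index)
    return [g for g in groups if g]  # drop segments that received no frames


def target_positions(match_scores: Sequence[Tuple[int, float]], threshold: float) -> List[int]:
    """Step S54: keep the sequence positions whose matching degree exceeds the threshold."""
    return [position for position, score in match_scores if score > threshold]


print(group_by_angle_changes([0, 1, 2, 3, 4, 5], [2.5]))           # [[0, 1, 2], [3, 4, 5]]
print(target_positions([(0, 0.91), (3, 0.42), (7, 0.88)], 0.85))  # [0, 7]
```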

[0060] Step 103: Based on the target position sequence, perform image extraction on the initial image sequence to obtain the target image sequence.

[0061] In some embodiments, the aforementioned execution entity can perform image extraction on the initial image sequence based on the target position sequence to obtain the target image sequence.

[0062] Here, each target image corresponds to a target position. A target position represents the sequence position, within the initial image sequence, of an initial image containing the object to be identified. The initial image at each target position can therefore be taken as a target image, yielding the target image sequence.

[0063] As an example, please refer to the following pseudocode:

[0064]
    # Collect, in order, the initial images located at the target positions
    Target_Img_List = []
    for i in range(len(Target_Position_List)):
        Target_Img_List.append(Initial_Img_List[Target_Position_List[i]])

[0067] Here, "Target_Img_List" represents the target image sequence, "Target_Position_List" represents the target position sequence, and "Initial_Img_List" represents the initial image sequence.

[0068] Step 104: Generate a set of object description information corresponding to the object to be identified based on the pre-trained object description information generation model and the target image sequence.

[0069] In some embodiments, the aforementioned execution entity can generate an object description information set corresponding to the object to be identified based on the pre-trained object description information generation model and the target image sequence.

[0070] The object description information includes a predicted action type, a recognized action type, and emotion description information:
- The recognized action type characterizes the action type of the object to be identified as recognized within the current-frame target image.
- The predicted action type characterizes the action type of the object to be identified within the target image as predicted from the preceding target images.
- The emotion description information characterizes the emotional changes of the object to be identified.

The object description information generation model is a machine learning model used to generate object description information.

[0071] As an example, the target image sequence may include target images I1, I2, I3, I4 and I5. For target image I4, the object description information generation model first performs action type recognition and emotion prediction on target image I4. This yields the recognized action type and the emotion description information included in the object description information corresponding to target image I4. Then, based on target images I1, I2 and I3 and the object description information generation model, the predicted action type included in the object description information corresponding to target image I4 is generated.

[0072] Optionally, the object description information generation model includes: an action type recognition network, an action type prediction network, and an emotion recognition network. The action type recognition network is used to identify the action type of the object to be identified within a single target image. The action type prediction network is used to predict the action type of the object to be identified in the next frame of the target image.

[0073] As an example, Figure 2 illustrates the process of generating object description information. The target image sequence may include target images I1, I2, I3, I4 and I5. Taking target image I5 as an example:
1. Target image I5 is input into the action type recognition network 203 of the object description information generation model 201 to obtain the recognized action type included in the object description information corresponding to target image I5.
2. Target image I5 is input into the emotion recognition network 204 to obtain the emotion description information included in the object description information corresponding to target image I5.
3. At least one target image preceding target image I5 (here, target images I1, I2, I3 and I4) is input into the action type prediction network 202 to obtain the predicted action type included in the object description information corresponding to target image I5.
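The per-frame data flow of Figure 2 can be sketched with the three networks passed in as callables. The stub networks in the usage example are placeholders rather than trained models. The optional `history` window is an assumption. Returning `None` as the predicted action for the first frame, which has no preceding image, is also an assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Sequence


@dataclass
class ObjectDescription:
    recognized_action: str
    emotion: str
    predicted_action: Optional[str]  # None when no preceding frame exists


def describe_sequence(
    frames: Sequence[Any],
    recognize_action: Callable[[Any], str],             # action type recognition network
    recognize_emotion: Callable[[Any], str],            # emotion recognition network
    predict_action: Callable[[Sequence[Any]], str],     # action type prediction network
    history: Optional[int] = None,                      # hypothetical cap on preceding frames
) -> List[ObjectDescription]:
    descriptions = []
    for k, frame in enumerate(frames):
        start = 0 if history is None else max(0, k - history)
        preceding = frames[start:k]  # target images before the current one
        descriptions.append(ObjectDescription(
            recognized_action=recognize_action(frame),
            emotion=recognize_emotion(frame),
            predicted_action=predict_action(preceding) if preceding else None,
        ))
    return descriptions


def stub_recognize(frame: str) -> str:
    return frame  # placeholder: the "frame" already names the action


def stub_emotion(frame: str) -> str:
    return "calm"


def stub_predict(preceding: Sequence[str]) -> str:
    return preceding[-1]  # placeholder: predict that the last action continues


descs = describe_sequence(["sit", "stand", "walk"], stub_recognize, stub_emotion, stub_predict)
print([d.predicted_action for d in descs])  # [None, 'sit', 'stand']
```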

[0074] In some optional implementations of certain embodiments, the execution entity generates a set of object description information corresponding to the object to be identified based on a pre-trained object description information generation model and the target image sequence, including:

[0075] For each target image in the above target image sequence, perform the following recognition steps:

[0076] Step S1: Using an action type recognition network, perform action type recognition on the target image to obtain the recognized action type included in the object description information corresponding to the target image.

[0077] The action type recognition network comprises three sub-action type recognition networks: A1, A2, and A3. All three use the same Encoder-Decoder structure and differ only in the encoder: A1 has 5 Transformer Blocks, A2 has 9, and A3 has 13. Each Transformer Block consists of a first LN (Layer Normalization) layer, an MHSA (Multi-Head Self-Attention) layer, a second LN layer, and an FFN (Feed-Forward Network). Within each Transformer Block, features pass through the first LN layer and are then input to the MHSA layer. The output of the first LN layer is added to the output of the MHSA layer, and this sum is the input to the second LN layer. The output of the second LN layer is the input to the FFN. Finally, the sum of the first LN layer output and the MHSA layer output is added to the FFN output to obtain the output of the Transformer Block. Specifically, the 5 Transformer Blocks in the Encoder of sub-action type recognition network A1 are identical to the first 5 of the 9 Transformer Blocks in the Encoder of sub-action type recognition network A2, and the 9 Transformer Blocks in the Encoder of A2 are identical to the first 9 of the 13 Transformer Blocks in the Encoder of sub-action type recognition network A3. The Decoders of A1, A2, and A3 each consist of a Deconv layer, a BN (Batch Normalization) layer, a ReLU activation function, another Deconv layer, another BN layer, and another ReLU activation function.
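The residual wiring of a single Transformer Block described above can be sketched in NumPy as follows. This is a minimal, untrained sketch: the weight shapes, random initialization, the ReLU inside the FFN, and the layer normalization without learned scale and shift are illustrative assumptions not stated in the disclosure; only the wiring (first LN output plus MHSA output, then plus the FFN output computed through the second LN) follows the text.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # Per-token normalization; learned scale/shift omitted (assumption).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class TransformerBlock:
    """LN -> MHSA -> LN -> FFN with the residual wiring described in [0077]."""

    def __init__(self, dim, num_heads, hidden, rng):
        assert dim % num_heads == 0
        self.h, self.d = num_heads, dim // num_heads
        scale = 1.0 / np.sqrt(dim)
        # Hypothetical randomly initialized weights (no training here).
        self.w_qkv = rng.normal(0.0, scale, (dim, 3 * dim))
        self.w_o = rng.normal(0.0, scale, (dim, dim))
        self.w1 = rng.normal(0.0, scale, (dim, hidden))
        self.w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, dim))

    def mhsa(self, x):
        n, dim = x.shape
        q, k, v = np.split(x @ self.w_qkv, 3, axis=-1)
        # (n, dim) -> (heads, n, head_dim)
        q, k, v = (t.reshape(n, self.h, self.d).transpose(1, 0, 2) for t in (q, k, v))
        attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(self.d))
        out = (attn @ v).transpose(1, 0, 2).reshape(n, dim)
        return out @ self.w_o

    def ffn(self, x):
        return np.maximum(x @ self.w1, 0.0) @ self.w2  # ReLU FFN (assumption)

    def __call__(self, x):
        ln1 = layer_norm(x)                     # first LN layer
        mid = ln1 + self.mhsa(ln1)              # first LN output + MHSA output
        return mid + self.ffn(layer_norm(mid))  # ... + FFN output (via second LN)


rng = np.random.default_rng(0)
tokens = rng.normal(size=(16, 32))  # 16 patch tokens, 32-dim embeddings
block = TransformerBlock(dim=32, num_heads=4, hidden=64, rng=rng)
out = block(tokens)
print(out.shape)  # (16, 32)
```

Note that, as described, the first residual adds the LN-normalized features rather than the raw block input, which differs from the common pre-LN layout; the sketch keeps the disclosure's wiring.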

[0078] Specifically, the target image is first processed using Patch Embedding. The resulting set of embedded image patches is then input into sub-action type recognition network A1 to generate recognized action type P1. When the confidence of P1 is less than a confidence threshold, the features extracted by the Encoder of A1 are used as the input to the 6th Transformer Block of sub-action type recognition network A2, and A2 then generates recognized action type P2. When the confidence of P2 is less than the confidence threshold, the features extracted by the Encoder of A2 are used as the input to the 10th Transformer Block of sub-action type recognition network A3, and A3 then generates recognized action type P3, which is taken as the recognized action type corresponding to the target image. When the confidence of P1 is greater than the confidence threshold, P1 is taken as the recognized action type corresponding to the target image, and sub-action type recognition networks A2 and A3 are not run. When the confidence of P2 is greater than the confidence threshold, P2 is taken as the recognized action type corresponding to the target image, and sub-action type recognition network A3 is not run.
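The early-exit cascade above can be sketched as a loop over one shared list of 13 encoder blocks, with exits after blocks 5, 9, and 13. This is a sketch under assumptions: each "head" is a hypothetical callable standing in for a sub-network's Deconv/BN/ReLU decoder plus classification, returning a (label, confidence) pair; a confidence exactly equal to the threshold is treated as "not greater" and passed on, since the disclosure leaves that case open.

```python
from typing import Any, Callable, List, Sequence, Tuple

Block = Callable[[Any], Any]               # hypothetical: features -> features
Head = Callable[[Any], Tuple[str, float]]  # hypothetical decoder + classifier

EXIT_DEPTHS = (5, 9, 13)  # encoder depths of A1, A2, A3


def cascade_recognize(patches: Any, blocks: Sequence[Block], heads: Sequence[Head],
                      threshold: float) -> Tuple[str, float, int]:
    """Early-exit action type recognition over encoder blocks shared by A1, A2, A3.

    Returns (action_type, confidence, number_of_encoder_blocks_run).
    """
    assert len(blocks) == EXIT_DEPTHS[-1] and len(heads) == len(EXIT_DEPTHS)
    feats, depth = patches, 0
    for exit_depth, head in zip(EXIT_DEPTHS, heads):
        # Resume from cached features: A2 starts at its 6th block, A3 at its 10th.
        for block in blocks[depth:exit_depth]:
            feats = block(feats)
        depth = exit_depth
        action, conf = head(feats)
        # Accept when confidence exceeds the threshold; the deepest network always answers.
        if conf > threshold or exit_depth == EXIT_DEPTHS[-1]:
            return action, conf, depth
    raise AssertionError("unreachable")


# Toy usage: blocks record that they ran; heads return fixed (label, confidence) pairs.
calls: List[int] = []


def make_block(i: int) -> Block:
    def block(x):
        calls.append(i)
        return x + 1
    return block


blocks = [make_block(i) for i in range(13)]
heads = [lambda f: ("sitting", 0.4), lambda f: ("walking", 0.8), lambda f: ("standing", 0.95)]
print(cascade_recognize(0, blocks, heads, threshold=0.7))  # ('walking', 0.8, 9)
```

Because the features are carried forward rather than recomputed, every encoder block runs at most once per image, which is the saving the overlapping A1/A2/A3 architecture is meant to provide.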

[0079] In practice, because images differ in how much informative content they contain, the conventional approach of using a fixed network structure performs redundant feature processing: for many images, a few shallow feature processing layers already suffice for high-accuracy action type recognition, yet a fixed structure still runs every layer. This disclosure therefore establishes three sub-action type recognition networks of increasing structural complexity and applies them in order, from the simplest to the most complex, to determine the action type. This reduces the amount of feature processing and improves recognition speed. Furthermore, when a shallower sub-action type recognition network yields low confidence and a deeper one must take over, the deeper network would otherwise have to repeat the feature extraction. To avoid this, the three sub-action type recognition networks share overlapping architectures, so that the features already extracted can be fed directly into the deeper network. The method thus achieves fast, low-complexity action type determination.

[0080] Step S2: Based on the above emotion recognition network, perform object emotion recognition on the above target image to obtain the emotion label corresponding to the above target image.

[0081] The emotion recognition network uses FaceNet as its backbone and connects it to a Fully Connected Layer (FC) as an emotion label classifier.

[0082] In practice, since the action type recognition network has already extracted the facial key points during action type determination, the region enclosed by these key points can first be designated as the region of interest (ROI). The pixel values of all pixels outside the ROI in the target image are then set to 0, yielding an updated target image. Finally, the updated target image is input into the emotion recognition network to obtain the corresponding emotion label. This approach reduces the volume of data to be processed and allows the ROI located by the action type recognition network to be reused directly for facial emotion recognition.
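The masking step above can be sketched with NumPy. As a simplifying assumption, the "region enclosed by the key points" is approximated here by the axis-aligned bounding box of the facial key points; a polygon (for example, the convex hull of the key points) would follow the text more closely. The function name and the (x, y) key-point layout are hypothetical.

```python
import numpy as np


def mask_outside_face_roi(image: np.ndarray, face_keypoints: np.ndarray) -> np.ndarray:
    """Zero every pixel outside the box enclosing the facial key points.

    image: (H, W) or (H, W, C) array.
    face_keypoints: (K, 2) array of (x, y) pixel coordinates.
    """
    h, w = image.shape[:2]
    xs, ys = face_keypoints[:, 0], face_keypoints[:, 1]
    # Enclosing box, clipped to the image bounds.
    x0, x1 = max(int(np.floor(xs.min())), 0), min(int(np.ceil(xs.max())), w - 1)
    y0, y1 = max(int(np.floor(ys.min())), 0), min(int(np.ceil(ys.max())), h - 1)
    updated = np.zeros_like(image)
    updated[y0:y1 + 1, x0:x1 + 1] = image[y0:y1 + 1, x0:x1 + 1]
    return updated


img = np.arange(1, 37).reshape(6, 6)
kps = np.array([[1, 2], [3, 4], [2, 3]])  # hypothetical facial key points (x, y)
masked = mask_outside_face_roi(img, kps)  # keeps rows 2..4, columns 1..3
```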

[0083] Step S3: In response to the target image not being the first target image in the target image sequence, generate the predicted action type included in the object description information corresponding to the target image, based on the action type prediction network and at least one target image in the target image sequence that precedes the target image.

[0084] The action type prediction network uses MoveNet as its backbone network.

[0085] In particular, the action type recognition network generates images containing skeletal anchor points during the action type determination process. Therefore, after processing each target image in the target image sequence, a sequence of skeletal anchor point images is obtained. Thus, at least one skeletal anchor point image corresponding to at least one target image preceding the target image can be input into the action type prediction network to obtain the predicted action type included in the object description information corresponding to the target image. This method avoids redetermining the skeletal anchor points, thereby reducing the amount of data processing.

[0086] Step S4: In response to the target image not being the first target image in the target image sequence, generate the emotion description information included in the object description information corresponding to the target image, based on the action type prediction network, the emotion label corresponding to at least one target image in the target image sequence that precedes the target image, and the emotion label corresponding to the target image.

[0087] The emotion description information may be stored as a sequence holding the emotion labels corresponding to each target image preceding the target image.

[0088] Step S5: In response to the target image being the first target image in the target image sequence, the recognized action type corresponding to the target image is taken as the predicted action type included in the object description information corresponding to the target image, and the current emotion label is taken as the emotion description information included in that object description information.
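Steps S1 to S5 can be combined into one pass over the target image sequence, as in the sketch below. The three networks are hypothetical callables: `recognize` is assumed to return the recognized action together with the skeletal anchor point image and ROI-masked image it produces as by-products, so the skeleton images can be cached and reused in S3 as described in [0085]. As a simplification, S4 here just accumulates the emotion labels of the preceding images plus the current one; the disclosure's S4 also involves the action type prediction network, which is not modeled.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple


@dataclass
class ObjectDescription:
    recognized_action: str
    predicted_action: str
    emotion_description: List[str]  # emotion labels in time order


def describe_sequence(
    target_images: Sequence[Any],
    recognize: Callable[[Any], Tuple[str, Any, Any]],
    recognize_emotion: Callable[[Any], str],
    predict_action: Callable[[List[Any]], str],
) -> List[ObjectDescription]:
    """Run steps S1-S5 for every target image.

    recognize(img) -> (action, skeleton_image, roi_masked_image)  [hypothetical]
    recognize_emotion(roi_masked_image) -> emotion label           [hypothetical]
    predict_action(preceding_skeleton_images) -> predicted action  [hypothetical]
    """
    skeletons: List[Any] = []  # cached skeletal anchor point images, reused in S3
    emotions: List[str] = []
    out: List[ObjectDescription] = []
    for img in target_images:
        action, skeleton, roi_img = recognize(img)      # S1
        emotion = recognize_emotion(roi_img)            # S2
        if skeletons:                                   # not the first target image
            predicted = predict_action(list(skeletons)) # S3
            emotion_desc = emotions + [emotion]         # S4 (simplified)
        else:                                           # S5: first target image
            predicted, emotion_desc = action, [emotion]
        skeletons.append(skeleton)
        emotions.append(emotion)
        out.append(ObjectDescription(action, predicted, emotion_desc))
    return out


# Toy stand-ins for the three networks.
frames = [{"id": 1, "action": "walking", "emotion": "calm"},
          {"id": 2, "action": "walking", "emotion": "calm"},
          {"id": 3, "action": "sitting", "emotion": "sad"}]
descs = describe_sequence(
    frames,
    recognize=lambda f: (f["action"], f"skeleton-{f['id']}", f),
    recognize_emotion=lambda roi: roi["emotion"],
    predict_action=lambda skels: "walking" if len(skels) < 2 else "standing",
)
```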

[0089] Step 105: Generate mild cognitive impairment assessment information for the object to be identified based on the object description information set.

[0090] In some embodiments, the aforementioned executing entity may generate mild cognitive impairment assessment information for the object to be identified based on the object description information set.

[0091] When the subject to be identified has mild cognitive impairment, a decline in one or more cognitive functions may cause problems such as motor amnesia and abnormal emotional changes. Therefore, the degree of motor amnesia of the subject can be measured by the difference between the recognized action type and the predicted action type included in the subject's object description information. At the same time, the emotion labels included in the object description information characterize the subject's emotional changes as a time-ordered series.

[0092] In practice, first, for each piece of object description information, the difference between the recognized action type and the predicted action type it includes can be determined by calculating their similarity, and the proportion of object description information whose difference exceeds a preset threshold is counted. Second, since the emotion labels included in the object description information depict the emotional changes of the object to be identified over time, the frequency of peaks and troughs can be counted to quantify those changes. Finally, a decision tree model is used to generate the mild cognitive impairment assessment information for the object to be identified. Specifically, the mild cognitive impairment assessment information can correspond to multiple assessment levels.
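The two statistics above and a decision step can be sketched as follows. Everything numeric here is an assumption for illustration: the exact-match similarity, the label-to-valence mapping, the thresholds, and the four levels of the hand-written tree are hypothetical, since the disclosure does not give its similarity measure or its trained decision tree.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical numeric mapping that turns emotion labels into a series.
VALENCE: Dict[str, int] = {"sad": -1, "calm": 0, "happy": 1}


def action_mismatch_ratio(pairs: Sequence[Tuple[str, str]],
                          similarity: Callable[[str, str], float],
                          diff_threshold: float) -> float:
    """Share of (recognized, predicted) pairs whose difference (1 - similarity) exceeds the threshold."""
    if not pairs:
        return 0.0
    return sum(1.0 - similarity(r, p) > diff_threshold for r, p in pairs) / len(pairs)


def extrema_rate(series: Sequence[float]) -> float:
    """Number of strict local peaks plus troughs, divided by the series length."""
    if len(series) < 3:
        return 0.0
    count = sum((b > a and b > c) or (b < a and b < c)
                for a, b, c in zip(series, series[1:], series[2:]))
    return count / len(series)


def assess_level(mismatch: float, extrema: float) -> int:
    # Hypothetical hand-written tree standing in for the trained decision tree.
    if mismatch <= 0.2:
        return 0 if extrema <= 0.2 else 1
    return 2 if extrema <= 0.2 else 3


def exact(a: str, b: str) -> float:
    return 1.0 if a == b else 0.0  # hypothetical similarity measure


pairs = [("walking", "walking"), ("sitting", "walking"), ("standing", "standing"), ("sitting", "sitting")]
series = [VALENCE[label] for label in ["calm", "happy", "sad", "calm", "calm"]]
m, e = action_mismatch_ratio(pairs, exact, 0.5), extrema_rate(series)
print(m, e, assess_level(m, e))  # 0.25 0.4 3
```

In a real system the two features would more likely be fed to a trained classifier; the hand-written tree only shows where the decision step sits.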

[0093] Step 106: Synchronize the mild cognitive impairment assessment information to the smart elderly care platform.

[0094] In some embodiments, the aforementioned executing entity may synchronize the mild cognitive impairment assessment information to the smart elderly care platform.

[0095] Here, the smart elderly care platform can be a platform used to monitor the physical condition of users (elderly people) in real time.

[0096] In some optional implementations of some embodiments, the above method further includes:

[0097] Step S1: Determine whether there is a historical user profile corresponding to the above-mentioned object to be identified.

[0098] The smart elderly care platform may include a user profiling module. This module can be used to store data that represents the physical condition of elderly people in the form of user profiles.

[0099] In practice, the identity identifier of the object to be identified can be used to search within the user profile module to determine whether a historical user profile corresponding to the object to be identified exists.

[0100] Step S2: In response to the historical user profile existing, update the historical user profile corresponding to the object to be identified based on the mild cognitive impairment assessment information to obtain an updated user profile.

[0101] In practice, based on the assessment information of mild cognitive impairment, the historical assessment of mild cognitive impairment in the historical user profile corresponding to the subject to be identified can be updated to obtain the updated user profile.

[0102] Step S3: In response to no historical user profile existing, generate an updated user profile based on the mild cognitive impairment assessment information and the basic object information corresponding to the object to be identified.

[0103] In practice, when no historical user profile exists, an updated user profile can be constructed by combining the basic information of the object to be identified with the mild cognitive impairment assessment information. The basic information may include the user's age, gender, and medical history.
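Steps S1 to S3 amount to an "update or create" on a profile store keyed by identity. The sketch below assumes a plain dictionary as a stand-in for the user profiling module and uses hypothetical field names (`id`, `mci_assessment`); in the update path the stored MCI assessment is simply replaced.

```python
from typing import Any, Dict

Profile = Dict[str, Any]


def upsert_profile(profiles: Dict[str, Profile], subject_id: str,
                   assessment: Dict[str, Any], basic_info: Profile) -> Profile:
    """Steps S1-S3: look up the historical profile by identity, then update or create it."""
    historical = profiles.get(subject_id)  # S1: does a historical profile exist?
    if historical is not None:
        # S2: update the stored MCI assessment.
        updated = {**historical, "mci_assessment": assessment}
    else:
        # S3: build a new profile from basic information plus the assessment.
        updated = {**basic_info, "id": subject_id, "mci_assessment": assessment}
    profiles[subject_id] = updated
    return updated


profiles: Dict[str, Profile] = {}  # hypothetical stand-in for the user profiling module
first = upsert_profile(profiles, "u1", {"level": 1},
                       {"age": 78, "gender": "F", "medical_history": ["hypertension"]})
second = upsert_profile(profiles, "u1", {"level": 2}, {})
```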

[0104] Step S4: Based on the updated user profile and the pre-built warning matching rules, determine whether to initiate a mild cognitive impairment warning.

[0105] The warning matching rules may be manually configured, automatically triggered rules based on age, gender, medical history, and mild cognitive impairment assessment information.

[0106] In practice, corresponding rule triggers can be created from the warning matching rules, so that the updated user profile is automatically matched against the pre-built warning matching rules and a mild cognitive impairment warning is triggered when a rule matches.
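A rule trigger of this kind can be sketched as a list of named predicates evaluated against the updated profile, with a warning dispatched to all three bound target terminals when any rule matches. The rule names, thresholds, profile fields, terminal identifiers, and the `send` callback are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Profile = Dict[str, Any]


@dataclass(frozen=True)
class WarningRule:
    name: str
    condition: Callable[[Profile], bool]  # manually configured predicate


# Hypothetical identifiers for the three bound target terminals.
TARGET_TERMINALS = ("relative_terminal", "caregiver_terminal", "practitioner_terminal")


def check_and_warn(profile: Profile, rules: List[WarningRule],
                   send: Callable[[str, dict], None]) -> List[str]:
    """Match the updated profile against every rule; on any match, warn all target terminals."""
    matched = [rule.name for rule in rules if rule.condition(profile)]
    if matched:
        for terminal in TARGET_TERMINALS:
            send(terminal, {"subject": profile.get("id"), "matched_rules": matched})
    return matched


RULES = [
    WarningRule("high_mci_level", lambda p: p["mci_assessment"]["level"] >= 2),
    WarningRule("older_with_stroke_history",
                lambda p: p["age"] >= 75 and p["mci_assessment"]["level"] >= 1
                and "stroke" in p["medical_history"]),
]

outbox: List[tuple] = []
profile = {"id": "u1", "age": 80, "gender": "M", "medical_history": ["stroke"],
           "mci_assessment": {"level": 1}}
print(check_and_warn(profile, RULES, lambda t, msg: outbox.append((t, msg))))
# ['older_with_stroke_history']
```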

[0107] Step S5: In response to the initiation of a mild cognitive impairment warning, automatically send a mild cognitive impairment warning to the target terminal.

[0108] The target terminals include a first target terminal, a second target terminal, and a third target terminal. The first target terminal is bound to the identity of a direct relative of the person to be identified; the second target terminal is bound to the identity of a caregiver/guardian of the person to be identified; and the third target terminal is bound to the identity of a medical practitioner of the person to be identified. The caregiver/guardian can be someone who provides routine care to the person to be identified, such as a nurse. The medical practitioner can be someone who diagnoses or treats the person to be identified, such as a doctor.

[0109] Step S6: Retrieve auxiliary diagnostic suggestion information corresponding to the updated user profile to obtain a set of auxiliary diagnostic suggestion information.

[0110] In practice, diagnostic suggestions made by different doctors for different situations can be stored uniformly in a diagnostic suggestion pool. The top K auxiliary diagnostic suggestions that best match the updated user profile can then be retrieved from the pool as the auxiliary diagnostic suggestion information set.
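Top-K retrieval from the suggestion pool can be sketched as scoring each pooled suggestion against the profile and keeping the K best. The matching criterion is an assumption: here both the profile and each suggestion are reduced to hypothetical tag sets and compared by Jaccard similarity, with ties broken in favor of the earlier pool entry.

```python
import heapq
from typing import List, Sequence, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recall_top_k(profile_tags: Set[str], pool: Sequence[Tuple[str, Set[str]]], k: int) -> List[str]:
    """Return the k pooled suggestions whose tags best match the profile (ties: earlier entry first)."""
    # The tuple (score, -index, text) makes equal scores favor smaller indices.
    scored = ((jaccard(profile_tags, tags), -i, text) for i, (text, tags) in enumerate(pool))
    return [text for _, _, text in heapq.nlargest(k, scored)]


pool = [  # hypothetical diagnostic suggestion pool
    ("suggestion A", {"mci_level_2", "age_75_plus"}),
    ("suggestion B", {"stroke"}),
    ("suggestion C", {"mci_level_0"}),
]
tags = {"mci_level_2", "age_75_plus", "stroke"}  # hypothetical tags derived from the profile
print(recall_top_k(tags, pool, 2))  # ['suggestion A', 'suggestion B']
```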

[0111] Step S7: Synchronize the above-mentioned auxiliary diagnostic suggestion information set to the above-mentioned third target terminal.

[0112] In practice, the set is sent to the third target terminal to assist the medical practitioner in further determining the mild cognitive impairment status of the subject to be identified.

[0113] Step S8: Synchronize the target auxiliary diagnostic suggestion information to the second target terminal.

[0114] The target auxiliary diagnostic suggestion information is the auxiliary diagnostic suggestion information that the medical practitioner selects from the auxiliary diagnostic suggestion information set as matching the subject to be identified.

[0115] In practice, once this information is sent to the second target terminal, the caregiver can initiate the process of determining and diagnosing mild cognitive impairment based on the target auxiliary diagnostic suggestion information.

[0116] The above embodiments of this disclosure have the following beneficial effect: the image recognition-based method for identifying mild cognitive impairment in the elderly according to some embodiments of this disclosure effectively identifies changes in a patient's mild cognitive impairment. Specifically, such changes are difficult to identify in a timely manner because mild cognitive impairment develops over a long onset period. Conventional medical diagnosis based on periodic follow-up visits cannot effectively identify mild cognitive impairment within the window between visits, so changes in the patient's condition are hard to detect in time, which increases the risk of progression to Alzheimer's disease and similar conditions. For example, suppose a patient undergoes medical diagnosis at time points T1 and T2: if the patient's mild cognitive impairment changes within the window between T1 and T2, the change is difficult to detect in time, and the risk of deterioration grows as the window lengthens. On this basis, the method according to some embodiments of this disclosure first acquires an initial image sequence for the object to be identified, where the object to be identified is an object pre-labeled for mild cognitive impairment identification whose age is greater than a preset age. Second, object recognition is performed on the initial images in the initial image sequence to obtain a target location sequence, where each target location represents the sequence position, within the initial image sequence, of an initial image containing the object to be identified.
Next, based on the target location sequence, image extraction is performed on the initial image sequence to obtain a target image sequence, where each target image corresponds to a target location. In practice, the object to be identified, as the subject of identification for mild cognitive impairment, is not static; therefore, there may be situations where the object to be identified is not present in the image. Since the onset of mild cognitive impairment is characterized by a long period, a large number of images need to be processed for identification. To avoid unnecessary computational resource consumption, object recognition is used to determine the target location, thus filtering out invalid image frames. Furthermore, based on a pre-trained object description information generation model and the aforementioned target image sequence, a set of object description information corresponding to the object to be identified is generated. This object description information includes: predicted action type, identified action type, and emotion description information, where the emotion description information represents the emotional changes of the object to be identified. In practice, patients with mild cognitive impairment often experience a decline in one or more cognitive functions. Therefore, this disclosure generates a set of object description information (which changes over time) for the identified individual, based on both behavioral and emotional dimensions. Next, based on this object description information set, mild cognitive impairment assessment information for the identified individual is generated. This automatically generates a mild cognitive impairment assessment for the identified individual from both behavioral and emotional dimensions. Finally, the mild cognitive impairment assessment information is synchronized to the aforementioned smart elderly care platform. 
This method enables the effective identification of changes in mild cognitive impairment in patients.

[0117] With further reference to Figure 3, as an implementation of the methods shown in the above figures, this disclosure provides some embodiments of an image recognition-based device for identifying mild cognitive impairment in the elderly. These device embodiments correspond to the method embodiments shown in Figure 1, and the device can be applied to various electronic devices.

[0118] As shown in Figure 3, an image recognition-based device 300 for identifying mild cognitive impairment in the elderly according to some embodiments includes: an acquisition unit 301, an object recognition unit 302, an image extraction unit 303, a first generation unit 304, a second generation unit 305, and a synchronization unit 306, where:

[0119] The acquisition unit 301 is configured to acquire an initial image sequence for an object to be identified, wherein the object to be identified is an object pre-labeled for mild cognitive impairment identification whose age is greater than a preset age. The object recognition unit 302 is configured to perform object recognition on the initial images in the initial image sequence to obtain a target position sequence, wherein a target position represents the sequence position, in the initial image sequence, of an initial image containing the object to be identified. The image extraction unit 303 is configured to perform image extraction on the initial image sequence according to the target position sequence to obtain a target image sequence, wherein the target images correspond to the target positions. The first generation unit 304 is configured to generate an object description information set corresponding to the object to be identified based on a pre-trained object description information generation model and the target image sequence, wherein the object description information includes a predicted action type, a recognized action type, and emotion description information, and the emotion description information represents the emotional changes of the object to be identified. The second generation unit 305 is configured to generate mild cognitive impairment assessment information for the object to be identified based on the object description information set. The synchronization unit 306 is configured to synchronize the mild cognitive impairment assessment information to the smart elderly care platform.

[0120] It is understandable that the units of the image recognition-based device 300 for identifying mild cognitive impairment in the elderly correspond to the steps of the method described with reference to Figure 1. Therefore, the operations, features, and beneficial effects described above for the method also apply to the device 300 and the units it contains, and are not repeated here.

[0121] Reference is now made to Figure 4, which is a schematic structural diagram of an electronic device (e.g., a computing device) suitable for implementing some embodiments of the present disclosure. The electronic device shown in Figure 4 is merely an example and should not be construed as limiting the functionality or scope of the embodiments of this disclosure. As shown in Figure 4, the computer device includes a processor, a memory, and a network interface connected via a system bus. The memory may include a non-volatile storage medium and internal memory. The non-volatile storage medium may store an operating system and a computer program. The computer program includes program instructions that, when executed, cause the processor to perform any of the methods described above. The processor provides computational and control capabilities to support the operation of the entire computer device. The internal memory provides an environment for running the computer program stored in the non-volatile storage medium; when executed by the processor, the computer program causes the processor to perform any of the methods described above. The network interface is used for network communication, such as sending assigned tasks. Those skilled in the art will understand that the structure shown in Figure 4 is merely a block diagram of the part of the structure related to the present disclosure and does not limit the computer device to which the present disclosure is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or arrange components differently.

[0122] It should be understood that the processor can be a Central Processing Unit (CPU), but it can also be other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. Among these, a general-purpose processor can be a microprocessor or any conventional processor.

[0123] In one embodiment, the processor is configured to run a computer program stored in a memory to perform the following steps: acquiring an initial image sequence for an object to be identified, wherein the object to be identified is an object pre-labeled as requiring mild cognitive impairment identification and whose corresponding age is greater than a preset age; performing object identification on the initial images in the initial image sequence to obtain a target position sequence, wherein the target position represents the sequence position of the initial image containing the object to be identified in the initial image sequence; performing image extraction on the initial image sequence according to the target position sequence to obtain a target image sequence, wherein the target images correspond to the target positions; generating an object description information set corresponding to the object to be identified according to a pre-trained object description information generation model and the target image sequence, wherein the object description information includes: predicted action type, identified action type, and emotion description information, wherein the emotion description information represents the emotion changes of the object to be identified; generating mild cognitive impairment evaluation information for the object to be identified according to the object description information set; and synchronizing the mild cognitive impairment evaluation information to the smart elderly care platform.
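The processor steps above can be summarized as a pipeline skeleton. Every stage is passed in as a hypothetical callable (a subject detector, the description model, the assessment step, and the platform sync); the sketch only shows how the target positions filter out frames without the subject before the heavier stages run.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


def identify_mci(frames: Sequence[Any],
                 contains_subject: Callable[[Any], bool],
                 describe: Callable[[List[Any]], List[Any]],
                 assess: Callable[[List[Any]], Dict[str, Any]],
                 sync: Callable[[Dict[str, Any]], None]) -> Tuple[List[int], Dict[str, Any]]:
    # Object recognition: sequence positions of frames that contain the subject.
    target_positions = [i for i, frame in enumerate(frames) if contains_subject(frame)]
    # Image extraction: keep those frames in order, discarding invalid frames.
    target_images = [frames[i] for i in target_positions]
    descriptions = describe(target_images)  # object description information set
    assessment = assess(descriptions)       # MCI assessment information
    sync(assessment)                        # push to the smart elderly care platform
    return target_positions, assessment


synced: List[Dict[str, Any]] = []
positions, result = identify_mci(
    frames=["empty", "person", "empty", "person", "person"],
    contains_subject=lambda f: f == "person",
    describe=lambda imgs: [{"frame": img} for img in imgs],
    assess=lambda descs: {"num_descriptions": len(descs)},
    sync=synced.append,
)
print(positions, result)  # [1, 3, 4] {'num_descriptions': 3}
```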

[0124] This disclosure also provides a computer-readable storage medium storing a computer program that includes program instructions; for the method implemented when the program instructions are executed, reference may be made to the method embodiments described above.

[0125] The aforementioned computer-readable storage medium may be an internal storage unit of the computer device described in the foregoing embodiments, such as the hard disk or memory of the computer device. Alternatively, the aforementioned computer-readable storage medium may be an external storage device of the computer device, such as a plug-in hard disk, SmartMedia Card (SMC), Secure Digital (SD) card, or Flash Card equipped on the computer device.

[0126] It should be noted that, in this document, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or system. Unless otherwise specified, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or system that includes that element.

[0127] The above description is merely a selection of preferred embodiments of this disclosure and an explanation of the technical principles employed. Those skilled in the art should understand that the scope of the invention involved in the embodiments of this disclosure is not limited to technical solutions formed by specific combinations of the above-described technical features, but should also cover other technical solutions formed by arbitrary combinations of the above-described technical features or their equivalents without departing from the above-described inventive concept. For example, it covers technical solutions formed by substituting the above-described features with technical features having similar functions disclosed in (but not limited to) the embodiments of this disclosure.

Claims

1. A method for identifying mild cognitive impairment in the elderly based on image recognition, applied to a smart elderly care platform, characterized by comprising: acquiring an initial image sequence for an object to be identified, wherein the object to be identified is an object whose age is greater than a preset age and which is pre-labeled as an object to be identified for mild cognitive impairment; performing object recognition on the initial images in the initial image sequence to obtain a target location sequence, wherein a target location represents the sequence position, in the initial image sequence, of an initial image containing the object to be identified; performing image extraction on the initial image sequence based on the target location sequence to obtain a target image sequence, wherein the target images correspond to the target locations; based on a pre-trained object description information generation model, performing the following recognition steps for each target image in the target image sequence to generate an object description information set corresponding to the object to be identified, wherein the object description information includes a predicted action type, a recognized action type, and emotion description information, and the object description information generation model includes an action type recognition network, an action type prediction network, and an emotion recognition network; the recognition steps include: performing action type recognition on the target image using the action type recognition network to obtain the recognized action type, wherein the action type recognition network includes three sub-action type recognition networks A1, A2, and A3, with 5, 9, and 13 Transformer Blocks respectively, and each Transformer Block consists of a first LN layer, an MHSA layer, a second LN layer, and an FFN network.
The 5 Transformer Blocks of A1 are identical to the first 5 Transformer Blocks of A2, and the 9 Transformer Blocks of A2 are identical to the first 9 Transformer Blocks of A3. After patch embedding processing, the target image is input into A1 to generate recognized action type P1. When the confidence of P1 is less than a threshold, the features extracted by the encoder of A1 are used as the input to the 6th Transformer Block of A2, and A2 generates recognized action type P2. When the confidence of P2 is less than the threshold, the features extracted by the encoder of A2 are used as the input to the 10th Transformer Block of A3, A3 generates recognized action type P3, and P3 is used as the recognized action type corresponding to the target image. When the confidence of P1 is greater than the threshold, P1 is used as the recognized action type corresponding to the target image; when the confidence of P2 is greater than the threshold, P2 is used as the recognized action type corresponding to the target image. During the recognition process, a skeletal anchor point image and a region of interest enclosed by the facial key points are generated at the same time. The pixel values of pixels outside the region of interest in the target image are set to 0 to obtain an updated target image, which is input into the emotion recognition network to obtain the emotion label corresponding to the target image. In response to the target image not being the first target image in the target image sequence, the predicted action type is generated based on the action type prediction network and at least one skeletal anchor point image corresponding to at least one target image in the target image sequence that precedes the target image.
In response to the target image not being the first target image in the target image sequence, the emotion description information is generated based on the action type prediction network, the emotion label corresponding to at least one target image in the target image sequence that precedes the target image, and the emotion label corresponding to the target image; generating mild cognitive impairment assessment information for the object to be identified based on the object description information set; and synchronizing the mild cognitive impairment assessment information to the smart elderly care platform.

2. The method according to claim 1, characterized in that the method further comprises: determining whether a historical user profile corresponding to the object to be identified exists; in response to the historical user profile existing, updating that historical user profile based on the mild cognitive impairment assessment information to obtain an updated user profile; in response to no historical user profile existing, generating an updated user profile based on the mild cognitive impairment assessment information and the basic object information corresponding to the object to be identified; determining, based on the updated user profile and pre-built warning matching rules, whether to initiate a mild cognitive impairment warning; and, in response to initiating a mild cognitive impairment warning, automatically sending the mild cognitive impairment warning to target terminals comprising a first target terminal, a second target terminal, and a third target terminal, wherein the first target terminal is a terminal bound to the identity of an immediate family member of the object to be identified, the second target terminal is a terminal bound to the identity of a caregiver of the object to be identified, and the third target terminal is a terminal bound to the identity of the party providing medical treatment to the object to be identified.
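The profile update and warning flow of claim 2 can be sketched as follows. This is a minimal illustration under stated assumptions: the assessment information is reduced to a hypothetical risk score in [0, 1], each pre-built warning matching rule is modelled as a predicate over the profile, and `UserProfile`, `process_assessment`, the role keys and the `send` callback are hypothetical names. The claim does not define the rule contents or the messaging channel.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Mapping


@dataclass
class UserProfile:
    object_id: str
    basic_info: Dict[str, str]
    # Hypothetical encoding of the MCI assessment information: one risk score per run.
    risk_scores: List[float] = field(default_factory=list)


WarningRule = Callable[[UserProfile], bool]  # one pre-built warning matching rule
TERMINAL_ROLES = ("family", "caregiver", "medical")  # first, second, third target terminals


def update_profile(
    profiles: Dict[str, UserProfile],
    object_id: str,
    risk_score: float,
    basic_info: Mapping[str, str],
) -> UserProfile:
    profile = profiles.get(object_id)
    if profile is None:
        # No historical profile: create one from the basic object information.
        profile = UserProfile(object_id, dict(basic_info))
        profiles[object_id] = profile
    # The historical (or newly created) profile absorbs the new assessment.
    profile.risk_scores.append(risk_score)
    return profile


def process_assessment(
    profiles: Dict[str, UserProfile],
    object_id: str,
    risk_score: float,
    basic_info: Mapping[str, str],
    rules: List[WarningRule],
    terminals: Mapping[str, str],
    send: Callable[[str, str], None],
) -> List[str]:
    """Update the profile, evaluate the rules, and return the terminal IDs that were warned."""
    profile = update_profile(profiles, object_id, risk_score, basic_info)
    if not any(rule(profile) for rule in rules):
        return []
    notified = []
    for role in TERMINAL_ROLES:
        terminal_id = terminals[role]
        send(terminal_id, f"Mild cognitive impairment warning for {object_id}")
        notified.append(terminal_id)
    return notified
```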

3. The method according to claim 2, characterized in that the method further comprises: retrieving the auxiliary diagnostic suggestion information corresponding to the updated user profile to obtain an auxiliary diagnostic suggestion information set; synchronizing the auxiliary diagnostic suggestion information set to the third target terminal; and synchronizing target auxiliary diagnostic suggestion information to the second target terminal, wherein the target auxiliary diagnostic suggestion information is the auxiliary diagnostic suggestion information that the treating party selects from the auxiliary diagnostic suggestion information set as matching the object to be identified.
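The two-stage routing in claim 3 (full set to the treating party, selected subset to the caregiver) can be sketched as below. The sketch assumes the updated user profile is summarized as a list of tags and that suggestions are recalled from a tag-keyed library; `recall_suggestions`, `route_suggestions`, the role strings and the `select` callback (standing in for the treating party's manual choice) are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def recall_suggestions(profile_tags: Sequence[str], library: Dict[str, List[str]]) -> List[str]:
    """Collect suggestions linked to any profile tag, de-duplicated in first-seen order."""
    recalled: List[str] = []
    for tag in profile_tags:
        for suggestion in library.get(tag, []):
            if suggestion not in recalled:
                recalled.append(suggestion)
    return recalled


def route_suggestions(
    profile_tags: Sequence[str],
    library: Dict[str, List[str]],
    send: Callable[[str, List[str]], None],
    select: Callable[[List[str]], List[str]],
) -> List[str]:
    candidates = recall_suggestions(profile_tags, library)
    # The full set goes to the third target terminal (the treating party).
    send("medical", candidates)
    # The treating party picks matching items; anything outside the set is ignored.
    chosen = [s for s in select(candidates) if s in candidates]
    # Only the selection goes to the second target terminal (the caregiver).
    send("caregiver", chosen)
    return chosen
```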

4. The method according to claim 3, characterized in that performing object recognition on the initial images in the initial image sequence to obtain the target position sequence comprises: determining whether the camera angle of a target camera changes within a target time period, wherein the target camera is the camera that acquires the initial image sequence and the target time period is the acquisition time period corresponding to the initial image sequence; in response to no camera angle change, acquiring a target background image, wherein the target background image is a static background image at the current camera angle of the target camera; determining the image difference between each initial image in the initial image sequence and the target background image to generate a difference image, thereby obtaining a difference image sequence; selecting, from the difference image sequence, difference images that meet a filtering criterion as candidate images to obtain a candidate image sequence, wherein the filtering criterion is that the number of pixels with non-zero pixel values in the difference image is greater than a preset number; and, for each candidate image in the candidate image sequence, performing the following processing steps: determining whether the candidate image contains a non-background object; in response to the candidate image containing a non-background object, extracting object features from the candidate image to obtain target object features; matching the target object features against the pre-stored object features corresponding to the object to be identified to obtain an object feature matching degree; and, in response to the object feature matching degree being greater than a preset matching degree threshold, determining the image position of the candidate image in the initial image sequence as a target position.
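The background-difference filtering of claim 4 can be sketched for the no-angle-change branch only (the claim does not describe what happens when the camera angle changes). Assumptions: grayscale images as nested lists of integers, the difference image is the per-pixel absolute difference, and cosine similarity stands in for the unspecified "object feature matching degree". The non-background-object detector and the feature extractor are passed in as callbacks because the claim does not name them; `locate_targets` and its parameters are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Image = List[List[int]]  # grayscale image as rows of pixel values


def difference_image(image: Image, background: Image) -> Image:
    # Per-pixel absolute difference against the static background.
    return [[abs(p - b) for p, b in zip(row, bg_row)] for row, bg_row in zip(image, background)]


def count_nonzero(image: Image) -> int:
    return sum(1 for row in image for p in row if p != 0)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def locate_targets(
    images: Sequence[Image],
    background: Image,
    stored_features: Sequence[float],
    min_changed_pixels: int,
    match_threshold: float,
    has_foreground: Callable[[Image], bool],
    extract_features: Callable[[Image], List[float]],
) -> List[int]:
    """Return the positions in `images` judged to contain the object to be identified."""
    positions: List[int] = []
    for index, image in enumerate(images):
        candidate = difference_image(image, background)
        # Filtering criterion: strictly more than the preset number of non-zero pixels.
        if count_nonzero(candidate) <= min_changed_pixels:
            continue
        if not has_foreground(candidate):
            continue
        similarity = cosine_similarity(extract_features(candidate), stored_features)
        if similarity > match_threshold:
            positions.append(index)
    return positions
```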

5. A device for identifying mild cognitive impairment in the elderly based on image recognition, characterized in that it comprises the following units. The acquisition unit is configured to acquire an initial image sequence for an object to be identified, wherein the object to be identified is an object that has been pre-labeled for mild cognitive impairment identification and whose age is greater than a preset age. The object recognition unit is configured to perform object recognition on the initial images in the initial image sequence to obtain a target position sequence, wherein a target position represents the position, in the initial image sequence, of an initial image containing the object to be identified. The image extraction unit is configured to extract images from the initial image sequence based on the target position sequence to obtain a target image sequence, wherein the target images correspond to the target positions. The first generation unit is configured to perform, based on a pre-trained object description information generation model, the following recognition steps for each target image in the target image sequence to generate an object description information set corresponding to the object to be identified. The object description information includes a predicted action type, a recognized action type, and emotion description information. The object description information generation model includes an action type recognition network, an action type prediction network, and an emotion recognition network. The recognition steps include: recognizing the target image with the action type recognition network to obtain the corresponding recognized action type, wherein the action type recognition network includes three sub-action type recognition networks, A1, A2, and A3, with 5, 9, and 13 Transformer Blocks respectively, and each Transformer Block consists of a first LN layer, an MHSA layer, a second LN layer, and an FFN network.
The 5 Transformer Blocks of A1 are identical to the first 5 Transformer Blocks of A2, and the 9 Transformer Blocks of A2 are identical to the first 9 Transformer Blocks of A3. After patch embedding, the target image is input into A1 to generate a recognized action type P1. When the confidence of P1 is less than a threshold, the features extracted by the encoder of A1 are used as the input to the 6th Transformer Block of A2, and A2 generates a recognized action type P2. When the confidence of P2 is less than a threshold, the features extracted by the encoder of A2 are used as the input to the 10th Transformer Block of A3, A3 processes this input to generate a recognized action type P3, and P3 is used as the recognized action type corresponding to the target image. When the confidence of P1 is greater than the threshold, P1 is used as the recognized action type corresponding to the target image; when the confidence of P2 is greater than the threshold, P2 is used as the recognized action type corresponding to the target image. During the recognition process, a skeletal anchor point image and a region of interest enclosed by the facial key points are generated simultaneously. The pixel values of all pixels outside the region of interest in the target image are set to 0 to obtain an updated target image, which is then input into the emotion recognition network to obtain the emotion label corresponding to the target image. In response to the target image not being the first target image in the target image sequence, a predicted action type is generated based on the action type prediction network and at least one skeletal anchor point image corresponding to at least one target image located before the target image in the target image sequence.
In response to the target image not being the first target image in the target image sequence, emotion description information is generated based on the action type prediction network, the emotion label corresponding to at least one target image located before the target image in the target image sequence, and the emotion label corresponding to the target image. The second generation unit is configured to generate mild cognitive impairment assessment information for the object to be identified based on the object description information set. The synchronization unit is configured to synchronize the mild cognitive impairment assessment information to the smart elderly care platform.
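The region-of-interest masking that claims 1 and 5 apply before emotion recognition can be sketched as below. Assumptions: the facial key points are already detected (the detector is not shown) and are ordered along the face boundary so they form a simple polygon, the image is grayscale as nested lists, and a pixel belongs to the region of interest when its centre lies inside the polygon under an even-odd ray-casting test. `point_in_polygon` and `mask_outside_roi` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates
Image = List[List[int]]


def point_in_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Even-odd ray casting: count edge crossings of a ray going right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def mask_outside_roi(image: Image, face_keypoints: Sequence[Point]) -> Image:
    """Return a copy in which every pixel whose centre is outside the key-point polygon is 0."""
    return [
        [p if point_in_polygon(col + 0.5, row + 0.5, face_keypoints) else 0
         for col, p in enumerate(pixels)]
        for row, pixels in enumerate(image)
    ]
```

Returning a new image rather than zeroing in place keeps the unmasked target image available for the action type recognition network, which the claims run on the same frame.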

6. An electronic device, characterized in that it comprises: one or more processors; and a storage device on which one or more programs are stored, wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1 to 4.

7. A computer-readable medium, characterized in that a computer program is stored thereon, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 4.