Information processing device, information processing method, and information processing program

The information processing device accurately estimates human behavior by analyzing skeletal features and posture duration, addressing the challenge of low accuracy in existing systems.

JP2026103905A · Pending · Publication Date: 2026-06-25 · KK TOSHIBA

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
KK TOSHIBA
Filing Date
2024-12-13
Publication Date
2026-06-25

AI Technical Summary

Technical Problem

Existing systems struggle to accurately estimate human behavior from captured images.

Method used

An information processing device comprising a posture estimation unit and a behavior estimation unit that analyzes skeletal features and posture duration to accurately determine human behavior.

Benefits of technology

Enables high-accuracy estimation of human behavior by utilizing skeletal estimation techniques and machine learning algorithms to analyze posture and duration, improving detection and warning systems.



Abstract

[Problem] To estimate human behavior with high accuracy. [Solution] The information processing device 10 comprises a posture estimation unit 15C and a behavior estimation unit 15D. The posture estimation unit 15C estimates the posture of person P included in the video. The behavior estimation unit 15D estimates the behavior of person P based on the position of person P, the posture, and the duration t of the posture.

Description

Technical Field

[0001] Embodiments of the present invention relate to an information processing apparatus, an information processing method, and an information processing program.

Background Art

[0002] A system for estimating human behavior is known. For example, a system for estimating human behavior from the arrangement of joint points of a person captured in a photographed image is disclosed.

[0003] However, in the prior art, it has sometimes been difficult to accurately estimate human behavior.

Prior Art Documents

Patent Documents

[0004]

Patent Document 1

Non-Patent Documents

[0005]

Non-Patent Document 1

Non-Patent Document 2

Non-Patent Document 3

Non-Patent Document 4

[0006] The problem that this invention aims to solve is to provide an information processing device, an information processing method, and an information processing program that can estimate a person's behavior with high accuracy.

Means for Solving the Problem

[0007] The information processing device of this embodiment comprises a posture estimation unit and a behavior estimation unit. The posture estimation unit estimates the posture of a person included in the video. The behavior estimation unit estimates the behavior of the person based on the person's position, posture, and the duration of the posture.

Brief Description of the Drawings

[0008]
[Figure 1] An explanatory diagram of an example of an information processing system according to an embodiment.
[Figure 2A] A schematic diagram of an example of a captured image.
[Figure 2B] A schematic diagram of an example of a captured image.
[Figure 3A] A schematic diagram of an example of a captured image.
[Figure 3B] A schematic diagram of an example of a captured image.
[Figure 3C] A schematic diagram of an example of a captured image.
[Figure 4A] A schematic diagram of an example of a captured image.
[Figure 4B] An explanatory diagram illustrating an example of a warning trigger determination.
[Figure 5] A flowchart illustrating an example of an information processing flow.
[Figure 6A] An explanatory diagram of an application example of an information processing device.
[Figure 6B] An explanatory diagram of an application example of an information processing device.
[Figure 6C] An explanatory diagram of an application example of an information processing device.
[Figure 7] A hardware configuration diagram of an example of the information processing device 10.

Embodiments for Carrying Out the Invention

[0009] Hereinafter, an information processing apparatus, an information processing method, and an information processing program will be described in detail with reference to the attached drawings.

[0010] FIG. 1 is an explanatory diagram of an example of an information processing system 1 of the present embodiment.

[0011] The information processing system 1 is a system for estimating the behavior of a person P.

[0012] The information processing system 1 includes an information processing device 10 and an imaging device 20. The information processing device 10 and the imaging device 20 are communicably connected via a network NW or the like.

[0013] The imaging device 20 acquires captured image data by imaging and transmits it to the information processing device 10. The imaging device 20 sequentially transmits the images captured in time series to the information processing device 10. That is, the imaging device 20 transmits video data composed of a plurality of captured image data to the information processing device 10. Hereinafter, captured image data will be referred to simply as a captured image, and video data simply as a video. A captured image may also be referred to as a frame.

[0014] The imaging device 20 is arranged so that it can capture a predetermined space in the real space RS. The predetermined space is a space in the real space RS where the person P who is the target of behavior estimation may be present. The predetermined space is, for example, a station platform, a space within a predetermined building, or a predetermined outdoor space, but is not limited to these. The imaging device 20 may be movable within the real space RS or may be fixedly installed.

[0015] In this embodiment, a configuration in which the imaging device 20 is fixedly positioned in the real space RS where person P may be present will be described as an example. For this reason, the field of view and installation position of the imaging device 20 are adjusted in advance so that it can capture the real space RS where person P may be present. The information processing system 1 may also be equipped with multiple imaging devices 20. To simplify the explanation, Figure 1 shows an example configuration in which the information processing system 1 is equipped with one imaging device 20.

[0016] The information processing device 10 performs processing such as estimating the behavior of person P. The information processing device 10 is one or more information processing devices. The information processing device 10 is composed of one or more dedicated or general-purpose computers.

[0017] The information processing device 10 comprises a communication unit 11, an input unit 12, an output device 13, a storage unit 14, and a processing unit 15. The communication unit 11, input unit 12, output device 13, storage unit 14, and processing unit 15 are communicably connected to each other via a bus or the like.

[0018] Furthermore, the information processing device 10 may be configured to include the imaging device 20. In this case, the imaging device 20 and the processing unit 15 may be connected to each other via a bus or the like.

[0019] The communication unit 11 communicates with the imaging device 20 and external information processing devices via a network NW or the like. The input unit 12 accepts various operations from the user. The input unit 12 is, for example, an input device such as a touch panel, keyboard, or buttons. The output device 13 outputs various types of information. The output device 13 is, for example, a display that shows various types of information, or a speaker that outputs various types of sound. At least one of the input unit 12 and the output device 13 may be configured to be located outside the information processing device 10 and connected to the processing unit 15 in a manner that allows communication. For example, the output device 13 may be placed in a position that can be confirmed by a person P who is within the imaging range of the imaging device 20 in real space RS.

[0020] The storage unit 14 stores various types of data. The storage unit 14 may be, for example, a semiconductor memory element such as RAM (Random Access Memory) or flash memory, a hard disk, or an optical disc. The storage unit 14 may also be a storage device located outside the information processing device 10. Alternatively, the storage unit 14 may be a storage medium on which programs and various types of information are downloaded and stored or temporarily stored via a LAN (Local Area Network) or the Internet.

[0021] The processing unit 15 performs information processing in the information processing device 10.

[0022] The processing unit 15 includes an acquisition unit 15A, a person detection unit 15B, a posture estimation unit 15C, a behavior estimation unit 15D, a target detection unit 15E, a warning target determination unit 15F, and an output control unit 15G.

[0023] The acquisition unit 15A, the person detection unit 15B, the posture estimation unit 15C, the behavior estimation unit 15D, the target detection unit 15E, the warning target determination unit 15F, and the output control unit 15G are implemented by one or more processors. For example, each of the above units included in the processing unit 15 may be implemented by having a processor such as a CPU (Central Processing Unit) execute a program, i.e., by software. Alternatively, each of the above units included in the processing unit 15 may be implemented by a processor such as a dedicated IC (Integrated Circuit), i.e., by hardware. Each of these units may also be implemented using a combination of software and hardware. When multiple processors are used, each processor may implement one of the above units, or two or more of the above units.

[0024] Alternatively, at least one of the above-mentioned components included in the processing unit 15 may be mounted on an external information processing device, such as a server device, which is connected to the information processing device 10 via a network NW or the like.

[0025] The acquisition unit 15A acquires video; the video may also be video obtained from an external source. In this embodiment, an example is described in which the acquisition unit 15A acquires video captured by the imaging device 20.

[0026] In this embodiment, the imaging device 20 sequentially captures images at a predetermined frame rate in a time series and transmits the captured images sequentially to the information processing device 10 in the order they were captured.

[0027] The acquisition unit 15A acquires video captured by the imaging device 20. The acquisition unit 15A acquires the video by sequentially receiving captured images from the imaging device 20. Alternatively, the acquisition unit 15A may acquire the video by reading, from the storage unit 14, video that was captured by the imaging device 20 and stored there.

[0028] The person detection unit 15B detects a person P that appears in the video acquired by the acquisition unit 15A and starts tracking person P.

[0029] Figure 2A is a schematic diagram of an example of captured image 30A. Figure 2B is a schematic diagram of an example of captured image 30B. Captured images 30A and 30B are examples of captured images 30, which are frames included in the video captured by the imaging device 20. A person P is visible in captured image 30.

[0030] Returning to Figure 1, we continue the explanation.

[0031] The person detection unit 15B detects a person P appearing in the video through the following process.

[0032] The person detection unit 15B performs preprocessing on the captured image 30, such as noise reduction and contrast adjustment. Then, the person detection unit 15B extracts features of person P from the preprocessed captured image 30. The features of person P can be, for example, a face, body outline, clothing, etc., and can be predetermined. Based on the extracted features, the person detection unit 15B detects person P in the captured image by performing image analysis on the captured image 30 using a known method.

[0033] The person detection unit 15B may use a machine learning model or a deep learning algorithm to detect a person P that appears in the captured image 30. For example, a neural network may be trained in advance to detect only persons, using object detection models such as those described in Non-Patent Documents 1 to 3, and the person detection unit 15B may use the resulting trained model to detect person P in the captured image 30.

[0034] Furthermore, as a post-processing step after detecting person P, the person detection unit 15B may adjust the position and size of the detected person P in the captured image using a known method.

[0035] Next, the person detection unit 15B tracks the detected person P that appears in the video. The person detection unit 15B tracks the detected person P across multiple frames, i.e., the video. A known method can be used to track person P.

[0036] In detail, for example, the person detection unit 15B sets the initial position of the detected person P. The person detection unit 15B also selects the features of the detected person P. As mentioned above, the features of person P include, for example, the face, body outline, clothing, etc.

[0037] The person detection unit 15B then tracks person P between frames (between captured images 30) based on the selected features. Techniques such as color matching, template matching, optical flow, and color histogram can be used to track person P. Position estimation using a Kalman filter or the like may also be used in combination. The person detection unit 15B then updates the tracking results for each captured image 30 (frame) to acquire the position of person P in real time and track person P. The person detection unit 15B may also perform filtering to correct false detections that may occur during tracking. Known methods can be used for filtering. The person detection unit 15B may also perform tracking of person P using known algorithms or deep learning models for person P tracking.
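As an illustration of the color-histogram technique named in paragraph [0037], the following is a minimal sketch of associating a tracked person with candidate regions in the next frame. It is not the implementation of this publication: the function names, the 4-level binning, and the 0.5 score threshold are all assumptions chosen for the example, and regions are passed in as plain lists of RGB pixels.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255


def color_histogram(pixels: Sequence[Pixel], bins: int = 4) -> List[float]:
    """Return a normalized joint RGB histogram with `bins` levels per channel."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each channel to a bin index, clamped to the last bin.
        ri = min(r * bins // 256, bins - 1)
        gi = min(g * bins // 256, bins - 1)
        bi = min(b * bins // 256, bins - 1)
        hist[ri * bins * bins + gi * bins + bi] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Similarity in [0, 1]; 1.0 means identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def match_track(
    track_hist: Sequence[float],
    candidates: Dict[str, Sequence[Pixel]],
    min_score: float = 0.5,  # hypothetical acceptance threshold
) -> Optional[str]:
    """Return the id of the candidate region that best matches the tracked
    person, or None if no candidate reaches `min_score`."""
    best_id, best_score = None, min_score
    for cand_id, pixels in candidates.items():
        score = histogram_intersection(track_hist, color_histogram(pixels))
        if score >= best_score:
            best_id, best_score = cand_id, score
    return best_id


if __name__ == "__main__":
    red_coat = [(250, 10, 10)] * 10
    blue_coat = [(10, 10, 250)] * 10
    track = color_histogram(red_coat)
    print(match_track(track, {"a": blue_coat, "b": red_coat}))
```

In practice such appearance matching would typically be combined with position prediction (for example the Kalman filter mentioned above), since color alone cannot separate similarly dressed people.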

[0038] The posture estimation unit 15C estimates the posture of person P included in the video.

[0039] The posture of person P refers to the position and stance of person P's body. More specifically, the posture of person P is represented by at least one of the following: the positional relationship of the parts of person P's body, the angle of the parts, the orientation of the parts, the inclination of the parts relative to the reference direction, the position of the parts, the movement of the parts, and the speed of the movement.

[0040] A body part is a part of the body that makes up person P. Specifically, the body parts of person P include, but are not limited to, the head, face, neck, torso, arms, upper body, legs, waist, lower body, back, knees, ankles, toes, feet, and joints. The reference direction is a predetermined direction in real space. The angle of a body part refers to the angle of each body part with respect to the reference direction, and the angles formed between multiple body parts. The reference direction is, for example, the vertical direction, the horizontal direction, etc., but is not limited to these. In this embodiment, the explanation will assume that the reference direction is the vertical direction. Movement represents the positional movement of the entire person P or each body part of person P, the angle change between body parts, the movement of each body part, etc.

[0041] The posture estimation unit 15C estimates the posture of person P using known skeletal estimation techniques. For example, the posture estimation unit 15C extracts skeletal features, such as the joint positions of person P in the captured image 30, using a skeleton estimation method such as OpenPose, described in Non-Patent Document 4. The posture estimation unit 15C then estimates the posture of person P from the extracted skeletal features using a deep learning or machine learning algorithm that estimates posture from skeletal features. The posture estimation unit 15C may also correct the posture estimation result by known methods as needed.

[0042] The behavior estimation unit 15D estimates the behavior of person P based on the posture of person P and the duration of that posture.

[0043] The behavior of person P refers to the posture and manner of person P. In this embodiment, the behavior of person P is represented by the posture of person P and the duration of that posture.

[0044] Person P's actions may include, but are not limited to, peeking, looking around, waving, looking up, running, walking, falling, or crouching.

[0045] The behavior estimation unit 15D analyzes the postures that the posture estimation unit 15C has estimated for person P across the video, which is composed of time-series consecutive captured images 30 (frames), and calculates the duration of the estimated posture.
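One simple way to obtain such a duration, sketched here under assumptions, is to count how many consecutive frames up to the latest one carry the same posture label and divide by the frame rate. The per-frame label list, the label strings, and the function name are hypothetical; the publication does not specify how the duration is computed.

```python
from typing import List, Optional


def posture_duration(labels: List[Optional[str]], fps: float) -> float:
    """Seconds for which the most recent posture has been held continuously.

    `labels` holds one estimated posture label per frame, oldest first;
    None means no posture could be estimated for that frame.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    if not labels or labels[-1] is None:
        return 0.0
    current = labels[-1]
    run = 0
    # Walk backwards until the posture label changes.
    for label in reversed(labels):
        if label != current:
            break
        run += 1
    return run / fps


if __name__ == "__main__":
    frames = ["standing", "standing", "leaning", "leaning", "leaning"]
    print(posture_duration(frames, fps=10.0))
```

A real system would likely tolerate a few misclassified frames inside a run instead of resetting the duration on every label change; that smoothing is omitted here.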

[0046] Behaviors corresponding to combinations of a posture of person P and the duration of that posture are stored in the storage unit 14 in advance. The behavior estimation unit 15D estimates the behavior of person P by reading from the storage unit 14 the behavior corresponding to the posture estimated by the posture estimation unit 15C and the duration of that posture. Alternatively, the behavior estimation unit 15D may estimate the behavior of person P using a machine learning model or rule-based algorithm that outputs the behavior of person P from a posture and the duration of that posture.

[0047] For example, consider a case where the behavior "peeking" is registered in advance in the storage unit 14 in association with the registered posture "specific forward-leaning posture" and a duration Td. The registered posture is represented by a predetermined range of face orientation (Tθh1 or more and Tθh2 or less) and a predetermined angular range of forward tilt (Tθa1 or more and Tθa2 or less), and Td is the threshold for the duration of the registered posture. The duration Td is an example of a predetermined duration threshold.

[0048] A registered posture is information representing one of several types of postures stored in advance in the storage unit 14.

[0049] The registered posture "forward leaning posture" is a posture in which the forward lean of the upper body portion of person P relative to the vertical direction is represented by a predetermined angular range (Tθa1 or greater and Tθa2 or less). The upper body portion of person P is represented by a straight line connecting the waist and head of person P. The forward leaning direction is the direction in which the head of person P is tilted toward the abdominal side of person P.
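The tilt of the waist-to-head line relative to the vertical can be computed from two keypoints. The sketch below assumes 2D image coordinates with the y axis pointing downward and returns an unsigned angle; deciding whether the tilt is actually toward the abdominal side would additionally need the body orientation, which is not handled here. The function name and coordinate convention are assumptions.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates, y grows downward


def upper_body_tilt_deg(waist: Point, head: Point) -> float:
    """Unsigned angle in degrees between the waist-to-head line and the
    upward vertical. 0 means upright; 90 means horizontal."""
    dx = head[0] - waist[0]
    dy = waist[1] - head[1]  # flip sign so that "up" is positive
    if dx == 0 and dy == 0:
        raise ValueError("waist and head coincide; tilt is undefined")
    return math.degrees(math.atan2(abs(dx), dy))


if __name__ == "__main__":
    print(upper_body_tilt_deg(waist=(100.0, 200.0), head=(100.0, 100.0)))
    print(upper_body_tilt_deg(waist=(100.0, 200.0), head=(200.0, 100.0)))
```

The resulting angle plays the role of θa and would then be compared against the range Tθa1 to Tθa2.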

[0050] The registered posture "specific forward-leaning posture" is a posture in which the orientation of person P's face is within a predetermined range of face orientation (Tθh1 or more and Tθh2 or less), and the forward tilt of the upper body portion of person P relative to the vertical is represented by a predetermined angular range (Tθa1 or more and Tθa2 or less).

[0051] Multiple different registered postures and their corresponding behaviors are registered in advance in the storage unit 14, and the combinations of registered postures and behaviors registered in the storage unit 14 are not limited to these. The same applies in the following description.

[0052] The behavior estimation unit 15D estimates the behavior of person P by reading from the storage unit 14 a registered posture that matches the posture estimated by the posture estimation unit 15C, and behavior associated with the duration of the registered posture.

[0053] Figure 3A is a schematic diagram of an example of captured image 30C. Captured image 30C is an example of captured image 30.

[0054] For example, consider a scenario where the posture estimation unit 15C estimates the posture of person P shown in Figure 3A, which is represented by the inclination θa of the upper body of person P in the forward direction relative to the reference direction (vertical direction), the direction of the face θh, and the direction of the body θc.

[0055] The orientation of the face θh is the direction obtained by skeletal estimation by the person detection unit 15B. Alternatively, the posture estimation unit 15C may estimate the orientation of the face θh by known image analysis processing of the captured image 30 and use it for posture estimation. In this case, the posture estimation unit 15C can estimate the orientation of the face θh by known image analysis processing, for example, from multiple feature points such as the eyes, nose, and mouth included in the face of person P. Estimating the orientation of the face θh in this way can give higher accuracy than deriving it from skeletal estimation. Furthermore, so that cases where parts of the face (eyes, nose, etc.) are hidden can be handled, the posture estimation unit 15C may estimate the orientation of the face θh using a machine learning algorithm that directly estimates the direction of the face from a partial image including the head.
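To make the idea of estimating face orientation from facial feature points concrete, here is a deliberately crude heuristic, not the method of this publication: for a roughly frontal face the nose tip projects near the midpoint between the eyes, and it shifts sideways as the head turns. The sketch returns that shift normalized by the eye distance rather than an angle, because converting it to degrees would need a calibrated face model. All names are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates


def nose_offset_ratio(left_eye: Point, right_eye: Point, nose: Point) -> float:
    """Horizontal offset of the nose tip from the eye midpoint, divided by
    the distance between the eyes.

    About 0 suggests a frontal face; the sign indicates toward which side
    of the image the face is turned. This is a rough cue, not an angle.
    """
    eye_dist = math.hypot(right_eye[0] - left_eye[0], right_eye[1] - left_eye[1])
    if eye_dist == 0:
        raise ValueError("eye landmarks coincide")
    mid_x = (left_eye[0] + right_eye[0]) / 2.0
    return (nose[0] - mid_x) / eye_dist


if __name__ == "__main__":
    print(nose_offset_ratio((40.0, 50.0), (60.0, 50.0), (50.0, 60.0)))
    print(nose_offset_ratio((40.0, 50.0), (60.0, 50.0), (55.0, 60.0)))
```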

[0056] The body orientation θc is the direction in which the front of the torso of person P is facing. The body orientation θc is the direction obtained by skeletal estimation by the person detection unit 15B. Alternatively, the posture estimation unit 15C may estimate the body orientation θc by known image analysis processing of the captured image 30.

[0057] The behavior estimation unit 15D reads from the storage unit 14 a registered posture and the duration Td of the registered posture that matches the posture of person P estimated by the posture estimation unit 15C. More specifically, the behavior estimation unit 15D reads from the storage unit 14 a registered posture and the duration Td of the registered posture that are associated with a predetermined range of face orientations (Tθh1 or more and Tθh2 or less) including the face orientation θh of person P represented by the posture of person P estimated by the posture estimation unit 15C, and a predetermined angular range of forward tilt (Tθa1 or more and Tθa2 or less) including the forward tilt θa represented by the posture of person P.

[0058] Then, the behavior estimation unit 15D estimates the behavior of person P as "peeking" if the duration t of the posture of person P estimated by the posture estimation unit 15C is greater than or equal to the read duration Td (t≧Td).

[0059] In other words, the behavior estimation unit 15D determines that the posture of person P is the registered posture "specific forward-leaning posture" if the forward tilt θa represented by the posture estimated by the posture estimation unit 15C is within the predetermined angular range of forward tilt (Tθa1 or more and Tθa2 or less) and the face orientation θh represented by that posture is within the predetermined range of face orientation (Tθh1 or more and Tθh2 or less). When the posture of person P is determined to be the registered posture "specific forward-leaning posture" and the duration t of that posture is greater than or equal to the duration threshold Td associated with the registered posture, the behavior estimation unit 15D determines that the behavior of person P is "peeking."
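The determination in paragraphs [0057] to [0059] can be sketched as a lookup over registered postures. This is an illustrative rule-based version only: the class and field names are hypothetical, and the numeric thresholds in the usage example are invented placeholders, since the publication gives no concrete values for Tθh1, Tθh2, Tθa1, Tθa2, or Td.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class RegisteredPosture:
    name: str            # e.g. "specific forward-leaning posture"
    behavior: str        # behavior associated with this posture
    face_min: float      # T_theta_h1 (degrees)
    face_max: float      # T_theta_h2 (degrees)
    lean_min: float      # T_theta_a1 (degrees)
    lean_max: float      # T_theta_a2 (degrees)
    min_duration: float  # Td (seconds)

    def matches(self, theta_h: float, theta_a: float) -> bool:
        """True if face orientation and forward tilt are both in range."""
        return (self.face_min <= theta_h <= self.face_max
                and self.lean_min <= theta_a <= self.lean_max)


def estimate_behavior(
    registry: Iterable[RegisteredPosture],
    theta_h: float,
    theta_a: float,
    duration: float,
) -> Optional[str]:
    """Return the behavior of the first registered posture that matches the
    estimated posture and whose duration threshold is met (t >= Td)."""
    for entry in registry:
        if entry.matches(theta_h, theta_a) and duration >= entry.min_duration:
            return entry.behavior
    return None


if __name__ == "__main__":
    # Placeholder thresholds, chosen only for demonstration.
    registry = [RegisteredPosture("specific forward-leaning posture", "peeking",
                                  -30.0, 30.0, 20.0, 60.0, 2.0)]
    print(estimate_behavior(registry, theta_h=0.0, theta_a=30.0, duration=2.5))
    print(estimate_behavior(registry, theta_h=0.0, theta_a=30.0, duration=1.0))
```

The first call satisfies all three conditions and yields "peeking"; the second fails only the duration condition, which is the case that posture-only estimation would misclassify.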

[0060] In the above specific example, the objective is to detect the long-duration behavior of "peeking," which is the behavior to be estimated, so the estimation condition was set to a duration of Td or more. However, depending on the characteristics of the behavior to be estimated, the estimation condition may also be set to a duration of Td or less. That is, the specific behavior "peeking" is a behavior represented by the posture being a specific posture "specific forward-leaning posture," and the duration t of the specific posture being a predetermined duration Td or more (t≧Td), or the duration t of the specific posture being Td or less (t≦Td). In this specification, the method of setting conditions regarding duration will be the same. In this embodiment, a form in which the behavior is represented by the posture being a specific posture "specific forward-leaning posture," and the duration t of the specific posture being Td or more (t≧Td), will be described as an example.

[0061] The behavior estimation unit 15D can similarly estimate the behavior of person P if the posture of person P estimated by the posture estimation unit 15C is a different posture, by reading the behavior corresponding to the conditions represented by a specific posture that matches the said posture and the duration of the said specific posture from the storage unit 14. Alternatively, as described above, the behavior estimation unit 15D may estimate the behavior of person P from the posture and duration of the said posture using a machine learning model or rule-based algorithm that outputs the behavior of person P from the posture and duration of the said posture.

[0062] The behavior estimation unit 15D can estimate the behavior of person P with high accuracy by using not only the posture of person P but also the duration of that posture to estimate the behavior of person P represented by the posture and the duration of that posture.

[0063] The behavior estimation unit 15D may further estimate the behavior of person P by also including the position of person P. More specifically, the behavior estimation unit 15D may further estimate the behavior of person P by also using the result of determining whether or not person P is located within a specific range SE. By further estimating the behavior of person P by also including the position of person P, the behavior estimation unit 15D can estimate the behavior of person P with even higher accuracy.

[0064] A specific range SE is a predetermined area in real space RS. A specific range SE is, for example, an area in real space RS where there is a possibility of danger or the need to issue a warning to person P if person P is located within the specific range SE. Examples of specific range SE include an area at a predetermined distance from the boundary between a station platform and the tracks toward the platform, or an area at a predetermined distance from the boundary between a sidewalk and a road toward the sidewalk.

[0065] In this case, the storage unit 14 only needs to store in advance the behavior associated with a specific range SE in real space RS, the posture of person P, and the duration of that posture. For example, it is assumed that the behavior "peeking" is registered in advance in the storage unit 14 in association with a specific range SE, the registered posture "specific forward-leaning posture," and the duration Td, which is the threshold for the duration of that registered posture. The registered posture is represented by a predetermined range of face orientation (Tθh1 or more and Tθh2 or less) and a predetermined angular range of forward tilt (Tθa1 or more and Tθa2 or less). Multiple different types of specific ranges SE, multiple different types of registered postures, and the behaviors corresponding to each specific range SE and registered posture are registered in advance in the storage unit 14, and the combinations of specific ranges SE, registered postures, and behaviors registered in the storage unit 14 are not limited to these.

[0066] The behavior estimation unit 15D determines whether the standing position P1 of person P, estimated by the person detection unit 15B, is within a specific range SE. For example, the behavior estimation unit 15D determines whether the standing position P1 of person P in the captured image 30C is within a specific range SE by determining whether the position coordinates of the standing position P1 of person P in the captured image 30C are within a specific range SE in the captured image 30C.

[0067] Figure 3B is a schematic diagram of an example of captured image 30F. Captured image 30F is an example of captured image 30.

[0068] The behavior estimation unit 15D may determine whether or not a person P is located within a specific range SE by determining, for example, whether or not the point representing the foot FP of person P estimated by the posture estimation unit 15C falls within the region SA set in the captured image 30 (is located within region SA).
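Whether a keypoint such as the foot point FP lies inside region SA can be checked with a standard point-in-polygon test. The sketch below uses ray casting and assumes region SA is given as a list of polygon vertices in image coordinates; the publication does not state how region SA is represented, so that representation and the function name are assumptions.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates


def point_in_polygon(point: Point, polygon: Sequence[Point]) -> bool:
    """Ray-casting test: True if `point` is strictly inside `polygon`.

    `polygon` lists the vertices in order (either winding direction).
    Points exactly on an edge may be reported either way.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


if __name__ == "__main__":
    region_sa = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
    print(point_in_polygon((5.0, 5.0), region_sa))   # foot point inside SA
    print(point_in_polygon((15.0, 5.0), region_sa))  # foot point outside SA
```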

[0069] Region SA is an area that includes at least a part of the specific range SE. The area of region SA is greater than or equal to the area of the specific range SE. By adding the position of the point representing the feet FP of person P to the determination, the behavior estimation unit 15D can avoid misidentifying person P as a warning target when person P is outside the specific range SE, which contributes to improved accuracy.

[0070] However, the body part of person P used as the basis for the determination is not limited to the feet FP, but may be any arbitrarily set body part, such as the head position HP, at least a part of the legs, or a part of the spine. The behavior estimation unit 15D may then determine whether person P is located within a specific range SE by determining whether the body part enters the area SA.

[0071] Furthermore, region SA only needs to be a region useful for determining whether person P is located within a specific range SE, and it does not need to include the specific range SE. The area of region SA may also be smaller than the area of the specific range SE.

[0072] Then, the behavior estimation unit 15D estimates the behavior of person P as "peeking" if the standing position P1 of person P detected by person detection unit 15B is located within a specific range SE, and the duration t of the registered posture "specific forward leaning posture" that matches the posture of person P estimated by posture estimation unit 15C is greater than or equal to the duration Td (t≧Td) associated with the specific range SE and the registered posture.

[0073] The behavior estimation unit 15D can similarly estimate the behavior of person P if the posture of person P estimated by the posture estimation unit 15C is a different posture, by reading the behavior corresponding to the conditions represented by the position of person P, a specific posture matching that posture, and the duration of the specific posture from the storage unit 14. Alternatively, as described above, the behavior estimation unit 15D may estimate the behavior of person P from the position of person P, the posture of person P, and the duration of the posture using a learning model or rule-based algorithm that outputs behavior from the position of person P, the posture of person P, and the duration of the posture.

[0074] Furthermore, the behavior estimation unit 15D may change the threshold values ​​(Tθa1, Tθa2) of a predetermined angle range (Tθa1 or more and Tθa2 or less) of the forward tilt used in advance when determining the behavior, according to the orientation θc of the person P's body. This is because the appearance of the forward tilt angle estimated from the captured image 30 may change depending on the orientation of the person P's body as captured in the captured image 30.

[0075] For example, if the orientation θc of person P's body satisfies the relationship pθ1≦θc≦pθ2, the behavior estimation unit 15D uses the threshold values (Tθa1, Tθa2) of the predetermined angular range (Tθa1 or more and Tθa2 or less) of the forward tilt stored in the memory unit 14 without changing them to estimate the behavior of person P. On the other hand, if the orientation θc of person P's body does not satisfy the relationship pθ1≦θc≦pθ2 but satisfies the relationship pθ2<θc≦pθ3, the behavior estimation unit 15D can change the threshold values and use Tθa3 or more and Tθa4 or less as the predetermined angular range of the forward tilt.

[0076] pθ1 is less than pθ2. pθ2 is less than pθ3. Tθa1 is less than Tθa2. Tθa3 is less than Tθa4. Tθa1, Tθa2, Tθa3, and Tθa4 are all distinct values.

[0077] By adjusting the threshold values (Tθa1, Tθa2) of the predetermined angle range (Tθa1 or more and Tθa2 or less) of the forward tilt, which are stored in advance and used when determining the behavior, according to the orientation θc of person P's body, the behavior estimation unit 15D can estimate the behavior of person P with even greater accuracy.
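The threshold switching in paragraphs [0075] and [0076] can be sketched as follows. The function names, the numeric angles, and the choice to return `None` when θc falls outside both intervals are assumptions for illustration only; the specification does not state what happens outside pθ1 to pθ3 and gives no concrete values.

```python
from typing import Optional, Tuple

AngleRange = Tuple[float, float]


def select_tilt_range(theta_c: float,
                      p1: float, p2: float, p3: float,
                      base_range: AngleRange,
                      alt_range: AngleRange) -> Optional[AngleRange]:
    """Pick the forward-tilt range according to the body orientation theta_c.

    base_range stands for (Tθa1, Tθa2) and alt_range for (Tθa3, Tθa4).
    """
    if not (p1 < p2 < p3):
        raise ValueError("expected p1 < p2 < p3")
    if p1 <= theta_c <= p2:
        return base_range      # stored thresholds used unchanged
    if p2 < theta_c <= p3:
        return alt_range       # thresholds switched for this orientation
    return None                # assumption: no range defined outside [p1, p3]


def tilt_in_range(theta_a: float, tilt_range: AngleRange) -> bool:
    """True if the forward tilt theta_a lies within the selected range."""
    lower, upper = tilt_range
    return lower <= theta_a <= upper


# Hypothetical values in degrees.
P1, P2, P3 = -30.0, 30.0, 90.0
BASE = (20.0, 60.0)   # stands for (Tθa1, Tθa2)
ALT = (10.0, 40.0)    # stands for (Tθa3, Tθa4)
```

For example, with these values a tilt of 15 degrees is outside the base range when the body faces the camera (θc = 0) but inside the alternative range when the body is turned (θc = 45), which is the kind of appearance-dependent correction the paragraph describes.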

[0078] Furthermore, the behavior estimation unit 15D may also estimate the behavior of person P by adding the direction of movement of person P. More specifically, the behavior estimation unit 15D may further estimate the behavior of person P by using the result of determining whether or not person P is moving in a predetermined direction. The predetermined direction is, for example, the direction from outside a specific range SE to inside a specific range SE.

[0079] This will be explained using Figure 3C. Figure 3C is a schematic diagram of an example of captured image 30D. Captured image 30D is an example of captured image 30.

[0080] In this case, the memory unit 14 only needs to store in advance the movement direction D of person P, the posture of person P, and the behavior corresponding to the duration of the posture. For example, it is assumed that the memory unit 14 has registered in advance a registered posture "specific forward-leaning posture" represented by a predetermined movement direction D (predetermined direction DS), a predetermined range of face orientation (Tθh1 or more and Tθh2 or less), and a predetermined angular range of tilt in a predetermined forward-leaning direction (Tθa1 or more and Tθa2 or less), and that the behavior "peeking" is associated in advance with the duration Td, which is a threshold for the duration of the registered posture. Multiple types of predetermined directions DS, multiple mutually different registered postures, and behaviors corresponding to the predetermined directions DS and registered postures are pre-registered in the memory unit 14, and the registered combinations of predetermined directions DS, registered postures, and behaviors are not limited to these.

[0081] The behavior estimation unit 15D identifies the direction of movement D1 of the person P tracked by the person detection unit 15B. The behavior estimation unit 15D identifies the direction of the vector from the tracking start point P1a to the latest tracking point P1b, obtained from the tracking results of person P captured in multiple time-series consecutive images 30 (frames) by the person detection unit 15B, as the direction of movement D1 of person P.
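A minimal sketch of this direction computation is given below, assuming 2D image coordinates for the tracking points P1a and P1b. The function names and the angular tolerance used to decide whether D1 "matches" a predetermined direction DS are assumptions; the specification does not say how a match is decided.

```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float]


def movement_direction_deg(start: Point, latest: Point) -> Optional[float]:
    """Direction of the vector from the tracking start point to the latest point.

    Returns an angle in [0, 360) degrees, or None if the person has not moved.
    """
    dx = latest[0] - start[0]
    dy = latest[1] - start[1]
    if dx == 0 and dy == 0:
        return None
    return math.degrees(math.atan2(dy, dx)) % 360.0


def matches_direction(d1_deg: float, ds_deg: float, tol_deg: float = 30.0) -> bool:
    """True if D1 is within tol_deg of DS (the tolerance is a hypothetical parameter)."""
    # Wrap the difference into [-180, 180) so that 350 and 10 degrees are 20 apart.
    diff = abs((d1_deg - ds_deg + 180.0) % 360.0 - 180.0)
    return diff <= tol_deg
```

Using only the first and latest tracking points, as the paragraph describes, ignores the path in between; a real implementation might smooth over several frames, but that is outside what the text states.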

[0082] Furthermore, if a predetermined direction DS that matches the identified direction of movement D1 of person P is stored, the behavior estimation unit 15D estimates the behavior of person P as "peeking" when the posture of person P at one or more points in time during movement in the predetermined direction DS is the registered posture "specific forward-leaning posture" associated with the predetermined direction DS, and the duration t of the registered posture is greater than or equal to the duration Td associated with the predetermined direction DS and the registered posture (t≧Td). Alternatively, under the same conditions on the direction of movement and the posture, the behavior estimation unit 15D may estimate the behavior of person P as "peeking" when the duration t of the registered posture is less than or equal to the duration Td associated with the predetermined direction DS and the registered posture (t≦Td).

[0083] The behavior estimation unit 15D can similarly estimate the behavior of person P when the direction of movement D1 is in another direction, or when the posture of person P estimated by the posture estimation unit 15C is in another posture, by reading the behavior corresponding to the conditions expressed by the direction of movement D1, a specific posture matching the posture, and the duration of the specific posture from the storage unit 14. Alternatively, as described above, the behavior estimation unit 15D may estimate the behavior of person P from the posture and duration of the posture using a machine learning model or rule-based algorithm that outputs the behavior of person P from the direction of movement D1, the posture of person P, and the duration of the posture.

[0084] In this way, the behavior estimation unit 15D can estimate the behavior of person P with even greater accuracy by also estimating the movement direction D of person P.

[0085] Returning to Figure 1, we continue the explanation.

[0086] The target detection unit 15E detects targets included in the video.

[0087] Figure 4A is a schematic diagram of an example of captured image 30E. Captured image 30E is an example of captured image 30.

[0088] The target detection unit 15E detects the target T included in the captured image 30.

[0089] Target T is an element other than person P that can be detected from the captured image 30. Target T is, for example, an object other than person P, or characters or images written or displayed on an object. An object serving as target T is, for example, something that is held or can be held by person P, or something that can be seen or checked by person P. Specifically, such an object is, for example, a smartphone, a mobile terminal, a wristwatch, a clock, a display screen such as a display panel, or a moving object such as a vehicle. The target detection unit 15E may further detect targets T that are within a predetermined range from person P as surrounding objects. This predetermined range is, for example, the shooting range of the imaging device 20, but is not limited to this range. The predetermined range can be set in advance.

[0090] The target detection unit 15E may detect the target T included in the video using a known object detection method that employs a deep learning algorithm. For example, one of the methods described in Non-Patent Documents 1 to 3, such as Faster R-CNN, YOLO, or SSD, can be used.

[0091] Returning to Figure 1, we continue the explanation.

[0092] The warning target determination unit 15F determines whether the behavior of person P, as estimated by the behavior estimation unit 15D, is a target for a warning.

[0093] In detail, the warning target determination unit 15F determines that the behavior of person P is a warning target if the behavior estimated by the behavior estimation unit 15D is a specific behavior.

[0094] A specific behavior is a behavior that is subject to a warning. A behavior that is subject to a warning is, for example, a behavior that is considered dangerous from the standpoint of safety, security, crime prevention, etc. A specific behavior corresponds to some of the multiple types of behaviors that can be estimated by the behavior estimation unit 15D.

[0095] In this embodiment, the specific behavior is assumed to be a behavior that is dangerous from a safety standpoint, such as person P leaning over from a station platform toward the tracks or from a sidewalk toward the roadway as shown in Figures 2A and 2B, or a behavior that is suspicious from a security or crime prevention standpoint. Specifically, the specific behavior may be, for example, peeking or looking around nervously. This embodiment describes, as an example, the case where the specific behavior is peeking. Note that the specific behavior is not limited to peeking and may be of any other type.

[0096] The specific behavior only needs to be stored in the memory unit 14 beforehand. The specific behavior and the conditions used to estimate it are registered in the memory unit 14 in association with each other in advance.

[0097] The conditions used to estimate a specific behavior are the conditions that the behavior estimation unit 15D uses when estimating each behavior.

[0098] As described above, the specific behavior "peeking" is a behavior in which the posture is the specific posture "specific forward-leaning posture" and the duration of that specific posture is a predetermined time or longer. The registered posture "specific forward-leaning posture" is a posture represented by a predetermined range of face orientation (Tθh1 or more and Tθh2 or less) and a predetermined range of angles of tilt in the predetermined forward-leaning direction (Tθa1 or more and Tθa2 or less). Furthermore, the threshold for the duration of the registered posture "specific forward-leaning posture" in the specific behavior "peeking" is the duration Td. These conditions correspond to the conditions used to estimate the specific behavior.

[0099] This will be explained using Figure 3A.

[0100] The warning target determination unit 15F determines whether the behavior of person P estimated by the behavior estimation unit 15D is the specific behavior "peeking," which is represented by the following conditions: the direction θh of person P's face, represented by person P's posture, is in the range of Tθh1 to Tθh2 (Tθh1≦θh≦Tθh2); the inclination θa in the forward-leaning direction, represented by person P's posture, is in the range of Tθa1 to Tθa2 (Tθa1≦θa≦Tθa2); and the duration t of this specific posture, the "specific forward-leaning posture," is greater than or equal to Td (t≧Td).
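The three conditions above (face direction, forward tilt, and duration) can be sketched as a small per-frame timer. The class names, the fixed frame interval `dt`, the numeric angle ranges, and the choice to reset the duration t to zero as soon as the posture no longer matches are all assumptions for illustration; the specification does not prescribe how t is measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegisteredPosture:
    """Angle ranges of the registered posture (all values hypothetical, in degrees)."""
    face_min: float   # stands for Tθh1
    face_max: float   # stands for Tθh2
    tilt_min: float   # stands for Tθa1
    tilt_max: float   # stands for Tθa2

    def matches(self, theta_h: float, theta_a: float) -> bool:
        """True if Tθh1 <= θh <= Tθh2 and Tθa1 <= θa <= Tθa2."""
        return (self.face_min <= theta_h <= self.face_max
                and self.tilt_min <= theta_a <= self.tilt_max)


class PostureTimer:
    """Accumulates the duration t of a registered posture across frames."""

    def __init__(self, posture: RegisteredPosture, td: float) -> None:
        self.posture = posture
        self.td = td      # duration threshold Td, in seconds
        self.t = 0.0      # current duration t, in seconds

    def update(self, theta_h: float, theta_a: float, dt: float) -> bool:
        """Process one frame; return True once t >= Td."""
        if self.posture.matches(theta_h, theta_a):
            self.t += dt
        else:
            self.t = 0.0  # assumption: the count restarts when the posture breaks
        return self.t >= self.td


# Hypothetical sequence of (θh, θa) estimates, one per 0.5 second frame.
timer = PostureTimer(RegisteredPosture(-20.0, 20.0, 30.0, 70.0), td=1.5)
frames = [(0.0, 45.0), (5.0, 50.0), (0.0, 10.0), (0.0, 45.0), (0.0, 45.0), (0.0, 45.0)]
flags = [timer.update(h, a, dt=0.5) for h, a in frames]
```

In this sequence the third frame breaks the posture, so the duration restarts and the condition t≧Td is first satisfied only on the sixth frame.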

[0101] The warning target determination unit 15F determines that the behavior of person P is subject to a warning if the behavior estimation unit 15D determines that the behavior of person P is the specific behavior "peeking". Conversely, the warning target determination unit 15F determines that the behavior of person P is not subject to a warning if the behavior estimation unit 15D determines that the behavior of person P is not the specific behavior "peeking".

[0102] The warning target determination unit 15F, like the behavior estimation unit 15D, may change the threshold values (Tθa1, Tθa2) of the predetermined angle range (Tθa1 or more and Tθa2 or less) of the forward tilt, which are stored in advance and used during behavior determination, according to the orientation θc of person P's body. The adjustment method is the same as described above.

[0103] Furthermore, as described above, the behavior estimation unit 15D may also use the result of determining whether or not person P is located within a specific range SE to estimate person P's behavior. For this reason, a specific behavior may be represented by person P being located within a specific range SE, having a specific posture, and the duration t of the specific posture being greater than or equal to a predetermined time. More specifically, as described above, the specific behavior "peeking" may be represented by person P's standing position P1 being within a specific range SE, having a specific posture "a specific forward-leaning posture," and the duration t of the specific posture being greater than or equal to a predetermined time (duration Td).

[0104] In this case, the warning target determination unit 15F determines whether the behavior of person P estimated by the behavior estimation unit 15D is a specific behavior "peeking" which is represented by the following conditions: person P's standing position P1 is within a specific range SE; the direction of person P's face θh, represented by person P's posture, is in the range of Tθh1 to Tθh2 (Tθh1≦θh≦Tθh2); the inclination θa in the forward-leaning direction, represented by person P's posture, is in the range of Tθa1 to Tθa2 (Tθa1≦θa≦Tθa2); and the duration t of the specific posture is greater than or equal to Td (t≧Td).

[0105] The warning target determination unit 15F determines that the behavior of person P is subject to a warning if the behavior estimation unit 15D determines that the behavior of person P is the specific behavior "peeking". Conversely, the warning target determination unit 15F determines that the behavior of person P is not subject to a warning if the behavior estimation unit 15D determines that the behavior of person P is not the specific behavior "peeking".

[0106] This will be explained using Figure 3C. As described above, the behavior estimation unit 15D may also estimate the movement direction of person P when estimating the behavior of person P.

[0107] Therefore, a specific behavior may be defined as a behavior in which the direction of movement D1 of person P is a predetermined direction DS, the posture of person P at at least one timing during movement toward the predetermined direction DS is a specific posture, and the duration t of the specific posture is a predetermined time or longer. More specifically, as described above, the specific behavior "peeking" may be defined as a behavior in which the direction of movement D1 of person P is a predetermined direction DS, the posture of person P at at least one timing during movement toward the predetermined direction DS is a specific posture "specific forward-leaning posture", and the duration t of the specific posture is a predetermined time (duration Td) or longer.

[0108] In this case, the warning target determination unit 15F determines whether the behavior of person P estimated by the behavior estimation unit 15D is the specific behavior "peeking," which is represented by the following conditions: the direction of movement D1 of person P is a predetermined direction DS; the posture of person P at one or more points in time while moving in the predetermined direction DS is the specific posture "specific forward-leaning posture," in which the direction θh of person P's face is in the range of Tθh1 to Tθh2 (Tθh1≦θh≦Tθh2) and the inclination θa in the forward-leaning direction is in the range of Tθa1 to Tθa2 (Tθa1≦θa≦Tθa2); and the duration t of the specific posture is greater than or equal to the duration Td (t≧Td).

[0109] The warning target determination unit 15F determines that the behavior of person P is subject to a warning if the behavior estimation unit 15D determines that the behavior of person P is the specific behavior "peeking". Conversely, the warning target determination unit 15F determines that the behavior of person P is not subject to a warning if the behavior estimation unit 15D determines that the behavior of person P is not the specific behavior "peeking".

[0110] This will be explained using Figure 4A. If the behavior estimated by the behavior estimation unit 15D is a specific behavior, the target T detected by the target detection unit 15E is a specific target TS, and person P is looking at the specific target TS, the warning target determination unit 15F determines that the behavior of person P is not a warning target.

[0111] A specific target TS is a target T detected by the target detection unit 15E that is used to determine whether a behavior is a warning target. A specific target TS is a target T that can be stared at or seen by person P. Examples of a specific target TS include, but are not limited to, a smartphone held by person P, a wristwatch worn by person P, or a traffic sign located vertically below person P's face. A specific target TS only needs to be set in advance.

[0112] Even if the behavior estimation unit 15D determines that the behavior of person P is a specific behavior, the warning target determination unit 15F determines that the behavior of person P is not subject to a warning if it determines that the target T included in the captured image 30 is a specific target TS and that person P is viewing the specific target TS. The determination of whether or not person P is viewing the specific target TS can be performed by image analysis of the captured image 30 using known image processing techniques.

[0113] Furthermore, if the behavior of person P estimated by the behavior estimation unit 15D is a specific behavior, the target T included in the captured image 30 is a specific target TS, and person P is not viewing the specific target TS, the warning target determination unit 15F determines that the behavior of person P is a target for a warning. Likewise, if the behavior of person P estimated by the behavior estimation unit 15D is a specific behavior and the target T included in the captured image 30 is not a specific target TS, the warning target determination unit 15F determines that the behavior of person P is a target for a warning.

[0114] Therefore, for example, even if person P's behavior is determined to be the specific behavior "peeking," if it is presumed that person P is performing behavior equivalent to the specific behavior "peeking" in order to view a specific target TS such as a smartphone, person P's behavior can be excluded from the warning targets. In this way, the warning target determination unit 15F can avoid misidentifying person P as a warning target in a situation that should actually be excluded from warning.
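The cases in paragraphs [0112] and [0113] reduce to a small decision function, sketched below. The function name and the three boolean inputs are hypothetical; how each input is obtained (behavior estimation, target detection, and the viewing determination by image analysis) is outside this sketch.

```python
def is_warning_target(is_specific_behavior: bool,
                      specific_target_present: bool,
                      person_viewing_target: bool) -> bool:
    """Decide whether the behavior of person P is a warning target.

    - Not a specific behavior: never a warning target.
    - Specific behavior while viewing a specific target TS: excluded.
    - Specific behavior otherwise (no TS, or TS present but not viewed): warning.
    """
    if not is_specific_behavior:
        return False
    if specific_target_present and person_viewing_target:
        return False
    return True
```

For example, a person leaning forward to read a smartphone gives `is_warning_target(True, True, True)`, which is false, while the same posture with no specific target in the image gives `is_warning_target(True, False, False)`, which is true.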

[0115] Furthermore, when the behavior estimation unit 15D determines that the target T included in the captured image 30 is a specific target TS and that the person P is not seeing the specific target TS, the count-up of the posture duration t is stopped and reset to zero, thereby suppressing misjudgments during behavior determination.

[0116] Figure 4B is an explanatory diagram illustrating an example of a determination made by the warning target determination unit 15F.

[0117] The warning target determination unit 15F determines that the behavior of person P is not subject to a warning if the behavior of person P estimated by the behavior estimation unit 15D is a specific behavior and the surrounding object TQ of person P is moving away from person P. Furthermore, the warning target determination unit 15F determines that the behavior of person P is subject to a warning if the behavior estimated by the behavior estimation unit 15D is a specific behavior and the surrounding object TQ of person P is moving towards person P.

[0118] A surrounding object TQ is a target T that is within a predetermined range from person P. The predetermined range can be set in advance. This embodiment describes, as one example, a configuration in which the predetermined range is the shooting range of the imaging device 20. For example, the surrounding object TQ is a movable target T such as a vehicle. Known image processing techniques can be used to determine whether the surrounding object TQ is moving toward or away from person P.

[0119] In the example shown in Figure 4B, vehicle TQ1, moving in direction X1, is moving in a direction that approaches person P. Vehicle TQ1 is an example of a surrounding object TQ. When vehicle TQ1 detected by the target detection unit 15E is moving in a direction that approaches person P, the warning target determination unit 15F determines that person P's behavior is a warning target if the behavior of person P estimated by the behavior estimation unit 15D is a specific behavior.

[0120] On the other hand, vehicle TQ2, which is moving in direction X2, is moving away from person P. Vehicle TQ2 is an example of a surrounding object TQ. When vehicle TQ2 detected by the target detection unit 15E is moving away from person P, the warning target determination unit 15F determines that person P's behavior is not subject to a warning, even if the behavior of person P estimated by the behavior estimation unit 15D is a specific behavior.

[0121] Furthermore, if the target detection unit 15E detects multiple surrounding objects TQ and at least one surrounding object TQ is moving toward person P, the warning target determination unit 15F may determine that person P's behavior is a target for a warning if the behavior of person P estimated by the behavior estimation unit 15D is a specific behavior.
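The approaching and receding cases of Figure 4B, together with the "at least one approaching object" rule, can be sketched as follows. The specification only says that known image processing techniques can be used; the dot-product test on two consecutive tracked positions, the function names, and the fallback when no surrounding object is detected are assumptions made for illustration.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Track = Tuple[Point, Point]  # (previous position, current position) of one object TQ


def is_approaching(prev: Point, curr: Point, person: Point) -> bool:
    """True if the object's displacement points toward the person (positive dot product)."""
    vx, vy = curr[0] - prev[0], curr[1] - prev[1]          # displacement of the object
    tx, ty = person[0] - curr[0], person[1] - curr[1]      # vector from object to person
    return vx * tx + vy * ty > 0


def warn_with_surroundings(is_specific_behavior: bool,
                           tracks: Sequence[Track],
                           person: Point) -> bool:
    """Warn if the behavior is specific and at least one surrounding object approaches."""
    if not is_specific_behavior:
        return False
    if not tracks:
        # Assumption: with no surrounding objects, fall back to the behavior alone.
        return True
    return any(is_approaching(prev, curr, person) for prev, curr in tracks)


# Hypothetical positions: person P at the origin, TQ1 approaching, TQ2 receding.
PERSON = (0.0, 0.0)
TQ1 = ((100.0, 0.0), (90.0, 0.0))
TQ2 = ((-50.0, 0.0), (-60.0, 0.0))
```

With these positions, a specific behavior combined with only TQ2 does not produce a warning, while TQ1 alone or TQ1 together with TQ2 does.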

[0122] Furthermore, the warning target determination unit 15F may determine that a person P is not subject to a warning if the behavior of person P estimated by the behavior estimation unit 15D matches or is similar to an exclusion determination behavior. The exclusion determination behavior can be predetermined.

[0123] For example, the warning target determination unit 15F pre-stores exclusion postures as exclusion determination behaviors, which are represented by a group of one or more vectors V1 pointing from the center position of person P to each of one or more parts of person P. The warning target determination unit 15F then identifies a group of vectors V2 that represents the posture of person P, as determined by the behavior estimation unit 15D. The vector group V2 is a group of one or more vectors pointing from the center position of person P to each of one or more parts of person P. The warning target determination unit 15F then determines that the behavior of person P estimated by the behavior estimation unit 15D matches or is similar to the exclusion determination behavior if the similarity between the vector group V2 and the vector group V1 is greater than or equal to a threshold.
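One way to compare the vector groups V1 and V2 is sketched below. The specification does not define the similarity measure; the mean cosine similarity between corresponding 2D vectors, the requirement that both groups list the same body parts in the same order, and the threshold of 0.9 are assumptions chosen for illustration.

```python
import math
from typing import Sequence, Tuple

Vector = Tuple[float, float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two 2D vectors; 0.0 if either has zero length."""
    nu = math.hypot(u[0], u[1])
    nv = math.hypot(v[0], v[1])
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return (u[0] * v[0] + u[1] * v[1]) / (nu * nv)


def group_similarity(v1: Sequence[Vector], v2: Sequence[Vector]) -> float:
    """Mean cosine similarity between corresponding vectors of two groups."""
    if not v1 or len(v1) != len(v2):
        raise ValueError("vector groups must be non-empty and of equal length")
    return sum(cosine(a, b) for a, b in zip(v1, v2)) / len(v1)


def matches_exclusion(v1: Sequence[Vector], v2: Sequence[Vector],
                      threshold: float = 0.9) -> bool:
    """True if the observed group V2 is similar enough to the stored group V1."""
    return group_similarity(v1, v2) >= threshold


# Hypothetical vectors from the body center to the head, left hand, and right hand.
V1_EXCLUSION = [(0.0, -1.0), (-1.0, 0.0), (1.0, 0.0)]
V2_SAME_POSE = [(0.0, -2.0), (-3.0, 0.0), (2.0, 0.0)]   # same directions, different scale
V2_OTHER_POSE = [(0.0, 1.0), (1.0, 0.0), (-1.0, 0.0)]   # all directions reversed
```

Because cosine similarity ignores vector length, this particular measure is insensitive to the apparent size of person P in the image, which may or may not be desirable depending on the exclusion posture being registered.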

[0124] Returning to Figure 1, we continue the explanation.

[0125] The output control unit 15G outputs the result of determining whether or not person P's behavior is subject to a warning to the output device 13. Therefore, the output control unit 15G can provide the user with a confirmation of whether or not person P's behavior is subject to a warning. The output control unit 15G may output the determination result to the output device 13 if it is determined that person P's behavior is subject to a warning. The output control unit 15G may omit outputting the determination result to the output device 13 if it is determined that person P's behavior is not subject to a warning.

[0126] As described above, the output device 13 may be configured to be located outside the information processing device 10 and to be connected to the processing unit 15 in a manner that allows communication. For example, the output device 13 may be placed in a position that can be confirmed by a person P who is within the shooting range of the imaging device 20 in real space RS. By outputting the judgment result to the output device 13 placed in a position that can be confirmed by person P, the output control unit 15G can alert person P that they are performing a behavior that warrants a warning.

[0127] Furthermore, the output device 13 may be placed in a location where it can be seen by people other than the person P within the shooting range of the imaging device 20 in real space RS, such as supervisors or facility managers. By outputting the judgment result to the output device 13 placed in such a location, the output control unit 15G can notify the supervisors or facility managers that person P is exhibiting behavior that warrants a warning.

[0128] Next, an example of the information processing flow performed by the information processing device 10 of this embodiment will be described.

[0129] Figure 5 is a flowchart showing an example of the information processing flow performed by the information processing device 10 of this embodiment.

[0130] The acquisition unit 15A starts acquiring the video captured by the imaging device 20 (step S100).

[0131] The person detection unit 15B detects a person P that appears in the video acquired by the acquisition unit 15A and starts tracking person P (step S102).

[0132] The target detection unit 15E detects the target T included in the video (step S104).

[0133] The posture estimation unit 15C estimates the posture of person P included in the video (step S106).

[0134] The behavior estimation unit 15D estimates the behavior of person P based on the posture of person P estimated in step S106 and the duration t of that posture (step S108).

[0135] The warning target determination unit 15F determines whether the behavior of person P, estimated in step S108, is subject to a warning (step S110).

[0136] Then, the output control unit 15G outputs the determination result from step S110 to the output device 13 (step S112).

[0137] The processing unit 15 determines whether or not to terminate the process (step S114). For example, the processing unit 15 makes the determination in step S114 by determining whether or not it has received instruction information indicating the termination of the process, such as an operation instruction from the user to the input unit 12. If the determination in step S114 is negative (step S114: No), the process returns to step S102. If the determination in step S114 is positive (step S114: Yes), the routine terminates.
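The flow of Figure 5 can be sketched as a loop over acquired frames in which each functional unit is passed in as a callable. The function `run_pipeline`, its parameter names, and the stub units in the example are hypothetical; in particular, a real behavior estimator would keep state across frames in order to track the duration t.

```python
from typing import Any, Callable, Iterable, List


def run_pipeline(frames: Iterable[Any],
                 detect_person: Callable[[Any], Any],
                 detect_targets: Callable[[Any], Any],
                 estimate_posture: Callable[[Any, Any], Any],
                 estimate_behavior: Callable[[Any, Any], Any],
                 is_warning_target: Callable[[Any, Any], bool],
                 output: Callable[[bool], None],
                 should_stop: Callable[[], bool]) -> None:
    """Run steps S102 to S114 for each frame of the video acquired in step S100."""
    for frame in frames:                                   # S100: acquired video
        person = detect_person(frame)                      # S102: detect and track P
        targets = detect_targets(frame)                    # S104: detect targets T
        posture = estimate_posture(frame, person)          # S106: estimate posture
        behavior = estimate_behavior(person, posture)      # S108: estimate behavior
        warn = is_warning_target(behavior, targets)        # S110: warning target?
        output(warn)                                       # S112: output the result
        if should_stop():                                  # S114: terminate?
            return


# Example with stub units: each "frame" is just a posture label.
results: List[bool] = []
run_pipeline(
    frames=["upright", "leaning", "leaning", "upright"],
    detect_person=lambda frame: "P",
    detect_targets=lambda frame: [],
    estimate_posture=lambda frame, person: frame,
    estimate_behavior=lambda person, posture: "peeking" if posture == "leaning" else None,
    is_warning_target=lambda behavior, targets: behavior == "peeking",
    output=results.append,
    should_stop=lambda: len(results) >= 3,
)
```

In this example the stop condition becomes true after three outputs, so the fourth frame is never processed and `results` holds one determination per processed frame.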

[0138] As described above, the information processing device 10 of this embodiment comprises a posture estimation unit 15C and a behavior estimation unit 15D. The posture estimation unit 15C estimates the posture of person P included in the video. The behavior estimation unit 15D estimates the behavior of person P based on the position, posture, and duration t of the posture of person P.

[0139] Thus, in this embodiment, the information processing device 10 estimates the behavior of person P from the combination of the position of person P, the posture of person P, and the duration t of that posture.

[0140] Therefore, the information processing device 10 of this embodiment can estimate the behavior of person P with high accuracy.

[0141] Furthermore, in the information processing device 10 of this embodiment, the warning target determination unit 15F determines whether or not the behavior of person P is subject to a warning.

[0142] In addition to the above effects, the information processing device 10 of this embodiment can determine with high accuracy whether the behavior estimated with high accuracy is subject to a warning.

[0143] (Examples of application) Next, specific application examples of the information processing device 10 described in the above embodiment will be explained.

[0144] The information processing device 10 described in the above embodiment can be applied as various information processing devices that alert people P who are on a train station platform or walking on a road or sidewalk. Furthermore, the information processing device 10 of this embodiment can also be used as an application to detect people P who are peering at a device or the actions of other people.

[0145] Figures 6A to 6C are explanatory diagrams illustrating an application example of the information processing device 10 of this embodiment. Figures 6A to 6C show a usage scenario of an ATM (Automated Teller Machine) 40. The ATM 40 is equipped with an operation screen 42 and an input device 44. The input device 44 is, for example, a keyboard.

[0146] As shown in Figure 6A, we assume a scenario where a person P other than user PQ is peering at the ATM 40's operation screen 42 or input device 44 while the ATM 40 is in use. In such a scenario, the information processing device 10 can determine whether the behavior of person P is a specific behavior, "peeking," which is subject to a warning, by executing the above process.

[0147] More specifically, as shown in Figure 6B, the warning target determination unit 15F determines whether the behavior of person P estimated by the behavior estimation unit 15D is a specific behavior "peeking" which is represented by the following conditions: person P's standing position P1 is within a specific range SE; the direction of person P's face θh, represented by person P's posture, is in the range of Tθh1 to Tθh2 (Tθh1≦θh≦Tθh2); the inclination θa in the forward-leaning direction, represented by person P's posture, is in the range of Tθa1 to Tθa2 (Tθa1≦θa≦Tθa2); and the duration t of the specific posture is greater than or equal to Td (t≧Td).

[0148] Then, if the warning target determination unit 15F determines that the behavior of person P is a specific behavior "peeking," it determines that the behavior of person P is a warning target. Furthermore, the output control unit 15G outputs the determination result to the output device 13.

[0149] Furthermore, as shown in Figure 6C, the warning target determination unit 15F determines whether the behavior of person P estimated by the behavior estimation unit 15D is the specific behavior "peeking," which is represented by the following conditions: the direction of movement D1 of person P is a predetermined direction DS; the posture of person P at one or more points in time while moving in the predetermined direction DS is the specific posture "specific forward-leaning posture," in which the direction θh of person P's face is in the range of Tθh1 to Tθh2 (Tθh1≦θh≦Tθh2) and the inclination θa in the forward-leaning direction is in the range of Tθa1 to Tθa2 (Tθa1≦θa≦Tθa2); and the duration t of the specific posture is Td or greater (t≧Td).

[0150] Then, if the warning target determination unit 15F determines that the behavior of person P is a specific behavior "peeking," it determines that the behavior of person P is a warning target. Furthermore, the output control unit 15G outputs the determination result to the output device 13.

[0151] As described above, the output device 13 is a display, speaker, etc. The judgment result output when a warning is issued may be warning information such as a warning sound, warning text, or warning image. The warning information may be, but is not limited to, information prompting user PQ to interrupt processing, or information prompting person P to pay attention.

[0152] By positioning the output device 13 near the ATM 40, it becomes possible to alert users PQ while they are using the ATM. Furthermore, by installing the output device 13 in a location where the ATM 40's administrator or security guard may be present, or by mounting it on an information processing device such as a portable terminal used by the administrator or security guard, it becomes possible to alert the administrator.

[0153] Next, an example of the hardware configuration of the information processing device 10 of the above embodiment will be described.

[0154] Figure 7 is a hardware configuration diagram of an example of the information processing device 10 of the above embodiment.

[0155] The information processing device 10 in the above embodiment has a hardware configuration that uses an ordinary computer, in which a CPU (Central Processing Unit) 80, a ROM (Read Only Memory) 82, a RAM (Random Access Memory) 84, an I/F 86, and the like are interconnected by a bus 88.

[0156] The CPU 80 is an arithmetic unit that controls the information processing device 10 of the above embodiment. The ROM 82 stores programs and the like that realize information processing by the CPU 80. The RAM 84 stores data necessary for various processes performed by the CPU 80. The I/F 86 is an interface connected to the storage unit, input unit, output unit, sensor, and communication unit, etc., for sending and receiving data.

[0157] In the information processing device 10 of the above embodiment, the CPU 80 reads a program from the ROM 82 onto the RAM 84 and executes it, thereby realizing each of the above-mentioned functional units on the computer.

[0158] The program for executing each of the above processes performed by the information processing device 10 of the above embodiment may be stored on an HDD (hard disk drive). Alternatively, the program for executing each of the above processes performed by the information processing device 10 and information processing device 10B of the above embodiment may be provided pre-installed in the ROM 82.

[0159] Furthermore, the program for executing the above processes performed by the information processing device 10 of the above embodiment may be stored, as a file in an installable or executable format, on a computer-readable storage medium such as a CD-ROM, CD-R, memory card, DVD (Digital Versatile Disc), or flexible disk (FD), and provided as a computer program product. Alternatively, the program may be stored on a computer connected to a network such as the Internet and provided by being downloaded via the network. The program may also be provided or distributed via a network such as the Internet.

[0160] Although embodiments have been described above, these embodiments are presented as examples only and are not intended to limit the scope of the invention. These novel embodiments can be implemented in various other forms, and various omissions, substitutions, and modifications can be made without departing from the spirit of the invention. These embodiments and their variations are included in the scope and spirit of the invention, and in the invention described in the claims and its equivalents.

[Explanation of Symbols]

[0161]
10 Information processing device
15C Posture estimation unit
15D Behavior estimation unit
15E Target detection unit
15F Warning target determination unit
15G Output control unit

Claims

1. An information processing device comprising: a posture estimation unit that estimates the posture of a person included in a video; and a behavior estimation unit that estimates the behavior of the person based on the position of the person, the posture, and the duration of the posture.

2. The information processing device according to claim 1, wherein the posture estimation unit estimates the posture based on at least one of the positional relationship of the body parts of the person, the angle of the body parts, the orientation of the body parts, the inclination of the body parts with respect to a reference direction, position, movement, and speed of movement.

3. The information processing device according to claim 1, further comprising a warning target determination unit that determines whether the behavior of the person is subject to a warning.

4. The information processing device according to claim 3, wherein the warning target determination unit determines that the behavior of the person is subject to the warning when the estimated behavior indicates that the person is located within a specific range, the posture is a specific posture, and the duration of the specific posture is longer than or shorter than a predetermined time.

5. The information processing device according to claim 3, wherein the warning target determination unit determines that the behavior of the person is subject to the warning when the estimated behavior indicates that the direction of movement of the person is a predetermined direction toward a specific range, the posture of the person at at least one timing during the movement in the predetermined direction is a specific posture, and the duration of the specific posture is greater than or equal to a predetermined time.

6. The information processing device according to claim 3, further comprising an object detection unit that detects an object included in the video, wherein, when the estimated behavior is a specific behavior in which the posture is a specific posture and the duration of the specific posture is greater than or equal to a predetermined time or less than or equal to a predetermined time, and it is determined that the detected object is a specific object and that the person is looking at the specific object, the warning target determination unit determines that the behavior of the person is not subject to the warning.

7. The information processing device according to claim 3, further comprising an object detection unit that detects a surrounding object present around the person included in the video, wherein, when the estimated behavior is a specific behavior in which the posture is a specific posture and the duration of the specific posture is greater than or equal to a predetermined time or less than or equal to a predetermined time, the warning target determination unit determines that the behavior of the person is not subject to the warning if the surrounding object is moving away from the person, and determines that the behavior of the person is subject to the warning if the surrounding object is moving toward the person.

8. The information processing device according to claim 3, wherein the warning target determination unit determines that the estimated behavior is not subject to the warning when the estimated behavior matches or is similar to an exclusion determination behavior.

9. The information processing device according to claim 4, wherein the specific posture is a posture in which the forward tilt of the upper body of the person relative to the vertical direction falls within a predetermined angle range.

10. The information processing device according to claim 9, wherein the behavior estimation unit changes the predetermined angle range according to the orientation of the body of the person represented by the posture, and uses the changed predetermined angle range to determine whether the posture is the specific posture.

11. An information processing method performed by an information processing device, the method comprising: a posture estimation step of estimating the posture of a person included in a video; and a behavior estimation step of estimating the behavior of the person based on the position of the person, the posture, and the duration of the posture.

12. An information processing program that causes a computer to execute: a posture estimation step of estimating the posture of a person included in a video; and a behavior estimation step of estimating the behavior of the person based on the position of the person, the posture, and the duration of the posture.
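As a reading aid only, and not part of the claims, the combination recited in claims 1, 4, 7, and 9 can be sketched in Python. Everything concrete here is an assumption for illustration: the names `forward_tilt_deg`, `BehaviorEstimator`, and `is_approaching`, the use of hip and shoulder keypoints in 2D image coordinates, the angle range of 20 to 60 degrees, the 3-second threshold, and the choice of the "greater than or equal to" variant of the duration test. A 2D image cannot by itself distinguish a forward tilt from a sideways tilt, so the angle below is a simplification.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates; y grows downward


def forward_tilt_deg(hip: Point, shoulder: Point) -> float:
    """Angle in degrees between the hip-to-shoulder segment and the vertical."""
    dx = shoulder[0] - hip[0]
    dy = hip[1] - shoulder[1]  # positive when the shoulder is above the hip
    return math.degrees(math.atan2(abs(dx), dy))


def is_approaching(person: Point, obj_prev: Point, obj_now: Point) -> bool:
    """Claim 7, simplified: True if the surrounding object moved closer."""
    return math.dist(person, obj_now) < math.dist(person, obj_prev)


@dataclass
class BehaviorEstimator:
    """Estimates behavior from position, posture, and posture duration."""
    region: Tuple[float, float, float, float]       # assumed "specific range"
    tilt_range: Tuple[float, float] = (20.0, 60.0)  # assumed angle range (deg)
    min_duration: float = 3.0                       # assumed threshold (s)
    _since: Optional[float] = None                  # start of current posture

    def update(self, t: float, position: Point,
               hip: Point, shoulder: Point) -> str:
        x_min, y_min, x_max, y_max = self.region
        in_region = (x_min <= position[0] <= x_max
                     and y_min <= position[1] <= y_max)
        low, high = self.tilt_range
        in_posture = low <= forward_tilt_deg(hip, shoulder) <= high

        # Track how long the specific posture has been held continuously.
        if in_posture:
            if self._since is None:
                self._since = t
            duration = t - self._since
        else:
            self._since = None
            duration = 0.0

        if in_region and in_posture and duration >= self.min_duration:
            return "peeking"
        return "other"
```

Calling `update` once per video frame with a timestamp and the estimated keypoints returns "peeking" only after the tilted posture has been held inside the region for the assumed threshold; straightening up resets the timer.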