Information processing methods, programs, and information processing devices.
The information processing system aggregates, for each feature point, the time it goes undetected across the frames of a moving image and uses this to determine feature point detection accuracy, supporting improved motion analysis in moving images.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- BRIDGESTONE SPORTS CO LTD
- Filing Date
- 2024-12-20
- Publication Date
- 2026-07-02
AI Technical Summary
Existing techniques struggle with accurately detecting feature points from moving images, leading to low detection accuracy and subsequent inadequate analysis of predetermined motions.
An information processing system that detects multiple feature points in each frame, aggregates the undetected time of each feature point over the time series, and determines detection accuracy based on an index value representing the undetected time and information defining how strongly each feature point's undetected time influences the determination result.
Enables accurate determination of feature point detection, allowing for improved analysis of predetermined motions.
Abstract
Description
Technical Field
[0001] The present disclosure relates to an information processing method and the like.
Background Art
[0002] Conventionally, a technique has been known for detecting a plurality of feature points representing a predetermined motion (for example, a golf swing motion) from a moving image and analyzing the predetermined motion (see, for example, Patent Document 1).
Prior Art Documents
Patent Documents
[0003]
Patent Document 1
Summary of the Invention
Problems to be Solved by the Invention
[0004] Depending on various conditions, however, feature points may fail to be detected in a moving image. When the detection accuracy of feature points is low, the predetermined motion may not be analyzed appropriately. It is therefore desirable to be able to determine the detection accuracy of feature points.
[0005] In view of the above problem, an object of the present disclosure is to provide a technique capable of determining the detection accuracy when feature points representing a predetermined motion are detected from a moving image.
Means for Solving the Problems
[0007] In another embodiment of the present disclosure, a program is provided that causes an information processing device to execute: an acquisition step of acquiring a video showing a predetermined action of a subject; a detection step of detecting a predetermined number (two or more) of feature points representing the predetermined action in each of a plurality of frames included in the video; a summation step of aggregating, for each of the predetermined number of feature points, the time during which that feature point is undetected in the time series of the plurality of frames; and a determination step of determining the detection accuracy of the predetermined number of feature points based on an index value representing the undetected time of each feature point and information defining the degree to which the undetected time of each feature point influences the determination result.
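The summation step can be illustrated with a minimal Python sketch. It assumes that each frame's detections arrive as a list of N optional (x, y) coordinates, with None marking an undetected feature point, and that "undetected time" is the total number of such frames converted to seconds with a known frame rate. The function name `aggregate_undetected_time`, the data layout, and the use of a simple total (rather than, say, the longest gap) are hypothetical choices, not taken from the disclosure.

```python
from typing import List, Optional, Sequence, Tuple

# One detected feature point as (x, y) pixel coordinates; None means "not detected".
Point = Optional[Tuple[float, float]]


def aggregate_undetected_time(frames: Sequence[Sequence[Point]], fps: float) -> List[float]:
    """Return, for each of the N feature points, the total time in seconds it went undetected."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    if not frames:
        return []
    n = len(frames[0])
    missing_counts = [0] * n
    for frame in frames:
        if len(frame) != n:
            raise ValueError("every frame must hold the same N feature points")
        for j, point in enumerate(frame):
            if point is None:
                missing_counts[j] += 1
    # Convert per-point frame counts into seconds using the video frame rate.
    return [count / fps for count in missing_counts]


# Usage: three frames, two feature points, a 2 fps video.
frames = [[(0.0, 0.0), None], [None, None], [(1.0, 1.0), (2.0, 2.0)]]
print(aggregate_undetected_time(frames, fps=2.0))  # [0.5, 1.0]
```

Counting frames and dividing by the frame rate keeps the index comparable across videos recorded at different frame rates.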
[0008] In yet another embodiment of the present disclosure, an information processing device is provided that includes: an acquisition unit that acquires a video showing a predetermined action of a subject; a detection unit that detects a predetermined number (two or more) of feature points representing the predetermined action in each of a plurality of frames included in the video; a summation unit that aggregates, for each of the predetermined number of feature points, the time during which that feature point is undetected in the time series of the plurality of frames; and a determination unit that determines the detection accuracy of the predetermined number of feature points based on an index value representing the undetected time of each feature point and information defining the degree to which the undetected time of each feature point influences the determination result.
Effects of the Invention
[0009] According to the above embodiment, it is possible to determine the detection accuracy when detecting feature points representing a predetermined motion from a moving image.
Brief Description of the Drawings
[0010] [Figure 1] A diagram showing an example of an information processing system. [Figure 2] A diagram showing an example of a moving image representing a user's golf swing motion.
Embodiments will be described below with reference to the drawings.
[0012] In this specification, a person who uses the functions of the information processing system 1 through the user terminal 200 is referred to as a "user," and a person who engages in tasks and management for providing the functions of the information processing system 1 to the user through the information processing device 300 may be conveniently referred to as a "worker, etc."
[0013] [Overview of the Information Processing System] Referring to Figures 1 to 3, an overview of the information processing system 1 according to this embodiment will be described.
[0014] Figure 1 shows an example of the information processing system 1. Figure 2 shows an example of a video (video 20) representing a user's golf swing motion. Figure 3 shows another example of a video (video 30) representing a user's golf swing motion. That is, Figures 2 and 3 are concrete examples of videos representing a user's golf swing motion.
[0015] Figure 2 shows frames 21-28, extracted from all the frames (still images) included in the video 20. Frame 21 represents the address position in the user's golf swing. Frame 22 represents the take-back position in the user's golf swing. Frame 23 represents the backswing position in the user's golf swing. Frame 24 represents the top position in the user's golf swing. Frame 25 represents the halfway down position in the user's golf swing. Frame 26 represents the impact position in the user's golf swing. Frame 27 represents the follow-through position in the user's golf swing. Frame 28 represents the finish position in the user's golf swing.
[0016] Similarly, Figure 3 shows frame 31, an excerpt from all the frames of the video 30. Frame 31 represents the halfway down position in the user's golf swing motion.
[0017] As shown in Figure 1, the information processing system 1 includes a camera 100, a user terminal 200, and an information processing device 300.
[0018] The information processing system 1, using the information processing device 300, detects N feature points (N: an integer greater than or equal to 2) representing a predetermined action for each of multiple frames of the video image data acquired by the camera 100, which contains footage of a predetermined action performed by a subject. The information processing system 1 then analyzes the predetermined action performed by the subject based on the N feature points for each of the multiple frames. As a result, the information processing system 1 can present the user with the analysis results of the predetermined action performed by the subject via the user terminal 200, and, based on the analysis results, provide support information for improving the predetermined action performed by the subject.
[0019] The subject being photographed is, for example, a user of the information processing system 1. Alternatively, the subject may be a customer to whom the user of the information processing system 1 provides sales assistance for equipment (e.g., hitting equipment or balls). The subject may also be a student receiving advice on predetermined actions from the user of the information processing system 1. The following explanation focuses primarily on the case where the subject is the user of the information processing system 1.
[0020] The predetermined action may be, for example, a golf swing. Alternatively, it may be a golf putting motion or an action from another sport, such as baseball batting, baseball pitching, or swing motions such as tennis serves, forehands, and backhands. The predetermined action may also be an action that does not use equipment, such as running or walking, or an action outside sports and general exercise, such as one related to a specific skill in a factory. The information processing system 1 may analyze one type or multiple types of predetermined action.
[0021] The N feature points include, for example, feature points representing the body parts of the user in the frame's image (hereinafter referred to as "physical feature points"). Physical feature points may represent, for example, the joint positions in the user's skeleton, and the physical feature points included in the N feature points may include points on the image corresponding to the head, shoulders, elbows, wrists, hips, knees, ankles, etc.
[0022] Furthermore, the N feature points include, for example, feature points representing parts of a tool held by the user who is the subject (e.g., a golf club, bat, or tennis racket).
[0023] Furthermore, the N feature points include, for example, feature points representing the target ball that is launched by a tool (specifically, a striking tool).
[0024] Camera 100 captures images of a predetermined action performed by the user and acquires a video image representing that predetermined action. The video image is composed of a series of still images (frames).
[0025] Camera 100 is, for example, a so-called two-dimensional camera and acquires two-dimensional moving images. Alternatively, camera 100 may be a three-dimensional camera capable of acquiring depth information of the two-dimensional moving images in addition to the two-dimensional moving images. Depth information of a two-dimensional moving image refers to, for example, information representing the depth position of the object depicted in each pixel of each frame that makes up the two-dimensional moving image.
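When camera 100 also supplies per-pixel depth, one simple way to attach depth to a detected 2D feature point is a nearest-pixel lookup. The sketch below assumes the depth map is a row-major grid indexed `[row][column]` in the same pixel coordinates as the frame, and that a non-positive value means "no measurement"; both conventions and the function name `depth_at` are assumptions for illustration, not part of the disclosure.

```python
from typing import Optional, Sequence


def depth_at(depth_map: Sequence[Sequence[float]], x: float, y: float) -> Optional[float]:
    """Return the depth at the pixel nearest to (x, y), or None if unavailable."""
    col, row = round(x), round(y)
    # Points projected outside the frame have no depth sample.
    if row < 0 or row >= len(depth_map) or col < 0 or col >= len(depth_map[row]):
        return None
    value = depth_map[row][col]
    # Assumed sensor convention: a non-positive depth means no valid measurement.
    return value if value > 0 else None


# Usage: a 2x2 depth map where the bottom-right pixel has no measurement.
depth_map = [[1.0, 2.0], [3.0, 0.0]]
print(depth_at(depth_map, 1.0, 0.0))  # 2.0
```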
[0026] Camera 100 may, for example, acquire video footage representing the user's predetermined actions in response to the operation of a photographer other than the user performing the predetermined action. Alternatively, camera 100 may acquire video footage representing the user's predetermined actions in response to the user's operation using a self-timer function or the like.
[0027] Camera 100 may acquire video images representing a predetermined action of the user captured from a single viewpoint, or it may acquire video images representing a predetermined action of the user captured from multiple viewpoints. In the latter case, for example, multiple cameras 100 may acquire video images representing a predetermined action of the user from different viewpoints at the same time. Alternatively, by changing the position of one camera 100, images representing predetermined actions of the user from different viewpoints may be acquired sequentially.
[0028] For example, as shown in Figure 2, camera 100 faces the user directly and captures the golf swing motion from the front of the user. Alternatively, as shown in Figure 3, camera 100 may capture the golf swing motion from behind the user. "Behind the user" means behind the user when the direction of the ball flight path assumed by the user (hereinafter referred to as the "virtual ball flight path") is considered "forward". Camera 100 may also acquire both video footage representing the golf swing motion from the front of the user and video footage representing the golf swing motion from behind the user. In other words, camera 100 may acquire video footage representing the user's actions from one viewpoint, or it may acquire video footage representing the user's actions from multiple viewpoints.
[0029] In Figure 1, the camera 100 and the user terminal 200 are drawn separately, but the camera 100 may be built into the user terminal 200 or it may be provided separately from the user terminal 200. In the latter case, the output of the camera 100 (i.e., video data) may be received by the user terminal 200 via communication through the communication interface 206 described later, or it may be received by the user terminal 200 via the recording medium 201A described later.
[0030] The user terminal 200 is a terminal device used by the user. The user terminal 200 may be a terminal device placed in, for example, a golf lesson facility or shop, or it may be a terminal device owned by the user.
[0031] The user terminal 200 is, for example, a portable terminal device, i.e., a mobile terminal. A mobile terminal is, for example, a smartphone, a tablet device, or a laptop PC (Personal Computer). Alternatively, the user terminal 200 may be a stationary terminal device. A stationary terminal device is, for example, a desktop PC.
[0032] The user terminal 200 is communicatively connected to the information processing device 300 via a predetermined communication line. The predetermined communication line includes, for example, a local area network (LAN). It may also include a wide area network (WAN), such as the internet, a mobile communication network whose endpoints are base stations, or a satellite communication network using communication satellites. The predetermined communication line may further include a short-range communication line based on a predetermined wireless communication standard such as WiFi, Bluetooth®, or local 5G (5th Generation).
[0033] The user terminal 200 acquires from the camera 100 a video representing a predetermined action performed by the user and transmits it to the information processing device 300. The user terminal 200 then presents to the user the analysis results of the predetermined action returned by the information processing device 300, together with support information for improving the predetermined action based on those results.
[0034] The information processing device 300 analyzes a predetermined action performed by the user based on a video image received from the user terminal 200 that represents the user's predetermined action. The information processing device 300 then returns the analysis result of the predetermined action performed by the user to the user terminal 200.
[0035] The information processing device 300 is, for example, a server device with relatively high processing power. The server device may be a cloud server, an on-premise server, or an edge server. Depending on the required processing power, the information processing device may also be a terminal device with lower processing power than the server device. The terminal device may be a stationary terminal device or a portable terminal device (mobile terminal).
[0036] [Configuration of the Information Processing System] In addition to Figure 1, the configuration of the information processing system 1 will be explained with reference to Figures 4 and 5.
[0037] <User terminal configuration> Figure 4 is a block diagram showing an example configuration of user terminal 200.
[0038] The functions of the user terminal 200 can be realized by any hardware or any combination of hardware and software. For example, as shown in Figure 4, the user terminal 200 includes an external interface 201, an auxiliary storage device 202, a memory device 203, a processor 204, a communication interface 206, an input device 207, a display device 208, and a sound output device 209. These components are connected by bus B2. Also, as described above, if the camera 100 is built into the user terminal 200, for example, the camera 100 is connected to bus B2, just like the other components.
[0039] The external interface 201 functions as an interface for reading data from and writing data to the recording medium 201A. The recording medium 201A includes, for example, flexible disks, CDs (Compact Discs), DVDs (Digital Versatile Discs), BDs (Blu-ray® Discs), SD memory cards, and USB (Universal Serial Bus) memory. This allows the user terminal 200 to read various data used in processing through the recording medium 201A, store it in the auxiliary storage device 202, and install programs that implement various functions.
[0040] Furthermore, the user terminal 200 may acquire various data and programs for processing from an external device (for example, an information processing device 300) via the communication interface 206.
[0041] The auxiliary storage device 202 stores various installed programs, as well as files and data necessary for various processes. The auxiliary storage device 202 includes, for example, an HDD (Hard Disk Drive), an SSD (Solid State Drive), or flash memory.
[0042] When a program startup command is received, the memory device 203 reads the program from the auxiliary storage device 202 and stores it. The memory device 203 includes, for example, DRAM (Dynamic Random Access Memory) or SRAM (Static Random Access Memory).
[0043] The processor 204 executes various programs loaded from the auxiliary storage device 202 into the memory device 203, and implements various functions related to the user terminal 200 according to the programs.
[0044] The processor 204 may include, for example, a CPU (Central Processing Unit). Alternatively, the processor 204 may also include, for example, a GPU (Graphics Processing Unit), an ASIC (Application Specific Integrated Circuit), or an FPGA (Field-Programmable Gate Array).
[0045] The communication interface 206 is used as an interface for communicating with external devices. This allows the user terminal 200 to acquire video data from the camera 100 through the communication interface 206. Furthermore, the user terminal 200 can communicate with external devices, such as an information processing device 300, through the communication interface 206. The communication interface 206 may also have multiple types of communication interfaces to suit the communication method between the connected devices.
[0046] The input device 207 receives various inputs from the user.
[0047] The input device 207 includes, for example, an input device that accepts mechanical input from a user (hereinafter referred to as "mechanical input device"). The mechanical input device includes, for example, buttons, toggles, levers, a touch panel implemented on the display device 208, a touchpad provided separately from the display device 208, a keyboard, a mouse, etc.
[0048] Furthermore, the input device 207 may include a voice input device capable of receiving voice input from the user. The voice input device may include, for example, a microphone capable of picking up the user's voice.
[0049] Furthermore, the input device 207 may include a gesture input device capable of receiving gesture input from the user. The gesture input device may include, for example, a camera capable of capturing images of the user's gestures.
[0050] Furthermore, the input device 207 may include a biometric input device capable of receiving biometric input from the user. The biometric input device may include, for example, a camera capable of acquiring image data containing information about the user's fingerprints or iris.
[0051] The display device 208 visually conveys information to the user by displaying information screens, operation screens, etc. The display device 208 is, for example, a liquid crystal display or an organic EL (electroluminescence) display.
[0052] The sound output device 209 conveys various information audibly to the user of the user terminal 200 by outputting a predetermined sound or voice. The sound output device 209 may be, for example, a buzzer, alarm, speaker, etc.
[0053] <Configuration of the information processing device> Figure 5 is a block diagram showing an example of the configuration of the information processing device 300.
[0054] The functions of the information processing device 300 can be realized by any hardware or any combination of hardware and software. For example, as shown in Figure 5, the information processing device 300 includes an external interface 301, an auxiliary storage device 302, a memory device 303, a processor 304, a communication interface 306, an input device 307, a display device 308, and a sound output device 309. These components are connected by bus B3.
[0055] The external interface 301 functions as an interface for reading data from and writing data to the recording medium 301A. The recording medium 301A includes, for example, a flexible disk, CD, DVD, BD, SD memory card, USB memory, etc. This allows the information processing device 300 to read various data used in processing through the recording medium 301A, store it in the auxiliary storage device 302, and install programs that realize various functions.
[0056] Furthermore, the information processing device 300 may acquire various data and programs for processing from external devices via the communication interface 306.
[0057] The auxiliary storage device 302 stores various installed programs, as well as files and data necessary for various processes. The auxiliary storage device 302 includes, for example, an HDD, SSD, or flash memory.
[0058] When a program startup command is received, the memory device 303 reads the program from the auxiliary storage device 302 and stores it. The memory device 303 includes, for example, DRAM or SRAM.
[0059] The processor 304 executes various programs loaded from the auxiliary storage device 302 into the memory device 303 and implements various functions related to the information processing device 300 according to the programs. The processor 304 includes, for example, a CPU. Alternatively, the processor 304 may include, for example, a GPU or an ASIC.
[0060] The communication interface 306 is used as an interface for connecting to external devices in a communicative manner. This allows the information processing device 300 to communicate with external devices, such as a user terminal 200, through the communication interface 306. Furthermore, the communication interface 306 may have multiple types of communication interfaces to suit the communication method between the connected devices.
[0061] The input device 307 receives various inputs from workers, etc.
[0062] The input device 307 includes, for example, a mechanical input device that accepts mechanical operation input from an operator or the like. The mechanical input device includes, for example, buttons, toggles, levers, a touch panel mounted on the display device 308, a touchpad, keyboard, mouse, etc., provided separately from the display device 308.
[0063] Furthermore, the input device 307 includes, for example, a voice input device capable of receiving voice input from a worker or the like. The voice input device includes, for example, a microphone capable of collecting the voice of a worker or the like.
[0064] Furthermore, the input device 307 includes, for example, a gesture input device capable of receiving gesture input from a worker or the like. The gesture input device includes, for example, a camera capable of capturing images of gestures made by the worker or the like.
[0065] Furthermore, the input device 307 includes, for example, a biometric input device capable of receiving biometric input from a worker or the like. The biometric input device includes, for example, a camera capable of acquiring image data containing information about the worker's fingerprints or iris.
[0066] The display device 308 visually conveys various information to workers by displaying information screens and operation screens. The display device 308 is, for example, a liquid crystal display or an organic EL display.
[0067] The sound output device 309 transmits various information by sound to operators of the information processing device 300. The sound output device 309 may be, for example, a buzzer, alarm, or speaker.
[0068] [Functional Configuration of the Information Processing System] The functional configuration of the information processing system 1 will be explained with reference to Figures 6 to 11.
[0069] Figure 6 is a functional block diagram showing an example of the functional configuration of the information processing system 1. Figure 7 is a diagram illustrating an example of feature point data. Figures 8 to 11 show the first to fourth examples of coefficients α_j defined for each of the N feature points.
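Figures 8 to 11 define a coefficient α_j for each of the N feature points, i.e. the information defining how much each point's undetected time influences the determination. This section does not state how the coefficients are combined with the undetected times, so the Python sketch below assumes a weighted sum compared against a threshold. The weighted-sum rule, the threshold, and the function names are hypothetical illustrations, not the disclosed method.

```python
from typing import Sequence


def weighted_undetected_index(undetected_times: Sequence[float], alphas: Sequence[float]) -> float:
    """Combine per-feature-point undetected times into one index using coefficients alpha_j."""
    if len(undetected_times) != len(alphas):
        raise ValueError("need exactly one coefficient alpha_j per feature point")
    # A larger alpha_j lets feature point j's undetected time weigh more heavily.
    return sum(a * t for a, t in zip(alphas, undetected_times))


def detection_accuracy_ok(undetected_times: Sequence[float], alphas: Sequence[float],
                          threshold: float) -> bool:
    """Judge detection accuracy acceptable when the weighted index is at or below the threshold."""
    return weighted_undetected_index(undetected_times, alphas) <= threshold


# Usage: three feature points with undetected times in seconds.
times = [0.5, 1.0, 0.0]
alphas = [1.0, 0.25, 3.0]
print(weighted_undetected_index(times, alphas))  # 0.75
```

Setting α_j to 0 for a feature point that the later analysis does not use would make its undetected time irrelevant to the judgment under this assumed rule.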
[0070] Specifically, Figure 7 is a diagram showing feature points representing the user's movements for each of multiple frames included in a video showing the user's golf swing motion. In Figure 7, the data 70 of feature points (black circles in the figure) representing the user's movements for each of multiple frames in a video showing the user's golf swing motion are schematically visualized. In Figure 7, the feature point data 71 to 78 for each frame of a portion of all frames (specifically, frames 21 to 28) of the video 20 in Figure 2 are shown as an excerpt. Data 71 represents the feature point data corresponding to frame 21, which shows the address state in the user's golf swing motion. Data 72 represents the feature point data corresponding to frame 22, which shows the take-back state in the user's golf swing motion. Data 73 represents the feature point data corresponding to frame 23, which shows the backswing state in the user's golf swing motion. Data 74 represents the feature point data corresponding to frame 24, which shows the top state in the user's golf swing motion. Data 75 represents the feature point data corresponding to frame 25, which shows the halfway down state in the user's golf swing motion. Data 76 represents feature point data corresponding to frame 26, which shows the impact state in the user's golf swing. Data 77 represents feature point data corresponding to frame 27, which shows the follow-through state in the user's golf swing. Data 78 represents feature point data corresponding to frame 28, which shows the finish state in the user's golf swing.
[0071] As shown in Figure 6, the user terminal 200 includes an application screen display processing unit 2001, a video data acquisition unit 2002, a video data transmission unit 2003, and a reply data acquisition unit 2004. These functions are realized, for example, when an application program (hereinafter simply referred to as "app") installed in the auxiliary storage device 202 is loaded into the memory device 203 and executed by the processor 204.
[0072] The application screen display processing unit 2001 displays the application-related screen (hereinafter referred to as the "application screen") on the display device 208.
[0073] The video data acquisition unit 2002 acquires video data from the camera 100 that represents a predetermined action performed by the user. For example, the video data acquisition unit 2002 acquires video data from the camera 100 that represents a predetermined action performed by the user, in response to user input on a predetermined application screen using the input device 207. In this case, the video data acquisition unit 2002 may acquire video data that has already been captured by the camera 100, or it may acquire the latest video data being acquired (captured) by the camera 100 in real time. The video data includes two-dimensional video data. In addition to the two-dimensional video data, the video data may also include information about the depth direction of the two-dimensional video.
[0074] The video data transmission unit 2003 transmits video data representing a predetermined action of the user, acquired by the video data acquisition unit 2002, to the information processing device 300 via the communication interface 206. For example, the video data transmission unit 2003 transmits video data acquired from the camera 100 to the information processing device 300 in response to user input on a predetermined application screen using the input device 207.
[0075] The reply data acquisition unit 2004 acquires reply data received from the information processing device 300, which includes the results of an analysis of a predetermined action performed by the user.
[0076] The application screen display processing unit 2001 displays the application screen, which includes the analysis results of a predetermined action performed by the user, as contained in the reply data acquired by the reply data acquisition unit 2004, on the display device 208. This allows the user to visually confirm the analysis results of their predetermined action.
[0077] Furthermore, the application screen display processing unit 2001 may display an application screen on the display device 208 that includes support information for improving a predetermined action based on the analysis results of the user's predetermined action. This allows the user to easily understand how to improve their predetermined action.
[0078] As shown in Figure 6, the information processing device 300 includes a video data acquisition unit 3001, a feature point detection unit 3002, an event timing acquisition unit 3003, a data loss detection unit 3004, a detection accuracy determination unit 3005, a motion analysis unit 3006, and a transmission unit 3007. These functions are realized, for example, by loading a program installed in the auxiliary storage device 302 into the memory device 303 and executing it with the processor 304. The information processing device 300 also includes a video data storage unit 3001A, a feature point data storage unit 3002A, an event timing information storage unit 3003A, and a data loss information storage unit 3004A. These storage units are realized by storage areas defined in, for example, the auxiliary storage device 302 or the memory device 303.
[0079] The video data acquisition unit 3001 acquires video data that represents a predetermined action performed by the user, which is received from the user terminal 200.
[0080] The video data storage unit 3001A stores video data acquired by the video data acquisition unit 3001.
[0081] The feature point detection unit 3002 uses known image processing techniques to extract (detect) multiple feature points related to the predetermined action of the user, who is the subject, in each frame of the video stored in the video data storage unit 3001A. As a result, the feature point detection unit 3002 obtains time-series data (hereinafter referred to as "2D feature point data") representing, in two dimensions, the multiple feature points for each frame of the video. Known image processing techniques include, for example, image recognition techniques for detecting (estimating) skeletal information of people and objects in an image, such as so-called skeleton detection and skeleton estimation.
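The disclosure leaves the "known image processing technique" unspecified, so the Python sketch below treats the detector as an opaque callable that returns only the feature points it found in a frame. It then lays the results out as a frames-by-N table with None wherever a point was not detected, which makes missing detections explicit for later aggregation. The `Detector` signature, the feature names, and `build_2d_feature_series` are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Point = Optional[Tuple[float, float]]
# Hypothetical detector interface: one frame in, {feature_name: (x, y)} out for points it found.
Detector = Callable[[object], Dict[str, Tuple[float, float]]]


def build_2d_feature_series(frames: Sequence[object], names: Sequence[str],
                            detector: Detector) -> List[List[Point]]:
    """Run the detector on every frame; return a frames x N table with None for misses."""
    series: List[List[Point]] = []
    for frame in frames:
        found = detector(frame)
        # Keep a fixed column order so column j always means the same feature point.
        series.append([found.get(name) for name in names])
    return series


# Usage with a stand-in detector whose "frames" are already detection dicts.
frames = [{"head": (1.0, 2.0)}, {}]
print(build_2d_feature_series(frames, ["head", "club_head"], lambda f: f))
# [[(1.0, 2.0), None], [None, None]]
```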
[0082] Furthermore, if the video data storage unit 3001A stores video data viewed from multiple different viewpoints, the feature point detection unit 3002 may acquire 2D feature point data for each of the multiple video images corresponding to the multiple viewpoints.
[0083] For example, as shown in Figure 7, the feature point detection unit 3002 detects multiple (N) feature points representing the swinging motion of the subject user for every frame of the video.
[0084] The N feature points include, for example, feature points (physical feature points) that represent the body parts of the user in the frame image, as described above. This allows the feature point detection unit 3002 to acquire two-dimensional feature point data that represents the position and posture angle of the user's body parts in two dimensions.
[0085] Furthermore, the N feature points may include feature points representing parts of the tool held by the user. For example, as shown in Figure 7, feature points corresponding to the head of the club held by the user are extracted. This allows the feature point detection unit 3002 to acquire two-dimensional feature point data that represents the position and orientation angle of the parts of the tool held by the user in two dimensions.
[0086] Furthermore, the N feature points may include feature points that represent the target ball that is launched by the striking tool.
[0087] Furthermore, the feature point detection unit 3002 may adjust the coordinate system of the 2D feature point data and its scale. For example, the feature point detection unit 3002 acquires the 2D feature point data in a coordinate system referenced to the person being photographed (i.e., the user) and scaled to match the real world.
[0088] The event timing acquisition unit 3003 acquires the timing of a predetermined event within the time series of the 2D feature point data, based on the 2D feature point data acquired by the feature point detection unit 3002. If the feature point detection unit 3002 acquires 2D feature point data from multiple viewpoints, the event timing acquisition unit 3003 acquires the timing of the predetermined event within the time series of each of the multiple sets of 2D feature point data.
[0089] A predetermined event is an event that occurs within the predetermined action. For example, a predetermined event may be an event representing the action state at the start or end of the predetermined action across all frames of a video, an event representing the action state at the start of each action section when the predetermined action is divided into multiple action sections, or an event representing a specific action state within each action section. Specifically, for example, predetermined events in a golf swing motion may include address, take-back, backswing, top, halfway down, impact, and follow-through. The timing may be acquired for a single predetermined event or for multiple predetermined events within the time series of the 2D feature point data.
[0090] The event timing acquisition unit 3003 acquires the timing of a predetermined event by, for example, estimating the data for a single time point corresponding to a predetermined event from the data for each time point included in the two-dimensional feature point data (hereinafter, for convenience, referred to as "two-dimensional time point data"). In this case, the event timing acquisition unit 3003 selectively estimates the timing of a predetermined event from a discrete time system defined by each two-dimensional time point data included in the two-dimensional feature point data. Alternatively, the event timing acquisition unit 3003 may treat the time series of the two-dimensional feature point data as a continuous time system and estimate the timing of a predetermined event within that system. In this case, the timing of a predetermined event may be estimated as the timing between each time point corresponding to adjacent two-dimensional time point data.
[0091] The event timing acquisition unit 3003 can acquire the timing of a predetermined event within the time series of two-dimensional feature point data using a known method.
[0092] For example, the event timing acquisition unit 3003 acquires the timing of a predetermined event by estimating it based on the similarity between the two-dimensional feature point data at each point in time and the reference two-dimensional feature point data (reference operation data) corresponding to the predetermined event (see Japanese Patent Application Publication No. 2024-108340).
[0093] Furthermore, the event timing acquisition unit 3003 may estimate the timing of a predetermined event based on whether a physical quantity representing the motion state of a specific feature point, obtained from each item of 2D time point data included in the 2D feature point data, satisfies a condition corresponding to the predetermined event. For example, at the top of a golf swing, the speed of the golf club head becomes zero. Therefore, the event timing acquisition unit 3003 can acquire the timing of the top event by extracting, from the 2D time point data included in the 2D feature point data, the data that satisfies a condition under which the speed of the feature point corresponding to the club head can be judged to be zero.
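The top-of-swing condition above can be illustrated with a minimal sketch. It is not the method of the disclosure; it assumes the club-head feature point is available per frame as (x, y) coordinates (or None when undetected), approximates speed by a backward difference between consecutive frames, and uses a hypothetical tolerance `speed_eps` because a measured speed is rarely exactly zero. The search window (`start`, `end`) is also an assumption: it is needed because the club head is likewise stationary at address.

```python
import math
from typing import Optional, Sequence, Tuple

# A club-head position per frame; None means the point was not detected.
Position = Optional[Tuple[float, float]]


def estimate_top_frame(club_head: Sequence[Position], fps: float,
                       start: int = 1, end: Optional[int] = None,
                       speed_eps: float = 0.05) -> Optional[int]:
    """Return the frame index in [start, end) with the lowest club-head speed,
    provided that speed is at or below speed_eps (position units per second).

    Speed at frame i is approximated from frames i-1 and i; frames where either
    position is missing are skipped. Returns None if no frame qualifies.
    """
    end = len(club_head) if end is None else end
    best_idx, best_speed = None, math.inf
    for i in range(max(start, 1), end):
        prev, cur = club_head[i - 1], club_head[i]
        if prev is None or cur is None:
            continue
        speed = math.dist(prev, cur) * fps  # distance per frame -> per second
        if speed < best_speed:
            best_idx, best_speed = i, speed
    return best_idx if best_speed <= speed_eps else None


# Hypothetical trajectory: still at address, rising, pausing at the top, descending.
track = [(0, 0), (0, 0), (1, 1), (2, 2), (2.5, 2.5), (2.5, 2.5), (2, 2), (1, 1)]
print(estimate_top_frame(track, fps=1.0, start=2))  # 5
```

Starting the search at frame 1 instead would return the stationary address frame, which is why a window (for example, one bounded by an earlier take-back event) is assumed.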
[0094] Furthermore, the operator may use the input device 307 to input a frame corresponding to a predetermined event from among all the frames of the video while viewing the video displayed on the display device 308. In this case, the event timing acquisition unit 3003 can acquire the timing of a predetermined event in response to the predetermined input from the operator via the input device 307.
[0095] Furthermore, the user may use the input device 207 to input a frame corresponding to a predetermined event from among all the frames of the video while viewing the video displayed on the display device 208. In this case, the event timing acquisition unit 3003 can acquire the timing of a predetermined event in response to the predetermined input from the user via the input device 207.
[0096] The event timing information storage unit 3003A stores information (event timing information) that represents the timing of a predetermined event acquired by the event timing acquisition unit 3003.
[0097] The data loss detection unit 3004 detects data loss of feature points in the 2D feature point data stored in the feature point data storage unit 3002A. Data loss of feature points means, for example, that in a given frame the feature point detection unit 3002 fails to detect some of the feature points for some reason, so that fewer than N feature points are detected. Specifically, the data loss detection unit 3004 identifies, for every frame included in the 2D feature point data, which of the N feature points are undetected. Then, for each of the N feature points, the data loss detection unit 3004 aggregates the time within the time series of the 2D feature point data during which that feature point was not detected (hereinafter simply referred to as "undetected time"). The undetected time is aggregated, for example, as the number of frames in which the feature point was not detected. Alternatively, the undetected time may be calculated in seconds by dividing the number of such frames by the frame rate.
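The aggregation of undetected time can be sketched as below. This assumes, hypothetically, that the detection status is available as a per-frame boolean mask over the N feature points; the disclosure does not specify how missing points are encoded. The function returns both forms mentioned above: a frame count and seconds (frames divided by the frame rate).

```python
from typing import List, Sequence, Tuple


def aggregate_undetected_time(detected: Sequence[Sequence[bool]],
                              fps: float) -> Tuple[List[int], List[float]]:
    """Aggregate the undetected time of each of the N feature points.

    detected[f][j] is True when feature point j was detected in frame f.
    Returns (undetected frame counts, undetected seconds) per feature point;
    seconds = frames / frame rate.
    """
    if not detected:
        return [], []
    n = len(detected[0])
    frames = [0] * n
    for row in detected:
        if len(row) != n:
            raise ValueError("every frame must report all N feature points")
        for j, ok in enumerate(row):
            if not ok:
                frames[j] += 1
    seconds = [count / fps for count in frames]
    return frames, seconds


mask = [[True, False, True],
        [True, False, False],
        [True, True, True],
        [False, False, True]]
frames, seconds = aggregate_undetected_time(mask, fps=30.0)
print(frames)  # [1, 3, 1]
```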
[0098] The data loss information storage unit 3004A stores information (data loss information) regarding data loss at feature points detected by the data loss detection unit 3004. The feature point data loss information includes information representing the detection status of undetected feature points in each frame of the 2D feature point data, and aggregated information on the undetected time for each of the N feature points.
[0099] The detection accuracy determination unit 3005 makes a determination regarding the detection accuracy of feature points in the two-dimensional feature point data based on the data loss information stored in the data loss information storage unit 3004A.
[0100] The determination of detection accuracy includes determining whether the detection accuracy is high or low (i.e., good or bad). Determining whether the detection accuracy is good or bad corresponds to determining whether the detection accuracy is at or above a predetermined standard (or strictly above it). Furthermore, the determination of detection accuracy may include determining the detection accuracy level in three or more stages. Determining the detection accuracy level corresponds to determining how high the detection accuracy is.
[0101] Furthermore, the two-stage determination of detection accuracy levels is essentially the same as determining whether the detection accuracy is high or low (good or bad).
[0102] The following explanation will focus on the case where the detection accuracy determination unit 3005 determines whether the detection accuracy is high or low (good or bad).
[0103] The detection accuracy determination unit 3005 makes a determination regarding the detection accuracy of the N feature points based on an index value TI_j (hereinafter referred to as the "undetected time index value") representing the undetected time t_j (j = 1, ..., N) of each of the N feature points, and on information defining the degree to which the undetected time t_j of each of the N feature points influences the determination result. The subscript j is a unique number identifying the type of each of the N feature points.
[0104] For example, the undetected time index TI_j is the undetected time t_j for each of the N feature points, as shown in equation (1) below.
[0105] $$TI_j = t_j \quad (j = 1, \ldots, N) \tag{1}$$
[0106] Furthermore, the undetected time index TI_j may be the value obtained by dividing the undetected time t_j for each of the N feature points by the total time Tttl of the video from which the feature points were detected, as shown in equation (2) below.
[0107] $$TI_j = \frac{t_j}{T_{\mathrm{ttl}}} \quad (j = 1, \ldots, N) \tag{2}$$
[0108] As a result, the detection accuracy determination unit 3005 can make a determination regarding detection accuracy based on an undetected time index value TI_j that expresses the undetected time t_j relative to the total time Tttl of the video. Therefore, for example, when the undetected time t_j and the total time Tttl are both counted in frames, the determination becomes less susceptible to the frame rate of the video from which the feature points were detected. Likewise, the determination becomes less susceptible to individual differences in how long the predetermined action takes to perform. Thus, the detection accuracy determination unit 3005 can make a more appropriate determination regarding detection accuracy.
[0109] Furthermore, the undetected time index value TI_j may be the value obtained by dividing the undetected time t_j for each of the N feature points by the execution time Top of a predetermined operation in the video image from which the feature points were detected, as shown in equation (3) below.
[0110] $$TI_j = \frac{t_j}{T_{\mathrm{op}}} \quad (j = 1, \ldots, N) \tag{3}$$
[0111] The execution time Top of a predetermined operation is automatically obtained, for example, based on event timing information for a first event corresponding to the start timing of the predetermined operation and event timing information for a second event corresponding to the end timing of the predetermined operation, both of which are stored in the event timing information storage unit 3003A. For example, if the predetermined operation is a golf swing, the first event corresponds to the address state and the second event corresponds to the finish state.
[0112] As a result, the detection accuracy determination unit 3005 can make a determination regarding detection accuracy based on an undetected time index value TI_j that expresses the undetected time t_j relative to the execution time Top of the predetermined action in the video. Therefore, for example, the determination becomes less susceptible to undetected time in portions of the 2D feature point data that do not contain the predetermined action. Thus, the detection accuracy determination unit 3005 can make a more appropriate determination regarding detection accuracy.
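Equations (1) to (3) can be compared side by side in a short sketch. The function names `undetected_time_index` and `execution_time` are hypothetical, and Top is assumed to be the difference between the address (start) and finish (end) event frames. Because numerator and denominator must share one unit, a ratio computed from frame counts equals one computed from seconds, which is the reason equation (2) is less sensitive to frame rate.

```python
from typing import List, Optional, Sequence


def undetected_time_index(t: Sequence[float], mode: str = "raw",
                          total_time: Optional[float] = None,
                          op_time: Optional[float] = None) -> List[float]:
    """Undetected time index TI_j for each feature point.

    mode="raw":   TI_j = t_j          (equation (1))
    mode="total": TI_j = t_j / Tttl   (equation (2))
    mode="op":    TI_j = t_j / Top    (equation (3))
    t, total_time and op_time must share one unit (frames or seconds).
    """
    if mode == "raw":
        return list(t)
    if mode == "total":
        denom = total_time
    elif mode == "op":
        denom = op_time
    else:
        raise ValueError(f"unknown mode: {mode}")
    if denom is None or denom <= 0:
        raise ValueError("a positive denominator is required")
    return [tj / denom for tj in t]


def execution_time(address_frame: int, finish_frame: int) -> int:
    """Top in frames, from the address (start) and finish (end) event frames."""
    return finish_frame - address_frame


t = [3, 12, 0]  # undetected frames per feature point
print(undetected_time_index(t, "total", total_time=120))  # [0.025, 0.1, 0.0]
print(undetected_time_index(t, "op", op_time=execution_time(30, 90)))  # [0.05, 0.2, 0.0]
```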
[0113] For example, the information that defines the degree to which the undetected time t_j for each of the N feature points influences the judgment result is a coefficient α_j defined for each of the N feature points.
[0114] In this case, for example, the detection accuracy determination unit 3005 makes a determination regarding detection accuracy based on whether the determination value JV_j (= α_j·TI_j), obtained by multiplying the undetected time index value TI_j of each of the N feature points by the coefficient α_j of that feature point, is greater than or equal to (or greater than) a predetermined threshold JVth. For example, if the determination value JV_j is greater than or equal to (or greater than) the threshold JVth for at least one of the N feature points, the detection accuracy determination unit 3005 determines that the detection accuracy is low (bad). Conversely, if this condition holds for none of the N feature points, it determines that the detection accuracy is high (good). Furthermore, by providing multiple thresholds JVth in stages, the detection accuracy determination unit 3005 can determine the level of detection accuracy in the same manner.
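The per-point check can be sketched as follows; the function names and the threshold values used in the example are hypothetical. Since "at least one JV_j reaches JVth" is equivalent to "the largest JV_j reaches JVth", the staged variant is sketched by counting how many ascending thresholds the worst JV_j reaches.

```python
from typing import Sequence


def _reaches(value: float, threshold: float, inclusive: bool) -> bool:
    # ">=" or ">" comparison, since the text allows either convention.
    return value >= threshold if inclusive else value > threshold


def is_accuracy_low(ti: Sequence[float], alpha: Sequence[float],
                    jv_th: float, inclusive: bool = True) -> bool:
    """Low (bad) if JV_j = alpha_j * TI_j reaches JVth for at least one point."""
    if len(ti) != len(alpha):
        raise ValueError("ti and alpha must both have N entries")
    return any(_reaches(a * x, jv_th, inclusive) for a, x in zip(alpha, ti))


def accuracy_level(ti: Sequence[float], alpha: Sequence[float],
                   staged_th: Sequence[float]) -> int:
    """Staged variant: count how many thresholds the worst JV_j reaches.
    0 is the best level; len(staged_th) is the worst."""
    if len(ti) != len(alpha):
        raise ValueError("ti and alpha must both have N entries")
    worst = max(a * x for a, x in zip(alpha, ti))
    return sum(1 for th in staged_th if worst >= th)


ti = [0.1, 0.3]
print(is_accuracy_low(ti, alpha=[1.0, 0.5], jv_th=0.2))  # False
print(is_accuracy_low(ti, alpha=[1.0, 1.0], jv_th=0.2))  # True
```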
[0115] Furthermore, for example, the detection accuracy determination unit 3005 determines that the detection accuracy is low (poor) if the total JVttl (= Σα_j·TI_j) of the values obtained by multiplying the undetected time index value TI_j of each of the N feature points by the coefficient α_j of that feature point j is greater than or equal to (or greater than) a predetermined threshold JVttl_th. Otherwise, the detection accuracy determination unit 3005 determines that the detection accuracy is high (good). Moreover, by providing multiple thresholds JVttl_th in stages, the detection accuracy determination unit 3005 can determine the level of detection accuracy in the same manner.
[0116] Furthermore, for example, the detection accuracy determination unit 3005 determines that the detection accuracy is low (poor) if the average JVave (= Σα_j·TI_j / N) over the N feature points of the values α_j·TI_j, each obtained by multiplying the undetected time index value TI_j of feature point j by its coefficient α_j, is greater than or equal to (or greater than) a predetermined threshold JVave_th. Otherwise, the detection accuracy determination unit 3005 determines that the detection accuracy is high (good). In addition, by providing multiple thresholds JVave_th in stages, the detection accuracy determination unit 3005 can determine the level of detection accuracy in the same manner.
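The total and average criteria can be sketched as below; the function names and thresholds are hypothetical. Unlike the per-point check, both criteria let several modest gaps add up. The total grows with N, so JVttl_th has to be chosen for a given number of feature points, whereas the average normalizes by N.

```python
from typing import Sequence


def weighted_total(ti: Sequence[float], alpha: Sequence[float]) -> float:
    """JVttl = sum over j of alpha_j * TI_j."""
    if len(ti) != len(alpha):
        raise ValueError("ti and alpha must both have N entries")
    return sum(a * x for a, x in zip(alpha, ti))


def weighted_average(ti: Sequence[float], alpha: Sequence[float]) -> float:
    """JVave = JVttl / N."""
    return weighted_total(ti, alpha) / len(ti)


def is_low(value: float, threshold: float, inclusive: bool = True) -> bool:
    """Low (poor) if the value reaches the threshold (">=" or ">")."""
    return value >= threshold if inclusive else value > threshold


ti, alpha = [0.1, 0.2, 0.0], [1.0, 0.5, 1.0]
jv_ttl = weighted_total(ti, alpha)    # 0.1 + 0.1 + 0.0
jv_ave = weighted_average(ti, alpha)  # jv_ttl / 3
print(is_low(jv_ttl, 0.15), is_low(jv_ave, 0.1))  # True False
```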
[0117] This allows the coefficient α_j to be independently adjusted among the N feature points, thereby varying the degree to which the undetected time t_j for each of the N feature points influences the judgment result. Specifically, the larger the coefficient α_j, the greater the influence of the undetected time t_j, and as a result, an increase in the undetected time t_j makes it easier for the detection accuracy to be lower (worse). On the other hand, the smaller the coefficient α_j, the smaller the influence of the undetected time t_j, and as a result, an increase in the undetected time t_j makes it less likely for the detection accuracy to be lower (worse).
[0118] Alternatively, the information that defines the degree to which the undetected time t_j of each of the N feature points influences the determination result may be a threshold TIth_j defined for the undetected time index value TI_j of each of the N feature points.
[0119] In this case, for example, the detection accuracy determination unit 3005 makes a determination regarding detection accuracy based on whether the undetected time index value TI_j of each of the N feature points is greater than or equal to (or greater than) the corresponding threshold TIth_j. For example, if the undetected time index value TI_j is greater than or equal to (or greater than) the corresponding threshold TIth_j for at least one of the N feature points, the detection accuracy determination unit 3005 determines that the detection accuracy is low (bad). Conversely, if this condition holds for none of the N feature points, it determines that the detection accuracy is high (good). Furthermore, by providing multiple thresholds TIth_j in stages, the detection accuracy determination unit 3005 can determine the level of detection accuracy in the same manner.
[0120] This allows the threshold TIth_j to be independently adjusted among the N feature points, thereby varying the degree to which the undetected time t_j for each of the N feature points influences the judgment result. Specifically, the smaller the threshold TIth_j, the greater the influence of the undetected time t_j, and as a result, an increase in the undetected time t_j makes it easier for the detection accuracy to be lower (worse). On the other hand, the larger the threshold TIth_j, the smaller the influence of the undetected time t_j, and as a result, an increase in the undetected time t_j makes it less likely for the detection accuracy to be lower (worse).
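The per-point threshold form can be sketched as follows (function names hypothetical). For positive coefficients, α_j·TI_j ≥ JVth holds exactly when TI_j ≥ JVth / α_j, so a coefficient table can be converted into equivalent per-point thresholds; a smaller α_j maps to a larger TIth_j, consistent with the two preceding paragraphs.

```python
from typing import List, Sequence


def is_accuracy_low_thresholds(ti: Sequence[float], ti_th: Sequence[float],
                               inclusive: bool = True) -> bool:
    """Low (bad) if TI_j reaches its own threshold TIth_j for at least one point."""
    if len(ti) != len(ti_th):
        raise ValueError("ti and ti_th must both have N entries")
    if inclusive:
        return any(x >= th for x, th in zip(ti, ti_th))
    return any(x > th for x, th in zip(ti, ti_th))


def thresholds_from_coefficients(alpha: Sequence[float], jv_th: float) -> List[float]:
    """Per-point thresholds equivalent to the coefficient form (alpha_j > 0):
    alpha_j * TI_j >= JVth  <=>  TI_j >= JVth / alpha_j."""
    if any(a <= 0 for a in alpha):
        raise ValueError("coefficients must be positive")
    return [jv_th / a for a in alpha]


print(is_accuracy_low_thresholds([0.2, 0.1], [0.3, 0.1]))                   # True
print(is_accuracy_low_thresholds([0.2, 0.1], [0.3, 0.1], inclusive=False))  # False
```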
[0121] In the example shown in Figure 8, a coefficient α_j is set for each of seven feature points (N = 7). The coefficients α_j for the head, shoulder, elbow, and wrist feature points are set to a relatively small value (= 0.5), while those for the hip, knee, and ankle are set to a relatively large value (= 1). This weakens the influence of the undetected time index values TI_j of the head, shoulder, elbow, and wrist on the determination result compared with those of the hip, knee, and ankle. For example, when the predetermined action is a golf swing, the nature of the swing motion can place the shoulder, elbow, and wrist where they are hidden by other body parts, and this setting prevents the detection accuracy from being judged low for that reason. Similarly, in a golf swing the head may be hidden by the arms and the like, and this setting likewise prevents the detection accuracy from being judged low for that reason. Therefore, by preventing the undetected time index values TI_j of feature points on body parts that are inevitably prone to going undetected from causing a low-accuracy judgment, the detection accuracy determination unit 3005 can determine the detection accuracy more appropriately.
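The effect of the Figure 8 coefficients can be seen in a small worked example. The coefficient values follow the text; the threshold `JV_TH = 0.2` and the helper `flagged_points` are hypothetical. The same 30% undetected gap trips the check at the knee but not at the wrist.

```python
from typing import Dict, List

# Coefficients alpha_j from the golf-swing example of Figure 8 (N = 7).
SWING_ALPHA: Dict[str, float] = {
    "head": 0.5, "shoulder": 0.5, "elbow": 0.5, "wrist": 0.5,
    "hip": 1.0, "knee": 1.0, "ankle": 1.0,
}


def flagged_points(ti: Dict[str, float], alpha: Dict[str, float],
                   jv_th: float) -> List[str]:
    """Names of feature points whose JV_j = alpha_j * TI_j reaches JVth."""
    return [name for name, a in alpha.items() if a * ti.get(name, 0.0) >= jv_th]


JV_TH = 0.2  # hypothetical threshold, not taken from the disclosure
ti = {name: 0.0 for name in SWING_ALPHA}
ti["wrist"] = 0.3   # wrist hidden behind the body for 30% of the video
print(flagged_points(ti, SWING_ALPHA, JV_TH))  # []
ti["knee"] = 0.3    # the same gap at the knee
print(flagged_points(ti, SWING_ALPHA, JV_TH))  # ['knee']
```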
[0122] Furthermore, the coefficient α_j or the threshold TIth_j may be varied depending on various conditions.
[0123] In the example shown in Figure 9, a coefficient α_j is set for each of 11 feature points (N = 11). In this example, where the predetermined action is a golf swing, the coefficient α_j is varied according to the user's dominant hand and the direction from which the video is captured. Depending on the dominant hand (right or left), the feature points of body parts that are prone to going undetected during the swing in the video change. Accordingly, the coefficients α_j for the feature points that are prone to going undetected for the given dominant hand can be made relatively small. Similarly, the feature points that are prone to going undetected change depending on the direction from which the video is captured, specifically whether the subject is filmed from the front or from behind. Accordingly, the coefficients α_j for the feature points that are prone to going undetected for the given capture direction can be made relatively small. Therefore, by preventing the undetected time index values TI_j of feature points on body parts that are inevitably prone to going undetected from causing a low-accuracy judgment, the detection accuracy determination unit 3005 can determine the detection accuracy more appropriately.
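One way to select coefficients by dominant hand and capture direction is sketched below. Both the table and the mechanism are assumptions: the values in `RIGHT_HANDED_ALPHA` are placeholders (not the values of Figure 9, and only a few of the 11 points are listed), and deriving the left-handed table by swapping left/right labels assumes a left-handed swing mirrors a right-handed one.

```python
from typing import Dict

# HYPOTHETICAL coefficients for a right-handed golfer, per camera direction.
# Placeholders for illustration only; NOT the values of Figure 9.
RIGHT_HANDED_ALPHA: Dict[str, Dict[str, float]] = {
    "front": {"head": 1.0, "left_elbow": 0.5, "right_elbow": 1.0, "left_knee": 1.0},
    "back": {"head": 0.5, "left_elbow": 1.0, "right_elbow": 0.5, "left_knee": 1.0},
}


def mirror(alpha: Dict[str, float]) -> Dict[str, float]:
    """Swap 'left_' and 'right_' prefixes (assumes the swing is mirror-symmetric)."""
    out: Dict[str, float] = {}
    for name, a in alpha.items():
        if name.startswith("left_"):
            out["right_" + name[len("left_"):]] = a
        elif name.startswith("right_"):
            out["left_" + name[len("right_"):]] = a
        else:
            out[name] = a
    return out


def coefficients_for(hand: str, view: str) -> Dict[str, float]:
    """Select alpha_j from the dominant hand and the camera direction."""
    base = RIGHT_HANDED_ALPHA[view]
    if hand == "right":
        return dict(base)
    if hand == "left":
        return mirror(base)
    raise ValueError(f"unknown dominant hand: {hand}")


print(coefficients_for("left", "front"))
```

Keeping one table per direction and mirroring for handedness halves the number of tables to maintain, at the cost of relying on the symmetry assumption.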
[0124] In the example shown in Figure 10, a coefficient α_j is set for each of seven feature points (N = 7), as in Figure 8. In this example, the N feature points can be detected from videos of different types of predetermined action, and the coefficient α_j is varied according to the type of predetermined action. Specifically, the coefficient α_j differs between the case where the predetermined action is a golf swing and the case where it is a golf putting motion. As in Figure 8, for a golf swing the coefficients α_j for the head, shoulder, elbow, and wrist feature points are set to a relatively small value (= 0.5), while those for the hip, knee, and ankle are set to a relatively large value (= 1). For a golf putting motion, on the other hand, the coefficients α_j of all feature points are set to a relatively large value (= 1). This is because, compared with a swing, a putting motion involves very little movement of the feature points and very little change in posture, so no feature point is particularly prone to going undetected.
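Selecting the coefficient table by motion type reduces to a lookup keyed by the type of action. The values below follow the Figure 10 example as described in the text; the names `ALPHA_BY_MOTION` and `alpha_for_motion` are hypothetical.

```python
from typing import Dict

BODY_POINTS = ("head", "shoulder", "elbow", "wrist", "hip", "knee", "ankle")

# Coefficients alpha_j per motion type, following the example of Figure 10.
ALPHA_BY_MOTION: Dict[str, Dict[str, float]] = {
    "swing": {"head": 0.5, "shoulder": 0.5, "elbow": 0.5, "wrist": 0.5,
              "hip": 1.0, "knee": 1.0, "ankle": 1.0},
    "putt": {name: 1.0 for name in BODY_POINTS},
}


def alpha_for_motion(motion: str) -> Dict[str, float]:
    """Return a copy of the coefficient table for the given motion type."""
    try:
        return dict(ALPHA_BY_MOTION[motion])
    except KeyError:
        raise ValueError(f"unknown motion type: {motion}") from None


# The same 30% wrist gap weighs twice as much for a putt as for a swing.
wrist_ti = 0.3
print(alpha_for_motion("swing")["wrist"] * wrist_ti,
      alpha_for_motion("putt")["wrist"] * wrist_ti)  # 0.15 0.3
```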
[0125] In the example shown in Figure 11, a coefficient α_j is set for each of seven feature points (N = 7), as in Figures 8 and 10. In this example, the coefficient α_j is varied according to multiple motion phases defined within the predetermined action. Specifically, the predetermined action is a golf swing, and the coefficient α_j differs among the motion phase from the address state to the top state, the motion phase from the top state to the follow-through state, and the motion phase from the follow-through state to the finish state. This is because the posture of the body parts differs greatly from phase to phase, so the feature points that are likely to be hidden by other body parts, and thus go undetected, change with each motion phase.
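The disclosure states that α_j varies by motion phase but does not say how the per-phase contributions are combined. The sketch below assumes one plausible way: each undetected frame is weighted by the coefficient of the phase it falls in, with phase boundaries taken from event frames. The function name, the boolean mask, and the coefficient values are hypothetical.

```python
from typing import List, Sequence


def phase_weighted_undetected(detected: Sequence[Sequence[bool]],
                              boundaries: Sequence[int],
                              phase_alpha: Sequence[Sequence[float]]) -> List[float]:
    """Per-point undetected frames, each weighted by the coefficient of the
    motion phase containing that frame.

    detected[f][j]     True when feature point j was detected in frame f.
    boundaries         ascending event frames, e.g. [address, top, follow-through, finish];
                       phase k covers frames boundaries[k] <= f < boundaries[k + 1].
    phase_alpha[k][j]  coefficient of point j during phase k.
    """
    if len(phase_alpha) != len(boundaries) - 1:
        raise ValueError("need one coefficient row per phase")
    n = len(detected[0])
    weighted = [0.0] * n
    for k in range(len(boundaries) - 1):
        for f in range(boundaries[k], boundaries[k + 1]):
            for j, ok in enumerate(detected[f]):
                if not ok:
                    weighted[j] += phase_alpha[k][j]
    return weighted


T, F = True, False
mask = [[T, F], [T, T], [F, T], [F, T], [T, T], [T, F]]
alphas = [[1.0, 1.0], [0.5, 1.0], [1.0, 0.5]]  # hypothetical, three phases
print(phase_weighted_undetected(mask, [0, 2, 4, 6], alphas))  # [1.0, 1.5]
```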
[0126] Furthermore, the detection accuracy determination unit 3005 may vary the determination conditions for the detection accuracy of feature points depending on the shooting conditions of the video data from which the feature points are detected. The shooting conditions of the video data may include, for example, frame rate, darkness, and resolution. Specifically, the detection accuracy determination unit 3005 may vary parameters related to the determination conditions, such as the coefficient α_j and the thresholds JVth, JVttl_th, JVave_th, TIth_j, depending on the shooting conditions of the video data from which the feature points are detected.
[0127] Returning to Figure 6, the motion analysis unit 3006 analyzes the user's predetermined action when the detection accuracy determination unit 3005 determines that the detection accuracy of the feature points is sufficiently high (for example, when a high/low determination is made, when the detection accuracy is determined to be high). For example, the motion analysis unit 3006 uses a known method to analyze one or more specific motion phases from the start to the end of the predetermined action, based on the two-dimensional feature point data and the event timing information.
[0128] The transmission unit 3007 transmits the reply data, which includes the analysis results of a predetermined user action output from the motion analysis unit 3006, to the user terminal 200 via the communication interface 306. This allows the user terminal 200 to present the user with the analysis results of the predetermined action, or to present support information to improve the user's predetermined action based on the analysis results.
[0129] Furthermore, when the detection accuracy determination unit 3005 determines that the detection accuracy of the feature points is low to a certain degree (for example, when a high/low determination is made, when the detection accuracy is determined to be low), the transmission unit 3007 sends the user terminal 200, via the communication interface 306, a notification that motion analysis by the motion analysis unit 3006 is not possible (hereinafter referred to as an "analysis impossible notification"). This allows the user terminal 200 to inform the user, for example via the display device 208, that motion analysis based on the video data uploaded to the information processing device 300 by the user is not possible.
[0130] [First example of operation of the information processing system] Referring to Figure 12, a first example of the operation of the information processing system 1 according to this embodiment will be described.
[0131] Figure 12 is a sequence diagram schematically showing a first example of the operation of the information processing system 1.
[0132] As shown in Figure 12, the user terminal 200 launches an application in response to a predetermined input from the user using the input device 207 (step S102).
[0133] After the completion of the process in step S102, the video data acquisition unit 2002 acquires video data from the camera 100 that represents a predetermined action of the user in response to a predetermined input from the user on a predetermined application screen using the input device 207 (step S104).
[0134] After the completion of the process in step S104, the video data acquisition unit 2002 trims unnecessary portions from the video data in response to a predetermined input from the user on a predetermined application screen using the input device 207 (step S106).
[0135] This allows the user to operate the user terminal 200 to trim out parts of the video data that are unrelated to the predetermined action, and to format the data so that it contains only the footage of the predetermined action in progress.
[0136] After the completion of the process in step S106, the video data transmission unit 2003 transmits the trimmed video data to the information processing device 300 via the communication interface 206 (step S108).
[0137] The video data acquisition unit 3001 acquires video data transmitted (uploaded) from the user terminal 200 in the process of step S108 (step S110).
[0138] After the completion of the process in step S110, the feature point detection unit 3002 obtains two-dimensional feature point data by extracting N feature points that represent a predetermined user action in each frame of the video data (step S112).
[0139] After the completion of the process in step S112, the event timing acquisition unit 3003 acquires the timing of events for the 2D feature point data (step S114).
[0140] After the completion of the process in step S114, the data loss detection unit 3004 detects undetected feature points out of N feature points for every frame of the 2D feature point data and aggregates the undetected time t_j for each of the N feature points (step S116).
[0141] After the completion of the process in step S116, the detection accuracy determination unit 3005 determines the accuracy of feature point detection using the undetected time index value TI_j, which is obtained by dividing the undetected time t_j by the total time Tttl of the video (step S118).
[0142] If it is determined in step S118 that the accuracy of feature point detection is high, the motion analysis unit 3006 performs an analysis of a predetermined action by the user based on the 2D feature point data and event timing information (step S120).
[0143] After the completion of the process in step S120, the transmission unit 3007 sends the reply data, including the analysis results, to the user terminal 200 via the communication interface 306 (step S122).
[0144] The reply data acquisition unit 2004 acquires the reply data, including the analysis results, that was sent from the information processing device 300 in step S122 (step S124).
[0145] After the completion of the process in step S124, the application screen display processing unit 2001 displays a predetermined application screen containing the analysis results included in the reply data on the display device 208 (step S126).
[0146] On the other hand, if it is determined in step S118 that the accuracy of feature point detection is low, the transmission unit 3007 sends reply data including a notification of inability to analyze to the user terminal 200 via the communication interface 306 (step S128).
[0147] The reply data acquisition unit 2004 acquires the reply data, including the notification of inability to analyze, that was sent from the information processing device 300 in step S128 (step S130).
[0148] After the completion of the process in step S130, the application screen display processing unit 2001 displays on the display device 208 a predetermined application screen containing the notification of inability to analyze included in the reply data (step S132).
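The server-side part of this first example (steps S116 to S128) can be condensed into one illustrative function. Everything here is a sketch under assumptions: the boolean detection mask, the name `handle_upload`, the `analyze` callback standing in for the motion analysis unit 3006, and the reply dictionary format are all hypothetical. Undetected time and total time are both counted in frames, and the per-point weighted check of [0114] is used.

```python
from typing import Any, Callable, Dict, Sequence


def handle_upload(detected: Sequence[Sequence[bool]], alpha: Sequence[float],
                  jv_th: float, analyze: Callable[[], Any]) -> Dict[str, Any]:
    """Server-side steps S116 to S128 in compressed form (illustrative only)."""
    total_frames = len(detected)
    if total_frames == 0:
        raise ValueError("no frames")
    n = len(alpha)
    # S116: undetected frames t_j for each of the N feature points.
    t = [sum(1 for row in detected if not row[j]) for j in range(n)]
    # S118: TI_j = t_j / Tttl (equation (2)), then the per-point weighted check.
    ti = [tj / total_frames for tj in t]
    if any(a * x >= jv_th for a, x in zip(alpha, ti)):
        # S128: reply with an "analysis impossible" notification.
        return {"status": "analysis_impossible"}
    # S120 to S122: analyze and reply with the results.
    return {"status": "ok", "result": analyze()}


mask = [[True, True], [True, False], [True, False], [True, True]]
print(handle_upload(mask, [1.0, 1.0], 0.3, lambda: "report"))  # {'status': 'analysis_impossible'}
print(handle_upload(mask, [1.0, 0.5], 0.3, lambda: "report"))  # {'status': 'ok', 'result': 'report'}
```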
[0149] [Second example of operation of the information processing system] Referring to Figure 13, a second example of the operation of the information processing system 1 according to this embodiment will be described.
[0150] Figure 13 is a sequence diagram showing a second example of the operation of the information processing system 1.
[0151] As shown in Figure 13, steps S202 and S204 are the same as steps S102 and S104 in Figure 12, so their explanation is omitted.
[0152] After the completion of the process in step S204, the video data acquisition unit 2002 receives additional information, including the dominant hand of the user shown in the video and the direction in which the video was taken, in response to predetermined input from the user on a predetermined application screen using the input device 207 (step S206).
[0153] This allows the user to input additional information using the input device 207 when capturing video data on the user terminal 200.
[0154] After the completion of the process in step S206, the video data transmission unit 2003 transmits the video data, including additional information, to the information processing device 300 via the communication interface 206 (step S208).
[0155] The video data acquisition unit 3001 acquires video data including additional information that was transmitted (uploaded) from the user terminal 200 in step S208 (step S210).
[0156] Steps S212 and S214 are the same as steps S112 and S114 in Figure 12, so their explanation is omitted.
[0157] After step S214 is completed, the data loss detection unit 3004 identifies the period during which a predetermined operation is being performed based on the event timing information in the event timing information storage unit 3003A (step S216).
[0158] After step S216 is completed, the data loss detection unit 3004 detects undetected feature points among the N feature points for every frame of the 2D feature point data for the period during which a predetermined operation is being performed, and aggregates the undetected time t_j for each of the N feature points (step S218).
[0159] After step S218 is completed, the detection accuracy determination unit 3005 determines the accuracy of feature point detection using the undetected time index value TI_j, which is obtained by dividing the undetected time t_j by the execution time Top of a predetermined operation in the video (step S220).
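Steps S216 to S220 restrict the aggregation to the period in which the predetermined action is performed. A minimal sketch, assuming a boolean detection mask, hypothetical address and finish event frames, and a half-open window so that Top equals finish minus address in frames, shows how gaps outside the action stop counting.

```python
from typing import List, Sequence


def windowed_index(detected: Sequence[Sequence[bool]], address_frame: int,
                   finish_frame: int) -> List[float]:
    """TI_j = t_j / Top, with t_j counted only in frames address <= f < finish."""
    top = finish_frame - address_frame
    if top <= 0:
        raise ValueError("finish must come after address")
    window = detected[address_frame:finish_frame]
    n = len(detected[0])
    return [sum(1 for row in window if not row[j]) / top for j in range(n)]


# One feature point over 10 frames, missing at frames 0, 1 (before address) and 5.
mask = [[f not in (0, 1, 5)] for f in range(10)]
print(windowed_index(mask, address_frame=2, finish_frame=8))  # [1/6], versus 3/10 over the whole video
```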
[0160] In this case, the detection accuracy determination unit 3005 can refer to the above-mentioned additional information and vary the coefficient α_j according to the dominant hand of the user shown in the video and the direction in which the video is shot.
[0161] Steps S222 and beyond are the same as the processes from step S120 onwards in Figure 12, so the explanation is omitted.
[0162] [Third example of operation of the information processing system] Referring to Figure 14, a third example of the operation of the information processing system 1 according to this embodiment will be described.
[0163] As shown in Figure 14, steps S302, S304, and S306 are the same as steps S102, S104, and S106 in Figure 12, so their explanation is omitted.
[0164] After the completion of the process in step S306, the video data acquisition unit 2002 receives additional information, including the type of predetermined action shown in the video (for example, whether it is a golf swing or putting action in Figure 10), in response to predetermined input from the user on a predetermined application screen using the input device 207 (step S308).
[0165] Steps S310 and S312 are the same as the processes in steps S208 and S210 in Figure 13, so their explanation is omitted.
[0166] Steps S314, S316, and S318 are the same as steps S112, S114, and S116 in Figure 12, so their explanation is omitted.
[0167] After the processing in step S318, the detection accuracy determination unit 3005 determines the accuracy of feature point detection using the undetected time index value TI_j, which is obtained by dividing the undetected time t_j by the total time Tttl of the video (step S320).
[0168] In this case, the detection accuracy determination unit 3005 can refer to the above-mentioned additional information and vary the coefficient α_j according to the type of predetermined action of the user shown in the video.
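Selecting α_j by the type of predetermined action and combining it with the index TI_j = t_j / Tttl might look like the sketch below. The action keys, feature point names and coefficient values are hypothetical placeholders; the source only states that α_j may vary with the type of action.

```python
# Hypothetical alpha_j per type of predetermined action; names and values are
# placeholders, not values from the source.
ALPHA_BY_ACTION: dict[str, dict[str, float]] = {
    "golf_swing": {"head": 0.5, "left_wrist": 1.0, "right_wrist": 1.0, "left_knee": 0.8},
    "putting": {"head": 1.0, "left_wrist": 1.0, "right_wrist": 1.0, "left_knee": 0.2},
}


def weighted_indices(
    undetected_times: dict[str, float], total_time: float, action_type: str
) -> dict[str, float]:
    """Return alpha_j * TI_j, with TI_j = t_j / Tttl and alpha_j chosen by action type."""
    if total_time <= 0:
        raise ValueError("total_time must be positive")
    alpha = ALPHA_BY_ACTION[action_type]
    return {name: alpha[name] * (t / total_time) for name, t in undetected_times.items()}
```

For a putting video of 4.0 seconds in which the head is undetected for 1.0 second and the left knee for 2.0 seconds, the weighted values are 0.25 and 0.1; the small placeholder α for the knee reflects the idea that some points matter less for some actions.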
[0169] Steps S322 and beyond are the same as the processes from step S120 onwards in Figure 12, so the explanation is omitted.
[0170] [Other embodiments] Other embodiments will be described.
[0171] The embodiments described above may be modified or altered as appropriate. Hereinafter, such modifications or alterations of the embodiments described above will be referred to as "modifications" for convenience.
[0172] For example, in the above-described embodiment, the functions of the user terminal 200 and the information processing device 300 may be implemented by a single information processing device, or they may be implemented in a distributed manner by three or more information processing devices. For example, the functions of the information processing device 300 may be transferred to the user terminal 200.
[0173] Furthermore, in the embodiments and their modifications described above, the event timing acquisition unit 3003 may estimate the timing of a predetermined event based on measurement data acquired, while the video data was being captured, from a motion sensor attached to the user's body. For example, the event timing acquisition unit 3003 estimates the timing of the predetermined event based on whether a physical quantity (e.g., velocity or acceleration) representing the motion state of a specific part of the user's body, obtained from the output of the motion sensor, satisfies a condition corresponding to the predetermined event.
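A minimal sketch of such a condition check follows. It assumes, as an illustration only, that the condition is "the acceleration magnitude first reaches a threshold", that the sensor output is a list of (ax, ay, az) samples with matching timestamps, and that the threshold value is supplied by the caller; none of these specifics come from the source.

```python
import math
from typing import Optional, Sequence, Tuple


def estimate_event_time(
    timestamps: Sequence[float],
    accel: Sequence[Tuple[float, float, float]],
    threshold: float,
) -> Optional[float]:
    """Return the first timestamp at which |a| >= threshold, or None if never reached.

    The threshold condition is a hypothetical stand-in for "the condition
    corresponding to the predetermined event".
    """
    for t, (ax, ay, az) in zip(timestamps, accel):
        magnitude = math.sqrt(ax * ax + ay * ay + az * az)
        if magnitude >= threshold:
            return t
    return None
```

Other events could use different physical quantities or conditions (for example, a velocity sign change), which would replace the magnitude test inside the loop.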
[0174] [Effects] The operation and effects of the information processing method, program, and information processing device according to this embodiment will now be described.
[0175] In a first aspect of this embodiment, an information processing method is provided that includes an acquisition step, a detection step, an aggregation step, and a determination step. The acquisition step corresponds, for example, to steps S110, S210, and S312 described above; the detection step to steps S112, S212, and S314; the aggregation step to steps S116, S218, and S318; and the determination step to steps S118, S220, and S320. Specifically, in the acquisition step, the information processing device acquires a video showing a predetermined action of a subject (for example, the videos 20 and 30 described above). In the detection step, the information processing device detects, in each of the plurality of frames included in the video, a predetermined number (two or more; for example, N as described above) of feature points representing the predetermined action. In the aggregation step, the information processing device aggregates, for each of the predetermined number of feature points, the undetected time in the time series of the plurality of frames (for example, the undetected time t_j described above). In the determination step, the information processing device makes a determination regarding the detection accuracy of the predetermined number of feature points based on an index value representing the undetected time of each feature point (for example, the undetected time index value TI_j described above) and information defining the degree to which the undetected time of each feature point influences the determination result. The information processing device is, for example, the user terminal 200 or the information processing device 300 described above.
The information defining the degree of influence is, for example, the coefficient α_j or the threshold TIth_j described above.
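When the per-point threshold TIth_j is used as the information defining the degree of influence, the determination could be sketched as below. The decision rule (accuracy is judged sufficient only if no TI_j exceeds its TIth_j) and the function names are assumptions for illustration; the source does not spell out how the thresholds are combined.

```python
def points_exceeding_threshold(ti: dict[str, float], ti_th: dict[str, float]) -> list[str]:
    """Names of feature points whose index value TI_j exceeds its threshold TIth_j."""
    return [name for name, value in ti.items() if value > ti_th[name]]


def detection_accuracy_sufficient(ti: dict[str, float], ti_th: dict[str, float]) -> bool:
    """Assumed rule: detection accuracy is sufficient only if no TI_j exceeds TIth_j."""
    return not points_exceeding_threshold(ti, ti_th)
```

A point that is often occluded can be given a looser threshold, so its gaps weigh less in the result; returning the offending names also gives the notification step something concrete to report.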
[0176] Furthermore, in the first aspect of this embodiment, a program may be provided that causes the information processing device to perform the acquisition step, the detection step, the aggregation step, and the determination step.
[0177] Furthermore, in the first aspect of this embodiment, an information processing device may be provided that includes an acquisition unit, a detection unit, an aggregation unit, and a determination unit. The information processing device is, for example, the information processing device 300 described above, or alternatively the user terminal 200 described above. The acquisition unit is, for example, the video data acquisition unit 3001; the detection unit is, for example, the feature point detection unit 3002; the aggregation unit is, for example, the data loss detection unit 3004; and the determination unit is, for example, the detection accuracy determination unit 3005, all described above. Specifically, the acquisition unit acquires a video showing a predetermined action of a subject. The detection unit detects, in each of the plurality of frames included in the video, a predetermined number (two or more) of feature points representing the predetermined action. The aggregation unit aggregates, for each of the predetermined number of feature points, the undetected time in the time series of the plurality of frames. The determination unit then makes a determination regarding the detection accuracy of the predetermined number of feature points based on an index value representing the undetected time of each feature point and information defining the degree to which the undetected time of each feature point influences the determination result.
[0178] This allows the information processing device to determine the accuracy of feature point detection when detecting feature points representing a predetermined action from a moving image. Furthermore, by reflecting how likely or unlikely each feature point is to go undetected in the information defining the degree to which the undetected time of each feature point influences the determination result, the information processing device can determine the accuracy of feature point detection more appropriately.
[0179] Furthermore, in a second aspect of this embodiment, based on the first aspect described above, the information defining the degree of influence may be a coefficient set for each of the predetermined number of feature points (for example, the coefficient α_j described above). In the determination step, the index value of each feature point may then be multiplied by the coefficient corresponding to that feature point before the determination regarding the detection accuracy is made.
[0180] This allows the information processing device to determine the detection accuracy of feature points more appropriately by reflecting in the coefficient how likely or unlikely each feature point is to go undetected.
[0181] Furthermore, in a third aspect of this embodiment, based on the second aspect described above, the determination in the determination step may be made based on the values obtained by multiplying the index value of each feature point by the coefficient corresponding to that feature point.
[0182] This allows the information processing device to make more accurate judgments regarding the detection accuracy of feature points.
[0183] Furthermore, in a fourth aspect of this embodiment, based on the second aspect described above, the determination in the determination step may be made based on the sum or the average of the values obtained by multiplying the index value of each feature point by the coefficient corresponding to that feature point. The sum is, for example, the sum JVttl described above, and the average is, for example, the average value JVave described above.
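The sum and average variants can be sketched as follows: each index value is multiplied by its coefficient to give a weighted value JV_j, and JVttl (the sum) or JVave (the average) is compared with a threshold. The threshold name `threshold` and the pass condition "score ≤ threshold" are assumptions for illustration; the source does not state the exact comparison.

```python
from typing import Sequence


def judgment_values(ti: Sequence[float], alpha: Sequence[float]) -> list[float]:
    """Return JV_j = alpha_j * TI_j for every feature point j."""
    if len(ti) != len(alpha):
        raise ValueError("ti and alpha must have the same length")
    return [a * v for a, v in zip(alpha, ti)]


def accuracy_ok(
    ti: Sequence[float], alpha: Sequence[float], threshold: float, use_average: bool = False
) -> bool:
    """Compare JVttl (sum) or JVave (average) with a hypothetical threshold."""
    jv = judgment_values(ti, alpha)
    jv_ttl = sum(jv)
    score = jv_ttl / len(jv) if use_average else jv_ttl
    # Assumed rule: a lower weighted undetected time means better accuracy.
    return score <= threshold
```

With TI = [0.25, 0.75] and α = [1.0, 0.5], JVttl is 0.625 and JVave is 0.3125, so a threshold of 0.6 fails on the sum but passes on the average. The average keeps the threshold independent of N, whereas the sum grows with the number of feature points.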
[0184] This allows the information processing device to make more accurate judgments regarding the detection accuracy of feature points.
[0185] Furthermore, in a fifth aspect of this embodiment, based on any one of the first to fourth aspects described above, the predetermined action may be one of multiple types, and the information defining the degree of influence may differ depending on the type of the predetermined action.
[0186] This allows the information processing device to make more accurate judgments regarding the detection accuracy of feature points.
[0187] Furthermore, in a sixth aspect of this embodiment, based on any one of the first to fifth aspects described above, the information defining the degree of influence may differ depending on the dominant hand of the subject.
[0188] This allows the information processing device to make more accurate judgments regarding the detection accuracy of feature points.
[0189] Furthermore, in a seventh aspect of this embodiment, based on any one of the first to sixth aspects described above, the information defining the degree of influence may differ depending on the direction from which the video of the subject is captured.
[0190] This allows the information processing device to make more accurate judgments regarding the detection accuracy of feature points.
[0191] Furthermore, in an eighth aspect of this embodiment, based on any one of the first to seventh aspects described above, the predetermined action may have multiple operation phases, and the information defining the degree of influence may differ for each of the operation phases.
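Phase-dependent coefficients could be applied by aggregating the undetected time separately per phase and weighting each phase with its own α_j. In the sketch below, the phase names, the per-phase aggregation and the use of a single reference time for normalization are all assumptions made for illustration.

```python
def phase_weighted_total(
    undetected_by_phase: dict[str, list[float]],
    alpha_by_phase: dict[str, list[float]],
    reference_time: float,
) -> float:
    """Sum alpha_{p,j} * t_{p,j} / T over all phases p and feature points j.

    undetected_by_phase[p][j] is the undetected time of feature point j within
    phase p; alpha_by_phase[p][j] is that point's (hypothetical) coefficient in p.
    """
    if reference_time <= 0:
        raise ValueError("reference_time must be positive")
    total = 0.0
    for phase, times in undetected_by_phase.items():
        alpha = alpha_by_phase[phase]
        total += sum(a * t / reference_time for a, t in zip(alpha, times))
    return total
```

For example, a placeholder table might weight the wrists more heavily in a hypothetical "downswing" phase than in a "backswing" phase, so that gaps during the faster part of the action count more.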
[0192] This allows the information processing device to make more accurate judgments regarding the detection accuracy of feature points.
[0193] Furthermore, in a ninth aspect of this embodiment, based on any one of the first to eighth aspects described above, the determination conditions in the determination step may differ depending on the conditions under which the video is captured.
[0194] This allows the information processing device to make more accurate judgments regarding the detection accuracy of feature points.
[0195] Furthermore, in a tenth aspect of this embodiment, based on any one of the first to ninth aspects described above, the information processing method may include a notification step (for example, steps S132, S234, and S334 described above) in which the information processing device notifies the user of information regarding the result of the determination step.
[0196] This allows users to check the results regarding the accuracy of feature point detection.
[0197] Although embodiments have been described in detail above, this disclosure is not limited to these specific embodiments, and various modifications and changes are possible within the scope of the gist set forth in the claims.
[Explanation of Symbols]
[0198]
1 Information processing system
20 Video
30 Video
100 Camera
200 User terminal
300 Information processing device
2001 Application screen display processing unit
2002 Moving image data acquisition unit
2003 Moving image data transmission unit
2004 Reply data acquisition unit
3001 Moving image data acquisition unit
3001A Moving image data storage unit
3002 Feature point detection unit
3002A Feature point data storage unit
3003 Event timing acquisition unit
3003A Event timing information storage unit
3004 Data loss detection unit
3004A Data loss information storage unit
3005 Detection accuracy determination unit
3006 Motion analysis section
3007 Transmitter
Claims
1. An information processing method comprising: an acquisition step in which an information processing device acquires a video showing a predetermined action of a subject; a detection step in which the information processing device detects a predetermined number (two or more) of feature points representing the predetermined action in each of a plurality of frames included in the video; an aggregation step in which the information processing device aggregates, for each of the predetermined number of feature points, the undetected time in the time series of the plurality of frames; and a determination step in which the information processing device makes a determination regarding the detection accuracy of the predetermined number of feature points based on an index value representing the undetected time of each feature point and information defining the degree to which the undetected time of each feature point influences the determination result.
2. The information processing method according to claim 1, wherein the information defining the degree of influence is a coefficient set for each of the predetermined number of feature points, and in the determination step, the index value of each feature point is multiplied by the coefficient corresponding to that feature point before the determination regarding the detection accuracy is made.
3. The information processing method according to claim 2, wherein, in the determination step, the determination regarding the detection accuracy is made based on the values obtained by multiplying the index value of each feature point by the coefficient corresponding to that feature point.
4. The information processing method according to claim 2, wherein, in the determination step, the determination regarding the detection accuracy is made based on the sum or the average of the values obtained by multiplying the index value of each feature point by the coefficient corresponding to that feature point.
5. The information processing method according to any one of claims 1 to 4, wherein the predetermined action is one of a plurality of types, and the information defining the degree of influence differs depending on the type of the predetermined action.
6. The information processing method according to any one of claims 1 to 4, wherein the information defining the degree of influence differs depending on the dominant hand of the subject.
7. The information processing method according to any one of claims 1 to 4, wherein the information defining the degree of influence differs depending on the direction from which the video of the subject is captured.
8. The information processing method according to any one of claims 1 to 4, wherein the predetermined action has a plurality of operation phases, and the information defining the degree of influence differs for each of the operation phases.
9. The information processing method according to any one of claims 1 to 4, wherein the determination conditions in the determination step differ depending on the conditions under which the video is captured.
10. The information processing method according to any one of claims 1 to 4, further comprising a notification step in which the information processing device notifies a user of information regarding the determination result of the determination step.
11. A program causing an information processing device to execute: an acquisition step of acquiring a video showing a predetermined action of a subject; a detection step of detecting a predetermined number (two or more) of feature points representing the predetermined action in each of a plurality of frames included in the video; an aggregation step of aggregating, for each of the predetermined number of feature points, the undetected time in the time series of the plurality of frames; and a determination step of making a determination regarding the detection accuracy of the predetermined number of feature points based on an index value representing the undetected time of each feature point and information defining the degree to which the undetected time of each feature point influences the determination result.
12. An information processing device comprising: an acquisition unit that acquires a video showing a predetermined action of a subject; a detection unit that detects a predetermined number (two or more) of feature points representing the predetermined action in each of a plurality of frames included in the video; an aggregation unit that aggregates, for each of the predetermined number of feature points, the undetected time in the time series of the plurality of frames; and a determination unit that makes a determination regarding the detection accuracy of the predetermined number of feature points based on an index value representing the undetected time of each feature point and information defining the degree to which the undetected time of each feature point influences the determination result.