Control device, control method, and program

The imaging control device addresses unintended filming by counting subjects within the competition area and adjusting tracking operations, ensuring accurate subject capture in PTZ camera systems.

JP2026110201A · Pending · Publication Date: 2026-07-02 · CANON KK

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
CANON KK
Filing Date
2024-12-20
Publication Date
2026-07-02

AI Technical Summary

Technical Problem

Existing imaging systems using PTZ cameras may mistakenly identify individuals outside the competition area as subjects for filming, leading to unintended shooting during automated photography in sports events with varying participant numbers.

Method used

An imaging control device that includes an acquisition means for capturing images, a control means for tracking subjects, and a counting means to manage the number of subjects, allowing the system to switch between tracking and stopping based on the counted subjects.

Benefits of technology

The system effectively suppresses unintended shooting by accurately tracking intended subjects while excluding individuals outside the competition area.

✦ Generated by Eureka AI based on patent content.

Abstract

The purpose is to suppress unintended shooting when automatically photographing a subject. [Solution] The present invention is an imaging control device comprising: an acquisition means for acquiring an image captured by an imaging unit; a control means for controlling the imaging unit to track subjects included in an image based on the image acquired by the acquisition means; and a counting means for counting the number of subjects included in the image acquired by the acquisition means. The control means is characterized in that it controls the imaging unit to switch between tracking subjects and stopping subject tracking based on the number of subjects counted by the counting means.

Description

Technical Field

[0001] The present invention relates to an imaging control device, an imaging control method, and a program.

Background Art

[0002] In recent years, a method of automatically performing photography by an edge AI device controlling an imaging device (referred to as a PTZ camera) capable of changing the shooting direction (pan / tilt direction) and the angle of view (zoom value) is becoming popular. As a method of automatically controlling a PTZ camera, a technique is known in which AI (Artificial Intelligence) is used to detect a desired subject in a photographed image and control the PTZ camera to track the subject. Further, by applying the AI technology and determining the shooting direction of the PTZ camera based on the positional relationship of a plurality of detected subjects, it is possible to automatically control the PTZ camera so that not only a single subject but also a plurality of subjects are within the angle of view.

[0003] Patent Document 1 discloses a moving object photographing system in which a control unit uses a movable camera, whose photographing range can be moved vertically and horizontally, to keep a group of moving objects in a predetermined area within the photographing angle of view. By using the technique of Patent Document 1, for example, in a combat sport involving a plurality of players and a referee, such as judo or boxing, it is possible to photograph with the PTZ camera so that the players and the referee are within the photographing angle of view. That is, the control unit can automate the photographing by controlling the photographing direction of the PTZ camera so that the players and the referee are within the photographing angle of view of the PTZ camera. In general, in combat sports such as judo and boxing, the number of players and the number of referees are predetermined. Therefore, it can also be assumed that the control unit detects that a predetermined number of people, determined by the type of competition, are in the competition area and automatically captures the intended image with the PTZ camera by photographing so that the detected group of subjects is within the photographing angle of view.

Prior Art Documents

Patent Documents

[0004] [Patent Document 1] Japanese Patent Publication No. 2019-29886

Summary of the Invention

Problems to be Solved by the Invention

[0005] However, depending on the type of sport, more people than the permitted number may enter the competition area. For example, in boxing, seconds (assistants for the competitors) may enter between rounds, or medical personnel may enter in case of an accident during the match. In such cases, the PTZ camera may mistakenly identify people other than the athletes who have entered the competition area as subjects for filming, potentially resulting in unintended filming.

[0006] The present invention has been made in view of the circumstances described above, and aims to suppress unintended shooting when automatically photographing a subject.

Means for Solving the Problem

[0007] The present invention relates to an imaging control device comprising: an acquisition means for acquiring an image captured by an imaging unit; a control means for controlling the imaging unit to track a subject included in the image based on the image acquired by the acquisition means; and a counting means for counting the number of subjects included in the image acquired by the acquisition means, wherein the control means controls the imaging unit to switch between tracking the subject and stopping the tracking of the subject based on the number of subjects counted by the counting means.

Effects of the Invention

[0008] According to the present invention, unintended shooting can be suppressed when automatically photographing a subject.

Brief Description of the Drawings

[0009]
[Figure 1] This figure shows an example of the configuration of the imaging system according to the first embodiment.
[Figure 2] This figure shows an example of the internal configuration of each device in the first embodiment.
[Figure 3] This is a flowchart of the automatic selection area setup operation in the first embodiment.
[Figure 4] This figure shows an example of a UI screen for various settings related to shooting.
[Figure 5] This is a flowchart of the overhead composition setup operation in the first embodiment.
[Figure 6] This is a flowchart of the tracking operation in the first embodiment.
[Figure 7] This is a diagram illustrating an example of polar coordinate transformation.
[Figure 8] This diagram illustrates the longest possible distance between subjects.
[Figure 9] This figure shows the state transitions according to the first embodiment.
[Figure 10] This is a flowchart of the state-dependent operation according to the first embodiment.
[Figure 11] This figure shows an example of the configuration of the imaging system according to the second embodiment.
[Figure 12] This figure shows an example of the internal configuration of each device in the second embodiment.
[Figure 13] This is a flowchart of the automatic selection area setup operation in the second embodiment.
[Figure 14] This is a flowchart of the overhead composition setup operation in the second embodiment.
[Figure 15] This is a flowchart of the tracking operation in the second embodiment.
[Figure 16] This is a flowchart of the state-dependent operation according to the second embodiment.

Modes for Carrying Out the Invention

[0010] A preferred embodiment of the present invention will be described in detail based on the accompanying drawings. Each of the following embodiments does not limit the present invention, and not all combinations of the features described in this embodiment are essential for the solution of the present invention. The configuration of this embodiment can be appropriately modified or changed according to the specifications of the device to which the present invention is applied and various conditions (usage conditions, usage environment, etc.). Also, in the following embodiments, the same or similar configurations and processing steps are denoted by the same reference numerals, and duplicate explanations are omitted.

[0011] <First Embodiment> [Configuration of Imaging System] The imaging system of this embodiment is, for example, composed of an imaging device (PTZ camera) capable of changing the imaging direction (pan / tilt direction) and the angle of view (zoom value), an edge AI device, and a PC (Personal Computer). In this embodiment, the edge AI device serves as an imaging control device that controls the PTZ camera. In the first embodiment, the edge AI device detects a desired subject from the captured video of the PTZ camera and controls the imaging direction and the angle of view of the PTZ camera to automatically track the subject. In the following, as an example of the subject to be detected, three people, namely two players and one referee who are engaged in a competitive game, will be described. However, the number of subjects to be detected is not limited to three.
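As an illustration of how the counted number of subjects could drive the switching between tracking and stopping, the following is a minimal sketch. The exact switching rule is not given in this part of the description; the sketch assumes that tracking stops whenever more subjects are counted than the expected number (three here: two players and one referee). The names Mode, select_mode, and expected_count are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    TRACKING = "tracking"
    STOPPED = "stopped"


def select_mode(subject_count: int, expected_count: int = 3) -> Mode:
    """Choose the camera mode from the counted number of subjects.

    Assumed rule (not stated in the source): stop tracking when more
    subjects than expected are counted, otherwise keep tracking.
    """
    if subject_count > expected_count:
        return Mode.STOPPED
    return Mode.TRACKING


if __name__ == "__main__":
    for count in (3, 5):
        print(count, select_mode(count).value)
```

With the default of three expected subjects, a count of five (for example, when two seconds enter between rounds) would select the stopped mode under this assumed rule.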

[0012] FIG. 1 is a diagram showing an example of the configuration of the imaging system according to the first embodiment. As shown in FIG. 1, the imaging system includes a PTZ camera 100, an edge AI device 200, and a PC 300, which are connected via a network 400. The network 400 is, for example, a LAN (Local Area Network), but other networks may also be used, and a video cable or the like may be included.

[0013] The PTZ camera 100 corresponds to an example of an imaging unit. The PTZ camera 100 includes a photographing optical system, an image sensor, an image processing unit, etc. The PTZ camera 100 transmits an image (referred to as a captured video) obtained by photographing with the image sensor and passing through image processing by the image processing unit to the edge AI device 200 and the PC 300 via the network 400. Further, the PTZ camera 100 includes a drive unit for driving pan, tilt, and zoom. The drive unit changes the shooting direction (pan / tilt direction) by rotating the PTZ camera in the pan / tilt direction, and changes the angle of view by changing the zoom value of the photographing optical system. Details of the configuration, functions, operations, etc. of the PTZ camera 100 in the present embodiment will be described later.

[0014] The PC 300 transmits information for various settings related to shooting to the edge AI device 200 or displays the captured video received from the PTZ camera 100. Various settings related to shooting include, in addition to general shooting settings in the PTZ camera, settings related to a predetermined target area and settings related to a predetermined composition, which will be described later. The PC 300 generates information on various settings related to shooting based on an input from a user (for example, an operator), and transmits the information on various settings related to shooting to the edge AI device 200. Details of the configuration, functions, operations, etc. of the PC 300 in the present embodiment will be described later.

[0015] The edge AI device 200 performs AI-based inference on the captured video received from the PTZ camera 100 to detect the subject. Based on the subject detected by inference and the various shooting settings received from the PC 300, the edge AI device 200 calculates the shooting direction and angle of view of the PTZ camera 100 needed to track the subject. The edge AI device 200 in this embodiment functions as an imaging control device, generating control signals to control the shooting direction and angle of view of the PTZ camera 100 and transmitting them to the PTZ camera 100 via the network 400. As a result, the PTZ camera 100 performs pan, tilt, and zoom operations based on the control signals received from the edge AI device 200. As will be described in detail later, the edge AI device 200 in this embodiment controls the automatic subject tracking shooting by the PTZ camera 100, and automatically switches the composition and camera work based on various shooting setting information. Details of the configuration, functions, and operation of the edge AI device 200 in this embodiment will be described later.
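One way to picture the calculation of a shooting direction and angle of view from detected subjects is sketched below. This is not the method of the embodiment, which is described later; it assumes that each detected subject is given as a bounding box in pixel coordinates and that the goal is to center the group and leave a margin around it. Box, union_box, framing_offsets, and margin are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Box:
    left: float
    top: float
    right: float
    bottom: float


def union_box(boxes: Sequence[Box]) -> Box:
    """Smallest box that contains every detected subject."""
    if not boxes:
        raise ValueError("at least one subject is required")
    return Box(
        min(b.left for b in boxes),
        min(b.top for b in boxes),
        max(b.right for b in boxes),
        max(b.bottom for b in boxes),
    )


def framing_offsets(
    boxes: Sequence[Box], frame_w: float, frame_h: float, margin: float = 0.1
) -> Tuple[float, float, float]:
    """Return (pan_offset, tilt_offset, zoom_factor).

    pan_offset and tilt_offset are the group center's displacement from
    the frame center, normalized to [-1, 1]. zoom_factor > 1 means the
    group could be magnified while keeping `margin` on each side.
    """
    group = union_box(boxes)
    cx = (group.left + group.right) / 2
    cy = (group.top + group.bottom) / 2
    pan_offset = (cx - frame_w / 2) / (frame_w / 2)
    tilt_offset = (cy - frame_h / 2) / (frame_h / 2)
    fill = max(
        (group.right - group.left) / frame_w,
        (group.bottom - group.top) / frame_h,
    )
    zoom_factor = (1 - 2 * margin) / fill
    return pan_offset, tilt_offset, zoom_factor


if __name__ == "__main__":
    subjects = [Box(100, 100, 300, 500), Box(500, 200, 700, 600)]
    print(framing_offsets(subjects, frame_w=1000, frame_h=800))
```

A real controller would convert these normalized offsets into pan / tilt angles and a zoom value using the camera's current angle of view, which this sketch leaves out.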

[0016] In the shooting system of this embodiment, the PC 300 accesses a web server inside the edge AI device 200 based on user input, and then transmits various shooting-related setting information to the edge AI device 200 based on further user input. The edge AI device 200 then controls the PTZ camera 100 to track the subject and perform actions such as switching to a predetermined composition, as described later. Note that various shooting-related settings can be implemented in various ways, such as accessing the web server inside the edge AI device 200, or by launching an application program on the PC 300, and are not limited to any one of these methods.

[0017] [Internal Configuration of Each Device in the Imaging System] Figure 2 shows an example of the internal configuration of each device included in the imaging system shown in Figure 1: the PTZ camera 100, the edge AI device 200, and the PC 300. First, the internal configuration of the PTZ camera 100 will be described. The PTZ camera 100 includes a CPU 101, RAM 102, ROM 103, video output I / F 104, network I / F 105, image processing unit 106, image sensor 107, drive I / F 108, and drive unit 109, with each component connected by an internal bus 110. The image sensor 107 is connected to the image processing unit 106, and the drive unit 109 is connected to the drive I / F 108.

[0018] The CPU 101 controls the entire PTZ camera 100 and performs various calculations. The CPU 101 executes the program loaded from the ROM 103 into the RAM 102, thereby realizing the operation of the PTZ camera 100 as described later. The ROM 103 is a non-volatile storage device, such as flash memory, an HDD, an SSD (Solid State Drive), or an SD card. The ROM 103 is used as a persistent storage area for the OS (operating system), various programs, and various data, as well as for short-term data storage. The RAM 102 is a storage device such as DRAM (Dynamic Random Access Memory). The OS, various programs, and various data are loaded into the RAM 102 from the ROM 103, and the RAM 102 is also used as a workspace for the OS and various programs.

[0019] The image sensor 107 has an imaging element such as a CCD or CMOS sensor, acquires image data of an optical image formed by an imaging optical system (not shown), and outputs it to the image processing unit 106. The image processing unit 106 converts the image data input from the image sensor 107 into a predetermined format and performs image processing, such as compression as necessary, before transferring it to the RAM 102. The image processing performed by the image processing unit 106 also includes image quality adjustment for the image data input from the image sensor 107 and cropping to extract only a predetermined area of the image.

[0020] The video output I / F 104 is an interface (I / F) for outputting captured video, which has been acquired by the image sensor 107 and processed by the image processing unit 106, to an external source. The video output I / F 104 is composed of, for example, SDI (Serial Digital Interface) or HDMI (High-Definition Multimedia Interface) (registered trademark). In this embodiment, the video output I / F 104 is connected to the video input I / F 208 of the edge AI device 200, which will be described later.

[0021] The network I / F 105 is an interface for connecting to the aforementioned network 400. The network I / F 105 handles communication with external devices such as the edge AI device 200 and the PC 300 via a communication medium such as Ethernet (registered trademark). In this embodiment, remote control of the PTZ camera 100 by the edge AI device 200 is performed via the network I / F 105, but it may also be performed via another interface such as a serial communication interface (not shown).

[0022] The drive I / F 108 is the connection point to the drive unit 109 and is responsible for communication, such as sending control signals to the drive unit 109 and receiving information from the drive unit 109. The drive unit 109 has a mechanical drive system and a motor as a drive source as a rotation mechanism for changing the shooting direction (pan / tilt direction) of the PTZ camera 100. The drive unit 109 also has a lens drive system as a mechanism for changing the angle of view (zoom value) and focusing of the PTZ camera 100's shooting optical system. Based on control signals received from the CPU 101 via the drive I / F 108, the drive unit 109 drives the mechanical drive system and the motor as a drive source to move the PTZ camera 100 in the horizontal direction (pan direction) and the vertical direction (tilt direction). Furthermore, based on control signals received from the CPU 101 via the drive I / F 108, the drive unit 109 operates the lens drive system within the shooting optical system to perform zoom and focusing operations to optically change the angle of view.

[0023] Next, we will explain the internal configuration of the edge AI device 200. The edge AI device 200 includes a CPU 201, RAM 202, ROM 203, network interface 204, video output interface 205, user input interface 206, inference unit 207, and video input interface 208, with each component connected by an internal bus 209.

[0024] The CPU 201 controls the entire edge AI device 200 and performs various calculations. The CPU 201 executes programs loaded from the ROM 203 into the RAM 202, thereby enabling the operation of the edge AI device 200 as described later. The ROM 203 is a non-volatile storage device such as flash memory, an HDD, an SSD, or an SD card. The ROM 203 is used as a persistent storage area for the OS, various programs, and various data, as well as for short-term data storage. The RAM 202 is a rewritable, high-speed memory device such as DRAM, into which the OS, various programs, and various data are loaded from the ROM 203, and which is also used as a workspace for the OS and various programs.

[0025] The network I / F 204 is an interface for connecting to the network 400, and is responsible for communication with external devices such as the PTZ camera 100 and the PC 300 via the network 400. The video output I / F 205 is an interface for outputting setting information of the edge AI device 200, which is displayed within the UI (user interface) screen when setting a predetermined target area or composition on the PC 300, as described later.

[0026] The user input I / F 206 is an interface for connecting to a mouse, keyboard, and other input devices, and is configured using USB (Universal Serial Bus) or the like. The video input I / F 208 is an interface for receiving captured video from the aforementioned PTZ camera 100, and is composed of SDI, HDMI, or the like.

[0027] The inference unit 207 infers the presence or absence of a subject, such as a person, that is a predetermined detection target, from the captured video received via the video input I / F 208, and, if a subject exists, its position. The inference unit 207 is composed of a computing device specialized for image processing and inference processing, such as a GPU (Graphics Processing Unit). While a GPU is generally effective when applied to learning processing, equivalent functionality may be realized with a reconfigurable logic circuit such as an FPGA (Field Programmable Gate Array). Furthermore, the processing of the inference unit 207 may be handled by the CPU 201.

[0028] Next, the internal configuration of the PC 300 will be described. The PC 300 has a CPU 301, RAM 302, SSD 303, network I / F 304, display unit 305, operation unit 306, and device I / F 307, and each component is connected by an internal bus 308.

[0029] The CPU 301 controls the entire PC 300 and performs various calculations. The CPU 301 executes programs loaded from the SSD 303 into the RAM 302, thereby enabling the operation of the PC 300 as described later. The SSD 303 is a non-volatile, high-capacity storage device that is used as a persistent storage area for the OS, various programs, and various data, as well as for short-term storage of various data. The RAM 302 is a rewritable, high-speed storage device such as DRAM, into which the OS, various programs, and various data are loaded from the SSD 303, and which is also used as a workspace for the OS and various programs.

[0030] The network I / F 304 is an interface for connecting to the network 400 and handles communication with external devices such as the PTZ camera 100 and the edge AI device 200 via the network 400. Communication by the PC 300 includes sending various setting information related to shooting to the edge AI device 200 and receiving information such as the captured video from the PTZ camera 100 and the current pan / tilt values (shooting direction) and zoom value (angle of view) of the PTZ camera 100.

[0031] The display unit 305 is a display device for displaying captured video from the PTZ camera 100, as well as UI screens used for setting predetermined target areas and compositions, as described later. While this example shows the PC 300 having a display device, the configuration is not limited to this; for example, a separate display monitor and controller may exist, each solely for displaying captured video and UI screens.

[0032] The operation unit 306 is an interface for receiving user operations on the PC 300, and examples include a mouse, keyboard, buttons, dial, joystick, and touch panel. The operation unit 306 receives user operations and inputs on the UI screen used for setting predetermined target areas and predetermined compositions, as described later. In this embodiment, mouse operation is assumed as the user operation on the UI screen, and user pressing operations on buttons and the like displayed on the UI screen, as described later, are assumed to be mouse click operations. However, the user operation is not limited to this, and may be any of various operations, such as touch operations on the screen of a display device equipped with a touch panel. Based on the user operation on the UI screen, the PC 300 generates various setting information related to shooting for setting predetermined target areas and predetermined compositions, as described later, and transmits it to the edge AI device 200 via the network I / F 304. The device I / F 307 is an interface for connecting to various input devices and is composed of USB (Universal Serial Bus) or the like.

[0033] [Explanation of the operation of each device in the imaging system] Next, the operation of each device in the imaging system will be explained with reference to Figures 3 to 8. In this embodiment, the operation of the imaging system is broadly divided into a setup operation and a tracking operation. The setup operation performs various settings related to shooting, such as a predetermined target area and a predetermined composition, which will be described later, before the tracking operation starts. The tracking operation tracks the subject to be detected based on the various shooting settings made in the setup operation.

[0034] <Setup operation> First, the setup operation will be described. In this embodiment, the setup operation for making various settings related to shooting includes setup for setting a predetermined target area and setup for setting a predetermined composition. In this embodiment, the setup for setting a predetermined target area involves setting an automatic selection area. The automatic selection area is an area within the captured video in which the subject to be tracked is automatically selected and detected.

[0035] Furthermore, in this embodiment, the setup for setting a predetermined composition involves setting the camera to shoot with a composition that captures the entire competition area in the center of the frame. Examples of compositions that capture the entire competition area in the center of the frame include wide shots that broadly capture the entire competition area, and in this embodiment, one example of such a composition is a bird's-eye view composition that captures the entire competition area from above (hereinafter referred to as a bird's-eye view composition). In the case of a competition involving two athletes and one referee, as exemplified in this embodiment, the bird's-eye view composition is used, for example, when shooting a scene at the start or end of the competition where the referee is in the center and the athletes are positioned on the left and right. Note that the predetermined composition is not limited to a composition that captures the entire competition area in the center of the frame, wide shots, or bird's-eye view compositions, but may also be a composition arbitrarily set by the user, or a specific composition depending on the type of competition or shooting purpose.

[0036] In the imaging system of this embodiment, when the PC300, edge AI device 200, and PTZ camera 100 are started up, the PC300 establishes a connection with the edge AI device 200 and the PTZ camera 100 and enters a standby state.

[0037] When the PC 300 in the standby state receives a setup instruction for the automatic selection area from the user via the operation unit 306, it starts the operation of the flowchart shown in Figure 3(a), which will be described later. The PC 300 also sends a notification to that effect to the edge AI device 200. Upon receiving the notification from the PC 300, the edge AI device 200 starts the operation of the flowchart shown in Figure 3(b), which will be described later. Likewise, when the PC 300 in the standby state receives a setup instruction for the overhead composition from the user via the operation unit 306, it starts the operation of the flowchart shown in Figure 5(b), which will be described later. The PC 300 also sends a notification to that effect to the edge AI device 200 and the PTZ camera 100. When the PTZ camera 100 receives the notification from the PC 300, it starts the operation of the flowchart shown in Figure 5(a), which will be described later. When the edge AI device 200 receives the notification from the PC 300, it starts the operation of the flowchart shown in Figure 5(c), which will be described later.
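The correspondence between each setup instruction and the flowchart started on each device can be summarized as a small lookup table. The sketch below only restates the mapping given in the paragraph above; the dictionary keys and the function name flowcharts_started are hypothetical labels chosen for illustration.

```python
# Which flowchart each device starts for each setup instruction,
# as stated in paragraph [0037]. Key strings are illustrative labels.
FLOWCHARTS = {
    "automatic_selection_area": {
        "PC 300": "Figure 3(a)",
        "edge AI device 200": "Figure 3(b)",
    },
    "overhead_composition": {
        "PTZ camera 100": "Figure 5(a)",
        "PC 300": "Figure 5(b)",
        "edge AI device 200": "Figure 5(c)",
    },
}


def flowcharts_started(instruction: str) -> dict:
    """Return {device: flowchart} for the given setup instruction."""
    return dict(FLOWCHARTS[instruction])


if __name__ == "__main__":
    for name in FLOWCHARTS:
        print(name, flowcharts_started(name))
```

Note that the PTZ camera 100 takes part only in the overhead composition setup, since only that setup changes the camera's pan, tilt, and zoom.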

[0038] First, the operation of the flowchart shown in Figure 3(a), which is executed in the PC 300 when it receives a setup instruction for the automatic selection area from the user, will be described. In S101, when the CPU 301 of the PC 300 receives a setup instruction for the automatic selection area from the user, it reads the initial value of the automatic selection area from the SSD 303. The initial value of the automatic selection area may be, for example, an area selected according to the type of competition from a fixed set of automatic selection areas predetermined for each type of competition, or the automatic selection area last used during the previous operation. The CPU 301 may also query, for example, the edge AI device 200 to obtain information to be used as the initial value of the automatic selection area.

[0039] In S102, the CPU 301 displays a UI screen on the display unit 305 for the user to configure settings such as the automatically selected area. Figure 4 shows an example of a UI screen used for setting the automatically selected area, etc. Note that the UI screen 500 exemplified in Figure 4 also includes a configuration for the user to adjust and finalize the overhead view, which will be described later.

[0040] As shown in Figure 4, the left area 501 of the UI screen 500 displays the captured video received from the PTZ camera 100, and the automatic selection area 600 is superimposed on the captured video. In the example in Figure 4, the captured video shows two players 700a and 700b competing in the competition area 601, one referee 701, and a person 702, such as a substitute player, who is outside the competition area 601. Although the person 702 outside the competition area 601 is assumed to be a substitute player, it could be another person, such as a spectator. The automatic selection area 600 is an area set by the user to match the competition area 601 through operation of the operation unit 306. For example, after the initial value of the automatic selection area is set by the CPU 301, the user can set any automatic selection area 600 by operating the initial value of the automatic selection area through the operation unit 306, as described later.

[0041] The right-hand area 502 of the UI screen 500 contains a PTZ setting button 800, an automatic selection area confirmation button 801, an overhead composition adjustment start button 802, and an overhead composition confirmation button 803. The automatic selection area confirmation button 801 is pressed by the user to confirm the automatic selection area 600 after user operation has been performed on the automatic selection area 600 in the left-hand area 501 of the UI screen 500. The PTZ setting button 800 includes a cross-shaped button 810 for the user to set the pan and tilt of the PTZ camera 100, and a tele-wide button 811 for the user to set the zoom (angle of view) of the PTZ camera 100. When the user operates the cross-shaped button 810 or the tele-wide button 811 of the PTZ setting button 800, the PC 300 sends pan, tilt, and zoom control commands to the PTZ camera 100 according to the user operation information. This changes the shooting direction and field of view of the PTZ camera 100, and the captured image displayed in the left area 501 of the UI screen 500 changes. The PTZ setting button 800 is also used when adjusting the overhead composition, which will be explained later. The roles of the overhead composition adjustment start button 802 and the overhead composition confirmation button 803, as well as the role of the PTZ setting button 800 when adjusting the overhead composition, will be explained later.
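The translation of PTZ setting button operations into pan, tilt, and zoom control commands can be sketched as follows. The actual command format sent to the PTZ camera 100 is not specified in the description; PtzCommand, BUTTON_COMMANDS, command_for_button, and the sign conventions (positive pan to the right, positive tilt upward, positive zoom toward tele) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PtzCommand:
    """Relative pan / tilt / zoom request (hypothetical format)."""
    pan: float = 0.0
    tilt: float = 0.0
    zoom: float = 0.0


# Cross-shaped button 810 drives pan / tilt; tele-wide button 811 drives zoom.
BUTTON_COMMANDS = {
    "left": PtzCommand(pan=-1.0),
    "right": PtzCommand(pan=1.0),
    "up": PtzCommand(tilt=1.0),
    "down": PtzCommand(tilt=-1.0),
    "tele": PtzCommand(zoom=1.0),
    "wide": PtzCommand(zoom=-1.0),
}


def command_for_button(button: str, step: float = 1.0) -> PtzCommand:
    """Scale the unit command of a button by a step size."""
    base = BUTTON_COMMANDS[button]
    return PtzCommand(base.pan * step, base.tilt * step, base.zoom * step)


if __name__ == "__main__":
    print(command_for_button("left", step=2.0))
    print(command_for_button("tele"))
```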

[0042] In this embodiment, an example is given in which the user sets an arbitrary automatic selection area 600 based on the initial automatic selection area, but it is not limited to this. For example, the CPU 301 may detect the competition area 601 from the captured video using AI technology, etc., and automatically set the automatic selection area 600 according to the detected competition area 601. Also, in the case of Figure 4, the automatic selection area 600 is represented as a rectangular area, but it is not limited to this, and any shape that is appropriate for the competition area 601 may be used, such as a polygon or a circle. In this embodiment, as will be described later, the automatic selection area 600 is an area that automatically selects the subject to be tracked in the captured video, so it is possible to distinguish between the subjects to be tracked, such as players and referees, and other subjects such as substitute players. In other words, substitute players and spectators outside the automatic selection area are excluded from the tracking targets, so it is possible to track only the players and referees within the automatic selection area.
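Excluding subjects outside the automatic selection area amounts to a point-in-polygon test. The sketch below uses the standard ray-casting test and assumes that each detected subject is represented by a single reference point (for example, the position of the feet in image coordinates); the description does not say which point of a subject is compared against the area. The function names and the sample coordinates are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Ray-casting test: count edge crossings of a ray going right."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def select_tracking_targets(
    detections: Sequence[Tuple[str, Point]], area: Sequence[Point]
) -> List[str]:
    """Keep only the subjects whose reference point lies inside the area."""
    return [label for label, (x, y) in detections if point_in_polygon(x, y, area)]


if __name__ == "__main__":
    area_600 = [(100, 100), (500, 100), (500, 400), (100, 400)]
    detections = [
        ("player 700a", (200, 300)),
        ("player 700b", (400, 300)),
        ("referee 701", (300, 200)),
        ("person 702", (600, 350)),  # outside the competition area
    ]
    print(select_tracking_targets(detections, area_600))
```

Because the test works on an arbitrary vertex list, the same code covers the rectangular area of Figure 4 as well as other polygonal shapes; a circular area would need a separate distance test.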

[0043] The UI screen 500 described in Figure 4 may be displayed by an application program running on the PC 300. Alternatively, the edge AI device 200 may be equipped with a web server, and the UI screen may be displayed as content downloaded by the PC 300 from there.

[0044] Returning to the flowchart in Figure 3(a): after the processing in S102, the CPU 301 repeats the processing of S103 to S104 until the user presses the automatic selection area confirmation button 801. In S103, the CPU 301 obtains user operations on each of the four vertices of the automatic selection area 600 from the operation unit 306, and sets the automatic selection area 600 based on the position of each vertex operated on by the user. In other words, the user can set any automatic selection area 600 by manipulating the position of each vertex of the automatic selection area 600 via the operation unit 306. The CPU 301 writes the coordinate information of each vertex of the automatic selection area set according to the user operation to the RAM 302. Note that user operations on the positions of the four vertices of the automatic selection area 600 can be achieved by various operations, such as drag-and-drop operations using a mouse, and this embodiment is not limited to any one of these operations.

[0045] In S104, the CPU 301 determines whether the automatic selection area confirmation button 801 has been pressed by the user via the operation unit 306. If the CPU 301 determines that the automatic selection area confirmation button 801 has been pressed, it exits the loop and proceeds to S105. In S105, the CPU 301 reads the coordinate information of the automatically selected area stored in the RAM 302 and transmits it to the edge AI device 200 via the network I / F 304.

[0046] Next, we will explain the flowchart process shown in Figure 3(b) that is executed on the edge AI device 200 during the setup of the automatically selected region. In S201, the CPU 201 of the edge AI device 200 is in a state of waiting to receive coordinate information of the automatically selected area, and receives the coordinate information of the automatically selected area from the PC 300 via the network I / F 204. In S202, CPU201 writes the coordinate information of the automatically selected region that it received to RAM202.

[0047] Next, we will explain the flowchart process shown in Figure 5(b), which is executed in the PC 300 when it receives an instruction from the user to set up the overhead composition. In this setup operation, the PC 300 registers in the edge AI device 200 the shooting direction (pan and tilt values) and field of view (zoom value) of the PTZ camera 100 that are to be used as the overhead composition. In this embodiment, the overhead composition is a composition in which the entire competition area is shown in the center of the field of view, as described above, and is a composition for overhead shooting of a scene at the start or end of the competition in which the referee is positioned in the center and the players are positioned on the left and right. The overhead composition is, for example, the composition of the captured video displayed in the left area 501 of the UI screen 500 in Figure 4, that is, a composition that shows not only the players 700a, 700b and the referee 701 in the competition area 601, but also people 702 such as substitute players outside the competition area 601.

[0048] When the overhead composition setup begins, the CPU 301 of the PC 300 waits for the user to press the overhead composition adjustment start button 802 located in the right-hand area 502 of the UI screen 500 shown in Figure 4. In S401, when the CPU 301 receives the user's press of the overhead composition adjustment start button 802, it repeats S402 to S403 until the overhead composition confirmation button 803 is pressed.

[0049] In the UI screen 500 of Figure 4, the user presses the overhead composition adjustment start button 802, located in the right-hand area 502, to instruct the start of overhead composition adjustment. Similarly, the user presses the overhead composition confirmation button 803 to confirm the overhead composition. When the overhead composition adjustment start button 802 is pressed, the PC 300 determines that the user has instructed it to start adjusting the overhead composition. Then, when the user operates the directional buttons 810 and the tele-wide button 811, the PC 300 sends the PTZ camera 100 a control command that includes the drive direction and drive amount of pan, tilt, and zoom according to the user's operation. The PTZ camera 100 adjusts its pan, tilt, and zoom accordingly, and the overhead composition is thereby adjusted. When the user judges the overhead composition to be satisfactory and presses the overhead composition confirmation button 803, the PC 300 takes the pan, tilt, and zoom values of the PTZ camera 100 at that time as the pan, tilt, and zoom values for the overhead composition. These values are then saved to the edge AI device 200.

[0050] Returning to the flowchart in Figure 5(b): in S402, the CPU 301 waits for user input on the directional buttons 810 and the tele-wide button 811 of the PTZ setting button 800 shown in Figure 4. When user input is received on the directional buttons 810 or the tele-wide button 811, the CPU 301 sends pan-tilt-zoom control commands corresponding to the user input to the PTZ camera 100. For example, if a pan-tilt operation is input using the directional buttons 810, the CPU 301 sends a control command to the PTZ camera 100 via the network I / F 304 to drive the PTZ camera 100 in pan and tilt by the corresponding pan-tilt value. Likewise, if a zoom operation is input using the tele-wide button 811, the CPU 301 sends a control command to the PTZ camera 100 via the network I / F 304 to drive the zoom of the PTZ camera 100 by the corresponding zoom value.

[0051] In S403, the CPU 301 determines whether the user has pressed the overhead composition confirmation button 803 via the operation unit 306. If the CPU 301 determines that the user has pressed the overhead composition confirmation button 803, it exits the loop and proceeds to S404. In S404, the CPU 301 sends a command to the PTZ camera 100 requesting the transmission of its current pan, tilt, and zoom values.

[0052] In S405, the CPU 301 receives information transmitted from the PTZ camera 100 via the network I / F 304 as a response to the request command sent in S404. Thus, the CPU 301 receives the current pan, tilt, and zoom values of the PTZ camera 100 from the PTZ camera 100. In S406, the CPU 301 transmits the pan, tilt, and zoom values received in S405 to the edge AI device 200 via the network I / F 304. These pan, tilt, and zoom values are used by the edge AI device 200 to set the PTZ camera 100 to a shooting direction and field of view corresponding to the overhead composition.

[0053] Next, the operation of the PTZ camera 100 after the pan, tilt, and zoom values for the overhead composition have been determined by the overhead composition setup operation described above will be explained with reference to the flowchart in Figure 5(a). In S301, the CPU 101 of the PTZ camera 100 is waiting to receive commands sent from the PC 300, and receives from the PC 300, via the network I / F 105, the command requesting the transmission of pan, tilt, and zoom values. In S302, the CPU 101 reads the current pan, tilt, and zoom values stored in the RAM 102. In S303, the CPU 101 sends the values read in S302 to the PC 300 via the network I / F 105.

[0054] Next, the operation of the edge AI device 200 after the pan, tilt, and zoom values for the overhead composition have been determined by the overhead composition setup operation described above will be explained with reference to the flowchart in Figure 5(c). In S501, the CPU 201 of the edge AI device 200 is waiting to receive information transmitted from the PC 300, and receives the pan, tilt, and zoom values for setting the overhead composition from the PC 300 via the network I / F 204. In S502, the CPU 201 writes the received pan, tilt, and zoom values to the RAM 202 as the pan, tilt, and zoom values for the overhead composition.

[0055] [Tracking behavior and switching to an overhead view] The imaging system of this embodiment allows switching between a first control and a second control, which is different from the first control, depending on the distance between subjects, for the control of the PTZ camera 100. The operation will be described below. In this embodiment, the first control is described as operating the PTZ camera 100 to automatically track a subject, and the second control is described as controlling the PTZ camera 100 to an overhead composition.

[0056] In the shooting system of this embodiment, after the setup of the automatic selection area and overhead composition described above is completed, the subject tracking operation and switching to the overhead composition are performed using the various shooting setting information set by these setups. In the shooting system of the first embodiment, the edge AI device 200 detects the subject position from the image captured by the PTZ camera 100 and performs automatic tracking by controlling the pan, tilt, and zoom of the PTZ camera 100 according to the subject position. The edge AI device 200 also obtains the distance between subjects from the multiple subject positions that it has inferred, and switches between automatic tracking and the overhead composition based on that distance between subjects.

[0057] Figure 6(a) shows a flowchart of the tracking operation in the edge AI device 200. When controlling the tracking operation, the edge AI device 200 obtains the distance between subjects from the captured video and decides whether to switch to an overhead view composition based on the obtained distance between subjects. Figure 6(b) shows a flowchart of the operation in the PTZ camera 100.

[0058] First, referring to the flowchart in Figure 6(a), we will explain the control of tracking operations and the switching to an overhead view performed by the edge AI device 200. In the shooting system of this embodiment, the PTZ camera 100 sequentially transmits the captured video from the video output I / F 104 at a predetermined frame rate. The edge AI device 200 sequentially receives the captured video transmitted sequentially from the PTZ camera 100 at a predetermined frame rate via the video input I / F 208 and sequentially stores it in the internal RAM 202. Alternatively, the PTZ camera 100 may also sequentially transmit the captured video from the network I / F 105 at a predetermined frame rate. In this case, the edge AI device 200 sequentially receives the captured video via the network I / F 204 and stores it in the RAM 202. The loop processing from S601 to S611 in Figure 6(a) in the edge AI device 200 is performed for each frame of the captured video.

[0059] In S601, the CPU 201 of the edge AI device 200 sequentially reads the captured video stored in RAM 202 and transfers it to the inference unit 207. In S602, the inference unit 207 detects a subject from the captured video and writes the inference result information to the RAM 202. In this embodiment, the inference unit 207 has a trained model created using machine learning techniques such as deep learning, and acquires the captured video as input data and outputs the inference result as output data. The inference result includes the position information of the person being tracked, such as an athlete or referee, as well as the type of the subject being tracked (for example, whether it is an athlete or a referee), and a score indicating the likelihood of them being identified. The position information of the subject (person) also includes the coordinate information of the four vertices of the top-left, top-right, bottom-left, and bottom-right of the rectangular area surrounding the subject, as well as information such as the width and height of the rectangular area. The inference unit 207 acquires this inference result information as a set.

[0060] In S603, the CPU 201 reads from the RAM 202 the coordinate information indicating the automatically selected area that was stored there in S202 of Figure 3(b). In S604, the CPU 201 reads the position information of the rectangular area of the subject from the inference results stored in the RAM 202 in S602, and counts the number of subjects present in the automatically selected area based on the position information of that rectangular area. In other words, the CPU 201 counts the number of people present in the automatically selected area. In this embodiment, the CPU 201 counts subjects whose center point of the bottom edge of the rectangular area is included within the automatically selected area as subjects present in that automatically selected area.
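The counting rule of S604 can be illustrated with a minimal sketch. The text does not specify how inclusion in the area is tested, so the ray-casting test below is an assumed choice, and the box layout `(left, top, width, height)` and all function names are hypothetical. The subject positions and the area vertices are assumed to be expressed in the same coordinate system.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # hypothetical layout: (left, top, width, height)


def bottom_center(box: Box) -> Point:
    """Return the center point of the bottom edge of a bounding box."""
    left, top, width, height = box
    return (left + width / 2.0, top + height)


def point_in_polygon(pt: Point, polygon: List[Point]) -> bool:
    """Ray-casting test (assumed method): True if pt lies inside the polygon."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Consider only edges that straddle the horizontal line through pt.
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_subjects_in_area(boxes: List[Box], area: List[Point]) -> int:
    """Count boxes whose bottom-edge center lies inside the selection area."""
    return sum(1 for b in boxes if point_in_polygon(bottom_center(b), area))


area = [(100, 100), (500, 100), (500, 400), (100, 400)]
boxes = [
    (150, 150, 50, 100),  # bottom center (175, 250): inside
    (300, 200, 40, 120),  # bottom center (320, 320): inside
    (420, 250, 40, 100),  # bottom center (440, 350): inside
    (550, 200, 40, 100),  # bottom center (570, 300): outside
    (200, 350, 40, 100),  # bottom center (220, 450): outside (feet below the area)
]
print(count_subjects_in_area(boxes, area))
```

Using the bottom-edge center approximates where a person stands on the floor, so a substitute player whose upper body overlaps the area in the image but whose feet are outside it is not counted.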

[0061] Here, in order to determine whether the subject is included within the automatically selected area regardless of the pan / tilt direction or zoom value of the PTZ camera 100, the CPU 201 converts the coordinate information indicating the center point of the bottom edge of the rectangular area of the subject and the coordinate information indicating the automatically selected area to a predetermined coordinate system. In this embodiment, the coordinate information of the center point of the bottom edge of the rectangular area of the subject and of each vertex of the automatically selected area is coordinate information of a Cartesian coordinate system expressed as (x,y) on the captured image. Therefore, the CPU 201 converts this Cartesian coordinate information into polar coordinate information in which the pan / tilt angle when the PTZ camera 100 is facing the front of the competition area is set to 0 degrees, the angle in the pan direction is θq [rad], and the angle in the tilt direction is φq [rad]. As a result, the coordinate information of the subject and the automatically selected area can be expressed as coordinate information that does not depend on the pan / tilt / zoom values of the PTZ camera 100. Therefore, the CPU 201 can determine whether the subject is included within the automatically selected area regardless of the pan / tilt / zoom values of the PTZ camera 100.

[0062] Below, as an example of a method for converting from a Cartesian coordinate system represented by (x,y) to a polar coordinate system, we will explain how to convert the two-dimensional coordinate P(x,y) on the captured video to a three-dimensional coordinate Q(X,Y,Z) with the PTZ camera as the origin, using Figures 7(a) to 7(c).

[0063] Figure 7(a) shows the captured image 1000 from the PTZ camera 100 in a Cartesian coordinate system (x,y), and represents the point (pixel) where the two-dimensional coordinate P(x,y) in the figure is converted to the three-dimensional coordinate Q(X,Y,Z). In Figure 7(a), x [pixels] to the right of the center of the captured image 1000 are positive values, and y [pixels] below the center are positive values. The image size of the captured image 1000 is assumed to be w × h [pixels].

[0064] Figure 7(b) shows a sphere 1001 in a three-dimensional space with the PTZ camera 100 at its origin O, with the radius being the distance from the PTZ camera 100 to the subject in the captured image. For simplicity of explanation, the radius of the sphere 1001 is normalized to 1 in Figure 7(b). As shown in Figure 7(b), when the position of the PTZ camera 100 is represented in a three-dimensional space with the origin O, the captured image 1000 shown in Figure 7(a) can be represented as a two-dimensional image tangent to the sphere 1001 at its center R.

[0065] Figure 7(c) shows the current pan angle θcam and tilt angle φcam of the PTZ camera 100, with the pan-tilt angle set to 0 degrees when the PTZ camera 100 is facing the front of the competition area. The front of the PTZ camera 100 is assumed to be the x-axis direction in Figure 7(c). The pan angle θcam, tilt angle φcam, horizontal zoom field of view ψwcam (not shown), and vertical zoom field of view ψhcam (not shown) can be obtained by the edge AI device 200 requesting the current pan-tilt-zoom values from the PTZ camera 100.

[0066] As shown in Figure 7(b), if xpp is the distance in the x-axis direction from the center R of the captured video 1000 to the three-dimensional coordinate Q(X,Y,Z) and ypp is the distance in the y-axis direction, then these distances xpp and ypp can be calculated using the following equations (1) and (2). Furthermore, the three-dimensional coordinate Q(X,Y,Z) can be calculated using the following equation (3).

[0067]

[Math. 1: equations (1) to (3)]

[0068] Since the orientation of the PTZ camera 100 is in the direction of the pan angle θcam and the tilt angle φcam, the three-dimensional coordinate Q(X,Y,Z) can be calculated by rotating the coordinate axes around the Z axis by θcam and around the Y axis by φcam, as shown in equation (3). As a result, the CPU 201 can convert a point P(x,y) on the captured video 1000 into a three-dimensional coordinate Q(X,Y,Z) with the PTZ camera as the origin. Next, CPU201 converts the three-dimensional coordinate Q(X,Y,Z) into the pan angle θq and tilt angle φq as seen from the PTZ camera 100, using equations (4) and (5) below.

[0069]

[Math. 2: equations (4) and (5)]

[0070] As described above, using equations (1) to (5), the CPU 201 converts the coordinate information of the center point of the bottom edge of the subject's rectangular area and of the four vertices of the automatically selected area into the pan angle θq and tilt angle φq as seen from the PTZ camera 100. As a result, the CPU 201 can perform the processing of S604 even if the pan, tilt, and zoom values of the PTZ camera 100 change.
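Equations (1) to (5) themselves are not reproduced in this text, so the following is only a sketch of one conversion that is consistent with the surrounding description (unit sphere, image plane tangent at its center R, rotation by the camera tilt and pan, then conversion to angles). The pinhole model, the sign conventions (pan positive to the right, tilt positive upward) and the function name are assumptions, not the equations of this publication.

```python
import math


def image_point_to_pan_tilt(x, y, w, h, fov_w, fov_h, pan_cam, tilt_cam):
    """Convert an image point to (pan, tilt) angles in radians.

    x, y: pixel offsets from the image center (x right positive, y down positive).
    w, h: image size in pixels. fov_w, fov_h: horizontal / vertical field of view [rad].
    pan_cam, tilt_cam: current camera pan / tilt [rad]; assumed positive to the
    right and upward respectively (a convention not stated in the text).
    """
    # Offsets on a plane tangent to the unit sphere (assumed pinhole model).
    xpp = (x / (w / 2.0)) * math.tan(fov_w / 2.0)
    ypp = (y / (h / 2.0)) * math.tan(fov_h / 2.0)

    # Ray in the camera frame as (forward, right, up); image y points down.
    fwd, right, up = 1.0, xpp, -ypp

    # Apply the camera tilt (rotation about the lateral axis).
    fwd_t = math.cos(tilt_cam) * fwd - math.sin(tilt_cam) * up
    up_t = math.sin(tilt_cam) * fwd + math.cos(tilt_cam) * up

    # Apply the camera pan (rotation about the vertical axis).
    X = math.cos(pan_cam) * fwd_t - math.sin(pan_cam) * right
    Y = math.sin(pan_cam) * fwd_t + math.cos(pan_cam) * right
    Z = up_t

    # Convert the 3D direction Q(X, Y, Z) to pan and tilt angles.
    theta_q = math.atan2(Y, X)
    phi_q = math.atan2(Z, math.hypot(X, Y))
    return theta_q, phi_q


# The same direction seen at the right image edge (camera at pan 0) and at the
# image center (camera panned by 30 degrees) maps to the same pan angle.
fov_w, fov_h = math.radians(60), math.radians(35)
edge = image_point_to_pan_tilt(960, 0, 1920, 1080, fov_w, fov_h, 0.0, 0.0)
center = image_point_to_pan_tilt(0, 0, 1920, 1080, fov_w, fov_h, math.radians(30), 0.0)
print(math.degrees(edge[0]), math.degrees(center[0]))
```

Because the output depends only on the direction of the ray and not on where the camera is currently pointing, the subject position and the area vertices can be compared even after the camera has moved.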

[0071] The aforementioned method for converting to polar coordinates is just one example; any existing calculation method that converts to polar coordinates may be used. Furthermore, in this embodiment, the conversion to polar coordinates is based on the pan, tilt, and zoom values of the PTZ camera 100. However, in the case of a camera that can only control pan, for example, the conversion can be based on the pan value. The same applies to cameras that can only control tilt; the conversion can be based on the tilt value.

[0072] Let's return to the explanation of the flowchart in Figure 6(a). In S605, the CPU 201 determines whether the number of subjects counted in S604 satisfies the condition that it is a predetermined number. In this embodiment, as mentioned above, two players competing and one referee are used as an example, so the predetermined number to be determined in S605 is three. If the CPU 201 determines that the number of subjects counted is three, it proceeds to the automatic tracking process from S606 onwards; if it determines that the number is not three, it skips the processes from S606 to S611 and proceeds to the next loop process.

[0073] Furthermore, if, after the tracking operation has started because three subjects were counted, the CPU 201 determines in S605 of a later loop iteration that the number of subjects is no longer three, the CPU 201 may hold the pan, tilt, and zoom values of the PTZ camera 100 fixed. In other words, if the number of people in the automatically selected area falls below the predetermined number (that is, below three) after automatic tracking has started, the CPU 201 stops the automatic tracking control. One example of a situation where the number of subjects falls below three is when the two athletes among the three subjects leave the automatically selected area. Because the automatic tracking control is stopped in this case, the camera does not go on to track only the one person remaining in the automatically selected area (competition area), for example the referee, and the two athletes are therefore prevented from going out of frame. Subsequently, when the two athletes return to the automatically selected area and the CPU 201 determines in S605 that there are three subjects in the automatically selected area, the process proceeds to S606, and the control of the PTZ camera 100 (automatic tracking) is resumed.

[0074] In S606, CPU201 performs distance acquisition processing, which involves measuring the distance between each subject included in the automatically selected area, and distance determination processing, which determines whether the longest (farthest) distance between subjects is greater than or equal to a predetermined distance. The predetermined distance is a distance threshold set as an appropriate distance depending on the type of competition. For example, in the case of a competition where the positions of the players at the start of the match are generally fixed, such as in judo or sumo, it is preferable to set the distance between the players at the start of the match as the predetermined distance. However, the predetermined distance is not limited to this case and may be various distances depending on the type of competition, or it may be a distance arbitrarily set by the user.

[0075] Here, we will explain the distance between subjects and the longest distance between subjects using Figure 8. Figure 8(a) shows an example of the relative positions of each player and the referee at the start or end of the competition, and Figure 8(b) shows an example of the relative positions of each player and the referee during the competition. In the positional relationship between each player and the referee at the start or end of the competition, as shown in Figure 8(a), the longest subject distance 900a among the subject distances between the two players 700a and 700b and the one referee 701 is the distance between player 700a and player 700b. In contrast, in the positional relationship between each player and the referee during the competition, as shown in Figure 8(b), the subject distances between players 700a and 700b and the referee 701 are often closer. In the example in Figure 8(b), the longest subject distance 900b is, for example, the distance between player 700b and referee 701. Thus, the longest subject distance often differs between the start or end of the competition and during the competition. Therefore, by obtaining the longest subject distance, it is possible to determine whether it is the start or end of the competition, or during the competition.

[0076] In this embodiment, as described above, the distance between the players at the start of the match is set as the predetermined distance. Thus, if the longest distance between subjects is less than the predetermined distance, it can be determined that the match is in progress. Conversely, if the longest distance between subjects is greater than or equal to the predetermined distance, it can be determined that the match is at its start or end.
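The distance acquisition and distance determination of S606 can be sketched as follows. The text does not state in which coordinate space the distance is measured, so plain Euclidean distance between 2D subject positions is an assumption here, and the function names are hypothetical.

```python
from itertools import combinations
from math import hypot
from typing import Sequence, Tuple


def longest_subject_distance(positions: Sequence[Tuple[float, float]]) -> float:
    """Return the largest pairwise distance among the subject positions."""
    if len(positions) < 2:
        raise ValueError("at least two subject positions are required")
    return max(
        hypot(ax - bx, ay - by)
        for (ax, ay), (bx, by) in combinations(positions, 2)
    )


def should_switch_to_overhead(positions, predetermined_distance: float) -> bool:
    """True when the longest distance is >= the threshold (start or end of match)."""
    return longest_subject_distance(positions) >= predetermined_distance


# Start of match: players far apart on the left and right, referee between them.
at_start = [(-3.0, 0.0), (3.0, 0.0), (0.0, 1.0)]
# During the match: all three subjects close together.
in_progress = [(-0.5, 0.0), (0.5, 0.0), (0.0, 1.5)]

print(should_switch_to_overhead(at_start, 6.0))
print(should_switch_to_overhead(in_progress, 6.0))
```

With three subjects there are only three pairs to compare, so the exhaustive pairwise search adds negligible cost per frame.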

[0077] In S606, if the longest distance between subjects is less than the predetermined distance, the CPU 201 proceeds to S607. In S607, the CPU 201 determines that the three subjects detected within the automatically selected area will be tracked, and then calculates the centroid position of these three subjects. For example, the CPU 201 calculates the centroid position of two or more subjects (e.g., three people) from the average of the center point positions of the rectangular areas of those subjects. Note that the method for calculating the centroid position is not limited to this example; other calculation methods may be used, such as using the center point of the circumscribing rectangular area that encloses all three subjects, or distinguishing between players and referees and using the average of the center point positions of only the players as the centroid position.
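Two of the centroid calculations named for S607 can be sketched as follows: the average of the box center points, and the center of the rectangle enclosing all subjects. The box layout `(left, top, width, height)` and the function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # hypothetical layout: (left, top, width, height)


def box_center(box: Box) -> Tuple[float, float]:
    """Center point of one bounding box."""
    left, top, width, height = box
    return (left + width / 2.0, top + height / 2.0)


def centroid_of_centers(boxes: List[Box]) -> Tuple[float, float]:
    """Average of the center points of the subject boxes."""
    centers = [box_center(b) for b in boxes]
    n = len(centers)
    return (sum(c[0] for c in centers) / n, sum(c[1] for c in centers) / n)


def center_of_enclosing_rect(boxes: List[Box]) -> Tuple[float, float]:
    """Center of the smallest axis-aligned rectangle enclosing all boxes."""
    left = min(b[0] for b in boxes)
    top = min(b[1] for b in boxes)
    right = max(b[0] + b[2] for b in boxes)
    bottom = max(b[1] + b[3] for b in boxes)
    return ((left + right) / 2.0, (top + bottom) / 2.0)


boxes = [(0, 0, 10, 20), (20, 0, 10, 20), (40, 30, 10, 20)]
print(centroid_of_centers(boxes))
print(center_of_enclosing_rect(boxes))
```

The two methods generally give different points: the average follows where most subjects are, while the enclosing-rectangle center keeps the outermost subjects symmetric in the frame.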

[0078] In S608, CPU201 determines whether the centroid position calculated in S607 matches the center position of the field of view in the captured video. If CPU201 determines that the centroid position matches the center position of the field of view, it skips the processing from S609 to S611 and proceeds to the next loop. On the other hand, if CPU201 determines that the centroid position does not match the center position of the field of view, it proceeds to S609.

[0079] In S609, the CPU 201 calculates the difference between the centroid position calculated in S607 and the center position of the field of view in the captured video, and calculates the pan-tilt angular velocity corresponding to this difference as the pan-tilt adjustment amount. In this embodiment, the difference is calculated on the captured video, but it may also be calculated in polar coordinate space after the conversion to polar coordinates described above. One method of calculating the angular velocity, for example, is to multiply the difference between the coordinate values in the pan direction and in the tilt direction by a predetermined coefficient, and to determine the rotation direction of the pan-tilt according to whether the result is positive or negative. Since these techniques are known, a detailed explanation is omitted.

[0080] In S609, the CPU 201 also calculates the zoom adjustment amount so as to keep the size of the subject's rectangular area roughly constant. The subject size need not be taken from the subject's bounding rectangle; the size of an individual body part, such as the face, may be detected instead, and the zoom adjusted so that this size remains constant. Alternatively, the size may be taken from one subject selected at random from those within the automatically selected area, or from the average size of the rectangular areas of the three subjects. Alternatively, the zoom adjustment amount may be calculated so that the size of the bounding rectangle encompassing the three subjects remains constant.
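As a minimal sketch of the zoom adjustment, assuming the average box height of the tracked subjects is the size to be held constant: the relative error against a target height is scaled by a gain. The target height, the gain, the sign convention (positive means zoom toward tele) and the function name are all hypothetical.

```python
from typing import Sequence


def zoom_adjustment(box_heights: Sequence[float], target_height: float,
                    gain: float = 1.0) -> float:
    """Return a signed zoom adjustment from the mean subject height.

    Positive: subjects appear smaller than the target, so zoom in (tele).
    Negative: subjects appear larger than the target, so zoom out (wide).
    """
    mean_height = sum(box_heights) / len(box_heights)
    return gain * (target_height - mean_height) / target_height


print(zoom_adjustment([90, 100, 110], target_height=200))   # subjects too small
print(zoom_adjustment([200, 200, 200], target_height=200))  # on target
print(zoom_adjustment([300], target_height=200))            # subject too large
```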

[0081] The aforementioned method of tracking a subject using a technique that calculates and controls the direction and speed of pan-tilt rotation is just one example. Any method that allows for subject tracking may be used, such as a method that calculates the target position during pan-tilt rotation and tracks the subject.

[0082] In S610, CPU201 converts the results calculated by S609 into control commands according to a predetermined protocol for controlling the PTZ camera 100, and writes them to RAM202. In S611, the CPU 201 reads the control command that was converted in S610 and written to the RAM 202, transmits it to the PTZ camera 100 via the network I / F 204, and then returns to the beginning of the loop process.

[0083] In this embodiment, S608 determines whether the centroid position and the center position of the field of view coincide. However, a so-called dead zone may be provided, for example, so that the PTZ camera 100 is not controlled if the difference between the centroid position and the center position of the field of view is within a predetermined range. This makes it possible, for example, to keep the PTZ camera 100 from being controlled overly sensitively.
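The proportional pan-tilt adjustment of S609 and the dead zone mentioned above can be sketched together as follows. The gain value, the per-axis dead zone, the sign convention (image y grows downward, tilt positive upward) and the function name are assumptions made for illustration.

```python
from typing import Tuple


def pan_tilt_velocity(centroid: Tuple[float, float],
                      frame_center: Tuple[float, float],
                      gain: float = 0.01,
                      dead_zone: float = 0.0) -> Tuple[float, float]:
    """Return (pan, tilt) angular velocities proportional to the offset.

    An axis whose offset is within the dead zone yields 0.0, so small
    jitters of the centroid do not move the camera.
    """
    dx = centroid[0] - frame_center[0]
    dy = centroid[1] - frame_center[1]
    pan = 0.0 if abs(dx) <= dead_zone else gain * dx
    # Image y grows downward, so a centroid below center needs a downward tilt.
    tilt = 0.0 if abs(dy) <= dead_zone else -gain * dy
    return pan, tilt


center = (960.0, 540.0)
# Centroid 100 px right of center: pan right, no tilt.
print(pan_tilt_velocity((1060.0, 540.0), center, gain=0.01, dead_zone=20.0))
# Centroid within the dead zone on both axes: no movement.
print(pan_tilt_velocity((970.0, 545.0), center, gain=0.01, dead_zone=20.0))
```

The sign of each returned value gives the rotation direction and its magnitude gives the speed, which matches the coefficient-based method described for S609.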

[0084] On the other hand, when it is determined in S606 that the longest distance between subjects is greater than or equal to the predetermined distance, the process proceeds to S612, where the CPU 201 reads the pan, tilt, and zoom values for the overhead composition that were written to the RAM 202 in S502. The CPU 201 then sets these pan, tilt, and zoom values as the tracking target position. Because the values written to the RAM 202 in S502 are the pan, tilt, and zoom values of the overhead composition, setting them as the tracking target position switches the composition of the PTZ camera 100 to the overhead composition.

[0085] In S613, the CPU 201 generates, from the pan, tilt, and zoom values of the overhead composition read in S612, a control command that conforms to the predetermined protocol for controlling the PTZ camera 100, and writes it to the RAM 202. In S614, the CPU 201 reads the control command written to the RAM 202 in S613, transmits it to the PTZ camera 100 via the network I / F 204, and then returns to the beginning of the loop process.

[0086] Next, the processing of the flowchart in Figure 6(b), which is executed in the PTZ camera 100 during the tracking operation, will be described. In S701, the CPU 101 of the PTZ camera 100 receives control commands from the edge AI device 200, which is operating as shown in the flowchart in Figure 6(a), via the network I / F 105. The CPU 101 writes the control commands sent from the edge AI device 200 to the RAM 102.

[0087] In S702, the CPU 101 reads the drive direction and drive amount values corresponding to the pan-tilt adjustment amount from the control commands stored in the RAM 102. The CPU 101 also reads the lens drive direction and drive amount values corresponding to the zoom adjustment amount from the control commands. In S703, the CPU 101 calculates drive parameters for pan, tilt, and zoom operation based on the values read from the RAM 102 in S702. For example, the CPU 101 calculates drive parameters for controlling the motors for pan and tilt operation in the drive unit 109, as well as drive parameters for zoom operation, based on the values read from the RAM 102. Alternatively, the CPU 101 may convert the values of the drive direction and drive amount included in the received control command into drive parameters by referring to a conversion table previously stored in the ROM 103.

[0088] In S704, the CPU 101 controls the drive unit 109 via the drive I / F 108 based on the drive parameters calculated in S703. The drive unit 109 performs pan, tilt, and zoom drives based on these drive parameters, thereby controlling the shooting direction (pan and tilt direction) and field of view (zoom) of the PTZ camera 100. Therefore, according to the shooting system of this embodiment, the composition and camera work of the PTZ camera can be switched according to the situation in competitive sports, such as at the start of a competition, at the end of a competition, or during a competition.

[0089] <Description of characteristic operation> The characteristic operation of the shooting system, which uses the aforementioned control as its basic operation, will now be explained. This section describes in detail how the edge AI device 200 appropriately controls the PTZ camera 100 when, in a situation where players 700a, 700b and referee 701 have been detected, another subject enters the automatically selected area. The edge AI device 200 has three states, as shown in Figure 9, to appropriately control the PTZ camera 100 according to the situation within the automatically selected area.

[0090] When the edge AI device 200 is powered on by the user and has started up to a state where it can track a subject, it transitions to the tracking standby state (ST101). Subsequently, when tracking begins based on the longest distance between subjects as explained for S606 of Figure 6, it transitions to the tracking state (ST102), and when the distance between subjects increases, it transitions back to the tracking standby state (ST101). Furthermore, in the tracking state (ST102), if it is determined that the number of subjects within the automatic selection area is greater than the predetermined number, it transitions to the tracking stopped state (ST103). The aim of this state transition is to prevent the PTZ camera 100 from being driven in an unintended direction when an unexpected subject enters the automatic selection area and the AI consequently misdetects or mistracks the subject that should be captured.

[0091] Furthermore, if the edge AI device 200 determines, while in the tracking standby state (ST101), that the number of subjects within the automatically selected area exceeds the predetermined number, it transitions to the tracking stopped state (ST103) to prevent the aforementioned inappropriate control of the PTZ camera 100. The above are the states and state transition conditions of the edge AI device 200 that relate to its characteristic operation.

[0092] Next, the control flowcharts for the edge AI device 200 in each state will be explained with reference to Figures 10(a) to 10(c). Figures 10(a) to 10(c) are flowcharts based on the subject tracking flowchart in Figure 6, with added control according to the number of detected subjects. Below, we will explain the control flowcharts of the edge AI device 200 in each state from tracking standby state (ST101) to tracking stopped state (ST103), starting with Figure 10(a), which is the control flowchart of the edge AI device 200 in ST101.

[0093] Figure 10(a) is a control flowchart for the tracking standby state (ST101), which is the initial state to which the edge AI device 200 transitions after power-on. In each of the flowcharts in Figures 10(a) to 10(c), the predetermined loop processing continues unless a trigger for a state transition occurs. In S801, the CPU 201 of the edge AI device 200 sequentially reads the captured video stored in the RAM 202 and transfers it to the inference unit 207. This process is the same as in S601.

[0094] In S802, the inference unit 207 detects a subject from the captured video and writes the inference result information to the RAM 202. This process is the same as in S602. In S803, the CPU 201 reads from the RAM 202 the coordinate information indicating the automatically selected area that was stored there in S202 of Figure 3(b). This process is the same as in S603. In S804, the CPU 201 reads the position information of the rectangular area of each subject from the inference results stored in the RAM 202 in S802, and counts the number of subjects present within the automatically selected area based on that position information. This process is the same as in S604.
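
The counting step in S804 can be sketched as follows. This is a minimal illustration rather than the embodiment's implementation: the `Box` type, the `count_in_area` helper, and the rule that a subject is "within" the area when the center of its rectangle lies inside it are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in image coordinates (hypothetical type)."""
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float

    def center(self) -> tuple[float, float]:
        return (self.x + self.width / 2, self.y + self.height / 2)

    def contains(self, point: tuple[float, float]) -> bool:
        px, py = point
        return (self.x <= px <= self.x + self.width
                and self.y <= py <= self.y + self.height)


def count_in_area(detections: list[Box], area: Box) -> int:
    """Count detections whose center falls inside the selected area.

    The center-point test is an assumption; the text only says the count
    is based on the position information of each subject's rectangle.
    """
    return sum(1 for box in detections if area.contains(box.center()))


area = Box(100, 100, 400, 300)      # automatically selected area
detections = [
    Box(150, 150, 40, 80),          # center inside the area
    Box(300, 200, 40, 80),          # center inside the area
    Box(20, 20, 40, 80),            # center outside the area
]
print(count_in_area(detections, area))
```

Other membership rules (for example, requiring the whole rectangle to lie inside the area) would only change the `contains` test.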

[0095] In S805, the CPU 201 determines whether the number of subjects counted in S804 equals the predetermined number (3 in this example). This process is the same as in S605. If the CPU 201 determines that the number of subjects counted is 3, it proceeds to S806, rewrites the state information held by the device to the tracking state ST102, and exits the loop. On the other hand, if the CPU 201 determines in S805 that the number of subjects counted is not 3, it proceeds to S807.

[0096] In S807, the CPU 201 determines whether the number of subjects counted is greater than 3. If so, the process proceeds to S808, where the state information held by the device is rewritten to the tracking stop state ST103, and the loop is exited. Otherwise, the tracking standby state ST101 is maintained and the loop processing continues.

[0097] Figure 10(b) is a control flowchart when the edge AI device 200 is in tracking mode (ST102). Note that steps S901 to S905 are the same as steps S801 to S805 in Figure 10(a), so their explanation is omitted. In S905, if the CPU 201 determines that the number of subjects counted is 3, it proceeds to S906 to determine whether the longest distance between subjects 900c is greater than or equal to a predetermined distance. If the longest distance between subjects 900c is not greater than or equal to the predetermined distance (i.e., less than the predetermined distance), it proceeds to S907. S907 is a subprocess that executes control to track the 3 detected subjects. This subprocess corresponds to S606 to S611 in Figure 6, and the CPU 201 calculates the pan, tilt, and zoom adjustment amounts based on the centroid positions of the 3 subjects and controls the PTZ camera 100. Once the processing in S907 is complete, it returns to S901 to continue the loop processing.

[0098] On the other hand, if the CPU 201 determines in S905 that the number of subjects counted is not 3, it proceeds to S908 to determine whether the number of subjects is greater than 3. If the CPU 201 determines in S908 that the number of subjects counted is greater than 3, it proceeds to S911 and stops the subject group tracking control. In S912, the CPU 201 overwrites the state information held by the device with the tracking stop state ST103 and exits the loop. On the other hand, if the CPU 201 does not determine in S908 that the number of subjects counted is greater than 3, it proceeds to S909. S909 is a subprocess for controlling the PTZ camera 100 to take an overhead shot. This subprocess corresponds to S612 to S614 in Figure 6: the CPU 201 reads the pan, tilt, and zoom values indicating the overhead composition, which are held in advance, generates a control command based on these values, and sends it to the PTZ camera 100. This allows the PTZ camera 100 to be controlled to an overhead composition that has been adjusted in advance by the user. In S910, the CPU 201 overwrites the state information held by the device with the tracking standby state ST101 and exits the loop. Furthermore, after the subject tracking control is stopped in S911, the aforementioned subprocess for controlling the PTZ camera 100 to an overhead shot may be executed in order to reset the system state. This subprocess may be executed, for example, immediately after the subject tracking control is stopped or after a predetermined time has elapsed; the execution timing is not limited and may be chosen as appropriate.

[0099] Figure 10(c) is a control flowchart for when the edge AI device 200 is in the tracking stop state (ST103). Note that steps S1001 to S1005 are the same as steps S801 to S805 in Figure 10(a), so their explanation is omitted. In S1005, if the CPU 201 determines that the number of subjects counted is 3 or less (that is, less than or equal to the predetermined number), it proceeds to S1006, overwrites the state information held by the device with the tracking standby state ST101, and exits the loop. On the other hand, if the CPU 201 determines in S1005 that the number of subjects counted is not 3 or less, it returns to S1001 as part of the loop processing. Alternatively, when the CPU 201 determines in S1005 that the number of subjects counted is 3 or less, it may set the state information held by the device to the tracking state ST102 instead of the tracking standby state ST101.
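
The state transitions described for Figures 10(a) to 10(c) can be summarized in one transition function. The following is a sketch under stated assumptions, not the flowcharts themselves: the names `State`, `EXPECTED`, and `next_state` are hypothetical, camera control (tracking and overhead shots) is omitted, and the return to standby when the subjects spread apart follows the description in [0090] because the corresponding flowchart branch is not spelled out.

```python
from enum import Enum


class State(Enum):
    STANDBY = "ST101"   # tracking standby state
    TRACKING = "ST102"  # tracking state
    STOPPED = "ST103"   # tracking stop state


EXPECTED = 3  # predetermined number of subjects used in the example


def next_state(state: State, count: int, longest_distance: float,
               max_distance: float) -> State:
    """Return the next state for one pass of the loop (camera control omitted)."""
    if state is State.STANDBY:
        if count == EXPECTED:          # S805 -> S806
            return State.TRACKING
        if count > EXPECTED:           # S807 -> S808
            return State.STOPPED
        return State.STANDBY
    if state is State.TRACKING:
        if count > EXPECTED:           # S908 -> S911, S912
            return State.STOPPED
        if count < EXPECTED:           # S909 (overhead shot), S910
            return State.STANDBY
        # count == EXPECTED: keep tracking while the subjects stay close (S906).
        # Returning to standby when they spread out is an assumption
        # taken from the state description in [0090].
        if longest_distance >= max_distance:
            return State.STANDBY
        return State.TRACKING
    # State.STOPPED: back to standby once the count is 3 or fewer (S1005, S1006).
    return State.STANDBY if count <= EXPECTED else State.STOPPED


# Example: three subjects, then an extra person enters and later leaves.
state = State.STANDBY
for count in (3, 4, 4, 3):
    state = next_state(state, count, longest_distance=50.0, max_distance=200.0)
print(state)
```

The variant mentioned at the end of [0099], returning directly to the tracking state, would change only the last line of `next_state`.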

[0100] In this embodiment, an example was shown in which the threshold for the number of subjects to be counted is set to 3 people in advance. However, for example, the user may be provided with a method of registering images of people in the edge AI device 200 in advance, and the number of people registered may be used as the threshold. Furthermore, the inference results of the inference unit 207 may include false detections that affect the aforementioned count of subjects. For example, the inference unit 207 may output a result indicating that a person is present at a location in the image where no person is present. In such cases, the size of the rectangular region calculated from the subject's location information included in the detection result tends to be extremely small. To avoid including such false detections in the count, a process may be added that excludes a detection from the count if the size of the rectangular region calculated from the inferred subject's location information is less than or equal to a predetermined size. Also, depending on the AI model used, the rectangular region in a detection result may be extremely large. In such cases, a threshold may be set to exclude detection results larger than a predetermined size from the count. That is, if the size of the rectangular region calculated from the subject's location information is outside a predetermined range, the detection can be excluded from the count of subjects.
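
As an illustration of the size-based exclusion, the sketch below counts only detections whose rectangle size lies within a range. Measuring size as area in pixels, the function name `count_plausible`, and the threshold values are assumptions chosen for the example.

```python
def count_plausible(boxes: list[tuple[float, float]],
                    min_size: float, max_size: float) -> int:
    """Count (width, height) rectangles whose area is within the allowed range.

    A detection is excluded when its area is less than or equal to min_size
    (likely a false detection) or larger than max_size.
    """
    return sum(1 for w, h in boxes if min_size < w * h <= max_size)


boxes = [
    (40, 80), (42, 85), (38, 78),   # plausible person-sized rectangles
    (3, 4),                         # extremely small: excluded
    (600, 400),                     # extremely large: excluded
]
print(count_plausible(boxes, min_size=500, max_size=50_000))
```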

[0101] Furthermore, regarding the aforementioned method of counting subjects, the inference unit 207 may additionally output a vector representing the visual features of each person region it has inferred in the image, and a counting method using that output may be applied. Specifically, the inference unit 207 can extract feature quantities for each person region, determine whether two regions show different people based on the distance between their feature quantity vectors compared with a predetermined distance, and count the number of people determined to be different to obtain the number of subjects. Furthermore, in this embodiment, an example was described in which subject tracking is stopped when more than a predetermined number of people enter the area. However, if, for example, a second in a competitive sport enters the area, different subject tracking control may be implemented. For example, in order to film the interaction between the second and the athlete, the inference unit 207 may recognize the second, and control may be performed to zoom in on both the second and the athlete.
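
One possible reading of the feature-based count is sketched below. It assumes Euclidean distance, treats two regions as different people when their feature vectors are at least `threshold` apart, and uses a simple greedy comparison against previously accepted vectors; none of these choices is specified in the text, and `count_distinct_people` is a hypothetical helper.

```python
import math


def count_distinct_people(features: list[list[float]], threshold: float) -> int:
    """Greedily count feature vectors that are far from all accepted ones."""
    representatives: list[list[float]] = []
    for vec in features:
        # A vector is a new person only if it is at least `threshold`
        # away from every representative accepted so far.
        if all(math.dist(vec, rep) >= threshold for rep in representatives):
            representatives.append(vec)
    return len(representatives)


features = [
    [0.0, 0.0],
    [0.1, 0.0],   # close to the first vector: treated as the same person
    [1.0, 1.0],
    [0.0, 2.0],
]
print(count_distinct_people(features, threshold=0.5))
```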

[0102] As described above, according to this embodiment, it is possible to switch between tracking a subject and stopping subject tracking based on the number of subjects counted. Specifically, this corresponds to the update to the tracking stop state in S808 of Figure 10(a) and S912 of Figure 10(b). When the edge AI device 200 detects more subjects than expected, continuing the control that captures the desired group of subjects could misidentify the subjects that should be photographed; these processes reduce the resulting unintended control of the shooting direction of the PTZ camera. In other words, according to this embodiment, unintended shooting can be suppressed when automatically photographing subjects.

[0103] <Second Embodiment> In the first embodiment, an example was described in which the edge AI device 200 detects subjects from the video footage captured by the PTZ camera 100 and decides whether to perform or stop group tracking control according to the number of subjects detected. The second embodiment is a modification in which the processing related to the aforementioned decision, which was performed by the edge AI device 200, is performed inside the PTZ camera 100. The following description will focus on the differences from the first embodiment.

[0104] Figure 11 shows an example of the configuration of the shooting system according to the second embodiment. As shown in Figure 11, in this embodiment, the shooting system has a PTZ camera 1100 and a PC 300 connected via a network 400. In this embodiment, the PTZ camera 1100 detects a subject from the image it has captured and automatically tracks the subject by performing pan, tilt, and zoom operations according to the detection result. The PTZ camera 1100 itself plays the role of an imaging control device that controls the image processing unit 1106, image sensor 1107, drive I/F 1108, and drive unit 1109, which will be described later. The PTZ camera 1100 in this embodiment also measures the distance between subjects and switches between the tracking operation and the overhead composition based on that distance. Meanwhile, as in the first embodiment, the PC 300 performs various settings related to shooting and transmits the resulting setting information to the PTZ camera 1100.

[0105] Figure 12 shows an example of the internal configuration of the PTZ camera 1100 and the PC 300 in the imaging system according to the second embodiment. The internal configuration and operation of the PC 300 in this embodiment are generally the same as those of the PC 300 in the first embodiment, so a detailed explanation is omitted. However, in this embodiment, the device with which the PC 300 communicates via the network I/F 304 is the PTZ camera 1100. Also, the image processing unit 1106, image sensor 1107, drive I/F 1108, and drive unit 1109 of the PTZ camera 1100 correspond to an example of the imaging unit. The components from the CPU 1101 to the internal bus 1110 of the PTZ camera 1100 are generally the same as those from the CPU 101 to the internal bus 110 of the PTZ camera 100 in the first embodiment, so a detailed explanation of them is omitted.

[0106] The PTZ camera 1100 has an inference unit 1111. The inference unit 1111 infers the presence or absence of a subject and, if a subject exists, its position, etc., from the image data transferred from the image processing unit 1106 to the RAM 1102. The configuration and inference processing of the inference unit 1111 are generally the same as those of the inference unit 207 in the edge AI device 200 of the first embodiment, so a detailed explanation of them is omitted. Note that the processing of the inference unit 1111 may also be handled by the CPU 1101.

[0107] Next, the operation of each device in the imaging system of this embodiment will be described with reference to Figures 13 to 15. Note that the flowcharts in Figures 13, 14, and 15 correspond to Figures 3, 5, and 6 of the first embodiment, and the processing of each step is generally the same. Therefore, the following description will focus on the processing that differs from the first embodiment.

[0108] Figures 13(a) and 13(b) are flowcharts illustrating the setup process for various settings related to shooting for the automatically selected area in the shooting system of this embodiment. Figure 13(a) shows the operation flowchart of the PTZ camera 1100, and Figure 13(b) shows the operation flowchart of the PC 300. In this embodiment, the PC 300 generates various setting information related to shooting for the automatically selected area based on user operation and transmits it to the PTZ camera 1100. The PTZ camera 1100 then stores the various setting information related to shooting that it received from the PC 300.

[0109] In the flowchart of Figure 13(b) showing the setup operation of the automatically selected area on the PC 300 side, the processes S901 to S904 are generally the same as the processes S101 to S104 in Figure 3(a) of the first embodiment, so their explanation is omitted. In S904, the CPU 301 of the PC 300 determines whether there has been input from the user via the operation unit 306, such as pressing the automatic selection area confirmation button 801. If it determines that there has been input, it exits the loop and proceeds to S905. In S905, the CPU 301 reads the coordinate information indicating the automatically selected area from the RAM 302 and transmits it to the PTZ camera 1100 via the network I/F 304.

[0110] Next, in S801 of the flowchart in Figure 13(a), the CPU 1101 of the PTZ camera 1100 receives, via the network I/F 1105, the coordinate information indicating the automatically selected area sent from the PC 300. In S802, the CPU 1101 writes the received coordinate information to the RAM 1102.

[0111] Figures 14(a) and 14(b) are flowcharts showing the setup process for various settings related to overhead composition shooting in the shooting system of this embodiment. Figure 14(a) shows the operation flowchart of the PTZ camera 1100, and Figure 14(b) shows the operation flowchart of the PC 300. In this embodiment, the PC 300 generates various setting information related to overhead composition shooting based on user operation and transmits it to the PTZ camera 1100. The PTZ camera 1100 then stores the various setting information related to shooting that it received from the PC 300.

[0112] First, the operation on the PC 300 side will be explained with reference to Figure 14(b). The process in S1101 is generally the same as the process in S401 described in Figure 5(b) of the first embodiment, so its explanation is omitted. The subsequent loop process from S1102 to S1103 is also generally the same as the loop process from S402 to S403 in Figure 5(b) of the first embodiment, so its explanation is likewise omitted. In S1103, if the CPU 301 determines that the user has pressed the overhead view composition button 803 via the operation unit 306, it exits the loop and proceeds to S1104. In S1104, the CPU 301 sends a command (called a memory command) from the network I/F 304 to the PTZ camera 1100 instructing it to store the pan, tilt, and zoom values.

[0113] Next, the operation of the PTZ camera 1100 will be explained with reference to Figure 14(a). In S1001, the CPU 1101 of the PTZ camera 1100 receives the memory command sent from the PC 300 via the network I/F 1105. In S1002, the CPU 1101 writes the pan, tilt, and zoom values of its own device at the time it received the memory command from the PC 300 to the RAM 1102 as the values for the overhead composition.

[0114] Figure 15 is a flowchart showing the tracking operation performed by the PTZ camera 1100 after the setup of the automatic selection area and overhead composition described above has been completed in the shooting system of this embodiment. In the shooting system of this embodiment, the PTZ camera 1100 detects the position of the subject from the captured image and performs automatic tracking by panning, tilting, and zooming according to the position of the subject. In addition, the PTZ camera 1100 of this embodiment calculates the distance between subjects from the position of the subjects inferred by the inference unit 1111 and switches between automatic tracking and overhead composition based on that distance between subjects.

[0115] In this embodiment of the PTZ camera 1100, as in the first embodiment, the captured video footage, which is sequentially captured at a predetermined frame rate, is sequentially stored in the internal RAM 1102. The PTZ camera 1100 then detects a subject from the captured video footage stored in the RAM 1102 and performs loop processing to track that subject. The loop processing in S1201 to S1215 of Figure 15 is performed on the captured video footage for each frame.

[0116] In S1201, the CPU 1101 of the PTZ camera 1100 sequentially reads the captured video stored in the RAM 1102 and transfers it to the inference unit 1111. In S1202, the inference unit 1111 detects a subject from the captured video read from the RAM 1102 and writes the inference result information as the detection result back to the RAM 1102. The inference unit 1111 in this embodiment, like the inference unit 207 in the first embodiment, has a trained model created using machine learning techniques such as deep learning. It acquires captured video as input data and outputs the inference result as output data. The inference result, as described above, includes information such as the location, type, and probability of identifying a person (e.g., player or referee). The location information of the subject (person) includes the coordinates of the four vertices of a rectangular area, as well as information such as the width and height of the rectangular area.

[0117] In S1203, the CPU 1101 reads the coordinate information indicating the automatically selected area that was stored in the RAM 1102 in S802 of Figure 13(a). In S1204, the CPU 1101 reads the position information of the rectangular area of each subject from the inference results stored in the RAM 1102 in S1202, and counts the number of subjects present in the automatically selected area based on that position information. The process of counting the number of people present in the automatically selected area is the same as in the first embodiment described above.

[0118] In S1205, the CPU 1101 determines whether the number of subjects counted in S1204 is a predetermined number (3 people in this embodiment). If the CPU 1101 determines that the number of subjects counted is 3, it proceeds to S1206; otherwise, it skips the processes from S1206 to S1212 and proceeds to the next loop process.

[0119] In this embodiment, as in the first embodiment, if the number of subjects becomes less than three in S1205 after it has been determined that there are three subjects and tracking has started, the CPU 1101 may fix the pan, tilt, and zoom values. Subsequently, when two players return to the automatic selection area and the CPU 1101 determines in S1205 that there are three subjects in the automatic selection area, the process proceeds to S1206, and control of the PTZ camera 1100 is resumed.

[0120] In S1206, the CPU 1101 obtains the longest distance between subjects included in the automatically selected area and determines whether this longest distance is greater than or equal to a predetermined distance. The predetermined distance is the same distance threshold as in the first embodiment. In S1206, if the longest distance between subjects is less than the predetermined distance, the CPU 1101 proceeds to S1207.
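
The check in S1206 can be illustrated with a short sketch. It assumes that each subject is represented by the center of its rectangle in image coordinates and that distance is Euclidean; the helper name `longest_distance` and the threshold value are hypothetical.

```python
import math
from itertools import combinations


def longest_distance(centers: list[tuple[float, float]]) -> float:
    """Return the largest pairwise distance among the subject centers."""
    if len(centers) < 2:
        return 0.0
    return max(math.dist(a, b) for a, b in combinations(centers, 2))


centers = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
PREDETERMINED_DISTANCE = 10.0   # hypothetical threshold

longest = longest_distance(centers)
# Below the threshold the subjects are tracked (S1207 onward);
# otherwise the overhead composition is used (S1213 onward).
action = "overhead" if longest >= PREDETERMINED_DISTANCE else "track"
print(longest, action)
```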

[0121] In S1207, the CPU 1101 determines that the three subjects detected within the automatically selected area are the subjects to be tracked, and calculates the centroid position of those three subjects, as in the first embodiment. In S1208, the CPU 1101 determines whether the centroid position calculated in S1207 matches the center position of the field of view in the captured video. If the CPU 1101 determines that the centroid position matches the center position of the field of view, it skips the subsequent processing and proceeds to the next loop. On the other hand, if the CPU 1101 determines that the centroid position does not match the center position of the field of view, it proceeds to S1209.

[0122] In S1209, the CPU 1101 calculates the difference between the centroid position calculated in S1207 and the center position of the field of view in the captured image, and calculates the pan and tilt adjustment amounts corresponding to that difference. The CPU 1101 also calculates the zoom adjustment amount so as to keep the size of the rectangular area of the subject roughly constant. As in the first embodiment, zoom adjustment may instead be based on the size of a body part of a person, such as the face. The size of the rectangular area of the subject may be the size of the rectangular area of one of the subjects present in the automatically selected area, or it may be the average size of the rectangular areas. Alternatively, the zoom adjustment amount may be calculated so that the size of the bounding rectangle encompassing the three subjects remains constant.
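
The calculations in S1207 to S1209 can be sketched as follows. The linear mapping from pixel offset to angle, the field-of-view values, the sign convention (positive tilt when the target is below the image center), and the use of the mean rectangle height for zoom are all assumptions for illustration; the function names are hypothetical.

```python
def centroid(centers: list[tuple[float, float]]) -> tuple[float, float]:
    """Mean position of the subject centers."""
    xs, ys = zip(*centers)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def pan_tilt_adjustment(target: tuple[float, float],
                        frame_size: tuple[int, int],
                        fov_deg: tuple[float, float]) -> tuple[float, float]:
    """Angles (degrees) needed to bring the target to the frame center.

    Uses a linear pixel-to-angle approximation that ignores lens distortion.
    """
    (tx, ty), (width, height), (hfov, vfov) = target, frame_size, fov_deg
    pan = (tx - width / 2) / width * hfov
    tilt = (ty - height / 2) / height * vfov
    return pan, tilt


def zoom_ratio(box_heights: list[float], target_height: float) -> float:
    """Magnification that would bring the mean rectangle height to the target."""
    return target_height / (sum(box_heights) / len(box_heights))


centers = [(900.0, 500.0), (1000.0, 560.0), (1100.0, 620.0)]
target = centroid(centers)
pan, tilt = pan_tilt_adjustment(target, frame_size=(1920, 1080),
                                fov_deg=(60.0, 34.0))
zoom = zoom_ratio([100.0, 120.0, 110.0], target_height=220.0)
print(target, pan, tilt, zoom)
```

When the centroid already coincides with the frame center, both angles are zero, which corresponds to the case in S1208 where no adjustment is needed.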

[0123] In S1210, the CPU 1101 calculates the drive values corresponding to the adjustment amounts in the pan and tilt directions, and also calculates the lens drive direction and drive amount corresponding to the adjustment amount in the zoom direction. In S1211, the CPU 1101 derives (calculates) the drive parameters for the pan, tilt, and zoom operations based on the values calculated in S1210. In S1212, the CPU 1101 controls the drive unit 1109 via the drive I/F 1108 based on the drive parameters derived in S1211. The drive unit 1109 operates based on these drive parameters, causing the PTZ camera 1100 to change its shooting direction (pan and tilt operation) and its field of view. After S1212, the CPU 1101 returns processing to S1201, which is the beginning of the loop processing.

[0124] On the other hand, if the longest distance between subjects is determined in S1206 to be greater than or equal to the predetermined distance, the process proceeds to S1213, where the CPU 1101 reads the pan, tilt, and zoom values corresponding to the overhead composition written in S1002 from the RAM 1102. The CPU 1101 then sets these pan, tilt, and zoom values as the tracking target position. In other words, by setting the pan, tilt, and zoom values written to the RAM 1102 in S1002 as the tracking target position, the composition of the PTZ camera 1100 is switched to the overhead composition.

[0125] In S1214, the CPU 1101 derives drive parameters for panning and tilting in a desired direction and at a desired speed, as well as drive parameters for adjusting the field of view, from the pan, tilt, and zoom values indicating the overhead composition read in S1213. In S1215, the CPU 1101 controls the drive unit 1109 via the drive I/F 1108 based on the drive parameters derived in S1214. As a result, the drive unit 1109 operates based on these parameters, causing the PTZ camera 1100 to change its shooting direction and angle of view. After S1215, the CPU 1101 returns to S1201, which is the beginning of the loop processing. Thus, with the shooting system of this embodiment, the composition and camera work of the PTZ camera can be switched according to the situation in competitive sports, such as at the start of a competition, at the end of a competition, or during a competition.

[0126] <Description of characteristic operation> The characteristic operation of the imaging system, which uses the aforementioned control as its basic operation, will be explained with reference to Figure 16. The characteristic operation in this embodiment, like that of the first embodiment, is the operation in which the PTZ camera 1100 changes its tracking state based on the state transition diagram shown in Figure 9. The state transition conditions are the same as in Figure 9, so their explanation will be omitted. Furthermore, the flowchart in Figure 16 corresponds to Figure 10 of the first embodiment, and the processing of each step is generally the same; therefore, the explanation will focus on the processing that differs from the first embodiment.

[0127] Figure 16(a) is a control flowchart of the PTZ camera 1100 in tracking standby mode (ST101). Detailed explanations of each control are omitted as they are the same as those in Figure 10(a). Figure 16(b) is a control flowchart for the PTZ camera 1100 in tracking mode (ST102). Figure 16(b) is executed after the control related to updating the tracking state in S1306 in Figure 16(a) has been performed. The control flowchart in Figure 16(b) is generally similar to that in Figure 10(b), but the differences will be explained below. The subprocess executed in S1407 corresponds to S1206 to S1212 in Figure 15. In the first embodiment, the process corresponding to S1212 was the transmission of a control command to the PTZ camera 100 by the edge AI device 200 shown in S611 in Figure 6(a), but this is modified so that the CPU 1101 controls the drive unit 1109 of the PTZ camera 1100. Also, the subprocess executed in S1409 corresponds to S1213 to S1215 in Figure 15. Here again, in the first embodiment, the process corresponding to S1215 was the transmission of a control command to the PTZ camera 100 by the edge AI device 200 shown in S614 in Figure 6(a), but this is modified so that the CPU 1101 controls the drive unit 1109 of the PTZ camera 1100.

[0128] Figure 16(c) is a control flowchart for the PTZ camera 1100 in the tracking stop state (ST103). Detailed explanations of each control are omitted as they are the same as those in Figure 10(c).

[0129] As described above, by modifying the system to infer the subject position within the PTZ camera 1100 and control the drive unit 1109 within the device itself, an edge AI device becomes unnecessary, and the same effects as the first embodiment can be obtained even with a simpler configuration.

[0130] The present invention can also be implemented by supplying a program that implements one or more of the functions of the above-described embodiments to a system or device via a network or storage medium, and by having one or more processors in the computer of that system or device read and execute the program. Furthermore, it can also be implemented by a circuit (e.g., an ASIC) that implements one or more functions. The above-described embodiments are merely examples of concrete implementations of the present invention, and the technical scope of the invention should not be interpreted as being limited by them. That is, the present invention can be implemented in various forms without departing from its technical concept or its main features.

[0131] In the embodiments described above, the case where the imaging device is a PTZ camera was explained, but the invention is not limited to this case. It is sufficient that at least one of the pan / tilt direction and zoom value can be changed, and the invention is not limited to a PTZ camera.

[0132] Furthermore, the disclosure of this embodiment includes the following configurations, among others.

(Configuration 1) An imaging control device comprising: an acquisition means for acquiring an image captured by an imaging unit; a control means for controlling the imaging unit to track subjects included in the image based on the image acquired by the acquisition means; and a counting means for counting the number of subjects included in the image acquired by the acquisition means, characterized in that the control means controls the imaging unit to switch between tracking the subjects and stopping the tracking of the subjects based on the number of subjects counted by the counting means.

(Configuration 2) The imaging control device according to configuration 1, characterized in that, when the number of subjects counted by the counting means is greater than a predetermined number, the control means controls the imaging unit to stop tracking the subjects from the state in which it is tracking the subjects.

(Configuration 3) The imaging control device according to configuration 1 or 2, characterized in that, when the number of subjects counted by the counting means is greater than a predetermined number, the control means further controls the imaging unit to set the shooting direction and field of view of the imaging unit to a predetermined shooting direction and field of view.

(Configuration 4) The imaging control device according to any one of configurations 1 to 3, characterized in that, when the number of subjects counted by the counting means falls below a predetermined number from a state in which tracking of the subjects is stopped, the control means controls the imaging unit to start tracking the subjects again.

(Configuration 5) The imaging control device according to any one of configurations 1 to 4, further comprising a measuring means for measuring the distance between subjects based on the image acquired by the acquisition means, characterized in that, when the number of subjects counted by the counting means falls below a predetermined number from a state where it is greater than a predetermined number, the control means controls the imaging unit, based on the distance between subjects measured by the measuring means, to start tracking the subjects from a state where it has stopped tracking the subjects.

(Configuration 6) The imaging control device according to any one of configurations 1 to 4, characterized in that, when the number of subjects counted by the counting means falls from a state where it is greater than a predetermined number to the predetermined number or less, and the distance between subjects measured by the measuring means becomes smaller than a predetermined distance, the control means controls the imaging unit to start tracking the subjects from a state where it has stopped tracking the subjects.

(Configuration 7) The imaging control device according to configuration 5 or 6, characterized in that the distance between subjects is the distance between the two subjects that are furthest apart among the multiple subjects.

(Configuration 8) The imaging control device according to any one of configurations 1 to 4, further comprising a measuring means for measuring the distance between subjects based on the image acquired by the acquisition means, characterized in that, based on the number of subjects counted by the counting means and the distance between subjects measured by the measuring means, the control means controls the imaging unit either to track the subjects or to set the shooting direction and field of view of the imaging unit to a predetermined shooting direction and field of view.

(Configuration 9) The imaging control device according to any one of configurations 2 to 5, further comprising a registration means for registering the number of subjects, characterized in that the predetermined number is the number of subjects registered in the registration means.

(Configuration 10) The imaging control device according to any one of configurations 1 to 9, further comprising a calculation means for calculating the size of a subject based on the image acquired by the acquisition means, characterized in that the counting means excludes a subject from the number of subjects to be counted if the size of the subject calculated by the calculation means is outside a predetermined range.

(Configuration 11) The imaging control device according to any one of configurations 1 to 9, further comprising an extraction means for extracting feature quantities of a subject based on the image acquired by the acquisition means, characterized in that the counting means counts the number of subjects based on the feature quantities extracted by the extraction means.

(Method) An imaging control method comprising: an acquisition step of acquiring an image captured by an imaging unit; a control step of controlling the imaging unit to track subjects included in the image based on the image acquired in the acquisition step; and a counting step of counting the number of subjects included in the image acquired in the acquisition step, characterized in that, in the control step, the imaging unit is controlled to switch between tracking the subjects and stopping the tracking of the subjects based on the number of subjects counted in the counting step.

(Program) A program for causing a computer to function as one of the means of the imaging control device according to any one of configurations 1 to 11.

[Explanation of symbols]

[0133]
100: PTZ camera
200: Edge AI device
201: CPU
207: Inference unit
300: PC
400: Network

Claims

1. An imaging control device comprising: an acquisition means for acquiring an image captured by an imaging unit; a control means for controlling the imaging unit to track a subject included in the image, based on the image acquired by the acquisition means; and a counting means for counting the number of subjects included in the image acquired by the acquisition means, wherein the control means controls the imaging unit to switch between tracking the subjects and stopping the tracking of the subjects based on the number of subjects counted by the counting means.

2. The imaging control device according to claim 1, wherein, if the number of subjects counted by the counting means is greater than a predetermined number, the control means controls the imaging unit to stop tracking the subjects from a state in which it is tracking the subjects.

3. The imaging control device according to claim 2, wherein, if the number of subjects counted by the counting means is greater than the predetermined number, the control means further controls the imaging unit to set the shooting direction and field of view of the imaging unit to a predetermined shooting direction and field of view.

4. The imaging control device according to claim 2, wherein, when the number of subjects counted by the counting means falls below the predetermined number while tracking of the subjects is stopped, the control means controls the imaging unit to start tracking the subjects again.
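As a non-limiting illustration, the count-based switching recited in claims 2 to 4 can be sketched as a small two-state machine. This is a minimal sketch and not the applicant's implementation: the names `Mode`, `CountSwitch`, `limit`, and `update` are hypothetical, and treating a count equal to the predetermined number as sufficient to resume tracking is an assumption about the boundary case.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    TRACKING = "tracking"
    STOPPED = "stopped"  # tracking halted; camera may sit at a preset view


@dataclass
class CountSwitch:
    """Switch between tracking and stopped based on the subject count."""

    limit: int                  # the "predetermined number" (hypothetical name)
    mode: Mode = Mode.TRACKING

    def update(self, count: int) -> Mode:
        if self.mode is Mode.TRACKING and count > self.limit:
            # Claim 2: more subjects than expected, so stop tracking.
            # Claim 3 would also move the camera to a preset direction
            # and field of view at this point.
            self.mode = Mode.STOPPED
        elif self.mode is Mode.STOPPED and count <= self.limit:
            # Claim 4: the count is back within the limit, so resume.
            # Assumption: count == limit is enough to resume.
            self.mode = Mode.TRACKING
        return self.mode
```

Calling `update` once per analysed frame with the current count returns the mode the camera should be in; the caller would translate a change of mode into the corresponding pan, tilt, and zoom commands.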

5. The imaging control device according to claim 4, further comprising a measuring means for measuring the distance between subjects based on the image acquired by the acquisition means, wherein, when the number of subjects counted by the counting means falls below the predetermined number from a state where it is greater than the predetermined number, the control means controls the imaging unit to resume tracking the subjects from a state where tracking has been stopped, based on the distance between subjects measured by the measuring means.

6. The imaging control device according to claim 5, wherein, when the number of subjects counted by the counting means falls below the predetermined number from a state where it is greater than the predetermined number, and the distance between subjects measured by the measuring means falls below a predetermined distance, the control means controls the imaging unit to start tracking the subjects from a state where tracking has been stopped.

7. The imaging control device according to claim 5 or 6, characterized in that the distance between the subjects is the distance between the furthest subjects among the multiple subjects.
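As a non-limiting illustration of claims 5 to 7, the distance between the furthest subjects and the resulting resume condition can be sketched as follows. The sketch assumes each subject is represented by a 2D centre point in image coordinates; the function names, the `limit` and `max_spread` parameters, and the treatment of a count equal to the limit as acceptable are all hypothetical choices rather than details taken from the claims.

```python
from itertools import combinations
from math import dist


def farthest_pair_distance(centers):
    """Largest pairwise distance between subject centres (claim 7).

    Returns 0.0 when fewer than two subjects are present.
    """
    return max((dist(a, b) for a, b in combinations(centers, 2)), default=0.0)


def should_resume(count, centers, limit, max_spread):
    """Resume condition in the spirit of claim 6.

    Tracking restarts only when the count is back within the limit
    (assumption: count == limit qualifies) and the subjects are closer
    together than the predetermined distance `max_spread`.
    """
    return count <= limit and farthest_pair_distance(centers) < max_spread
```

The brute-force pairwise comparison is quadratic in the number of subjects, which is likely acceptable for the small counts typical of a single competition area.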

8. The imaging control device according to claim 1 or 2, further comprising a measuring means for measuring the distance between subjects based on the image acquired by the acquisition means, wherein, based on the number of subjects counted by the counting means and the distance between subjects measured by the measuring means, the control means controls the imaging unit either to track the subjects or to set the shooting direction and field of view of the imaging unit to a predetermined shooting direction and field of view.

9. The imaging control device according to claim 2, further comprising a registration means for registering a number of subjects, wherein the predetermined number is the number of subjects registered in the registration means.

10. The imaging control device according to claim 1 or 2, further comprising a calculation means for calculating the size of a subject based on the image acquired by the acquisition means, wherein the counting means excludes a subject from the count if the size of the subject calculated by the calculation means is outside a predetermined range.
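As a non-limiting illustration of the size-based exclusion in claim 10, the counting step can be sketched as a filter over detected bounding boxes. The sketch assumes that the "size" of a subject is the height of its bounding box in pixels and that boxes are `(x, y, width, height)` tuples; both are assumptions, as are the names `count_subjects`, `min_height`, and `max_height`.

```python
def count_subjects(boxes, min_height, max_height):
    """Count detections whose box height lies within the allowed range.

    Detections outside [min_height, max_height] are excluded from the
    count, as in claim 10. Using box height as the size measure is an
    assumption made for this sketch.
    """
    return sum(1 for (_x, _y, _w, h) in boxes if min_height <= h <= max_height)
```

For example, with `min_height=80` and `max_height=200`, a detection 30 pixels tall (such as a person presumably standing far from the camera) and one 400 pixels tall would both be left out of the count.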

11. The imaging control device according to claim 1 or 2, further comprising an extraction means for extracting feature quantities of a subject based on the image acquired by the acquisition means, wherein the counting means counts the number of subjects based on the feature quantities extracted by the extraction means.

12. An imaging control method comprising: an acquisition step of acquiring an image captured by an imaging unit; a control step of controlling the imaging unit to track a subject included in the image, based on the image acquired in the acquisition step; and a counting step of counting the number of subjects included in the image acquired in the acquisition step, wherein, in the control step, the imaging unit is controlled to switch between tracking the subjects and stopping the tracking of the subjects based on the number of subjects counted in the counting step.

13. A program for causing a computer to function as each of the means of the imaging control device according to claim 1.