Video analysis system, video analysis method, and video analysis program
The video analysis system improves accuracy by preprocessing video data through rotation or target area modification, allowing users to select the best analysis result, enhancing usability and efficiency without changing the AI model.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Patents
- Current Assignee / Owner
- SAFIE INC
- Filing Date
- 2025-06-28
- Publication Date
- 2026-06-18
AI Technical Summary
Existing video analysis systems rely on AI models that require regeneration to maintain accuracy when the camera installation environment changes, but there is a need to improve analysis accuracy without changing the AI model itself.
A video analysis system that preprocesses video data by rotating or modifying the analysis target area before inputting it into the AI model, allowing users to select the most accurate analysis result based on user input, thereby improving analysis accuracy without altering the AI model.
Enhances video analysis accuracy by preprocessing the input video, enabling users to quickly determine the most suitable analysis result, thus improving usability and reducing analysis time.
Abstract
Description
【Technical Field】 【0001】 The present disclosure relates to a video analysis system, a video analysis method, and a video analysis program. 【Background Art】 【0002】 In recent years, digital transformation (DX) of work sites using video data has been progressing in various industries. In particular, video data showing the work site is analyzed by using an AI model (specifically, an object detection model) constructed by machine learning or deep learning. Improving the analysis accuracy of video data is an important factor in the development of DX at the work site, and the analysis accuracy depends on three factors: the attributes of the camera, the installation environment of the camera, and the AI model. In this regard, Patent Document 1 discloses a technique for regenerating an AI model when the installation environment of the camera (particularly, the installation angle of the camera) changes in order to ensure the analysis accuracy of video data. 【Prior Art Documents】 【Patent Documents】 【0003】 【Patent Document 1】 Japanese Patent No. 7159503 【Summary of the Invention】 【Problems to be Solved by the Invention】 【0004】 In the system disclosed in Patent Document 1, the analysis accuracy of video data is ensured by changing the AI model according to changes in the installation environment of the camera. However, there is a demand to ensure the analysis accuracy of video data without changing the AI model itself. In this regard, a method of improving the analysis accuracy of video by preprocessing the video data input to the AI model without changing the AI model can be considered. Furthermore, when adopting such a method, there is room for further study of the usability of the video analysis system provided to the user. 
【0005】 In view of the above, this disclosure aims to provide a video analysis system, a video analysis method, and a video analysis program that can improve the accuracy of video analysis without changing the AI model. [Means for solving the problem] 【0006】 A video analysis system according to one aspect of the present disclosure is a video analysis system that performs video analysis on video captured by a camera by inputting the video into an AI model, the system generates a second video by performing video preprocessing on a first video, outputs a first analysis result of the first video by inputting the first video into the AI model, outputs a second analysis result of the second video by inputting the second video into the AI model, presents the first and second analysis results to the user, selects either the first or second analysis result according to the user's input operation, and performs video analysis on the video corresponding to the selected analysis result by inputting the video to the AI model. 【0007】 According to the above configuration, the user is presented with the first analysis result of the first video and the second analysis result of the second video after preprocessing. The user then selects either the first or second analysis result through user input. Subsequently, the video corresponding to the selected analysis result is input to the AI model, and video analysis is performed on that video. In this way, the accuracy of video analysis can be improved by preprocessing the video input to the AI model without changing the AI model itself. 【0008】 Furthermore, the video analysis system may generate the second video by rotating the first video to a predetermined angle. 【0009】 With the above configuration, the accuracy of video analysis can be improved by applying rotation processing to the video input to the AI model. 
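The rotation preprocessing of 【0008】 can be illustrated with a minimal sketch. It assumes a frame is held as a plain 2D list of pixel values and uses nearest-neighbour sampling about the frame center; the function name `rotate_frame` and its parameters are hypothetical and do not appear in the disclosure, which does not specify how the rotation is implemented.

```python
import math


def rotate_frame(frame, angle_deg, fill=0):
    """Rotate a 2D frame (list of rows) counterclockwise about its center.

    Assumption: nearest-neighbour sampling, output keeps the input size,
    and output pixels with no source pixel are set to `fill`.
    """
    h, w = len(frame), len(frame[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: locate the source pixel for each output pixel
            # (image coordinates, y axis pointing down).
            dx, dy = x - cx, y - cy
            sx = cos_t * dx - sin_t * dy + cx
            sy = sin_t * dx + cos_t * dy + cy
            ix, iy = round(sx), round(sy)
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = frame[iy][ix]
    return out
```

Applying such a function to every frame of the first video would yield the second video; a production system would more likely rely on an image-processing library with interpolation rather than this pure-Python loop.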
【0010】 Alternatively, the second video may be generated by using a portion of the analysis target area of the first video as the analysis target area of the second video. 【0011】 With the above configuration, the accuracy of video analysis can be improved by changing the analysis target area of the video input to the AI model. 【0012】 Furthermore, the first analysis result may be an image showing the detection status of an object within the first video. The second analysis result may be an image showing the detection status of an object within the second video. 【0013】 With the above configuration, the user can compare the detection status of objects in the first video with the detection status of objects in the second video to determine which of the two videos is more suitable for video analysis (more specifically, the video that will produce more accurate analysis results). 【0014】 Furthermore, the first analysis result may be numerical information indicating the detection status of an object in the first video. The second analysis result may also be numerical information indicating the detection status of an object in the second video. 【0015】 With the above configuration, the user can compare numerical information indicating the detection status of an object in the first video with numerical information indicating the detection status of an object in the second video to determine which of the two videos is more suitable for video analysis (more specifically, the video that will produce higher accuracy in the analysis results). 【0016】 Furthermore, the video analysis system may present the first and second analysis results to the user in a manner that allows the user to visually confirm that the accuracy of one of the first and second analysis results is better than the accuracy of the other analysis result. 【0017】 With the above configuration, users can quickly determine which of the first or second video is more suitable for video analysis. 
【0018】 The video analysis system may display the first analysis result and the second analysis result in a chat format. 【0019】 With the above configuration, users can determine which of the two videos is more suitable for video analysis by looking at the first and second analysis results displayed in a chat format. In this way, it is possible to provide a video analysis system that can dramatically improve usability. 【0020】 Furthermore, the video analysis system may output a first analysis result of the first video by inputting a portion of the first video into the AI model, and output a second analysis result of the second video by inputting a portion of the second video into the AI model. 【0021】 With the above configuration, it is possible to determine which of the two videos is more suitable for video analysis based on the first analysis result obtained from a portion of the first video and the second analysis result obtained from a portion of the second video. Therefore, it is possible to streamline video analysis using the AI model and reduce the time required for such video analysis. 【0022】 The video analysis system may perform video analysis on the entire selected video by inputting the entire selected video into the AI model. 【0023】 According to the above configuration, the user determines which of the two videos is more suitable for video analysis based on the first analysis result of the first video and the second analysis result of the second video. Subsequently, video analysis is performed on the entire video selected through the user's input. In this way, the user can confirm which of the two videos is more suitable for video analysis before performing video analysis on the entire video. Therefore, video analysis using the AI model can be made more efficient, and the time required for such video analysis can be reduced. 
【0024】 The video analysis system may present information regarding the installation of the camera to the user based on the first analysis result and the second analysis result. 【0025】 According to the above configuration, the user can consider the optimal installation angle and position of the camera for improving the analysis accuracy based on the information regarding the installation of the camera. 【0026】 A video analysis system according to an aspect of the present disclosure is a video analysis system that performs video analysis on a video captured by a camera by inputting the video into an AI model, generates a second video by performing video preprocessing on a first video, outputs a first analysis result of the first video by inputting the first video into the AI model, outputs a second analysis result of the second video by inputting the second video into the AI model, calculates the accuracy of the first analysis result and the accuracy of the second analysis result, automatically selects either one of the first analysis result and the second analysis result based on a comparison between the accuracy of the first analysis result and the accuracy of the second analysis result, and performs video analysis on the video corresponding to the selected analysis result by inputting the video into the AI model. 【0027】 According to the above configuration, based on the comparison between the accuracy of the first analysis result and the accuracy of the second analysis result, either one of the first analysis result and the second analysis result is automatically selected, and video analysis on the video corresponding to the selected analysis result is performed by inputting the video into the AI model. Thus, it is possible to improve the accuracy of video analysis through preprocessing of the video input into the AI model without involving the user's visual judgment. 
【0028】 Furthermore, the first analysis result may be first numerical information indicating the detection status of an object in the first video. The second analysis result may be second numerical information indicating the detection status of an object in the second video. The video analysis system may acquire third numerical information indicating the correct value for an object shown in the first video through user input operations, calculate the accuracy of the first analysis result based on the first numerical information and the third numerical information, and calculate the accuracy of the second analysis result based on the second numerical information and the third numerical information. 【0029】 With the above configuration, third numerical information is obtained through user input operations, and the accuracy of the first and second analysis results can be calculated based on this third numerical information. 【0030】 A video analysis method according to one aspect of the present disclosure is a video analysis method that performs video analysis on video captured by a camera by inputting the video into an AI model, and includes the steps of: generating a second video by performing video preprocessing on a first video; outputting a first analysis result of the first video by inputting the first video into the AI model; outputting a second analysis result of the second video by inputting the second video into the AI model; presenting the first analysis result and the second analysis result to a user; selecting either the first analysis result or the second analysis result in response to the user's input operation; and performing video analysis on the video corresponding to the selected analysis result by inputting the video into the AI model. 【0031】 According to the above, the accuracy of video analysis can be improved by preprocessing the video input to the AI model without changing the AI model itself. 
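The automatic selection of 【0026】 to 【0029】 can be sketched as follows. The disclosure does not state how accuracy is computed from the numerical information, so the metric here (one minus the relative count error, averaged over attributes) is an assumption, as are the names `count_accuracy` and `select_result`. The dictionaries stand for the first, second, and third numerical information, keyed by object attribute.

```python
def count_accuracy(predicted, correct):
    """Mean per-attribute accuracy of predicted counts against correct counts.

    Assumed metric: 1 - |predicted - correct| / correct, clipped at 0.
    """
    scores = []
    for attribute, true_count in correct.items():
        pred = predicted.get(attribute, 0)
        if true_count == 0:
            # No relative error is defined; treat an exact match as perfect.
            scores.append(1.0 if pred == 0 else 0.0)
        else:
            scores.append(max(0.0, 1.0 - abs(pred - true_count) / true_count))
    return sum(scores) / len(scores) if scores else 0.0


def select_result(first, second, correct):
    """Pick the analysis result whose counts are closer to the correct values."""
    acc_first = count_accuracy(first, correct)
    acc_second = count_accuracy(second, correct)
    # Ties keep the first (unprocessed) video.
    chosen = "second" if acc_second > acc_first else "first"
    return chosen, acc_first, acc_second
```

For example, with correct counts of 10 cars and 2 trucks, a first result of 8 cars and 1 truck scores lower than a second result of 10 cars and 2 trucks, so the preprocessed video would be selected for the full analysis.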
【0032】 Furthermore, a video analysis program may be provided that causes the video analysis system to execute the aforementioned video analysis method. [Effects of the Invention] 【0033】 According to this disclosure, it is possible to provide a video analysis system, a video analysis method, and a video analysis program that can improve the accuracy of video analysis without changing the AI model. [Brief explanation of the drawing] 【0034】 [Figure 1] This figure shows a video analysis system according to an embodiment of the present disclosure (hereinafter referred to as "this embodiment"). [Figure 2] This figure shows an example of a camera's hardware configuration. [Figure 3] This figure shows an example of a server hardware configuration. [Figure 4] This figure shows an example of the hardware configuration of an information terminal. [Figure 5] (a) is a diagram showing the image before rotation (original image). (b) is a diagram showing the image after rotation. [Figure 6] (a) is a diagram showing a state without distortion (a state in which no distortion aberration occurs). (b) is a diagram showing a state in which barrel distortion occurs. (c) is a diagram showing a state in which pincushion distortion occurs. [Figure 7A] This is a flowchart (part 1) illustrating the video analysis method according to this embodiment. [Figure 7B] This is a flowchart (part 2) illustrating the video analysis method according to this embodiment. [Figure 8] This figure shows an example of the video analysis settings screen. [Figure 9] This figure shows an example of the pre-analysis results screen. [Figure 10] This figure shows an example of a pre-analysis results table. [Figure 11] This is a flowchart illustrating the video analysis method according to the first modified example of this embodiment. [Figure 12] This figure shows an example of a video chat screen. 
[Figure 13] This is a flowchart illustrating a video analysis method according to a second modified example of this embodiment. [Modes for carrying out the invention] 【0035】 (Configuration of Video Analysis System 1) The video analysis system 1 according to this embodiment will be described below with reference to the drawings. Figure 1 is a diagram of the video analysis system 1 according to this embodiment. As shown in Figure 1, the video analysis system 1 comprises a camera 3, a server 4, and an information terminal 6. Each of these elements is connected to a communication network 7. The communication network 7 consists of at least one of the following: LAN (Local Area Network), WAN (Wide Area Network), the Internet, and a wireless core network. Note that the elements provided in the video analysis system 1 are not limited to these elements, and other elements not shown in Figure 1 may be further provided in the video analysis system 1. Also, the number of cameras 3 and information terminals 6 provided in the video analysis system 1 is not particularly limited. 【0036】 (Camera 3 configuration) Next, the hardware configuration of camera 3 will be described below with reference to Figure 2. Figure 2 is a diagram showing an example of the hardware configuration of camera 3. Camera 3 is configured to capture video showing its surrounding environment. Camera 3 is a network-type surveillance camera that monitors the surrounding environment and is communicably connected to server 4 via communication network 7. Each camera 3 can transmit video data (video stream) to server 4 in real time via communication network 7. Server 4 stores the video data transmitted by each camera 3. Furthermore, server 4 transmits video data to information terminal 6 in response to a transmission request from information terminal 6. 
【0037】 As shown in Figure 2, the camera 3 comprises a control unit 31, a storage device 32, a position information acquisition unit 33, a communication unit 34, an input operation unit 35, an imaging unit 36, a PTZ mechanism 37, and a power supply circuit (not shown). These elements are connected to the bus 30. The camera 3 may also be equipped with a battery (not shown), a microphone, and a speaker. 【0038】 The control unit 31 includes memory and a processor. The memory is configured to store computer-readable instructions (programs). For example, the memory consists of ROM and RAM. The processor consists of at least one of the following: CPU, MPU, GPU, FPGA, and ASIC. The storage device 32 is a storage device such as an HDD, SSD, or flash memory, and is configured to store programs and various data. The position information acquisition unit 33 is configured to acquire position information (longitude and latitude) of the camera 3, and is, for example, a GPS receiver that receives signals from GPS satellites. 【0039】 The communication unit 34 is configured to connect the camera 3 to the communication network 7. The communication unit 34 includes, for example, a wireless communication module and / or a wired communication module. The input operation unit 35 is configured to receive input operations from the operator and to generate operation signals corresponding to the operator's input operations. The imaging unit 36 is configured to image the surrounding environment of the camera 3. In particular, the imaging unit 36 is configured to generate a video signal showing the surrounding environment of the camera 3 and comprises an optical system, an image sensor, and an image processing circuit. The optical system includes, for example, an optical lens and an optical filter. The image sensor is composed of a CCD (Charge-Coupled Device), a CMOS (Complementary MOS) sensor, or the like. 
The image processing circuit is configured to process the video signal photoelectrically converted by the image sensor and includes, for example, an amplifier and an AD converter. 【0040】 The PTZ mechanism 37 comprises a pan mechanism, a tilt mechanism, and a zoom mechanism. The pan mechanism is configured to change the orientation of the camera 3 in the horizontal direction. The tilt mechanism is configured to change the orientation of the camera 3 in the vertical direction. The zoom mechanism is configured to enlarge (zoom in) or reduce (zoom out) the image showing the object being captured by changing the angle of view of the camera 3. The zoom mechanism may optically change the angle of view of the camera 3 by changing the focal length of the optical lens included in the imaging unit 36, or it may digitally change the angle of view of the camera 3. In response to an input operation by user U to the information terminal 6, instruction signals instructing pan, tilt, and / or zoom may be transmitted from the information terminal 6 to the camera 3 via the server 4. In this case, the control unit 31 drives the PTZ mechanism 37 in response to the received instruction signals to realize the pan, tilt, and zoom functions of the camera 3 in real time. 【0041】 (Server 4 configuration) Next, the hardware configuration of Server 4 will be described below with reference to Figure 3. Figure 3 is a diagram showing an example of the hardware configuration of Server 4. In this embodiment, Server 4 may be configured as a web server, an application server, and a video data server. Server 4 is configured to transmit data for the display screen shown on the information terminal 6 (e.g., HTML files, CSS files, image / video files, program files, etc.). Server 4 functions as a server for providing SaaS (Software as a Service). Server 4 may be built on-premises or as a cloud server. Server 4 is communicably connected to the camera 3 and the information terminal 6 via the communication network 7. 
【0042】 As shown in Figure 3, the server 4 comprises a control unit 41, a storage device 42, an input / output interface 43, and a communication unit 44. These elements are connected to a bus 40. The control unit 41 includes memory and a processor. The memory is configured to store computer-readable instructions. In particular, the memory may store a video analysis program that causes the processor to perform a series of processes to be executed by the server 4. The memory consists of ROM and RAM. The processor consists of at least one of a CPU, MPU, GPU, FPGA, and ASIC. 【0043】 The storage device 42 is, for example, a storage device such as an HDD, SSD, or flash memory, and is configured to store various programs and various data. The storage device 42 stores video data captured by each camera 3, various management tables, and various AI models (especially object detection models). The input / output interface 43 is an interface that enables connection between an external device and the server 4. The communication unit 44 may include various wired communication modules for communicating with external devices on the communication network 7. 【0044】 (Configuration of information terminal 6) Next, the hardware configuration of the information terminal 6 will be described below with reference to Figure 4. Figure 4 is a diagram showing an example of the hardware configuration of the information terminal 6. The information terminal 6 is operated by user U, who analyzes video data. The information terminal 6 may be, for example, a personal computer, a smartphone, a tablet, or a wearable device (e.g., AR / VR / MR glasses) worn by user U. The information terminal 6 has a web browser. The video analysis application provided by server 4 may run on the web browser of the information terminal 6. Alternatively, the video analysis application may be downloaded from server 4 and then installed on the information terminal 6. 
【0045】 As shown in Figure 4, the information terminal 6 comprises a control unit 61, a storage device 62, an input / output interface 63, a communication unit 64, an input operation unit 65, a display unit 66, and an audio input / output unit 67. These elements are connected to the bus 60. 【0046】 The control unit 61 includes memory and a processor. The memory is configured to store computer-readable instructions. In particular, the memory may store a program that causes the processor to execute a series of processes performed by the information terminal 6. The memory consists of ROM and RAM. The processor consists of at least one of the following: CPU, MPU, GPU, FPGA, and ASIC. The storage device 62 is a storage device such as an HDD, SSD, or flash memory, and is configured to store programs and various data. 【0047】 The input / output interface 63 is an interface that enables connection between an external device and the information terminal 6. The communication unit 64 is configured to connect the information terminal 6 to the communication network 7. The communication unit 64 includes, for example, a wireless communication module and / or a wired communication module. The input operation unit 65 is, for example, a touch panel, mouse, and / or keyboard placed on top of the video display of the display unit 66, and is configured to receive input operations from user U and generate operation signals corresponding to those input operations. The display unit 66 is composed of, for example, a video display and a video display circuit that drives and controls the video display. The audio input / output unit 67 includes a signal processing circuit, an analog processing circuit, a microphone, and a speaker. 
【0048】 (Regarding methods for improving the accuracy of video analysis through video preprocessing) Generally, the accuracy of analyzing video data using an AI model (specifically, an object detection model) depends on three factors: camera attributes, camera installation environment, and the AI model itself. In this regard, a challenge is that the accuracy of analyzing video data using an AI model can decrease depending on the camera attributes and installation environment. For example, while the analysis accuracy may be high when analyzing video captured by a camera with predetermined attributes using a predetermined AI model, the analysis accuracy may be low when analyzing video captured by a camera with different attributes using the same predetermined AI model. 【0049】 On the other hand, the video analysis method according to this embodiment makes it possible to improve the accuracy of video analysis by preprocessing the video data input to the AI model (for example, rotation processing or processing to change the area to be analyzed) without changing or regenerating the AI model itself. In particular, the video analysis system 1 according to this embodiment makes it possible to dramatically improve usability when improving the accuracy of video analysis without changing the AI model. 【0050】 (Regarding rotation processing) One method of preprocessing video to improve analysis accuracy is rotation (see Figure 5). Figure 5(a) shows the video before rotation (original video). Figure 5(b) shows the video after rotation. As shown in Figure 5, the angle θ between the detected object (car) and the horizontal axis of the video in the rotated video is larger than the angle θ between the detected object (car) and the horizontal axis of the video in the video before rotation. If this angle θ is not an appropriate value, the detection accuracy of the object by the AI model may decrease. 
Therefore, as shown in Figure 5(b), by rotating the video captured by the camera counterclockwise, it is possible to increase this angle θ and improve the detection accuracy of the object by the AI model. By inputting the rotated video into the AI model, it is possible to improve the accuracy of the video data analysis results (for example, the number of objects of each attribute passing through the detection lines set in the video) without changing the AI model itself. 【0051】 An AI model is a pre-trained model configured to detect each object contained within a video. The AI model may be built on a neural network architecture, including a convolutional neural network (CNN). 【0052】 A concrete example of AI model processing is that it extracts input video data frame by frame in a time series, and preprocessing (e.g., resizing, normalization, noise reduction) is performed on each frame. Then, the AI model extracts visual features (edges, textures, shapes, etc.) from each frame using a CNN or similar algorithm, and subsequently generates bounding boxes (rectangular regions) surrounding objects on each frame using an object detection algorithm (e.g., YOLO, SSD, Faster R-CNN, etc.). Furthermore, the AI model calculates a confidence score (an index value similar to probability) for each bounding box, and if the score is above a predetermined threshold, it can be determined that a predetermined object (e.g., a car, truck, bicycle, etc.) exists within that bounding box. 【0053】 Furthermore, the AI model may be constructed through supervised learning using training data that includes image data and annotation information related to object detection (labels, bounding boxes, etc., added to identify and detect objects). Such training data is created by performing annotation work on the image data. 【0054】 Server 4 may also use an AI model to count the number of objects that pass through predefined detection lines in the video data, categorized by attribute. 
To count the number of objects that pass through the detection lines by attribute, first, the video data is input to the AI model (object detection model), and the detection position (bounding box), class label (attribute), and confidence score of each object are output for each frame. Next, a tracking process (e.g., Kalman filter and ID assignment process based on appearance features) assigns a unique identification number (ID) to each object, and the time-series movement of the objects is tracked. Based on the positional information of each object, such as its center point, it is determined whether or not each object has passed through the detection line (passage determination process). Finally, objects that are determined to have passed through the detection line are classified according to the outputted attribute (car, truck, bicycle, etc.) and counted by attribute (counting process). 【0055】 As an example of the results of analyzing video data, in order to obtain information on the number of objects of each attribute passing through the detection line in the video, object detection processing, tracking processing, passage determination processing, and counting processing are performed using an AI model. 【0056】 Server 4 may also use an AI model to count, by attribute, the number of objects entering a predefined detection area in the video data. In this case, object detection processing, tracking processing, entry determination processing, and counting processing using the AI model are executed. In the entry determination processing, it is determined whether or not each object has entered the detection area. 【0057】 (Change in the analysis target area) Another method of preprocessing video to improve analysis accuracy is to modify the video's analysis area. In particular, narrowing the video's analysis area can improve the accuracy of object detection by the AI model. 
Here, the video's analysis area refers to the area where the object is detected (in other words, the area to which a bounding box is assigned). 【0058】 In this regard, depending on the characteristics of the lens of camera 3, distortion may occur in the image captured by camera 3. Figure 6(a) shows a state without distortion (a state in which no distortion occurs). Figure 6(b) shows a state in which barrel distortion occurs. Figure 6(c) shows a state in which pincushion distortion occurs. In the barrel distortion shown in Figure 6(b), straight lines appear to be curved outwards from the center to the periphery of the image. This type of barrel distortion is likely to occur in images captured by cameras equipped with wide-angle lenses or fisheye lenses. On the other hand, in the pincushion distortion shown in Figure 6(c), straight lines appear to be curved inwards from the center to the periphery of the image, as if being pulled inwards. This type of pincushion distortion is likely to occur in images captured by cameras equipped with telephoto lenses or zoom lenses. 【0059】 If an object is present in a distorted area of the camera image, there is a possibility that the object may not be detected correctly or that its attributes may be misidentified. Alternatively, an object, such as a car, may be detected in the distorted area even though it is not actually present. Thus, when distortion occurs in the image, the accuracy of object detection decreases, and as a result, the accuracy of image analysis may also decrease. 【0060】 Therefore, when distortion occurs in the captured image (especially when the image is captured with a camera using a lens that causes distortion), it is preferable to make the analysis target areas 82 and 83 of the image smaller than the normal analysis target area 80 (see Figure 6(a)) which is assumed to be distortion-free, as shown in Figures 6(b) and 6(c). 
In other words, by excluding the peripheral parts of the image that are susceptible to distortion from the analysis target area, a decrease in detection accuracy can be prevented, and the accuracy of object detection by the AI model can be improved. 【0061】 Methods for reducing the area to be analyzed include removing unnecessary areas from the original video by trimming, and disabling areas outside the scope of analysis by masking. Thus, in this embodiment, methods such as rotating the video and changing the area to be analyzed are considered as preprocessing to improve the accuracy of the analysis. In the following description, it will be assumed that video rotation is performed as preprocessing of the video. 【0062】 (Video analysis method according to this embodiment: A series of processing steps performed by the video analysis system 1) Next, the video analysis method according to this embodiment will be described below, mainly with reference to Figures 7A and 7B. Figure 7A is a flowchart (part 1) for explaining the video analysis method according to this embodiment. Figure 7B is a flowchart (part 2) for explaining the video analysis method according to this embodiment. The series of processes shown in Figure 7B is a continuation of the series of processes shown in Figure 7A. 【0063】 As shown in Figure 7A, in step S1, the information terminal 6 sends a request to the server 4 to send the job registration screen 90 (see Figure 8). In response to the request, the server 4 (more specifically, the control unit 41) sends data (e.g., an HTML file, a CSS file, an image file, a program file, etc.) for displaying the job registration screen 90 to the information terminal 6 (step S2). Next, the information terminal 6 displays the job registration screen 90 on a web browser (step S3). 
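The two ways of reducing the analysis target area named in 【0061】, trimming and masking, can be sketched as below. Frames are again assumed to be plain 2D lists of pixel values; the helper names `trim`, `mask_outside`, and `central_region`, and the idea of expressing the excluded periphery as a `margin_ratio`, are illustrative assumptions rather than part of the disclosure.

```python
def central_region(width, height, margin_ratio):
    """Rectangle (left, top, right, bottom) that drops `margin_ratio`
    of the frame on every side, e.g. to exclude a distorted periphery."""
    mx, my = int(width * margin_ratio), int(height * margin_ratio)
    return mx, my, width - mx, height - my


def trim(frame, left, top, right, bottom):
    """Trimming: return only rows top..bottom-1 and columns left..right-1."""
    return [row[left:right] for row in frame[top:bottom]]


def mask_outside(frame, left, top, right, bottom, fill=0):
    """Masking: keep the frame size but overwrite pixels outside the
    rectangle with `fill` so they cannot contribute detections."""
    return [
        [px if (top <= y < bottom and left <= x < right) else fill
         for x, px in enumerate(row)]
        for y, row in enumerate(frame)
    ]
```

Trimming changes the frame size, so any detection area or detection line coordinates would need to be shifted accordingly, whereas masking preserves the original coordinate system.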
【0064】 As shown in Figure 8, the job registration screen 90 has an area 93 for selecting an AI model, an area 94 for selecting or uploading video to be analyzed, and an area 95 for selecting or uploading a detection area. The job registration screen 90 also has an area 97 for setting video analysis options and a button 98 for completing job registration. 【0065】 In step S4, the information terminal 6 accepts the selection of an AI model (particularly an object detection model) in response to user U's input operation for region 93. Server 4 stores multiple types of AI models, and user U selects one of these multiple AI models. Next, in step S5, the information terminal 6 accepts the selection of a video to be analyzed in response to user U's input operation for region 94. Server 4 stores videos captured by multiple cameras, and user U selects one of these videos as the video to be analyzed. 【0066】 In step S6, the information terminal 6 accepts the selection of a detection area in response to user U's input operation for region 95. The detection area is set in the video. The server 4 counts the number of objects that enter the detection area set in the video. Alternatively, a detection line may be selected instead of a detection area. Next, in step S7, the information terminal 6 accepts the selection of option settings related to video analysis in response to user U's input operation for region 97. Option settings may include the size of the AI model, the algorithm, the skip rate, etc. 【0067】 Next, when user U operates button 98, the information terminal 6 sends job registration information regarding the AI model, video, detection area, and options selected by user U to the server 4 (step S8). The server 4 then stores the transmitted job registration information (step S9). 
【0068】 Before performing video analysis on the entire video selected by user U based on the job registration information, server 4 performs video analysis on a portion of the selected video (an example of the first video) and outputs an analysis result (an example of the first analysis result) (step S10). Specifically, server 4 inputs the portion of the selected video into the AI model and outputs, as the analysis result, a video with bounding boxes surrounding the objects shown in the video (i.e., a video showing the detection status of the objects). The analysis result may be a video showing the detection status of objects that have entered a detection area set in the video, or a video showing the detection status of objects that have passed through a detection line set in the video. The analysis result may also be information on the number of objects of each attribute (cars, motorcycles, etc.) that have entered the detection area, or information on the number of objects of each attribute that have passed through the detection line. 【0069】 If the entire video is several hours long, the "portion of the video" may be a video of several tens of seconds or several minutes. In this embodiment, video analysis of a portion of the video is performed before video analysis of the entire video is performed. 【0070】 Next, in step S11, server 4 generates a rotated video by performing a rotation process on the selected video. Specifically, server 4 generates a rotated video (an example of a second video) by rotating the selected video by a predetermined angle. The "predetermined angle" may be specified by user U or automatically determined by server 4. In this step, the rotation process may be performed only on the portion of the video that was the subject of the prior video analysis. 【0071】 In step S12, server 4 performs video analysis on a portion of the rotated video and outputs an analysis result (an example of a second analysis result). 
Specifically, server 4 inputs a portion of the rotated video into an AI model and outputs an analysis result of a video with bounding boxes surrounding the objects shown in the rotated video. The analysis result may also be information regarding the number of objects of each attribute that entered the detection area set in the rotated video. Here, it is assumed that the playback time intervals of the portion of the video before rotation and the playback time intervals of the portion of the video after rotation coincide. 【0072】 In step S13, server 4 sends data to information terminal 6 to display a pre-analysis results screen 100 (see Figure 9) which shows multiple analysis results (in this example, the analysis results of the video before rotation and the analysis results of the video after rotation). Subsequently, information terminal 6 displays the pre-analysis results screen 100 on a web browser (step S14 in Figure 7B). 【0073】 As shown in Figure 9, the pre-analysis results screen 100 has an area 112 where rotation processing recommendations are displayed, an area 115 where the analysis target area is displayed, and an area 125 where correction processing (such as noise reduction) recommendations are displayed. The pre-analysis results screen 100 also has a button 128 for starting video analysis on the entire video. 【0074】 Area 112 displays the pre-rotation image 113 with bounding boxes BB and the post-rotation image 114 with bounding boxes BB. As shown in Figure 7B, in step S15, the information terminal 6 selects one of the two analysis results (image 113 and image 114) in response to the user U's input to the selection button 120 provided in area 112. In particular, the user U compares the pre-rotation image 113 with bounding boxes BB and the post-rotation image 114 to determine which of the two has higher accuracy. If the post-rotation image 114 has higher accuracy, the user U may select the post-rotation image 114. 
In this case, the user U operates the selection button 120. On the other hand, if the user U selects the pre-rotation image 113, the user U does not operate the selection button 120. 【0075】 Furthermore, the pre-rotation video 113 and the post-rotation video 114 may be displayed on the area 112 so that user U can visually confirm that the analysis accuracy of one of the pre-rotation video 113 and the post-rotation video 114 is better than the analysis accuracy of the other video. In this regard, if the analysis accuracy of the post-rotation video 114 is better than that of the pre-rotation video 113, a mark (for example, a circle or an X) indicating that the analysis accuracy of the post-rotation video 114 is higher may be shown in the video 114. In this way, by checking the mark, user U can quickly determine which of the pre-rotation video 113 and the post-rotation video 114 is more suitable for video analysis. 【0076】 Next, in step S16, the information terminal 6 accepts the selection of the analysis target area in response to the user U's input operation to the selection button 130 provided in area 115. Specifically, user U selects one of the two guide frames 117 and 118 displayed in area 115. The area enclosed by guide frames 117 and 118 becomes the analysis target area. Guide frame 117 is positioned along the outer edge of the image 113 positioned on the left. On the other hand, guide frame 118 is located inside the outer edge of the image 113 positioned on the right. Thus, the analysis target area indicated by guide frame 118 is smaller than the analysis target area indicated by guide frame 117. If user U selects the analysis target area indicated by guide frame 118, they operate the selection button 130. On the other hand, if user U selects the analysis target area indicated by guide frame 117, they do not operate the selection button 130. 【0077】 The analysis area indicated by the guide frame 118 corresponds to the analysis area in the initial settings. 
The size of the guide frame 117 (i.e., the size of the analysis area) may be automatically determined according to the lens characteristics (lens distortion characteristics) of the camera 3 that captured the selected image. In particular, the size of the guide frame 117 may be set to decrease as the distortion aberration occurring in the lens of the camera 3 increases. Furthermore, the user U may appropriately change the size and shape of the guide frame 117 through input operations on the guide frame 117 (e.g., drag and drop operations). In this way, the information terminal 6 may determine the analysis area indicated by the guide frame 117 according to the input operations of the user U. 【0078】 Next, in step S17, the information terminal 6 accepts the selection of an image correction process in response to the user U's input operation on the multiple checkboxes 127 provided in area 125. Specifically, the user U selects at least one of the various image correction processes (noise reduction, shadow / light effect correction, image stabilization, etc.) shown in area 125. If the user U does not select any of the checkboxes 127, no image correction process is selected. 【0079】 Next, when user U operates button 128, the information terminal 6 transmits the selected analysis result, analysis target area, and video correction processing information selected by user U to the server 4 (step S18). The server 4 then stores the transmitted selection information (step S19). 【0080】 In step S20, server 4 outputs the analysis results by performing video analysis on the entire video corresponding to the analysis result selected by user U. If the analysis result selected by user U is the rotated video 114, video analysis is performed on the entire rotated video 114. On the other hand, if the analysis result selected by user U is the unrotated video 113, video analysis is performed on the entire unrotated video 113. 
【0081】 Specifically, Server 4 inputs the entire video footage into the AI model and outputs the video as an analysis result, with bounding boxes added to surround the objects shown in the video. The analysis result may also contain information about the number of objects of each attribute (such as cars and motorcycles) that entered the detection area set in the video. 【0082】 In step S21, server 4 sends data to information terminal 6 to display an analysis results screen (not shown) showing the analysis results output in step S20. Subsequently, the analysis results screen is displayed on the web browser of information terminal 6 (step S22). 【0083】 According to this embodiment, by comparing a portion of the pre-rotation video 113 with bounding boxes BB applied and a portion of the post-rotation video 114 with bounding boxes BB applied, the user U can determine, via the pre-analysis results screen 100, which of the pre-rotation video 113 and the post-rotation video 114 is more suitable for video analysis. Subsequently, video analysis is performed on the entire video selected through the user U's input operation, and the analysis results screen is displayed on the information terminal 6. In this way, before performing video analysis on the entire video, the user U can confirm which of the pre-rotation video 113 and the post-rotation video 114 is more suitable for video analysis. Therefore, it is possible to provide a video analysis system 1 that can streamline video analysis using an AI model, significantly reduce the time required for the video analysis, and dramatically improve usability. 【0084】 In this embodiment, two images are displayed in area 112 of the pre-analysis results screen 100: an image 113 before rotation and an image 114 after rotation, both of which have bounding boxes BB. However, three or more images may be displayed in area 112. 
For example, the image 113 before rotation, an image of image 113 rotated by +N degrees (including the bounding box), an image of image 113 rotated by +2N degrees (including the bounding box), and an image of image 113 rotated by +3N degrees (including the bounding box) may be displayed in area 112. Here, N is a natural number. Counterclockwise is the positive direction. User U may select one of the multiple analysis results displayed in area 112. 【0085】 Furthermore, in this embodiment, the analysis results of the video before rotation processing and the analysis results of the video after rotation processing are displayed in area 112 of the pre-analysis results screen 100, but this embodiment is not limited to this. In this regard, the analysis results of the video before a predetermined pre-processing other than rotation processing and the analysis results of the video after the predetermined pre-processing may also be displayed in area 112. After that, user U may select one of the two analysis results displayed side by side in area 112. For example, a predetermined pre-processing may be a process to change the analysis target area (e.g., trimming or masking). In this case, in step S11, server 4 performs trimming or masking (hereinafter referred to as the analysis target area modification process) on the original video so that the analysis target area of the original video becomes smaller. After that, in step S12, server 4 outputs the analysis results by performing video analysis on a part of the video after the analysis target area modification process. After that, the analysis results of the video before the analysis target area modification process and the analysis results of the video after the analysis target area modification process may be displayed in area 112 of the pre-analysis results screen 100. 
【0086】 Furthermore, in this embodiment, the area 112 of the pre-analysis results screen 100 displays the pre-rotation video 113 with bounding boxes BB and the post-rotation video 114. However, the analysis results of videos 113 and 114 are not limited to videos 113 and 114 with bounding boxes BB. In this regard, the video analysis results may also be information regarding the number of objects of each attribute that entered the detection area or passed through the detection line in the video. In this case, the pre-analysis results table 200 may be displayed in area 112 (see Figure 10). Figure 10 is a diagram showing an example of the pre-analysis results table 200. The pre-analysis results table 200 shows information regarding the number of objects of each attribute for each rotation angle. For example, the analysis results of a video rotated by +10 degrees relative to video 113 show 42 cars, 14 small trucks, 6 trucks, 4 buses, 0 motorcycles, 0 bicycles, and 0 people. Table 200 of the pre-analysis results shows the analysis results of the video before rotation and the analysis results (number of objects by attribute) of the video rotated by +5 degrees, +10 degrees, +15 degrees, -5 degrees, -10 degrees, and -15 degrees. Here, counterclockwise is the positive direction and clockwise is the negative direction. 【0087】 If the pre-analysis results table 200 is displayed in area 112, in step S10, server 4 performs video analysis on a portion of the selected source video and outputs as analysis results information on the number of objects of each attribute that have entered the detection area set in the video. Here, in order to obtain information on the number of objects of each attribute, object detection processing, tracking processing, entry determination processing, and counting processing using an AI model are performed, respectively. 
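The entry determination and counting steps named in paragraph 【0087】 can be sketched as follows. This is a simplified illustration under stated assumptions: object detection and tracking are taken as already done, each detection is a hypothetical tuple of track ID, attribute, and bounding box center, the detection area is an axis-aligned rectangle, and an object is counted once, the first time its center is observed inside the area. The function names are hypothetical and the actual determination logic is not specified in this disclosure.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]   # (left, top, right, bottom)
Detection = Tuple[int, str, Point]         # (track_id, attribute, bounding box center)


def inside(point: Point, area: Rect) -> bool:
    """Entry determination: is the box center within the detection area?"""
    x, y = point
    left, top, right, bottom = area
    return left <= x <= right and top <= y <= bottom


def count_entries(frames: Iterable[List[Detection]], area: Rect) -> Counter:
    """Count, per attribute, the tracked objects whose center is seen inside the area."""
    counted_ids = set()
    totals: Counter = Counter()
    for detections in frames:
        for track_id, attribute, center in detections:
            # Each tracked object contributes at most once to the count.
            if track_id not in counted_ids and inside(center, area):
                counted_ids.add(track_id)
                totals[attribute] += 1
    return totals
```

Running the same function on the detections obtained at each rotation angle would yield one row of per-attribute counts per angle, which is the kind of information the pre-analysis results table 200 presents.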
【0088】 In step S11, server 4 rotates the selected original video by each of six angles (+5 degrees, +10 degrees, +15 degrees, -5 degrees, -10 degrees, and -15 degrees), thereby generating six rotated videos. 【0089】 In step S12, server 4 performs video analysis on a portion of each of the six rotated videos and outputs, as analysis results, information regarding the number of objects by attribute at each angle. Based on the information regarding the number of objects by attribute at each angle, server 4 generates the pre-analysis results table 200 and then transmits a pre-analysis results screen displaying the pre-analysis results table 200 to the information terminal 6 (step S13). 【0090】 User U can determine which angle's analysis result is the most accurate by comparing the information regarding the number of objects by attribute at the seven angles (0 degrees, +5 degrees, +10 degrees, +15 degrees, -5 degrees, -10 degrees, -15 degrees) shown in the pre-analysis results table 200. In this regard, it is assumed that user U has already obtained the correct values for the number of objects by attribute by visually inspecting the original video before checking the pre-analysis results table 200. User U selects one of the seven analysis results by operating one of the seven selection buttons 210, each associated with one of the seven analysis results. 【0091】 Furthermore, the pre-analysis results table 200 may be displayed in such a way that user U can visually recognize which of the seven analysis results (information on the number of objects by attribute) has the highest analysis accuracy. For example, if the analysis result at +10 degrees has the highest analysis accuracy, that analysis result may be highlighted in the pre-analysis results table 200. 
In this way, user U can recognize that the analysis result at +10 degrees has the highest analysis accuracy. In this case, the correct value information regarding the number of objects by attribute may be transmitted in advance from the information terminal 6 to the server 4. 【0092】 Furthermore, in this embodiment, information regarding the installation of camera 3 may be displayed in a predetermined area of the pre-analysis results screen 100. In this regard, information indicating the recommended installation angle of camera 3 may be displayed in a predetermined area of the pre-analysis results screen 100. For example, if the accuracy of the analysis result at +15 degrees is the highest among the seven angles shown in the pre-analysis results table 200, the installation angle obtained by adding 15 degrees to the current installation angle of camera 3 may be displayed on the pre-analysis results screen 100 as the recommended installation angle. In this way, user U can consider, based on the information regarding the installation of camera 3, an installation angle and position of camera 3 that would improve video analysis accuracy. 【0093】 In this embodiment, video analysis is performed on a portion of the video before the video analysis of the entire video is performed, but this embodiment is not limited to this. In this regard, in step S10 shown in Figure 7A, the analysis results may be output by performing video analysis on the entire selected video. Similarly, in step S12, the analysis results may be output by performing video analysis on the entire video after rotation processing. 【0094】 (Video analysis method according to the first modified example of this embodiment) Next, the video analysis method according to the first modified example of this embodiment will be described below with reference to Figures 11 and 12. Figure 11 is a flowchart for explaining the video analysis method according to the first modified example of this embodiment. 
Figure 12 is a diagram showing an example of a video chat screen 300. The first modified example differs from the video analysis method according to the present embodiment described above in that the analysis results of the video before rotation processing and the analysis results of the video after rotation processing are displayed on the video chat screen 300. Furthermore, it is assumed that the processes from steps S1 to S9 shown in Figure 7A have already been executed as a preliminary step before each process shown in Figure 11 is executed. That is, it is assumed that the job registration information has been stored in the server 4. 【0095】 As shown in Figure 11, in step S30, the server 4 sends data to the information terminal 6 to display the video chat screen 300 in response to a transmission request from the information terminal 6. Subsequently, the video chat screen 300 is displayed on the information terminal 6's web browser (step S31). 【0096】 As shown in Figure 12, the video chat screen 300 includes a video viewer area 310, a timeline 320, a slider 330, and a chat display area 340. The video viewer area 310 displays the video selected by user U. The timeline 320 shows the playback time of the video displayed in the video viewer area 310. The slider 330 can be slid along the timeline 320. User U can change the playback time of the video displayed in the video viewer by moving the slider 330 on the timeline 320. 【0097】 The chat display area 340 displays a series of chat exchanges between user U and server 4. When user U enters instructions in the chat display area 340, those instructions are sent from the information terminal 6 to server 4. Server 4 then performs predetermined processing based on the instructions and sends the execution results to the information terminal 6. The execution results are then displayed in the chat display area 340. 
【0098】 Returning to Figure 11, in step S32, the information terminal 6 sends a request for analysis results for a selected portion of the video through the chat display area 340 of the video chat screen 300. In step S33, the server 4 performs video analysis on the selected portion of the video in response to the request and outputs the analysis results. Here, the video analysis results may be information regarding the number of objects of each attribute that entered the detection area or passed through the detection line in the video. After that, the server 4 sends the video analysis results to the information terminal 6 (step S34). In this way, the video analysis results are displayed in the chat display area 340 of the video chat screen 300 (step S35). 【0099】 If user U determines that the analysis accuracy of the analysis result is not high, they write a request for a revised analysis result for a portion of the video to the chat display area 340. Subsequently, the information terminal 6 sends a request for a revised analysis result for a portion of the video (step S36). In response to this request, the server 4 generates a rotated video by performing a rotation process on the video (step S37). In step S38, the server 4 outputs the analysis result by performing a video analysis on a portion of the rotated video. Subsequently, the server 4 sends the analysis result of the rotated video portion to the information terminal 6 (step S39). In this way, the analysis result of the rotated video portion is displayed in the chat display area 340 (step S40). Note that in step S37, predetermined preprocessing other than rotation processing may be performed on the video. 【0100】 If user U determines that the analysis result has high accuracy, they write an instruction to perform video analysis on the entire rotated video in the chat display area 340. 
Subsequently, the information terminal 6 sends the instruction to perform video analysis on the entire rotated video to the server 4 (step S41). The server 4 performs video analysis on the entire rotated video in response to the instruction and outputs the analysis result (step S42). Subsequently, the server 4 sends the analysis result of the entire rotated video to the information terminal 6 (step S43). In this way, the analysis result of the rotated video is displayed in the chat display area 340 (step S44). 【0101】 According to the first modified video analysis method, the analysis results of a portion of the video before rotation and the analysis results of a portion of the video after rotation are presented to user U in a chat format. In this way, user U can easily determine which of the video before or after rotation is more suitable for video analysis through the video chat screen 300. Thus, the usability of the video analysis system 1 can be further improved by the video chat screen 300. 【0102】 (Video analysis method according to a second modified example of this embodiment) Next, with reference to Figure 13, the video analysis method according to the second modified example of this embodiment will be described below. Figure 13 is a flowchart illustrating the video analysis method according to the second modified example of this embodiment. The second modified example differs from this embodiment in that the selection of either the analysis result of the video before rotation processing or the analysis result of the video after rotation processing is automatically performed by the server 4 instead of the user U. Furthermore, it is assumed that the processes from steps S1 to S9 shown in Figure 7A have already been executed as a preliminary step before each process shown in Figure 13 is executed. That is, it is assumed that the job registration information has been stored in the server 4. 
【0103】 As shown in Figure 13, in step S50, the information terminal 6 displays a portion of the video selected by user U. In step S51, the information terminal 6 obtains, through user U's input operation, correct value information (an example of third numerical information) regarding the number of objects of each attribute that have entered the detection area or passed through the detection line in the video. In this regard, user U visually checks the portion of the video and inputs the correct value information regarding the number of objects of each attribute to the information terminal 6. 【0104】 In step S52, the information terminal 6 transmits the correct value information to the server 4. The correct value information is then stored in the server 4. Next, in step S53, the server 4 performs video analysis on a portion of the selected video and outputs a first analysis result for that portion of the video. Here, the first analysis result is information (an example of first numerical information) regarding the number of objects of each attribute that entered the detection area or passed through the detection line in the video. 【0105】 In step S54, the server 4 calculates the accuracy of the first analysis result by comparing the correct value information with the first analysis result. Next, in step S55, the server 4 generates a rotated video by performing a rotation process on the video. In step S56, the server 4 performs video analysis on a portion of the rotated video and outputs a second analysis result for that portion of the rotated video. Here, the second analysis result is similarly information (an example of second numerical information) regarding the number of objects of each attribute that entered the detection area or passed through the detection line in the video. 【0106】 In step S57, server 4 calculates the accuracy of the second analysis result by comparing the correct value information with the second analysis result. 
In step S58, Server 4 automatically selects either the first or second analysis result based on a comparison between the accuracy of the first and second analysis results. Specifically, Server 4 selects the one with the higher accuracy between the first and second analysis results. 【0107】 In step S59, the server 4 performs video analysis on the entire video corresponding to the selected analysis result, and outputs the analysis result for the entire video (information on the number of objects of each attribute that entered the detection area or passed through the detection line in the video). For example, if the second analysis result is selected, video analysis is performed on the entire video after rotation processing. In step S60, the server 4 sends data to the information terminal 6 to display an analysis result screen (not shown) showing the analysis result for the entire video. The analysis result screen is then displayed on the web browser of the information terminal 6 (step S61). 【0108】 According to the second modified video analysis method, based on a comparison between the accuracy of the first analysis result and the accuracy of the second analysis result, either the first or second analysis result is automatically selected by the server 4. Subsequently, video analysis is performed on the entire video corresponding to the selected analysis result. In this way, it is possible to improve the accuracy of video analysis through preprocessing of the video input to the AI model without requiring judgment from the user U. 【0109】 In this modified example, in step S55, the server 4 may generate six rotated images (i.e., six images after rotation) by performing rotation processing on the selected source image at each angle (+5 degrees, +10 degrees, +15 degrees, -5 degrees, -10 degrees, -15 degrees). 
Furthermore, in step S56, the server 4 may output information regarding the number of objects by attribute at each angle as a second analysis result by performing video analysis on each of the six rotated images. Subsequently, the server 4 may calculate the accuracy of the second analysis result at each angle by comparing the correct value information with the second analysis result at each angle, and then automatically select one of the multiple second analysis results and the first analysis result. After that, the server 4 performs video analysis on the entire image corresponding to the automatically selected analysis result, and provides the user U with the analysis result for the entire image. 【0110】 Although embodiments of the present invention have been described above, the technical scope of the present invention should not be interpreted as being limited by the description of these embodiments. These embodiments are examples, and it will be understood by those skilled in the art that various modifications to the embodiments are possible within the scope of the invention described in the claims. The technical scope of the present invention should be determined based on the scope of the invention described in the claims and the scope of its equivalents. 
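The accuracy calculation and automatic selection of steps S54 to S58, extended to multiple rotation angles as in paragraph 【0109】, might be sketched as below. This disclosure does not define how accuracy is computed, so the metric here (one minus the total absolute count error divided by the total correct count, clamped at zero) is an assumption chosen for illustration, as are the function names and the rule that ties go to the smallest absolute rotation angle.

```python
from typing import Dict

Counts = Dict[str, int]   # number of objects per attribute, e.g. {"car": 3}


def count_accuracy(predicted: Counts, correct: Counts) -> float:
    """Assumed metric: 1 - (sum of absolute count errors) / (total correct count)."""
    attributes = set(predicted) | set(correct)
    error = sum(abs(predicted.get(a, 0) - correct.get(a, 0)) for a in attributes)
    total_correct = sum(correct.values())
    if total_correct == 0:
        # No objects expected: any detection at all counts as fully wrong.
        return 1.0 if error == 0 else 0.0
    return max(0.0, 1.0 - error / total_correct)


def select_best_angle(results: Dict[int, Counts], correct: Counts) -> int:
    """Return the rotation angle (0 means unrotated) whose counts are most accurate.

    Ties are resolved in favor of the smallest absolute angle (an assumption).
    """
    return max(
        results,
        key=lambda angle: (count_accuracy(results[angle], correct), -abs(angle)),
    )
```

With hypothetical correct values of 40 cars and 6 trucks, a result of 42 cars and 6 trucks at +10 degrees would score higher than a result of 30 cars and 6 trucks at 0 degrees, so the rotated video would be chosen for analysis of the entire video.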
【Explanation of Symbols】 【0111】 1: Video analysis system, 3: Camera, 4: Server, 6: Information terminal, 7: Communication network, 31: Control unit, 32: Storage device, 33: Location information acquisition unit, 34: Communication unit, 35: Input operation unit, 36: Imaging unit, 37: PTZ mechanism, 41: Control unit, 42: Storage device, 43: Input/output interface, 44: Communication unit, 61: Control unit, 62: Storage device, 63: Input/output interface, 64: Communication unit, 65: Input operation unit, 66: Display unit, 67: Audio input/output unit, 80: Analysis target area, 82: Analysis target area, 83: Analysis target area, 90: Job registration screen, 93: Area, 94: Area, 95: Area, 97: Area, 98: Button, 100: Pre-analysis results screen, 112: Area, 113: Video, 114: Video, 115: Area, 117: Guide frame, 118: Guide frame, 120: Select button, 125: Area, 127: Checkbox, 128: Button, 130: Select button, 200: Pre-analysis results table, 210: Select button, 300: Video chat screen, 310: Video viewer area, 320: Timeline, 330: Slider, 340: Chat display area, BB: Bounding box, U: User
Claims
[Claim 1] A video analysis system that performs video analysis on video captured by a camera by inputting the video into an AI model, wherein the video analysis system: generates a second video by rotating a first video by a predetermined angle; outputs a first analysis result of the first video by inputting the first video into the AI model; outputs a second analysis result of the second video by inputting the second video into the AI model; presents the first analysis result and the second analysis result to a user; selects either the first analysis result or the second analysis result in response to the user's input; and performs video analysis on the video corresponding to the selected analysis result by inputting that video into the AI model, wherein the video is a moving image.
[Claim 2] The video analysis system according to claim 1, wherein the first analysis result is an image showing the detection status of an object in the first video, and the second analysis result is an image showing the detection status of the object in the second video.
[Claim 3] The video analysis system according to claim 1, wherein the first analysis result is numerical information indicating the detection status of an object in the first video, and the second analysis result is numerical information indicating the detection status of the object in the second video.
[Claim 4] The video analysis system according to claim 1, wherein the first analysis result and the second analysis result are presented to the user such that it is visually apparent that the accuracy of one of the two analysis results is higher than the accuracy of the other.
[Claim 5] The video analysis system according to claim 1, wherein the first analysis result and the second analysis result are displayed in a chat format.
[Claim 6] The video analysis system according to claim 1, wherein the first analysis result of the first video is output by inputting a portion of the first video into the AI model, and the second analysis result of the second video is output by inputting a portion of the second video into the AI model.
[Claim 7] The video analysis system according to claim 1, wherein the entire selected video is input into the AI model to perform video analysis on the entire selected video.
[Claim 8] The video analysis system according to claim 1, wherein information regarding the installation of the camera is presented to the user based on the first analysis result and the second analysis result.
[Claim 9] A video analysis system that performs video analysis on video captured by a camera by inputting the video into an AI model, wherein the video analysis system: generates a second video by rotating a first video by a predetermined angle; outputs a first analysis result of the first video by inputting the first video into the AI model; outputs a second analysis result of the second video by inputting the second video into the AI model; calculates the accuracy of the first analysis result and the accuracy of the second analysis result; automatically selects either the first analysis result or the second analysis result based on a comparison between the accuracy of the first analysis result and the accuracy of the second analysis result; and performs video analysis on the video corresponding to the selected analysis result by inputting that video into the AI model, wherein the video is a moving image.
[Claim 10] The video analysis system according to claim 9, wherein the first analysis result is first numerical information indicating the detection status of an object in the first video, the second analysis result is second numerical information indicating the detection status of the object in the second video, and the video analysis system: obtains, through user input, third numerical information indicating the correct value for the object shown in the first video; calculates the accuracy of the first analysis result based on the first numerical information and the third numerical information; and calculates the accuracy of the second analysis result based on the second numerical information and the third numerical information.
[Claim 11] A video analysis method, performed by a video analysis system, for performing video analysis on video captured by a camera by inputting the video into an AI model, the method comprising: a step of generating a second video by rotating a first video by a predetermined angle; a step of outputting a first analysis result of the first video by inputting the first video into the AI model; a step of outputting a second analysis result of the second video by inputting the second video into the AI model; a step of presenting the first analysis result and the second analysis result to a user; a step of selecting either the first analysis result or the second analysis result in response to the user's input operation; and a step of performing video analysis on the video corresponding to the selected analysis result by inputting that video into the AI model, wherein the video is a moving image.
[Claim 12] A video analysis program that causes a video analysis system to execute the video analysis method according to claim 11.
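For illustration only, the automatic selection recited in claims 9 and 10 (with the partial-video pre-analysis of claim 6) might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation disclosed in the specification: frames are toy nested lists, rotation is limited to multiples of 90 degrees, the "AI model" is any callable that returns an object count per frame, and the accuracy formula (one minus the relative counting error) is an assumption, since the claims only say that accuracy is calculated from the detected value and the correct value. All function names are hypothetical.

```python
from typing import Callable, List

Frame = List[List[int]]   # toy frame: rows of pixel values (assumption)
Video = List[Frame]       # a video is treated as a sequence of frames


def rotate_frame_90(frame: Frame, quarter_turns: int) -> Frame:
    """Rotate a frame clockwise by quarter_turns * 90 degrees."""
    for _ in range(quarter_turns % 4):
        frame = [list(row) for row in zip(*frame[::-1])]
    return frame


def rotate_video(video: Video, quarter_turns: int) -> Video:
    """Generate the 'second video' by rotating every frame of the first."""
    return [rotate_frame_90(frame, quarter_turns) for frame in video]


def count_detections(model: Callable[[Frame], int], video: Video) -> int:
    """Run the unchanged model on each frame and sum the detected objects."""
    return sum(model(frame) for frame in video)


def accuracy(detected: int, correct: int) -> float:
    """Assumed metric: 1 - relative counting error, floored at 0."""
    if correct <= 0:
        raise ValueError("correct count must be positive")
    return max(0.0, 1.0 - abs(detected - correct) / correct)


def select_quarter_turns(
    model: Callable[[Frame], int],
    video: Video,
    correct_count: int,
    sample_frames: int = 2,
    quarter_turns: int = 1,
) -> int:
    """Pre-analyze a portion of both videos and pick the more accurate one.

    Returns 0 to keep the first (unrotated) video, or quarter_turns to use
    the second (rotated) video for the full analysis.
    """
    sample = video[:sample_frames]  # only a portion is pre-analyzed
    first = count_detections(model, sample)
    second = count_detections(model, rotate_video(sample, quarter_turns))
    # correct_count plays the role of the user-supplied correct value.
    acc_first = accuracy(first, correct_count)
    acc_second = accuracy(second, correct_count)
    return quarter_turns if acc_second > acc_first else 0


def toy_model(frame: Frame) -> int:
    """Stand-in detector: only 'sees' objects marked in the left column."""
    return sum(1 for row in frame if row and row[0] == 1)
```

Once a rotation has been chosen, the full analysis would correspond to `count_detections(model, rotate_video(video, chosen))`, with the model itself left untouched. The user-driven variant of claim 1 would replace the accuracy comparison with a choice made by the user after both results are displayed.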