A video analysis method, system and storage medium

By identifying the encoding and encapsulation formats of videos, and flexibly converting video formats, the problem of low efficiency in video format conversion in existing technologies is solved, enabling fast and efficient processing of video structured analysis.

CN116016991B · Active · Publication Date: 2026-06-16 · HANGZHOU HIKVISION SYST TECH CO LTD

Cites: 3 · Cited by: 0

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: HANGZHOU HIKVISION SYST TECH CO LTD
Filing Date: 2022-12-29
Publication Date: 2026-06-16

Application Information

Patent Timeline

29 Dec 2022: Application
16 Jun 2026: Publication (CN116016991B)

IPC: H04N21/2343; H04N21/234

AI Tagging

Application Domain

Selective content distribution


AI Technical Summary

Technical Problem

Existing technologies are inefficient in video format conversion and cannot meet the needs of fast and efficient video structured analysis, mainly because they require decoding before transcoding and repackaging.

Method Used

By identifying the video's encoding and encapsulation formats, it can flexibly convert the video's encoding or encapsulation format, omitting unnecessary conversion steps and directly converting the video into an analyzable format.

Benefits of Technology

It improves the efficiency of video format conversion, meeting the needs of fast and efficient video structured analysis.

✦ Generated by Eureka AI based on patent content.


Figure CN116016991B_ABST


Abstract

The application discloses a video analysis method, a system and a storage medium, and relates to the technical field of multimedia. By identifying the format of an acquired video, the video is converted into an analyzable video format so that video structured analysis can be performed. The method comprises the following steps: a cloud storage server acquires a first video, the encapsulation format of the first video being a first encapsulation format and the encoding format of the first video being a first encoding format; in the case that the first encoding format is a preset encoding format and the first encapsulation format is not a preset encapsulation format, the first encapsulation format is converted into the preset encapsulation format, and the resulting second video is sent to a structured analysis server; in the case that the first encoding format is not a preset encoding format and the first encapsulation format is a preset encapsulation format, a transcoding server converts the first encoding format of the first video into the preset encoding format to obtain a third video; and the structured analysis server performs structured analysis on the second video and on the third video.


Description

Technical Field

[0001] This application relates to the field of multimedia technology, and in particular to a video analysis method, system, and storage medium.

Background Technology

[0002] Video structured analysis is a technique that organizes video footage and audio into textual information that can be understood by computers and humans. Currently, video structured analysis is mainly applied in fields such as security and transportation. Before performing structured analysis on video, the video format needs to be converted to an analyzable format; therefore, the efficiency of the video format conversion process is crucial for subsequent video structured analysis.

[0003] In existing technologies, the original video encoding format is typically decoded directly, then converted to an analyzable video encoding format, and the original encapsulation format is also converted to an analyzable video encapsulation format; in essence, the original video is transcoded and re-encapsulated. However, this decoding, transcoding, and re-encapsulation process is lengthy, so video format conversion is inefficient and cannot meet the demands of fast and efficient video structured analysis.

Summary of the Invention

[0004] This application provides a video analysis method, system, and storage medium. By identifying the format of the acquired video, the system flexibly converts the video format into an analyzable video format, thereby facilitating video structured analysis of the converted video and meeting the needs for fast and efficient video structured analysis.

[0005] In a first aspect, this application provides a video analysis system, comprising: a cloud storage server for acquiring a first video, wherein the first video has a first encapsulation format and a first encoding format; when the first encoding format is not a preset encoding format but the first encapsulation format is a preset encapsulation format, sending the first video to a transcoding server; when the first encoding format is a preset encoding format but the first encapsulation format is not a preset encapsulation format, converting the first encapsulation format of the first video to a preset encapsulation format, and sending the converted second video in the preset encapsulation format to a structured analysis server; a transcoding server for acquiring the first video from the cloud storage server, converting the first encoding format of the first video to a preset encoding format, and sending the converted third video in the preset encoding format to the cloud storage server; the cloud storage server is further configured to send the converted third video in the preset encoding format to the structured analysis server; the structured analysis server is configured to perform structured analysis on the second video; and the structured analysis server is further configured to perform structured analysis on the third video.

[0006] Understandably, the system's cloud storage server first identifies the encoding and encapsulation formats of the video that is to undergo structured analysis. If the acquired video's encoding format is the preset encoding format but its encapsulation format is not (i.e., it is not an analyzable encapsulation format), the system can directly convert the video's encapsulation format to the preset encapsulation format and skip converting the encoding format. Because re-encapsulation alone does not require decoding and re-encoding, this path is highly efficient. Conversely, if the video's encoding format is not the preset encoding format but its encapsulation format is, the transcoding server converts only the encoding format to the preset encoding format, again skipping the encapsulation conversion. By first identifying the encoding and encapsulation formats and then converting only the one that needs it, the system prepares the video for the structured analysis server and meets the demand for fast and efficient video structured analysis.
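The format check in [0005] and [0006] amounts to a small decision table. The Python sketch below is illustrative only: it assumes formats are reported as plain strings and uses H.264 and PS as the preset formats, following the example given in [0073]. The both-preset case follows [0069]; the case where neither format matches is not addressed by these embodiments, so the sketch merely flags it.

```python
from enum import Enum

# Assumed preset (analyzable) formats, taken from the example in [0073].
PRESET_ENCODING = "H.264"
PRESET_ENCAPSULATION = "PS"


class Route(Enum):
    ANALYZE_DIRECTLY = "analyze directly"      # both formats already analyzable
    REMUX_ONLY = "convert encapsulation only"  # cloud storage server re-encapsulates
    TRANSCODE_ONLY = "convert encoding only"   # transcoding server re-encodes
    UNSPECIFIED = "not covered by these embodiments"


def choose_route(encoding: str, encapsulation: str,
                 preset_encoding: str = PRESET_ENCODING,
                 preset_encapsulation: str = PRESET_ENCAPSULATION) -> Route:
    """Decide which conversion, if any, a video needs before analysis."""
    encoding_ok = encoding == preset_encoding
    encapsulation_ok = encapsulation == preset_encapsulation
    if encoding_ok and encapsulation_ok:
        return Route.ANALYZE_DIRECTLY
    if encoding_ok:
        return Route.REMUX_ONLY       # only the container is wrong
    if encapsulation_ok:
        return Route.TRANSCODE_ONLY   # only the codec is wrong
    return Route.UNSPECIFIED


print(choose_route("H.264", "AVI").value)  # convert encapsulation only
```

Keeping the preset formats as parameters reflects that the text treats them as configurable "preset" values rather than fixed codecs.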

[0007] In some embodiments, the cloud storage server is further configured to divide the first video into multiple video segments according to a preset size when the first encoding format is not a preset encoding format and the first encapsulation format is a preset encapsulation format; and send the multiple video segments to the transcoding server.

[0008] In some embodiments, the cloud storage server is further configured to divide the first video into multiple video segments according to a preset size if the size of the first video is greater than a preset threshold.
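A minimal sketch of the size-based segmentation in [0007] and [0008], assuming segments are plain byte ranges. The function name and parameters are hypothetical. A real splitter would presumably cut on container or keyframe boundaries rather than at arbitrary byte offsets; this sketch does not attempt that.

```python
from typing import List, Tuple


def split_by_size(total_size: int, segment_size: int,
                  threshold: int) -> List[Tuple[int, int]]:
    """Return (start, end) byte ranges; end is exclusive.

    Videos no larger than `threshold` stay whole; larger ones are cut
    into pieces of `segment_size` bytes (the last piece may be shorter).
    """
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    if total_size <= threshold:
        return [(0, total_size)]
    return [(start, min(start + segment_size, total_size))
            for start in range(0, total_size, segment_size)]


print(split_by_size(250, 100, 150))  # [(0, 100), (100, 200), (200, 250)]
```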

[0009] In some embodiments, the transcoding server is further configured to: obtain multiple video segments from the cloud storage server when the first encoding format of the first video is not a preset encoding format but the first encapsulation format is a preset encapsulation format; convert each of the multiple video segments from the first encoding format to the preset encoding format; and send the converted video segments in the preset encoding format to the cloud storage server; the cloud storage server is further configured to: receive the multiple video segments and forward them to the structured analysis server; and the structured analysis server is further configured to: perform structured analysis on the converted multiple video segments.
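The segment-wise conversion in [0009] could be orchestrated as below. This is a sketch under stated assumptions: `transcode` is a hypothetical stand-in for whatever encoder the transcoding server actually uses, and converting segments concurrently is an assumption, since the text only says each segment is converted.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def transcode_segments(segments: List[bytes],
                       transcode: Callable[[bytes], bytes],
                       max_workers: int = 4) -> List[bytes]:
    """Convert every segment with `transcode`, keeping the original order."""
    # Executor.map returns results in input order even if segments finish
    # out of order, so the converted segments stay in timeline order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(transcode, segments))


# Stand-in "encoder" for demonstration only.
print(transcode_segments([b"seg0", b"seg1"], lambda s: s.upper()))
```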

[0010] In some embodiments, the transcoding server is further configured to send the converted target video segment in the preset encoding format to the cloud storage server; the cloud storage server is further configured to receive the target video segment and forward it to the structured analysis server; the structured analysis server is further configured to perform structured analysis on the target video segment in the preset encoding format, wherein the target video segment is at least one of the plurality of video segments, and the time span from the start time to the end time of the target video segment includes a second time period.
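One way to pick the target segment(s) of [0010] is an interval-overlap test, sketched below. The `Segment` type and its timestamp fields are hypothetical; the sketch assumes each segment carries its recording start and end times and returns every segment that overlaps the requested second time period, so that contiguous segments together cover it.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Segment:
    index: int
    start: datetime  # recording time at the segment's start
    end: datetime    # recording time at the segment's end


def select_target_segments(segments: List[Segment], period_start: datetime,
                           period_end: datetime) -> List[Segment]:
    """Return the segments that overlap [period_start, period_end]."""
    return [s for s in segments if s.start < period_end and s.end > period_start]


day = datetime(2021, 10, 20)
segments = [Segment(i, day.replace(hour=10 + i), day.replace(hour=11 + i))
            for i in range(3)]  # 10:00-11:00, 11:00-12:00, 12:00-13:00
targets = select_target_segments(segments, day.replace(hour=11, minute=30),
                                 day.replace(hour=12, minute=15))
print([s.index for s in targets])  # [1, 2]
```

Only the overlapping segments need to be forwarded for analysis, which is the efficiency the embodiment aims at.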

[0011] In some embodiments, the structured analysis server is further configured to perform structured analysis on a first time period of the second video, wherein the first time period lies between the start and end times of the second video.

[0012] In some embodiments, the structured analysis server is also used to perform structured analysis on the video of a third time period in the third video, where the third time period lies between the start and end times of the third video.

[0013] In a second aspect, this application provides a video analysis method applied to a cloud storage server. The method includes: acquiring a first video, wherein the first video has a first encapsulation format and a first encoding format; if the first encoding format is not a preset encoding format but the first encapsulation format is a preset encapsulation format, sending the first video to a transcoding server; if the first encoding format is a preset encoding format but the first encapsulation format is not a preset encapsulation format, converting the first encapsulation format of the first video to the preset encapsulation format; and sending the resulting second video in the preset encapsulation format to a structured analysis server, wherein the second video is used for structured analysis.

[0014] In some embodiments, when the first encoding format is not a preset encoding format and the first encapsulation format is a preset encapsulation format, sending the first video to the transcoding server includes: dividing the first video into multiple video segments according to a preset size when the first encoding format is not a preset encoding format and the first encapsulation format is a preset encapsulation format; and sending the multiple video segments to the transcoding server, wherein the multiple video segments are used for video encoding format conversion.

[0015] In some embodiments, dividing the first video into multiple video segments according to a preset size includes: dividing the first video into multiple video segments according to a preset size when the size of the first video is greater than a preset threshold.

[0016] In some embodiments, the second video is used for structured analysis of the video in a first time period, where the first time period lies between the start and end times of the second video.

[0017] In a third aspect, this application provides a video analysis method applied to a transcoding server. The method includes: obtaining a first video from a cloud storage server when the first encoding format of the first video is not a preset encoding format but the first encapsulation format of the first video is a preset encapsulation format; converting the first encoding format of the first video to the preset encoding format; and sending the resulting third video in the preset encoding format to the cloud storage server, which sends the third video to a structured analysis server, the third video being used for structured analysis.

[0018] In some embodiments, when the first encoding format of the first video is not a preset encoding format but the first encapsulation format is a preset encapsulation format, multiple video segments are obtained from a cloud storage server; the first encoding formats of the multiple video segments are converted to the preset encoding format respectively; the converted video segments in the preset encoding format are sent to the cloud storage server; the cloud storage server is used to send the multiple video segments to the structured analysis server, and the multiple video segments are used for structured analysis.

[0019] In some embodiments, sending multiple video segments in a converted preset encoding format to a cloud storage server, wherein the multiple video segments in the preset encoding format are used for structured analysis, includes: sending a target video segment in a converted preset encoding format to a cloud storage server, wherein the cloud storage server is used to send the target video segment to a structured analysis server; the target video segment is at least one of the multiple video segments, and the time period formed by the start time and end time of the target video segment includes a second time period, wherein the target video segment is used to perform structured analysis on the video of the second time period.

[0020] In some embodiments, the third video is used for structured analysis of the video in a third time period, where the third time period lies between the start and end times of the third video.

[0021] In a fourth aspect, embodiments of this application provide an electronic device, including a memory and a processor that are coupled to each other; the memory is used to store computer program code, the computer program code including computer instructions; when the processor executes the computer instructions, the video analysis system is caused to perform the video analysis method described in the first aspect and any of its possible designs.

[0022] In a fifth aspect, this application provides a computer-readable storage medium comprising computer software instructions which, when executed in a video analysis system, cause the video analysis system to implement the methods of the second and third aspects described above.

[0023] In a sixth aspect, this application provides a computer program product that, when run on a video analysis system, causes the video analysis system to perform the steps of the methods described in the second and third aspects above.

[0024] For the beneficial effects of the second to sixth aspects, refer to the corresponding description of the first aspect; they are not repeated here.

Attached Figure Description

[0025] Figure 1 is a schematic structural diagram of a video analysis device provided in this application;

[0026] Figure 2 is a schematic diagram of a video analysis system provided in this application;

[0027] Figure 3 is a flowchart of a video analysis method provided in this application;

[0028] Figure 4 is a flowchart of another video analysis method provided in this application;

[0029] Figure 5 is a flowchart of another video analysis method provided in this application;

[0030] Figure 6 is a flowchart of another video analysis method provided in this application;

[0031] Figure 7 is a flowchart of another video analysis method provided in this application;

[0032] Figure 8 is a flowchart of another video analysis method provided in this application;

[0033] Figure 9 is a schematic structural diagram of an electronic device provided in this application.

Detailed Implementation

[0034] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0035] It should be noted that in the embodiments of this application, the words "exemplarily" or "for example" are used to indicate examples, illustrations, or explanations. Any embodiment or design scheme described as "exemplarily" or "for example" in the embodiments of this application should not be construed as being more preferred or advantageous than other embodiments or design schemes. Specifically, the use of the words "exemplarily" or "for example" is intended to present the relevant concepts in a specific manner.

[0036] To facilitate a clear description of the technical solutions of the embodiments of this application, the terms "first" and "second" are used in the embodiments of this application to distinguish the same or similar items with essentially the same function and effect. Those skilled in the art can understand that the terms "first" and "second" are not intended to limit the quantity or execution order.

[0037] As mentioned earlier, video structured analysis is a technology that organizes video footage and audio into textual information that can be understood by computers and humans. Currently, video structured analysis is mainly applied in fields such as security and transportation. Before performing structured analysis on a video, its format needs to be converted to an analyzable format; therefore, the efficiency of the video format conversion process is crucial for subsequent video structured analysis.

[0038] In existing technologies, the original video encoding format is typically decoded directly, then converted to an analyzable video encoding format, and the original encapsulation format is also converted to an analyzable video encapsulation format; in essence, the original video is transcoded and re-encapsulated. However, this decoding, transcoding, and re-encapsulation process is lengthy, so video format conversion is inefficient and cannot meet the demands of fast and efficient video structured analysis.

[0039] To address this issue, this application provides a video analysis method, system, and storage medium. The system's cloud storage server first identifies the encoding and encapsulation formats of the video to be structured analyzed. If the acquired video's encoding format is a preset encoding format but the encapsulation format is not (i.e., not an analyzable encapsulation format), the video's encapsulation format can be directly converted to the preset encapsulation format, omitting the process of converting the video's encoding format to the preset encoding format, resulting in higher efficiency. Conversely, if the video's encoding format is not the preset encoding format but the encapsulation format is, the transcoding server converts the video's encoding format to the preset encoding format, again omitting the process of converting the video's encapsulation format to the preset encapsulation format. By first identifying the encoding and encapsulation formats of the video and converting only the encoding format or only the encapsulation format as needed, the system enables the structured analysis server to perform structured analysis on the converted video, meeting the demand for fast and efficient video structured analysis.

[0040] As shown in Figure 1, a video analysis device 11 provided in an embodiment of this application includes a cloud storage unit 101, a transcoding unit 102, and a structured analysis unit 103; the structured analysis unit 103 can also be called an intelligent analysis unit.

[0041] The cloud storage unit 101 is used to acquire a first video, the first video being in a first encapsulation format and the first video being in a first encoding format; if the first encoding format is not a preset encoding format but the first encapsulation format is a preset encapsulation format, the first video is sent to the transcoding unit 102; if the first encoding format is a preset encoding format but the first encapsulation format is not a preset encapsulation format, the first encapsulation format of the first video is converted to a preset encapsulation format, and the converted second video in the preset encapsulation format is sent to the structured analysis unit 103.

[0042] The transcoding unit 102 is used to obtain the first video from the cloud storage unit 101, convert the first encoding format of the first video into a preset encoding format, and send the converted third video in the preset encoding format to the cloud storage unit 101.

[0043] The cloud storage unit 101 is also used to send the converted third video in the preset encoding format to the structured analysis unit 103.

[0044] The structured analysis unit 103 is used to perform structured analysis on the second video.

[0045] The structured analysis unit 103 is also used to perform structured analysis on the third video.

[0046] In some embodiments, the cloud storage unit 101 is further configured to divide the first video into multiple video segments according to a preset size when the first encoding format is not a preset encoding format and the first encapsulation format is a preset encapsulation format; and send the multiple video segments to the transcoding unit 102.

[0047] In some embodiments, the cloud storage unit 101 is further configured to divide the first video into multiple video segments according to a preset size if the size of the first video is greater than a preset threshold.

[0048] In some embodiments, the transcoding unit 102 is further configured to: obtain multiple video segments from the cloud storage unit 101 when the first encoding format of the first video is not a preset encoding format but the first encapsulation format is a preset encapsulation format; convert the first encoding format of the multiple video segments into a preset encoding format respectively; and send the converted multiple video segments in the preset encoding format to the cloud storage unit 101; the cloud storage unit 101 is further configured to: receive the multiple video segments and forward the multiple video segments to the structured analysis unit 103; the structured analysis unit 103 is further configured to: perform structured analysis on the converted multiple video segments.

[0049] In some embodiments, the transcoding unit 102 is further configured to send the converted target video segment in the preset encoding format to the cloud storage unit 101; the cloud storage unit 101 is further configured to receive the target video segment and forward it to the structured analysis unit 103; the structured analysis unit 103 is further configured to perform structured analysis on the converted target video segment in the preset encoding format, wherein the target video segment is at least one of the plurality of video segments, and the time span from the start time to the end time of the target video segment includes a second time period.

[0050] In some embodiments, the structured analysis unit 103 is further configured to perform structured analysis on a first time period of the second video, wherein the first time period lies between the start and end times of the second video.

[0051] In some embodiments, the structured analysis unit 103 is further configured to perform structured analysis on the video of a third time period in the third video, wherein the third time period lies between the start time and the end time of the third video.

[0052] For example, as shown in Figure 2, in an embodiment of this application the video analysis method can be executed by a video analysis system that includes an industry platform, a cloud storage server, a transcoding server, and a structured analysis server. The cloud storage server includes a transcoding module, and the structured analysis server can also be called an intelligent analysis server. The video analysis system can perform the following steps:

[0053] (1) The industry platform uploads the first video as an object to the preprocessing module of the cloud storage server and specifies that the video be transcoded and encapsulated into the preset encapsulation format and the preset encoding format.

[0054] (2) The transcoding module first determines the first encapsulation format and the first encoding format of the first video. If the first encoding format of the first video is the preset encoding format and the first encapsulation format is not the preset encapsulation format, the transcoding module converts the encapsulation format of the first video to the preset encapsulation format.

[0055] (3) If the first encoding format is not the preset encoding format but the first encapsulation format is the preset encapsulation format, the first video is sent to the transcoding server. The transcoding server obtains the first video from the cloud storage server and converts the first encoding format of the first video into the preset encoding format.

[0056] (4) The industry platform periodically checks the progress of the transcoding or re-encapsulation of the first video.

[0057] (5) When the encapsulation module converts the first video to a preset encapsulation format, the cloud storage server returns the progress of the first video's encapsulation format conversion to the industry platform; when the transcoding server converts the first video's first encoding format to a preset encoding format, the cloud storage server returns the progress of the first video's encoding format conversion to the industry platform.

[0058] (6) When the industry platform finds that the first video has been successfully transcoded or successfully re-encapsulated, it sends a video structured analysis task to the structured analysis server.

[0059] (7) Based on the time period of the first video requested by the user, the structured analysis server retrieves the video for that time period from the cloud storage server.

[0060] (8) The structured analysis server performs structured analysis on the video and returns the video analysis results to the industry platform.
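Steps (4) to (6) describe a poll-then-dispatch loop on the industry platform. The sketch below assumes hypothetical `query_progress` and `dispatch_task` callables and hypothetical status strings; the 5-second default interval is taken from the example in [0076], and the poll limit is an added safeguard that the text does not mention.

```python
import time
from typing import Callable


def wait_then_dispatch(query_progress: Callable[[], str],
                       dispatch_task: Callable[[], None],
                       interval_s: float = 5.0,
                       max_polls: int = 720) -> bool:
    """Poll conversion status; send the analysis task once it succeeds.

    Returns True if the task was dispatched, False on failure or timeout.
    """
    for _ in range(max_polls):
        status = query_progress()  # assumed values: "running", "succeeded", "failed"
        if status == "succeeded":
            dispatch_task()        # step (6): hand off to the structured analysis server
            return True
        if status == "failed":
            return False
        time.sleep(interval_s)     # step (4): check again periodically
    return False
```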

[0061] Figure 3 is a flowchart of a video analysis method provided in an embodiment of this application. For example, the video analysis method provided in this application can be applied to the video analysis device shown in Figure 1 and the video analysis system shown in Figure 2.

[0062] As shown in Figure 3, the video analysis method provided in this application may specifically include the following steps:

[0063] S101. The cloud storage server acquires the first video.

[0064] The first video is the one to be subjected to structured analysis.

[0065] For example, when the first video involves people or objects, structured analysis of the video can obtain specific information about the people, vehicles, and non-motorized vehicles appearing in it. For instance, it can obtain a person's physiological characteristics (e.g., gender, age), facial expressions (smiling, normal, angry), and facial accessories (glasses, sunglasses, hat, mask). It can also provide a structured description of the person's clothing, direction of movement, and whether they are carrying a backpack, a bag, or an umbrella, or riding a bicycle.
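The structured description in [0065] is essentially one record per detected person. The dataclass below is a hypothetical schema with illustrative field names; it shows how such attributes might be serialized to text, which is the stated purpose of structured analysis.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class PersonRecord:
    """Hypothetical schema for one person found in a video."""
    gender: Optional[str] = None
    age_group: Optional[str] = None
    expression: Optional[str] = None  # e.g. "smiling", "normal", "angry"
    accessories: List[str] = field(default_factory=list)  # e.g. "glasses", "mask"
    clothing: Optional[str] = None
    direction: Optional[str] = None   # direction of movement
    carrying: List[str] = field(default_factory=list)     # e.g. "backpack", "umbrella"
    riding_bicycle: bool = False


record = PersonRecord(gender="female", age_group="adult", expression="smiling",
                      accessories=["glasses"], carrying=["umbrella"])
print(json.dumps(asdict(record)))
```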

[0066] The first video uses a first encapsulation format and a first encoding format. Video encoding means converting the original video format into another video format using compression technology, while video encapsulation means placing the encoded and compressed video into a video file according to a specific encapsulation format.

[0067] In some embodiments, the first encapsulation format of the first video may be MP4, AVI (audio video interleaved), PS (program stream), etc.; the first encoding format of the first video may be H.261, H.263, H.264, etc.

[0068] After obtaining the first video, this application identifies the encoding and encapsulation formats of the first video.
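For the container side of this identification step, a common approach (not one the text specifies) is to inspect the file's leading bytes. The sketch below checks the standard signatures of the three containers listed in [0067]: an AVI file is a RIFF file with form type `AVI `, an MP4 file normally has an `ftyp` box type at byte offset 4, and an MPEG program stream begins with the pack start code `00 00 01 BA`. The function name is hypothetical, and identifying the encoding format would need deeper parsing that is not shown.

```python
def sniff_container(header: bytes) -> str:
    """Guess the container format from the first bytes of a file."""
    # AVI: RIFF file whose form type (bytes 8-11) is "AVI ".
    if len(header) >= 12 and header[:4] == b"RIFF" and header[8:12] == b"AVI ":
        return "AVI"
    # MP4 (ISO base media): first box is normally "ftyp"; its type is at bytes 4-7.
    if len(header) >= 8 and header[4:8] == b"ftyp":
        return "MP4"
    # MPEG program stream: begins with the pack start code 0x000001BA.
    if header[:4] == b"\x00\x00\x01\xba":
        return "PS"
    return "unknown"


print(sniff_container(b"\x00\x00\x01\xba\x44"))  # PS
```

In practice the header would come from something like `open(path, "rb").read(12)`.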

[0069] In some embodiments, if the container format and encoding format of the acquired first video are exactly the preset container format and encoding format, that is, if the original container format and encoding format of the first video are exactly the analyzable container format and encoding format, the first video can be directly stored and structured analysis can be performed. For example, if the format of the acquired first video is exactly the analyzable PS container format plus H.264 encoding format, the first video can be directly stored and structured analysis can be performed.

[0070] If the first encoding format is a preset encoding format and the first encapsulation format is not a preset encapsulation format, proceed to steps S102 to S104. If the first encoding format is not a preset encoding format and the first encapsulation format is a preset encapsulation format, proceed to steps S105 to S109.

[0071] S102. If the first encoding format is a preset encoding format but the first encapsulation format is not a preset encapsulation format, the cloud storage server will convert the first encapsulation format of the first video to the preset encapsulation format.

[0072] After obtaining the first video, this application identifies the encoding and encapsulation formats of the first video.

[0073] In some embodiments, the preset encoding format can be H.264 encoding format, and the preset container format can be PS container format. Common container formats such as MP4 and AVI can generally be converted to PS container format. For example, if the first video is in AVI container format with H.264 encoding format, it can generally be successfully converted to PS container format with H.264 encoding format.

[0074] Understandably, before structured analysis is performed on a video, it is generally necessary to decode the original encoding format, convert the encoding format, and then convert the container format. When the video's encoding format is already the preset format, the encoding conversion can be skipped, and converting only the container format is more efficient. Thus, if the first encoding format is the preset format but the first container format is not, the first container format of the first video can be directly converted to the preset container format, which is the container format required for analysis.
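To illustrate the difference between the two conversion paths, the sketch below builds ffmpeg command lines; this is an assumption about tooling, not the patent's implementation, and the function names are hypothetical. `-c copy` copies the existing streams without decoding or re-encoding (pure re-encapsulation), while `-c:v libx264` re-encodes the video to H.264. Using ffmpeg's `mpeg` muxer for program-stream output, and whether a given ffmpeg build accepts H.264 in it, should be verified before use.

```python
from typing import List

# Assumed output muxer: ffmpeg's "mpeg" muxer writes an MPEG program stream.
PS_MUXER = "mpeg"


def remux_to_ps(src: str, dst: str) -> List[str]:
    """Encapsulation-only path: keep the encoded streams, change the container."""
    return ["ffmpeg", "-i", src, "-c", "copy", "-f", PS_MUXER, dst]


def transcode_to_h264(src: str, dst: str) -> List[str]:
    """Encoding-only path: re-encode video to H.264, copy audio, write PS."""
    return ["ffmpeg", "-i", src, "-c:v", "libx264", "-c:a", "copy", "-f", PS_MUXER, dst]


# To actually run one: subprocess.run(remux_to_ps("in.avi", "out.ps"), check=True)
print(" ".join(remux_to_ps("in.avi", "out.ps")))
```

The remux command never invokes an encoder, which is why the encapsulation-only path avoids the lengthy decode and re-encode that [0003] identifies as the bottleneck.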

[0075] In some embodiments, after converting the first encapsulation format of the first video to a preset encapsulation format, the bitstream of the second video in the preset encapsulation format can be stored (i.e., the data flow used by the second video in the preset encapsulation format per unit time can be stored), and the stored bitstream of the second video in the preset encapsulation format can be subjected to structured analysis.

[0076] In some embodiments, the progress of the first video's format conversion and re-encapsulation can be queried periodically (e.g., every 5 seconds) to check whether the conversion is proceeding normally.

[0077] S103. The cloud storage server sends the converted second video in the preset encapsulation format to the structured analysis server; correspondingly, the structured analysis server receives the second video for structured analysis.

[0078] S104. The structured analysis server performs structured analysis on the second video.

[0079] In some embodiments, as shown in Figure 4, structured analysis can be performed on a first time period within the second video. The first time period lies between the start and end times of the second video and is the period that the user needs to have analyzed.

[0080] In some embodiments, after converting the first container format of the first video to a preset container format, the start time and end time of the second video in the preset container format can be obtained (i.e., the start time and end time of video recording). For example, the start time of the second video is 11:00 AM on October 20, 2021, and the end time of video recording is 9:20 PM on October 20, 2021.

[0081] In some embodiments, when performing structured analysis on the converted second video in a preset encapsulation format, a specific time period can be selected from the time interval between the start and end times of the second video in the above embodiments for structured analysis. For example, if the second video starts shooting at 11:00 AM on October 20, 2021, and ends shooting at 9:20 PM on October 20, 2021, and the first time period is from 2:05 PM to 8:05 PM on October 20, 2021, then the video within the second video from 2:05 PM to 8:05 PM on October 20, 2021 can be selected for structured analysis.
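Selecting the first time period amounts to mapping wall-clock times onto offsets within the recording. The sketch below assumes, as a hypothesis not stated in the source, that the analysis server can seek by offset from the start of the video; `clip_offsets` is a hypothetical helper. It reproduces the example from [0081].

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def clip_offsets(video_start: datetime, video_end: datetime,
                 period_start: datetime, period_end: datetime
                 ) -> Optional[Tuple[timedelta, timedelta]]:
    """Map a wall-clock analysis period onto offsets within a recording.

    Returns (start_offset, end_offset) measured from the start of the video,
    or None if the requested period is not fully inside the recording.
    """
    if not (video_start <= period_start < period_end <= video_end):
        return None
    return period_start - video_start, period_end - video_start


# Example from [0081]: recording 11:00 to 21:20, first time period 14:05 to 20:05.
day = datetime(2021, 10, 20)
offsets = clip_offsets(day.replace(hour=11), day.replace(hour=21, minute=20),
                       day.replace(hour=14, minute=5), day.replace(hour=20, minute=5))
start_off, end_off = offsets
print(start_off, end_off)  # 3:05:00 9:05:00
```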

[0082] Step S104 can be executed by the structured analysis unit in the video analysis device shown in Figure 1 and by the structured analysis platform in the video analysis system shown in Figure 2.

[0083] Understandably, once the start and end times of the second video recording are known, the video corresponding to the time period that needs structured analysis can be selected directly, without analyzing the entire recording. This makes the video structured analysis process more efficient and more targeted.

[0084] S105. If the first encoding format is not the preset encoding format, but the first encapsulation format is the preset encapsulation format, the cloud storage server sends the first video to the transcoding server; correspondingly, the transcoding server retrieves the first video from the cloud storage server.

[0085] S106. The transcoding server converts the first encoding format of the first video to a preset encoding format.

[0086] After obtaining the first video, this application identifies the encoding and encapsulation formats of the first video.

[0087] In some embodiments, the preset encoding format may be H.264. For example, if the first encapsulation format of the first video is PS encapsulation format and the first encoding format is H.261 encoding format, the format of the first video is converted to PS encapsulation format plus H.264 encoding format to obtain the converted third video, and the third video is subjected to structured analysis.

[0088] In other embodiments, if the first encoding format is not a preset encoding format and the first encapsulation format is not a preset encapsulation format, the first encoding format of the first video can be converted to a preset encoding format and the first encapsulation format can be converted to a preset encapsulation format, and then the converted video can be subjected to structured analysis.

[0089] For example, if the first encoding format of the first video is H.261 and the first container format is AVI, the first video is first converted to H.264 encoding format and then to PS container format. The converted video is then obtained and structured analysis is performed on the video.

[0090] In some embodiments, after the first encoding format of the first video has been converted to the preset encoding format, the bitstream of the resulting third video (i.e., its encoded video data stream) can be stored, and the stored bitstream can then be subjected to structured analysis.

[0091] In some embodiments, the status of transcoding the first video can be queried periodically (e.g., every 5 seconds) to determine whether transcoding is proceeding normally.

[0092] In some embodiments, if the query shows that transcoding of the first video has failed, the first video can be stored directly in its original format.

[0093] S107. The transcoding server sends the converted third video in a preset encoding format to the cloud storage server; correspondingly, the cloud storage server receives the third video.

[0094] S108. The cloud storage server sends the converted third video in a preset encoding format to the structured analysis server; correspondingly, the structured analysis server receives the third video.

[0095] S109. The structured analysis server performs structured analysis on the third video.

[0096] In some embodiments, as shown in Figure 5, structured analysis can be performed on a third time period within the third video. The third time period lies between the start and end times of the third video and is the period that the user needs to have analyzed.

[0097] In some embodiments, the start and end times of a third video in a preset encoding format (i.e., the start and end times of video recording) can be obtained, and a specific time period of video can be directly selected from the time period between the start and end times of the third video for structured analysis.

[0098] For example, if the third video starts at 8:00 PM on November 20, 2021 and ends at 6:30 AM on November 21, 2021, and the first time period is from 9:05 PM on November 20, 2021 to 3:05 AM on November 21, 2021, then the portion of the third video from 9:05 PM on November 20, 2021 to 3:05 AM on November 21, 2021 can be selected directly for structured analysis.

[0099] Step S109 can be executed by the structured analysis unit in the video analysis device shown in Figure 1 and by the structured analysis platform in the video analysis system shown in Figure 2.

[0100] Understandably, once the start and end times of the third video recording are known, the video corresponding to the time period that needs structured analysis can be selected directly, without analyzing the entire recording. This makes the video structured analysis process more efficient and more targeted.

[0101] Steps S101 to S109 above can be executed by the video analysis device shown in Figure 1 and by the video analysis system shown in Figure 2.

[0102] Understandably, this method first identifies the encoding and encapsulation formats of the video to be analyzed. If the video's encoding format is the preset format but its encapsulation format is not (i.e., the encapsulation format is not analyzable), the encapsulation format can be converted directly to the preset format, omitting conversion of the encoding format. Repackaging alone is efficient because the video does not need to be decoded and re-encoded. Conversely, if the encoding format is not the preset format but the encapsulation format is, only the encoding format is converted to the preset format, omitting conversion of the encapsulation format. By identifying both formats first and converting only the one that needs it, the method produces a video ready for structured analysis and meets the need for fast and efficient video structured analysis.

[0103] In some embodiments of this application, to speed up conversion of a video's original encoding format to the preset encoding format before structured analysis, a large first video can be divided into multiple video segments, and the encoding formats of these segments can then be converted to the preset encoding format.

[0104] For example, if the size of the first video is greater than a preset threshold (e.g., the preset threshold could be 200MB, 800MB, 1GB, etc.), the first video can be divided into multiple video segments, and then structured analysis can be performed on these multiple video segments. For example, if the preset threshold for video size is 200MB and the size of the first video is 250MB, the first video can be divided into multiple video segments, each of which is less than 200MB in size.

[0105] For example, as shown in Figure 6, converting the first encoding format of the first video to the preset encoding format may also include the following steps Sa1 to Sa6.

[0106] Sa1: The cloud storage server divides the first video into multiple video segments according to a preset size.

[0107] In some embodiments, the preset size can be 100MB, 150MB, etc., and this embodiment does not impose a specific limitation. For example, when the preset size is 100MB and the size of the first video is 350MB, the first video is divided into three 100MB video segments and one 50MB video segment.
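The threshold check in [0104] and the fixed-size split in [0107] can be illustrated with a small planning function. This is a sketch under stated assumptions: sizes are whole megabytes, `plan_segments` is a hypothetical name, and it only computes (offset, length) ranges. Cutting a real video at arbitrary byte offsets would not generally yield independently decodable pieces; a practical splitter would cut on packet or keyframe boundaries of the container.

```python
from typing import List, Optional, Tuple


def plan_segments(total_mb: int, segment_mb: int,
                  threshold_mb: Optional[int] = None) -> List[Tuple[int, int]]:
    """Return (offset_mb, length_mb) ranges for splitting a video.

    If threshold_mb is given and the video is not larger than it, the video
    is kept as a single piece. Otherwise it is cut into segment_mb pieces,
    with a shorter final piece for any remainder.
    """
    if segment_mb <= 0:
        raise ValueError("segment size must be positive")
    if threshold_mb is not None and total_mb <= threshold_mb:
        return [(0, total_mb)]
    segments = []
    offset = 0
    while offset < total_mb:
        length = min(segment_mb, total_mb - offset)
        segments.append((offset, length))
        offset += length
    return segments


print(plan_segments(350, 100))  # [(0, 100), (100, 100), (200, 100), (300, 50)]
```

The first call reproduces the 350MB example: three 100MB segments and one 50MB segment.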

[0108] Sa2: The cloud storage server sends multiple video segments to the transcoding server; correspondingly, the transcoding server retrieves multiple video segments.

[0109] Sa3: The transcoding server converts the first encoding format of each of the multiple video segments into the preset encoding format.

[0110] In some embodiments, the transcoding server can receive the first video one preset-size segment at a time. While new video segments are still being received, the segments already received can be converted from the first encoding format to the preset encoding format, so that several video segments are converted concurrently.

[0111] For example, a transcoding task is sent for every 100MB video segment received. The transcoding task converts the received video segment's first encoding format into a preset encoding format. The first video is 250MB in size. After receiving the first 100MB video segment, it is transcoded, and simultaneously, the reception of the second 100MB video segment begins. After receiving the second 100MB video segment, it is transcoded (the first 100MB video segment may still be being transcoded at the same time). Then, the reception of the third 50MB video segment begins. After receiving the third 50MB video segment, it is transcoded (the first or second 100MB video segment may still be being transcoded at the same time).
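The receive-while-transcoding pattern described in [0110] and [0111] can be sketched with a thread pool. This is an illustration under assumptions, not the patented implementation: `transcode` is a hypothetical callable standing in for the real encoder, and `fake_receive` is a hypothetical stand-in for segment reception. Threads fit the case where `transcode` mostly waits on an external encoder process; CPU-bound pure-Python work would need processes instead.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List


def pipeline_transcode(segments: Iterable[bytes],
                       transcode: Callable[[bytes], bytes],
                       max_workers: int = 4) -> List[bytes]:
    """Start a transcoding task as soon as each segment has been received.

    `segments` may be a generator that blocks while the next segment is still
    arriving; segments received earlier keep transcoding in worker threads in
    the meantime. Results are returned in the original segment order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each iteration pulls (receives) one segment, then immediately submits it.
        futures = [pool.submit(transcode, seg) for seg in segments]
        return [f.result() for f in futures]


def fake_receive(chunks: List[bytes], delay_s: float = 0.01) -> Iterator[bytes]:
    """Hypothetical stand-in for receiving segments from cloud storage."""
    for chunk in chunks:
        time.sleep(delay_s)  # simulate network transfer time
        yield chunk


if __name__ == "__main__":
    received = fake_receive([b"seg-100MB-1", b"seg-100MB-2", b"seg-50MB-3"])
    print(pipeline_transcode(received, lambda seg: seg.upper()))
```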

[0112] Sa4: The transcoding server sends multiple video segments in a preset encoding format to the cloud storage server; correspondingly, the cloud storage server receives multiple video segments.

[0113] In some embodiments, the transcoding status of the video segments can be queried periodically (e.g., every 5 seconds). Once at least one video segment has been converted to the preset encoding format, it can be sent to the cloud storage server immediately, without waiting for all video segments to be converted.
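The "forward without waiting" rule needs only a little bookkeeping on each poll: send every segment that has finished since the last query, and never send a segment twice. This is a minimal sketch; `segments_to_forward` is a hypothetical name and the status strings are assumed.

```python
from typing import Dict, List, Set


def segments_to_forward(statuses: Dict[int, str], forwarded: Set[int]) -> List[int]:
    """Return indices of segments that finished transcoding since the last poll.

    statuses maps a segment index to an assumed status string ("running",
    "done" or "failed"). Returned indices are added to `forwarded`, so each
    segment is sent to cloud storage only once.
    """
    ready = sorted(i for i, s in statuses.items() if s == "done" and i not in forwarded)
    forwarded.update(ready)
    return ready
```

Called once per polling interval, it yields segment 0 as soon as it is done, even while segments 1 and 2 are still running.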

[0114] Sa5: The cloud storage server sends the video segments in the preset encoding format to the structured analysis server; correspondingly, the structured analysis server receives the video segments.

[0115] In some embodiments, the structured analysis server can obtain multiple video segments from the cloud storage server via Video on Demand (VOD) service. The implementation process of VOD service is as follows: When a user issues a video-on-demand request, the video analysis system retrieves the program information stored in the cloud storage server based on the video-on-demand information and transmits it to the structured analysis server as video and audio stream files via a high-speed transmission network.

[0116] Sa6: The structured analysis server performs structured analysis on the converted video segments in the preset encoding format.

[0117] The structured analysis server can perform structured analysis on successfully converted video segments without waiting for all video segments to be converted to the preset encoding format.

[0118] Steps Sa1 to Sa6 above can be executed by the video analysis device shown in Figure 1 and by the video analysis system shown in Figure 2.

[0119] Understandably, if the size of the first video exceeds a preset threshold, it can be divided into multiple video segments, and the encoding formats of these segments can be converted concurrently, thereby improving the efficiency of video encoding format conversion. Once the encoding format conversion of at least one video segment is successful, structured analysis can be directly performed on that segment, thus meeting the need for fast and efficient structured analysis of the entire video.

[0120] In some embodiments, as shown in Figure 7, performing structured analysis on the converted video segments in the preset encoding format may also include the following steps Sb1 and Sb2.

[0121] Sb1: The structured analysis server obtains a target video segment that has been converted to the preset encoding format.

[0122] The target video segment is at least one of multiple video segments, and the time period consisting of the start and end times of the target video segment includes the first time period.

[0123] In some embodiments, the status of converting the video segments from the first encoding format to the preset encoding format can be queried periodically (e.g., every 5 seconds). Once at least one video segment has been converted to the preset encoding format, the start time and end time of each successfully converted segment can be obtained.

[0124] In some embodiments, if the first time period is included in the time period consisting of the start and end times of at least one successfully converted video segment, the target video segment including the first time period can be directly obtained.

[0125] Sb2: The structured analysis server performs structured analysis on the video of the first time period within the target video segment.

[0126] For example, suppose the successfully converted video segments are segment a, segment b, and segment c. Segment a covers 9:00 to 10:30 on December 10, 2021; segment b covers 16:30 to 20:00 on December 10, 2021; and segment c covers 20:00 on December 10, 2021 to 2:20 on December 11, 2021. If the first time period is 21:00 to 24:00 on December 10, 2021, the portion of segment c corresponding to 21:00 to 24:00 on December 10, 2021 can be selected for structured analysis.
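Finding the target segment is a containment test over the spans of the segments converted so far. The sketch below uses the segment times from the example; `find_target_segment` is a hypothetical name. Like the example, it only handles a period that fits inside a single segment. A period straddling two segments would need the pieces to be stitched, which the source does not describe.

```python
from datetime import datetime
from typing import Dict, Optional, Tuple

Span = Tuple[datetime, datetime]


def find_target_segment(segments: Dict[str, Span], period: Span) -> Optional[str]:
    """Return the name of a converted segment whose span fully contains `period`."""
    p_start, p_end = period
    for name, (s_start, s_end) in segments.items():
        if s_start <= p_start and p_end <= s_end:
            return name
    return None  # no single converted segment covers the period yet


converted = {
    "a": (datetime(2021, 12, 10, 9, 0), datetime(2021, 12, 10, 10, 30)),
    "b": (datetime(2021, 12, 10, 16, 30), datetime(2021, 12, 10, 20, 0)),
    "c": (datetime(2021, 12, 10, 20, 0), datetime(2021, 12, 11, 2, 20)),
}
# 24:00 on December 10 is represented as 00:00 on December 11.
user_period = (datetime(2021, 12, 10, 21, 0), datetime(2021, 12, 11, 0, 0))
print(find_target_segment(converted, user_period))  # c
```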

[0127] Steps Sb1 and Sb2 above can also be executed by the structured analysis unit of the video analysis device shown in Figure 1.

[0128] Understandably, if the first time period that needs analysis falls within the time period spanned by the start and end times of a converted target video segment, structured analysis can be performed directly on the video of the first time period in that segment, without analyzing the entire segment. This makes the video structured analysis process more efficient and more targeted.

[0129] For example, as shown in Figure 8, converting the first video to the preset container format and the preset encoding format can be performed through the following steps S1 to S12:

[0130] S1. Upload the first video;

[0131] S2. Determine whether the format of the first video is PS container format plus H.264 encoding format;

[0132] S3. If the first video is in PS encapsulation with H.264 encoding, save the video directly.

[0133] S4. If the format of the first video is not PS encapsulation plus H.264 encoding, determine whether the first video can be re-encapsulated (the condition for re-encapsulation is that the encoding format of the first video is H.264 encoding). If it can be re-encapsulated, proceed to step S5; otherwise, proceed to S9.

[0134] S5. Convert the first video's container format to PS container format, i.e., re-encapsulate;

[0135] S6. Determine if the repackaging was successful;

[0136] S7. If the re-encapsulation is successful, store the converted video;

[0137] S8. If the repackaging fails, execute S9;

[0138] S9. Divide the first video into multiple video segments;

[0139] S10. Send the transcoding task to convert the encoding formats of multiple video segments concurrently to H.264 encoding format, i.e., transcoding, and determine whether the transcoding is successful.

[0140] S11. If the transcoding is successful, save the converted video;

[0141] S12. If transcoding fails, save the first video in its original format.
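The decision flow of steps S1 to S12 can be summarized as a small function. This is a sketch of the control flow only: `remux` and `transcode_segments` are hypothetical callables standing in for S5 and for S9 to S10, each returning whether it succeeded, and the returned labels name which version of the video ends up stored.

```python
from typing import Callable

PRESET_CODEC = "H.264"
PRESET_CONTAINER = "PS"


def process_upload(codec: str, container: str,
                   remux: Callable[[], bool],
                   transcode_segments: Callable[[], bool]) -> str:
    """Follow steps S2 to S12 and report which version of the video is stored."""
    # S2/S3: already PS + H.264, so store the video as-is.
    if codec == PRESET_CODEC and container == PRESET_CONTAINER:
        return "original"
    # S4: re-encapsulation is only possible when the codec is already H.264.
    if codec == PRESET_CODEC:
        if remux():          # S5/S6
            return "remuxed"  # S7
        # S8: re-encapsulation failed, fall through to S9.
    # S9/S10: split into segments and transcode them concurrently to H.264.
    if transcode_segments():
        return "transcoded"       # S11
    return "original_fallback"    # S12: keep the original format
```

Keeping the two conversion paths as injected callables makes the branch order (re-encapsulate first, transcode only as a fallback) easy to check in isolation.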

[0142] Steps S1 to S12 above can also be executed by the video analysis device shown in Figure 1 and by the video analysis system shown in Figure 2.

[0143] It is understood that this embodiment includes at least the following beneficial effects:

[0144] (1) First, determine whether the acquired video format is an analyzable video format. If it is not an analyzable video format, then determine whether the video can be repackaged. If the repackaging conditions are met (i.e., the video encoding format is an analyzable encoding format), then it can be directly repackaged. The video does not need to be decoded again, but is directly repackaged, which makes the video format conversion more efficient.

[0145] (2) When re-encapsulation is not possible or fails, the video is divided into multiple video segments, and the encoding formats of these segments are converted into an analyzable encoding format concurrently. This speeds up transcoding and meets the need for fast and efficient video structured analysis.

[0146] Another embodiment of this application provides an electronic device. As shown in the structural diagram in Figure 9, the electronic device includes a processor 302, a communication interface 303, and a bus 304. Optionally, the electronic device may also include a memory 301.

[0147] Processor 302 may implement or execute the various exemplary logic blocks, modules, and circuits described in conjunction with the disclosure of this application. Processor 302 may be a central processing unit, a general-purpose processor, a digital signal processor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or another programmable logic device, transistor logic device, hardware component, or any combination thereof. Processor 302 may also be a combination that implements computing functions, such as a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.

[0148] Communication interface 303 is used to connect to other devices via a communication network. This communication network can be Ethernet, wireless access network, wireless local area network (WLAN), etc.

[0149] The memory 301 may be a read-only memory (ROM) or other type of static storage device capable of storing static information and instructions, random access memory (RAM) or other type of dynamic storage device capable of storing information and instructions, or electrically erasable programmable read-only memory (EEPROM), disk storage medium or other magnetic storage device, or any other medium capable of carrying or storing desired program code in the form of instructions or data structures and accessible by a computer, but is not limited thereto.

[0150] As one possible implementation, the memory 301 can exist independently of the processor 302. The memory 301 can be connected to the processor 302 via a bus 304 and is used to store instructions or program code. When the processor 302 calls and executes the instructions or program code stored in the memory 301, it can implement the video analysis method provided in the embodiments of this application.

[0151] In another possible implementation, the memory 301 can also be integrated with the processor 302.

[0152] Bus 304 can be an extended industry standard architecture (EISA) bus, etc. Bus 304 can be divided into an address bus, a data bus, a control bus, etc. For ease of representation, the bus is shown as a single thick line in Figure 9, but this does not mean that there is only one bus or one type of bus.

[0153] Through the above description of the implementation methods, those skilled in the art can clearly understand that, for the sake of convenience and brevity, only the division of the above functional modules is used as an example. In practical applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the electronic device can be divided into different functional modules to complete all or part of the functions described above.

[0154] Another embodiment of this application provides a computer-readable storage medium storing computer instructions. When the computer instructions are executed on a computer, the computer performs the various steps in the video analysis method flow shown in the above-described method embodiments.

[0155] Another embodiment of this application provides a computer program product including computer instructions. When the computer instructions are executed on a computer, they cause the computer to perform the various steps in the video analysis method flow shown in the above method embodiments.

[0156] In the above embodiments, implementation can be achieved, in whole or in part, through software, hardware, firmware, or any combination thereof. When implemented using software programs, implementation can be, in whole or in part, in the form of a computer program product. This computer program product includes one or more computer instructions. When these computer instructions are loaded and executed on a computer, all or part of the processes or functions according to the embodiments of this application are generated. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, computer instructions can be transmitted from one website, computer, server, or data center to another via wired (e.g., coaxial cable, fiber optic, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any available medium accessible to a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available media can be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), or semiconductor media (e.g., solid-state disks (SSDs)).

[0157] The above description is merely a specific embodiment of this application. Any variations or substitutions conceived by those skilled in the art based on the specific embodiments provided in this application should be covered within the protection scope of this application.

Claims

1. A video analysis system, characterized in that, The system includes: a cloud storage server, a transcoding server, and a structured analysis server; The cloud storage server is used to acquire a first video, wherein the first video has a first encapsulation format and a first encoding format; when the first encoding format is not a preset encoding format but the first encapsulation format is a preset encapsulation format, the first video is divided into multiple video segments according to a preset size; the multiple video segments are sent to the transcoding server; when the first encoding format is a preset encoding format but the first encapsulation format is not a preset encapsulation format, the first encapsulation format of the first video is converted to the preset encapsulation format, and the converted second video in the preset encapsulation format is sent to the structured analysis server. The transcoding server is configured to obtain the plurality of video segments from the cloud storage server. After receiving one video segment, it begins to receive the next video segment and converts the first encoding format of the obtained video segments into the preset encoding format. If the encoding format of at least one video segment is converted into the preset encoding format, the video segment in the converted preset encoding format is sent to the cloud storage server. 
The cloud storage server is also used to receive the converted video segments with the preset encoding format; The structured analysis server is used to perform structured analysis on the second video; The structured analysis server is further configured to query the working status of the conversion of the first encoding format of the plurality of video segments to the preset encoding format; when at least one video segment's encoding format has been converted to the preset encoding format, obtain the start time and end time of at least one successfully converted video segment; when the time period formed by the start time and end time of at least one successfully converted video segment includes a first time period, obtain a target video segment including the first time period from the cloud storage server; perform structured analysis on the video of the first time period in the target video segment; the first time period represents the time period for which the user needs to perform structured analysis.

2. The system according to claim 1, characterized in that, The structured analysis server is used to perform structured analysis on the second video, specifically including: The structured analysis server is used to perform structured analysis on the second time period of the second video, where the second time period is the time period between the start and end times of the second video.

3. A video analysis method, characterized in that, applied to a cloud storage server, the method includes: Obtain the first video, wherein the first video is in a first encapsulation format and the first video is in a first encoding format; If the first encoding format is not a preset encoding format, but the first encapsulation format is a preset encapsulation format, the first video is divided into multiple video segments according to a preset size, and the multiple video segments are sent to the transcoding server; so that the transcoding server: obtains the multiple video segments from the cloud storage server, and after receiving each video segment, starts receiving the next video segment, and converts the first encoding format of the obtained video segments into the preset encoding format; if the encoding format of at least one video segment is converted into the preset encoding format, the video segment in the converted preset encoding format is sent to the cloud storage server; The cloud storage server receives the converted video segments in the preset encoding format, enabling the structured analysis server to: query the working status of the conversion of the first encoding format of the multiple video segments to the preset encoding format; if at least one video segment's encoding format has been converted to the preset encoding format, obtain the start and end times of at least one successfully converted video segment; if the time period formed by the start and end times of at least one successfully converted video segment includes a first time period, obtain a target video segment including the first time period from the cloud storage server; and perform structured analysis on the video of the first time period in the target video segment; where the first time period represents the time period for which the user needs to perform structured analysis. 
If the first encoding format is a preset encoding format but the first encapsulation format is not a preset encapsulation format, the first encapsulation format of the first video is converted to the preset encapsulation format; The converted second video in the preset encapsulation format is sent to the structured analysis server. The second video is used for structured analysis.

4. The method according to claim 3, characterized in that, The second video is used to perform structured analysis on the video in the second time period, which is the time period between the start and end times of the second video.

5. A video analysis method, characterized in that, Applied to a transcoding server, the method includes: When the first encoding format of the first video is not a preset encoding format, but the first encapsulation format of the first video is a preset encapsulation format, multiple video segments are obtained from the cloud storage server; the multiple video segments are obtained by the cloud storage server dividing the first video according to a preset size when the first encoding format is not the preset encoding format and the first encapsulation format is the preset encapsulation format. Upon receiving each video segment, the system begins receiving the next video segment, converting the first encoding format of the acquired video segments to the preset encoding format. If at least one video segment's encoding format is converted to the preset encoding format, the converted video segment is sent to the cloud storage server. The cloud storage server receives the converted video segment in the preset encoding format, enabling the structured analysis server to: query the working status of the conversion of the first encoding format of the multiple video segments to the preset encoding format; if at least one video segment's encoding format is converted to the preset encoding format, obtain the start and end times of at least one successfully converted video segment; if the time period formed by the start and end times of at least one successfully converted video segment includes a first time period, obtain a target video segment including the first time period from the cloud storage server; and perform structured analysis on the video of the first time period in the target video segment; the first time period represents the time period for which the user needs to perform structured analysis.

6. A computer-readable storage medium, characterized in that, The computer-readable storage medium includes computer software instructions; when the computer software instructions are executed in the video analysis system, they cause the video analysis system to implement the method as described in any one of claims 3-5.