Method and apparatus for generating marked video, method and apparatus for detecting video marking
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- TENCENT TECHNOLOGY (SHENZHEN) CO LTD
- Filing Date
- 2022-01-24
- Publication Date
- 2026-06-26
Figure CN116528020B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of network media technology, and in particular to a method, apparatus, computer device, storage medium, and computer program product for generating tagged videos, as well as a method, apparatus, computer device, storage medium, and computer program product for detecting video tags.
Background Technology
[0002] In recent years, with the increasing awareness of copyright among the public, the importance of protecting the copyright of film and television works has become increasingly apparent. Due to its advantages of concealment, ease of traceability, and convenient operation, digital watermarking technology is being increasingly introduced into video copyright protection scenarios.
[0003] Existing video digital watermarking technologies primarily involve adding image watermarks to video frames to identify and display the video's source. However, under strong attacks such as scaling, cropping, degrading, and editing, image watermarks added in this way are easily compromised, making it difficult to effectively protect video copyright.
Summary of the Invention
[0004] Therefore, it is necessary to provide a method, apparatus, computer device, storage medium, and computer program product for generating marked videos that can improve the robustness of digital watermarks, as well as a method, apparatus, computer device, storage medium, and computer program product for detecting video marks, in order to address the above-mentioned technical problems.
[0005] On one hand, this application provides a method for generating labeled videos. The method includes:
[0006] Obtain the object identifier and determine the original subtitle file of the target video requested through the object identifier;
[0007] The object identifier is mapped to an identifier sequence, and the subtitle offset type corresponding to each identifier in the identifier sequence is determined.
[0008] Determine the identifier corresponding to each subtitle to be offset in the original subtitle file;
[0009] Obtain the offset subtitle produced by performing, on each subtitle to be offset, the subtitle offset of the type to which the identifier corresponding to that subtitle belongs;
[0010] Determine a marked subtitle file based on the multiple offset subtitles; the marked subtitle file is used together with the target video to form a marked video corresponding to the object identifier.
[0011] On the other hand, this application also provides an apparatus for generating labeled videos. The apparatus includes:
[0012] The acquisition module is used to acquire an object identifier and determine the original subtitle file of the target video requested through the object identifier;
[0013] The mapping module is used to map the object identifier to an identifier sequence and determine the subtitle offset type corresponding to each identifier in the identifier sequence;
[0014] The determination module is used to determine the identifiers corresponding to each subtitle to be offset in the original subtitle file;
[0015] The determining module is further configured to obtain the offset subtitle obtained by performing corresponding subtitle offset on the subtitle to be offset according to the subtitle offset type to which the identifier corresponding to the subtitle to be offset belongs;
[0016] The determining module is further configured to determine a marked subtitle file based on multiple offset subtitles; the marked subtitle file is used together with the target video to form a marked video corresponding to the object identifier.
[0017] On the other hand, this application also provides a computer device. The computer device includes a memory and a processor, the memory storing a computer program, and the processor executing the computer program to perform the following steps:
[0018] Obtain the object identifier and determine the original subtitle file of the target video requested through the object identifier;
[0019] The object identifier is mapped to an identifier sequence, and the subtitle offset type corresponding to each identifier in the identifier sequence is determined.
[0020] Determine the identifier corresponding to each subtitle to be offset in the original subtitle file;
[0021] Obtain the offset subtitle produced by performing, on each subtitle to be offset, the subtitle offset of the type to which the identifier corresponding to that subtitle belongs;
[0022] Determine a marked subtitle file based on the multiple offset subtitles; the marked subtitle file is used together with the target video to form a marked video corresponding to the object identifier.
[0023] On the other hand, this application also provides a computer-readable storage medium. The computer-readable storage medium stores a computer program thereon, which, when executed by a processor, performs the following steps:
[0024] Obtain the object identifier and determine the original subtitle file of the target video requested through the object identifier;
[0025] The object identifier is mapped to an identifier sequence, and the subtitle offset type corresponding to each identifier in the identifier sequence is determined.
[0026] Determine the identifier corresponding to each subtitle to be offset in the original subtitle file;
[0027] Obtain the offset subtitle produced by performing, on each subtitle to be offset, the subtitle offset of the type to which the identifier corresponding to that subtitle belongs;
[0028] Determine a marked subtitle file based on the multiple offset subtitles; the marked subtitle file is used together with the target video to form a marked video corresponding to the object identifier.
[0029] On the other hand, this application also provides a computer program product. The computer program product includes a computer program that, when executed by a processor, performs the following steps:
[0030] Obtain the object identifier and determine the original subtitle file of the target video requested through the object identifier;
[0031] The object identifier is mapped to an identifier sequence, and the subtitle offset type corresponding to each identifier in the identifier sequence is determined.
[0032] Determine the identifier corresponding to each subtitle to be offset in the original subtitle file;
[0033] Obtain the offset subtitle produced by performing, on each subtitle to be offset, the subtitle offset of the type to which the identifier corresponding to that subtitle belongs;
[0034] Determine a marked subtitle file based on the multiple offset subtitles; the marked subtitle file is used together with the target video to form a marked video corresponding to the object identifier.
[0035] In the aforementioned method, apparatus, computer device, storage medium, and computer program product for generating marked videos, the object identifier of the video playback party is mapped to an identifier sequence through a certain mapping logic. Based on the subtitle offset type corresponding to each identifier in the identifier sequence, the offset subtitle corresponding to each subtitle to be offset in the original subtitle file is obtained, and the offset subtitles are combined into a marked subtitle file, thereby forming a marked video. By modifying the position of each subtitle to be offset in the original subtitle file, the original subtitle file is converted into a marked subtitle file in which the object identifier is embedded, and this file is concealed in the video as marking or watermark information. Compared with schemes that embed the watermark in the image dimension, this approach is more robust and harder to destroy through attacks such as scaling, cropping, degrading, and editing, thereby ensuring the detection rate and accuracy during source tracing.
[0036] On the other hand, this application also provides a method for detecting video tags. The method includes:
[0037] Acquire the video to be detected and determine the subtitle offset type corresponding to each video frame in the video to be detected;
[0038] For each subtitle to be detected in the video to be detected, the subtitle offset type corresponding to each subtitle to be detected is determined based on the subtitle offset type corresponding to at least one video frame corresponding to the subtitle to be detected;
[0039] Based on the subtitle offset type corresponding to each subtitle to be detected, determine the identifier corresponding to each subtitle to be detected;
[0040] Based on the identifiers corresponding to the subtitles to be detected, determine an identifier sequence, and based on the identifier sequence, determine the object identifier marked in the video to be detected.
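The detection steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the per-frame offset types are assumed to have been classified already by a step not shown here, the majority vote used to combine the frames of one subtitle is an assumption (the text only requires "at least one video frame"), and the tables `OFFSET_TO_IDENTIFIER` and `CODE_TO_CHAR` are hypothetical.

```python
from collections import Counter

# Hypothetical tables: offset type -> identifier, identifier group -> character.
OFFSET_TO_IDENTIFIER = {"up": "0", "down": "1"}
CODE_TO_CHAR = {"00": "a", "01": "b", "10": "c"}


def subtitle_offset_type(frame_types):
    """Pick the most common offset type among the frames showing one subtitle.

    Majority voting is an assumption made for this sketch.
    """
    return Counter(frame_types).most_common(1)[0][0]


def detect_object_identifier(frames_per_subtitle, code_width=2):
    """Recover the identifier sequence and object identifier.

    frames_per_subtitle holds, for each subtitle to be detected, the list of
    offset types observed in the video frames that display that subtitle.
    """
    identifiers = [
        OFFSET_TO_IDENTIFIER[subtitle_offset_type(frames)]
        for frames in frames_per_subtitle
    ]
    sequence = "".join(identifiers)
    # Decode fixed-width groups of identifiers back into characters.
    chars = [
        CODE_TO_CHAR[sequence[i:i + code_width]]
        for i in range(0, len(sequence) - code_width + 1, code_width)
    ]
    return sequence, "".join(chars)
```

Voting over several frames per subtitle is one way to tolerate a few misclassified frames; other aggregation rules would fit the same step.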
[0041] On the other hand, this application also provides a video tag detection device. The device includes:
[0042] The acquisition module is used to acquire the video to be detected and determine the subtitle offset type corresponding to each video frame in the video to be detected.
[0043] The determination module is used to determine the subtitle offset type corresponding to each subtitle in the video to be detected based on the subtitle offset type corresponding to at least one video frame corresponding to the subtitle.
[0044] The determining module is further configured to determine the identifier corresponding to each subtitle to be detected based on the subtitle offset type corresponding to each subtitle to be detected.
[0045] The determining module is further configured to determine an identifier sequence based on the identifiers corresponding to the subtitles to be detected, and to determine the object identifier marked in the video to be detected based on the identifier sequence.
[0046] On the other hand, this application also provides a computer device. The computer device includes a memory and a processor, the memory storing a computer program, and the processor executing the computer program to perform the following steps:
[0047] Acquire the video to be detected and determine the subtitle offset type corresponding to each video frame in the video to be detected;
[0048] For each subtitle to be detected in the video to be detected, the subtitle offset type corresponding to each subtitle to be detected is determined based on the subtitle offset type corresponding to at least one video frame corresponding to the subtitle to be detected;
[0049] Based on the subtitle offset type corresponding to each subtitle to be detected, determine the identifier corresponding to each subtitle to be detected;
[0050] Based on the identifiers corresponding to the subtitles to be detected, determine an identifier sequence, and based on the identifier sequence, determine the object identifier marked in the video to be detected.
[0051] On the other hand, this application also provides a computer-readable storage medium. The computer-readable storage medium stores a computer program thereon, which, when executed by a processor, performs the following steps:
[0052] Acquire the video to be detected and determine the subtitle offset type corresponding to each video frame in the video to be detected;
[0053] For each subtitle to be detected in the video to be detected, the subtitle offset type corresponding to each subtitle to be detected is determined based on the subtitle offset type corresponding to at least one video frame corresponding to the subtitle to be detected;
[0054] Based on the subtitle offset type corresponding to each subtitle to be detected, determine the identifier corresponding to each subtitle to be detected;
[0055] Based on the identifiers corresponding to the subtitles to be detected, determine an identifier sequence, and based on the identifier sequence, determine the object identifier marked in the video to be detected.
[0056] On the other hand, this application also provides a computer program product. The computer program product includes a computer program that, when executed by a processor, performs the following steps:
[0057] Acquire the video to be detected and determine the subtitle offset type corresponding to each video frame in the video to be detected;
[0058] For each subtitle to be detected in the video to be detected, the subtitle offset type corresponding to each subtitle to be detected is determined based on the subtitle offset type corresponding to at least one video frame corresponding to the subtitle to be detected;
[0059] Based on the subtitle offset type corresponding to each subtitle to be detected, determine the identifier corresponding to each subtitle to be detected;
[0060] Based on the identifiers corresponding to the subtitles to be detected, determine an identifier sequence, and based on the identifier sequence, determine the object identifier marked in the video to be detected.
[0061] The aforementioned video tag detection method, apparatus, computer equipment, storage medium, and computer program product determine the subtitle offset type corresponding to each video frame in the video to be detected, and determine the subtitle offset type corresponding to each subtitle to be detected. Based on the subtitle offset type corresponding to each subtitle to be detected, an identifier corresponding to each subtitle to be detected is determined, and an identifier sequence is obtained. Based on the identifier sequence, the object identifier marked in the video to be detected is determined. This enables the detection of watermark information indirectly embedded in the marked video through subtitle offset, and can determine the video's playback party or leakage source based on the marked object identifier, ensuring the detection rate and accuracy during source tracing.
Attached Figure Description
[0062] Figure 1 is an application environment diagram of a method for generating marked videos in one embodiment;
[0063] Figure 2 is a flowchart illustrating a method for generating marked videos in one embodiment;
[0064] Figure 3 is a flowchart illustrating a video mark detection method in one embodiment;
[0065] Figure 4A is a flowchart illustrating the watermark embedding process in one embodiment;
[0066] Figure 4B is a flowchart illustrating the watermark embedding process in another embodiment;
[0067] Figure 4C is a flowchart illustrating the watermark detection process in one embodiment;
[0068] Figure 5 is a schematic diagram of the video alignment process in one embodiment;
[0069] Figure 6 is a structural block diagram of a device for generating marked videos in one embodiment;
[0070] Figure 7 is a structural block diagram of a video mark detection device in one embodiment;
[0071] Figure 8 is an internal structural diagram of a computer device in one embodiment.
Detailed Implementation
[0072] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.
[0073] In recent years, with the increasing awareness of copyright among the public, the importance of protecting the copyright of film and television works has become increasingly apparent. Due to its advantages of concealment, ease of traceability, and convenient operation, digital watermarking technology is being increasingly introduced into video copyright protection scenarios.
[0074] Digital watermarking is an information hiding technology that utilizes the limitations of human senses to tightly combine and hide digital signals (such as images, text, symbols, or numbers that can serve as markers) with the original data (such as image, audio, and video data). Digital watermarking can provide complete and reliable evidence of ownership for copyrighted information products.
[0075] Traditional video digital watermarking technology primarily involves overlaying watermarks at the video frame level, which means performing domain transformation on the image and embedding the watermark. However, this method of overlaying watermarks at the frame level has drawbacks such as high computational cost, impact on image quality, and unstable robustness. Under strong attacks such as scaling, cropping, degrading, and editing, the watermark is easily destroyed.
[0076] In view of this, embodiments of this application provide a method for generating marked videos and a corresponding method for detecting video marks, creatively embedding hidden watermarks at the subtitle level. By utilizing video encoding / decoding and computer vision technologies, embodiments of this application propose a method for embedding hidden watermarks during subtitle compression and playback, as well as for detecting watermarks in compressed videos. When a video is leaked or stolen, the source of the leak can be traced based on the hidden watermark, protecting video copyright. Compared to existing methods that embed watermarks at the image level, embodiments of this application embed hidden watermarks at the subtitle level, which not only improves the efficiency of watermark embedding but also has strong resistance to attacks and robustness.
[0077] To facilitate a better understanding of the technical content of this application, the relevant technical terms involved in the embodiments of this application are explained below.
[0078] Subtitles can generally be categorized into several forms, including hard subtitles, soft subtitles, and external subtitles. Hard subtitles are embedded in the video frames and become part of the image; they are visible whenever the video can be played. Soft subtitles are packaged together with the video frames in a single container; they can be displayed selectively during playback or separated from the video frames, because the subtitles and the video frames remain separate within the container. External subtitles are stored as a file separate from the video container; a playback tool can load them into the video container for playback. The subtitles involved in the embodiments of this application can specifically be soft subtitles or external subtitles, but are not limited to these.
[0079] Subtitle files come in various formats, including but not limited to SRT (SubRip Text), SSA (SubStation Alpha), and ASS (Advanced SubStation Alpha). Taking the SRT format as an example, each subtitle consists of a line with the subtitle number, a line with the time code, and a line of subtitle data.
[0080] For example:
[0081] 45
[0082] 00:02:52,184 --> 00:02:53,617
[0083] A
[0084] This indicates the 45th subtitle, displayed from 2 minutes 52.184 seconds to 2 minutes 53.617 seconds into the video. The subtitle reads: A.
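As an illustration of the SRT layout described above, the following Python sketch parses one subtitle entry into its number, start and end times in milliseconds, and text. The names `Cue` and `parse_cue` are hypothetical and introduced only for this example; the regular expression accepts the time-code arrow with or without surrounding spaces.

```python
import re
from dataclasses import dataclass

# Time code of the form HH:MM:SS,mmm --> HH:MM:SS,mmm (spaces around the arrow optional).
TIMECODE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    number: int
    start_ms: int
    end_ms: int
    text: str


def _to_ms(h, m, s, ms):
    """Convert hour, minute, second, and millisecond fields to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_cue(block):
    """Parse one SRT entry: number line, time-code line, then subtitle data."""
    lines = block.strip().splitlines()
    match = TIMECODE.fullmatch(lines[1].strip())
    if match is None:
        raise ValueError(f"bad time code: {lines[1]!r}")
    parts = match.groups()
    return Cue(
        number=int(lines[0]),
        start_ms=_to_ms(*parts[:4]),
        end_ms=_to_ms(*parts[4:]),
        text="\n".join(lines[2:]),
    )
```

Applied to the entry above, `parse_cue` yields subtitle number 45 with a start time of 172184 ms and an end time of 173617 ms.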
[0085] The solution of this application is described in detail below. The method for generating marked videos provided in the embodiments of this application can be applied to the application environment shown in Figure 1, in which terminal 102 communicates with server 104 via a communication network. A data storage system can store the data that server 104 needs to process. The data storage system can be integrated into server 104 or located in the cloud or on another network server. Terminal 102 requests a target video from server 104 for playback. Server 104 obtains the object identifier corresponding to terminal 102 and, based on the identifier sequence to which the object identifier is mapped, obtains the marked subtitle file corresponding to the object identifier. Server 104 sends the marked subtitle file and the target video together to terminal 102 for playback.
[0086] Terminal 102 can be, but is not limited to, one or more of various desktop computers, laptops, smartphones, tablets, smart voice interaction devices, smart home appliances, vehicle terminals, aircraft, etc. For example, terminal 102 can be a smart device capable of providing OTT (Over-The-Top) services, including but not limited to smart TVs and set-top boxes. OTT refers to providing various application services via the internet; typical OTT services include internet TV services and app stores. Terminal 102 can have applications installed, such as video playback applications, browsers, email applications, instant messaging applications, etc., without limitation. Specifically, applications can be standalone applications installed via an installation package, or mini-program applications that can be used without downloading and installation. The terminal can play videos through the installed applications.
[0087] Server 104 can be a standalone physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms. In some embodiments, as shown in Figure 2, a method for generating marked videos is provided. This method can be executed by a server or a terminal, or jointly by a server and a terminal. This embodiment is described using the example of applying the method to the server in Figure 1, and includes the following steps:
[0088] Step S202: Obtain the object identifier and determine the original subtitle file of the target video requested through the object identifier.
[0089] The object identifier refers to a unique identifier used to distinguish different objects or the terminals used by those objects. It can be, but is not limited to, one or more of the following: the object's account information (including but not limited to account name, object ID, etc.), the unique encoding information of the video playback application, or the IP (Internet Protocol) address or MAC (Media Access Control) address of the terminal on which the video playback application is installed. For example, if an object's registered account name on a video playback platform is "abc", then the corresponding object identifier is "abc". Once obtained, the object identifier can be mapped and embedded as a watermark in the subtitles of the video, facilitating future tracing of leaked videos.
[0090] The term "subtitle offset type" refers to the method by which subtitles are offset. This offset type can include, but is not limited to, shifting the position of the subtitles, changing the character spacing, altering the font, color, and size, or not offsetting the subtitles at all (i.e., maintaining the original subtitle style). To minimize impact on the viewing experience, the subtitle offset should be subtle, so that viewers cannot or barely perceive the change. For example, one of the two subtitle offset types might be shifting all subtitles upwards by one pixel, while the other might be shifting all subtitles downwards by one pixel.
[0091] Specifically, a terminal can send a video playback request to a server through a running video playback application to obtain a target video for playback. The video playback request includes an object identifier and video information of the target video requested (indicating which video it is, including but not limited to one or more of video name, video number, etc.). In response to the terminal's video playback request, the server determines the corresponding target video and extracts the object identifier carried in the video playback request. Based on the determined target video, the server finds the corresponding original subtitle file. This original subtitle file can be associated with the target video and stored in a database.
[0092] Step S204: Map the object identifier to an identifier sequence and determine the subtitle offset type corresponding to each identifier in the identifier sequence.
[0093] Because object identifiers vary, converting them into a unified identifier sequence through preset mapping rules enables computer devices to identify them faster and more accurately, thereby improving the efficiency of generating tagged videos. Typically, the identifier sequence has a fixed length and consists of a preset number of identifiers. The identifier specifies the subtitle offset type; one identifier uniquely corresponds to one subtitle offset type. Identifiers can be letters, numbers, or other characters. For example, if there are two preset subtitle offset types, they can be represented by binary digits "0" and "1," or by two different letters "A" and "B." Similarly, if there are more than two subtitle offset types, they can be represented by numbers 0-9 or letters A-Z, or by a combination of letters and numbers.
[0095] Specifically, the server converts the acquired object identifier into an identifier sequence according to a preset mapping rule. For example, if the object ID is "abc", the server can map it to AAAAAAABAABA, where AAAA represents "a", AAAB represents "b", and AABA represents "c". Alternatively, the server can map it to 000110, where 00 represents "a", 01 represents "b", and 10 represents "c". When the identifier sequence obtained from mapping the object identifier is not equal to the preset length, the server can process the identifier sequence according to preset logic, such as truncating or padding. Since the correspondence between identifiers and subtitle offset types is pre-stored, the server can determine the subtitle offset type corresponding to each identifier in the identifier sequence, so that the position of the subtitle to be offset can be modified according to different subtitle offset types to perform subtitle offsetting.
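The mapping in this step can be sketched in Python using the two-digit example from the paragraph above. This is a minimal sketch, assuming a lookup table `CHAR_TO_CODE` that covers only the three characters of the example and a hypothetical correspondence `IDENTIFIER_TO_OFFSET` between identifiers and subtitle offset types; a real mapping rule would cover every character allowed in an object identifier.

```python
# Hypothetical mapping rule, taken from the two-digit example in the text.
CHAR_TO_CODE = {"a": "00", "b": "01", "c": "10"}

# Hypothetical pre-stored correspondence between identifiers and offset types.
IDENTIFIER_TO_OFFSET = {"0": "shift_up", "1": "shift_down"}


def map_object_identifier(object_id):
    """Map each character of the object identifier to its identifiers, in order."""
    return "".join(CHAR_TO_CODE[ch] for ch in object_id)


def offset_types(sequence):
    """Look up the subtitle offset type for every identifier in the sequence."""
    return [IDENTIFIER_TO_OFFSET[identifier] for identifier in sequence]
```

With these tables, `map_object_identifier("abc")` returns `"000110"`, matching the example in the text.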
[0096] Step S206: Determine the identifiers corresponding to each subtitle to be offset in the original subtitle file.
[0097] Specifically, after obtaining the original subtitle file of the target video, the server splits the original subtitle file according to the number of subtitles, resulting in multiple subtitles to be offset. For each subtitle to be offset, the server determines an identifier corresponding to each subtitle.
[0098] The server can assign identifiers sequentially according to the order of the subtitles to be offset in the original subtitle file, or it can assign them randomly. For example, the identifier for the first subtitle to be offset can be determined to be "0", the identifier for the second subtitle to be offset can be "1", ... the identifier for the Nth subtitle to be offset can be "1", and so on. Thus, by modifying the position information of the subtitles to be offset, the server converts the original subtitle file into a watermarked subtitle file with embedded object identifier watermark information, i.e., a marked subtitle file.
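Sequential assignment can be sketched in Python as below. This is a sketch under one loudly stated assumption: when the original subtitle file contains more subtitles than the identifier sequence has identifiers, the sequence is repeated from its start. The text does not specify this behavior, and the function name `assign_identifiers` is hypothetical.

```python
from itertools import cycle


def assign_identifiers(subtitles, sequence):
    """Pair each subtitle to be offset with an identifier, in file order.

    Assumption: the identifier sequence repeats when subtitles outnumber
    identifiers; the source text does not state how this case is handled.
    """
    if not sequence:
        raise ValueError("identifier sequence must not be empty")
    return list(zip(subtitles, cycle(sequence)))
```

For example, five subtitles and the sequence `"01"` produce the identifiers 0, 1, 0, 1, 0 in order.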
[0099] Step S208: Obtain the offset subtitle obtained by performing the corresponding subtitle offset on the subtitle to be offset according to the subtitle offset type to which the identifier of the subtitle to be offset belongs.
[0100] Here, "subtitle offset" refers to applying an offset to subtitles. Corresponding to the subtitle offset types, the ways of offsetting subtitles include, but are not limited to, one or more of the following: shifting the position of the subtitle, changing the character spacing, changing the font, color, or size of the subtitle, and not offsetting the subtitle (i.e., maintaining the original subtitle style). The offset processing (or subtitle offset processing) mentioned in the embodiments of this application can consist of adding offset subtitles to the original video (or original video segment), or of offsetting the original subtitles in the original video.
[0101] Specifically, the server determines the subtitle offset type corresponding to each subtitle to be offset based on its corresponding identifier. For example, the server pre-stores the correspondence between various identifiers and subtitle offset types. For instance, identifier "0" corresponds to a subtitle offset type that shifts the subtitle upwards, identifier "1" corresponds to a subtitle offset type that shifts the subtitle downwards, identifier "A" corresponds to a subtitle offset type that increases the spacing between subtitle characters, identifier "B" corresponds to a subtitle offset type that decreases the spacing between subtitle characters, and so on. The server offsets the corresponding subtitle to be offset according to this subtitle offset type, thus obtaining the offset subtitle. For example, if the identifier corresponding to the subtitle to be offset is "0", the server shifts the subtitle upwards by N pixels; if the identifier corresponding to the subtitle to be offset is "1", the server shifts the subtitle downwards by N pixels, and so on.
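The position-shift case in this paragraph can be sketched in Python as follows. The `Subtitle` type, the `DIRECTION` table, and the convention that larger `y` values are lower on the screen are all assumptions made for illustration; the text only states that identifier "0" shifts the subtitle upwards by N pixels and identifier "1" shifts it downwards by N pixels.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Subtitle:
    text: str
    x: int
    y: int  # vertical position in pixels; assumed to grow downwards


# Hypothetical table: identifier -> vertical direction (-1 = up, +1 = down).
DIRECTION = {"0": -1, "1": +1}


def apply_offset(subtitle, identifier, n_pixels=1):
    """Return a copy of the subtitle shifted by n_pixels in the mapped direction."""
    return replace(subtitle, y=subtitle.y + DIRECTION[identifier] * n_pixels)
```

Returning a new object instead of mutating the input keeps the subtitle to be offset available, which is convenient if several offset variants of the same subtitle are needed.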
[0102] It should be noted that the server can determine the subtitle offset type to which the identifier corresponding to the subtitle to be offset belongs before performing the corresponding subtitle offset on the subtitle to be offset, thereby obtaining the offset subtitle. In some embodiments, the server can also pre-offset each subtitle to be offset in the original subtitle file according to various subtitle offset types to obtain offset subtitles of each type and store them in the storage medium; then in the above steps, the server extracts the offset subtitle of a specific type from the storage medium to obtain the offset subtitle of the subtitle offset type to which the identifier corresponding to the subtitle to be offset belongs.
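The pre-offset variant described above can be sketched in Python as a precompute-then-select scheme. The in-memory dictionary stands in for the storage medium, and the one-pixel shifts in `OFFSET_FUNCS` are hypothetical; both are assumptions for this sketch.

```python
def shift(position, delta):
    """Shift an (x, y) position vertically by delta pixels."""
    x, y = position
    return (x, y + delta)


# Hypothetical offset types keyed by identifier: "0" shifts up, "1" shifts down.
OFFSET_FUNCS = {
    "0": lambda p: shift(p, -1),
    "1": lambda p: shift(p, +1),
}


def precompute_variants(positions):
    """Offset every subtitle position once per offset type, ahead of any request."""
    return {
        identifier: [func(p) for p in positions]
        for identifier, func in OFFSET_FUNCS.items()
    }


def select_offsets(variants, identifiers):
    """Pick, for the i-th subtitle, the stored variant matching its identifier."""
    return [variants[identifier][i] for i, identifier in enumerate(identifiers)]
```

With this arrangement the per-request work reduces to lookups, at the cost of storing one copy of the subtitle data per offset type.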
[0103] Step S210: Determine the marked subtitle file based on multiple offset subtitles; the marked subtitle file is used together with the target video to form a marked video corresponding to the object identifier.
[0104] Specifically, for each subtitle to be offset in the original subtitle file, the server obtains the corresponding offset subtitle, reassembles the offset subtitles, and generates a marked subtitle file. The server can embed the marked subtitle file into the target video frame to generate a marked video; when subsequently sent to the terminal for playback, the terminal directly plays the marked video. Alternatively, the server can package the marked subtitle file and the target video into a container as a marked video corresponding to the object identifier, so that the offset subtitles in the marked subtitle file can be selectively displayed during subsequent video playback. Alternatively, the server can store the marked subtitle file separately from the target video; when subsequently sent to the terminal for playback, the terminal can use a playback tool to load the offset subtitles from the marked subtitle file into the target video for playback.
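Reassembling the offset subtitles into a file can be sketched in Python as below. The choice of the ASS format is an assumption made here because its `\pos(x,y)` override tag can carry a per-subtitle position; the text does not prescribe an output format. The sketch emits only the `[Events]` section, so a complete ASS file would additionally need its script-information and style sections. The function names are hypothetical.

```python
def ass_time(ms):
    """Format milliseconds as an ASS timestamp (H:MM:SS.cc, centisecond precision)."""
    cs = ms // 10
    h, rem = divmod(cs, 360000)
    m, rem = divmod(rem, 6000)
    s, c = divmod(rem, 100)
    return f"{h}:{m:02d}:{s:02d}.{c:02d}"


def dialogue_line(start_ms, end_ms, x, y, text):
    """Build one Dialogue event whose position is fixed by a \\pos override tag."""
    return (
        f"Dialogue: 0,{ass_time(start_ms)},{ass_time(end_ms)},"
        f"Default,,0,0,0,,{{\\pos({x},{y})}}{text}"
    )


def build_events(cues):
    """Assemble the [Events] section from (start_ms, end_ms, x, y, text) tuples."""
    header = (
        "[Events]\n"
        "Format: Layer, Start, End, Style, Name, "
        "MarginL, MarginR, MarginV, Effect, Text"
    )
    return "\n".join([header] + [dialogue_line(*cue) for cue in cues]) + "\n"
```

Because ASS timestamps have centisecond precision, millisecond values taken from an SRT file are truncated in this sketch.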
[0105] In the above-mentioned method for generating marked videos, the object identifier of the video playback party is mapped, through a certain logic, to an identifier sequence composed of identifiers. Based on the subtitle offset type corresponding to each identifier in the identifier sequence, the offset subtitle corresponding to each subtitle to be offset in the original subtitle file is obtained. The offset subtitles are then combined into a marked subtitle file, thus forming a marked video. By modifying the position of each subtitle to be offset in the original subtitle file, the original subtitle file is converted into a marked subtitle file in which the object identifier is embedded. This file serves as marking or watermark information and is concealed in the video. Compared with schemes that embed the watermark in the image dimension, this method is more robust and less susceptible to attacks such as scaling, cropping, degrading, and editing, thereby ensuring the detection rate and accuracy during source tracing.
[0106] Typically, an object identifier consists of multiple characters, including but not limited to one or more of numbers, letters, and special symbols. The server pre-defines the mapping relationship between each character and the identifier. For example, the character "a" corresponds to the identifier "0000", or the character "*" corresponds to the identifier "AAAB", etc. In some embodiments, mapping an object identifier to an identifier sequence includes: starting from the first character among the multiple characters, determining the identifier corresponding to each character sequentially; arranging the identifiers corresponding to each character according to a preset format to obtain an identifier sequence of a preset length.
[0107] Specifically, among the multiple characters constituting the object identifier, the server searches for the corresponding identifier sequentially, starting from the first character, according to a preset reading order, thus determining which identifier each character corresponds to. The reading order is not limited; it can be from the first character to the last or vice versa. However, for computer processing efficiency, it is usually set to read from the first character to the last. The server then arranges the identifiers according to a preset format, resulting in an identifier sequence of a certain length. Typically, the determined identifiers are arranged sequentially according to the order of their corresponding characters to form the final identifier sequence.
[0108] In the above embodiments, by mapping the object identifier to a fixed-length identifier sequence, different types of subtitle offset segments can be determined based on the identifier sequence, thereby obtaining a labeled video with indirectly embedded object identifiers. This achieves the embedding of subtitle watermarks into the video, facilitating subsequent source tracing.
[0109] In some embodiments, arranging the identifiers corresponding to each character according to a preset format to obtain an identifier sequence of a preset length includes: arranging the identifiers corresponding to each character according to a preset format; if the number of identifiers corresponding to all characters in the object identifier is less than a preset number, then padding is performed at the end of the arrangement using a pre-set padding identifier to obtain an identifier sequence of a preset length.
[0110] The padding identifier is a pre-defined identifier. To distinguish it from the identifiers corresponding to each subtitle offset type, it is usually set to a different character. For example, the identifiers corresponding to each subtitle offset type can be set to letters or numbers, while the padding identifier can be set to a special symbol, such as an underscore "_", or to an unused letter or number. Therefore, when the server reads the padding identifier, it can determine that the identifier does not correspond to any subtitle offset type, and the server can choose not to offset the subtitle. Of course, the padding identifier can also correspond to a subtitle offset type. For example, a pre-set mapping between padding identifiers and certain subtitle offset types can be used. When the server reads the padding identifier, it can offset the subtitle according to the subtitle offset type corresponding to that identifier.
[0111] Specifically, after arranging the identifiers corresponding to each character according to a preset format, if the resulting identifier sequence is less than a preset length, in other words, the number of identifiers corresponding to all characters in the object identifier is less than a preset number, the server pads the end of the arrangement (i.e., the identifier sequence of a certain length) with a certain number of padding identifiers to make the length of the identifier sequence reach the preset length.
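The mapping and padding steps described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the table `CHAR_TO_IDENTIFIER`, the padding symbol `PADDING_IDENTIFIER`, the function name, and the preset length of 16 are all assumptions chosen for the example, and each binary digit is treated as one identifier.

```python
# Hypothetical character-to-identifier table; the text only says such a table is preset.
CHAR_TO_IDENTIFIER = {"a": "0000", "b": "0001", "c": "0010"}
# Assumed padding symbol, chosen to differ from the offset identifiers "0" and "1".
PADDING_IDENTIFIER = "_"


def map_object_identifier(object_id: str, preset_length: int) -> str:
    """Map each character to its identifier in reading order, then pad the end."""
    sequence = "".join(CHAR_TO_IDENTIFIER[ch] for ch in object_id)
    if len(sequence) > preset_length:
        raise ValueError("object identifier does not fit the preset length")
    # Pad at the end of the arrangement until the preset length is reached.
    return sequence + PADDING_IDENTIFIER * (preset_length - len(sequence))


print(map_object_identifier("abc", 16))  # 000000010010____
```

Because every sequence produced this way has the same length, a detector that knows the preset length can locate sequence boundaries without additional markers.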
[0112] In the above embodiments, by mapping the object identifier to a fixed-length identifier sequence, it is easier for the server to extract and determine the object identifier during subsequent tracing, thereby improving the accuracy of tracing.
[0113] It should be understood that subtitles have a time attribute. Generally, the subtitles in a subtitle file are arranged sequentially according to time. For example, the subtitle between 1 second and 2 seconds is the first subtitle, the subtitle between 3 seconds and 4 seconds is the second subtitle, and so on. Therefore, in some embodiments, determining the identifier corresponding to each subtitle to be offset in the original subtitle file includes: parsing the original subtitle file, splitting the subtitles in the original subtitle file into multiple subtitles to be offset; and determining the identifier corresponding to each subtitle to be offset in the original subtitle file based on the time sequence of each subtitle to be offset in the original subtitle file and the order of the identifiers in the identifier sequence.
[0114] Specifically, the server parses the original subtitle file according to its format and extracts all subtitles. The server then splits all subtitles in the original subtitle file according to the number of subtitles, obtaining the subtitles to be offset for subsequent subtitle offset processing.
[0115] For each subtitle to be offset, the server matches it one-to-one with the identifiers according to the subtitle's own timing and the order of the identifiers in the identifier sequence converted from the object identifier. This determines the identifier corresponding to each subtitle in the original subtitle file. For example, assuming the identifier sequence converted from the object identifier "abc" is the binary sequence "0000 0001 0010", the server parses the original subtitle file and splits it according to the number of subtitles. For each subtitle to be offset, the server matches the corresponding binary identifier sequentially according to the order of the subtitles and the order of the identifiers in the identifier sequence. For example, the first subtitle matches identifier "0", the second subtitle matches identifier "0", ... the eighth subtitle matches identifier "1", ... the eleventh subtitle matches identifier "1", and the twelfth subtitle matches identifier "0". If the number of subtitles is greater than the length of the identifier sequence, the matching starts again from the first identifier in the identifier sequence until all subtitles in the original subtitle file have been matched.
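As an illustrative sketch of the matching described above, the following pairs subtitles with identifiers in time order and restarts from the first identifier when the subtitles outnumber the identifiers. The dictionary layout of a subtitle (`start`, `text`) and the function name are assumptions made for the example.

```python
from itertools import cycle


def assign_identifiers(subtitles, identifier_sequence):
    """Pair subtitles, sorted by start time, with identifiers, wrapping around."""
    ordered = sorted(subtitles, key=lambda s: s["start"])
    # cycle() restarts from the first identifier once the sequence is exhausted.
    return [(sub, ident) for sub, ident in zip(ordered, cycle(identifier_sequence))]


# 14 hypothetical subtitles matched against the 12-identifier sequence for "abc".
subs = [{"start": float(i), "text": f"line {i}"} for i in range(14)]
pairs = assign_identifiers(subs, "000000010010")
print("".join(ident for _, ident in pairs))  # 00000001001000
```

In the printed result the eighth and eleventh identifiers are "1" and the twelfth is "0", matching the example in the text; the thirteenth and fourteenth subtitles reuse the first two identifiers.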
[0116] In the above embodiments, by assigning corresponding identifiers to each subtitle to be offset according to the timing of the subtitles in the original subtitle file, the efficiency of subtitle offset processing is improved.
[0117] As previously stated, the server can determine the subtitle offset type to which the identifier corresponding to the subtitle to be offset belongs before performing the corresponding subtitle offset on the subtitle to be offset, thereby obtaining the offset subtitle. Therefore, in some embodiments, obtaining the offset subtitle obtained by performing the corresponding subtitle offset on the subtitle to be offset according to the subtitle offset type to which the identifier corresponding to the subtitle to be offset belongs includes: for each subtitle to be offset, determining the subtitle offset type to be offset based on the identifier corresponding to the subtitle to be offset; performing the corresponding subtitle offset on the subtitle to be offset according to the subtitle offset type to be offset, thereby obtaining the offset subtitle belonging to the subtitle offset type.
[0118] Specifically, for each subtitle to be offset, the server determines the subtitle offset type based on the identifier corresponding to the subtitle and the pre-stored correspondence between identifiers and subtitle offset types. For example, suppose there are two pre-set subtitle offset types: identifier "0" corresponds to type A, indicating that the subtitle will be offset upwards by N pixels; identifier "1" corresponds to type B, indicating that the subtitle will be offset downwards by N pixels. For the subtitle to be offset at the 1st second, the server determines that it corresponds to the first identifier "0" in the identifier sequence, and therefore determines that the subtitle offset type corresponding to the subtitle to be offset is type A. The server then performs the corresponding subtitle offset according to this subtitle offset type to obtain an offset subtitle belonging to the specified offset type. For example, the server performs type A subtitle offset on the subtitle to be offset, that is, offsets the subtitle to be offset upwards by N pixels, resulting in an offset subtitle belonging to type A.
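The type A / type B example above can be sketched as a small position update. The `Subtitle` fields, the value of `N_PIXELS`, and the convention that the origin is the top-left corner with y growing downward are assumptions for illustration only.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Subtitle:
    start: float  # seconds
    end: float    # seconds
    text: str
    x: int        # pixels from the left edge
    y: int        # pixels from the top edge (y grows downward)


N_PIXELS = 2  # hypothetical value of N
# Identifier "0" -> type A (up by N pixels); identifier "1" -> type B (down by N pixels).
OFFSETS = {"0": (0, -N_PIXELS), "1": (0, N_PIXELS)}


def apply_offset(sub: Subtitle, identifier: str) -> Subtitle:
    """Return a copy of the subtitle shifted according to its identifier."""
    # Identifiers without an offset type (e.g. padding) leave the position unchanged.
    dx, dy = OFFSETS.get(identifier, (0, 0))
    return replace(sub, x=sub.x + dx, y=sub.y + dy)


original = Subtitle(1.0, 2.0, "Hello", 100, 500)
print(apply_offset(original, "0").y)  # 498
```

Only the position changes; the text and timing of the offset subtitle are identical to those of the subtitle to be offset.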
[0119] In the above embodiments, the subtitles to be offset are offset according to the identifiers in the identifier sequence obtained by mapping, which is more efficient than embedding a watermark in the image dimension; at the same time, the server does not need to pre-store offset subtitles of every type, saving storage resources.
[0120] Of course, the server can also pre-offset each subtitle to be offset in the original subtitle file with various subtitle offset types, obtain offset subtitles of each type and store them in the storage medium; then the specific type of offset subtitle can be extracted from the storage medium.
[0121] Therefore, in some embodiments, obtaining the offset subtitle obtained by performing corresponding subtitle offset on the subtitle to be offset according to the subtitle offset type to which the identifier to be offset belongs includes: determining the subtitle offset type corresponding to the identifier according to the identifier to be offset; and extracting the offset subtitle that corresponds to the identifier and belongs to the corresponding subtitle offset type from multiple pre-processed offset subtitles.
[0122] Specifically, the server can pre-process each subtitle in the original subtitle file according to various pre-set subtitle offset types, resulting in multiple pre-processed offset subtitles. For example, suppose there are three pre-set subtitle offset types: identifier "A" corresponds to type X, indicating that the subtitle is offset upwards by N pixels; identifier "B" corresponds to type Y, indicating that the subtitle is offset downwards by N pixels; and identifier "C" corresponds to type Z, indicating that the subtitle is offset to the left by N pixels. The server first parses the original subtitle file and splits it into multiple subtitles according to the number of subtitles. For each subtitle, the server performs three types of subtitle offset processing, i.e., it offsets one subtitle to be offset, resulting in three offset subtitles. The server stores the offset subtitles of each type in the storage medium. Thus, in the above steps, based on the identifier "B" corresponding to a subtitle to be offset with the content "XXX", the server determines the corresponding subtitle offset type to be type Y based on the identifier "B", and then extracts the offset subtitle of type Y corresponding to the identifier "B" from the pre-processed multiple offset subtitles. The content of the offset subtitle is the same as the subtitle to be offset, the only difference is the position.
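A minimal sketch of the pre-processing variant follows, assuming a top-left pixel origin and representing each subtitle as a `(text, x, y)` tuple. The value of `N`, the `OFFSET_TYPES` table, and the in-memory dictionary standing in for the storage medium are all hypothetical.

```python
N = 2  # hypothetical value of N
# Identifier -> (dx, dy): "A" is type X (up), "B" is type Y (down), "C" is type Z (left).
OFFSET_TYPES = {"A": (0, -N), "B": (0, N), "C": (-N, 0)}


def precompute(subtitles):
    """Build {(subtitle_index, identifier): (text, x, y)} for every offset type."""
    store = {}
    for idx, (text, x, y) in enumerate(subtitles):
        for ident, (dx, dy) in OFFSET_TYPES.items():
            store[(idx, ident)] = (text, x + dx, y + dy)
    return store


store = precompute([("XXX", 100, 500)])
# Later, extraction is a plain lookup by subtitle index and identifier.
print(store[(0, "B")])  # ('XXX', 100, 502)
```

This trades storage (three stored variants per subtitle here) for a constant-time lookup at request time.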
[0123] In the above embodiments, offset subtitles corresponding to each subtitle offset type are obtained through preprocessing, and the corresponding offset subtitles are directly extracted in the subsequent process, which improves the efficiency of obtaining offset subtitles.
[0124] As mentioned above, since subtitles have a format, in some embodiments, determining a marked subtitle file based on multiple offset subtitles includes: splicing the offset subtitles according to the timing sequence corresponding to each offset subtitle based on a preset subtitle format to generate a marked subtitle file corresponding to the object identifier.
[0125] Specifically, the server concatenates each offset subtitle according to a preset subtitle format and its corresponding time sequence, generating a marked subtitle file corresponding to the object identifier. For example, the SRT subtitle format consists of one line of subtitle number, one line of time code, and one line of subtitle data. Therefore, the server converts the offset subtitles to the SRT format. Then, the server concatenates the converted offset subtitles, for example, by linking them together in chronological order. Finally, a subtitle generation tool performs the reverse subtitle generation process, resulting in an SRT-formatted marked subtitle file. The object identifier is implicitly embedded in the marked subtitle file through subtitle offsets. It should be noted that the format of the marked subtitle file generated by the server after concatenating the offset subtitles may be the same as or different from the original subtitle file.
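The splicing step can be sketched for the SRT layout named above (number line, time code line, text line). The function names are hypothetical, and the sketch covers only ordering and serialization: plain SRT has no standard position field, so how the offset position is carried in the file is left out here and would depend on the chosen subtitle format.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as an SRT time code, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(cues) -> str:
    """Splice (start_seconds, end_seconds, text) cues in chronological order."""
    blocks = []
    for number, (start, end, text) in enumerate(sorted(cues), start=1):
        time_code = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{number}\n{time_code}\n{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(blocks)


print(build_srt([(3.0, 4.0, "second"), (1.0, 2.0, "first")]))
```

Sorting before numbering ensures that the cue order in the file follows the time sequence even if the offset subtitles were produced out of order.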
[0126] In the above embodiments, by splicing the offset subtitles according to the time sequence corresponding to each offset subtitle, a marked subtitle file corresponding to the object identifier is generated. This makes it easier to extract the object identifier from the leaked video by detecting the subtitles in the future, thereby tracing the source of the leaked video.
[0127] After generating the tagged caption file, the server can either embed the tagged caption file into the target video frame to generate a tagged video, or package the tagged caption file and the target video as a tagged video and send it to the terminal for playback. Therefore, in some embodiments, the method further includes: adding each offset caption from the tagged caption file to a video frame of the target video; and generating a tagged video corresponding to the object identifier based on the video frame with the added offset captions.
[0128] Specifically, the server splits the target video into multiple video frames. For example, the server can use FFMPEG (Fast Forward MPEG, a multimedia video processing tool) for video encoding and decoding. FFMPEG is open-source free software capable of recording, converting, and streaming various audio and video formats. The server then adds each offset subtitle from the marked subtitle file to the corresponding video frames in the target video. For instance, if the first offset subtitle appears in frames 1 through 20, the server adds this offset subtitle to each of those frames. Based on the video frames with the added offset subtitles, the server re-encodes each video frame to generate a marked video corresponding to the object identifier.
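One possible way to perform this step with the tool named above is FFmpeg's `subtitles` video filter, which renders a subtitle file onto the frames during re-encoding. The sketch below only builds the command line; the function name and file names are hypothetical, and it assumes the marked subtitle file is in a format whose position data the renderer honours (for example ASS) and that the paths contain no characters that need filter escaping.

```python
def build_burn_in_command(video_path: str, subtitle_path: str, output_path: str) -> list:
    """Build an ffmpeg argument list that burns subtitles into the video frames."""
    return [
        "ffmpeg",
        "-y",                                 # overwrite the output file if it exists
        "-i", video_path,                     # target video
        "-vf", f"subtitles={subtitle_path}",  # render subtitles onto each frame
        "-c:a", "copy",                       # keep the audio stream unchanged
        output_path,
    ]


cmd = build_burn_in_command("target.mp4", "marked.ass", "marked.mp4")
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where FFmpeg is installed; the example does not execute it.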
[0129] In the above embodiments, a marked video is generated by adding each offset subtitle from the marked subtitle file to the target video, which facilitates the extraction of object identifiers from leaked videos by detecting subtitles in the future, thereby tracing the source of the leaked video.
[0130] The server can also package the tagged caption file and the target video into a single container and send it as a tagged video to the terminal. Therefore, in some embodiments, the method further includes sending the tagged caption file and the target video together to the terminal corresponding to the object identifier, so that the terminal can play the tagged video carrying the offset captions.
[0131] Specifically, the server packages the tagged subtitle file and the target video into one file and sends them together to the terminal corresponding to the object identifier. After receiving the file, the terminal parses it to obtain the tagged subtitle file and the target video, and displays the offset subtitle when playing the target video, that is, plays the tagged video carrying the offset subtitle.
[0132] In the above embodiments, by packaging the tagged subtitle file and the target video together and sending them to the terminal, it is easier to extract the complete tagged subtitle file during subsequent detection. This is beneficial for accurately extracting the object identifiers that are hidden and embedded through subtitle offset, thereby tracing the source of the video leak.
[0133] This application also provides an application scenario in which the above-described method for generating marked videos is applied. Specifically, the method for generating marked videos in this scenario is applied as follows: When a target user browses a video list on a video platform, they select a video they wish to watch via a terminal. In response to the target user's selection, the terminal sends a video playback request to the server. The server extracts video information, such as the video name, from the playback request, finds the corresponding target video and the associated original subtitle file in its database. Simultaneously, the server extracts the object identifier carried in the playback request, and based on the identifier sequence obtained from the object identifier, offsets the subtitles in the original subtitle file to obtain a marked subtitle file. The server packages the marked subtitle file and the target video into a marked video and returns the marked video to the terminal for playback. The video can be a pre-stored complete video, such as entertainment videos, educational videos, TV dramas, and short videos, etc., without limitation.
[0134] Based on the same inventive concept, embodiments of this application also provide a method for detecting video tags. In some embodiments, as shown in Figure 3, a method for detecting video tags is provided. The method is illustrated using a computer device as an example; specifically, the computer device can be the terminal or the server in Figure 1. The method for detecting video tags includes the following steps:
[0135] Step S302: Obtain the video to be detected and determine the subtitle offset type corresponding to each video frame in the video to be detected.
[0136] Specifically, the computer device can search for and obtain the video to be tested on the Internet. The computer device then detects the subtitles in each frame of the video to determine the specific subtitle offset type corresponding to each frame. For example, the computer device can use tools such as FFMPEG to decode the video to be tested, obtaining all the video frames, and then determine the subtitle offset type corresponding to each frame.
[0137] Step S304: For each subtitle to be inspected in the video to be inspected, determine the subtitle offset type corresponding to each subtitle to be inspected based on the subtitle offset type corresponding to at least one video frame corresponding to the subtitle to be inspected.
[0138] Specifically, for each subtitle in the video to be detected (referred to as the subtitle to be detected for distinction), since a subtitle may exist in multiple video frames (for example, a subtitle exists in 40 video frames corresponding to the 1st to 3rd second), the computer device determines the subtitle offset type corresponding to each subtitle to be detected based on the subtitle offset type corresponding to at least one video frame corresponding to the subtitle to be detected.
[0139] Step S306: Based on the subtitle offset type corresponding to each subtitle to be inspected, determine the identifier corresponding to each subtitle to be inspected.
[0140] Specifically, the computer device determines the identifier corresponding to each subtitle to be inspected based on its subtitle offset type and according to the pre-stored correspondence between subtitle offset types and identifiers. For example, for a subtitle to be inspected, its subtitle offset type is type A, and the computer device pre-stores the subtitle offset type A as corresponding to the identifier "0". Therefore, the identifier corresponding to this subtitle to be inspected is determined to be "0". The computer device performs the above processing for each subtitle to be inspected, thereby obtaining the identifier corresponding to each subtitle to be inspected.
[0141] Step S308: Based on the identifiers corresponding to each subtitle to be inspected, determine the identifier sequence, and determine the object identifier marked in the video to be inspected based on the identifier sequence.
[0142] Specifically, the computer device extracts an identifier sequence from each subtitle to be inspected based on its corresponding identifier. Then, based on the extracted identifier sequence, the computer device performs a reverse mapping using pre-set mapping rules to obtain the object identifier, thereby determining the marked object identifier in the video to be inspected.
[0143] For example, the computer device determines that the identifiers corresponding to the subtitles to be inspected, taken in order, are "100101010010101001010...". Since the identifier sequence has a preset length and contains a preset number of identifiers, the computer device extracts the fixed-length, repeating run of identifiers "1001010" and determines it as the identifier sequence. Then, it performs a reverse mapping using a preset mapping rule to obtain the object identifier corresponding to the identifier sequence.
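The extraction and reverse mapping can be sketched as below. Several points are assumptions rather than part of the described method: the identifier stream is assumed to start at a sequence boundary, a per-position vote across repetitions is added as one possible way to tolerate isolated errors, and the table `IDENTIFIER_TO_CHAR`, the padding symbol, and the group size of 4 are hypothetical.

```python
from collections import Counter

# Hypothetical inverse of the preset character-to-identifier table.
IDENTIFIER_TO_CHAR = {"0000": "a", "0001": "b", "0010": "c"}
PADDING_IDENTIFIER = "_"  # assumed padding symbol
GROUP_SIZE = 4            # assumed number of identifiers per character


def extract_sequence(stream: str, preset_length: int) -> str:
    """Recover one fixed-length sequence from a repeating identifier stream."""
    if len(stream) < preset_length:
        raise ValueError("stream is shorter than the preset sequence length")
    # Column i holds every observation of position i across the repetitions.
    columns = [stream[i::preset_length] for i in range(preset_length)]
    return "".join(Counter(col).most_common(1)[0][0] for col in columns)


def reverse_map(sequence: str) -> str:
    """Strip trailing padding and map each identifier group back to a character."""
    payload = sequence.rstrip(PADDING_IDENTIFIER)
    groups = [payload[i:i + GROUP_SIZE] for i in range(0, len(payload), GROUP_SIZE)]
    return "".join(IDENTIFIER_TO_CHAR[g] for g in groups)


print(extract_sequence("100101010010101001010", 7))  # 1001010
print(reverse_map("000000010010____"))               # abc
```

If the leaked video was trimmed at the start, the stream would begin mid-sequence and an additional synchronization step, not shown here, would be needed.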
[0144] In the above-mentioned video tag detection method, the subtitle offset type corresponding to each video frame in the video to be detected is determined, and the subtitle offset type corresponding to each subtitle to be detected is determined. Then, based on the subtitle offset type corresponding to each subtitle to be detected, the identifier corresponding to each subtitle to be detected is determined, and the identifier sequence is determined. Based on the identifier sequence, the object identifier marked in the video to be detected is determined. This realizes the detection of watermark information indirectly embedded in the marked video through subtitle offset, and can determine the video playback party or leakage source according to the marked object identifier, ensuring the detection rate and accuracy during source tracing.
[0145] In some embodiments, determining the subtitle offset type corresponding to each video frame in the video to be detected includes: obtaining the original video corresponding to the video to be detected; and, under the same video frame dimension, determining the subtitle offset type corresponding to each video frame in the video to be detected based on the positional relationship between the subtitle to be detected in each video frame in the video to be detected and the original subtitle in the corresponding video frame in the original video.
[0146] Specifically, computer devices can retrieve the original video corresponding to the video to be detected from a copyright database. For example, a computer device can use video fingerprinting technology to search within a copyright video database to find the corresponding original copyrighted video. Video fingerprinting technology is a technique that uses computer vision, audio processing, and other technologies to reduce the dimensionality of video content into vectors, and can be used in scenarios such as video retrieval, video deduplication, and video recommendation.
[0147] Since the transmitted video to be tested may have been edited, scaled, stretched, etc., in order to ensure the accuracy of the detection, the computer equipment detects the position of the subtitle to be tested in each video frame of the video to be tested and the position of the original subtitle in the corresponding video frame of the original video under the same video frame dimension. Based on the positional relationship between the subtitle to be tested and the original subtitle, the subtitle offset type corresponding to each video frame in the video to be tested is determined.
[0148] For example, a computer device uses OCR technology to detect subtitles in video frames and determines the specific offset of the subtitle to be detected relative to the original subtitle. For instance, if the computer device detects that the subtitle to be detected has moved upward by 1 pixel compared to the original subtitle, then the computer device determines that the subtitle offset type corresponding to that video frame is type A; or, if the computer device detects that the subtitle to be detected has moved downward by 1 pixel compared to the original subtitle, then the computer device determines that the subtitle offset type corresponding to that video frame is type B. OCR (Optical Character Recognition) refers to the process of analyzing and recognizing textual materials in image files to obtain text and layout information. Therefore, by detecting the subtitle position in each video frame, the subtitle offset type corresponding to each video frame can be obtained.
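A minimal sketch of the per-frame comparison, assuming the OCR step has already produced the vertical position of the subtitle box in both frames, in the same top-left pixel coordinate system. The function name, the `tolerance` parameter, and the use of `None` for an unchanged position are hypothetical.

```python
def classify_offset(detected_y: float, original_y: float, tolerance: float = 0.5):
    """Classify a frame as type "A" (moved up), type "B" (moved down), or None."""
    delta = detected_y - original_y
    if delta < -tolerance:
        return "A"  # smaller y means the subtitle moved toward the top of the frame
    if delta > tolerance:
        return "B"  # larger y means the subtitle moved toward the bottom
    return None     # within tolerance: treated as unchanged


print(classify_offset(499, 500))  # A
print(classify_offset(501, 500))  # B
```

The tolerance is one possible way to absorb sub-pixel jitter in the detected box positions; its value would need to be smaller than the offset N used at embedding time.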
[0149] The video frame dimension includes both temporal and spatial dimensions. Accordingly, in some embodiments, after obtaining the original video corresponding to the video to be detected, the method further includes aligning the video to be detected with the original video in both the temporal and spatial dimensions. Specifically, the computer device determines the video frames of the video to be detected and the original video corresponding to the same time along the same time axis for temporal alignment. Furthermore, the computer device also aligns the video frames of the video to be detected and the original video in the spatial dimension according to the same pixel coordinate system, for example, with the top-left corner as the origin.
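As a simplified sketch of the two alignments, the following maps a timestamp to a frame index in each video and rescales a pixel position from the detected video's resolution to the original's. It assumes both videos share the same start time and that the only spatial change is uniform rescaling (no cropping or letterboxing); the function names are hypothetical.

```python
import math


def frame_index(timestamp_s: float, fps: float) -> int:
    """Index of the frame shown at a given time on the shared time axis."""
    # The small epsilon guards against floating-point results such as 28.999999.
    return math.floor(timestamp_s * fps + 1e-9)


def to_original_coords(x, y, detected_size, original_size):
    """Rescale a top-left-origin pixel position to the original video's resolution."""
    detected_w, detected_h = detected_size
    original_w, original_h = original_size
    return x * original_w / detected_w, y * original_h / detected_h


# The same instant maps to different frame indices when frame rates differ.
print(frame_index(2.0, 25), frame_index(2.0, 30))                  # 50 60
print(to_original_coords(480, 270, (960, 540), (1920, 1080)))      # (960.0, 540.0)
```

Note that rescaling also rescales the offset itself, so a shift of N pixels in the original corresponds to a proportionally smaller shift in a downscaled copy.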
[0150] In the above embodiments, by aligning the video to be detected and the original video, it is ensured that the video to be detected and the original video are in the same video frame dimension, thus avoiding errors in the position of the detected subtitles.
[0151] As mentioned above, since a subtitle may appear in multiple video frames, in some embodiments, for each subtitle to be detected in the video to be detected, the subtitle offset type corresponding to each subtitle to be detected is determined based on the subtitle offset type corresponding to at least one video frame corresponding to the subtitle to be detected. This includes: for each subtitle to be detected, determining at least one video frame containing the same subtitle content; determining the number of video frames in the at least one video frame that correspond to different subtitle offset types, and taking the subtitle offset type that corresponds to the most frames as the subtitle offset type corresponding to the current subtitle to be detected.
[0152] Specifically, the computer device first determines how many video frames each subtitle corresponds to. That is, the computer device identifies at least one video frame containing the same subtitle content. For example, if the same subtitle appears between the 3rd and 5th seconds of the video timeline, and there are 60 frames in that interval, the computer device determines that the subtitle corresponds to 60 video frames. The computer device then determines the number of video frames corresponding to each subtitle offset type within these frames. Specifically, the computer device counts the number of video frames under each type and uses the subtitle offset type with the most frames as the subtitle offset type corresponding to the current subtitle. For example, if type A is the most common, then the subtitle offset type corresponding to this subtitle is determined to be type A, and the identifier corresponding to this subtitle is determined to be A or an identifier corresponding to A. Thus, the computer device can extract the identifier sequence from the identifiers corresponding to the subtitles, convert the identifier sequence into the object identifier, and thereby determine the source of the video leak.
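The frame-count vote described above can be sketched in a few lines. The function name and the example frame counts are hypothetical; `None` stands for a frame whose subtitle position was detected as unchanged.

```python
from collections import Counter


def subtitle_offset_type(frame_types):
    """Return the offset type observed in the largest number of frames."""
    if not frame_types:
        raise ValueError("at least one video frame is required")
    return Counter(frame_types).most_common(1)[0][0]


# 60 frames showing the same subtitle: most are type A, a few are misdetected.
frames = ["A"] * 50 + ["B"] * 7 + [None] * 3
print(subtitle_offset_type(frames))  # A
```

With this vote, up to just under half of the frames of a subtitle can be misclassified without changing the identifier recovered for that subtitle.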
[0153] In the above embodiments, by statistically analyzing the number of frames with different subtitle offset types to determine the identifier corresponding to the offset subtitle, the position detection of video frames has a certain fault tolerance and improves the accuracy of detecting video markers.
[0154] This application also provides an application scenario in which the above-described video tag detection method is applied. Specifically, the video tag detection method is applied in this scenario as follows: When a terminal plays a tagged video, the target object may perform operations such as video recording, video caching, video downloading, and video forwarding through the terminal. Since the object identifier is indirectly embedded in the tagged video through subtitle offset, the video recorded, cached, downloaded, and forwarded by the object also contains the object identifier.
[0155] Therefore, when a computer device obtains a leaked video, such as when it finds a video already in a copyright library online, the computer device can detect the video and extract the identifier sequence based on the positional relationship between the subtitles and the original subtitles in the copyright library, and convert it into an object identifier. This allows the source of the leaked video on the internet to be determined.
[0156] To better understand this application, a specific product example is provided below. As shown in Figures 4A to 4C, in a specific example, the method for generating tagged videos and the method for detecting video tags in the embodiments of this application can be integrated into one system. This system includes a watermark / subtitle generation module, a playback module, and a watermark detection module. These modules can be deployed on a single device or separately on different devices. For example, the playback module is deployed on a terminal or server, and the watermark / subtitle generation module can be deployed together with the playback module on the terminal or separately on the server. The watermark detection module can be deployed on a terminal or server.
[0157] The playback module receives video playback requests sent by the target object via the terminal and obtains the target object's identifier. This identifier is then passed to the subsequent watermark / subtitle generation module, where it is mapped and embedded as watermark information. The playback module can also distribute watermarked subtitle files and videos to the terminal for playback. Specifically, as shown in Figure 4A, when an object requests to play a target video, the terminal sends the video playback request to the playback module. Based on the received video playback request, the playback module uses the player to request a playback sequence service, such as an M3U8 playback sequence, and extracts the original subtitle file through the playback sequence service so that it can request the subsequent watermark / subtitle generation module to generate the watermark / subtitle. The M3U8 file is essentially a playlist / sequence, which may be a media playlist or a master playlist. Regardless of the type of playlist, its internal text uses UTF-8 encoding. When the M3U8 file is used as a media playlist, its internal information records a series of media segment resources; playing these segments sequentially allows for the complete display of the multimedia resources.
[0158] The watermark subtitle generation module converts the original subtitle file into a watermark subtitle file with embedded object identifier watermark information by modifying the subtitle position information. Specifically, as shown in Figure 4B, the watermark subtitle generation module converts the object identifier into an identifier sequence using a preset mapping rule. For example, the object identifier "abc" is mapped to the identifier sequence "0000 0001 0010". The module parses the original subtitle file, splits it according to the number of subtitles, and matches each subtitle with its corresponding binary identifier. For example, the first subtitle matches identifier "0", the second subtitle matches identifier "0", ... the 8th subtitle matches identifier "1", ... the 12th subtitle matches identifier "0". If the number of subtitles exceeds the length of the binary sequence, the matching restarts from the first position of the identifier sequence. Based on the identifier corresponding to each subtitle, the module adjusts the subtitle position. For example, for identifier "0", the module shifts the subtitle upwards by N pixels; conversely, for identifier "1", it shifts the subtitle downwards by N pixels. Finally, the module concatenates the adjusted subtitles to regenerate the watermarked subtitle file, i.e., the marked subtitle file.
[0159] The watermark subtitle generation module then sends the generated marked subtitle file to the playback module. The playback module uses the playback sequence service to integrate the watermarked subtitle file (i.e. marked subtitle file) into the M3U8 playback sequence, so that the M3U8 playback sequence can be sent to the player, and the player can send it to the terminal to play the marked video.
[0160] When the watermark is later detected, as shown in Figure 4C, the watermark detection module uses video fingerprinting technology to search the copyright database for the video to be detected and find the corresponding original video. The original video can be understood as the original copyrighted video, i.e., a video without subtitle watermarks and with normally compressed subtitles. The watermark detection module then aligns the video to be detected with the original video in both the temporal and spatial dimensions. For example, as shown in Figure 5, the watermark detection module first retrieves the corresponding original video from the video copyright library using video fingerprinting technology, and then aligns the video to be detected with the original video on the timeline (i.e., the time dimension). Once the two videos share the same time dimension, the server aligns them in the spatial dimension to obtain the alignment result.
[0161] After alignment, the watermark detection module sends each frame to be detected, together with the corresponding frame of the original video, to the OCR module for subtitle detection. It then compares the positions of the two detection boxes (higher / lower / unchanged) to determine the positional relationship between the subtitle in the video to be detected and the original subtitle in the original video. Based on this positional relationship, the watermark detection module can determine whether a subtitle watermark is embedded in the frame and, if so, the corresponding subtitle offset type.
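The position comparison can be illustrated with a short Python sketch. The function name `classify_offset`, the `tolerance` parameter, and the use of the top edge of each OCR detection box in image coordinates (y increasing downward) are assumptions made for illustration only.

```python
def classify_offset(detected_top: int, original_top: int, tolerance: int = 1) -> str:
    """Compare the top edges of two subtitle boxes from aligned frames.

    Returns "up", "down", or "unchanged". The tolerance absorbs small OCR
    jitter; its value here is an arbitrary illustrative choice.
    """
    delta = detected_top - original_top
    if delta < -tolerance:
        return "up"      # subtitle sits higher than in the original frame
    if delta > tolerance:
        return "down"    # subtitle sits lower than in the original frame
    return "unchanged"   # no subtitle watermark detected in this frame


print(classify_offset(596, 600), classify_offset(604, 600), classify_offset(600, 600))
```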
[0162] Continuing with Figure 4C, based on the alignment result, the watermark detection module splits the video to be detected and the original video into matching image pairs, each consisting of a frame to be detected and the original frame corresponding to the same video frame. These image pairs are sent to the OCR module for subtitle position detection, and the detected positions (i.e., the subtitle position in the frame to be detected and the subtitle position in the original frame) are compared. From this comparison it can be determined whether the subtitle offset type of the video frame is type A or type B. Next, still referring to Figure 4C, the watermark detection module fuses the results for each subtitle: for the multiple video frames corresponding to the same subtitle, a vote determines the subtitle offset type shared by the most video frames. The watermark detection module then determines the identifier corresponding to that subtitle from the pre-stored correspondence between subtitle offset types and identifiers. From the identifiers of all subtitles it recovers the identifier sequence and, through inverse mapping, determines the object identifier embedded in the video to be detected, thus obtaining the detection result.
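The result fusion and inverse mapping can be sketched in Python as follows. The names `OFFSET_TO_IDENTIFIER`, `fuse_frames`, and `detect_object_identifier`, the labels "up" and "down" for the two offset types, and the lookup table `registry` standing in for the inverse mapping are all hypothetical. The sketch also assumes that the winning offset type of every subtitle is one of the two watermark types.

```python
from collections import Counter

# Assumed pre-stored correspondence between offset types and identifiers.
OFFSET_TO_IDENTIFIER = {"up": "0", "down": "1"}


def fuse_frames(frame_offset_types):
    """Majority vote over the frames that show the same subtitle."""
    return Counter(frame_offset_types).most_common(1)[0][0]


def detect_object_identifier(frames_per_subtitle, registry):
    """Recover the identifier sequence, then look up the object identifier.

    `registry` is a hypothetical dict from identifier sequence to object
    identifier; it stands in for the inverse of the preset mapping rule.
    """
    sequence = "".join(
        OFFSET_TO_IDENTIFIER[fuse_frames(frames)] for frames in frames_per_subtitle
    )
    return sequence, registry.get(sequence)


frames_per_subtitle = [
    ["up", "up", "down"],      # one noisy frame is outvoted
    ["down", "down", "down"],
    ["up", "up"],
]
result = detect_object_identifier(frames_per_subtitle, {"010": "object-42"})
```

Voting per subtitle means that a few frames with unreliable OCR results do not change the recovered identifier, provided most frames of that subtitle are classified correctly.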
[0163] Therefore, by detecting the position of subtitles and extracting object identifiers from marked videos, the process of playing, disseminating, and leaking marked videos can be traced, which is beneficial to the protection of video copyright.
[0164] It should be understood that although the steps in the flowcharts of the above embodiments are shown sequentially as indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict restriction on the order of execution, and the steps can be executed in other orders. Moreover, at least some steps in the flowcharts may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same time and can be executed at different times; nor are they necessarily executed one after another, as they can be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
[0165] Based on the same inventive concept, this application also provides a device for generating marked videos. The solution provided by this device is similar to the solution described in the above method. Therefore, the specific limitations of one or more embodiments of the marked video generation device provided below can be found in the limitations of the marked video generation method above, and will not be repeated here.
[0166] In some embodiments, as shown in Figure 6, a device for generating marked videos is provided. This device can be a software module, a hardware module, or a combination of both integrated into a computer device. Specifically, the device includes: an acquisition module 601, a mapping module 602, and a determination module 603, wherein:
[0167] The acquisition module 601 is used to acquire the object identifier and determine the original subtitle file of the target video requested through the object identifier.
[0168] The mapping module 602 is used to map object identifiers to identifier sequences and determine the subtitle offset type corresponding to each identifier in the identifier sequence.
[0169] The determination module 603 is used to determine the identifiers corresponding to each subtitle to be offset in the original subtitle file.
[0170] The determination module 603 is also used to obtain the offset subtitle obtained by performing the corresponding subtitle offset on the subtitle to be offset according to the subtitle offset type to which the identifier corresponding to the subtitle to be offset belongs.
[0171] The determination module 603 is also used to determine the marked subtitle file based on multiple offset subtitles; the marked subtitle file is used together with the target video to form a marked video corresponding to the object identifier.
[0172] In some embodiments, the object identifier consists of multiple characters, and the mapping module is further configured to determine the identifier corresponding to each character in sequence, starting from the first character among the multiple characters; and arrange the identifiers corresponding to each character according to a preset format to obtain an identifier sequence of a preset length.
[0173] In some embodiments, the mapping module is further configured to arrange the identifiers corresponding to each character according to a preset format. If the number of identifiers corresponding to all characters in the object identifier is less than the preset number, the end of the arrangement is padded with a preset padding identifier to obtain an identifier sequence of a preset length.
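The character-by-character mapping with end padding can be sketched in Python as follows. The mapping rule used here is an assumption chosen only for illustration: each character is replaced by the 4-bit index of its position in a hypothetical 16-symbol alphabet, and the padding identifier is "0". The application itself only requires a preset mapping rule, a preset format, and a preset padding identifier.

```python
ALPHABET = "abcdefghijklmnop"  # hypothetical 16-symbol alphabet (assumption)


def to_identifier_sequence(object_id: str, length: int = 16, pad: str = "0") -> str:
    """Map each character to 4 identifiers, then pad the end to `length`."""
    # Characters are processed in order, starting from the first one.
    # str.index raises ValueError for characters outside the alphabet.
    sequence = "".join(format(ALPHABET.index(ch), "04b") for ch in object_id)
    if len(sequence) > length:
        raise ValueError("object identifier too long for the preset length")
    # Fill the remaining positions at the end with the padding identifier.
    return sequence.ljust(length, pad)


print(to_identifier_sequence("abc"))
```

Under this assumed rule, "abc" yields twelve identifiers followed by four padding identifiers, giving a fixed-length sequence regardless of how many characters the object identifier has.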
[0174] In some embodiments, the determining module is further configured to parse the original subtitle file, split the subtitles in the original subtitle file into multiple subtitles to be offset, and determine the identifier corresponding to each subtitle to be offset in the original subtitle file based on the timing of each subtitle to be offset in the original subtitle file and the order of the identifiers in the identifier sequence.
[0175] In some embodiments, the determining module is further configured to, for each subtitle to be offset, determine the target subtitle offset type based on the identifier corresponding to the subtitle to be offset; and offset the subtitle to be offset according to that subtitle offset type, thereby obtaining the offset subtitle belonging to the subtitle offset type.
[0176] In some embodiments, the determining module is further configured to determine the subtitle offset type corresponding to the identifier based on the identifier corresponding to the subtitle to be offset; and extract the offset subtitle that corresponds to the identifier and belongs to the corresponding subtitle offset type from the pre-processed multiple offset subtitles.
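The pre-processing variant can be sketched in Python as follows: both offset versions of every subtitle are computed once in advance, and each request only selects among them. The function names, the use of plain vertical positions in pixels (y increasing downward), and the identifiers "0" for up and "1" for down are assumptions made for illustration.

```python
def precompute_variants(y_positions, n_pixels):
    """For each subtitle, store both offset versions ahead of any request."""
    return [{"0": y - n_pixels, "1": y + n_pixels} for y in y_positions]


def select_offsets(variants, identifier_sequence):
    """Pick, for each subtitle, the stored version matching its identifier."""
    k = len(identifier_sequence)
    return [v[identifier_sequence[i % k]] for i, v in enumerate(variants)]


variants = precompute_variants([600, 600, 600], 4)   # done once per video
positions = select_offsets(variants, "01")           # done per object identifier
```

The design trade-off is storage for speed: the per-request work reduces to a lookup, at the cost of keeping two versions of every subtitle.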
[0177] In some embodiments, the determining module is further configured to splice the offset subtitles according to the timing sequence corresponding to each offset subtitle based on a preset subtitle format, and generate a marked subtitle file corresponding to the object identifier.
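The splicing step can be illustrated with a Python sketch that writes offset subtitles in timing order. WebVTT is used here purely as an example of a preset subtitle format; the application does not name a format. Expressing the vertical position as a percentage through the WebVTT `line` cue setting, as well as the function names and the tuple layout, are assumptions of this sketch.

```python
def format_timestamp(ms: int) -> str:
    """Format milliseconds as a WebVTT timestamp, hh:mm:ss.ttt."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def splice_webvtt(offset_cues) -> str:
    """Join offset subtitles, ordered by start time, into one WebVTT file.

    Each cue is a tuple (start_ms, end_ms, line_percent, text).
    """
    blocks = ["WEBVTT"]
    for start, end, line_percent, text in sorted(offset_cues):
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{timing} line:{line_percent}%\n{text}")
    # Cues are separated from the header and from each other by blank lines.
    return "\n\n".join(blocks) + "\n"


marked_file = splice_webvtt([(4000, 6000, 86, "B"), (1000, 3500, 84, "A")])
```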
[0178] In some embodiments, the above apparatus further includes a first sending module, configured to add each offset subtitle in the marked subtitle file to a video frame of the target video; and generate a marked video corresponding to the object identifier based on the video frame with the added offset subtitles.
[0179] In some embodiments, the above-described apparatus further includes a second sending module, configured to send the marked subtitle file and the target video together to a terminal corresponding to the object identifier, so that the terminal can play the marked video carrying offset subtitles.
[0180] Specific limitations regarding the device for generating marked videos can be found in the limitations on the method for generating marked videos described above, and will not be repeated here. Each module in the aforementioned device for generating marked videos can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in, or independent of, the processor of a computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can call and execute the operations corresponding to each module.
[0181] Based on the same inventive concept, this application also provides a video mark detection device. The solution provided by this device is similar to the solution described in the above method. Therefore, the specific limitations of one or more video mark detection device embodiments provided below can be found in the limitations of the video mark detection method above, and will not be repeated here.
[0182] In some embodiments, as shown in Figure 7, a video mark detection device is provided. This device can be a software module, a hardware module, or a combination of both integrated into a computer device. Specifically, the device includes: an acquisition module 701 and a determination module 702, wherein:
[0183] The acquisition module 701 is used to acquire the video to be detected and determine the subtitle offset type corresponding to each video frame in the video to be detected.
[0184] The determination module 702 is used to determine the subtitle offset type corresponding to each subtitle in the video to be detected based on the subtitle offset type corresponding to at least one video frame corresponding to the subtitle.
[0185] The determination module 702 is also used to determine the identifier corresponding to each subtitle to be inspected based on the subtitle offset type corresponding to each subtitle to be inspected.
[0186] The determination module 702 is also used to determine the identifier sequence based on the identifiers corresponding to each subtitle to be detected, and to determine the object identifier marked in the video to be detected based on the identifier sequence.
[0187] In some embodiments, the determining module is further configured to acquire the original video corresponding to the video to be detected; and, under the same video frame dimension, determine the subtitle offset type corresponding to each video frame in the video to be detected based on the positional relationship between the subtitle to be detected in each video frame in the video to be detected and the original subtitle in the corresponding video frame in the original video.
[0188] In some embodiments, the video frame dimensions include a temporal dimension and a spatial dimension. The apparatus further includes an alignment module for aligning the video to be detected with the original video in the temporal dimension and the spatial dimension, respectively.
[0189] In some embodiments, the determining module is further configured to, for each subtitle to be inspected, determine at least one video frame containing the same subtitle content; determine the number of video frames in the at least one video frame that correspond to different subtitle offset types respectively, and take the subtitle offset type that corresponds to the most frames as the subtitle offset type corresponding to the current subtitle to be inspected.
[0190] Specific limitations regarding the video mark detection device can be found in the limitations of the video mark detection method described above, and will not be repeated here. Each module in the aforementioned video mark detection device can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in, or independent of, the processor of a computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can call and execute the corresponding operations of each module.
[0191] In some embodiments, a computer device is provided, which may be a terminal or a server, and its internal structure may be as shown in Figure 8. This computer device includes a processor, memory, input / output (I / O) interfaces, and a communication interface. The processor, memory, and I / O interfaces are connected via a system bus, and the communication interface is also connected to the system bus via the I / O interfaces. The processor provides computational and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system and computer programs. The internal memory provides the environment for the operation of the operating system and computer programs stored in the non-volatile storage media. The I / O interfaces are used for exchanging information between the processor and external devices. The communication interface is used for communicating with external terminals via a network connection. When executed by the processor, the computer program implements a method for generating marked videos or a method for detecting video marks.
[0192] Those skilled in the art will understand that the structure shown in Figure 8 is merely a block diagram of a portion of the structure related to the present application and does not constitute a limitation on the computer device to which the present application is applied. Specific computer devices may include more or fewer components than those shown in the figure, or combine certain components, or have different component arrangements.
[0193] In some embodiments, a computer device is also provided, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps in the various method embodiments corresponding to the above-described method for generating marked videos.
[0194] In some embodiments, a computer device is also provided, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps in the various method embodiments corresponding to the above-described video mark detection method.
[0195] In some embodiments, a computer-readable storage medium is provided storing a computer program that, when executed by a processor, implements the steps in the various method embodiments corresponding to the above-described method for generating marked videos.
[0196] In some embodiments, a computer-readable storage medium is provided storing a computer program that, when executed by a processor, implements the steps in the various method embodiments corresponding to the video mark detection method described above.
[0197] In some embodiments, a computer program product is provided, including a computer program that, when executed by a processor, implements the steps in the various method embodiments corresponding to the above-described method for generating marked videos.
[0198] In some embodiments, a computer program product is provided, including a computer program that, when executed by a processor, implements the steps in the various method embodiments corresponding to the video mark detection method described above.
[0199] It should be noted that the object information (including but not limited to account information, ID, coding information, etc.) and data (including but not limited to data used for analysis, stored data, displayed data, etc.) involved in this application are all information and data authorized by the object or fully authorized by all parties.
[0200] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods. Any references to memory, databases, or other media used in the embodiments provided in this application can include at least one of non-volatile and volatile memory. Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetic random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, etc. Volatile memory can include random access memory (RAM) or external cache memory, etc. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases involved in the embodiments provided in this application may include at least one type of relational database and non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors involved in the embodiments provided in this application may be general-purpose processors, central processing units, graphics processing units, digital signal processors, programmable logic devices, quantum computing-based data processing logic devices, etc., and are not limited to these.
[0201] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.
[0202] The embodiments described above are merely illustrative of several implementation methods of this application, and while the descriptions are specific and detailed, they should not be construed as limiting the scope of this patent application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this application should be determined by the appended claims.
Claims
1. A method for generating marked videos, characterized in that, The method includes: Obtain the object identifier and determine the original subtitle file of the target video requested through the object identifier; the object identifier consists of multiple characters; Starting from the first character among the plurality of characters, determine the identifier corresponding to each character in sequence; The identifiers corresponding to each character are arranged according to a preset format. If the number of identifiers corresponding to all characters in the object identifier is less than the preset number, the end of the arrangement is padded with a preset padding identifier to obtain an identifier sequence of a preset length; Determine the subtitle offset type corresponding to each identifier in the identifier sequence; Determine the identifier corresponding to each subtitle to be offset in the original subtitle file; Obtain the offset subtitle by performing the corresponding subtitle offset on the subtitle to be offset according to the subtitle offset type to which the identifier corresponding to the subtitle to be offset belongs; A marked subtitle file is determined based on multiple offset subtitles; the marked subtitle file is used together with the target video to form a marked video corresponding to the object identifier.
2. The method according to claim 1, characterized in that, The step of determining the identifiers corresponding to each subtitle to be offset in the original subtitle file includes: The original subtitle file is parsed, and the subtitles in the original subtitle file are split according to the number of lines to obtain multiple subtitles to be offset; Based on the timing of each subtitle to be offset in the original subtitle file and the order of the identifiers in the identifier sequence, the identifier corresponding to each subtitle to be offset in the original subtitle file is determined.
3. The method according to claim 1, characterized in that, The step of obtaining the offset subtitle by performing corresponding subtitle offset on the subtitle to be offset according to the subtitle offset type to which the identifier corresponding to the subtitle to be offset belongs includes: For each subtitle to be offset, the subtitle offset type to be offset is determined based on the identifier corresponding to the subtitle to be offset; According to the subtitle offset type to be offset, the subtitle to be offset is offset accordingly to obtain the offset subtitle belonging to the subtitle offset type.
4. The method according to claim 1, characterized in that, The step of obtaining the offset subtitle by performing corresponding subtitle offset on the subtitle to be offset according to the subtitle offset type to which the identifier corresponding to the subtitle to be offset belongs includes: Based on the identifier corresponding to the subtitle to be offset, determine the subtitle offset type corresponding to the identifier; From the pre-processed multiple offset subtitles, extract the offset subtitles of the subtitle offset type corresponding to the identifier.
5. The method according to claim 1, characterized in that, The process of determining the marked subtitle file based on multiple offset subtitles includes: Based on the preset subtitle format, the offset subtitles are spliced together according to the timing of each offset subtitle to generate a marked subtitle file corresponding to the object identifier.
6. The method according to any one of claims 1 to 5, characterized in that, The method further includes: Each offset subtitle in the marked subtitle file is added to a video frame of the target video; Generate a marked video corresponding to the object identifier based on the video frames with added offset subtitles.
7. The method according to any one of claims 1 to 5, characterized in that, The method further includes: The marked subtitle file and the target video are sent together to the terminal corresponding to the object identifier, so that the terminal can play the marked video carrying the offset subtitles.
8. A method for detecting video marks, characterized in that, The method includes: Obtain the video to be detected; Obtain the original video corresponding to the video to be detected; Under the same video frame dimension, based on the positional relationship between the subtitles to be detected in each video frame of the video to be detected and the original subtitles in the corresponding video frames of the original video, the subtitle offset type corresponding to each video frame of the video to be detected is determined; For each subtitle to be inspected, identify at least one video frame containing the same subtitle content; Determine the number of video frames in the at least one video frame that correspond to different subtitle offset types, and take the subtitle offset type with the most frames as the subtitle offset type corresponding to the current subtitle to be inspected; Based on the subtitle offset type corresponding to each subtitle to be inspected, determine the identifier corresponding to each subtitle to be inspected; Based on the identifiers corresponding to each of the subtitles to be inspected, an identifier sequence is determined, and based on the identifier sequence, the object identifier marked in the video to be detected is determined.
9. The method according to claim 8, characterized in that, The video frame dimensions include a temporal dimension and a spatial dimension. After obtaining the original video corresponding to the video to be detected, the method further includes: aligning the video to be detected with the original video in both the temporal and spatial dimensions.
10. An apparatus for generating marked videos, characterized in that, The device includes: The acquisition module is used to acquire an object identifier and determine the original subtitle file of the target video requested through the object identifier; the object identifier consists of multiple characters; The mapping module is used to determine the identifier corresponding to each character in sequence, starting from the first character among the multiple characters; arrange the identifiers corresponding to each character according to a preset format; if the number of identifiers corresponding to all characters in the object identifier is less than the preset number, then padding is performed at the end of the arrangement using a preset padding identifier to obtain an identifier sequence of preset length; and determine the subtitle offset type corresponding to each identifier in the identifier sequence; The determination module is used to determine the identifiers corresponding to each subtitle to be offset in the original subtitle file; The determining module is further configured to obtain the offset subtitle obtained by performing corresponding subtitle offset on the subtitle to be offset according to the subtitle offset type to which the identifier corresponding to the subtitle to be offset belongs; The determining module is further configured to determine a marked subtitle file based on multiple offset subtitles; the marked subtitle file is used together with the target video to form a marked video corresponding to the object identifier.
11. The apparatus for generating marked video according to claim 10, characterized in that, The determining module is further configured to parse the original subtitle file, split the subtitles in the original subtitle file into multiple subtitles to be offset, and determine the identifier corresponding to each subtitle to be offset in the original subtitle file based on the timing of each subtitle to be offset in the original subtitle file and the order of the identifiers in the identifier sequence.
12. The apparatus for generating marked video according to claim 10, characterized in that, The determining module is also used to determine the subtitle offset type to be offset for each subtitle to be offset, based on the identifier corresponding to the subtitle to be offset; According to the subtitle offset type to be offset, the subtitle to be offset is offset accordingly to obtain the offset subtitle belonging to the subtitle offset type.
13. The apparatus for generating marked video according to claim 10, characterized in that, The determining module is further configured to determine the subtitle offset type corresponding to the identifier based on the identifier corresponding to the subtitle to be offset; and extract the offset subtitle with the subtitle offset type corresponding to the identifier from the pre-processed multiple offset subtitles.
14. The apparatus for generating marked video according to claim 10, characterized in that, The determining module is also used to splice the offset subtitles according to the timing sequence corresponding to each offset subtitle based on a preset subtitle format, and generate a marked subtitle file corresponding to the object identifier.
15. The apparatus for generating marked video according to any one of claims 10 to 14, characterized in that, The device further includes a first sending module, used to add each offset subtitle in the marked subtitle file to a video frame of the target video; and to generate a marked video corresponding to the object identifier based on the video frame with the added offset subtitles.
16. The apparatus for generating marked video according to any one of claims 10 to 14, characterized in that, The device further includes a second sending module, used to send the marked subtitle file and the target video together to a terminal corresponding to the object identifier, so that the terminal can play the marked video carrying the offset subtitle.
17. A device for detecting video markers, characterized in that, The device includes: The acquisition module is used to acquire the video to be detected; acquire the original video corresponding to the video to be detected; and, under the same video frame dimension, determine the subtitle offset type corresponding to each video frame of the video to be detected based on the positional relationship between the subtitle to be detected in each video frame of the video to be detected and the original subtitle in the corresponding video frame of the original video. The determination module is used to determine, for each subtitle to be inspected, at least one video frame containing the same subtitle content; determine the number of video frames in the at least one video frame that correspond to different subtitle offset types respectively, and take the subtitle offset type that corresponds to the most frames as the subtitle offset type corresponding to the current subtitle to be inspected; The determining module is further configured to determine the identifier corresponding to each of the subtitles to be inspected based on the subtitle offset type corresponding to each subtitle to be inspected. The determining module is further configured to determine an identifier sequence based on the identifiers corresponding to each of the subtitles to be inspected, and to determine the object identifier marked in the video to be inspected based on the identifier sequence.
18. The video marker detection device according to claim 17, characterized in that, The video frame dimensions include a time dimension and a spatial dimension. The device also includes an alignment module for aligning the video to be detected with the original video in the time dimension and the spatial dimension, respectively.
19. A computer device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that, When the processor executes the computer program, it implements the steps of the method according to any one of claims 1 to 9.
20. A computer-readable storage medium storing a computer program, characterized in that, When the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 9.
21. A computer program product, comprising a computer program, characterized in that, When the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 9.