A storage, scheduling and detection method and system in a forgery identification scenario

CN116996732B · Active · Publication Date: 2026-06-19 · CHINA ACADEMY OF INFORMATION & COMM

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
CHINA ACADEMY OF INFORMATION & COMM
Filing Date
2023-08-11
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing techniques for detecting forgeries in IDC egress traffic struggle with large data volumes, diverse encoding formats, long processing times, and high-concurrency access, so live network traffic cannot be monitored effectively in real time.

Method used

By constructing a file download verification model and a step matching model, a collection and analysis identifier is generated, data is encapsulated and identified, and real-time detection is performed using a stream processing computing framework and a distributed message middleware. The identification is then performed in conjunction with an audio, video, and image authentication engine cluster, generating an identification identifier and filtering or reporting the data.

Benefits of technology

It enables real-time monitoring of IDC outbound traffic, improves the detection efficiency of forged audio, video and image data, ensures the security and integrity of data storage, reduces storage space waste, and improves system performance and data management efficiency.

Figure: CN116996732B_ABST

Abstract

This invention discloses a storage and scheduling detection method and system for forgery identification scenarios, relating to the field of file data analysis technology. The method comprises the following steps: S1, collecting and processing file data from a specified file to generate a file list, the file data including file metadata and multimedia data; S2, constructing a file download verification model to verify the download of the file list, determining the collection result based on the correctness and extent of the collected data in the file list, and generating a collection analysis identifier; S3, constructing a step matching model to select a sending step based on the collection analysis identifier, the sending step including a data encapsulation step and a data error reporting step; S4, storing the file data processed by the data encapsulation and sending step; S5, forwarding the stored file data and performing engine cluster authentication to generate a data authentication identifier; and S6, generating an authentication target based on the data authentication identifier and the collection analysis identifier.

Description

Technical Field

[0001] This invention relates to the field of file data analysis, and more specifically to a storage, scheduling, and detection method and system for forgery identification scenarios.

Background Technology

[0002] With the widespread adoption of generative AI on the internet, a large amount of fake audio, video, and image data has emerged. This data comes from various websites and app systems whose backends are hosted in IDC data centers. When users access these sites, the relevant data is delivered to the client as IDC egress traffic.

[0003] A prior-art search found publication CN114884726B, which discloses a management system for monitoring IDC data security. This prior art virtualizes the IDC through a virtual construction module, ensuring user access to and processing of IDC data, facilitating user management, and providing a control measure for data centers that can be registered. By setting up a data acquisition module and deploying traffic probes, it provides a data acquisition method that does not affect the data center's outbound bandwidth, so data acquisition does not disturb the normal operation of the IDC. The data management module ensures effective management of IDC data security. Furthermore, by placing key points in high-traffic areas according to node level and characteristics, it avoids redundant traffic probes and keeps data acquisition efficient.

[0004] Applying the technique of the reference document to audio, video, and image data on the live network reveals the following shortcomings:

[0005] 1. Forgery detection of IDC outbound traffic requires testing the full volume of outbound data collected. This type of data is large in volume, has diverse encoding formats, and spreads rapidly, requiring timely processing to obtain the most accurate detection results.

[0006] 2. In forgery detection of IDC egress traffic, the store-and-forward stage is the core hub controlling the entire data processing flow, yet it suffers from long processing times and high-concurrency access, which makes it impossible to monitor live network traffic effectively in real time during detection.

Summary of the Invention

[0007] The purpose of this invention is to provide a storage, scheduling, and detection method and system for forgery identification scenarios, in order to address the shortcomings in the prior art.

[0008] To achieve the above objectives, the present invention provides the following technical solution: the storage and scheduling detection method in the forgery identification scenario includes the following steps:

[0009] S1, Collect and process file data for the specified file to generate a file list, wherein the file data includes file metadata and multimedia data;

[0010] S2, construct a file download verification model, verify the download of the file list, determine the collection results based on the correctness and degree of collection of the file data in the file list, and generate collection analysis identifiers;

[0011] S3, construct a step matching model and select a sending step according to the collection analysis identifier, wherein the sending steps include a data encapsulation step and a data error reporting step;

[0012] S4, store the file data that has been processed in the data encapsulation and sending step;

[0013] S5, forward the stored file data and perform engine cluster authentication to generate a data authentication identifier;

[0014] S6, generate an identification target based on the data authentication identifier and the collection analysis identifier, wherein the identification targets include real identification targets and false identification targets;

[0015] S7, perform data filtering or upload metadata based on the identification target.

[0016] In a preferred embodiment, the file metadata in the file data specifically refers to data used to describe the file, stored in a corresponding metadata file, which provides information about the file content, attributes, structure, and format, rather than the actual multimedia content. The file metadata provides important information about the file, which can be used for file management, indexing, searching, identification, and recognition.

[0017] The multimedia data in the file data specifically refers to actual audio, video, and image data content, mainly perceptible media content, stored in corresponding multimedia files. The composition of the multimedia data depends on the file type. Audio data contains audio sampling information; video data contains a series of consecutive image frames, each frame consisting of the color and position information of pixels; and image data consists of the color and position information of pixels. Users can play audio and video or display images by parsing this multimedia data, and can also use the multimedia data for multimedia authentication.

[0018] In a preferred embodiment, the step of generating the document list is as follows:

[0019] The Python programming language and platform are selected to implement the data monitoring function; the file directory to be monitored is determined, which can be a folder on the local computer or a shared folder on the network; the file monitoring interface provided by the programming language is used to monitor the specified file directory. The interfaces provided by different programming languages and platforms may vary, but they generally support monitoring events such as folder creation, deletion, and modification; metadata files are parsed: when a new metadata file is detected, its filename is checked, and if the filename does not end with TMP, the file is a metadata file that needs to be parsed; for the file data that needs to be parsed, the parsing operation is performed to extract the file list information. The file metadata contains relevant information about the file where the multimedia data to be collected is located, such as the filename and file path;

[0020] Based on the extracted file list information, multimedia data, specifically audio, video, and image data, is collected through file paths and other information. The collected multimedia data is then stored, and the multimedia files are saved to the local disk or uploaded to a network storage service. The above steps are put into a loop to continuously monitor the file directory and collect the files.
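The monitoring and parsing steps above can be illustrated with a minimal polling sketch. The patent does not specify the metadata file format or the monitoring interface, so the JSON layout ({"files": [...]}), the function names, and the polling approach are all assumptions made for illustration only.

```python
import json
from pathlib import Path


def is_ready_metadata(path: Path) -> bool:
    """Parse only regular files whose name does not end with TMP."""
    return path.is_file() and not path.name.endswith("TMP")


def parse_file_list(metadata_path: Path) -> list:
    """Return the file-list entries recorded in one metadata file.

    Assumed (hypothetical) layout: {"files": [{"name": "...", "path": "..."}]}
    """
    with metadata_path.open(encoding="utf-8") as handle:
        return json.load(handle).get("files", [])


def scan_once(watch_dir: Path, seen: set) -> list:
    """One polling pass over the directory; returns newly found entries."""
    entries = []
    for path in sorted(watch_dir.iterdir()):
        if path in seen or not is_ready_metadata(path):
            continue  # already handled, or still being written (ends with TMP)
        seen.add(path)
        entries.extend(parse_file_list(path))
    return entries
```

Continuous monitoring would call scan_once in a loop with a short sleep between passes; fetching the multimedia files named by each entry's path is left out of the sketch.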

[0021] Establish a file cleanup cycle; the cleanup frequency can be adjusted based on data growth rate and storage needs. Organize and archive stored files according to certain classification standards, classifying them based on attributes such as file type, creation time, last access time, and size, categorizing files into active data and archived data. Delete expired data; periodically check stored files and delete expired or no longer needed data, including temporary files, log files, or outdated backup files. Back up important data; before cleanup, ensure that important data has been backed up to other storage media or the cloud to prevent accidental deletion or data loss. Compress archived data; compress large archived data to save storage space. Deduplicate files; detect and delete duplicate files to prevent redundant data. Audit files; record files deleted or modified during cleanup for future tracking and review. Set up a file recovery mechanism; ensure a suitable file recovery mechanism is in place before file cleanup.
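Part of this cleanup cycle (expiry, deduplication, and the audit record) can be sketched as follows. The age threshold, the use of SHA-256 content hashes to detect duplicates, and the function names are assumptions, not details from the patent; backup, compression, and recovery are deliberately omitted and would have to run before any deletion.

```python
import hashlib
import time
from pathlib import Path
from typing import Optional


def file_digest(path: Path) -> str:
    """SHA-256 of the file content, used here to spot duplicates."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def clean_directory(root: Path, max_age_seconds: float,
                    now: Optional[float] = None) -> list:
    """Delete expired and duplicate files; return (path, reason) audit rows."""
    now = time.time() if now is None else now
    audit = []
    kept_digests = set()
    # Materialize the listing first so deletions do not disturb iteration.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        if now - path.stat().st_mtime > max_age_seconds:
            reason = "expired"
        else:
            digest = file_digest(path)
            if digest not in kept_digests:
                kept_digests.add(digest)
                continue  # first copy of this content: keep it
            reason = "duplicate"
        path.unlink()
        audit.append((str(path), reason))
    return audit
```

The returned audit rows correspond to the "audit files" step: they record what was deleted and why, so the cleanup can be reviewed later.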

[0022] In a preferred embodiment, the data acquisition and analysis identifier specifically includes a complete data acquisition identifier and a partial data acquisition identifier, and the steps for generating the data acquisition and analysis identifier are as follows:

[0023] The accuracy of the collected file data is the criterion for determining whether the file data has been downloaded correctly, and the degree of file data collection is the criterion for determining whether the file data has been completely collected.

[0024] The file download verification model analyzes the correctness of the collected file data to generate a data correctness label, analyzes the degree of data collection to generate a degree of data collection label, and combines the data correctness label and degree of data collection label in the same file data to generate a data collection analysis identifier for that file data.

[0025] The data correctness labels include a correct-data label and a data-error label;

[0026] If the download verification process of the file list outputs the correct download result, the file data is marked as correctly collected data.

[0027] If a download error is output during the download verification of the file list, an error flag for the collected data should be generated for that file.

[0028] The acquisition level markers include complete acquisition markers and incomplete acquisition markers;

[0029] If the data collection progress is not 100% during the download and verification of the file list, an incomplete collection marker will be generated for that file.

[0030] If the data collection progress reaches 100% during the download and verification of the file list, a complete collection mark will be generated for that file's data.

[0031] For the same file data, perform tagging analysis. If the same file data has both a correctly collected data tag and a partially collected data tag, a data error tag and a fully collected data tag, or a data error tag and a partially collected data tag, generate a partial data acquisition identifier for that file data.

[0032] For the same file data, perform tagging analysis. If the same file data has both a correct data collection tag and a complete data collection tag, generate a complete data acquisition identifier for that file data.
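The labeling rules above reduce to a small truth table: only "correct" combined with "complete" yields the complete identifier, and every other combination yields the partial one. A minimal sketch, with enum and function names chosen here for illustration:

```python
from enum import Enum


class Correctness(Enum):
    CORRECT = "correct"
    ERROR = "error"


class Completeness(Enum):
    COMPLETE = "complete"
    INCOMPLETE = "incomplete"


class AcquisitionId(Enum):
    COMPLETE = "complete_acquisition"
    PARTIAL = "partial_acquisition"


def correctness_label(download_ok: bool) -> Correctness:
    """Correct label if download verification succeeded, else error label."""
    return Correctness.CORRECT if download_ok else Correctness.ERROR


def completeness_label(progress_percent: float) -> Completeness:
    """Complete label only when collection progress has reached 100%."""
    if progress_percent >= 100:
        return Completeness.COMPLETE
    return Completeness.INCOMPLETE


def acquisition_identifier(correct: Correctness,
                           complete: Completeness) -> AcquisitionId:
    """Combine the two labels of one file into its acquisition identifier."""
    if correct is Correctness.CORRECT and complete is Completeness.COMPLETE:
        return AcquisitionId.COMPLETE
    return AcquisitionId.PARTIAL
```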

[0033] The matching logic for the sending step is as follows:

[0034] Construct a step matching model that reads the collection analysis identifier in the file data and matches the corresponding sending step.

[0035] Specifically, when the file download verification model generates a complete acquisition identifier for the file data, the step matching model matches the data encapsulation steps based on the complete acquisition identifier in the file data;

[0036] When the file download verification model generates an incomplete collection identifier for the file data, the step matching model matches the data error reporting step according to the incomplete collection identifier in the file data.

[0037] The data encapsulation step involves packaging the file data for subsequent steps, while the data error reporting step directly displays the error on the display terminal.
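One way to picture the step matching model is as a lookup table from acquisition identifier to sending step. The dictionary keys, the shape of the file data, and the returned records below are hypothetical; the patent only states which identifier selects which step.

```python
from typing import Callable, Dict


def encapsulate(file_data: dict) -> dict:
    """Data encapsulation step: wrap the file data for the later stages."""
    return {"step": "encapsulate", "payload": file_data}


def report_error(file_data: dict) -> dict:
    """Data error reporting step: produce the message shown to the operator."""
    name = file_data.get("name", "<unknown>")
    return {"step": "error", "message": f"incomplete acquisition: {name}"}


# Assumed identifier values; the source names them only in prose.
STEP_TABLE: Dict[str, Callable[[dict], dict]] = {
    "complete": encapsulate,
    "partial": report_error,
}


def match_step(file_data: dict) -> dict:
    """Select and run the sending step for this file's acquisition identifier."""
    return STEP_TABLE[file_data["acquisition_id"]](file_data)
```

A table keeps the mapping explicit, so adding a further sending step would not require touching the dispatch function.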

[0038] In a preferred embodiment, the data storage step is as follows:

[0039] When the file data involved contains multimedia data and file metadata, the multimedia data specifically includes audio, video, and image data. The file data is encoded and converted based on the URL via the FTP protocol, and the sampling rate is adjusted before being temporarily stored locally.

[0040] In a preferred embodiment, the data authentication identifier includes a genuine authentication identifier, an error authentication identifier, and a fake authentication identifier, and the step of generating the data authentication identifier is as follows:

[0041] Based on a stream processing computing framework and a distributed message middleware, the file type corresponding to the notification is determined in real time. The file types are audio, video and image. For different types of files, requests are forwarded to different types of anti-counterfeiting engine clusters for processing and to generate initial identification tags. Through the integration and analysis of the initial identification tags, data identification tags are generated for the file data.
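The type-based forwarding can be sketched in-process as below. This is only a stand-in: a real deployment would place a stream processing framework and message middleware between the notification source and the engines, and the extension table, notification fields, and engine callables are all assumptions.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Callable, Dict, Iterable, Optional

# Hypothetical mapping from file extension to media type.
EXTENSION_TO_TYPE = {
    ".wav": "audio", ".mp3": "audio",
    ".mp4": "video", ".avi": "video",
    ".jpg": "image", ".png": "image",
}


def file_type(path: str) -> Optional[str]:
    """Return 'audio', 'video' or 'image', or None for unsupported files."""
    return EXTENSION_TO_TYPE.get(PurePosixPath(path).suffix.lower())


def route(notifications: Iterable[dict],
          engines: Dict[str, Callable[[str], str]]) -> dict:
    """Forward each notification to the engine for its type.

    Returns {file_id: {media_type: initial_label}}, where each engine is
    assumed to answer with 'genuine' or 'fake'.
    """
    labels = defaultdict(dict)
    for note in notifications:
        kind = file_type(note["path"])
        if kind is None:
            continue  # not audio, video or image: nothing to authenticate
        labels[note["file_id"]][kind] = engines[kind](note["path"])
    return dict(labels)
```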

[0042] The initial identification markers include audio identification markers, video identification markers, and image identification markers;

[0043] The audio identification markers include genuine audio markers and fake audio markers;

[0044] The audio data is analyzed by an audio authentication engine cluster. If the result is identified as genuine audio, a genuine audio tag is generated for the file data. If the audio data is analyzed by the audio authentication engine cluster and the result is identified as fake audio, a fake audio tag is generated for the file data.

[0045] The video identification markers include real video markers and fake video markers;

[0046] The video authentication engine cluster analyzes the video data. If the result is a genuine video, a genuine video tag is generated for the file data. If the result is a fake video, a fake video tag is generated for the file data.

[0047] The image identification markers include real image markers and fake image markers;

[0048] The image data is analyzed by an image authentication engine cluster. If the result is a real image, a real image label is generated for the file data. If the image data is analyzed by the image authentication engine cluster and the result is a fake image, a fake image label is generated for the file data.

[0049] The audio identification tags, video identification tags, and image identification tags in the file data are integrated and processed.

[0050] When the file data contains only one type of data: audio, video, or image;

[0051] If the file data contains one of the corresponding types of false audio tags, false video tags, or false image tags, then a false identification identifier is generated for the file data.

[0052] If the file data contains one of the corresponding types of audio authenticity markers, video authenticity markers, or image authenticity markers, then an authenticity identification identifier is generated for the file data.

[0053] When the data in this file contains two types of data, it can be classified as either audio and video, audio and image, or video and image.

[0054] If the file data contains both audio and video authenticity markers, audio and image authenticity markers, or video and image authenticity markers, then an authenticity identification identifier is generated for the file data.

[0055] If one or more of the two types of data in the file have a false initial identification mark, then a false identification mark will be generated for the file data.

[0056] When the file contains audio, video, and image data;

[0057] If the file data contains audio authenticity markers, video authenticity markers, and image authenticity markers simultaneously, an authenticity identification identifier is generated for the file data.

[0058] If only one of the three types of data in the file has a false initial identification mark, an error identification mark is generated for the file data.

[0059] If two or more false initial identification markers are present in the three types of data in the file, a false identification marker will be generated for the file data.
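The integration rules for one, two, or three media types can be condensed into a single function that counts fake labels. The label strings and the function name are illustrative choices; the branching follows the rules stated above.

```python
from typing import Dict

MEDIA_TYPES = {"audio", "video", "image"}
INITIAL_LABELS = {"genuine", "fake"}


def fuse_labels(labels: Dict[str, str]) -> str:
    """Merge per-type initial labels into one data authentication identifier.

    labels maps a media type to 'genuine' or 'fake'.
    Returns 'genuine', 'error' or 'fake'.
    """
    if not labels or not set(labels) <= MEDIA_TYPES:
        raise ValueError("expected one to three of: audio, video, image")
    if not set(labels.values()) <= INITIAL_LABELS:
        raise ValueError("labels must be 'genuine' or 'fake'")

    fakes = sum(1 for value in labels.values() if value == "fake")
    if fakes == 0:
        return "genuine"  # every present type was judged genuine
    if len(labels) == 3 and fakes == 1:
        return "error"    # three types, exactly one judged fake
    return "fake"         # all remaining combinations
```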

[0060] In a preferred embodiment, the identification targets include true identification targets, error identification targets, and false identification targets, and the generation logic of the identification targets is as follows:

[0061] When the same file data contains both a complete collection identifier and a genuine identification identifier, a genuine identification target is generated for that file data.

[0062] When the same file data contains both a complete collection identifier and an error identification identifier, an error identification target is generated for that file data.

[0063] When the same file data contains an incomplete collection identifier and a true identification identifier, an incomplete collection identifier and an error identification identifier, or an incomplete collection identifier and a false identification identifier, a false identification target is generated for that file data.
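The target generation logic can be written as a short decision function. Note that the text does not list the combination of a complete collection identifier with a false identification identifier; the sketch treats it as a false identification target, which is an assumption. The string values are likewise illustrative.

```python
def identification_target(acquisition_id: str, authentication_id: str) -> str:
    """Combine both identifiers of one file into its identification target.

    acquisition_id: 'complete' or 'partial'
    authentication_id: 'genuine', 'error' or 'fake'
    Returns 'genuine', 'error' or 'false'.
    """
    if acquisition_id not in {"complete", "partial"}:
        raise ValueError(f"unknown acquisition identifier: {acquisition_id}")
    if authentication_id not in {"genuine", "error", "fake"}:
        raise ValueError(f"unknown authentication identifier: {authentication_id}")

    if acquisition_id == "partial":
        return "false"  # incomplete collection always yields a false target
    if authentication_id == "genuine":
        return "genuine"
    if authentication_id == "error":
        return "error"
    # complete + fake is not enumerated in the source; assumed to be false.
    return "false"
```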

[0064] The logic for reporting file metadata is as follows:

[0065] If the generated file data contains a genuine identification target, then file metadata is reported; if the generated file data contains a false identification target or an error identification target, then data filtering is performed and an error is displayed on the display terminal.
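The reporting rule is a two-way branch on the identification target. In the sketch below the report and show_error callables stand in for the metadata reporting channel and the display terminal; both are hypothetical hooks, as are the return values.

```python
from typing import Callable


def handle_target(target: str, metadata: dict,
                  report: Callable[[dict], None],
                  show_error: Callable[[str], None]) -> str:
    """Report metadata for genuine targets; filter and flag everything else."""
    if target == "genuine":
        report(metadata)
        return "reported"
    if target in {"false", "error"}:
        show_error(f"{target} identification target: {metadata.get('name')}")
        return "filtered"
    raise ValueError(f"unknown identification target: {target}")
```

Passing the two sinks as parameters keeps the decision logic testable without a real terminal or reporting endpoint.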

[0066] This invention also provides a storage and scheduling detection system for forgery identification scenarios, specifically including:

[0067] File data analysis module: used to collect and process file data from specified files and generate a file list;

[0068] Data Acquisition Analysis Identifier Generation Module: Used to determine the accuracy and extent of data acquisition based on the file data in the file list and generate data acquisition analysis identifiers;

[0069] Sending Step Matching Module: Used to select sending steps for the data collection and analysis identifier, wherein the sending steps include data encapsulation steps and data error reporting steps;

[0070] File data storage module: Used to store the file data that has been processed during the data encapsulation and sending steps;

[0071] Data identification identifier generation module: used to forward stored file data and perform engine cluster identification, and generate data identification identifiers;

[0072] Identification Target Generation Module: Generates identification targets based on data identification identifiers and collection and analysis identifiers. The identification targets include real identification targets and false identification targets.

[0073] File data reporting module: Reports file metadata or filters data based on the results generated from the identification target.
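How the modules listed above might be wired together is sketched below. The class, its fields, and the early exit for incompletely collected files are assumptions; the patent describes the modules but not their interfaces, and the verification and authentication modules are injected here as plain callables.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DetectionPipeline:
    verify: Callable[[dict], str]        # returns 'complete' or 'partial'
    authenticate: Callable[[dict], str]  # returns 'genuine', 'error' or 'fake'
    store: list = field(default_factory=list)     # stand-in for the storage module
    reported: list = field(default_factory=list)  # metadata reported upstream
    errors: list = field(default_factory=list)    # messages for the display terminal

    def run(self, file_data: dict) -> str:
        """Process one file and return its identification target."""
        if self.verify(file_data) != "complete":
            # Step matching selects the error reporting step (first report).
            self.errors.append(f"acquisition failed: {file_data['name']}")
            return "false"
        # Step matching selects encapsulation; the packet is then stored.
        packet = {"payload": file_data}
        self.store.append(packet)
        authentication = self.authenticate(packet["payload"])
        if authentication == "genuine":
            self.reported.append(file_data["metadata"])
            return "genuine"
        # Second report, issued after engine cluster authentication.
        self.errors.append(f"authentication {authentication}: {file_data['name']}")
        return "error" if authentication == "error" else "false"
```

The two places where errors are appended mirror the two-stage error reporting the description attributes to the step matching model and the identification target generation module.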

[0074] In a preferred embodiment, the file metadata in the file data specifically refers to data used to describe the file, stored in a corresponding metadata file, which provides information about the file content, attributes, structure, and format, rather than the actual multimedia content. The file metadata provides important information about the file, which can be used for file management, indexing, searching, identification, and recognition.

[0075] The multimedia data in the file data specifically refers to actual audio, video, and image data content, mainly perceptible media content, stored in corresponding multimedia files. The composition of the multimedia data depends on the file type. Audio data contains audio sampling information; video data contains a series of consecutive image frames, each frame consisting of the color and position information of pixels; and image data consists of the color and position information of pixels. Users can play audio and video or display images by parsing this multimedia data, and can also use the multimedia data for multimedia authentication.

[0076] The steps for generating the file list are as follows:

[0077] Choosing a programming language and platform: Select Python as the programming language and platform to implement the data monitoring function;

[0078] Set file directory: Determine the file directory to be monitored. The file directory can be a folder on the local computer or a shared folder on the network.

[0079] File monitoring: Use the file monitoring interface provided by the programming language to monitor the specified file directory. The interfaces provided by different programming languages and platforms may differ, but they usually support monitoring events such as folder creation, deletion, and modification.

[0080] Parse metadata files: When a new metadata file is detected, check its filename. If the filename does not end with TMP, the file is a metadata file that needs to be parsed.

[0081] Extracting the file list: For the file data that needs to be parsed, the parsing operation is performed to extract the file list information. The file metadata contains relevant information about the file where the multimedia data to be collected is located, such as the file name and file path.

[0082] Multimedia data collection: Based on the extracted file list information, multimedia data, specifically audio, video, and image data, is collected through file paths and other information.

[0083] Storing multimedia data: The collected multimedia data is stored, and the multimedia files are saved to the local disk or uploaded to a network storage service.

[0084] Loop monitoring: Put the above steps into a loop to continuously monitor the file directory and collect the files;

[0085] Regular file cleanup: Establish a file cleanup cycle; the cleanup frequency can be adjusted based on data growth rate and storage needs. Organize and archive stored files according to certain classification standards, classifying them based on attributes such as file type, creation time, last access time, and size, dividing files into active data and archived data. Delete expired data; regularly check stored files and delete expired or no longer needed data, including temporary files, log files, or outdated backup files. Back up important data; before cleanup, ensure that important data has been backed up to other storage media or the cloud to prevent accidental deletion or data loss. Compress archived data; compress large archived data to save storage space. Deduplicate files; detect and delete duplicate files to prevent redundant data. Audit files; record files deleted or modified during cleanup for future tracking and review. Set up a file recovery mechanism; before file cleanup, ensure there is a suitable file recovery mechanism so that accidentally deleted files can be quickly recovered if needed.

[0086] The data acquisition and analysis identifier specifically includes a complete data acquisition identifier and a partial data acquisition identifier. The steps for generating the data acquisition and analysis identifier are as follows:

[0087] The accuracy of the collected file data is the criterion for determining whether the file data has been downloaded correctly, and the degree of file data collection is the criterion for determining whether the file data has been completely collected.

[0088] The file download verification model analyzes the correctness of the collected file data to generate a data correctness label, analyzes the degree of data collection to generate a degree of data collection label, and combines the data correctness label and degree of data collection label in the same file data to generate a data collection analysis identifier for that file data.

[0089] The data correctness labels include a correct-data label and a data-error label;

[0090] If the download verification process of the file list outputs the correct download result, the file data is marked as correctly collected data.

[0091] If a download error is output during the download verification of the file list, an error flag for the collected data should be generated for that file.

[0092] The acquisition level markers include complete acquisition markers and incomplete acquisition markers;

[0093] If the data collection progress is not 100% during the download and verification of the file list, an incomplete collection marker will be generated for that file.

[0094] If the data collection progress reaches 100% during the download and verification of the file list, a complete collection mark will be generated for that file's data.

[0095] For the same file data, perform tagging analysis. If the same file data has both a correctly collected data tag and a partially collected data tag, a data error tag and a fully collected data tag, or a data error tag and a partially collected data tag, generate a partial data acquisition identifier for that file data.

[0096] For the same file data, perform tagging analysis. If the same file data has both a correct data collection tag and a complete data collection tag, generate a complete data acquisition identifier for that file data.

[0097] The matching logic for the sending step is as follows:

[0098] Construct a step matching model that reads the collection analysis identifier in the file data and matches the corresponding sending step.

[0099] Specifically, when the file download verification model generates a complete acquisition identifier for the file data, the step matching model matches the data encapsulation steps based on the complete acquisition identifier in the file data;

[0100] When the file download verification model generates an incomplete collection identifier for the file data, the step matching model matches the data error reporting step according to the incomplete collection identifier in the file data.

[0101] The data encapsulation step involves packaging the file data for subsequent steps, while the data error reporting step directly displays the error on the display terminal.

[0102] The data authentication identifier includes a genuine authentication identifier, an error authentication identifier, and a false authentication identifier. The steps for establishing the data authentication identifier are as follows:

[0103] Based on a stream processing computing framework and a distributed message middleware, the file type corresponding to the notification is determined in real time. The file types are audio, video and image. For different types of files, requests are forwarded to different types of anti-counterfeiting engine clusters for processing and to generate initial identification tags. Through the integration and analysis of the initial identification tags, data identification tags are generated for the file data.

[0104] The initial identification markers include audio identification markers, video identification markers, and image identification markers;

[0105] The audio identification markers include genuine audio markers and fake audio markers;

[0106] The audio data is analyzed by an audio authentication engine cluster. If the result is identified as genuine audio, a genuine audio tag is generated for the file data. If the audio data is analyzed by the audio authentication engine cluster and the result is identified as fake audio, a fake audio tag is generated for the file data.

[0107] The video identification markers include real video markers and fake video markers;

[0108] The video authentication engine cluster analyzes the video data. If the result is a genuine video, a genuine video tag is generated for the file data. If the result is a fake video, a fake video tag is generated for the file data.

[0109] The image identification markers include real image markers and fake image markers;

[0110] The image data is analyzed by an image authentication engine cluster. If the result is a real image, a real image label is generated for the file data. If the image data is analyzed by the image authentication engine cluster and the result is a fake image, a fake image label is generated for the file data.

[0111] The audio identification tags, video identification tags, and image identification tags in the file data are integrated and processed.

[0112] When the file data contains only one type of data: audio, video, or image;

[0113] If the file data contains one of the corresponding types of false audio tags, false video tags, or false image tags, then a false identification identifier is generated for the file data.

[0114] If the file data contains one of the corresponding types of audio authenticity markers, video authenticity markers, or image authenticity markers, then an authenticity identification identifier is generated for the file data.

[0115] When the data in this file contains two types of data, it can be classified as either audio and video, audio and image, or video and image.

[0116] If the file data contains both audio and video authenticity markers, audio and image authenticity markers, or video and image authenticity markers, then an authenticity identification identifier is generated for the file data.

[0117] If one or more of the two types of data in the file have a false initial identification mark, then a false identification mark will be generated for the file data.

[0118] When the file contains audio, video, and image data;

[0119] If the file data contains audio authenticity markers, video authenticity markers, and image authenticity markers simultaneously, an authenticity identification identifier is generated for the file data.

[0120] If only one of the three types of data in the file has a false initial identification mark, an error identification mark is generated for the file data.

[0121] If two or more false initial identification markers are present in the three types of data in the file, a false identification marker will be generated for the file data.

[0122] In a preferred embodiment, the identification targets include true identification targets, error identification targets, and false identification targets, and the generation logic of the identification targets is as follows:

[0123] When the same file data contains both a complete collection identifier and a genuine identification identifier, a genuine identification target is generated for that file data.

[0124] When the same file data contains both a complete collection identifier and an error identification identifier, an error identification target is generated for that file data.

[0125] When the same file data contains an incomplete collection identifier and a true identification identifier, an incomplete collection identifier and an error identification identifier, or an incomplete collection identifier and a false identification identifier, a false identification target is generated for that file data.

[0126] The logic for reporting file metadata is as follows:

[0127] If the generated file data contains a genuine identification target, then file metadata is reported; if the generated file data contains a false identification target or an error identification target, then data filtering is performed and an error is displayed on the display terminal.

[0128] The technical effects and advantages provided by the present invention in the above technical solution are as follows:

[0129] 1. Set up a periodic storage cleanup function during the process of generating the file list to regularly clean up the stored files, keep the system clean, reduce storage space waste, improve system performance and data management efficiency, and ensure the security and integrity of file data;

[0130] 2. Monitoring and collecting IDC outbound traffic, and discovering forged audio, video and image data by accessing storage and scheduling engines, and ensuring data security through real-time file monitoring, load balancing in high-concurrency scenarios through distributed message middleware, real-time forwarding and scheduling through streaming computing framework, and real-time reporting through point-to-point reporting;

[0131] 3. During the file data detection process, a preliminary analysis and error reporting are performed through a step matching model. In the subsequent final detection step, a secondary error reporting is performed through an identification target generation module. The dual error reporting on the display terminal enables more accurate real-time monitoring and feedback of file data identification.

[0132] 4. By generating data symbols such as collection and analysis identifiers, data identification identifiers, and identification targets, and through the matching and integration analysis of these symbols, the system can more accurately and intuitively describe the problems that occur during the data monitoring process and the identification status of the file data.

Description of the Drawings

[0133] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments recorded in this invention. For those skilled in the art, other drawings can be obtained based on these drawings.

[0134] Figure 1 is a flowchart of a storage and scheduling detection method for a forgery identification scenario according to the present invention;

[0135] Figure 2 is a schematic diagram of a storage and scheduling detection system for a forgery identification scenario according to the present invention;

[0136] Figure 3 is a schematic diagram illustrating the file data acquisition and processing steps of a storage and scheduling detection system for a forgery identification scenario according to the present invention;

[0137] Figure 4 is a flowchart illustrating an example of the file data acquisition and processing steps in a storage and scheduling detection system for a forgery identification scenario according to the present invention;

[0138] Figure 5 is a schematic diagram of the data storage steps of a storage and scheduling detection system for a forgery identification scenario according to the present invention;

[0139] Figure 6 is a program diagram illustrating the data storage steps of a storage and scheduling detection system for a forgery identification scenario according to the present invention.

Detailed Description

[0140] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0141] Example 1:

[0142] Referring to Figure 1, in this embodiment, a storage and scheduling detection method for a forgery identification scenario includes the following steps:

[0143] S1, Collect and process file data for the specified file to generate a file list, wherein the file data includes file metadata and multimedia data;

[0144] S2, construct a file download verification model, verify the download of the file list, determine the collection results based on the correctness and degree of collection of the file data in the file list, and generate collection analysis identifiers;

[0145] S3, construct a step matching model and select a sending step based on the collection analysis identifier, wherein the sending steps include a data encapsulation step and a data error reporting step;

[0146] S4, store the file data that has been processed by the data encapsulation step;

[0147] S5, forward the stored file data and perform engine cluster authentication to generate a data authentication identifier;

[0148] S6, Generate identification targets based on data identification identifiers and collection and analysis identifiers, wherein the identification targets include real identification targets and false identification targets;

[0149] S7, perform data filtering or upload metadata based on the identification target.

[0150] The file metadata in the file data specifically describes the data of the file and is stored in the corresponding metadata file. It provides information about the file content, attributes, structure and format, rather than the actual multimedia content. The file metadata provides important information about the file, which can be used for file management, indexing, searching, identification and recognition.

[0151] The multimedia data in the file data specifically refers to actual audio, video, and image data content, mainly perceptible media content, stored in corresponding multimedia files. The composition of the multimedia data depends on different file types. Audio data contains audio sampling information, video data contains a series of consecutive image frames, each frame consisting of the color and position information of pixels, and image data consisting of the color and position information of pixels. Users can play audio, video, or display images by parsing this multimedia data, and can also use the multimedia data for multimedia authentication.

[0152] Referring to Figure 3 and Figure 4, the steps for generating the file list are as follows:

[0153] Python is selected as the programming language and platform for implementing the data monitoring function. The file directory to be monitored is determined; it can be a folder on the local computer or a shared folder on the network. The file monitoring interface provided by the programming language is used to monitor the specified directory. The interfaces provided by different programming languages and platforms may vary, but they generally support monitoring events such as folder creation, deletion, and modification. When a new metadata file is detected, its filename is checked: a filename that does not end with TMP indicates a metadata file that needs to be parsed. Each such file is parsed to extract the file list information. The file metadata contains information about the files holding the multimedia data to be collected, such as the filename and file path;

[0154] Based on the extracted file list information, the multimedia data (audio, video, and image data) is collected using the file paths and other information. The collected multimedia files are then saved to the local disk or uploaded to a network storage service. The above steps are placed in a loop so that the file directory is monitored and files are collected continuously.
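
The monitoring steps above can be sketched in Python as a single polling pass. This is a minimal illustration, not the patented implementation: the function names `is_metadata_ready` and `scan_once` are hypothetical, the metadata file is assumed to be JSON with a `files` list of name/path entries (the patent does not specify the metadata format), and a name ending in TMP is assumed to mean the file is not yet ready.

```python
import json
import os


def is_metadata_ready(filename: str) -> bool:
    """A metadata file is parsed only when its name does not end with TMP."""
    return not filename.endswith("TMP")


def scan_once(directory: str, seen: set) -> list:
    """Return file-list entries from metadata files not handled in earlier scans."""
    entries = []
    with os.scandir(directory) as it:
        items = sorted(it, key=lambda e: e.name)
    for item in items:
        if not item.is_file() or item.name in seen:
            continue
        if not is_metadata_ready(item.name):
            continue  # assumed to be unfinished; checked again on the next scan
        seen.add(item.name)
        with open(item.path, "r", encoding="utf-8") as handle:
            # Assumed layout: {"files": [{"name": ..., "path": ...}, ...]}
            entries.extend(json.load(handle).get("files", []))
    return entries
```

Calling `scan_once` repeatedly with the same `seen` set, for example inside a loop with a short sleep, gives the continuous monitoring described in [0154].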

[0155] Establish a file cleanup cycle; the cleanup frequency can be adjusted based on data growth rate and storage needs. Organize and archive stored files according to certain classification standards, classifying them based on attributes such as file type, creation time, last access time, and size, categorizing files into active data and archived data. Delete expired data; periodically check stored files and delete expired or no longer needed data, including temporary files, log files, or outdated backup files. Back up important data; before cleanup, ensure important data has been backed up to other storage media or the cloud to prevent accidental deletion or data loss. Compress archived data; compress large archived data to save storage space. Deduplicate files; detect and delete duplicate files to prevent redundant data. Audit files; record files deleted or modified during cleanup for future tracking and review. Set up a file recovery mechanism; before file cleanup, ensure a suitable file recovery mechanism is in place so that accidentally deleted files can be quickly recovered if needed.

[0156] By setting up a periodic storage cleanup function during the process of generating the file list, the stored files are cleaned regularly to keep the system tidy, reduce storage space waste, improve system performance and data management efficiency, and ensure the security and integrity of file data.
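
As an illustration of the "delete expired data" and "audit files" parts of the cleanup cycle, the following sketch removes files older than a configurable age and records each deletion. The names `find_expired` and `clean_expired` are hypothetical, and using the last modification time as the expiry criterion is an assumption; backup, compression, and deduplication are left out.

```python
import os
import time


def find_expired(directory, max_age_seconds, now=None):
    """List regular files whose last modification is older than max_age_seconds."""
    now = time.time() if now is None else now
    expired = []
    with os.scandir(directory) as it:
        for item in it:
            if item.is_file() and now - item.stat().st_mtime > max_age_seconds:
                expired.append(item.path)
    return sorted(expired)


def clean_expired(directory, max_age_seconds, audit_log, now=None):
    """Delete expired files and append each deleted path to audit_log."""
    for path in find_expired(directory, max_age_seconds, now):
        os.remove(path)
        audit_log.append(path)  # audit record for later tracking and review
    return audit_log
```

In practice the backup step in [0155] would run before `clean_expired`, so that an accidental deletion can be undone.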

[0157] It should be noted that the specific implementation details may vary depending on the programming language and platform used. Reference should be made to the documentation of the relevant programming language and platform for how to implement file monitoring and file operations. Exceptions arising during real-time monitoring and file operations should also be handled to ensure the stability and reliability of the program.

[0158] The collection analysis identifier specifically includes a complete collection identifier and an incomplete collection identifier. The steps for generating the collection analysis identifier are as follows:

[0159] The accuracy of the collected file data is the criterion for determining whether the file data has been downloaded correctly, and the degree of file data collection is the criterion for determining whether the file data has been completely collected.

[0160] The file download verification model analyzes the correctness of the collected file data to generate a data correctness label, analyzes the degree of data collection to generate a degree of data collection label, and combines the data correctness label and degree of data collection label in the same file data to generate a data collection analysis identifier for that file data.

[0161] The data correctness markers include a data-correct marker and a data-error marker;

[0162] If the download verification of the file list outputs a correct download result, a data-correct marker is generated for that file data.

[0163] If the download verification of the file list outputs a download error, a data-error marker is generated for that file data.

[0164] The acquisition level markers include complete acquisition markers and incomplete acquisition markers;

[0165] If the data collection progress is not 100% during the download and verification of the file list, an incomplete collection marker will be generated for that file.

[0166] If the data collection progress reaches 100% during the download and verification of the file list, a complete collection mark will be generated for that file's data.

[0167] The markers of the same file data are then analyzed together. If the file data has a data-correct marker and an incomplete collection marker, a data-error marker and a complete collection marker, or a data-error marker and an incomplete collection marker, an incomplete collection identifier is generated for that file data.

[0168] If the file data has both a data-correct marker and a complete collection marker, a complete collection identifier is generated for that file data.

[0169] Data with a fully collected identifier is more authentic than data with a partially collected identifier.
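
The marker combination rules above reduce to a single condition: only a correct download that reached 100% progress yields a complete collection identifier. A minimal sketch, in which the function name and the string labels are hypothetical:

```python
def collection_analysis_identifier(download_ok: bool, progress_percent: float) -> str:
    """Combine the correctness marker and the collection degree marker."""
    correctness = "correct" if download_ok else "error"
    degree = "complete" if progress_percent >= 100 else "incomplete"
    if correctness == "correct" and degree == "complete":
        return "complete_collection"
    # Every other combination yields the incomplete collection identifier.
    return "incomplete_collection"
```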

[0170] The steps for constructing the file download verification model are as follows:

[0171] Get File List: Retrieves a list of files from a specified data source, including metadata information such as file name, path, size, and modification time, as well as information related to multimedia data (audio, video, and images);

[0172] File download: Based on the path and filename in the file list, download the file from the data source to the local storage device via FTP protocol or other appropriate download methods;

[0173] File verification: Verify the downloaded files to ensure their integrity and correctness. Calculate the hash value of the file using a hash algorithm (such as MD5 or SHA-256) and compare it with the hash value in the file list to verify whether the file was downloaded correctly. Generate a data collection correctness identifier based on the degree of correct file download.

[0174] Multimedia data processing: Encoding conversion, upsampling / downsampling rate adjustment, and other processing of downloaded multimedia data (audio, video, images) to meet the needs of subsequent analysis;

[0175] Collection Degree Determination: Based on the metadata information in the file list and the processing results of multimedia data, the collection degree is determined. A collection degree marker is generated based on the degree of complete collection. Specifically, the collection degree is evaluated based on indicators such as file integrity, correctness, and multimedia data quality.

[0176] Matching analysis generates acquisition analysis identifiers: The acquisition degree marker and the acquisition data correctness marker are combined and analyzed to generate acquisition analysis identifiers; the acquisition degree and data correctness markers are analyzed through matching rules to obtain complete acquisition identifiers and incomplete acquisition identifiers.
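
The file verification step can be illustrated with Python's standard `hashlib` module. This sketch assumes the file list carries a SHA-256 hex digest for each file; the function names are hypothetical.

```python
import hashlib


def file_sha256(path, chunk_size=65536):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_download_correct(path, expected_hex):
    """Compare the local file's digest with the digest from the file list."""
    return file_sha256(path) == expected_hex.lower()
```

Reading in chunks keeps memory use flat for large video files. MD5 could be substituted via `hashlib.md5` where the file list provides MD5 values.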

[0177] The matching logic for the sending step is as follows:

[0178] Construct a step matching model that identifies the collection analysis identifier in the file data and matches the corresponding sending step.

[0179] Specifically, when the file download verification model generates a complete collection identifier for the file data, the step matching model matches the data encapsulation step based on the complete collection identifier in the file data;

[0180] When the file download verification model generates an incomplete collection identifier for the file data, the step matching model matches the data error reporting step based on the incomplete collection identifier in the file data.

[0181] The data encapsulation step involves packaging the file data for subsequent steps, while the data error reporting step directly displays the error on the display terminal.
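
The matching logic above amounts to a two-way dispatch on the collection analysis identifier. The sketch below shows it as a plain function rather than a trained model; the name `dispatch`, the string labels, and the two callables passed in are hypothetical.

```python
def dispatch(file_data, identifier, encapsulate, report_error):
    """Route file data to the sending step selected by its identifier."""
    if identifier == "complete_collection":
        return encapsulate(file_data)
    if identifier == "incomplete_collection":
        return report_error(file_data)
    raise ValueError("unknown collection analysis identifier: " + repr(identifier))
```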

[0182] The specific construction steps of the step matching model are as follows:

[0183] Data preparation: The data collection and analysis identifier is used as the target variable, and the data encapsulation steps and data error reporting steps are used as feature variables. These are organized into a dataset. Each sample represents a data collection task, which includes the data encapsulation steps and data error reporting steps of the task, as well as the data collection and analysis identifier of the task.

[0184] Feature engineering: Perform feature engineering on the dataset, which may include feature selection and data preprocessing to ensure that the data format is suitable for model training.

[0185] Build a decision tree model: Use the prepared dataset to train the decision tree model. The decision tree model determines the category of the target variable, that is, the collection analysis identifier, based on the values of the feature variables.

[0186] Model evaluation: The trained decision tree model is evaluated using a test set, and metrics such as accuracy, precision, and recall are calculated to ensure the model's reliability and generalization ability.

[0187] Model application: Input the data encapsulation and data error reporting steps of the new acquisition task into the trained decision tree model. The model will give an acquisition analysis identifier based on the feature value, and select the corresponding sending step, i.e., the data encapsulation step or the data error reporting step, based on this identifier.

[0188] It should be noted that the data encapsulation step is the process of packaging or encoding raw data into a specific format to facilitate transmission, storage or further processing. Specifically, the data encapsulation step may include data encoding, data compression, data encryption, data packaging, adding data header information, error detection and verification, and data format conversion.

[0189] The data encapsulation step is to process and organize raw data to enable more efficient and reliable data transmission and management during network transmission, storage, or processing. The specific data encapsulation steps will vary depending on the application scenario and requirements.
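
One possible form of the encapsulation step is sketched below: compression, a header carrying metadata, and a checksum for error detection. The packet layout (4-byte header length, JSON header, zlib-compressed body) is an assumption made for illustration; the patent does not define a wire format, and encryption is omitted.

```python
import json
import struct
import zlib


def encapsulate(payload: bytes, metadata: dict) -> bytes:
    """Pack as: 4-byte header length | JSON header | zlib-compressed payload."""
    body = zlib.compress(payload)
    header = json.dumps({
        "meta": metadata,
        "size": len(payload),
        "crc32": zlib.crc32(payload),
    }).encode("utf-8")
    return struct.pack(">I", len(header)) + header + body


def decapsulate(packet: bytes):
    """Unpack a packet and verify the payload size and CRC-32 checksum."""
    (header_len,) = struct.unpack(">I", packet[:4])
    header = json.loads(packet[4:4 + header_len].decode("utf-8"))
    payload = zlib.decompress(packet[4 + header_len:])
    if len(payload) != header["size"] or zlib.crc32(payload) != header["crc32"]:
        raise ValueError("payload failed integrity check")
    return header["meta"], payload
```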

[0190] Referring to Figure 4 and Figure 6, the data storage steps are as follows:

[0191] When the file data contains multimedia data and file metadata, where the multimedia data specifically includes audio, video, and image data, the file data is pulled via the FTP protocol using its URL, converted to the required encoding, and adjusted in sampling rate before being temporarily stored locally. The specific steps are as follows:

[0192] Determine FTP server information: Obtain login information such as the FTP server's address, port number, username, and password in order to connect to the server via the FTP protocol;

[0193] Establishing an FTP connection: Use an FTP client library or tool to establish a connection with the FTP server via the FTP protocol. This can be done using existing FTP client libraries (such as Python's ftplib or Java's FTPClient) or FTP client software (such as FileZilla); here, Python's ftplib is used.

[0194] List multimedia files and metadata files on the FTP server: Through the FTP connection, list the media files and their corresponding metadata files on the FTP server so that you can select the files to pull. You can use the FTP command LIST or NLST to get the file list.

[0195] Download multimedia files and metadata files: Select the multimedia files and corresponding metadata files you want to download, and use the FTP protocol to download these files from the server to your local computer. You can use the FTP command RETR to download the files;

[0196] Perform audio and video encoding conversion: Use appropriate audio / video processing libraries or tools (such as FFmpeg) to convert the encoding format of the downloaded audio and video files for subsequent processing or analysis;

[0197] Image processing: Processing downloaded image files, such as cropping, resizing, applying filters, etc., to meet specific needs;

[0198] Sampling rate processing: Performing upsampling and downsampling operations on audio files to adjust the audio's sampling rate;

[0199] Temporarily store processed files: Save the processed multimedia files and metadata files to a local temporary directory for subsequent processing or analysis. Use a temporary folder to store these files, and you can choose to delete the temporary files after processing is complete.

[0200] Close FTP connection: After completing the retrieval and processing of media files, close the connection with the FTP server to release resources.
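
The connect, list, download, and close steps above can be sketched with Python's `ftplib`. The function names and the flat directory layout are assumptions, and the transcoding, image processing, and resampling steps are not shown.

```python
import os
from ftplib import FTP


def download_files(ftp, names, local_dir):
    """Download each named file with RETR and return the local paths."""
    os.makedirs(local_dir, exist_ok=True)
    saved = []
    for name in names:
        local_path = os.path.join(local_dir, os.path.basename(name))
        with open(local_path, "wb") as handle:
            ftp.retrbinary("RETR " + name, handle.write)
        saved.append(local_path)
    return saved


def fetch_from_server(host, user, password, local_dir, port=21):
    """Connect, list files with NLST, download them, then close the connection."""
    with FTP() as ftp:  # the context manager closes the connection on exit
        ftp.connect(host, port)
        ftp.login(user, password)
        return download_files(ftp, ftp.nlst(), local_dir)
```

Keeping `download_files` separate from the connection handling lets it be exercised with a stand-in object that only provides `retrbinary`, without a live FTP server.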

[0201] The data authentication identifier includes a genuine authentication identifier, an error authentication identifier, and a fake authentication identifier. The steps for generating the data authentication identifier are as follows:

[0202] Based on a stream processing computing framework and a distributed message middleware, the file type corresponding to the notification is determined in real time. The file types are audio, video and image. For different types of files, requests are forwarded to different types of anti-counterfeiting engine clusters for processing and to generate initial identification tags. Through the integration and analysis of the initial identification tags, data identification tags are generated for the file data.

[0203] The initial identification markers include audio identification markers, video identification markers, and image identification markers;

[0204] The audio identification markers include genuine audio markers and fake audio markers;

[0205] The audio data is analyzed by an audio authentication engine cluster. If the result is identified as genuine audio, a genuine audio tag is generated for the file data. If the audio data is analyzed by the audio authentication engine cluster and the result is identified as fake audio, a fake audio tag is generated for the file data.

[0206] The video identification markers include real video markers and fake video markers;

[0207] The video authentication engine cluster analyzes the video data. If the result is a genuine video, a genuine video tag is generated for the file data. If the result is a fake video, a fake video tag is generated for the file data.

[0208] The image identification markers include real image markers and fake image markers;

[0209] The image data is analyzed by an image authentication engine cluster. If the result is a real image, a real image label is generated for the file data. If the image data is analyzed by the image authentication engine cluster and the result is a fake image, a fake image label is generated for the file data.

[0210] The audio identification tags, video identification tags, and image identification tags in the file data are integrated and processed.

[0211] When the file data contains only one type of data: audio, video, or image;

[0212] If the file data contains one of the corresponding types of false audio tags, false video tags, or false image tags, then a false identification identifier is generated for the file data.

[0213] If the file data contains one of the corresponding types of audio authenticity markers, video authenticity markers, or image authenticity markers, then an authenticity identification identifier is generated for the file data.

[0214] When the file data contains two types of data, namely audio and video, audio and image, or video and image:

[0215] If the file data contains both audio and video authenticity markers, audio and image authenticity markers, or video and image authenticity markers, then an authenticity identification identifier is generated for the file data.

[0216] If one or more of the two types of data in the file have a false initial identification mark, then a false identification mark will be generated for the file data.

[0217] When the file contains audio, video, and image data;

[0218] If the file data contains audio authenticity markers, video authenticity markers, and image authenticity markers simultaneously, an authenticity identification identifier is generated for the file data.

[0219] If only one of the three types of data in the file has a false initial identification mark, an error identification mark is generated for the file data.

[0220] If two or more false initial identification markers are present in the three types of data in the file, a false identification marker will be generated for the file data.

[0221] Among them, the authenticity of the genuine identification mark is higher than that of the error identification mark, and so on;
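
The integration rules in [0211] to [0220] depend only on how many media types are present and how many of them were tagged as fake. A minimal sketch, in which the function name, the dictionary representation, and the string labels are hypothetical:

```python
def data_authentication_identifier(initial_tags: dict) -> str:
    """Integrate initial tags; keys are media types, values are True if genuine."""
    if not initial_tags:
        raise ValueError("file data must contain at least one media type")
    fake_count = sum(1 for genuine in initial_tags.values() if not genuine)
    if fake_count == 0:
        return "genuine"
    # Only the three-type case with exactly one fake tag yields "error".
    if len(initial_tags) == 3 and fake_count == 1:
        return "error"
    return "fake"
```

For example, a file holding genuine audio, genuine video, and a fake image receives the error identifier, while a file holding only a fake image receives the fake identifier.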

[0222] Based on a stream processing computing framework and distributed message middleware, the specific process for forwarding requests to different types of file authentication engine clusters to generate data authentication identifiers for different file types is as follows:

[0223] Data reception and parsing: The stream processing computing framework is used to receive real-time file data, parse the data, and extract file metadata information and multimedia data;

[0224] File type identification and routing: Based on the parsed file data, determine whether the file type is audio, video, or image. Depending on the file type, distribute the audio, video, and image data to the corresponding authentication engine clusters for processing. The corresponding authentication engine clusters include audio authentication engine clusters, video authentication engine clusters, and image authentication engine clusters. Distributed message middleware can be used to achieve real-time data forwarding and routing.

[0225] Distributed message middleware monitoring: By monitoring the message queue backlog in the distributed message middleware, the data processing status between nodes can be obtained;

[0226] Data processing pressure balancing: The data processing rate is dynamically adjusted based on the backlog in the distributed message middleware and the data processing pressure of the engine cluster nodes. When the data processing pressure of a certain engine cluster node is too high or the message backlog is serious, the data processing rate of that node can be reduced to maintain the stable operation and high concurrency capability of the entire cluster.

[0227] Real-time monitoring and feedback: Continuously monitor data processing status and provide real-time feedback to the system administrator so that system configuration and resource allocation can be adjusted in a timely manner to cope with constantly changing load conditions;

[0228] Data processing result merging: The analysis results processed by each anti-counterfeiting engine cluster are merged and analyzed to generate the final data authentication identifier.

[0229] It should be noted that the above stream processing computing framework can use Apache Flink, but is not limited to Apache Flink, and the distributed message middleware can use Apache Kafka, but is not limited to Apache Kafka.
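
The routing and pressure balancing steps can be illustrated without a real Flink or Kafka deployment. In the sketch below, in-memory deques stand in for the per-cluster message queues; the class name `Router`, the extension-to-type mapping, the backlog threshold, and the choice to halve the rate under pressure are all assumptions made for illustration.

```python
import os
from collections import deque

# Assumed mapping; the patent does not list file extensions.
TYPE_BY_EXTENSION = {
    ".wav": "audio", ".mp3": "audio",
    ".mp4": "video", ".avi": "video",
    ".jpg": "image", ".png": "image",
}


class Router:
    def __init__(self, max_backlog=100):
        # One queue per authentication engine cluster.
        self.queues = {"audio": deque(), "video": deque(), "image": deque()}
        self.max_backlog = max_backlog

    def route(self, filename):
        """Append the file to the queue of the matching engine cluster."""
        ext = os.path.splitext(filename)[1].lower()
        file_type = TYPE_BY_EXTENSION.get(ext)
        if file_type is None:
            raise ValueError("unsupported file type: " + filename)
        self.queues[file_type].append(filename)
        return file_type

    def send_rate(self, file_type, base_rate):
        """Halve the forwarding rate when the cluster's backlog is too large."""
        backlog = len(self.queues[file_type])
        return base_rate / 2 if backlog > self.max_backlog else base_rate
```

In a production system the backlog figure would come from the message middleware's own monitoring rather than from an in-process queue length.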

[0230] The identification targets include true identification targets, error identification targets, and false identification targets. The generation logic of the identification targets is as follows:

[0231] When the same file data contains both a complete collection identifier and a genuine identification identifier, a genuine identification target is generated for that file data.

[0232] When the same file data contains both a complete collection identifier and an error identification identifier, an error identification target is generated for that file data.

[0233] When the same file data contains an incomplete collection identifier together with a genuine identification identifier, an error identification identifier, or a false identification identifier, a false identification target is generated for that file data.

[0234] The false identification target is more likely to be forged than the error identification target, and so on. When an error identification target is found, the above steps are repeated.

[0235] The logic for reporting file metadata is as follows:

[0236] If the generated file data contains a genuine identification target, then file metadata is reported; if the generated file data contains a false identification target or an error identification target, then data filtering is performed and an error is displayed on the display terminal.
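
The target generation and reporting logic can be sketched as two small functions. The names and string labels are hypothetical. The text does not state what happens for a complete collection identifier paired with a fake identification identifier; the sketch assumes a false identification target in that case.

```python
def identification_target(collection_id: str, authentication_id: str) -> str:
    """Combine the collection analysis and data authentication identifiers."""
    if authentication_id not in ("genuine", "error", "fake"):
        raise ValueError("unknown authentication identifier: " + authentication_id)
    if collection_id == "incomplete_collection":
        return "fake_target"  # incomplete collection always yields a false target
    if collection_id != "complete_collection":
        raise ValueError("unknown collection identifier: " + collection_id)
    if authentication_id == "genuine":
        return "genuine_target"
    if authentication_id == "error":
        return "error_target"
    return "fake_target"  # assumption: complete + fake is not covered by the text


def reporting_action(target: str) -> str:
    """Report metadata only for genuine targets; otherwise filter and show an error."""
    if target == "genuine_target":
        return "report_metadata"
    return "filter_and_display_error"
```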

[0237] Example 2:

[0238] As shown in Figure 2, in this embodiment, a storage and scheduling detection system for a forgery identification scenario includes:

[0239] File data analysis module: used to collect and process file data from specified files and generate a file list;

[0240] Data Acquisition Analysis Identifier Generation Module: Used to determine the accuracy and extent of data acquisition based on the file data in the file list and generate data acquisition analysis identifiers;

[0241] Sending Step Matching Module: Used to select sending steps for the data collection and analysis identifier, wherein the sending steps include data encapsulation steps and data error reporting steps;

[0242] File data storage module: Used to store the file data that has been processed during the data encapsulation and sending steps;

[0243] Data identification identifier generation module: used to forward stored file data and perform engine cluster identification, and generate data identification identifiers;

[0244] Target generation module: Generates targets for identification based on data identification identifiers and data collection and analysis identifiers;

[0245] File data reporting module: Reports file metadata or filters data based on the results generated from the identification target.

[0246] The file metadata in the file data specifically describes the data of the file and is stored in the corresponding metadata file. It provides information about the file content, attributes, structure and format, rather than the actual multimedia content. The file metadata provides important information about the file, which can be used for file management, indexing, searching, identification and recognition.

[0247] The multimedia data in the file data specifically refers to actual audio, video, and image data content, mainly perceptible media content, stored in corresponding multimedia files. The composition of the multimedia data depends on different file types. Audio data contains audio sampling information, video data contains a series of consecutive image frames, each frame consisting of the color and position information of pixels, and image data consisting of the color and position information of pixels. Users can play audio, video, or display images by parsing this multimedia data, and can also use the multimedia data for multimedia authentication.

[0248] The steps for generating the file list are as follows:

[0249] Choosing a programming language and platform: Select Python as the programming language and platform to implement the data monitoring function;

[0250] Set file directory: Determine the file directory to be monitored. The file directory can be a folder on the local computer or a shared folder on the network.

[0251] File monitoring: Use the file monitoring interface provided by the programming language to monitor the specified file directory. The interfaces provided by different programming languages and platforms may differ, but they usually support monitoring events such as folder creation, deletion, and modification.

[0252] Parse metadata files: When a new metadata file is detected, check its filename. A filename that does not end with TMP indicates a metadata file that needs to be parsed.

[0253] Extracting the file list: For the file data that needs to be parsed, the parsing operation is performed to extract the file list information. The file metadata contains relevant information about the file where the multimedia data to be collected is located, such as the file name and file path.

[0254] Multimedia data collection: Based on the extracted file list information, multimedia data, specifically audio, video, and image data, is collected through file paths and other information.

[0255] Storing multimedia data: The collected multimedia data is stored, and the multimedia files are saved to the local disk or uploaded to a network storage service.

[0256] Loop monitoring: Place the above steps in a loop so that the file directory is monitored and files are collected continuously.

[0257] Regular file cleanup: Establish a file cleanup cycle; the cleanup frequency can be adjusted based on the data growth rate and storage needs. The cleanup covers the following operations. Organize and archive: classify stored files by attributes such as file type, creation time, last access time, and size, and divide them into active data and archived data. Delete expired data: regularly check stored files and delete expired or no longer needed data, including temporary files, log files, and outdated backup files. Back up important data: before cleanup, ensure that important data has been backed up to other storage media or the cloud to prevent accidental deletion or data loss. Compress archived data: compress large archived data to save storage space. Deduplicate files: detect and delete duplicate files to prevent redundant data. Audit files: record the files deleted or modified during cleanup for future tracking and review. Set up a file recovery mechanism: before file cleanup, ensure that a suitable recovery mechanism is in place so that accidentally deleted files can be quickly recovered if needed.
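
The monitoring steps above can be sketched as a simple polling loop. This is a minimal illustration only, not the implementation described in the patent: the function names, the polling approach (rather than an event-based monitoring interface), the case-insensitive TMP check, and the metadata format (one multimedia file path per line) are all assumptions made for the example.

```python
import time
from pathlib import Path


def find_new_metadata_files(directory, seen):
    """Return files in `directory` that are new and whose names do not end with TMP."""
    new_files = []
    for entry in sorted(Path(directory).iterdir()):
        if not entry.is_file() or entry.name in seen:
            continue
        # Assumption: the TMP check is case-insensitive.
        if entry.name.upper().endswith("TMP"):
            continue
        seen.add(entry.name)
        new_files.append(entry)
    return new_files


def parse_file_list(metadata_path):
    """Hypothetical metadata format: one multimedia file path per non-empty line."""
    with open(metadata_path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def monitor(directory, handle_file_list, poll_seconds=1.0, max_rounds=None):
    """Poll `directory` in a loop and pass each extracted file list to a callback.

    `max_rounds=None` loops forever; a finite value is convenient for testing.
    """
    seen = set()
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        for metadata_path in find_new_metadata_files(directory, seen):
            handle_file_list(parse_file_list(metadata_path))
        rounds += 1
        if max_rounds is None or rounds < max_rounds:
            time.sleep(poll_seconds)
```

The callback `handle_file_list` stands in for the collection and storage steps; in a real deployment it would fetch the listed multimedia files and write them to local disk or a network storage service.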

[0258] The collection analysis identifier is either a complete collection identifier or an incomplete collection identifier. The steps for generating the collection analysis identifier are as follows:

[0259] The correctness of the collected file data is the criterion for determining whether the file data has been downloaded correctly, and the degree of collection is the criterion for determining whether the file data has been collected completely.

[0260] The file download verification model analyzes the correctness of the collected file data to generate a data correctness label, analyzes the degree of data collection to generate a degree of data collection label, and combines the data correctness label and degree of data collection label in the same file data to generate a data collection analysis identifier for that file data.

[0261] The data correctness labels include a correct-data label and a data-error label;

[0262] If the download verification process of the file list outputs the correct download result, the file data is marked as correctly collected data.

[0263] If a download error is output during the download verification of the file list, an error flag for the collected data should be generated for that file.

[0264] The degree-of-collection labels include a complete collection label and an incomplete collection label;

[0265] If the data collection progress is not 100% during the download and verification of the file list, an incomplete collection marker will be generated for that file.

[0266] If the data collection progress reaches 100% during the download and verification of the file list, a complete collection mark will be generated for that file's data.

[0267] Label analysis is performed for each piece of file data. If the same file data carries a correct-data label and an incomplete collection label, a data-error label and a complete collection label, or a data-error label and an incomplete collection label, an incomplete collection identifier is generated for that file data.

[0268] If the same file data carries both a correct-data label and a complete collection label, a complete collection identifier is generated for that file data.
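
The combination rule above reduces to a small truth table: only the pair of a correct download and 100% collection progress yields a complete collection identifier. The sketch below illustrates that rule; the enum and function names are hypothetical and chosen only for this example.

```python
from enum import Enum


class Correctness(Enum):
    CORRECT = "correct"
    ERROR = "error"


class Degree(Enum):
    COMPLETE = "complete"
    INCOMPLETE = "incomplete"


class CollectionIdentifier(Enum):
    COMPLETE = "complete collection identifier"
    INCOMPLETE = "incomplete collection identifier"


def degree_label(progress_percent):
    """Complete only when collection progress has reached 100%."""
    return Degree.COMPLETE if progress_percent >= 100 else Degree.INCOMPLETE


def collection_identifier(correctness, degree):
    """Combine the two labels of one piece of file data into one identifier."""
    if correctness is Correctness.CORRECT and degree is Degree.COMPLETE:
        return CollectionIdentifier.COMPLETE
    # Any download error or any shortfall in progress gives an incomplete identifier.
    return CollectionIdentifier.INCOMPLETE
```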

[0269] The matching logic for the sending step is as follows:

[0270] Construct a step matching model that matches a sending step to the collection analysis identifier in the file data.

[0271] Specifically, when the file download verification model generates a complete acquisition identifier for the file data, the step matching model matches the data encapsulation steps based on the complete acquisition identifier in the file data;

[0272] When the file download verification model generates an incomplete collection identifier for the file data, the step matching model matches the data error reporting step based on the incomplete collection identifier in the file data.

[0273] The data encapsulation step involves packaging the file data for subsequent steps, while the data error reporting step directly displays the error on the display terminal.
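
One simple way to picture the step matching model is a lookup table from identifier to step. The sketch below is illustrative only: the string constants, the dictionary-based dispatch, and the use of `print` as a stand-in for the display terminal are assumptions, not details given in the text.

```python
COMPLETE = "complete"
INCOMPLETE = "incomplete"


def encapsulate(file_data):
    """Data encapsulation step: package the file data for the subsequent steps."""
    return {"status": "encapsulated", "payload": file_data}


def report_error(file_data):
    """Data error reporting step: print stands in for the display terminal."""
    message = f"collection error: {file_data.get('name', '<unnamed>')}"
    print(message)
    return {"status": "error", "message": message}


# The "model" here is just a mapping from identifier to sending step.
SENDING_STEPS = {COMPLETE: encapsulate, INCOMPLETE: report_error}


def run_sending_step(identifier, file_data):
    """Select the sending step for the identifier and apply it to the file data."""
    return SENDING_STEPS[identifier](file_data)
```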

[0274] The steps for generating the data identification identifier are as follows:

[0275] Based on a stream processing computing framework and a distributed message middleware, the file type corresponding to the notification is determined in real time. The file types are audio, video and image. For different types of files, requests are forwarded to different types of anti-counterfeiting engine clusters for processing and to generate initial identification tags. Through the integration and analysis of the initial identification tags, data identification tags are generated for the file data.
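
The forwarding of requests by file type can be sketched as follows. The text does not name a specific stream processing framework or message middleware, so this example substitutes in-memory `queue.Queue` objects for the per-type engine cluster inputs, and it assumes the file type is inferred from the file extension; both choices, and all names, are assumptions for illustration.

```python
import os
import queue

# Assumed mapping from extension to file type; the source does not specify
# how the file type is determined.
EXTENSION_TYPES = {
    ".wav": "audio", ".mp3": "audio",
    ".mp4": "video", ".avi": "video",
    ".jpg": "image", ".png": "image",
}


def make_engine_queues():
    """One in-memory queue per engine cluster, standing in for middleware topics."""
    return {"audio": queue.Queue(), "video": queue.Queue(), "image": queue.Queue()}


def file_type_of(path):
    """Return 'audio', 'video', or 'image', or None for an unknown extension."""
    suffix = os.path.splitext(path)[1].lower()
    return EXTENSION_TYPES.get(suffix)


def forward(notification, engine_queues):
    """Route a notification to the queue of the matching engine cluster."""
    file_type = file_type_of(notification["path"])
    if file_type is None:
        raise ValueError(f"unsupported file type: {notification['path']}")
    engine_queues[file_type].put(notification)
    return file_type
```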

[0276] The initial identification markers include audio identification markers, video identification markers, and image identification markers;

[0277] The audio identification markers include genuine audio markers and fake audio markers;

[0278] The audio authentication engine cluster analyzes the audio data. If the result is genuine audio, a genuine audio tag is generated for the file data. If the result is fake audio, a fake audio tag is generated for the file data.

[0279] The video identification markers include real video markers and fake video markers;

[0280] The video authentication engine cluster analyzes the video data. If the result is a genuine video, a genuine video tag is generated for the file data. If the result is a fake video, a fake video tag is generated for the file data.

[0281] The image identification markers include real image markers and fake image markers;

[0282] The image authentication engine cluster analyzes the image data. If the result is a real image, a real image tag is generated for the file data. If the result is a fake image, a fake image tag is generated for the file data.

[0283] The audio identification tags, video identification tags, and image identification tags in the file data are integrated and processed.

[0284] When the file data contains only one type of data (audio, video, or image):

[0285] If the file data carries the fake tag for that type (a fake audio tag, a fake video tag, or a fake image tag), a false identification identifier is generated for the file data.

[0286] If the file data carries the genuine tag for that type (a genuine audio tag, a genuine video tag, or a genuine image tag), a genuine identification identifier is generated for the file data.

[0287] When the file data contains two types of data (audio and video, audio and image, or video and image):

[0288] If both types carry genuine tags, a genuine identification identifier is generated for the file data.

[0289] If one or both of the two types carry a fake initial identification tag, a false identification identifier is generated for the file data.

[0290] When the file data contains audio, video, and image data:

[0291] If the file data carries a genuine audio tag, a genuine video tag, and a genuine image tag at the same time, a genuine identification identifier is generated for the file data.

[0292] If exactly one of the three types carries a fake initial identification tag, an error identification identifier is generated for the file data.

[0293] If two or more of the three types carry fake initial identification tags, a false identification identifier is generated for the file data.

[0294] In terms of authenticity, the genuine identification identifier ranks above the error identification identifier, which in turn ranks above the false identification identifier.
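
The integration rules for one, two, and three media types can be condensed into a count of fake initial tags. The sketch below is one way to express those rules; the function name and the representation of initial tags as a dictionary of booleans (`True` for genuine, `False` for fake) are assumptions made for the example.

```python
GENUINE = "genuine"
ERROR = "error"
FALSE = "false"


def integrate_initial_tags(initial_tags):
    """Derive the data identification identifier from the initial tags.

    `initial_tags` maps each media type present in the file data
    ("audio", "video", "image") to True (genuine) or False (fake).
    """
    if not 1 <= len(initial_tags) <= 3:
        raise ValueError("file data must contain one to three media types")
    fake_count = sum(1 for is_genuine in initial_tags.values() if not is_genuine)
    if fake_count == 0:
        return GENUINE
    # Only the three-type case distinguishes a single fake tag as an error.
    if len(initial_tags) == 3 and fake_count == 1:
        return ERROR
    return FALSE
```

For example, file data with a genuine audio tag, a genuine video tag, and a fake image tag yields the error identifier, while the same fake image tag in a two-type file yields the false identifier.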

[0295] The identification targets include true identification targets, error identification targets, and false identification targets. The generation logic of the identification targets is as follows:

[0296] When the same file data carries both a complete collection identifier and a genuine identification identifier, a genuine identification target is generated for that file data.

[0297] When the same file data carries both a complete collection identifier and an error identification identifier, an error identification target is generated for that file data.

[0298] When the same file data carries an incomplete collection identifier together with a genuine identification identifier, an error identification identifier, or a false identification identifier, a false identification target is generated for that file data.

[0299] In terms of degree of forgery, the false identification target ranks above the error identification target, which in turn ranks above the genuine identification target. When an error identification target is generated, the above steps are repeated.

[0300] The logic for reporting file metadata is as follows:

[0301] If a genuine identification target is generated for the file data, the file metadata is reported; if a false identification target or an error identification target is generated, data filtering is performed and an error is displayed on the display terminal.
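
The target generation and reporting logic can be sketched as two small functions. The names and string constants are hypothetical. Note one assumption in particular: the text does not state the outcome for a complete collection identifier combined with a false identification identifier, so this sketch treats that case as a false identification target.

```python
def identification_target(collection_identifier, data_identifier):
    """Combine the collection identifier and the data identification identifier.

    `collection_identifier` is "complete" or "incomplete";
    `data_identifier` is "genuine", "error", or "false".
    """
    if collection_identifier == "incomplete":
        # Any incomplete collection gives a false identification target.
        return "false"
    if data_identifier == "genuine":
        return "genuine"
    if data_identifier == "error":
        return "error"
    # Assumption: complete + false is not covered by the text; treated as false.
    return "false"


def dispose(target):
    """Report metadata for genuine targets; otherwise filter and show an error."""
    if target == "genuine":
        return "report_metadata"
    return "filter_and_display_error"
```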

[0302] The above formulas are all dimensionless calculations. The formulas were obtained by software simulation on a large amount of collected data so as to approximate real-world conditions as closely as possible. The preset parameters in the formulas are set by those skilled in the art according to the actual situation.

[0303] It should be understood that the term "and/or" in this document merely describes the relationship between related objects, indicating that three relationships can exist. For example, A and/or B can represent: A existing alone, A and B existing simultaneously, or B existing alone. A and B can be singular or plural. Additionally, the character "/" in this document generally indicates an "or" relationship between the preceding and following related objects, but it can also represent an "and/or" relationship. Please refer to the context for a more accurate understanding.

[0304] In this application, "at least one" means one or more, and "more than one" means two or more. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of single or multiple items. For example, at least one of a, b, or c can mean: a, b, c, ab, ac, bc, or abc, where a, b, and c can be single or multiple.

[0305] It should be understood that in the various embodiments of this application, the order of the above-mentioned processes does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application.

[0306] Those skilled in the art will understand that, for convenience and brevity, the specific working processes of the systems, devices, and units described above may be understood by reference to the corresponding processes in the foregoing method embodiments, and are not repeated here.

[0307] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0308] In addition, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.

[0309] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.

Claims

1. A storage, scheduling and detection method in a forgery identification scenario, characterized in that: The method includes the following steps: S1, collect and process file data for the specified file to generate a file list, wherein the file data includes file metadata and multimedia data; S2, construct a file download verification model, verify the download of the file list, determine the collection results based on the correctness and degree of collection of the file data in the file list, and generate collection analysis identifiers; S3, construct a step matching model, select a sending step based on the collection analysis identifier, wherein the sending steps include a data encapsulation step and a data error reporting step; S4, store the file data that has been processed in the data encapsulation and sending step; S5, forward the stored file data and perform engine cluster authentication to generate a data authentication identifier. The data authentication identifier includes a genuine authentication identifier, an error authentication identifier, and a false authentication identifier. The steps for generating the data authentication identifier are as follows: Based on a stream processing computing framework and a distributed message middleware, the file type corresponding to the notification is determined in real time. The file types include audio, video and images. For different types of files, requests are forwarded to different types of anti-counterfeiting engine clusters for processing and to generate initial identification tags. Through the integration and analysis of the initial identification tags, data identification tags are generated for the file data. The initial identification markers include audio identification markers, video identification markers, and image identification markers; The audio identification markers include genuine audio markers and fake audio markers; The audio data is analyzed by an audio authentication engine cluster. 
If the result is identified as genuine audio, a genuine audio tag is generated for the file data. If the audio data is analyzed by an audio authentication engine cluster and the result is identified as fake audio, a fake audio tag is generated for the file data. The video identification markers include real video markers and fake video markers; The video authentication engine cluster analyzes the video data. If the result is a real video, a real video tag is generated for the file data. If the video authentication engine cluster analyzes the video data and the result is a fake video, a fake video tag is generated for the file data. The image identification markers include real image markers and fake image markers; The image data is analyzed by an image authentication engine cluster. If the result is a real image, a real image tag is generated for the file data. If the image data is analyzed by an image authentication engine cluster and the result is a fake image, a fake image tag is generated for the file data. The audio identification tags, video identification tags, and image identification tags in the file data are integrated and processed. When the file data contains only one type of data: audio, video, or image; If the file data contains one of the corresponding types of false audio tags, false video tags, or false image tags, then a false identification identifier is generated for the file data. If the file data contains one of the corresponding types of audio authenticity markers, video authenticity markers, or image authenticity markers, then an authenticity identification identifier is generated for the file data. When the file contains two types of data, it is classified as either audio and video, audio and image, or video and image. 
If the file data contains both audio and video authenticity markers, audio and image authenticity markers, or video and image authenticity markers, then an authenticity identification identifier is generated for the file data. If one or more of the two types of data in the file have a false initial identification mark, then a false identification mark will be generated for the file data. When the file contains audio, video, and image data; If the file data contains audio authenticity markers, video authenticity markers, and image authenticity markers simultaneously, an authenticity identification identifier is generated for the file data. If only one of the three types of data in the file has a false initial identification mark, an error identification mark is generated for the file data. If the three types of data in the file have two or more false initial identification markers, a false identification marker will be generated for the file data. S6, generate identification targets based on data identification identifiers and collection and analysis identifiers. The identification targets include true identification targets, error identification targets, and false identification targets. The generation logic for the identification targets is as follows: When the same data file contains both a complete collection identifier and a genuine identification identifier, a genuine identification target is generated for that data file. When the same data file contains both a complete acquisition identifier and an error identification identifier, an error identification target is generated for that data file. When the same file data contains an incomplete acquisition identifier and a true identification identifier, an incomplete acquisition identifier and an error identification identifier, or an incomplete acquisition identifier and a false identification identifier, a false identification target is generated for that file data. 
If the generated file data contains a genuine identification target, then the file metadata is reported; if the generated file data contains a false identification target or an erroneous identification target, then data filtering is performed and an error is displayed on the display terminal. S7, perform data filtering or upload metadata based on the identification target.

2. The storage, scheduling and detection method in a forgery identification scene according to claim 1, characterized in that, The file metadata in the file data specifically describes the data of the file and is stored in the corresponding metadata file. It provides information about the file content, attributes, structure and format, rather than the actual multimedia content. The file metadata provides important information about the file, which can be used for file management, indexing, searching, identification and recognition. The multimedia data in the file data specifically refers to the actual audio, video, and image data content, which is stored in the corresponding multimedia files. The composition of the multimedia data depends on the different file types. The audio data contains audio sampling information, the video data contains a series of consecutive image frames, each frame is composed of the color and position information of the pixels, and the image data is composed of the color and position information of the pixels.

3. The storage, scheduling and detection method in a forgery identification scenario according to claim 2, characterized in that, The steps for generating the file list are as follows: The Python programming language and platform are selected to implement the data monitoring function; the file directory to be monitored is determined, which can be a folder on the local computer or a shared folder on the network; the file monitoring interface provided by the programming language is used to monitor the specified file directory. Different programming languages and platforms provide different interfaces, but all support monitoring the creation, deletion, and modification events of folders; metadata files are parsed: when a new metadata file is detected, it is checked whether the filename ends with TMP. If it does not end with TMP, it means that this is a metadata file that needs to be parsed. For the file data that needs to be parsed, a parsing operation is performed to extract the file list information. The file metadata contains relevant information about the file where the multimedia data to be collected is located, including the file name and file path. Based on the extracted file list information, multimedia data, specifically audio, video, and image data, is collected through the file path information. The collected multimedia data is then stored, and the multimedia files are saved to the local disk or uploaded to a network storage service. The above steps are put into a loop to continuously monitor the file directory and collect the number of files; Establish a file cleanup cycle, and adjust the cleanup frequency based on the data growth rate and storage needs; The stored files are organized and archived according to classification criteria. They are classified according to file type, creation time, last access time, and size attributes, and the files are divided into active data and archived data. Expired data is deleted. 
The stored files are checked regularly, and expired or no longer needed data is deleted. The expired data includes some temporary files, log files, or outdated backup files. Back up important data. Before performing the cleaning process, ensure that important data has been backed up to other storage media or the cloud to prevent accidental deletion or data loss. Compress archived data to save storage space, especially for large archived data. Perform file deduplication, detect and delete duplicate files to prevent redundant data; perform file auditing, record files deleted or modified during the cleaning process for future tracking and review. Configure a file recovery mechanism before performing file cleaning.

4. The storage, scheduling and detection method in a forgery identification scenario according to claim 3, characterized in that, The data acquisition and analysis identifier specifically includes a complete data acquisition identifier and a partial data acquisition identifier. The steps for generating the data acquisition and analysis identifier are as follows: The accuracy of the collected file data is the criterion for determining whether the file data has been downloaded correctly, and the degree of file data collection is the criterion for determining whether the file data has been completely collected. The file download verification model analyzes the correctness of the collected file data to generate a data correctness label, analyzes the degree of data collection to generate a degree of data collection label, and combines the data correctness label and degree of data collection label in the same file data to generate a data collection analysis identifier for that file data. The data correctness labels include correct data markers and data error markers; If the download verification process of the file list outputs the correct download result, the file data is marked as correctly collected data. If a download error is output during the download verification of the file list, an error flag for the collected data should be generated for that file. The degree of data collection labels include complete acquisition markers and incomplete acquisition markers; If the data collection progress is not 100% during the download and verification of the file list, an incomplete collection marker will be generated for that file. If the data collection progress reaches 100% during the download and verification of the file list, a complete collection mark will be generated for that file's data. For the same file data, perform tagging analysis. 
If the same file list has both a correctly collected data tag and a partially collected data tag, a data error tag and a fully collected data tag, or a data error tag and a partially collected data tag, generate a partial data acquisition identifier for that file data. For the same file data, perform tagging analysis. If the same file list has both a correct data collection tag and a complete data collection tag, generate a complete data collection identifier for that file data. The matching logic for the sending step is as follows: Construct a step matching model that matches a sending step to the collection and analysis identifier in the file data. Specifically, when the file download verification model generates a complete acquisition identifier for the file data, the step matching model matches the data encapsulation step based on the complete acquisition identifier in the file data. When the file download verification model generates an incomplete collection identifier for the file data, the step matching model matches the data error reporting step according to the incomplete collection identifier in the file data. The data encapsulation step involves packaging the file data for subsequent steps, while the data error reporting step directly displays the error on the display terminal.

5. The storage, scheduling and detection method in a forgery identification scenario according to claim 4, characterized in that, The data storage steps are as follows: When the file data involved contains multimedia data and file metadata, the multimedia data specifically includes audio, video, and image data. The file data is encoded and converted based on the URL via the FTP protocol, and the sampling rate is adjusted before being temporarily stored locally.

6. A storage, dispatching and detection system in a forgery identification scenario, for implementing the method according to any one of claims 1 to 5, characterized in that, The specific modules include: File data analysis module: used to collect and process file data from specified files and generate a file list; Data Acquisition Analysis Identifier Generation Module: Used to determine the accuracy and extent of data acquisition based on the file data in the file list and generate data acquisition analysis identifiers; Sending Step Matching Module: Used to select sending steps for the data collection and analysis identifier, wherein the sending steps include data encapsulation steps and data error reporting steps; File data storage module: Used to store the file data processed during the data encapsulation and sending steps; Data identification identifier generation module: used to forward stored file data and perform engine cluster identification, and generate data identification identifiers; Target generation module: Generates targets for identification based on data identification identifiers and data collection and analysis identifiers; File data reporting module: Reports file metadata or filters data based on the results generated from the identification target.