Method and system for automatic streaming media forensics based on a tamper-proof anchoring mechanism
The method extracts cross-temporal pathological features from streaming media traffic and generates physical hash fingerprints. Combined with mandatory constraints on a large language model and closed-loop verification of its output, this addresses misattribution of faults in streaming media communication and produces a tamper-proof evidence chain, enabling efficient and accurate fault diagnosis and evidence collection.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Applications (China)
- Current Assignee / Owner: CHENGDOU HUAQIYUN TECH CO LTD
- Filing Date: 2026-05-06
- Publication Date: 2026-06-19
AI Technical Summary
Existing technologies cannot accurately distinguish network physical congestion from device encoder freezes in streaming media communication, and diagnoses produced by large language models lack objective supporting evidence, leading to misjudgments and AI hallucinations.
The method extracts cross-temporal pathological features spanning the transport and application layers and generates an irreversible physical hash fingerprint for each piece of evidence. Before the large language model reasons, these fingerprints are used to constrain which evidence it may cite; combined with closed-loop reverse verification of its output, this yields a tamper-proof evidence chain.
It enables accurate attribution of streaming media failures, generates a solid chain of electronic evidence, avoids misjudgments, and improves diagnostic efficiency and the objectivity of evidence.
Figure CN122247831A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of streaming media technology, and in particular to an automatic evidence collection method and system for streaming media based on an anti-tampering anchoring mechanism.
Background Technology
[0002] In modern IoT, security monitoring, and distributed microservice architectures, streaming media communication based on underlying network protocols such as RTSP, RTMP, and RTP is widely used and has become a key foundation for cross-device and cross-vendor collaboration. When network communication anomalies occur, such as video stream stuttering, image freezing, or device crashes, accurately determining the attribution of fault responsibility—for example, distinguishing between problems caused by excessive client-side streaming concurrency and anomalies caused by server-side encoder resource exhaustion—has become a pressing challenge in the operations and maintenance field.
[0003] Existing technical methods for resolving such streaming media communication disputes and automating fault attribution suffer mainly from the following defects. First, existing network troubleshooting tools cannot perform cross-protocol-layer pathological analysis specific to streaming media. Traditional network analysis tools such as Wireshark, and general AIOps systems, focus primarily on general transport-layer metrics such as packet loss rate and retransmission rate. In complex streaming media scenarios, however, the true root cause of a failure often lies in the cross-layer timing relationships between the transport and application layers.
[0004] Existing tools cannot automatically establish a joint feature matrix between the "TCP receive window sliding curve" and "RTP sequence number jump discontinuity" or "RTCP control message timestamp jitter". It is therefore difficult for them to distinguish accurately between "network physical congestion" and "device encoder freeze", two very different types of fault, which easily leads to misjudgment when assigning supplier responsibility.
[0005] Second, analysis methods based on large language models suffer from hallucination and lack valid supporting evidence. In recent years, the industry has begun to explore applying large language models or intelligent agents to the analysis of network packet capture data; for example, a PCAP file can be converted into text by a parsing script and then fed to a large language model for question answering. However, large language models are essentially probabilistic generative models and are prone to "AI hallucination" when processing massive amounts of low-level signaling, that is, fabricating network interruption flags or TCP reset events that do not exist.
[0006] In B2B business disputes or legal arbitration, a diagnostic conclusion based solely on semantic probability has no evidentiary value, because nothing anchors it to objective raw data or verifies it against that data. Existing technologies generally lack a low-level mechanism that locks the semantic reasoning conclusions of large models to the original binary network physical frames and verifies them in reverse.
[0007] In summary, there is an urgent need in this field for an automatic forensics method and system for streaming media based on an anti-tampering anchoring mechanism. Existing network fault analysis methods, whether traditional tools or conventional large-model applications, cannot balance depth of extraction of the underlying pathological features of streaming media, diagnostic efficiency, and objectivity of evidence.
Summary of the Invention
[0008] The purpose of this invention is to overcome the shortcomings of the prior art and provide an automated evidence collection scheme that deeply extracts the state characteristics unique to streaming media and eliminates AI hallucinations at the root through an underlying physical fingerprint anchoring mechanism, thereby generating a conclusive, tamper-proof electronic evidence chain.
[0009] To achieve the above objectives, this application proposes an automatic forensics method for streaming media based on an anti-tampering anchoring mechanism, comprising the following steps:
Data ingestion and feature extraction step: receive raw network streaming media data packets and extract from them cross-temporal pathological features spanning the transport layer and the application layer, wherein the cross-temporal pathological features include at least a joint feature matrix between transport-layer connection status flags and application-layer media stream sequence number jumps.
Physical hash fingerprint generation and mapping step: for each key signaling message used to locate anomalies in the cross-temporal pathological features, generate a unique and irreversible physical hash fingerprint with a one-way hash algorithm, based on one or more underlying physical attributes of the original network packet that carries the key signaling message; then establish locally a secure mapping table from the physical hash fingerprint to the absolute physical offset of the original network packet.
Constrained large model reasoning step: assemble the cross-temporal pathological features and the associated physical hash fingerprints into a prompt template carrying mandatory constraint instructions; input the prompt template into the large language model, forcing it to use only the physical hash fingerprints provided in the input as reasoning evidence when it outputs a fault attribution conclusion.
Closed-loop reverse verification and hallucination interception step: obtain the fault attribution conclusion output by the large language model, then parse it and extract every physical hash fingerprint it references; compare each extracted fingerprint against the physical hash fingerprints in the secure mapping table; if any referenced fingerprint does not exist in the secure mapping table, determine that the current fault attribution conclusion is an AI hallucination and intercept it.
Anti-tampering report generation step: after all physical hash fingerprints referenced in the fault attribution conclusion have passed verification, use the verified fingerprints to look up the secure mapping table and extract the underlying physical frames of the corresponding original network packets; package these frames, as anti-tampering evidence, together with the fault attribution conclusion to generate an automated digital forensics report.
[0010] As a further solution, the streaming media data packet is a video streaming communication data packet based on the RTSP, RTMP, or RTP protocol.
[0011] As a further solution, the cross-temporal pathological features extracted in the data ingestion and feature extraction step specifically include: the joint matrix between the TCP receive window sliding curve and the RTP sequence number jump discontinuity, and the joint matrix between the RTCP control message timestamp jitter and the TCP retransmission rate.
[0012] As a further solution, in the physical hash fingerprint generation and mapping step, the underlying physical attributes used to generate the physical hash fingerprint include the precise timestamp of the original network packet, the network communication five-tuple, and protocol-specific key flag bits.
[0013] As a further solution, the mandatory constraint instructions in the constrained large model reasoning step stipulate that, when the large language model outputs a fault attribution conclusion, it must cite the physical hash fingerprints provided in the input data as evidence and may cite only those fingerprints; fabricating evidence or referencing any external knowledge is prohibited.
[0014] As a further solution, the closed-loop reverse verification and hallucination interception step also includes: when the current fault attribution conclusion is determined to be an AI hallucination and is intercepted, a retry mechanism is triggered; an instruction that removes the current fabricated conclusion or adds a hallucination warning is reassembled with the original feature data and input into the large language model for another round of reasoning.
[0015] As a further solution, the digital forensics report generated in the anti-tampering report generation step specifically includes: a verified natural language conclusion on the cause of the failure, a list of all physical hash fingerprints referenced in the conclusion, and a copy of the underlying physical frame of the original network packet anchored to each of the physical hash fingerprints.
[0016] In another aspect, the present invention also provides an automatic streaming media forensics system based on an anti-tampering anchoring mechanism, for executing the automatic streaming media forensics method described above, the system comprising:
Data ingestion and feature extraction module: receives raw network streaming media data packets and extracts from them cross-temporal pathological features spanning the transport layer and the application layer, wherein the cross-temporal pathological features include at least a joint feature matrix between transport-layer connection status flags and application-layer media stream sequence number jumps.
Physical hash fingerprint generation and mapping module: for each key signaling message used to locate anomalies in the cross-temporal pathological features, generates a unique and irreversible physical hash fingerprint with a one-way hash algorithm, based on one or more underlying physical attributes of the original network packet that carries the key signaling message, and establishes locally a secure mapping table from the physical hash fingerprint to the absolute physical offset of the original network packet.
Constrained large model reasoning module: assembles the cross-temporal pathological features and the associated physical hash fingerprints into a prompt template carrying mandatory constraint instructions; inputs the prompt template into the large language model, forcing it to use only the physical hash fingerprints provided in the input as reasoning evidence when it outputs a fault attribution conclusion.
Closed-loop reverse verification and hallucination interception module: obtains the fault attribution conclusion output by the large language model, then parses it and extracts every physical hash fingerprint it references; compares each extracted fingerprint against the physical hash fingerprints in the secure mapping table; if any referenced fingerprint does not exist in the secure mapping table, determines that the current fault attribution conclusion is an AI hallucination and intercepts it.
Anti-tampering report generation module: after all physical hash fingerprints referenced in the fault attribution conclusion have passed verification, uses the verified fingerprints to look up the secure mapping table and extract the underlying physical frames of the corresponding original network packets; packages these frames, as anti-tampering evidence, together with the fault attribution conclusion to generate an automated digital forensics report.
[0017] Compared with related technologies, the automatic forensics method and system for streaming media based on an anti-tampering anchoring mechanism provided by this invention have the following advantages:
1. This invention creatively proposes a "physical hash fingerprint generation and closed-loop reverse verification" mechanism. Before anything is input into the large language model, a unique and irreversible physical hash fingerprint is generated for each key network signaling message and a secure mapping table is established. After the large language model produces its output, the system verifies whether each physical hash fingerprint referenced in the conclusion actually exists in the local mapping table and intercepts any fictitious fingerprint.
[0018] This mechanism locks the probabilistically generated diagnostic logic of the large language model to objective physical frame data and verifies it in reverse against that data. The final fault attribution conclusion is therefore no longer a vague semantic speculation but a tamper-proof evidence chain anchored to specific timestamps and binary message sequences, giving the automated diagnostic results objective validity in commercial arbitration and legal disputes.
[0019] 2. This invention establishes a method for extracting cross-temporal pathological features specific to streaming media, spanning the transport and application layers. It automatically builds joint feature matrices between features such as the "TCP receive window sliding curve" and "RTP sequence number jump discontinuity", and between "RTCP control message timestamp jitter" and "TCP retransmission rate". This enables the system to identify the true root causes of faults hidden in the timing of protocol interactions and to distinguish accurately between anomalies of very different natures, such as "network physical congestion" and "device encoder freeze". It solves the misjudgment caused by the single analytical dimension of existing tools and achieves expert-level fault diagnosis accuracy.
[0020] 3. This invention employs a strategy of "dimensionality-reducing extraction followed by controlled inference." The system passes only the key cross-temporal features and physical hash fingerprints to the large language model, rather than having it analyze the entire packet capture. This brings two benefits. First, it compresses a massive PCAP investigation, which traditionally requires hours or even days of manual analysis, into a few minutes of automated processing. Second, by reducing data dimensionality and redundancy, it shortens the context the large language model must process, which avoids overflow failures caused by exceeding the token limit when analyzing GB-scale files and significantly reduces invocation cost, so the system can run stably with very little computing power.
[0021] 4. The automated process of this invention completely encapsulates complex protocol analysis, pathological feature extraction, evidence anchoring, and report generation. Frontline maintenance or evidence collection personnel only need to input the task intent described in natural language through the interactive interface, and the system can automatically schedule all subsequent expert-level processing steps. This enables personnel who are not proficient in the underlying streaming media protocols to quickly obtain a conclusive attribution report, significantly improving the tool's universality and usability.
Attached Figure Description
[0022] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application.
[0023] To more clearly illustrate the technical solutions in the embodiments of this application or related technologies, the accompanying drawings used in the description of the embodiments or related technologies will be briefly introduced below. Obviously, those skilled in the art can obtain other drawings based on these drawings without creative effort.
[0024] Figure 1 is a schematic diagram of the steps of the automatic forensics method for streaming media based on an anti-tampering anchoring mechanism provided by the present invention; Figure 2 is a schematic diagram of the structure of the automatic streaming media forensics system based on an anti-tampering anchoring mechanism provided by the present invention. The purpose, features, and advantages of this application will be further explained in conjunction with the embodiments and with reference to the accompanying drawings.
Detailed Implementation
[0025] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. The components of the embodiments of the present invention described and shown in the accompanying drawings can generally be arranged and designed in various different configurations.
[0026] Example 1
Referring to Figure 1, this embodiment provides an automatic forensics method for streaming media based on an anti-tampering anchoring mechanism, including the following steps:
Data ingestion and feature extraction step: receive raw network streaming media data packets and extract from them cross-temporal pathological features spanning the transport layer and the application layer, wherein the cross-temporal pathological features include at least a joint feature matrix between transport-layer connection status flags and application-layer media stream sequence number jumps.
Physical hash fingerprint generation and mapping step: for each key signaling message used to locate anomalies in the cross-temporal pathological features, generate a unique and irreversible physical hash fingerprint with a one-way hash algorithm, based on one or more underlying physical attributes of the original network packet that carries the key signaling message; then establish locally a secure mapping table from the physical hash fingerprint to the absolute physical offset of the original network packet.
Constrained large model reasoning step: assemble the cross-temporal pathological features and the associated physical hash fingerprints into a prompt template carrying mandatory constraint instructions; input the prompt template into the large language model, forcing it to use only the physical hash fingerprints provided in the input as reasoning evidence when it outputs a fault attribution conclusion.
Closed-loop reverse verification and hallucination interception step: obtain the fault attribution conclusion output by the large language model, then parse it and extract every physical hash fingerprint it references; compare each extracted fingerprint against the physical hash fingerprints in the secure mapping table; if any referenced fingerprint does not exist in the secure mapping table, determine that the current fault attribution conclusion is an AI hallucination and intercept it.
Anti-tampering report generation step: after all physical hash fingerprints referenced in the fault attribution conclusion have passed verification, use the verified fingerprints to look up the secure mapping table and extract the underlying physical frames of the corresponding original network packets; package these frames, as anti-tampering evidence, together with the fault attribution conclusion to generate an automated digital forensics report.
[0027] It should be noted that the core of the method in this embodiment lies in constructing a closed-loop automatic evidence collection process that "first anchors physical evidence, then constrains AI reasoning, and finally verifies authenticity in reverse".
[0028] First, the system receives network streaming media packet capture files. Rather than simply calculating the packet loss rate, it examines the cross-temporal pathological features that span the transport and application layers, such as the correlation between TCP window changes and RTP sequence number jumps. For each key signaling message within these features, the system generates a unique, irreversible physical hash fingerprint from the underlying physical attributes of the original packet using a one-way hash algorithm, and then establishes locally a secure mapping table from this fingerprint to the absolute physical offset of the packet. This step is analogous to stamping each piece of objective evidence with an unforgeable "electronic signature" before diagnosis and binding it to its location in the original data.
[0029] Subsequently, the system encapsulates the extracted pathological features and their associated physical hash fingerprints into a prompt template with mandatory constraints, which requires the large language model to cite only the given physical hash fingerprints as evidence when outputting any fault attribution conclusion. After the large model returns a conclusion, the system enters the core verification stage: it parses all fingerprints cited in the conclusion and compares them one by one against the local secure mapping table.
[0030] If any fingerprint is found not to exist in the mapping table, the system determines that the large model has produced an AI hallucination, intercepts the conclusion, and triggers a retry; only a conclusion in which all fingerprints match is considered a valid inference based on real physical evidence. Finally, using the verified fingerprints, the system looks up the mapping table to extract the underlying physical frames of the corresponding original network packets, attaches them as tamper-proof digital evidence, and packages them with the attribution conclusion to generate a legally valid automated evidence collection report.
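The reverse comparison described in this embodiment can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the citation format (a 64-character lowercase hex digest in square brackets) and the names `FINGERPRINT_RE`, `verify_conclusion`, and `mapping_table` are hypothetical. Rejecting a conclusion that cites nothing at all is also an assumption, made here because such a conclusion offers no anchored evidence.

```python
import re

# Assumption: the model cites each fingerprint as [<64 lowercase hex chars>].
FINGERPRINT_RE = re.compile(r"\[([0-9a-f]{64})\]")


def verify_conclusion(conclusion: str, mapping_table: dict[str, int]) -> tuple[bool, list[str]]:
    """Return (accepted, unknown_fingerprints) for a model conclusion.

    mapping_table maps fingerprint -> absolute offset of the original packet.
    """
    cited = FINGERPRINT_RE.findall(conclusion)
    # Any cited fingerprint missing from the local table marks a hallucination.
    unknown = [fp for fp in cited if fp not in mapping_table]
    # Assumption: a conclusion without any citation is rejected as well.
    accepted = bool(cited) and not unknown
    return accepted, unknown
```

The list of unknown fingerprints is returned alongside the verdict so that a caller can report exactly which citations were fabricated.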
[0031] Further, the streaming media data packet is a video stream communication data packet based on the RTSP, RTMP or RTP protocol.
[0032] Specifically, for streaming media communication (such as RTSP / RTMP / RTP), the system extracts the cross-temporal features of the transport layer and the application layer (for example, the "TCP receive window sliding curve", "RTP sequence number jump discontinuity", and "RTCP control message timestamp jitter"). At the same time, for each extracted key signaling message or abnormal feature, the system applies a one-way hash algorithm (such as SHA-256) to the timestamp, five-tuple, and specific flag bits to generate an irreversible underlying physical hash fingerprint (Hash_ID), and builds locally a secure mapping table of "<Hash_ID, absolute physical offset of the original message>".
[0033] This embodiment overcomes the limitation of traditional network tools that look only at TCP packet loss and extracts the core pathological features that distinguish "network congestion" from "encoder freeze"; by generating physical fingerprints in advance, it also provides the underlying data anchors needed to counter AI hallucinations later.
[0034] Further, the cross-temporal pathological features extracted in the data ingestion and feature extraction step specifically include: the joint matrix between the TCP receive window sliding curve and the RTP sequence number jump discontinuity, and the joint matrix between the RTCP control message timestamp jitter and the TCP retransmission rate.
[0035] Specifically, in fault diagnosis of streaming media communication, an indicator from a single layer is very likely to mislead. For example, an increase in the TCP retransmission rate, viewed alone, may be caused by network physical congestion, or it may be triggered because the receive window is full owing to insufficient client processing capability; an RTP sequence number jump, viewed alone, may result from network packet loss or from a data stream interruption caused by an encoder freeze.
[0036] This embodiment creatively proposes joint matrix analysis of two types of features: on the one hand it extracts the sliding curve of the TCP receive window, and on the other it synchronously tracks RTP sequence number jump discontinuities. From the correlated changes of the two on the time axis, the system can accurately locate the protocol layer and trigger source of the fault.
[0037] Furthermore, this embodiment incorporates the timestamp jitter of RTCP control messages and the TCP retransmission rate into the joint analysis matrix. RTCP is a streaming media control protocol, and its timestamp jitter directly reflects an abnormal data processing rhythm on the terminal device side, while the TCP retransmission rate characterizes the transmission quality of the link. When the system detects periodic and severe jitter in the RTCP timestamp while the TCP retransmission rate remains normal, it can determine that the fault originates from the device encoder side rather than the network transmission link, and vice versa. This cross-layer joint feature matrix upgrades the fault attribution of this invention from traditional guess-based troubleshooting to "deterministic reasoning," providing high-precision pathological feature input for subsequent controlled inference by the large language model and for physical hash fingerprint anchoring.
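A possible shape for the local mapping table is sketched below in Python. The application only specifies a mapping from Hash_ID to absolute offset; storing a frame length next to the offset, and the names `FrameLocation`, `build_mapping_table`, and `read_frame`, are assumptions added here so that the frame can be read back from the capture bytes.

```python
from typing import NamedTuple


class FrameLocation(NamedTuple):
    offset: int  # absolute byte offset of the frame in the capture file
    length: int  # frame length in bytes (assumed extra field, not in the source)


def build_mapping_table(entries):
    """Build {hash_id: FrameLocation} from (hash_id, offset, length) triples."""
    table = {}
    for hash_id, offset, length in entries:
        loc = FrameLocation(offset, length)
        # The same fingerprint must never point at two different frames.
        if hash_id in table and table[hash_id] != loc:
            raise ValueError(f"fingerprint collision for {hash_id}")
        table[hash_id] = loc
    return table


def read_frame(capture: bytes, loc: FrameLocation) -> bytes:
    """Return the raw frame bytes that a table entry points at."""
    end = loc.offset + loc.length
    if loc.offset < 0 or end > len(capture):
        raise ValueError("frame location outside capture")
    return capture[loc.offset:end]
```

Keeping the table local and keyed only by the fingerprint means that a cited fingerprint either resolves to exactly one byte range or is rejected.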
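One way to line the two features up on a common time axis is sketched below in Python. The bucketing scheme, the one-second default bucket, and the function names are assumptions for illustration; the only protocol fact relied on is that RTP sequence numbers are 16 bits wide and wrap around at 65536.

```python
from collections import defaultdict


def rtp_lost_between(prev_seq: int, cur_seq: int) -> int:
    """Packets missing between two consecutive RTP sequence numbers (16-bit wrap)."""
    delta = (cur_seq - prev_seq) % 65536
    # A delta of 0 or >= 32768 is treated as a duplicate or reordered packet, not loss.
    return delta - 1 if 0 < delta < 32768 else 0


def joint_matrix(tcp_windows, rtp_packets, bucket_s=1.0):
    """Join TCP receive-window samples and RTP sequence gaps per time bucket.

    tcp_windows: [(timestamp_s, window_bytes)]
    rtp_packets: [(timestamp_s, seq)] in arrival order
    Returns rows of (bucket_index, min_window_or_None, rtp_packets_lost).
    """
    min_win = {}
    lost = defaultdict(int)
    for ts, win in tcp_windows:
        b = int(ts // bucket_s)
        min_win[b] = min(win, min_win.get(b, win))
    for (_, prev), (ts, cur) in zip(rtp_packets, rtp_packets[1:]):
        lost[int(ts // bucket_s)] += rtp_lost_between(prev, cur)
    buckets = sorted(set(min_win) | set(lost))
    return [(b, min_win.get(b), lost.get(b, 0)) for b in buckets]
```

A row in which the window collapses toward zero in the same bucket as an RTP gap is the kind of correlated change the paragraph above refers to; how such rows are interpreted is left to the attribution logic.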
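The decision rule in this paragraph can be written as a small Python function. This is a deliberately simplified sketch: the threshold values are placeholders invented for illustration, the function name and labels are hypothetical, and the periodicity of the jitter mentioned in the text is not modelled.

```python
def attribute_fault(rtcp_jitter_ms: float, tcp_retrans_rate: float,
                    jitter_limit_ms: float = 50.0, retrans_limit: float = 0.02) -> str:
    """Classify a fault from two cross-layer indicators.

    The default limits are placeholder assumptions, not values from the application.
    """
    jitter_high = rtcp_jitter_ms > jitter_limit_ms
    retrans_high = tcp_retrans_rate > retrans_limit
    if jitter_high and not retrans_high:
        return "device_encoder"  # severe RTCP jitter, healthy link
    if retrans_high and not jitter_high:
        return "network_link"    # degraded link, steady device rhythm
    if jitter_high and retrans_high:
        return "inconclusive"    # both abnormal: needs further evidence
    return "normal"
```

The "inconclusive" and "normal" outcomes are not discussed in the text; they are included only so that the function is total over its inputs.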
[0038] Furthermore, in the physical hash fingerprint generation and mapping step, the underlying physical attributes used to generate the physical hash fingerprint include the precise timestamp of the original network packet, the network communication five-tuple, and protocol-specific key flag bits.
[0039] Specifically, the generation of physical hash fingerprints is the logical starting point of the anti-tampering mechanism of this invention, and the choice of input directly determines the uniqueness, unforgeability, and traceability of the fingerprint. This embodiment specifies that the underlying physical attributes on which fingerprint generation is based include: the precise timestamp of the original network packet, the network communication five-tuple (source IP, destination IP, source port, destination port, transport layer protocol), and protocol-specific key flag bits.
[0040] The combination of these three attributes is deliberate. The precise timestamp locks the fingerprint to a microsecond-level point in time, so that two messages with identical content still receive different fingerprints if they occur at different times, giving each communication event its own identifier. The network communication five-tuple binds the fingerprint to a specific communication session, preventing forgery or confusion across sessions.
[0041] Protocol-specific key flags, such as the SETUP / DESCRIBE method type in RTSP or the Marker bit in RTP, endow the fingerprint with semantic-level identification capabilities of the streaming media protocol, enabling the system to accurately locate specific control or data events within a session. The physical hash fingerprint, generated by combining these three elements using a one-way hash algorithm, becomes an immutable, cryptographically unique identifier for the message at a specific time, in a specific session, and under specific protocol operations, providing a precise and reliable comparison benchmark for subsequent closed-loop reverse verification.
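A minimal Python sketch of such a fingerprint, using SHA-256 from the standard library, is shown below. The byte layout (big-endian fields with length prefixes) and the function name `physical_fingerprint` are assumptions chosen for illustration; the application names the three attribute groups but does not specify how they are serialized.

```python
import hashlib
import ipaddress
import struct


def physical_fingerprint(ts_us: int, src_ip: str, dst_ip: str, src_port: int,
                         dst_port: int, proto: int, flags: bytes) -> str:
    """SHA-256 over a fixed encoding of timestamp, five-tuple, and flag bytes.

    ts_us is the capture timestamp in microseconds; proto is the IP protocol
    number (6 for TCP, 17 for UDP); flags carries the protocol-specific bits.
    """
    src = ipaddress.ip_address(src_ip).packed
    dst = ipaddress.ip_address(dst_ip).packed
    # Length-prefix variable-length fields so distinct inputs cannot serialize identically.
    payload = b"".join([
        struct.pack("!Q", ts_us),
        struct.pack("!B", len(src)), src,
        struct.pack("!B", len(dst)), dst,
        struct.pack("!HHB", src_port, dst_port, proto),
        struct.pack("!H", len(flags)), flags,
    ])
    return hashlib.sha256(payload).hexdigest()
```

With this encoding, two otherwise identical RTSP SETUP messages captured one microsecond apart produce different digests, which is the property the timestamp is meant to provide.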
[0042] Furthermore, the closed-loop reverse verification and hallucination interception step also includes: when the current fault attribution conclusion is determined to be an AI hallucination and is intercepted, a retry mechanism is triggered; an instruction that removes the current fabricated conclusion or adds a hallucination warning is reassembled with the original feature data and input again into the large language model for reasoning.
[0043] Specifically, in this invention, the role of the large language model is reshaped from an "open generator" to a "controlled inference engine." The key means to achieve this reshaping is the two hard constraints specified in this embodiment: First, when the large model outputs a fault attribution conclusion, it must cite the physical hash fingerprint provided in the input data as evidence; Second, the large model can only cite these given fingerprints and is strictly prohibited from citing any external knowledge or fabricating evidence identifiers on its own.
[0044] These two constraints have profound technical significance. "Must-cite" forces large models to bind their reasoning process to pre-locked physical evidence from the system, preventing the model from outputting only vague semantic conclusions without providing verifiable evidence. "Can only cite" cuts off the model's access to information that may exist in its massive pre-trained parameters and is irrelevant to the current specific network session, strictly limiting its reasoning space to the feature data provided by the system in this instance and anchored by hashes.
[0045] This dual constraint, implemented at the prompt level, standardizes the basis of comparison for the subsequent closed-loop verification step: the system only needs to compare the output fingerprints with the local mapping table, without having to handle inconsistent formats or non-standardized external references introduced by the model. This greatly improves the accuracy and efficiency of verification and serves as an up-front safeguard that is closely coupled with the subsequent hallucination interception step.
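The two hard constraints could be expressed in a prompt template along the following lines. This Python sketch is illustrative only: the wording of the template, the bracketed citation format, and the names `PROMPT_TEMPLATE` and `build_prompt` are assumptions, since the application does not disclose the actual prompt text.

```python
PROMPT_TEMPLATE = """You are diagnosing a streaming media fault.
Rules (mandatory):
1. Every claim in your conclusion MUST cite at least one fingerprint as [<fingerprint>].
2. You may ONLY cite fingerprints from the EVIDENCE list below.
   Do not invent fingerprints and do not rely on external knowledge.

FEATURES:
{features}

EVIDENCE:
{evidence}
"""


def build_prompt(features: list[str], evidence: dict[str, str]) -> str:
    """Assemble features and fingerprint-tagged evidence into the constrained prompt.

    evidence maps fingerprint -> short description of the anchored signaling event.
    """
    feature_lines = "\n".join(f"- {f}" for f in features)
    evidence_lines = "\n".join(f"[{fp}] {desc}" for fp, desc in evidence.items())
    return PROMPT_TEMPLATE.format(features=feature_lines, evidence=evidence_lines)
```

Listing each fingerprint in the same bracketed form that the model is asked to use keeps the later comparison a plain string lookup.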
[0046] Furthermore, in the step of generating the anti-tampering report, the packaged digital forensics report specifically includes: a verified natural language conclusion on the cause of the failure, a list of all physical hash fingerprints referenced in the conclusion, and a copy of the underlying physical frame of the original network packet anchored to each of the physical hash fingerprints.
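The three parts of the report listed above map naturally onto a small data structure. The Python sketch below is one possible packaging, assuming JSON output with frames hex-encoded; the class and function names are hypothetical and the serialization format is not specified in the application.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class EvidenceItem:
    fingerprint: str
    frame_hex: str  # copy of the anchored raw frame, hex-encoded


@dataclass
class ForensicsReport:
    conclusion: str    # verified natural language conclusion
    fingerprints: list  # every fingerprint cited in the conclusion
    evidence: list      # one EvidenceItem per fingerprint


def package_report(conclusion: str, frames: dict) -> str:
    """Serialize a verified conclusion and its anchored frames as JSON.

    frames maps each verified fingerprint to the raw frame bytes it anchors.
    """
    items = [EvidenceItem(fp, frame.hex()) for fp, frame in sorted(frames.items())]
    report = ForensicsReport(conclusion, [i.fingerprint for i in items], items)
    return json.dumps(asdict(report), indent=2, sort_keys=True)
```

Sorting the fingerprints and keys makes the output deterministic, so the same verified inputs always produce byte-identical report text.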
[0047] Specifically, when the referenced physical hash fingerprint fails the comparison verification of the local security mapping table, the fault attribution conclusion will be judged as an AI hallucination and intercepted. This embodiment adds crucial subsequent processing logic: the system does not simply discard the conclusion and terminate the task, but automatically triggers a retry mechanism, reassembling the instruction to remove the current virtual conclusion or add a hallucination warning with the original cross-temporal pathological features and physical hash fingerprint data, and then re-inputting it into the large language model for inference.
[0048] This retry mechanism has a clear technical purpose. When reassembling the input data, the system can steer the large model away from its earlier error in two ways: first, by removing from the context the previous conclusion that contained fabricated fingerprints, so that the model does not continue the erroneous pattern; second, by adding an explicit hallucination warning to the prompt that tells the model which specific physical hash fingerprints in its previous output were judged to be fabricated, guiding it to avoid the same behavior during re-inference.
[0049] This closed-loop retry design with corrective feedback gives the large language model a degree of self-correction within the constrained reasoning space. It significantly improves the system's fault tolerance and task completion rate when a single hallucination occurs, ensures the robustness and integrity of the automated evidence collection process, and prevents the entire diagnostic task from being interrupted or failing because of an occasional AI hallucination.
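As a non-limiting illustration, the interception and retry loop of paragraphs [0047] to [0049] might be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: `call_model` is a hypothetical callable standing in for the large language model API, the `Hash_ID: ...` citation pattern and the `max_retries` cap are assumptions (the description does not state a retry limit), and treating an answer with no citation as invalid is one reading of the "must-cite" constraint.

```python
import re
from typing import Callable, List, Optional, Set

# Assumed citation format "Hash_ID: <token>"; the description does not fix one.
HASH_REF = re.compile(r"Hash_ID:\s*([0-9A-Za-z]+)")

def infer_with_retry(
    base_prompt: str,
    known_ids: Set[str],
    call_model: Callable[[str], str],  # hypothetical stand-in for the LLM API
    max_retries: int = 3,              # assumed cap; not specified in the source
) -> Optional[str]:
    """Return the first conclusion whose cited fingerprints all exist locally."""
    prompt = base_prompt
    for _ in range(max_retries + 1):
        conclusion = call_model(prompt)
        cited: List[str] = HASH_REF.findall(conclusion)
        fabricated = [h for h in cited if h not in known_ids]
        if cited and not fabricated:
            return conclusion  # every citation verified: release the conclusion
        # Intercept: the rejected conclusion is dropped from the context and the
        # prompt is rebuilt from the original features plus an explicit warning.
        if fabricated:
            warning = ("Warning: the previous answer cited fingerprints that are "
                       "not in the list: " + ", ".join(fabricated))
        else:
            warning = "Warning: the previous answer cited no fingerprint."
        prompt = base_prompt + "\n" + warning + "\nCite only the Hash_IDs listed above."
    return None  # still failing after all retries: hand over to a human operator
```

Rebuilding the prompt from `base_prompt` on every attempt, rather than appending to the previous prompt, corresponds to the first corrective measure (removing the erroneous conclusion from the context); the warning line corresponds to the second.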
[0050] Example 2
Referring to Figure 2, the present invention also provides an automatic streaming media forensics system based on an anti-tampering anchoring mechanism, used to execute the automatic streaming media forensics method based on an anti-tampering anchoring mechanism as described above, the system comprising:
Data ingestion and feature extraction module: receives and extracts cross-temporal pathological features spanning the transport layer and application layer from raw network streaming media data packets; wherein the cross-temporal pathological features include at least a joint feature matrix between transport layer connection status flags and application layer media stream sequence number jumps;
Physical hash fingerprint generation and mapping module: for each key signaling used to locate anomalies in the cross-temporal pathological features, based on one or more underlying physical attributes of the original network packet where the key signaling is located, generates a unique and irreversible physical hash fingerprint using a one-way hash algorithm, and locally establishes a security mapping table from the physical hash fingerprint to the absolute physical offset of the original network packet;
Constrained large model reasoning module: assembles the cross-temporal pathological features and the associated physical hash fingerprints into a prompt word template with mandatory constraint instructions; inputs the prompt word template into the large language model, forcing the large language model to use only the physical hash fingerprints provided in the input as reasoning evidence when outputting fault attribution conclusions;
Closed-loop reverse verification and illusion interception module: obtains the fault attribution conclusion output by the large language model, and parses and extracts all physical hash fingerprints referenced in the fault attribution conclusion; compares and verifies each extracted physical hash fingerprint against the physical hash fingerprints in the security mapping table; if any referenced physical hash fingerprint does not exist in the security mapping table, determines that the current fault attribution conclusion is an artificial intelligence illusion and intercepts it;
Anti-tampering report generation module: after all the physical hash fingerprints referenced in the fault attribution conclusion have been verified, extracts the underlying physical frame of the corresponding original network packet via the security mapping table based on the verified physical hash fingerprints; the underlying physical frame is used as anti-tampering evidence and packaged with the fault attribution conclusion to generate an automated digital forensics report.
[0051] This embodiment provides an automated streaming media forensics system based on an anti-tampering anchoring mechanism. Deployed on a standard server, the system receives raw streaming media packet capture files (PCAP format) and user-defined natural language fault descriptions via a network interface, automatically completing a closed-loop process from pathological feature extraction to the generation of an anti-tampering forensics report.
[0052] This system consists of four core software modules: a streaming media-specific feature extraction and underlying fingerprint generation module, an intelligent agent scheduling and mandatory constraint prompting module, an underlying fingerprint closed-loop verification and illusion interception module, and an automated evidence collection and anti-tampering report generation module. These modules work collaboratively, and the specific implementation process is as follows.
[0053] Phase 1: Feature Extraction and Physical Fingerprint Generation
In a typical troubleshooting scenario, maintenance personnel suspect that frequent video freezing on a video surveillance platform is the responsibility of streaming media equipment supplier B. They input into the system a 2.8 GB PCAP file covering the ten minutes before and after the failure, along with the natural language command "Investigate the responsibility for the streaming media freezing issue and pinpoint whether it is a problem with client A or device B".
[0054] The streaming media-specific feature extraction and underlying fingerprint generation module first traverses the entire PCAP file, stripping away the audio and video binary payloads and retaining only the protocol header information and signaling data. Based on the five-tuple information, the module divides the data stream into multiple sessions and, for each session, extracts the following on a common timeline: the transport layer TCP receive window size curve, the sequence of TCP retransmitted segments, the application layer RTP sequence number series, and the timestamp field of the RTCP sender reports. The module calculates the sequence number difference between adjacent RTP packets to construct a map of sequence number discontinuities (jumps); at the same time, it records the deviation between the RTCP timestamp and the local reference clock, generating a timestamp jitter sequence. By correlating the TCP receive window curve with the RTP sequence number jumps on the unified timeline, the module automatically identifies that, during the fault period, the RTP sequence number undergoes continuous large jumps while the TCP receive window remains large and the TCP retransmission rate is extremely low. This joint feature matrix indicates that the problem is not caused by network congestion but is most likely due to abnormal encoder output at the device end.
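By way of illustration only, the adjacent-difference calculation and its correlation with the TCP receive window could look like the sketch below. The `Sample` record, the `joint_feature_rows` function, and the `jump_threshold` value are hypothetical names and parameters introduced here for illustration; the only external fact relied on is that RTP sequence numbers are 16-bit and wrap around. The sketch assumes packets arrive in order, so a reordered packet would show up as a very large forward delta.

```python
from dataclasses import dataclass
from typing import Dict, List

RTP_SEQ_MODULUS = 1 << 16  # RTP sequence numbers are 16-bit and wrap around

@dataclass
class Sample:
    """One per-packet observation on the unified timeline (hypothetical layout)."""
    t_us: int         # capture timestamp in microseconds
    rtp_seq: int      # RTP sequence number of this packet
    tcp_window: int   # TCP receive window in bytes at this moment
    retransmits: int  # TCP retransmissions seen since the previous sample

def seq_delta(prev_seq: int, cur_seq: int) -> int:
    """Forward distance between two RTP sequence numbers, modulo 2**16."""
    return (cur_seq - prev_seq) % RTP_SEQ_MODULUS

def joint_feature_rows(samples: List[Sample], jump_threshold: int = 2) -> List[Dict[str, int]]:
    """Emit one row per sequence-number jump, paired with the TCP state at that time."""
    rows = []
    for prev, cur in zip(samples, samples[1:]):
        delta = seq_delta(prev.rtp_seq, cur.rtp_seq)
        # A delta of 1 is the normal case; 2 or more means packets are missing.
        if delta >= jump_threshold:
            rows.append({
                "t_us": cur.t_us,
                "seq_from": prev.rtp_seq,
                "seq_to": cur.rtp_seq,
                "seq_delta": delta,
                "tcp_window": cur.tcp_window,
                "retransmits": cur.retransmits,
            })
    return rows
```

Each emitted row places a jump next to the window size and retransmission count observed at the same instant, which is the kind of pairing the joint feature matrix relies on.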
[0055] For each key signaling event identified in the analysis, such as "Client A sends an RTSP PLAY request," "Server B returns the starting point of the RTP sequence number jump," and "TCP window first shrinks," the module extracts the precise microsecond-level timestamp, source IP address, destination IP address, source port, destination port, and transport protocol type of the original network packet containing the signaling as input parameters. Combining these with protocol-specific key flags, the module uses the SHA-256 one-way hash algorithm to generate a unique, irreversible 64-character hexadecimal string (a 256-bit digest) as the physical hash fingerprint of each key signaling event, denoted Hash_ID. The module constructs and maintains in local memory a key-value security mapping table of the form "Hash_ID → absolute physical offset of the original packet," thus pre-anchoring the objective physical evidence.
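A minimal sketch of the fingerprint generation and mapping step is given below. The description names the hashed attributes and SHA-256 but not how the attributes are serialized, so the `|`-joined field order used here is an assumption, as are the function names `make_hash_id` and `anchor_event` and the sample values in the usage note.

```python
import hashlib
from typing import Dict

def make_hash_id(ts_us: int, src_ip: str, dst_ip: str, src_port: int,
                 dst_port: int, proto: str, flags: str) -> str:
    """Derive a Hash_ID from the packet attributes listed in the description."""
    # Assumed canonical encoding: fixed field order, "|" as separator, UTF-8.
    material = "|".join(
        [str(ts_us), src_ip, dst_ip, str(src_port), str(dst_port), proto, flags]
    )
    return hashlib.sha256(material.encode("utf-8")).hexdigest()  # 64 hex characters

def anchor_event(table: Dict[str, int], hash_id: str, offset: int) -> None:
    """Record Hash_ID -> absolute byte offset of the packet in the capture file."""
    table[hash_id] = offset
```

For example, `anchor_event(table, make_hash_id(1700000000123456, "10.0.0.1", "10.0.0.2", 50000, 554, "TCP", "RTSP_PLAY"), 4096)` would anchor one hypothetical RTSP PLAY event. Any party who knows the serialization rule can recompute the digest from the same attributes, but cannot derive the attributes from the digest.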
[0056] Phase 2: Constrained Large Model Scheduling and Inference
The intelligent agent scheduling and mandatory constraint prompting module receives the structured feature text and the Hash_ID list output by the first phase. This module has a preset mandatory constraint prompt template, the core immutable part of which is: "You are a network failure attribution expert. Below are all the key signaling events extracted in this analysis and their unique Hash_IDs. Please perform causal reasoning based on these events and output the failure attribution conclusion. In the conclusion, every piece of evidence supporting your viewpoint must and can only cite the following provided Hash_IDs. Citing any Hash_ID not appearing in the list is strictly prohibited; any output that does so will be treated as invalid." The module fills the TCP / RTP / RTCP joint pathological feature descriptions extracted in the first phase, together with their associated Hash_IDs, into the template to construct a complete prompt. For example, the filled prompt includes: "[Event 1] Client A initiates an RTSP PLAY request at time T1, Hash_ID: a1b2c3d4...; [Event 2] Server B returns an RTP stream at time T2, first packet sequence number 0x05A3, Hash_ID: e5f6g7h8...; [Event 3] At time T3, the RTP sequence number jumps from 0x05A3 to 0x06D0, with an adjacent difference of 301, the TCP window size is 65535 bytes, no retransmission, Hash_ID: i9j0k1l2...". The system sends this prompt to the large language model via API, with the temperature parameter set to zero to make the output as deterministic as possible, and completes the controlled inference.
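The template-filling step can be sketched as follows, purely as an illustration. The constraint text is taken from the template quoted above; the function name `build_prompt`, the `(description, hash_id)` tuple layout, and the one-event-per-line formatting are assumptions. The model call itself is deliberately not shown, since the description does not name a specific API.

```python
from typing import List, Tuple

# Immutable part of the template, as quoted in the description.
CONSTRAINT = (
    "You are a network failure attribution expert. Below are all the key "
    "signaling events extracted in this analysis and their unique Hash_IDs. "
    "Please perform causal reasoning based on these events and output the "
    "failure attribution conclusion. In the conclusion, every piece of evidence "
    "supporting your viewpoint must and can only cite the following provided "
    "Hash_IDs. Citing any Hash_ID not appearing in the list is strictly prohibited."
)

def build_prompt(events: List[Tuple[str, str]]) -> str:
    """Fill the template with (description, hash_id) pairs, one event per line."""
    lines = [CONSTRAINT, ""]
    for index, (description, hash_id) in enumerate(events, start=1):
        lines.append(f"[Event {index}] {description}, Hash_ID: {hash_id}")
    return "\n".join(lines)
```

Keeping the constraint text as a constant and generating only the event lines means the set of citable Hash_IDs in the prompt is exactly the set held in the local mapping table, which is what the later verification step compares against.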
[0057] Phase 3: Closed-Loop Reverse Verification and Illusion Interception
The underlying fingerprint closed-loop verification and illusion interception module intercepts the fault attribution conclusion text returned by the large language model. For example, the large model outputs the conclusion: "Analysis shows that the root cause of the fault is that the encoder resources of server device B are exhausted, causing the RTP sequence number to abnormally jump at time T3 (Hash_ID: i9j0k1l2...). At this time, the TCP window is normal, ruling out network congestion as the cause." The module uses regular expressions to extract all Hash_ID strings referenced in the conclusion. In this embodiment, Hash_ID: i9j0k1l2 is extracted.
[0058] The module compares the extracted Hash_ID exactly against all Hash_IDs in the security mapping table established in the first phase. The lookup shows that Hash_ID: i9j0k1l2 exists in the mapping table. Since all Hash_IDs referenced in the large model's conclusion have passed verification, the module determines that the conclusion is based on real physical evidence and releases it. If, in a subsequent run, the large model returns a conclusion that references a fabricated fingerprint such as Hash_ID: xyz000abc, the module's reverse comparison immediately detects that this fingerprint does not exist in the security mapping table, determines that the output is an AI hallucination, intercepts the conclusion, and triggers the retry mechanism: the original feature data is reassembled with an additional hallucination warning, and the large model's inference is rescheduled until it outputs a conclusion in which all fingerprints are verified.
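The extraction and comparison described in this phase might be sketched as below. The regular expression is an assumption about how citations appear in the model output (the description only says regular expressions are used), and `verify_citations` is a hypothetical name. The pattern deliberately accepts any alphanumeric token rather than only well-formed hex, so that a malformed fabricated value such as `xyz000abc` is captured and rejected instead of being silently skipped.

```python
import re
from typing import Dict, List, Tuple

# Assumed citation format: "Hash_ID:" followed by an alphanumeric token.
HASH_REF = re.compile(r"Hash_ID:\s*([0-9A-Za-z]+)")

def verify_citations(
    conclusion: str, mapping_table: Dict[str, int]
) -> Tuple[bool, List[str], List[str]]:
    """Return (passed, verified_ids, unknown_ids) for one model conclusion."""
    # Normalize case and drop duplicates while keeping first-seen order.
    cited = list(dict.fromkeys(h.lower() for h in HASH_REF.findall(conclusion)))
    verified = [h for h in cited if h in mapping_table]
    unknown = [h for h in cited if h not in mapping_table]
    # Pass only if at least one fingerprint is cited and none is unknown.
    passed = bool(cited) and not unknown
    return passed, verified, unknown
```

Because the check is a plain dictionary membership test against locally held keys, its outcome does not depend on the model in any way; the `unknown` list is what a retry would report back in its hallucination warning.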
[0059] Phase 4: Physical Frame Backtracking and Anti-Tampering Report Generation
After receiving the verified conclusion, the automated forensics and anti-tampering report generation module looks up in the security mapping table the absolute physical offset of the original packet corresponding to the Hash_ID: i9j0k1l2 referenced in the conclusion. The module opens the original PCAP file, seeks directly to that offset with the file pointer, and extracts the complete original binary network frame data, including the full hexadecimal contents of the Ethernet frame header, IP header, TCP header, and RTP payload. A one-way, irreversible verification relationship exists between this original frame data and the Hash_ID; any tampering with the original frame will result in a mismatch between the recalculated hash value and the Hash_ID, so the evidence inherently possesses anti-tampering characteristics.
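As a hedged illustration of the seek-and-extract step, the sketch below reads one packet record from a classic PCAP file. It assumes the little-endian, microsecond-resolution variant of the format and assumes that the stored "absolute physical offset" points at the 16-byte per-packet record header; the description does not say which byte the offset refers to, and `read_frame` is a hypothetical name.

```python
import struct
from typing import BinaryIO, Tuple

# Classic pcap per-packet record header: ts_sec, ts_usec, incl_len, orig_len.
# "<" assumes a little-endian capture file.
RECORD_HEADER = struct.Struct("<IIII")

def read_frame(pcap: BinaryIO, offset: int) -> Tuple[int, bytes]:
    """Seek to a record and return (timestamp in microseconds, raw frame bytes)."""
    pcap.seek(offset)
    header = pcap.read(RECORD_HEADER.size)
    if len(header) != RECORD_HEADER.size:
        raise ValueError("offset does not point at a complete record header")
    ts_sec, ts_usec, incl_len, _orig_len = RECORD_HEADER.unpack(header)
    frame = pcap.read(incl_len)  # incl_len = number of bytes actually captured
    if len(frame) != incl_len:
        raise ValueError("capture file is truncated inside this frame")
    return ts_sec * 1_000_000 + ts_usec, frame
```

The returned timestamp and the headers inside `frame` carry the attributes that were hashed in the first phase, so a verifier can parse them, recompute the digest, and compare it with the Hash_ID.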
[0060] The module ultimately generates a standardized automated digital forensics report, which includes: (1) a task overview and the original natural language instructions; (2) the full text of the verified fault attribution conclusion—"The root cause of the fault is attributed to server device B, whose encoder resources were exhausted, causing the RTP stream sequence number to jump abnormally. The responsible party is device supplier B"; (3) a list of all Hash_IDs supporting the conclusion and their corresponding hash values; and (4) a hexadecimal copy of the underlying physical frame of the original network packet anchored to each Hash_ID, which can be verified by an independent third party by recalculating the hash value. The report is output in PDF format and can be directly used as conclusive technical evidence in B2B business dispute arbitration or legal proceedings.
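The four report components listed above can be pictured as a simple structured record before rendering. The sketch below is an assumption-laden illustration: the description specifies PDF output, whereas this sketch only builds an intermediate JSON document, and the field names and the `build_report` function are hypothetical.

```python
import json
from typing import Dict

def build_report(instruction: str, conclusion: str, frames: Dict[str, bytes]) -> str:
    """Package a verified conclusion with its anchored frames as a JSON document.

    `frames` maps each verified Hash_ID to the raw frame bytes read from the capture.
    """
    evidence = [
        {"hash_id": hash_id, "frame_hex": frame.hex()}  # hexadecimal copy of the frame
        for hash_id, frame in frames.items()
    ]
    report = {
        "task": {"instruction": instruction},  # (1) task overview and instruction
        "conclusion": conclusion,              # (2) verified attribution conclusion
        "cited_hash_ids": list(frames),        # (3) fingerprints supporting it
        "evidence": evidence,                  # (4) anchored frame copies
    }
    return json.dumps(report, indent=2, sort_keys=True)
```

A reviewer can turn each `frame_hex` value back into bytes with `bytes.fromhex` and repeat the hash calculation independently of the system that produced the report.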
[0061] In summary, this embodiment achieves the following technical effects:
1. 100% objectivity, traceability, and tamper-proof consistency of AI reasoning conclusions (extremely high credibility): By means of the Hash_ID verification and physical frame reverse extraction mechanisms, the fault attribution conclusions output by the system are no longer vague semantic inferences but tamper-proof evidence chains precisely anchored to microsecond-level timestamps and specific binary message sequence numbers. This gives the automated diagnostic results evidentiary value in commercial arbitration and legal dispute resolution.
[0062] 2. Extremely high accuracy in attributing streaming media failures (performance breakthrough): Breaking through the limitations of single-dimensional protocol analysis, the system can accurately determine the responsible party based on the timing cross-verification of RTCP jitter, TCP retransmission and RTP interruption unique to streaming media (e.g., clearly indicating whether it is due to excessive client concurrency causing the TCP window to be 0, or whether it is due to server resource exhaustion causing the RTP stream to be interrupted).
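The distinction drawn in the parenthetical above can be illustrated with a toy decision rule. This is only a sketch of the reasoning pattern, not the system's attribution logic (which the description assigns to the constrained large language model); the function name, the input summary statistics, the returned labels, and the `retransmit_limit` threshold are all hypothetical.

```python
def attribute_fault(rtp_jump_count: int, zero_window_count: int,
                    retransmit_rate: float, retransmit_limit: float = 0.01) -> str:
    """Toy rule combining transport-layer and application-layer symptoms.

    retransmit_limit is an illustrative threshold, not a value from the source.
    """
    if zero_window_count > 0:
        # The receiver advertised a zero TCP window: the client could not keep up.
        return "client_overload"
    if retransmit_rate > retransmit_limit:
        # Heavy retransmission with an open window points at the network path.
        return "network_congestion"
    if rtp_jump_count > 0:
        # RTP gaps while TCP looks healthy: the sender did not produce the packets.
        return "device_encoder"
    return "undetermined"
```

The point of the rule is that no single symptom is decisive: RTP jumps are attributed to the device only after the TCP window and retransmission evidence has ruled out the client and the network.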
[0063] 3. Exceptional efficiency and low-cost operation (efficiency improvement and cost reduction): Manually investigating massive PCAP packet captures, which traditionally takes hours or even days, is compressed into minutes by an automated agent. Furthermore, because the system inputs only the extracted key features and Hash_IDs into the large model, it significantly reduces the model's token consumption and avoids out-of-memory (OOM) failures when analyzing gigabyte-scale packet capture files.
[0064] 4. Significantly lowers the operational threshold for advanced network troubleshooting: Frontline maintenance personnel do not need to be proficient in complex underlying video transmission stream protocols. They only need to input natural language commands, and the system can automatically complete expert-level pathological feature analysis and evidence solidification.
[0065] Compared with the prior art, the present invention has technical advantages in different comparative dimensions, as detailed in Table 1.
Table 1. Technology Comparison Table
The above are only some embodiments of this application and do not limit the patent scope of this application. All equivalent structural transformations made under the technical concept of this application and using the contents of the specification and drawings of this application, or direct / indirect applications in other related technical fields, are included in the patent protection scope of this application.
Claims
1. An automatic streaming media forensics method based on an anti-tampering anchoring mechanism, characterized in that it includes the following steps:
Data ingestion and feature extraction step: receive and extract cross-temporal pathological features spanning the transport layer and application layer from the raw network streaming media data packets; wherein the cross-temporal pathological features include at least a joint feature matrix between the transport layer connection status flag and the application layer media stream sequence number jump;
Physical hash fingerprint generation and mapping step: for each key signaling used to locate anomalies in the cross-temporal pathological features, generate a unique and irreversible physical hash fingerprint based on one or more underlying physical attributes of the original network packet where the key signaling is located, using a one-way hash algorithm, and locally establish a security mapping table from the physical hash fingerprint to the absolute physical offset of the original network packet;
Constrained large model reasoning step: assemble the cross-temporal pathological features and the associated physical hash fingerprints into a prompt word template with mandatory constraint instructions; input the prompt word template into the large language model, which is forced to use the physical hash fingerprint provided in the input as reasoning evidence when outputting the fault attribution conclusion;
Closed-loop reverse verification and illusion interception step: obtain the fault attribution conclusion output by the large language model, and parse and extract all physical hash fingerprints referenced in the fault attribution conclusion; compare and verify each extracted physical hash fingerprint against the physical hash fingerprints in the security mapping table; if any referenced physical hash fingerprint does not exist in the security mapping table, determine that the current fault attribution conclusion is an artificial intelligence illusion and intercept it;
Anti-tampering report generation step: after all the physical hash fingerprints referenced in the fault attribution conclusion have been verified, extract the underlying physical frame of the corresponding original network packet via the security mapping table based on the verified physical hash fingerprints; the underlying physical frame is used as anti-tampering evidence and packaged with the fault attribution conclusion to generate an automated digital forensics report.
2. The automatic streaming media forensics method based on an anti-tampering anchoring mechanism according to claim 1, characterized in that the streaming media data packets are video streaming communication data packets based on the RTSP, RTMP, or RTP protocol.
3. The automatic streaming media forensics method based on an anti-tampering anchoring mechanism according to claim 1, characterized in that the cross-temporal pathological features extracted in the data ingestion and feature extraction step specifically include: the joint matrix between the TCP receive window slip curve and the RTP sequence number jump fault, and the joint matrix between the RTCP control message timestamp jitter and the TCP retransmission rate.
4. The automatic streaming media forensics method based on an anti-tampering anchoring mechanism according to claim 1, characterized in that, in the physical hash fingerprint generation and mapping step, the underlying physical attributes used to generate the physical hash fingerprint include the precise timestamp of the original network packet, the network communication five-tuple, and protocol-specific key flag bits.
5. The automatic streaming media forensics method based on an anti-tampering anchoring mechanism according to claim 1, characterized in that the mandatory constraint instructions in the constrained large model reasoning step stipulate that, when the large language model outputs a fault attribution conclusion, it must and can only use the physical hash fingerprints provided in the input data as evidence, and is prohibited from fabricating evidence or using any external knowledge.
6. The automatic streaming media forensics method based on an anti-tampering anchoring mechanism according to claim 1, characterized in that the closed-loop reverse verification and illusion interception step also includes: when the current fault attribution conclusion is determined to be an artificial intelligence illusion and is intercepted, a retry mechanism is triggered, in which an instruction that removes the current fabricated conclusion or adds a hallucination warning is reassembled with the original feature data and input into the large language model for reasoning again.
7. The automatic streaming media forensics method based on an anti-tampering anchoring mechanism according to claim 1, characterized in that, in the anti-tampering report generation step, the packaged digital forensics report specifically includes: a verified natural language conclusion on the cause of the failure, a list of all physical hash fingerprints referenced in the conclusion, and a copy of the underlying physical frame of the original network packet anchored to each physical hash fingerprint.
8. An automatic streaming media forensics system based on an anti-tampering anchoring mechanism, characterized in that the system is used to execute the automatic streaming media forensics method based on an anti-tampering anchoring mechanism as described in any one of claims 1 to 7, the system comprising:
Data ingestion and feature extraction module: receives and extracts cross-temporal pathological features spanning the transport layer and application layer from raw network streaming media data packets; wherein the cross-temporal pathological features include at least a joint feature matrix between transport layer connection status flags and application layer media stream sequence number jumps;
Physical hash fingerprint generation and mapping module: for each key signaling used to locate anomalies in the cross-temporal pathological features, based on one or more underlying physical attributes of the original network packet where the key signaling is located, generates a unique and irreversible physical hash fingerprint using a one-way hash algorithm, and locally establishes a security mapping table from the physical hash fingerprint to the absolute physical offset of the original network packet;
Constrained large model reasoning module: assembles the cross-temporal pathological features and the associated physical hash fingerprints into a prompt word template with mandatory constraint instructions; inputs the prompt word template into the large language model, forcing the large language model to use only the physical hash fingerprints provided in the input as reasoning evidence when outputting fault attribution conclusions;
Closed-loop reverse verification and illusion interception module: obtains the fault attribution conclusion output by the large language model, and parses and extracts all physical hash fingerprints referenced in the fault attribution conclusion; compares and verifies each extracted physical hash fingerprint against the physical hash fingerprints in the security mapping table; if any referenced physical hash fingerprint does not exist in the security mapping table, determines that the current fault attribution conclusion is an artificial intelligence illusion and intercepts it;
Anti-tampering report generation module: after all the physical hash fingerprints referenced in the fault attribution conclusion have been verified, extracts the underlying physical frame of the corresponding original network packet via the security mapping table based on the verified physical hash fingerprints; the underlying physical frame is used as anti-tampering evidence and packaged with the fault attribution conclusion to generate an automated digital forensics report.