A one-way processing system for large files and heterogeneous data in isolated environments
By employing dynamic fragmentation, dual verification, and multi-module linkage mechanisms, the efficiency and security issues of processing large files and heterogeneous data in isolated environments have been resolved, achieving efficient, secure, and accurate data transmission and processing.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- XIAN AEROSPACE PROPULSION INST
- Filing Date
- 2025-09-12
- Publication Date
- 2026-06-23
AI Technical Summary
Existing technologies suffer from problems such as an imbalance between transmission efficiency and security, improper resource allocation, and poor data correlation when processing large files and heterogeneous data in isolated environments, making it difficult to meet the requirements for efficient, secure, and accurate processing.
It adopts a multi-module linkage mechanism consisting of a dynamic fragmentation unit, a dual verification unit, a preprocessing module, a heterogeneous data processing module, a unidirectional transmission control module, a storage module, and a multimodal integration module. By dynamically adjusting the fragment size, verification thresholds, resource allocation, and the data association method, it achieves efficient and secure data transmission and processing.
It improves transmission success rate, avoids feature distortion, optimizes resource allocation, enhances data association robustness, adapts to the special constraints of isolated environments, and achieves efficient, secure, and accurate one-way processing of large files and heterogeneous data.
Figure CN121116634B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of data processing technology, and specifically to a one-way processing system for large files and heterogeneous data in isolated environments.
Background Technology
[0002] To ensure data security, physical isolation or strong logical isolation environment architecture is usually adopted. In such environments, data must strictly follow the principle of unidirectional flow (such as transmission from low security domain to high security domain). In addition, the number of large files (such as high-definition surveillance videos and massive logs) and heterogeneous data (text, images, structured tables, etc.) that need to be processed is increasing, which puts forward higher requirements for data transmission efficiency, processing accuracy and security.
[0003] Existing technologies for data processing in isolated environments mainly fall into the following categories:
[0004] Fixed-fragmentation transmission technology: Files are split into segments of preset size and then transmitted through a one-way channel. Its core purpose is to simplify the complexity of large file transfers, but the segment size is statically set and does not consider factors such as bandwidth fluctuations and channel stability differences in isolated environments.
[0005] Independent security verification and data preprocessing technology: Security isolation devices perform signature verification independently, while the data preprocessing module independently performs noise reduction and filtering operations. Both focus on security protection and data standardization respectively, but no linkage mechanism is established.
[0006] Static priority scheduling for heterogeneous data processing: This technique allocates computing resources with fixed priorities for large files and multimodal data, pre-setting the processing order according to file type. Its design goal is to simplify scheduling logic, but it does not consider the dynamic changes in limited resources within an isolated environment.
[0007] Fixed-rule unidirectional transmission and storage management technology: Transmission rules and storage quotas are pre-configured, such as a fixed upper limit on the volume of a single transmission and fixed storage partitions allocated by type. Its core purpose is to ensure stable transmission and storage, but it does not consider the impact of dynamic changes in storage capacity on transmission in an isolated environment.
[0008] Single-modal or fixed-weight multimodal integration techniques: These techniques extract independent features from text, images, and other data and store them directly, or use fixed-weight models for association. While designed to integrate basic data, they fail to optimize the association logic for the characteristics of isolated environments, such as data loss and quality fluctuations.
[0009] Under the special constraints of isolated environments, the aforementioned existing technologies have certain limitations in adaptability: fixed fragmentation transmission struggles to balance the contradiction between "excessively large fragments leading to high verification failure rates" and "excessively small fragments leading to low transmission efficiency"; independent security verification and preprocessing may result in feature distortion due to parameter mismatch; static priority scheduling is prone to delaying core task processing when resources are scarce; transmission and storage management based on fixed rules may lead to data loss due to storage overflow; and multimodal integration with single or fixed weights suffers from decreased correlation accuracy when data is missing. These limitations make it difficult for existing technologies to fully meet the needs of efficient, secure, and accurate processing of large files and heterogeneous data in isolated environments.
Summary of the Invention
[0010] To address the shortcomings of existing technologies, this invention provides a one-way processing system for large files and heterogeneous data in isolated environments.
[0011] A one-way processing system for large files and heterogeneous data in isolated environments includes an input adaptation module and a security isolation module, which are connected in series.
[0012] The input adaptation module is configured with a dynamic fragmentation unit, which calculates the fragment size as a function of the file type coefficient, the original file size, the bandwidth of the isolated one-way channel, and the redundancy check coefficient.
[0013] The security isolation module uses optical gate physical isolation and includes a one-way protocol stack and a dual verification unit. The one-way protocol stack only receives fragmented data output by the input adaptation module. The dual verification unit performs feature code matching and behavior entropy detection on the fragmented data. Fragments that fail verification by the dual verification unit are regenerated by the input adaptation module and retransmitted to the security isolation module.
[0014] The system also includes a preprocessing module connected to the heterogeneous data processing module. The preprocessing module reassembles and standardizes the fragmented data output by the security isolation module; standardization includes improved median filtering for images, non-text character filtering for text, and improved outlier removal for structured data. The data processed by the preprocessing module is then transmitted to the heterogeneous data processing module.
[0015] The heterogeneous data processing module includes a large file processing unit, a multimodal feature extraction unit, and a priority scheduling unit shared by the two. The priority scheduling unit calculates a priority for each file and allocates computing resources to the large file processing unit and the multimodal feature extraction unit according to that priority.
[0016] The one-way transmission control module and the storage module are connected to form a write closed loop. The one-way transmission control module includes a transmission rule base and a blockchain log unit. The transmission rule base restricts data to be transmitted only from the heterogeneous data processing module to the storage module. The blockchain log unit records the data transmission trajectory. The storage module adopts a read-only partition architecture, only receives write requests from the one-way transmission control module, stores data in partitions according to data type, and enables a type-adaptive compression algorithm.
[0017] The multimodal integration module is integrated into the storage module. Based on the standardization results of the preprocessing module and the feature extraction results of the heterogeneous data processing module, the multimodal integration module uses an improved autoencoder to map text, image, and structured-data features into a unified space and establishes a cross-modal index.
[0018] The output module is connected to the storage module via a read-only interface. The output module performs format, syntax, and semantic checks on the integrated data before outputting it.
[0019] Preferably, the dynamic fragmentation unit includes a fragmentation calculation subunit and a retransmission probability prediction subunit that are linked to each other. The retransmission probability prediction subunit calculates the fragment loss probability from a retransmission correction factor, which is dynamically adjusted based on historical retransmission behavior and reflects how strongly the network environment affects retransmissions, and a fragment transmission time parameter.
[0020] When the fragment loss probability exceeds the preset threshold, the retransmission probability prediction subunit triggers the fragmentation calculation subunit to reduce the fragment size proportionally according to ..., and the input adaptation module sends a "high-risk fragment" flag to the security isolation module. When the dual verification unit of the security isolation module performs behavior entropy detection on fragments carrying this flag, it adjusts the detection threshold to 3.0.
[0021] When the calculated behavior entropy exceeds the adjusted detection threshold, the dual verification unit determines that the fragment carries an abnormal risk, which triggers subsequent security policies.
[0022] The dynamic fragmentation unit generates a verification code and a file fingerprint hash for each fragment. After the dual verification unit of the security isolation module has verified the verification code and the file fingerprint hash, it sends a "verification passed" signal back to the input adaptation module. This signal travels in one direction only and carries no data back.
[0023] Preferably, the security isolation module includes a behavior entropy detection unit and an offline feature library update subunit that work in conjunction. The behavior entropy detection unit calculates the file behavior entropy using a formula.
[0024] The inputs to the formula are the total number of events involved in the behavior entropy calculation and the occurrence probability of each feature term.
[0025] When the behavior entropy exceeds the detection threshold in three consecutive detections and no match is found in the feature library, the offline feature library update subunit extracts a ...-byte feature fragment, adds it to the local feature library, and sends the hash value of the feature fragment to the preprocessing module. The preprocessing module then automatically applies deep standardization to subsequent files containing the same hash value; deep standardization includes multi-round image denoising and fine-grained text refinement.
[0026] Preferably, the preprocessing module includes an improved median filtering unit and the heterogeneous data processing module includes an improved... feature extraction unit; the two units work in tandem. The improved median filtering unit calculates the window size from the image noise density.
[0027] The image filtered by the improved median filtering unit is passed to the improved... feature extraction unit, whose scale-space range is adjusted dynamically based on the image noise density. The dimension of the feature vector extracted by the improved... feature extraction unit is simplified using the following formula:
[0028]
[0029] where the terms are the simplified feature vector dimension and a baseline scale parameter.
[0030] Preferably, the transmission rule base of the one-way transmission control module is linked to the dynamic quota adjustment mechanism of the storage module. When the utilization rate of a partition of the storage module reaches 80% or more, the storage module sends a "quota insufficient" signal to the one-way transmission control module, and the transmission rule base reduces the transmission priority of the corresponding data type, calculated with the following formula, by 30%:
[0031]
[0032] where the terms are the data transmission priority, the urgency of the data, the currently used capacity of the storage module, the total capacity of the storage module, and the characteristic value corresponding to the data type.
[0033] Preferably, the improved autoencoder of the multimodal integration module is linked with the standardization unit of the preprocessing module, and the loss function of the improved autoencoder is calculated using the following formula:
[0034]
[0035] where the terms are the feature vectors of the preprocessed text and image, a weighting coefficient for the absolute difference between the text and image feature vectors, and a weighting coefficient for the complement of the similarity between text and image features.
[0036] Preferably, the system also includes a monitoring module that forms a feedback closed loop with the unidirectional transmission control module and the heterogeneous data processing module. The monitoring module calculates a load index using the following formula:
[0037]
[0038] When the load index exceeds the preset threshold, the monitoring module instructs the unidirectional transmission control module to reduce the transmission rate to the set value and sends a resource reallocation signal to the heterogeneous data processing module, which raises the proportion of computing resources allocated to the large file processing unit to the set value.
[0039] Preferably, the output module includes a semantic verification unit linked to the association index of the multimodal integration module. Based on a power-domain ontology library, the semantic verification unit calculates a confidence level using a formula.
[0040] The confidence level output by the semantic verification unit is given by a matching-degree function that measures how well the features to be verified match the domain rules. When the confidence level falls below the confidence threshold, the output module calls the association index of the multimodal integration module and extracts high-confidence data on the same topic whose association strength is greater than or equal to the association strength threshold for auxiliary verification. If the auxiliary verification passes, the data is judged qualified.
[0041] Preferably, the file fingerprint hash is generated by the dynamic fragmentation unit from the fragment data, the fragment sequence number, and the file's unique identifier. After the dual verification unit of the security isolation module has verified the hash, the blockchain log unit records the hash value, which serves as the index key for the partitioned storage of the storage module.
[0042] Preferably, the type-adaptive compression algorithm of the storage module is linked with the standardization result of the preprocessing module: data that the preprocessing module classifies as text-intensive is compressed with ... deep compression at a compression ratio of at least 3:1, and data classified as image-dense is compressed with ... quantization compression, with the quantization compression parameters set to ....
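The source names neither the deep-compression algorithm nor the quantization parameters, so the following is only a minimal Python sketch of the two branches. It assumes zlib as a stand-in for deep compression (checking the stated 3:1 floor) and plain uniform quantization of 8-bit pixel values; the function names and the 16-level default are hypothetical.

```python
import zlib

MIN_TEXT_RATIO = 3.0  # "compression ratio of at least 3:1" from [0042]


def compress_text(data: bytes) -> tuple[bytes, float, bool]:
    """Compress text-intensive data and report whether the 3:1 floor is met.

    zlib at level 9 is an assumed stand-in for the unnamed deep-compression
    algorithm.
    """
    packed = zlib.compress(data, 9)
    ratio = len(data) / len(packed)
    return packed, ratio, ratio >= MIN_TEXT_RATIO


def quantize_pixels(pixels: list[int], levels: int = 16) -> list[int]:
    """Uniformly quantize 8-bit values; the level count is hypothetical."""
    if not 2 <= levels <= 256:
        raise ValueError("levels must be between 2 and 256")
    step = 256 / levels
    out = []
    for p in pixels:
        bucket = min(int(p / step), levels - 1)
        # Reconstruct each value at the centre of its bucket, clamped to 0..255.
        out.append(min(255, int(bucket * step + step / 2)))
    return out
```

Repetitive log text easily clears 3:1 here, while already-compressed content would report `False`; how the storage module should handle such data is not specified in the source.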
[0043] This invention provides a one-way processing system for large files and heterogeneous data in isolated environments, which has the following advantages:
[0044] Through a dynamic adjustment mechanism involving multiple modules, the limitations of existing technologies in isolated environments are specifically addressed: dynamic fragmentation and dual verification work together to balance transmission efficiency and security, significantly improving the transmission success rate compared with fixed fragmentation; parameter coordination between preprocessing and heterogeneous data processing avoids feature distortion and improves data association accuracy; dynamic scheduling based on load and storage status optimizes resource allocation, solving the resource waste or overflow problems of static management; and adaptive weight adjustment in multimodal integration enhances the robustness of association when data is missing. The system thus adapts to the special constraints of bandwidth fluctuations, limited resources, and uneven data quality in isolated environments, achieving efficient, secure, and accurate unidirectional processing of large files and heterogeneous data.
Attached Figure Description
[0045] Figure 1 is a system flowchart of the present invention.
Detailed Implementation
[0046] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0047] As shown in Figure 1, this invention proposes a one-way processing system for large files and heterogeneous data in isolated environments, including an input adaptation module and a security isolation module connected in series.
[0048] The input adaptation module is configured with a dynamic fragmentation unit, which calculates the fragment size as a function of the following quantities:
[0049] the file type coefficient, the original file size, the bandwidth of the isolated one-way channel, and the redundancy check coefficient.
[0050] It should be noted that the values of the file type coefficient are typically taken from internal technical specification documents of a company or industry, which define a value for each file type. Text files have a regular data structure and low transmission and processing overhead, so the coefficient is set to 1.0. Image files (such as .jpg and .png) require block integrity to be preserved and have slightly higher processing overhead, so it is set to 1.5. Video files (such as .mp4 and .avi) have inter-frame dependencies, and their fragments must align with encoding blocks, which makes processing more complex, so it is set to 2.0.
[0051] The redundancy check coefficient is typically taken from the relevant algorithm design documents and is determined by combining redundancy algorithm theory with fault-simulation verification. To address fragment loss or corruption in one-way transmission, an erasure coding mechanism is adopted. If the requirement is to fully recover the original data when no more than 25% of the fragments are lost, the value is calculated from the theoretical erasure-coding formula and verified by multiple sets of experiments simulating 25% fragment loss (in all 100 sets, the original data was fully recovered from the redundant fragments). The value is finally set to 1.25, meaning that the total fragment data volume, including the original data and the redundant check data, is 1.25 times the original data volume.
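The coefficient values in [0050] and [0051] can be captured in a small Python lookup. This sketch does not reproduce the fragment-size formula, which is not preserved in this text; it only encodes the stated values. The function names, the text-file extensions, and the fallback for unknown types are hypothetical.

```python
import math
import os

# Coefficients from [0050]: text 1.0, image 1.5, video 2.0.
# .jpg/.png/.mp4/.avi are named in the source; the text extensions are hypothetical.
FILE_TYPE_COEFFICIENTS = {
    ".txt": 1.0, ".log": 1.0, ".csv": 1.0,
    ".jpg": 1.5, ".png": 1.5,
    ".mp4": 2.0, ".avi": 2.0,
}
REDUNDANCY_COEFFICIENT = 1.25  # total fragment volume / original volume, from [0051]


def file_type_coefficient(filename: str) -> float:
    """Look up the file type coefficient by extension (case-insensitive)."""
    ext = os.path.splitext(filename)[1].lower()
    # Unknown types fall back to the text value (an assumption).
    return FILE_TYPE_COEFFICIENTS.get(ext, 1.0)


def redundant_volume(original_bytes: int) -> int:
    """Total fragment volume (original plus redundant check data), rounded up."""
    return math.ceil(original_bytes * REDUNDANCY_COEFFICIENT)
```

For example, a 4096-byte file yields 5120 bytes of fragment data in total, of which 1024 bytes are redundant check data.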
[0052] The security isolation module uses optical gate physical isolation and includes a one-way protocol stack and a dual verification unit. The one-way protocol stack only receives fragmented data output by the input adaptation module. The dual verification unit performs feature code matching and behavior entropy detection on the fragmented data. Fragments that fail verification by the dual verification unit are regenerated by the input adaptation module and retransmitted to the security isolation module.
[0053] The system also includes a preprocessing module connected to the heterogeneous data processing module. The preprocessing module reassembles and standardizes the fragmented data output by the security isolation module; standardization includes improved median filtering for images, non-text character filtering for text, and improved outlier removal for structured data. The data processed by the preprocessing module is then transmitted to the heterogeneous data processing module.
[0054] The heterogeneous data processing module includes a large file processing unit, a multimodal feature extraction unit, and a priority scheduling unit shared by the two. The priority scheduling unit calculates a priority for each file and allocates computing resources to the large file processing unit and the multimodal feature extraction unit according to that priority.
[0055] It should be noted that the file priority is determined by an urgency level coefficient, the file size, the maximum file size that the system can process, and a weight for the data type.
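The text only says that computing resources are allocated "according to" the priority. A minimal Python sketch, assuming proportional sharing and taking already-computed priorities as input (the priority formula itself is not preserved here); the function name and unit names are hypothetical.

```python
def allocate_resources(total_units: float, priorities: dict[str, float]) -> dict[str, float]:
    """Split compute units across processing units in proportion to priority.

    Proportional sharing is an assumption; [0054] only states that resources
    are allocated according to the priority.
    """
    if total_units < 0:
        raise ValueError("total_units must be non-negative")
    if any(p < 0 for p in priorities.values()):
        raise ValueError("priorities must be non-negative")
    total = sum(priorities.values())
    if total == 0:
        # No priority information: fall back to an even split.
        share = total_units / len(priorities) if priorities else 0.0
        return {name: share for name in priorities}
    return {name: total_units * p / total for name, p in priorities.items()}
```

With priorities of 3 for the large file processing unit and 1 for the multimodal feature extraction unit, 100 units split 75/25.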
[0056] The one-way transmission control module and the storage module are connected to form a write closed loop. The one-way transmission control module includes a transmission rule base and a blockchain log unit. The transmission rule base restricts data to be transmitted only from the heterogeneous data processing module to the storage module. The blockchain log unit records the data transmission trajectory. The storage module adopts a read-only partition architecture, only receives write requests from the one-way transmission control module, stores data in partitions according to data type, and enables a type-adaptive compression algorithm.
[0057] The multimodal integration module is integrated into the storage module. Based on the standardization results of the preprocessing module and the feature extraction results of the heterogeneous data processing module, the multimodal integration module uses an improved autoencoder to map text, image, and structured-data features into a unified space and establishes a cross-modal index.
[0058] The output module is connected to the storage module via a read-only interface. The output module performs format, syntax, and semantic checks on the integrated data before outputting it.
[0059] It should be noted that by achieving integrated processing such as dynamic fragmentation, dual verification, standardization and feature extraction through multi-module collaboration, the limitations of existing technologies such as isolated modules, fixed fragmentation and single verification are overcome. The design aims to solve the problems of imbalance between transmission efficiency and security and poor data processing correlation in the one-way processing of large files and heterogeneous data in isolated environments, and adapt to the special constraints of isolated environments.
[0060] As an optional embodiment, the dynamic fragmentation unit includes a fragmentation calculation subunit and a retransmission probability prediction subunit that work together. The retransmission probability prediction subunit calculates the fragment loss probability from a retransmission correction factor, which is dynamically adjusted based on historical retransmission behavior and reflects how strongly the network environment affects retransmissions, and a fragment transmission time parameter.
[0061] When the fragment loss probability exceeds the preset threshold, the retransmission probability prediction subunit triggers the fragmentation calculation subunit to reduce the fragment size proportionally according to ..., and the input adaptation module sends a "high-risk fragment" flag to the security isolation module. When the dual verification unit of the security isolation module performs behavior entropy detection on fragments carrying this flag, it adjusts the detection threshold to 3.0.
[0062] When the calculated behavior entropy exceeds the adjusted detection threshold, the dual verification unit determines that the fragment carries an abnormal risk, which triggers subsequent security policies.
[0063] The dynamic fragmentation unit generates a verification code and a file fingerprint hash for each fragment. After the dual verification unit of the security isolation module has verified the verification code and the file fingerprint hash, it sends a "verification passed" signal back to the input adaptation module. This signal travels in one direction only and carries no data back.
[0064] It should be noted that linking fragment size adjustment with retransmission probability prediction and dynamic adjustment of verification threshold overcomes the shortcomings of existing technologies where fixed fragment size and fixed verification threshold cannot adapt to network fluctuations. The design aims to reduce the risk of fragment loss and enhance the strictness of verification, thereby improving the reliability of data transmission when the isolated network is unstable.
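The control flow of [0060] through [0062] can be sketched in Python. The 3.0 detection threshold for flagged fragments comes from the text; the loss-probability trigger, the shrink ratio, and the default threshold are hypothetical placeholders (the default is set above 3.0 only because the text describes the adjustment as making verification stricter). The loss probability is taken as an input because its formula is not preserved.

```python
from dataclasses import dataclass

# Hypothetical values: the source does not preserve the loss-probability
# trigger, the shrink ratio, or the default entropy threshold.
LOSS_PROBABILITY_TRIGGER = 0.1
SHRINK_RATIO = 0.5
DEFAULT_ENTROPY_THRESHOLD = 4.0
HIGH_RISK_ENTROPY_THRESHOLD = 3.0  # stated in [0061]


@dataclass
class FragmentPlan:
    size_bytes: int
    high_risk: bool


def plan_fragment(size_bytes: int, loss_probability: float) -> FragmentPlan:
    """Shrink the fragment and flag it when the predicted loss risk is high."""
    if loss_probability > LOSS_PROBABILITY_TRIGGER:
        return FragmentPlan(max(1, int(size_bytes * SHRINK_RATIO)), high_risk=True)
    return FragmentPlan(size_bytes, high_risk=False)


def entropy_threshold(plan: FragmentPlan) -> float:
    """Detection threshold the dual verification unit applies to this fragment."""
    return HIGH_RISK_ENTROPY_THRESHOLD if plan.high_risk else DEFAULT_ENTROPY_THRESHOLD


def is_abnormal(plan: FragmentPlan, entropy: float) -> bool:
    """[0062]: entropy above the applicable threshold marks an abnormal risk."""
    return entropy > entropy_threshold(plan)
```

A behavior entropy of 3.5 is thus treated as abnormal for a flagged fragment but not for an ordinary one under these placeholder values.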
[0065] As an optional embodiment, the security isolation module includes a behavior entropy detection unit and an offline feature library update subunit that work in conjunction. The behavior entropy detection unit calculates the file behavior entropy using a formula.
[0066] The inputs to the formula are the total number of events involved in the behavior entropy calculation and the occurrence probability of each feature term.
[0067] When the behavior entropy exceeds the detection threshold in three consecutive detections and no match is found in the feature library, the offline feature library update subunit extracts a ...-byte feature fragment, adds it to the local feature library, and sends the hash value of the feature fragment to the preprocessing module. The preprocessing module then automatically applies deep standardization to subsequent files containing the same hash value; deep standardization includes multi-round image denoising and fine-grained text refinement.
[0068] It should be noted that linking threat detection in the security isolation module with deep cleaning in the preprocessing module solves the problems of isolated feature library updates and delayed handling of unknown threats in existing technologies. The design aims to improve the efficiency of handling unknown threats through "one-time detection, full-domain protection" when the threat library cannot be updated in real time in the isolated environment.
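Assuming the behavior entropy takes the standard Shannon form over the occurrence probabilities of event types (the formula itself is not preserved here), the detection and offline library update of [0065] through [0067] can be sketched in Python. The base-2 logarithm, SHA-256 as the fragment hash, substring matching against the library, and the 64-byte default fragment length are all assumptions; the source elides the byte count.

```python
import hashlib
import math
from collections import Counter
from typing import Optional


def behavior_entropy(events: list[str]) -> float:
    """Shannon entropy in bits over event types (assumed form of the formula)."""
    n = len(events)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(events).values())


class OfflineFeatureLibrary:
    """Adds a byte fragment after three consecutive over-threshold detections
    with no library match; the fragment length is hypothetical."""

    def __init__(self, threshold: float, fragment_len: int = 64):
        self.threshold = threshold
        self.fragment_len = fragment_len
        self.features: set[bytes] = set()
        self.consecutive = 0

    def observe(self, events: list[str], payload: bytes) -> Optional[str]:
        """Return the hex digest to forward to preprocessing, or None."""
        if behavior_entropy(events) > self.threshold:
            self.consecutive += 1
        else:
            self.consecutive = 0
        if self.consecutive < 3:
            return None
        self.consecutive = 0
        if any(f in payload for f in self.features):
            return None  # already covered by the local feature library
        fragment = payload[: self.fragment_len]
        if not fragment:
            return None  # nothing to extract from an empty payload
        self.features.add(fragment)
        return hashlib.sha256(fragment).hexdigest()
```

Four equally likely event types give an entropy of exactly 2 bits, which makes the threshold logic easy to check by hand.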
[0069] As an optional embodiment, the preprocessing module includes an improved median filtering unit and the heterogeneous data processing module includes an improved... feature extraction unit; the two units work in tandem. The improved median filtering unit calculates the window size using the following formula:
[0070]
[0071] where the input is the image noise density. The image filtered by the improved median filtering unit is passed to the improved... feature extraction unit, whose scale-space range is adjusted dynamically based on the image noise density. The dimension of the feature vector extracted by the improved... feature extraction unit is simplified using a further formula,
[0072] whose terms are the simplified feature vector dimension and a baseline scale parameter.
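A plain Python sketch of noise-adaptive median filtering, assuming salt-and-pepper noise. The noise-density estimate (fraction of saturated pixels), the density-to-window mapping, and all function names are hypothetical stand-ins for the unpreserved window formula; the downstream feature-extraction stage is not modelled.

```python
from statistics import median


def estimate_noise_density(img: list[list[int]]) -> float:
    """Crude salt-and-pepper estimate: fraction of pixels at 0 or 255."""
    pixels = [p for row in img for p in row]
    return sum(p in (0, 255) for p in pixels) / len(pixels)


def window_size(noise_density: float) -> int:
    """Hypothetical density-to-window mapping; the source formula is not preserved."""
    if noise_density < 0.1:
        return 3
    if noise_density < 0.3:
        return 5
    return 7


def median_filter(img: list[list[int]], k: int) -> list[list[int]]:
    """k x k median filter with edge replication (k must be odd)."""
    if k % 2 == 0:
        raise ValueError("window size must be odd")
    h, w, r = len(img), len(img[0]), k // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clamp coordinates so border pixels reuse the nearest edge value.
            window = [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                      for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
            out[y][x] = int(median(window))
    return out
```

Because a larger window averages out denser impulse noise at the cost of fine detail, tying the window to the estimated density is the usual motivation for making it adaptive.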
[0073] As an optional embodiment, the transmission rule base of the one-way transmission control module is linked with the dynamic quota adjustment mechanism of the storage module. When the utilization rate of a partition of the storage module reaches 80% or more, the storage module sends a "quota insufficient" signal to the one-way transmission control module, and the transmission rule base reduces the transmission priority of the corresponding data type, calculated with a formula, by 30%.
[0074] The terms of that formula are the data transmission priority, the urgency of the data, the currently used capacity of the storage module, the total capacity of the storage module, and the characteristic value corresponding to the data type.
[0075] It should be noted that dynamically linking transmission control with storage quotas overcomes the problem that static quota management in existing technologies can easily lead to storage overflow or resource waste. The design aims to avoid data loss and achieve "priority storage of urgent data" in isolated environments with limited resources, thereby improving storage resource utilization.
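A Python sketch of the quota linkage in [0073]: the 80% trigger and the 30% reduction come from the text, while the base priority is taken as an input because its formula is not preserved. The function names and the queue representation are hypothetical.

```python
QUOTA_TRIGGER = 0.80  # partition utilization that raises "quota insufficient"
PRIORITY_CUT = 0.30   # priority reduction applied by the transmission rule base


def effective_priority(priority: float, used: float, capacity: float) -> float:
    """Reduce a data type's transmission priority by 30% when its partition
    is at least 80% full; `priority` is assumed to be precomputed."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    if used / capacity >= QUOTA_TRIGGER:
        return priority * (1 - PRIORITY_CUT)
    return priority


def transmission_order(pending: dict[str, tuple[float, float, float]]) -> list[str]:
    """Order data types by effective priority, highest first.

    Each value is (priority, used capacity, total capacity) of its partition.
    """
    return sorted(pending, key=lambda t: effective_priority(*pending[t]), reverse=True)
```

A log partition at 90% utilization with priority 10 drops to 7, so an image type with priority 8 in a nearly empty partition is sent first.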
[0076] As an optional embodiment, the improved autoencoder of the multimodal integration module is linked with the standardization unit of the preprocessing module, and the loss function of the improved autoencoder is calculated using the following formula:
[0077]
[0078] where the terms are the feature vectors of the preprocessed text and image, a weighting coefficient for the absolute difference between the text and image feature vectors, and a weighting coefficient for the complement of the similarity between text and image features. When the preprocessing module detects a data missing rate greater than 10%, the multimodal integration module adjusts the weight of the similarity complement term to 0.9.
[0079] It should be noted that dynamically adjusting the weights of the autoencoder loss function based on the data missing rate solves the problem of poor association performance in existing fixed-weight models when data is incomplete. The design aims to enhance the role of semantic similarity in cross-modal association and improve matching robustness in scenarios where data is easily missing in isolated environments.
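The two weighted terms described in [0078] can be sketched in Python as a cross-modal alignment loss. This is not the full autoencoder objective (any reconstruction term is omitted), and several pieces are assumptions: the absolute difference is taken as the L1 distance, similarity as cosine similarity, the default weights as 0.5 each, and the two weights as summing to 1 after adjustment. Only the "missing rate above 10%, similarity weight 0.9" rule comes from the text.

```python
import math


def cross_modal_loss(text_vec: list[float], image_vec: list[float],
                     missing_rate: float,
                     alpha: float = 0.5, beta: float = 0.5) -> float:
    """alpha * L1 distance + beta * (1 - cosine similarity).

    L1, cosine similarity, the default weights, and alpha = 1 - beta after
    adjustment are assumptions; the 10% / 0.9 rule is from [0078].
    """
    if len(text_vec) != len(image_vec):
        raise ValueError("feature vectors must have the same length")
    if missing_rate > 0.10:
        # Lean on semantic similarity when much of the data is missing.
        beta = 0.9
        alpha = 1.0 - beta
    l1 = sum(abs(t - v) for t, v in zip(text_vec, image_vec))
    norm_t = math.sqrt(sum(t * t for t in text_vec))
    norm_v = math.sqrt(sum(v * v for v in image_vec))
    if norm_t == 0 or norm_v == 0:
        raise ValueError("feature vectors must be non-zero")
    cosine = sum(t * v for t, v in zip(text_vec, image_vec)) / (norm_t * norm_v)
    return alpha * l1 + beta * (1.0 - cosine)
```

For orthogonal unit vectors the loss is 1.5 with the default weights but 1.1 once the missing rate exceeds 10%, reflecting the shift of emphasis from element-wise agreement to similarity.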
[0080] As an optional embodiment, the system also includes a monitoring module that forms a feedback closed loop with the unidirectional transmission control module and the heterogeneous data processing module. The monitoring module calculates a load index using the following formula:
[0081]
[0082] When the load index exceeds the preset threshold, the monitoring module instructs the unidirectional transmission control module to reduce the transmission rate to the set value and sends a resource reallocation signal to the heterogeneous data processing module, which raises the proportion of computing resources allocated to the large file processing unit to the set value.
[0083] The set values are configured by operators within an allowable range.
[0084] It should be noted that linking load monitoring with transmission rate and computing resource allocation breaks through the limitation of existing technologies where single monitoring alarms cannot dynamically adjust system load; the design aims to prevent system overload and crash in an isolated environment, prioritize core processing tasks, and improve system stability.
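Once the load index is known, the feedback loop of [0080] through [0083] reduces to a simple rule. In this Python sketch the load index is an input (its formula is not preserved), comparison against a single threshold is an assumption, and all class and field names are hypothetical; `min` and `max` keep the adjustment a reduction of the rate and an increase of the large-file share, as the text describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatorSettings:
    """Operator-configured "set values"; field names are hypothetical."""
    load_threshold: float
    reduced_rate_mbps: float
    large_file_share: float


@dataclass(frozen=True)
class SystemState:
    rate_mbps: float
    large_file_share: float


def on_load_sample(load_index: float, state: SystemState,
                   settings: OperatorSettings) -> SystemState:
    """One pass of the monitoring feedback loop."""
    if load_index > settings.load_threshold:
        return SystemState(
            rate_mbps=min(state.rate_mbps, settings.reduced_rate_mbps),
            large_file_share=max(state.large_file_share, settings.large_file_share),
        )
    return state
```

Under normal load the state is returned unchanged; the loop only intervenes when the threshold is crossed.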
[0085] As an optional embodiment, the output module includes a semantic verification unit linked to the association index of the multimodal integration module. Based on a power-domain ontology library, the semantic verification unit calculates the confidence level using the following formula:
[0086] where the output is the confidence level of the semantic verification unit, and the matching-degree function computes the degree of matching between the features to be verified and the domain rules;
[0087] When the confidence condition is met, the output module calls the association index of the multimodal integration module and extracts high-confidence data that share the same topic and have an association strength greater than or equal to the association-strength threshold. These data are used for auxiliary verification; if the auxiliary verification passes, the data are judged qualified.
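The two-stage check in [0085] to [0087] can be sketched as direct verification followed by a fallback to the association index. Several parts are assumptions: the matching function here is simple term coverage (the share of domain-rule terms present in the features), the elided confidence condition is taken to mean "confidence below a threshold", auxiliary verification is assumed to pass when at least one qualifying index record exists, and `IndexEntry`, both thresholds, and the example terms are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class IndexEntry:
    """One record from the cross-modal association index (hypothetical schema)."""
    topic: str
    association_strength: float
    confidence: float


def match_degree(features: Set[str], rule_terms: Set[str]) -> float:
    """Hypothetical matching function: share of domain-rule terms found in the features."""
    if not rule_terms:
        return 0.0
    return len(features & rule_terms) / len(rule_terms)


def semantic_check(
    features: Set[str],
    rule_terms: Set[str],
    topic: str,
    index: Iterable[IndexEntry],
    conf_threshold: float = 0.8,      # hypothetical confidence threshold
    strength_threshold: float = 0.6,  # hypothetical association-strength threshold
) -> bool:
    confidence = match_degree(features, rule_terms)
    if confidence >= conf_threshold:
        return True  # passes direct semantic verification
    # Auxiliary verification: look for high-confidence data on the same topic
    # whose association strength meets the threshold.
    return any(
        e.topic == topic
        and e.association_strength >= strength_threshold
        and e.confidence >= conf_threshold
        for e in index
    )
```

An item that covers only half of the power-domain rule terms would be blocked by the direct check alone, but it is still accepted when a strongly associated, high-confidence record on the same topic exists.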
[0088] It should be noted that using the multimodal association index for auxiliary verification solves the output blocking caused by overly strict single-item verification in existing technologies. The design aims to improve output efficiency while ensuring that data in the isolated environment remain compliant.
[0089] As an optional embodiment, the file fingerprint hash is generated by the dynamic fragmentation unit from the fragment data, the fragment sequence number, and the file's unique identifier. After the hash has been generated and verified by the dual verification unit of the security isolation module, the blockchain log unit records the hash value, which also serves as the index key for partitioned storage in the storage module.
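The hash chain described in [0089] can be sketched end to end: compute a fingerprint, verify it, log it, and use it as the storage key. The text does not name a hash algorithm, so SHA-256 is an assumption; the "blockchain log unit" is modeled as a simple append-only hash chain rather than a full blockchain; and the function names, record fields, and partition names are hypothetical.

```python
import hashlib
import json
from typing import Dict, List, Optional, Tuple


def fragment_fingerprint(fragment: bytes, seq: int, file_id: str) -> str:
    """SHA-256 over file ID, sequence number, and fragment data (SHA-256 is assumed)."""
    fid = file_id.encode("utf-8")
    h = hashlib.sha256()
    h.update(len(fid).to_bytes(4, "big"))  # length prefix keeps field boundaries unambiguous
    h.update(fid)
    h.update(seq.to_bytes(8, "big"))
    h.update(fragment)
    return h.hexdigest()


class HashChainLog:
    """Append-only log where each entry hashes the previous one (stand-in for the blockchain log)."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.entries: List[Tuple[dict, str]] = []

    @staticmethod
    def _entry_hash(prev: str, record: dict) -> str:
        payload = json.dumps({"prev": prev, "record": record}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, record: dict) -> str:
        prev = self.entries[-1][1] if self.entries else self.GENESIS
        entry_hash = self._entry_hash(prev, record)
        self.entries.append((record, entry_hash))
        return entry_hash

    def verify(self) -> bool:
        prev = self.GENESIS
        for record, entry_hash in self.entries:
            if self._entry_hash(prev, record) != entry_hash:
                return False  # a record or link has been altered
            prev = entry_hash
        return True


def ingest(fragment: bytes, seq: int, file_id: str, expected: str,
           log: HashChainLog, storage: Dict[str, Dict[str, bytes]],
           partition: str) -> Optional[str]:
    """Verify the fingerprint, log it, then store the fragment under that key."""
    fp = fragment_fingerprint(fragment, seq, file_id)
    if fp != expected:
        return None  # hash check failed: fragment is neither logged nor stored
    log.append({"fingerprint": fp, "file_id": file_id, "seq": seq})
    storage.setdefault(partition, {})[fp] = fragment
    return fp
```

Because the same hex digest is the log record's content and the storage key, editing any logged record breaks `verify()`, and a fragment whose bytes changed in transit never reaches storage.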
[0090] It should be noted that using the file fingerprint hash as the index key across the entire transmission, verification, and storage chain links these stages end to end, overcoming the limitation of existing technologies in which hashes are used for verification at only a single stage. The design aims to keep data consistent across transmission, verification, and storage in an isolated environment and to improve the detection of data tampering.
[0091] As an optional embodiment, the type-adaptive compression algorithm of the storage module is linked to the standardization results of the preprocessing module: data that the preprocessing module classifies as text-intensive are compressed using … deep compression with a compression ratio of at least 3:1, and data classified as image-dense are compressed using … quantization compression, with quantization compression parameters given by ….
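Type-adaptive selection as in [0091] can be sketched as a dispatch on the preprocessing module's type label. The actual algorithm names are elided in the text, so zlib at level 9 stands in for "deep compression", and uniform quantization with a hypothetical step of 16 (followed by a lossless zlib pass) stands in for "quantization compression". The type labels and the result-dictionary layout are also hypothetical.

```python
import zlib
from typing import Dict, List, Union

Payload = Union[bytes, List[int]]


def deep_compress_text(data: bytes) -> Dict[str, object]:
    """zlib level 9 as a stand-in for the elided 'deep compression' algorithm."""
    packed = zlib.compress(data, 9)
    ratio = len(data) / len(packed)
    return {"codec": "zlib-9", "data": packed, "ratio": ratio,
            "meets_3_to_1": ratio >= 3.0}  # target ratio from [0091]


def quantize(pixels: List[int], step: int = 16) -> List[int]:
    """Uniform quantization of 8-bit values; the step size is hypothetical."""
    return [(p // step) * step for p in pixels]


def compress_by_type(data_type: str, payload: Payload) -> Dict[str, object]:
    """Pick a codec from the preprocessing module's type label."""
    if data_type == "text-intensive":
        return deep_compress_text(bytes(payload))
    if data_type == "image-dense":
        q = quantize(list(payload))
        # Quantization reduces the number of distinct values, so a lossless
        # pass afterwards packs the result more tightly.
        return {"codec": "quant16+zlib", "data": zlib.compress(bytes(q), 9)}
    return {"codec": "none", "data": payload}
```

The text branch reports whether the 3:1 target is actually reached, since a lossless codec cannot guarantee a fixed ratio on arbitrary input; the image branch is lossy by design.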
[0092] It should be noted that the compression algorithm is dynamically selected based on the actual data type after preprocessing, which overcomes the defect of fixed compression algorithms in existing technologies that are not suitable for heterogeneous data. The design aims to optimize compression efficiency and speed and save computing resources in isolated environments with limited computing resources.
[0093] Although embodiments of the invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.
Claims
1. A one-way processing system for large files and heterogeneous data in isolated environments, characterized in that: it includes an input adaptation module and a security isolation module connected in series. The input adaptation module is configured with a dynamic fragmentation unit, which calculates the fragment size using the following formula, where the parameters are a file type coefficient, the original file size (in units of …), and a redundancy check coefficient (in units of …). The security isolation module employs optical-gate physical isolation and includes a one-way protocol stack and a dual verification unit. The one-way protocol stack receives only fragmented data output from the input adaptation module. The dual verification unit performs feature code matching and behavioral entropy detection on the fragmented data; fragments that fail the dual verification are regenerated by the input adaptation module and retransmitted to the security isolation module. The system also includes a preprocessing module connected to the heterogeneous data processing module. The preprocessing module reconstructs and standardizes the fragmented data output from the security isolation module; the standardization includes improved median filtering for images, non-text character filtering for text, and improved … outlier removal for structured data, and the data processed by the preprocessing module are then transmitted to the heterogeneous data processing module. The heterogeneous data processing module includes a large file processing unit, a multimodal feature extraction unit, and a priority scheduling unit. The priority scheduling unit is shared by the large file processing unit and the multimodal feature extraction unit; it calculates file priorities and allocates computing resources to the large file processing unit and the multimodal feature extraction unit according to those priorities. The one-way transmission control module and the storage module are connected to form a closed write loop. The one-way transmission control module includes a transmission rule base and a blockchain log unit; the transmission rule base restricts data transmission to the direction from the heterogeneous data processing module to the storage module, and the blockchain log unit records the data transmission trajectory. The storage module adopts a read-only partition architecture, accepts write requests only from the one-way transmission control module, stores data in partitions by data type, and applies a type-adaptive compression algorithm. The multimodal integration module is integrated into the storage module. Based on the standardization results of the preprocessing module and the feature extraction results of the heterogeneous data processing module, the multimodal integration module maps the features of text, images, and structured data into a unified space using an improved autoencoder and establishes a cross-modal index. The output module is connected to the storage module via a read-only interface and performs format, syntax, and semantic checks on the integrated data before outputting it. The transmission rule base of the unidirectional transmission control module is linked to the dynamic quota adjustment mechanism of the storage module: when the utilization of any partition of the storage module reaches 80% or more, the storage module sends a "quota insufficient" signal to the unidirectional transmission control module, and the transmission rule base reduces the transmission priority of the corresponding data type by 30% using the following formula, where the variables are the data transmission priority, the urgency of the data, the current used capacity of the storage module, the total capacity of the storage module, and the characteristic value corresponding to the data type.
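The one-way rule base and the quota mechanism of claim 1 can be sketched as follows. The priority formula itself is not reproduced in this text, so the sketch takes each data type's base priority as an input and applies only the stated rules: a partition at 80% or more utilization triggers the "quota insufficient" signal, and the corresponding priority drops by 30%. The route labels and data-type names are hypothetical.

```python
from typing import Dict, Tuple

ALLOWED_ROUTE = ("heterogeneous_data_processing", "storage")  # the only permitted direction
QUOTA_LIMIT = 0.80   # partition utilization that raises "quota insufficient"
PRIORITY_CUT = 0.30  # 30% priority reduction from claim 1


def route_allowed(src: str, dst: str) -> bool:
    """Transmission rule base: accept only the one-way route into storage."""
    return (src, dst) == ALLOWED_ROUTE


def adjusted_priorities(base: Dict[str, float],
                        partitions: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Lower by 30% the priority of each data type whose partition is at least 80% full.

    base: data type -> priority from the (unreproduced) claim-1 formula.
    partitions: data type -> (used capacity, total capacity).
    """
    out = {}
    for dtype, priority in base.items():
        used, total = partitions[dtype]
        quota_insufficient = used / total >= QUOTA_LIMIT
        out[dtype] = priority * (1 - PRIORITY_CUT) if quota_insufficient else priority
    return out
```

A video partition at 850 of 1000 units would have its priority cut from 10.0 to 7.0, while a text partition at 10% utilization keeps its priority unchanged.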
2. The system for one-way processing of large files and heterogeneous data in isolated environments according to claim 1, characterized in that: the dynamic fragmentation unit includes a fragment calculation subunit and a retransmission probability prediction subunit that work together. The retransmission probability prediction subunit calculates the fragment loss probability using the following formula, where the parameters are a retransmission correction factor (dynamically adjusted based on historical retransmission behavior, reflecting how strongly the network environment affects retransmissions), a fragment transmission time parameter, and the bandwidth of the isolated unidirectional channel. When the loss-probability condition is met, the retransmission probability prediction subunit triggers the fragment calculation subunit to reduce the fragment size proportionally according to …, and the input adaptation module sends a "high-risk fragment" flag to the security isolation module. When the dual verification unit of the security isolation module performs behavioral entropy detection on fragments carrying this flag, it adjusts the detection threshold to 3.0; when the calculated behavioral entropy exceeds the adjusted detection threshold, the dual verification unit determines that the fragment carries an abnormal risk, which triggers subsequent security policies. The dynamic fragmentation unit generates a checksum and a file fingerprint hash for each fragment. After the dual verification unit of the security isolation module has verified the checksum and the file fingerprint hash, a "verification passed" signal is sent back to the input adaptation module; this signal travels in one direction only and carries no returned data.
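The fragment-size reaction and threshold switch of claim 2 can be sketched as below. The loss-probability formula and its trigger condition are not reproduced in this text, so the sketch takes the predicted probability as an input and assumes the trigger is "probability above a threshold". The 3.0 entropy threshold comes from the claim; the trigger value, shrink ratio, minimum size, and the default (unflagged) entropy threshold of 4.0 are hypothetical, and a higher default is assumed so that flagged fragments are checked more strictly.

```python
from dataclasses import dataclass

HIGH_RISK_ENTROPY_THRESHOLD = 3.0  # value given in claim 2


@dataclass(frozen=True)
class FragmentPlan:
    size_bytes: int
    high_risk: bool  # the "high-risk fragment" flag sent to the security isolation module


def plan_fragment(size_bytes: int, loss_probability: float,
                  risk_threshold: float = 0.05,  # hypothetical trigger
                  shrink: float = 0.5,           # hypothetical reduction ratio
                  min_size: int = 4096) -> FragmentPlan:
    """Shrink the fragment and flag it when the predicted loss probability is too high."""
    if loss_probability > risk_threshold:
        return FragmentPlan(max(min_size, int(size_bytes * shrink)), True)
    return FragmentPlan(size_bytes, False)


def entropy_threshold(plan: FragmentPlan, default: float = 4.0) -> float:
    """Detection threshold used by the dual verification unit; the default is hypothetical."""
    return HIGH_RISK_ENTROPY_THRESHOLD if plan.high_risk else default


def is_abnormal(entropy: float, plan: FragmentPlan) -> bool:
    """Abnormal risk when the behavioral entropy exceeds the applicable threshold."""
    return entropy > entropy_threshold(plan)
```

For a 1 MiB fragment with a predicted loss probability of 0.10, the sketch halves the fragment to 512 KiB, sets the flag, and an entropy of 3.2 is then reported as abnormal; the same entropy on an unflagged fragment passes.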
3. The system for one-way processing of large files and heterogeneous data in isolated environments according to claim 2, characterized in that: the security isolation module includes a behavioral entropy detection unit and an offline feature library update subunit that work in conjunction. The behavioral entropy detection unit calculates the file behavioral entropy using the following formula, where the parameters are the total number of events involved in the behavioral entropy calculation and the probability of occurrence of the i-th feature term. When … is detected three times in succession and no match is found in the feature library, the offline feature library update subunit extracts a 512-byte feature fragment to supplement the local feature library and sends the hash value of that feature fragment to the preprocessing module. The preprocessing module then automatically applies deep standardization to subsequent files containing the same hash value; the deep standardization includes multi-round image denoising and fine-grained text refinement.
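Claim 3 can be sketched with a behavioral-entropy function and a small library updater. The entropy formula is not reproduced in this text; the sketch assumes the standard Shannon form, H = −Σ pᵢ log₂ pᵢ with pᵢ = countᵢ / N, which matches the listed parameters. Part of the update trigger is also elided, so the sketch counts only consecutive unmatched detections. Which 512 bytes are extracted (here, the first 512), substring matching against stored fragments, and SHA-256 as the fragment hash are all assumptions.

```python
import hashlib
import math
from collections import Counter
from typing import Hashable, List, Optional, Sequence


def behavior_entropy(events: Sequence[Hashable]) -> float:
    """Shannon entropy in bits: -sum(p_i * log2 p_i), with p_i = count_i / N (assumed form)."""
    n = len(events)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(events).values())


class OfflineFeatureLibrary:
    """Local signature store updated after three consecutive unmatched detections."""

    FRAGMENT_LEN = 512  # feature fragment size from claim 3
    MISS_LIMIT = 3

    def __init__(self) -> None:
        self.signatures: List[bytes] = []
        self.consecutive_misses = 0

    def observe(self, content: bytes) -> Optional[str]:
        """Return the new fragment's hash when the library is updated, else None."""
        if any(sig in content for sig in self.signatures):
            self.consecutive_misses = 0  # a match breaks the run of misses
            return None
        self.consecutive_misses += 1
        if self.consecutive_misses < self.MISS_LIMIT:
            return None
        fragment = content[: self.FRAGMENT_LEN]  # which 512 bytes to take is assumed
        self.signatures.append(fragment)
        self.consecutive_misses = 0
        # This hash would be forwarded to the preprocessing module for deep standardization.
        return hashlib.sha256(fragment).hexdigest()
```

Four equally likely event types give an entropy of 2.0 bits, while a single repeated event gives 0; after the library has learned a fragment, later content containing it is matched and resets the miss counter.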
4. The system for one-way processing of large files and heterogeneous data in isolated environments according to claim 3, characterized in that: the preprocessing module includes an improved median filtering unit, and the heterogeneous data processing module includes an improved … feature extraction unit; the two units work in tandem. The improved median filtering unit calculates the window size using the following formula, where the variable is the image noise density. The image filtered by the improved median filtering unit is passed to the improved … feature extraction unit, whose scale-space range is dynamically adjusted based on …. The dimension of the feature vector extracted by the improved … feature extraction unit is reduced using the following formula, where the parameters are the reduced feature-vector dimension and the baseline scale parameter.
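The noise-adaptive median filter of claim 4 can be sketched in plain Python. The window-size formula is not reproduced in this text, so the step mapping from noise density to an odd window size is hypothetical, and estimating noise density as the fraction of pixels at 0 or 255 (a common salt-and-pepper estimate) is an assumption. The feature-extraction stage is not sketched.

```python
from statistics import median
from typing import List

Image = List[List[int]]


def noise_density(img: Image) -> float:
    """Fraction of pixels at 0 or 255, a common salt-and-pepper estimate (assumed)."""
    total = sum(len(row) for row in img)
    noisy = sum(1 for row in img for p in row if p in (0, 255))
    return noisy / total


def window_size(density: float) -> int:
    """Hypothetical mapping from noise density to an odd window size."""
    if density < 0.1:
        return 3
    if density < 0.3:
        return 5
    return 7


def median_filter(img: Image, k: int) -> Image:
    """k x k median filter with edge replication at the borders."""
    h, w, r = len(img), len(img[0]), k // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                      for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
            row.append(median(window))
        out.append(row)
    return out


def adaptive_median(img: Image) -> Image:
    """Choose the window from the estimated noise density, then filter."""
    return median_filter(img, window_size(noise_density(img)))
```

A single salt pixel in a flat 5x5 patch gives a density of 0.04, selects a 3x3 window, and is removed, while the surrounding values are left unchanged.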
5. A one-way processing system for large files and heterogeneous data in isolated environments according to claim 1, characterized in that: the improved autoencoder of the multimodal integration module is linked with the standardization unit of the preprocessing module, and the loss function of the improved autoencoder is calculated using the following formula, where the variables are the preprocessed text feature vector, the preprocessed image feature vector, the weighting coefficients of the absolute-difference term between the text and image feature vectors, and the weighting coefficients of the complement of the similarity between text and image features.
6. The system for one-way processing of large files and heterogeneous data in isolated environments according to claim 1, characterized in that: it further includes a monitoring module that forms a feedback loop with the unidirectional transmission control module and the heterogeneous data processing module. The monitoring module calculates the load index using the following formula; when the load-index condition is met, the monitoring module instructs the unidirectional transmission control module to reduce its transmission rate to the set value and, at the same time, sends a resource reallocation signal to the heterogeneous data processing module, which increases the proportion of computing resources allocated to the large file processing unit to the set value.
7. The system for one-way processing of large files and heterogeneous data in isolated environments according to claim 1, characterized in that: the output module includes a semantic verification unit linked to the association index of the multimodal integration module. Based on a power-domain ontology library, the semantic verification unit calculates the confidence level using the following formula, where the output is the confidence level of the semantic verification unit and the matching-degree function computes the degree of matching between the features to be verified and the domain rules. When the confidence condition is met, the output module calls the association index of the multimodal integration module and extracts high-confidence data that share the same topic and have an association strength greater than or equal to the association-strength threshold for auxiliary verification; if the auxiliary verification passes, the data are judged qualified.
8. A one-way processing system for large files and heterogeneous data in isolated environments according to claim 2, characterized in that: the file fingerprint hash is generated by the dynamic fragmentation unit from the fragment data, the fragment sequence number, and the file's unique identifier. After the hash has been generated and verified by the dual verification unit of the security isolation module, the blockchain log unit records the hash value, which serves as the index key for partitioned storage in the storage module.
9. A one-way processing system for large files and heterogeneous data in isolated environments according to claim 1, characterized in that: the type-adaptive compression algorithm of the storage module is linked to the standardization results of the preprocessing module, such that data classified by the preprocessing module as text-intensive are compressed using … deep compression with a compression ratio of at least 3:1, and data classified as image-dense are compressed using … quantization compression, with quantization compression parameters given by ….