Cross-border data trust evaluation method based on multi-modal features

By generating semantic rhythm spectra and language distribution maps, adjusting the input rhythm, and balancing the semantic energy distribution, the problem of semantic imbalance in multilingual fusion modeling is solved, enabling accurate identification of high-risk texts and reliable assessment of cross-border data.

CN122197899A · Pending · Publication Date: 2026-06-12 · GUILIN UNIVERSITY OF TECHNOLOGY

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: GUILIN UNIVERSITY OF TECHNOLOGY
Filing Date: 2026-04-08
Publication Date: 2026-06-12

Application Information

IPC: G06F40/30; G06F40/263; G06F18/21; G06F18/22; G06F18/2415

AI Tagging

Application Domain

Semantic analysis

AI Technical Summary

Technical Problem

In existing technologies for multilingual fusion modeling, semantic mapping relationships are prone to dynamic imbalance, which can lead to the masking of high-risk statements and the erroneous amplification of low-risk content, resulting in errors in risk identification and compliance blockage.

Method used

The method generates a unified semantic rhythm spectrum, establishes a language distribution map, marks semantic energy peaks and gaps, adjusts the input rhythm using rhythm buffers, guides high-risk and low-risk semantic segments into different channels, and constructs a closed-loop control system that balances the semantic energy distribution.

Benefits of technology

It significantly improves the accuracy of high-risk text recognition in multilingual environments, reduces false alarm rates and processing delays, and builds the core capabilities of a cross-border data trust assessment platform.

✦ Generated by Eureka AI based on patent content.

Abstract

The application discloses a cross-border data trust evaluation method based on multimodal features and relates to the technical field of data trust computing. The method comprises the following steps: collecting cross-language text streams, extracting sentence rhythm signals and confidence trace data along a time axis, compressing the language fluctuation features of different languages, generating a unified semantic rhythm spectrum, and establishing a basic rhythm line for subsequent semantic energy analysis. The application constructs a quantifiable risk scale from the semantic rhythm spectrum and the language distribution map, regulates the semantic energy flow by combining rhythm buffering with semantic diversion, improves the accuracy and recall of multilingual high-risk identification, reduces false positives and delays, makes cross-language semantic processing more stable and fair, and builds a deployable core capability for cross-border data trust evaluation.

Description

Technical Field

[0001] This invention relates to the field of data trust computing technology, and more specifically to a cross-border data trust assessment method based on multimodal features.

Background Technology

[0002] Cross-border data trustworthiness assessment based on multimodal features refers to the process of intelligently and quantitatively assessing the reliability of data sources, the authenticity of content, the security of transmission, and the compliance and legality of data during cross-border data flows by comprehensively utilizing multi-source information such as text, images, voice, behavioral logs, and structured records. This method integrates multimodal feature analysis technologies such as natural language processing, computer vision, federated learning, blockchain traceability, and privacy computing to achieve full-chain perception and dynamic judgment of data flow behavior across languages, domains, and regulatory environments. Within the aforementioned project framework, this assessment mechanism is driven by intelligent agents and can automatically identify risks, generate credit models, and output trustworthiness scores at each stage of data collection, transmission, analysis, and application. This supports multinational corporations and financial institutions in conducting data business securely and compliantly under different regulatory systems, ultimately forming a locally deployable and offline-operable intelligent platform for cross-border data trustworthiness flow.

[0003] The existing technology has the following shortcomings: In existing technologies, multilingual fusion modeling typically relies on dynamic translation matrices to align the semantic spaces of different languages, achieving unified understanding and reliable assessment of cross-lingual data. However, when the model processes multilingual data with uneven language distribution, the contextual distribution of a few languages may create a non-linear resonance effect in the dynamic translation matrix, leading to a dynamic imbalance in semantic mapping relationships. Specifically, during the semantic vector generation process, some less common languages, due to sparse training samples or significant differences in semantic structure, may have their semantic features amplified or canceled out during the multilingual weight fusion stage, forming abnormal concentrations of semantic energy and causing directional shifts in cross-lingual semantic vectors. This shift directly affects the model's judgment of text risk levels, causing the semantic features of high-risk statements to be masked and not correctly identified, while low-risk content is incorrectly amplified due to semantic resonance, triggering erroneous risk warnings or compliance blocking.

[0004] The information disclosed in the background section is only intended to enhance the understanding of the background of this disclosure, and therefore may include information that does not constitute prior art known to those skilled in the art.

Summary of the Invention

[0005] The purpose of this invention is to provide a cross-border data credibility assessment method based on multimodal features to solve the problems mentioned in the background art.

[0006] To achieve the above objective, the present invention provides the following technical solution: a cross-border data credibility assessment method based on multimodal features, comprising the following steps:
Collecting cross-language text streams, extracting sentence rhythm signals and confidence trace data along the time axis, compressing the language fluctuation features of different languages, generating a unified semantic rhythm spectrum, and establishing a basic rhythm line for subsequent semantic energy analysis;
Constructing a language distribution map based on the semantic rhythm spectrum, marking the semantic energy peaks and semantic energy gaps of each language on the map, and converting the semantic energy differences into quantifiable risk scale lines to identify semantic imbalances caused by uneven language distribution;
Establishing a rhythm buffer zone at the cross-language text input end based on the risk scale line, and adjusting the input rhythm of semantic segments so that high-intensity segments enter the processing channel later and low-intensity segments enter it earlier, forming a continuous and controllable rhythm rearrangement sequence;
Using the rhythm rearrangement sequence to drive semantic diversion valves that guide high-risk semantic segments into narrow channels and low-risk semantic segments into wide channels, forming stable semantic diversion trajectories that reduce the risk of semantic energy accumulation and balance multilingual semantic flows;
Deploying a reverse energy-absorbing curtain along the semantic diversion trajectory to absorb excessively amplified residual semantic energy and release masked weak semantic fragments, thereby balancing the distribution of semantic energy and outputting a semantic scale with balanced semantic risk;
Taking the semantic scale as the core, activating a breathing-style weight balance wheel that dynamically adjusts the width and flow rate of the semantic channels according to the real-time language density and performs rhythm track switching and energy yielding, thereby continuously suppressing and dynamically regulating multilingual semantic resonance and forming a closed-loop system of semantic energy balance.

[0007] Preferably, the semantic rhythm spectrum is generated as follows: cross-lingual text information streams composed of natural language are collected, with language attributes and temporal order preserved, to form a continuous language stream sequence; the text information stream is segmented according to language structure rules to form a time-series sentence set carrying language attributes and time tags; the rhythm signal and confidence trace of each sentence segment are extracted to generate structured data containing rhythm intensity values and confidence values; all rhythm signals and confidence trace data are compressed under a unified metric standard to construct the semantic rhythm spectrum; and the central trend line is extracted from the semantic rhythm spectrum to form the basic rhythm line.

[0008] Preferably, the risk scale line is generated as follows: the rhythm data of each language are classified, integrated, and uniformly projected onto a two-dimensional language distribution coordinate map to construct a time-series semantic trajectory curve; semantic energy peaks and semantic energy gaps are identified in the language distribution coordinate map and labeled with rhythm intensity values, confidence trace values, and duration information; a risk scale line is constructed by combining language usage frequency and is compared with the basic rhythm line to obtain a curve of semantic energy deviation and multilingual risk; and the semantic imbalance region is located by identifying jump points of the risk scale curve, with the changes in language proportions used to judge whether the abnormal fluctuations are caused by uneven language distribution.

[0009] Preferably, the rhythm rearrangement sequence is formed as follows: rhythm intensity and confidence trace values are extracted from cross-lingual semantic segments, which are classified into high-intensity, medium-intensity, and low-intensity groups; a separate input rhythm adjustment strategy is set for each intensity group, so that high-intensity segments are delayed, low-intensity segments are advanced, and medium-intensity segments are processed in their original order; a semantic rhythm rearrangement sequence is constructed, the output interval of each segment is controlled, and the integrity of semantic content, language, structural features, and temporal information is maintained; and a closed-loop evaluation is performed on the output of the rhythm rearrangement sequence, with the rhythm buffering strategy dynamically optimized based on the semantic clustering frequency and the risk scale response trend.

[0010] Preferably, the output interval of the rhythmic rearrangement sequence controls the processing rhythm of semantic segments by setting a minimum processing interval and a maximum processing interval, and maintains a stable input ratio for each language to prevent a single language from dominating the input stream.

[0011] Preferably, the semantic diversion trajectory is formed as follows: based on the rhythm rearrangement sequence, the rhythm intensity value and confidence trace value of each semantic segment are extracted, and the risk level is assigned with reference to the risk scale line; high-risk semantic segments are directed to narrow channels, low-risk semantic segments are directed to wide channels, and medium-risk semantic segments are dynamically scheduled according to the load of the narrow channels; the semantic segments carrying channel labels are pushed to the corresponding processing paths in chronological order and scheduled sequentially according to language distribution; a semantic diversion trajectory record table is constructed to periodically store each channel's mean processing rhythm, confidence values, and latency data; and, based on feedback from the record table, the channel allocation strategy is dynamically adjusted to optimize the rhythm intensity division criteria and the channel migration ratio.
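
The channel assignment rule in this paragraph can be illustrated with a minimal Python sketch. The function name `assign_channel`, the string labels, and the capacity-based rule for medium-risk segments are assumptions made for illustration; the text only states that medium-risk segments are scheduled according to the load of the narrow channels.

```python
def assign_channel(risk_level: str, narrow_load: int, narrow_capacity: int) -> str:
    """Return the channel label for one semantic segment.

    risk_level is one of "high", "medium", "low" (hypothetical labels).
    narrow_load and narrow_capacity describe the narrow channel's queue.
    """
    if risk_level == "high":
        return "narrow"
    if risk_level == "low":
        return "wide"
    if risk_level == "medium":
        # Assumed policy: use the narrow channel only while it has spare capacity.
        return "narrow" if narrow_load < narrow_capacity else "wide"
    raise ValueError(f"unknown risk level: {risk_level!r}")
```

Under this assumed policy a medium-risk segment falls back to the wide channel once the narrow channel is full, which is one simple way to keep high-risk segments from queuing behind medium-risk ones.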

[0012] Preferably, the semantic scale is output as follows: the set of processed semantic fragments is extracted from the semantic diversion channels, and residual semantic energy fragments and weak semantic fragments are identified based on the historical fluctuation trend of the rhythm values; energy absorption segments are constructed along the time axis by combining the time-period density and the risk distribution structure on the semantic diversion trajectory, and energy absorption windows are deployed at the semantic energy aggregation nodes; within each energy absorption segment, semantic energy suppression is applied to high-energy semantic segments, and a rhythm buffer zone is formed by extending the parsing time and inserting low-intensity semantic segments before and after them; weak semantic segments are released at suitable moments within the rhythm buffer zone, and low-rhythm segments with semantic structure features are inserted into the gaps in the processing queue; and a semantic rhythm spectrum and confidence density curve are constructed from the rhythm dynamic range obtained after energy suppression and weak-fragment release, and the semantic scale is generated.

[0013] Preferably, the closed-loop structure for semantic energy balance, in which the semantic scale serves as the core and a breathing-style weight balance wheel adjusts the width and flow rate of the semantic channels according to the real-time language density and performs rhythm track switching and energy yielding, is constructed as follows: based on the semantic scale, the input density, rhythm intensity fluctuation range, and confidence value variation range of each language in the semantic channels are monitored, and the offset trend in the semantic structure is identified; upon receiving a risk offset signal, the bandwidth ratio of the semantic channels is reconstructed in real time, with high-frequency languages assigned to independent narrow channels with a minimum interval set between semantic segments, and weak semantic languages assigned to wide channels with increased processing bandwidth; after the channel width reconstruction, semantic rhythm track switching and semantic energy yielding are performed, pushing sudden rhythm segments into the buffer channel and outputting low-risk segments in advance to occupy the rhythm beat; and the effects of the channel structure adjustment are quantitatively evaluated and fed back to the semantic scale model to update the channel baseline.

[0014] The technical effects and advantages provided by the present invention in the above technical solution are as follows: This invention establishes an observable semantic energy feature foundation by introducing semantic rhythm spectrum and language distribution map, making the risk shift caused by uneven language distribution quantifiable. Furthermore, through innovative structural designs such as rhythm buffering, semantic diversion, reverse energy absorption, and a weighted balance wheel, semantic energy is coordinated and regulated in both the time and channel domains, forming an adaptive and self-balancing semantic flow mechanism. This significantly improves the accuracy and recall of identifying high-risk texts in multilingual environments, while reducing false positives and processing latency. This method not only enhances the ability to identify risky expressions in low-resource languages but also improves the fairness and stability of cross-language information processing. Ultimately, it constructs the core capabilities of a cross-border data trust assessment platform that can run offline, be deployed locally, and possess language energy governance capabilities.

Attached Figure Description

[0015] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments recorded in this invention. For those skilled in the art, other drawings can be obtained based on these drawings.

[0016] Figure 1 is a flowchart of the cross-border data credibility assessment method based on multimodal features of the present invention.

Detailed Implementation

[0017] Exemplary embodiments will now be described more fully with reference to the accompanying drawings. However, these exemplary embodiments can be implemented in many forms and should not be construed as limited to the examples set forth herein; rather, they are provided so that the description of this disclosure will be more complete and fully convey the concept of the exemplary embodiments to those skilled in the art.

[0018] As shown in Figure 1, this invention provides a cross-border data trust assessment method based on multimodal features, which includes the following steps: Collect cross-language text streams, extract sentence rhythm signals and confidence trace data along the time axis, compress the language fluctuation features of different languages, generate a unified semantic rhythm spectrum, and establish a basic rhythm line for subsequent semantic energy analysis. To measure the semantic fluctuation features of cross-lingual text on a unified basis, cross-lingual text streams are first collected, and rhythm signals and confidence traces are extracted along the time axis to generate a unified semantic rhythm spectrum; the basic rhythm line on which subsequent semantic energy analysis depends is then established from that spectrum. The specific steps are as follows. Continuous natural-language text streams are collected from multiple known cross-border data sources in languages including English, Simplified Chinese, Traditional Chinese, Arabic, Spanish, Portuguese, French, German, Russian, and Korean. During collection, the original temporal order of the data in each language is preserved, and a stable language stream sequence is built from the timestamps. Text is collected at the sentence/segment level, with the length of each language segment kept between 100 and 300 characters. For example, an international financial news subscription platform generates approximately 72,000 content update records in different languages per hour during peak hours. After collection, each record is stored in the original text stream set of its language, ordered by time of appearance. No language conversion or merging is performed during collection, so that each language segment retains its original form of expression. HTML tags, special characters, garbled text, and non-natural-language content such as tables, code snippets, and mathematical formulas are removed, leaving clean, complete, and structurally sound natural language segments.
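
The cleaning and length control described above might look like the following Python sketch. The function names and the regular-expression approach to tag removal are assumptions; a production system would more likely use a real HTML parser, and the removal of tables, code, and formulas is not covered here.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")   # naive tag matcher, adequate for a sketch only
WS_RE = re.compile(r"\s+")

def clean_segment(raw: str) -> str:
    """Strip HTML tags, decode entities, and collapse runs of whitespace."""
    text = TAG_RE.sub(" ", raw)
    text = html.unescape(text)
    return WS_RE.sub(" ", text).strip()

def within_length(text: str, lo: int = 100, hi: int = 300) -> bool:
    """Check the 100 to 300 character segment length stated in the text."""
    return lo <= len(text) <= hi
```

Tags are removed before entities are decoded so that an escaped angle bracket in the source text is not mistaken for markup.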

[0019] The collected text stream is segmented chronologically. This process divides each original language segment into sentences based on target language characteristics. For example, for English, periods, question marks, and semicolons are used for segmentation, combined with subject-verb breakpoints; for Chinese, modal particles, commas, and the position of the main verb are used; and for Arabic and Portuguese, sentence segmentation is based on subject-prepositional patterns, interrogative suffixes, and phonetic extension structures. Segmentation for all languages must maintain semantic integrity, avoiding subject-verb-object breaks. An average sentence length control standard is used to decompose excessively long segments into two or three semantically coherent smaller segments, ultimately forming a continuous, chronologically ordered sequence of sentences. After segmentation, each segment will include its linguistic attributes, absolute timestamp, and sequence number from the original text stream.

[0020] Based on the segmented time-series sentence set, rhythmic signals and confidence traces are extracted from each sentence segment. The extraction of rhythmic signals is based on the syllable structure within the language, the intensity of grammatical jumps, and the degree of abrupt changes in contextual sentiment. Specifically, this involves identifying the continuous arrangement patterns of all verbs, conjunctions, and particles within the sentence segment and constructing its rhythmic unit spectrum. Taking the Chinese sentence segment "The current market trend is volatile, and investor sentiment is cautious" as an example, by identifying the strong and weak rhythmic arrangements of verbs and nouns such as "trend," "volatile," "sentiment," and "cautious," the rhythmic intensity value of this sentence segment is 7.2, with a standard deviation of 1.9 and 3 rhythmic jumps. Taking the English sentence segment "Markets remain volatile and investors stay cautious" as another example, through the combination of the verbs "remain" and "stay" with the adjectives "volatile" and "cautious," the rhythmic intensity value is 6.8, with 4 jumps and a rhythmic standard deviation of 2.1. Confidence traces are extracted by inversely calculating the frequency of occurrence of keywords in the sentence segment within a large corpus of a specific language. Terms that appear less frequently, such as data breach impact assessments or cross-border information compliance agreements, will be assigned higher confidence values, for example, reaching 0.92, while common expressions like "thank you for your participation" will only have a confidence value of 0.12. Finally, a rhythm signal vector and a confidence value vector are generated for each sentence segment, along with time and language labels, and passed as structured input data to subsequent steps.
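
The text says only that confidence traces are obtained by "inversely calculating" keyword frequency in a large corpus, so that rare terms score high and common phrases score low. One possible reading is a normalized inverse-frequency score, sketched below in Python. The formula, the function name `confidence_trace`, and the averaging over keywords are assumptions and will not reproduce the 0.92 and 0.12 values quoted above.

```python
import math

def confidence_trace(keywords, corpus_counts, corpus_size):
    """Average normalized inverse-frequency score in [0, 1].

    keywords: list of key terms in the sentence segment.
    corpus_counts: dict mapping a term to the number of corpus documents containing it.
    corpus_size: total number of documents in the corpus (counts must not exceed it).
    """
    if not keywords:
        return 0.0
    max_idf = math.log(corpus_size + 1)
    scores = []
    for word in keywords:
        count = corpus_counts.get(word, 0)
        idf = math.log((corpus_size + 1) / (count + 1))  # smoothed inverse frequency
        scores.append(idf / max_idf)                      # scale to [0, 1]
    return sum(scores) / len(scores)
```

A term never seen in the corpus scores 1.0 and a term present in every document scores 0.0, which matches the direction of the examples in the text.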

[0021] Using the rhythm signals and confidence trace data acquired for each sentence segment, feature compression and spectrum construction are performed under a unified metric. The rhythm intensity unit, jump frequency, and confidence density of each language are standardized and mapped to a uniform numerical range. Sentence segments are grouped into sets of 1000, and their rhythm and confidence sequences are weighted and merged to form a two-dimensional semantic spectrum whose horizontal axis represents time and whose vertical axis represents rhythm intensity and confidence amplitude. In this spectrum, high-frequency segments with rhythm values exceeding 8.0 are highlighted, while segments with confidence values below 0.2 are rendered at low brightness. This compression allows the semantic fluctuation features of different languages to be presented dynamically over time under a unified standard, avoiding the representational distortion caused by uneven language distribution. A center line representing the trend of language fluctuation intensity is then extracted from the spectrum, forming a basic rhythm line that represents the average fluctuation rhythm of all languages. The basic rhythm line is smoothed with a moving average, which flattens the rhythm extreme points in the semantic spectrum, and it serves as the reference line for measuring semantic energy anomalies throughout the evaluation of the data stream.
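
The two numerical operations named in this paragraph, mapping values to a uniform range and smoothing with a moving average, can be sketched in Python as follows. The target range of 0 to 10, the trailing window, and the window size are assumptions; the text does not specify them.

```python
def min_max_scale(values, lo=0.0, hi=10.0):
    """Map values linearly onto [lo, hi]; a constant series maps to lo."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [lo for _ in values]
    span = vmax - vmin
    return [lo + (v - vmin) * (hi - lo) / span for v in values]

def moving_average(values, window=5):
    """Trailing moving average; early points use as many samples as exist."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed
```

Applying `moving_average` to the scaled rhythm series yields a candidate basic rhythm line; a centered window would lag less but needs future samples, which may not suit a streaming setting.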

[0022] A language distribution map is constructed based on the semantic rhythm spectrum. The semantic energy peaks and semantic energy gaps of each language are marked on the language distribution map. The semantic energy differences are converted into quantifiable risk scale lines to identify semantic imbalances caused by uneven language distribution. Having obtained the semantic rhythm spectrum and basic rhythm line for each cross-linguistic sentence segment in the previous step, to further identify the semantic energy imbalance caused by language differences, it is necessary to construct a language distribution map, mark semantic energy extreme values, and quantify the risk scale. This includes the following steps: Based on the generated semantic rhythm spectrum, the rhythm data corresponding to each language are classified and integrated according to language category, and uniformly projected onto a two-dimensional language distribution coordinate map. A time axis is set on the horizontal axis, with seconds as the time unit, and each unit length corresponds to a sentence segment data at a given time point. The vertical axis is set as the comprehensive semantic rhythm intensity value, calculated by weighting and combining the rhythm signal value and confidence trace value to form a comprehensive expression intensity reflecting the semantic dynamics of the sentence segment. In this distribution map, languages are distinguished by color stratification; for example, red represents English, blue represents Simplified Chinese, green represents Spanish, orange represents Arabic, and purple represents Portuguese. Taking a real-world example, in a global financial news channel on June 12, 2025, a total of 4200 English and Chinese sentence segments, 2600 Spanish segments, and 1400 Arabic segments were extracted within one hour. Each sentence segment is mapped to a point, arranged sequentially, and connected to form a semantic trajectory curve with a time dimension. 
Through visualization, the distribution density, rhythmic fluctuation frequency, and confidence concentration of different languages on the timeline are clearly presented in the graph. This graph construction method differs from existing techniques that encode languages and then process them uniformly. This invention constructs a coexistence distribution map while maintaining language independence, thus more accurately reflecting the continuous changes in semantic rhythms over time and the differences in energy fluctuations between languages.

[0023] Semantic energy peaks and voids are identified and labeled on the language distribution map, representing high-energy clusters and semantically missing regions, respectively. Peaks are identified by locating all consecutive segments on each language's semantic curve where, within a 10-second interval, the rhythm value is more than 20% higher than the preceding average baseline rhythm value and the confidence trace value is higher than 0.75. For example, in the Arabic trajectory curve, the rhythm intensity rises sharply from the average of 5.2 to 8.9 between seconds 840 and 858, with an average confidence trace of 0.83; this region is identified as a semantic energy peak area. Semantic energy voids are identified by locating regions where the rhythm value is more than 30% below the average rhythm line, the confidence trace is lower than 0.3, and the duration is at least 15 seconds. Taking the English trajectory as an example, the average rhythm value between seconds 1260 and 1284 is only 2.1, far lower than the language average of 4.6, and the average confidence trace is 0.27; this interval is labeled as a void area. All identified peak and void regions are marked on the language distribution map with different shapes, solid dots for peaks and hollow squares for voids, along with rhythm intensity values, confidence trace values, and duration information, forming a complete set of semantic extreme value annotations. These annotations are the basic information source for subsequent risk scale modeling.
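
The peak and void thresholds stated above can be checked against the two worked examples with a short Python sketch. It assumes one sample per second for a single language, treats the 10-second and 15-second figures as minimum run durations, and uses a single fixed baseline value; all function names are hypothetical.

```python
def find_runs(samples, predicate, min_duration):
    """Return (start, end) second pairs for runs where predicate holds.

    samples: list of (second, rhythm, confidence), sorted, one per second.
    A run is kept only if end - start >= min_duration.
    """
    runs, start, prev = [], None, None
    for second, rhythm, conf in samples:
        if predicate(rhythm, conf):
            if start is None:
                start = second
            prev = second
        else:
            if start is not None and prev - start >= min_duration:
                runs.append((start, prev))
            start = None
    if start is not None and prev - start >= min_duration:
        runs.append((start, prev))
    return runs

def find_peaks(samples, baseline):
    # Rhythm more than 20% above baseline and confidence above 0.75, for 10 s or more.
    return find_runs(samples, lambda r, c: r > 1.2 * baseline and c > 0.75, 10)

def find_voids(samples, baseline):
    # Rhythm more than 30% below baseline and confidence below 0.3, for 15 s or more.
    return find_runs(samples, lambda r, c: r < 0.7 * baseline and c < 0.3, 15)
```

With a baseline of 5.2, a run of 8.9 at confidence 0.83 from second 840 to 858 is returned as a peak; with a baseline of 4.6, a run of 2.1 at confidence 0.27 from second 1260 to 1284 is returned as a void.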

[0024] By combining the identified semantic energy peaks and void regions with language usage frequency information, a risk scale expressing the degree of semantic imbalance is constructed. The scale operates on a one-minute analysis cycle: in each cycle, the number of peak regions and void regions and their frequency are calculated for each language and normalized. The result is then compared with the basic rhythm line constructed in the previous stage to determine how far each language deviates from the average rhythm line in the current cycle. For example, in the cycle at minute 42, Simplified Chinese exhibits 4 peak regions and 2 void regions, with a positive semantic energy deviation of 11.2%, while Spanish exhibits 1 peak region and 4 void regions in the same cycle, with a negative deviation of 8.7%. The deviations of each language are combined chronologically to form a multilingual risk curve based on semantic energy fluctuations. The vertical axis of this curve represents the risk score, ranging from 0 to 1, and the horizontal axis represents time. A risk score greater than 0.75 indicates a significant semantic imbalance during that time period; scores between 0.45 and 0.75 represent a medium-risk area; and scores below 0.45 represent a stable area. The scale dynamically reflects the intensity and trend of semantic energy imbalance and provides a basis for predicting language energy interference during multilingual flow. Compared with the fuzzy scoring based on word matching used in existing technologies, this invention establishes a quantitative model by combining semantic rhythm energy with the temporal distribution of languages, achieving higher accuracy and sensitivity.
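
The three risk bands can be expressed directly in code. The sketch below, in Python, encodes the stated thresholds; the band labels are hypothetical, and `deviation_percent` is an assumed reading of "deviation from the basic rhythm line" as a relative difference, since the text gives no formula.

```python
def risk_band(score: float) -> str:
    """Classify a risk score in [0, 1] using the thresholds given in the text."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("risk score must lie in [0, 1]")
    if score > 0.75:
        return "imbalanced"   # significant semantic imbalance
    if score >= 0.45:
        return "medium"       # medium-risk area
    return "stable"           # stable area

def deviation_percent(observed: float, baseline: float) -> float:
    """Relative deviation from the baseline, in percent (assumed definition)."""
    return (observed - baseline) / baseline * 100.0
```

For instance, the jump from 0.41 to 0.79 described in the following paragraph moves a cycle from the stable band straight into the imbalanced band.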

[0025] The risk scale is then used to locate semantically imbalanced areas in the language distribution map and to identify potential abnormal fluctuations caused by uneven language distribution. Location involves two steps. First, jumps in the risk value within a unit of time are tracked through abrupt changes in the scale curve; for example, a rapid jump from 0.41 to 0.79 at minute 54 indicates a surge in potential risk. Second, the language distribution during that time period is analyzed retrospectively, and the differences in language proportions are computed to confirm whether the jump stems from an abrupt change in the language mix. Continuing the example, at minute 54 the proportion of Arabic input increased from an average of 3% to 12%, while the proportion of Simplified Chinese fell to half its original level. In the corresponding language distribution map, the Arabic rhythm trajectory shows dense peaks while the Chinese rhythm curve breaks off, forming a significant imbalance. These location results guide the design of the subsequent semantic buffering mechanism and the dynamic adjustment of the semantic flow channels, mitigating multilingual semantic energy disturbances at their source.
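
The two-step location procedure, first finding jumps in the risk curve and then comparing language proportions, can be sketched in Python as follows. The jump threshold `min_rise` is an assumption (the text gives only the single example of a rise from 0.41 to 0.79), and the function names are hypothetical.

```python
def find_jumps(risk_by_minute, min_rise=0.3):
    """Return the indices of minutes whose risk rose by at least min_rise."""
    return [
        i for i in range(1, len(risk_by_minute))
        if risk_by_minute[i] - risk_by_minute[i - 1] >= min_rise
    ]

def language_shares(counts):
    """Convert per-language segment counts into proportions of the total."""
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()}

def share_shift(before_counts, after_counts):
    """Change in each language's share between two periods."""
    before = language_shares(before_counts)
    after = language_shares(after_counts)
    langs = set(before) | set(after)
    return {lang: after.get(lang, 0.0) - before.get(lang, 0.0) for lang in langs}
```

A jump index returned by `find_jumps` selects the minute to inspect, and `share_shift` then shows which languages gained or lost share around it, such as a rise of about nine percentage points for Arabic in the example above.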

[0026] A rhythmic buffer zone is established at the cross-language text input end based on the risk scale line. By adjusting the input rhythm of semantic segments, high-intensity semantic segments are delayed in entering the processing channel, while low-intensity semantic segments are entered into the processing channel in advance, forming a continuous and controllable rhythmic rearrangement sequence. To effectively address the identified cross-lingual semantic energy aggregation phenomenon, a dynamic control mechanism needs to be introduced during the text input stage. This involves constructing a rhythmic buffer to temporally adjust the input semantic segments, thereby mitigating the risk of semantic shift and establishing a controllable semantic rearrangement sequence. Specifically, this includes the following steps: First, based on the constructed risk scale, all cross-lingual semantic segments entering the current time window are extracted in real time, and their rhythm intensity and confidence trace values are identified. Each semantic segment retains its original language identifier, semantic content, timestamp information, segment number, and the combined value of rhythm signal and confidence trace extracted in previous steps. Using a full minute as the unit, if a total of 3600 language segments are received within this time period, from English, Simplified Chinese, Spanish, and Arabic (1250 English, 1100 Simplified Chinese, 700 Spanish, and 550 Arabic), and the rhythm intensity is sorted, 412 segments have a rhythm intensity value greater than 8.0 and a confidence trace value greater than 0.85, while 638 segments have a rhythm intensity value less than 3.5 and a confidence trace value less than 0.4. All segments are divided into high-intensity, medium-intensity, and low-intensity groups based on their rhythm-confidence joint score, serving as the basis for subsequent rhythm adjustment.
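
The grouping step can be illustrated with a small Python sketch. It takes the two threshold pairs from the counting example above (rhythm above 8.0 with confidence above 0.85, and rhythm below 3.5 with confidence below 0.4) as group boundaries. That is an assumption, because the text says the groups are derived from a joint rhythm-confidence score without defining it.

```python
from collections import Counter

def intensity_group(rhythm: float, confidence: float) -> str:
    """Assign a segment to the high, medium, or low intensity group."""
    if rhythm > 8.0 and confidence > 0.85:
        return "high"
    if rhythm < 3.5 and confidence < 0.4:
        return "low"
    return "medium"

def group_counts(segments):
    """Count segments per group; segments is an iterable of (rhythm, confidence)."""
    return Counter(intensity_group(r, c) for r, c in segments)
```

Note that a segment with high rhythm but only moderate confidence lands in the medium group under this rule.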

[0027] Based on the segment intensity groups, input rhythm adjustment strategies are set for each group. For high-intensity semantic segments, to prevent concentrated input from causing semantic energy aggregation, a delayed advancement approach is adopted, i.e., retaining the entire content of the segment while adjusting its processing time order. Taking segments with a rhythm intensity greater than 9.0 as an example, the delay period is set to 3 seconds; segments with a rhythm intensity between 8.0 and 9.0 are delayed by 2 seconds. For example, the English sentence segment numbered EN_2035, with an original timestamp of 17:12:42, a rhythm intensity of 9.3, and a confidence trace of 0.88, has its processing time delayed to 17:12:45. Conversely, for low-intensity segments, such as the simplified Chinese sentence segment CN_1078 with a rhythm value of 2.8 and a confidence value of 0.22, with an original timestamp of 17:12:43, its processing time is advanced to 17:12:42. Medium-intensity segments are processed in their original order. This bidirectional adjustment method allows the semantic fragments that originally entered synchronously to be reordered according to their intensity levels, establishing a processing trajectory with fluidity differences.
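The bidirectional adjustment can be sketched as a function that shifts a segment's processing time according to its group. The 3-second and 2-second delays follow the values stated above; the 1-second advance for low-intensity segments is an assumption taken from the single CN_1078 example, and the calendar date used below is a placeholder.

```python
from datetime import datetime, timedelta


def adjusted_time(ts: datetime, rhythm: float, group: str) -> datetime:
    """Return the processing time after rhythm buffering.
    The segment content itself is never modified, only its time order."""
    if group == "high":
        # Above 9.0: delay 3 s; otherwise (8.0 to 9.0): delay 2 s.
        return ts + timedelta(seconds=3 if rhythm > 9.0 else 2)
    if group == "low":
        # Assumed 1 s advance, inferred from the CN_1078 example.
        return ts - timedelta(seconds=1)
    return ts  # medium-intensity segments keep their original order


# Placeholder date; only the time of day comes from the example above.
en_2035 = adjusted_time(datetime(2025, 1, 1, 17, 12, 42), 9.3, "high")
cn_1078 = adjusted_time(datetime(2025, 1, 1, 17, 12, 43), 2.8, "low")
```

With these inputs EN_2035 moves to 17:12:45 and CN_1078 to 17:12:42, matching the worked example.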

[0028] After rhythmic buffering adjustments, a semantic rhythmic rearrangement sequence is constructed. This rearrangement sequence outputs processed segments sequentially in millisecond-level time order, requiring that the semantic content, language, structural features, and temporal information of each segment remain intact. Taking a ten-second window as an example, the rearrangement sequence processes a total of 600 semantic segments, prioritizing the output of 235 early segments, followed by 253 medium-intensity segments, and finally processing 112 delayed high-intensity segments. To ensure the continuity and rhythmic consistency of the output, the minimum processing interval between each segment is set to 120 milliseconds, and the maximum is no more than 400 milliseconds, ensuring that rhythmic breaks are not caused by the backlog of high-intensity segments. In this sequence, the proportion of each language remains relatively stable to prevent a single language from dominating the input stream. Through the continuously output rhythmic rearrangement sequence, the cross-lingual semantic flow exhibits a dynamic equilibrium trend.

[0029] A closed-loop evaluation is performed on the output process of the rhythm rearrangement sequence, and the rhythm buffering strategy is dynamically optimized accordingly. Evaluation criteria include semantic clustering frequency, energy surge rate, language distribution volatility, and risk scale response effect. For example, if the proportion of high-risk semantic segments delayed exceeds 35% within a continuous 60 seconds, but the risk scale value continues to rise, the delay amplitude needs to be reduced or the rhythm intensity classification threshold adjusted. Taking a real-world example, between the 25th and 26th minutes, 85 segments with Arabic rhythm intensity higher than 8.5 were delayed, representing 32% of the total, but the risk scale value still rose from 0.64 to 0.77, indicating insufficient buffering effectiveness. Appropriately relaxing the delay threshold or increasing the priority of low-intensity segments can make rhythm regulation more sensitive. The entire closed-loop evaluation process relies on the output effect of the rearranged sequence and the real-time risk fluctuation trend to achieve dynamic feedback adjustment, further improving the overall stability and controllability of cross-linguistic semantic flow.
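The feedback rule stated above (a delayed share above 35% combined with a rising risk scale calls for a smaller delay) can be sketched as follows. The step size `step_s` and the floor `min_delay_s` are assumptions; the description does not say by how much the delay amplitude is reduced.

```python
def tune_delay(delay_s: float, delayed_share: float,
               risk_start: float, risk_end: float,
               share_limit: float = 0.35,
               step_s: float = 0.5, min_delay_s: float = 0.5) -> float:
    """Return the delay to use in the next evaluation window.
    The delay is reduced only when many segments were delayed and the
    risk scale still rose, i.e. buffering was not effective."""
    if delayed_share > share_limit and risk_end > risk_start:
        return max(min_delay_s, delay_s - step_s)
    return delay_s


reduced = tune_delay(3.0, 0.40, 0.64, 0.77)    # rule triggers
unchanged = tune_delay(3.0, 0.32, 0.64, 0.77)  # share below the 35% limit
```

Note that the worked example (32% of segments delayed) falls below the stated 35% trigger, so this sketch would leave the delay unchanged in that case; an implementation that should react there would need a lower `share_limit`.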

[0030] By using rhythmic rearrangement sequences to drive semantic diversion valves, high-risk semantic segments are guided into narrow channels, while low-risk semantic segments are guided into wide channels, forming stable semantic diversion trajectories, thereby reducing the risk of semantic energy accumulation and balancing multilingual semantic flows. To alleviate processing congestion caused by high-energy semantic concentration and energy interference caused by language imbalance, a semantic diversion structure is introduced based on the rhythmic rearrangement sequence. This structure guides semantic segments of different risk levels to differentiated processing paths, forming a stable semantic diversion trajectory. The specific steps include: From the formed rhythmic rearrangement sequence, the rhythm intensity value and confidence trace value of each semantic segment are extracted sequentially, and their risk levels are classified based on the numerical range of the current risk scale line. Risk levels are divided into three categories: high risk, medium risk, and low risk. High risk is defined as a rhythm intensity value higher than 8.5 and a confidence trace value higher than 0.9; low risk is defined as a rhythm intensity value lower than 4.0 and a confidence trace value lower than 0.3; and the rest are classified as medium risk. In a cross-border data processing scenario on August 15, 2025, a total of 3900 semantic segments were received between the 45th and 46th minutes. Among them, 411 segments were high-risk, mainly concentrated in Arabic and English news descriptions; 1860 segments were medium-risk, with a mix of Simplified Chinese, Spanish, and English; and 1629 segments were low-risk, primarily consisting of Simplified Chinese customer service instructions and English activity invitations. Each semantic segment is assigned a risk level label after classification, which is used as a basis for subsequent channel allocation decisions.
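The three-way risk classification can be written as a small function using the thresholds given above. The function name and parameter layout are illustrative only.

```python
def risk_level(rhythm: float, confidence: float,
               high=(8.5, 0.9), low=(4.0, 0.3)) -> str:
    """Classify a segment as high, medium, or low risk.
    Both conditions of a threshold pair must hold; the rest is medium."""
    if rhythm > high[0] and confidence > high[1]:
        return "high"
    if rhythm < low[0] and confidence < low[1]:
        return "low"
    return "medium"


levels = [
    risk_level(9.4, 0.91),  # exceeds both high thresholds
    risk_level(2.8, 0.22),  # below both low thresholds
    risk_level(9.3, 0.88),  # confidence does not exceed 0.9
]
```

Under this reading, a segment with a rhythm intensity of 9.3 but a confidence trace of 0.88 is medium risk, because the definition requires both values to exceed their thresholds.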

[0031] Based on the identified risk levels, processing paths are assigned to the three categories of semantic segments. High-risk segments are assigned to the fine (narrow) channel, which is designed for longer processing cycles and higher semantic recognition accuracy, with a processing capacity limited to, for example, 110 segments per second. This channel is responsible for contextual logic reasoning, implicit reference parsing, and multilingual word order reconstruction. Low-risk segments are assigned to the wide channel, which provides faster processing along a shallower structural path, with a processing capacity of up to 320 segments per second. This channel primarily performs lexical-level mapping, basic content extraction, and format rule matching. Medium-risk segments are dynamically scheduled: when the fine channel load is below 80% of capacity, medium-risk segments are assigned to the fine channel based on their rhythm intensity, while the rest are pushed to the wide channel. For example, the English sentence segment EN_5097, "military action may escalate in disputed territory", has a rhythm intensity value of 9.4 and a confidence trace value of 0.91; it is therefore identified as a high-risk segment and guided to the fine channel. By establishing clear risk diversion paths, high-sensitivity semantics and low-sensitivity semantics can be effectively separated in the physical execution path, thus avoiding the superposition of linguistic energy.
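The channel allocation rule, including the dynamic scheduling of medium-risk segments, can be sketched as follows. The 80% load limit comes from the description; the rhythm cutoff `medium_cutoff` that decides which medium-risk segments go to the fine channel is an assumption, because the description says only that the choice is based on rhythm intensity.

```python
def assign_channel(level: str, rhythm: float, fine_load: float,
                   load_limit: float = 0.80,
                   medium_cutoff: float = 6.0) -> str:
    """Return "fine" or "wide" for a segment.
    level is "high", "medium", or "low"; fine_load is the current
    fine-channel load as a fraction of its capacity."""
    if level == "high":
        return "fine"
    if level == "low":
        return "wide"
    # Medium risk: use the fine channel only while it has spare capacity
    # and the segment's rhythm is at or above the assumed cutoff.
    if fine_load < load_limit and rhythm >= medium_cutoff:
        return "fine"
    return "wide"


en_5097 = assign_channel("high", 9.4, fine_load=0.70)
medium_busy = assign_channel("medium", 7.0, fine_load=0.90)
medium_free = assign_channel("medium", 7.0, fine_load=0.50)
```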

[0032] The semantic segments with assigned channel labels are advanced to the actual processing path, ensuring a balanced language distribution within each channel to avoid backlogs of a single language or channel congestion. During execution, all segments enter the corresponding channel processing queue sequentially according to their timestamps, maintaining the original rhythmic characteristics of the rhythmic rearrangement sequence. For example, in the 47th-minute cycle, the wide channel processes a total of 2480 segments, with Simplified Chinese accounting for 52%, Spanish for 28%, and English for 20%; the narrow channel processes a total of 895 segments, with English accounting for 44%, Arabic for 36%, and Simplified Chinese for 20%. The processing interval is set at an average of 300 milliseconds per segment, with a maximum of 600 milliseconds. If 10 segments of a certain language are input consecutively in a channel, the next segment of the same language is temporarily held in the channel's front-end buffer until segments of other languages have been processed, thus achieving language-order scheduling. During content processing, the narrow channel performs deep recognition of the syntactic dependency tree and multilingual sentiment tagging, while the wide channel only extracts keyword entities, content categories, and topic tags. This differentiated path processing strategy diverts semantic fragments according to risk structure, which not only improves processing efficiency but also avoids the accumulation of high-energy semantics in a single channel, thus preventing explosive energy aggregation.
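The language-order scheduling rule (hold a segment once 10 consecutive segments of its language have entered the channel) can be sketched with a front-end buffer. The function below is an illustrative assumption about how such a buffer might behave; in particular, flushing any still-held segments at the end of the input is a choice made here, not something the description specifies.

```python
from collections import deque
from typing import Iterable, List, Tuple


def language_order(segments: Iterable[Tuple[str, str]],
                   max_run: int = 10) -> List[Tuple[str, str]]:
    """Reorder (seg_id, lang) pairs, given in timestamp order, so that no
    more than max_run segments of one language pass consecutively while
    segments of other languages are available."""
    out: List[Tuple[str, str]] = []
    held: deque = deque()   # front-end buffer for held segments
    run_lang, run_len = None, 0

    def emit(seg):
        nonlocal run_lang, run_len
        out.append(seg)
        if seg[1] == run_lang:
            run_len += 1
        else:
            run_lang, run_len = seg[1], 1

    def blocked(seg) -> bool:
        return seg[1] == run_lang and run_len >= max_run

    for seg in segments:
        if blocked(seg):
            held.append(seg)
            continue
        emit(seg)
        # Release held segments as soon as the run has been broken.
        while held and not blocked(held[0]):
            emit(held.popleft())
    out.extend(held)  # assumed: flush remaining segments at end of input
    return out


ordered = language_order(
    [("a", "zh"), ("b", "zh"), ("c", "zh"), ("d", "en")], max_run=2)
```

With `max_run=2`, segment `c` is held until the English segment `d` has passed, giving the order a, b, d, c.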

[0033] During channel execution, a semantic traffic diversion trajectory record table is constructed to continuously monitor the stability of the processing path and the balance of semantic traffic. This record table stores the following data according to the processing cycle: the total number of segments accessed per minute in each channel, the proportion of each language segment, the average rhythm value of the channel, the proportion of risk levels, and the average processing latency. Taking the 49th minute as an example, the average processing rhythm of the fine channel is 8.7, the average confidence level of the segment is 0.89, and the average channel latency is 370 milliseconds; the average rhythm of the wide channel is 3.2, the average confidence level is 0.28, and the processing latency is 210 milliseconds. Multi-cycle continuous analysis revealed a high positive correlation between the concentration of processed content in the fine channel and the current risk scale value, indicating that the diversion strategy can effectively isolate high-risk semantics. Further visualization of the semantic energy flow trajectory curve shows that the distribution of different language segments within each channel exhibits a stabilizing trend according to their rhythmic direction. Unlike existing technologies that indiscriminately process all language segments into a unified processing path, this step explicitly introduces semantic risk structures into the path management mechanism, enabling semantic structures to be diverted according to risk, thereby improving the stability of semantic flow and the controllability of energy distribution.

[0034] Based on feedback from the traffic distribution trajectory table, the channel allocation strategy is adjusted. After each processing cycle, the current traffic distribution strategy is evaluated to determine whether it meets risk response requirements and channel load balancing standards. If the average load of the fine channel exceeds 95% for two consecutive cycles, the high-risk threshold should be immediately increased, for example, raising the rhythm intensity limit from 8.5 to 9.0, while simultaneously increasing the proportion of medium-risk segments migrating to the wide channel. If the wide channel idle time exceeds 120 seconds cumulatively, and low-risk segments are not fully loaded, the low-risk classification standard should be appropriately relaxed, for example, allowing edge segments with a rhythm intensity of 4.2 and a confidence trace of 0.36 to enter the wide channel. In the 50th-minute processing cycle, after implementing the above strategy, the processing efficiency of fine channel segments improved by 12%, the semantic backlog phenomenon was significantly alleviated, and the risk scale curve steadily decreased to 0.61, indicating that the strategy adjustment was effective. Through this traffic distribution mechanism based on trajectory feedback, the semantic flow path possesses adaptability, self-adjustment capability, and load balancing capability, forming a highly fault-tolerant and highly stable semantic input structure in a multilingual high-frequency data processing environment.
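The two adjustment rules can be sketched as an update of a small policy record. The `Policy` type and its field names are hypothetical. The sketch treats the relaxed low-risk limits as inclusive so that the 4.2 / 0.36 edge example is admitted, and it omits the accompanying increase in medium-risk migration to the wide channel, for which the description gives no value.

```python
from dataclasses import dataclass, replace
from typing import Sequence


@dataclass(frozen=True)
class Policy:
    high_rhythm: float = 8.5  # rhythm limit for the high-risk class
    low_rhythm: float = 4.0   # rhythm limit for the low-risk class
    low_conf: float = 0.3     # confidence limit for the low-risk class


def adjust_policy(p: Policy, fine_loads: Sequence[float],
                  wide_idle_s: float) -> Policy:
    """Return the policy for the next cycle.
    fine_loads holds the fine-channel load per cycle (fraction of
    capacity); wide_idle_s is the cumulative wide-channel idle time."""
    # Fine channel overloaded for two consecutive cycles: raise the limit.
    if len(fine_loads) >= 2 and all(load > 0.95 for load in fine_loads[-2:]):
        p = replace(p, high_rhythm=9.0)
    # Wide channel idle for more than 120 s: relax the low-risk limits.
    if wide_idle_s > 120:
        p = replace(p, low_rhythm=4.2, low_conf=0.36)
    return p


overloaded = adjust_policy(Policy(), [0.96, 0.97], wide_idle_s=0)
idle = adjust_policy(Policy(), [0.50, 0.97], wide_idle_s=130)
```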

[0035] A reverse energy-absorbing curtain is deployed along the semantic diversion trajectory. By absorbing the excessively amplified residual semantic energy and releasing the masked weak semantic fragments, the distribution of semantic energy is balanced, and a semantic scale with balanced semantic risk is output. To alleviate the local polarization phenomenon caused by semantic energy accumulation within the channel and improve the balance and controllability of semantic output, this step involves deploying a reverse energy-absorbing curtain along the constructed semantic diversion trajectory. By absorbing high-energy residual fragments and releasing weak semantic fragments at the edges, a stable semantic scale is output. The process consists of the following steps: Based on the defined semantic diversion channels, a set of semantic segments that have completed preliminary processing but have not yet entered the semantic fusion stage is extracted from each semantic path, and the residual semantic energy within these segments is identified. Specifically, all segments processed in the previous cycle are extracted from the narrow and wide channels, and the excessive fluctuation range is identified by comparing the historical fluctuation trend of the rhythm values. For example, in the 58th-minute cycle, the narrow channel outputs a total of 987 segments, of which 184 still have high rhythm values after processing. Specifically, the rhythm value is 8.7 before processing and still rises to 9.3 after processing, with the confidence trace increasing from 0.86 to 0.91, indicating energy superposition. At the same time, there are 312 semantic segments in the wide channel with rhythm values below 3.0 and confidence traces below 0.28 that are not included in the analysis layer and are in a semantic marginal state. These segments are marked as absorption and release objects, and are numbered and archived for entry into the energy absorption management process.

[0036] Combining the time-segment density and risk distribution structure of each language on the semantic traffic trajectory, energy-absorbing segments are constructed along the time axis, and energy-absorbing windows are deployed at key energy concentration nodes. The energy-absorbing segments are established by statistically analyzing the average rhythm and variation amplitude of all semantic segments every ten seconds. If the rhythm fluctuation amplitude is higher than 5.0 and the confidence rise rate exceeds 0.07, the interval is identified as a candidate energy-absorbing segment. Taking the period from 58 minutes to 59 minutes as an example, in the English trajectory, the rhythm jumps continuously from 58 minutes 26 seconds to 58 minutes 38 seconds, the confidence of Arabic segments increases significantly, and the language distribution ratio also shows a shift towards high-risk languages, indicating that this period is a typical semantic energy accumulation area. Energy-absorbing curtains are deployed within the corresponding time period to reduce high-energy residual segments and replenish missed weak semantic segments.
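The ten-second screening for candidate energy-absorbing segments can be sketched as follows. The sketch interprets the rhythm fluctuation amplitude as the maximum minus the minimum rhythm value within a ten-second bucket, and the confidence rise rate as the last confidence value minus the first; both interpretations are assumptions, since the description does not define these quantities.

```python
from typing import Dict, List, Tuple


def candidate_windows(samples: List[Tuple[int, float, float]],
                      bucket_s: int = 10,
                      min_amplitude: float = 5.0,
                      min_conf_rise: float = 0.07) -> List[Tuple[int, int]]:
    """samples: (second, rhythm, confidence) tuples sorted by second.
    Returns (start_second, end_second) for each bucket that qualifies
    as a candidate energy-absorbing segment."""
    buckets: Dict[int, List[Tuple[float, float]]] = {}
    for sec, rhythm, conf in samples:
        buckets.setdefault(sec // bucket_s, []).append((rhythm, conf))

    windows = []
    for b in sorted(buckets):
        values = buckets[b]
        rhythms = [r for r, _ in values]
        amplitude = max(rhythms) - min(rhythms)   # assumed definition
        conf_rise = values[-1][1] - values[0][1]  # assumed definition
        if amplitude > min_amplitude and conf_rise > min_conf_rise:
            windows.append((b * bucket_s, (b + 1) * bucket_s))
    return windows


# Made-up samples: seconds 20-29 fluctuate strongly, seconds 30-39 do not.
windows = candidate_windows([
    (20, 3.0, 0.80), (25, 9.0, 0.90), (29, 8.5, 0.92),
    (30, 5.0, 0.50), (35, 5.5, 0.52),
])
```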

[0037] Within the deployed energy-absorbing sections, semantic energy suppression is applied to identified high-energy segments. The energy suppression method involves extending the parsing time of the semantic segment within the processing path and inserting low-intensity semantic segments before and after it, constructing a rhythm buffer zone to achieve rhythm smoothing. Taking the English segment EN_2749 as an example, its rhythm value is 9.5 and confidence value is 0.93. After entering the energy-absorbing curtain, it does not directly advance to the next stage. Instead, the simplified Chinese segment ZH_2187 ("Your selected time period is invalid") is added before it, with a rhythm value of 2.6 and confidence value of 0.19. Following it is the Spanish segment ES_1543 ("No se encontraron datos coincidentes"), with a rhythm value of 3.1 and confidence value of 0.23. These three segments form a rhythm-neutralizing section surrounding the high-energy segment. This structure ensures that high-energy content is rhythmically surrounded and energy buffered at the semantic output layer, effectively preventing it from having a semantic dominance effect or energy surge effect on the entire output channel.
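The rhythm-neutralizing arrangement (one low-intensity segment before and one after each high-energy segment) can be sketched as a queue transformation. The rhythm cutoff `high_rhythm` that marks a segment as high-energy is an assumption, and the extension of parsing time described above is not modeled here.

```python
from typing import List, Tuple

Seg = Tuple[str, float]  # (segment id, rhythm value)


def neutralize(queue: List[Seg], low_pool: List[Seg],
               high_rhythm: float = 9.0) -> List[Seg]:
    """Surround each high-energy segment in queue with two low-intensity
    segments taken in order from low_pool. If fewer than two remain,
    the high-energy segment passes through unchanged."""
    pool = list(low_pool)  # copy so the caller's list is not modified
    out: List[Seg] = []
    for seg in queue:
        if seg[1] > high_rhythm and len(pool) >= 2:
            before, after = pool.pop(0), pool.pop(0)
            out.extend([before, seg, after])
        else:
            out.append(seg)
    return out


section = neutralize(
    [("EN_2749", 9.5)],
    [("ZH_2187", 2.6), ("ES_1543", 3.1)],
)
```

For the example above, the output order is ZH_2187, EN_2749, ES_1543, which gives the low-high-low pattern described in the paragraph.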

[0038] During the semantic energy buffering process, weak semantic segments in the wide channel that have not been parsed but possess semantic structural features are released opportunistically to fill the expression gaps in the semantic channel. The release rule is as follows: identify statements in low-rhythm segments that contain command expressions, negation logic, compliance vocabulary, or edge expressions, and insert them into the gap position after the high-rhythm segment buffer. For example, the Simplified Chinese segment ZH_2791 ("Failed to verify identity information"), with a rhythm value of 2.1 and a confidence value of 0.17, is inserted into the processing queue in the time gap that appears after the high-rhythm Arabic segment is suppressed, and its content is released before the semantic scale is generated. This insertion does not change the output rhythm structure, but enhances semantic diversity, so that the final semantic measurement space contains more low-intensity expressions that might otherwise be ignored. In language fusion-type risk identification scenarios, this appropriate release of weak semantic segments can improve the model's sensitivity to ambiguity, avoidance, and negative expressions.

[0039] After suppressing high-energy segments and releasing weak semantic segments, the semantic rhythm spectrum and confidence density curve for that time period are reconstructed, and a semantic scale is generated based on the adjusted rhythm dynamic range. This semantic scale is a joint curve reflecting the balance of language proportions, the stability of rhythm changes, and the rationality of confidence fluctuations. Taking the final semantic scale at minute 59 as an example, after the reverse energy absorption curtain effect, the rhythm mean decreased from 6.3 to 5.4, the rhythm variance decreased from 1.8 to 1.1, the confidence density distribution tended to be normal, the language proportions approached one-third each, and there were no more excessive upward or downward spikes in the curve. The final generated semantic scale will serve as the benchmark reference line for subsequent dynamic control mechanisms, running through the entire process of rhythm track switching, channel reallocation, and language priority switching.

[0040] Using a semantic scale as the core, a breathing-style gravity balance wheel is activated to dynamically adjust the channel width and flow rate of the semantic channel according to the real-time language density, and to perform rhythmic track switching and energy yielding, thereby completing the continuous suppression and dynamic regulation of multilingual semantic resonance and forming a closed-loop system of semantic energy balance. To ensure rhythmic balance and energy stability during the dynamic operation of multilingual semantic flow, a breathing-style weight distribution structure, driven by a semantic scale, is introduced, building upon risk scaling control and energy distribution based on a semantic scale. This structure dynamically adjusts the channel width and flow rate of the semantic channel according to real-time language density, thereby achieving rhythmic shifting and energy allocation, and constructing a closed-loop semantic energy balance system. Specifically, this includes the following steps: Based on the output semantic scale, the input density, rhythm intensity fluctuation range, and confidence value variation of each language in the semantic channel are monitored within the current period. These real-time parameters are then compared one by one with the ideal rhythm reference range for each language recorded in the semantic scale to identify the offset trend in the semantic structure. Taking the 63rd minute as an example, the fine channel processes a total of 824 semantic segments, of which Arabic segments account for 41%, with their rhythm values concentrated between 8.4 and 9.2 and confidence value fluctuations ranging from 0.08 to 0.12, far exceeding the upper limit of the ideal Arabic rhythm range of 7.5 set by the semantic scale, exhibiting a typical high-frequency polarization state. At the same time, the proportion of English segments in the wide channel drops to 20%, while their average rhythm value is only 3.1, indicating that weak semantic segments are over-compressed. This abnormal difference will be marked as a risk offset signal and used as a trigger for subsequent structural adjustments.
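The comparison of observed rhythm ranges against the ideal ranges recorded in the semantic scale can be sketched as follows. The sketch flags a language when its entire observed rhythm range lies above the ideal upper limit, which is one possible reading of "far exceeding"; the English limit used in the example is made up, as the description gives only the Arabic limit of 7.5.

```python
from typing import Dict, Tuple


def offset_signals(observed: Dict[str, Tuple[float, float]],
                   ideal_upper: Dict[str, float]) -> Dict[str, float]:
    """observed maps a language to its (low, high) rhythm range in the
    current period; ideal_upper maps a language to the upper limit of its
    ideal rhythm range. Returns, for each flagged language, how far the
    top of the observed range lies above the limit."""
    signals = {}
    for lang, (low, high) in observed.items():
        limit = ideal_upper.get(lang)
        if limit is not None and low > limit:
            signals[lang] = high - limit
    return signals


signals = offset_signals(
    {"ar": (8.4, 9.2), "en": (2.9, 3.3)},
    {"ar": 7.5, "en": 6.0},  # the "en" limit is a hypothetical value
)
```

Only Arabic is flagged here; detecting the over-compression of weak English segments described above would need a separate lower-bound check, which this sketch does not include.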

[0041] Upon receiving the offset signal, the first stage of the breathing-type specific gravity balance structure is immediately triggered, performing an instantaneous reconstruction of the semantic channel bandwidth ratio. The core of the bandwidth ratio adjustment lies in reclassifying the number of semantic segments allowed to pass through the semantic channel per unit time according to risk density. In this operation, Arabic segments, due to their excessively high rhythm, are reconfigured to an independent narrow channel, with their processing bandwidth reduced from the original 130 segments per second to 100 segments per second. The minimum interval between semantic segments is set to 200 milliseconds to avoid semantic accumulation. Conversely, Simplified Chinese and English segments, due to their stable rhythm and low confidence, are configured to a newly established wide channel, with their processing bandwidth increased to 340 segments per second. The semantic interval threshold restriction is removed to accelerate the flow of weak semantic segments. In actual processing, within the last 10 seconds of the 63rd minute, Arabic semantic segments experienced a reduction of 9 high-rhythm bursts, while the average output rate of English segments increased by 18.7%, indicating that the structural adjustment achieved its buffering purpose. The breathing-like regulation process allows semantic channels to automatically transfer load according to the expansion and contraction of risk rhythms, thereby reducing channel blockage and semantic imbalance caused by the accumulation of linguistic energy.

[0042] After completing the channel width reconstruction, the second stage of the breathing-style specific gravity balance wheel is entered, focusing on semantic rhythm track switching and semantic energy yielding operations. Rhythm track switching refers to removing semantic segments with sudden increases or decreases in rhythm from the current main channel path and reordering them in buffer channels or bypass tracks to interrupt the high-frequency resonance structure. For example, the Arabic sentence segment numbered AR_3261, with a rhythm value of 9.5 and a confidence value of 0.96 at 63 minutes and 40 seconds, is in a state of significant overload and is temporarily transferred to an additional track. After a 2-second delay, it is interleaved with two English sentence segments with rhythm values of 3.2 and 3.5, forming a low-high-low rhythmic broken line structure, significantly reducing semantic impact. Semantic energy yielding refers to prioritizing low-risk segments in the output sequence before structural impact occurs, in order to pre-occupy rhythmic beats. For example, the Chinese segment numbered ZH_3107 has a rhythm value of 2.7. Originally located at the end of the main path, it enters the main output stream earlier when high-risk paths are buffered, forming a buffered segment with semantic rhythm overload, thus playing a harmonizing role structurally. This mechanism makes the semantic content more evenly distributed in the output rhythm, avoiding the phenomenon of unbalanced semantic mapping direction caused by the sudden entry of high-rhythm language.

[0043] After completing the rhythm shift and energy yielding operations, the system enters the closed-loop control phase. The effects of the structural adjustments are quantified in real time, and the evaluation results are fed back to the semantic scale model to update the channel baseline. The evaluation includes the output proportion of all languages within the current cycle, the standard deviation of rhythm stability, the rate of change of confidence values, the channel switching frequency, and the continuity of the rhythm trajectory. Taking the evaluation results at the 64th minute as an example, the average Arabic rhythm decreased from 9.0 to 8.1, the confidence value fluctuation converged from ±0.09 to ±0.05, the semantic channel structure completed three wide-narrow switches within 32 seconds, the language distribution balance increased to 92%, and the overall output rhythm curve tended to be linear without significant jumps. These evaluation results are simultaneously updated to the semantic scale and used to guide channel priority configuration and rhythm queue sorting in the next cycle. Compared to the shortcomings of existing language recognition technologies that lack emergency scheduling mechanisms for semantic overload, this implementation step, through dynamic structural reorganization and real-time rhythm adjustment, not only ensures the stability of semantic output, but also forms a self-correcting and self-adapting closed-loop control system, enabling the semantic model to cope with sudden energy fluctuations and language structure tilts in high-density multilingual semantic fusion scenarios.

[0044] This invention establishes an observable semantic energy feature foundation by introducing semantic rhythm spectrum and language distribution map, making the risk shift caused by uneven language distribution quantifiable. Furthermore, through innovative structural designs such as rhythm buffering, semantic diversion, reverse energy absorption, and a weighted balance wheel, semantic energy is coordinated and regulated in both the time and channel domains, forming an adaptive and self-balancing semantic flow mechanism. This significantly improves the accuracy and recall of identifying high-risk texts in multilingual environments, while reducing false positives and processing latency. This method not only enhances the ability to identify risky expressions in low-resource languages but also improves the fairness and stability of cross-language information processing. Ultimately, it constructs the core capabilities of a cross-border data trust assessment platform that can run offline, be deployed locally, and possess language energy governance capabilities.

[0045] The foregoing has only described certain exemplary embodiments of the present invention by way of illustration. Undoubtedly, those skilled in the art can modify the described embodiments in various ways without departing from the spirit and scope of the present invention. Therefore, the foregoing drawings and descriptions are illustrative in nature and should not be construed as limiting the scope of protection of the claims of the present invention.

Claims

1. A cross-border data credibility assessment method based on multimodal features, characterized in that it includes the following steps: collecting cross-language text streams, extracting sentence rhythm signals and confidence trace data along the time axis, compressing the language fluctuation features of different languages, generating a unified semantic rhythm spectrum, and establishing a basic rhythm line for subsequent semantic energy analysis; constructing a language distribution map based on the semantic rhythm spectrum, marking the semantic energy peaks and semantic energy gaps of each language on the language distribution map, and converting the semantic energy differences into quantifiable risk scale lines to identify semantic imbalances caused by uneven language distribution; establishing a rhythmic buffer zone at the cross-language text input end based on the risk scale line, and adjusting the input rhythm of semantic segments so that high-intensity semantic segments are delayed in entering the processing channel while low-intensity semantic segments enter the processing channel in advance, forming a continuous and controllable rhythmic rearrangement sequence; using the rhythmic rearrangement sequence to drive semantic diversion valves, so that high-risk semantic segments are guided into narrow channels while low-risk semantic segments are guided into wide channels, forming stable semantic diversion trajectories, thereby reducing the risk of semantic energy accumulation and balancing multilingual semantic flows; deploying a reverse energy-absorbing curtain along the semantic diversion trajectory, absorbing the excessively amplified residual semantic energy and releasing the masked weak semantic fragments so that the distribution of semantic energy is balanced, and outputting a semantic scale with balanced semantic risk; and, using the semantic scale as the core, activating a breathing-style gravity balance wheel to dynamically adjust the channel width and flow rate of the semantic channel according to the real-time language density, and to perform rhythmic track switching and energy yielding, thereby completing the continuous suppression and dynamic regulation of multilingual semantic resonance and forming a closed-loop system of semantic energy balance.

2. The cross-border data credibility assessment method based on multimodal features according to claim 1, characterized in that the steps for generating the semantic rhythm spectrum are as follows: collecting cross-lingual text information streams composed of natural language, preserving language attributes and temporal order, and forming a continuous language stream sequence; segmenting the text information stream based on language structure rules to form a time-series sentence set with language attributes and time tags; extracting the rhythm signals and confidence traces of each sentence segment to generate structured data; compressing all rhythm signals and confidence trace data under a unified metric standard to construct the semantic rhythm spectrum; and extracting the central change trend line from the semantic rhythm spectrum to form the basic rhythm line.

3. The cross-border data credibility assessment method based on multimodal features according to claim 2, characterized in that the steps for generating the risk scale line are as follows: classifying and integrating the rhythm data of each language and uniformly projecting them onto a two-dimensional language distribution coordinate map to construct a time-series semantic trajectory curve; identifying semantic energy peaks and semantic energy gaps in the language distribution coordinate map, and labeling rhythm intensity values, confidence trace values, and duration information; constructing the risk scale line by combining language usage frequency, and comparing it with the basic rhythm line to obtain a graph of semantic energy deviation and multilingual risk; and locating the semantic imbalance region by identifying the jump points of the risk scale curve, and judging the abnormal fluctuations caused by uneven language distribution by combining the changes in the language ratio.

4. The cross-border data credibility assessment method based on multimodal features according to claim 3, characterized in that the process of forming the rhythmic rearrangement sequence is as follows: extracting rhythm intensity and confidence trace values from cross-linguistic semantic segments and classifying the segments into high-intensity, medium-intensity, and low-intensity groups; setting separate input rhythm adjustment strategies for the different intensity groups, so as to delay the processing of high-intensity segments, advance the processing of low-intensity segments, and process medium-intensity segments in their original order; constructing a semantic rhythmic rearrangement sequence, controlling the output interval of each segment, and maintaining the integrity of semantic content, language, structural features, and temporal information; and performing a closed-loop evaluation on the output effect of the rhythmic rearrangement sequence, and dynamically optimizing the rhythm buffering strategy based on the semantic clustering frequency and the risk scale response trend.

5. The cross-border data credibility assessment method based on multimodal features according to claim 4, characterized in that the output interval of the rhythm rearrangement sequence controls the processing rhythm of the semantic segments through a set minimum processing interval and a set maximum processing interval, and a stable input ratio is maintained for each language to prevent any single language from dominating the input stream.
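
The two constraints of claim 5 can be sketched in Python as follows. The sketch assumes that the output interval is clamped to the configured bounds and that the next segment is drawn from the language whose emitted share lags furthest behind a target ratio. The function names, the `target_ratio` mapping, and the selection rule are hypothetical; the claim states only the goals, not the mechanism.

```python
def clamp_interval(proposed, min_interval, max_interval):
    """Keep the output interval within [min_interval, max_interval]."""
    return max(min_interval, min(proposed, max_interval))


def pick_language(emitted, target_ratio, pending):
    """Choose the language whose emitted share is furthest below its target.

    emitted:      language -> number of segments already output
    target_ratio: language -> desired share of the input stream (assumed)
    pending:      language -> number of segments still waiting
    Returns None when nothing is pending.
    """
    total = sum(emitted.values()) or 1
    candidates = [lang for lang, n in pending.items() if n > 0]
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda lang: target_ratio[lang] - emitted.get(lang, 0) / total,
    )
```

For example, with eight English and two Chinese segments already emitted against an equal target ratio, `pick_language` selects Chinese next, which keeps English from dominating the stream.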

6. The cross-border data credibility assessment method based on multimodal features according to claim 4, characterized in that the semantic diversion trajectory is formed by the following steps: based on the rhythm rearrangement sequence, extracting the rhythm intensity value and confidence trace value of each semantic segment, and dividing the segments into risk levels in combination with the risk scale line; directing high-risk semantic segments to narrow channels and low-risk semantic segments to wide channels, and dynamically scheduling medium-risk semantic segments according to the load of the narrow channels; pushing the semantic segments with assigned channel labels to the corresponding processing paths in chronological order, and scheduling them sequentially according to language distribution; constructing a semantic diversion trajectory record table to periodically store the mean processing rhythm, confidence value, and latency data of each channel; and, based on feedback from the semantic diversion trajectory record table, dynamically adjusting the channel allocation strategy to optimize the rhythm intensity division criteria and the channel migration ratio.
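
The channel assignment of claim 6 can be sketched in Python under stated assumptions. The risk score (rhythm intensity multiplied by confidence), the `margin` around the risk scale line value, and the fixed `narrow_capacity` are all hypothetical; the claim says only that risk levels are divided in combination with the risk scale line and that medium-risk segments follow the narrow-channel load.

```python
def risk_level(intensity, confidence, scale_value, margin=0.2):
    """Divide risk levels relative to the risk scale line value.

    Assumption: the score is intensity * confidence, compared against
    scale_value +/- margin.
    """
    score = intensity * confidence
    if score >= scale_value + margin:
        return "high"
    if score <= scale_value - margin:
        return "low"
    return "medium"


def assign_channel(level, narrow_load, narrow_capacity):
    """High risk -> narrow, low risk -> wide, medium risk -> by narrow load."""
    if level == "high":
        return "narrow"
    if level == "low":
        return "wide"
    return "narrow" if narrow_load < narrow_capacity else "wide"


def route(segments, scale_value, narrow_capacity=2):
    """Push segments to their channels in chronological order.

    The narrow-channel load here is simply the number of segments already
    routed to it; a real system would also drain the channel over time.
    """
    narrow, wide = [], []
    for seg in sorted(segments, key=lambda s: s["timestamp"]):
        level = risk_level(seg["intensity"], seg["confidence"], scale_value)
        channel = assign_channel(level, len(narrow), narrow_capacity)
        (narrow if channel == "narrow" else wide).append(seg["id"])
    return narrow, wide
```

The per-channel lists returned by `route` are the kind of data from which the record table's rhythm mean, confidence value, and latency statistics could be computed.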

7. The cross-border data credibility assessment method based on multimodal features according to claim 6, characterized in that the semantic scale is output by the following steps: extracting the set of processed semantic segments from the semantic diversion channels, and identifying semantic energy residual segments and weak semantic segments based on the historical fluctuation trend of the rhythm values; constructing energy absorption sections along the time axis by combining the time period density and the risk distribution structure on the semantic diversion trajectory, and deploying energy absorption windows at the semantic energy aggregation nodes; within each energy absorption section, applying semantic energy suppression processing to high-energy semantic segments, wherein a rhythm buffer zone is formed by extending the parsing time and inserting low-intensity semantic segments before and after the high-energy semantic segments; releasing weak semantic segments at appropriate times within the rhythm buffer zone, and inserting low-rhythm segments having semantic structure features into the gaps in the processing queue; and constructing a semantic rhythm spectrum and a confidence density curve based on the rhythm dynamic range obtained after the energy suppression and the release of the weak semantic segments, and generating the semantic scale.
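
The rhythm buffer zone of claim 7 can be illustrated with a short Python sketch. It models only the insertion of low-intensity segments before and after each high-energy segment; extending the parsing time and timing the release of weak segments are not modeled. Segments are represented by their intensity values alone, and the `high` threshold and the function name are assumptions.

```python
from collections import deque


def build_buffer_zone(queue, weak_pool, high=0.7):
    """Surround each high-energy segment with weak segments.

    queue:     intensities of segments in processing order
    weak_pool: intensities of weak segments available for release
    Leftover weak segments are appended at the end of the queue.
    """
    weak = deque(weak_pool)
    out = []
    for intensity in queue:
        if intensity >= high:
            if weak:
                out.append(weak.popleft())  # buffer before the peak
            out.append(intensity)
            if weak:
                out.append(weak.popleft())  # buffer after the peak
        else:
            out.append(intensity)
    out.extend(weak)
    return out
```

Applied to a queue with one peak, the sketch places one weak segment on each side of the peak, which lowers the local density of high-energy input.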

8. The cross-border data credibility assessment method based on multimodal features according to claim 7, characterized in that, with the semantic scale as the core, a breathing-style gravity balance wheel is activated to adjust the width and flow rate of the semantic channels according to the real-time language density, and to perform rhythm track switching and energy yielding, so as to construct a closed-loop structure for semantic energy balance, by the following steps: based on the semantic scale, monitoring the input density, rhythm intensity fluctuation range, and confidence value change range of each language in the semantic channels, and identifying the offset trend in the semantic structure; upon receiving a risk offset signal, reconstructing the semantic channel bandwidth ratio in real time, wherein high-frequency languages are assigned to independent narrow channels with a set minimum interval time between semantic segments, and weak semantic languages are assigned to wide channels with increased processing bandwidth; after the channel width reconstruction is completed, performing semantic rhythm track switching and semantic energy yielding operations to push burst rhythm segments into the buffer channel and to output low-risk segments in advance to occupy the rhythm beats; and quantitatively evaluating the effects of the channel structure adjustment, and feeding the evaluation results back to the semantic scale model to update the channel baseline.
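
The bandwidth reconstruction step of claim 8 is not specified numerically. As one hedged illustration, the Python sketch below allocates each language a bandwidth share inversely proportional to its input density, so that high-frequency languages receive narrower channels and weak languages receive wider ones. The inverse-proportional rule, the function name, and the `total_bandwidth` parameter are assumptions, not the claimed method.

```python
def rebalance_bandwidth(density, total_bandwidth=1.0):
    """Split bandwidth across languages inversely to their input density.

    density: language -> real-time input density (must be > 0)
    Returns language -> bandwidth share; shares sum to total_bandwidth.
    """
    inverse = {lang: 1.0 / d for lang, d in density.items()}
    norm = sum(inverse.values())
    return {lang: total_bandwidth * v / norm for lang, v in inverse.items()}


# Illustrative densities only: one dominant and one weak language.
shares = rebalance_bandwidth({"en": 0.75, "vi": 0.25})
```

Here the dominant language ends up with a quarter of the bandwidth and the weak language with three quarters; in a closed loop, the function would be re-run each time a risk offset signal arrives and the result compared against the channel baseline.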