Mongolian machine translation and approximate search integrated platform based on big data features

By leveraging multi-granularity feature extraction and real-time interactive processing through an integrated platform, the problem of separation between Mongolian translation and retrieval has been solved, achieving deep semantic collaboration, improving the accuracy and relevance of Mongolian information processing, and enhancing user satisfaction and information acquisition efficiency.

CN122242536A · Pending · Publication Date: 2026-06-19 · INNER MONGOLIA MENKSOFT SOFTWARE

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: INNER MONGOLIA MENKSOFT SOFTWARE
Filing Date: 2026-03-24
Publication Date: 2026-06-19

Application Information

IPC: G06F40/58; G06F16/334; G06F16/3329; G06F18/213; G06F18/25; G06F18/22; G06F40/30; G06F40/284; G06N3/0985; G06N7/01; G06N3/0464; G06N3/045; G06N5/022; G06N5/04

AI Tagging

Application Domain

Natural language translation; Mathematical models

AI Technical Summary

Technical Problem

Existing Mongolian information processing systems suffer from a disconnect between translation and retrieval, a lack of deep semantic collaboration, and insufficient adaptation to the characteristics of the Mongolian language, resulting in inadequate translation accuracy and relevance of retrieval results.

Method used

An integrated platform for Mongolian machine translation and approximation retrieval based on big data features is adopted. Parallel analysis is performed through a multi-granularity feature extraction module for Mongolian, combined with an end-to-end integrated processing architecture and a two-way feedback learning module for retrieval and translation, to achieve real-time interaction and dynamic adjustment between translation and retrieval, and output comprehensive translation and retrieval results.

Benefits of technology

The platform improves the accuracy of Mongolian translation and the relevance of search results, adapts better to the characteristics of the Mongolian language, and improves user satisfaction and the efficiency of information acquisition.

✦ Generated by Eureka AI based on patent content.

Abstract

This invention relates to the field of Mongolian language retrieval platform technology, and discloses an integrated platform for Mongolian machine translation and approximation retrieval based on big data features. The platform turns machine translation and approximation retrieval from a sequential process into a parallel, collaborative system. Its semantic hub, based on cross-task collaborative optimization rules, feeds keywords returned in real time by retrieval into translation decoding to guide attention, and uses intermediate translation states to generate enhanced retrieval query vectors, so that the translation generation strategy and the retrieval target preference adjust each other dynamically. Furthermore, by introducing a translation-retrieval consistency adversarial loss for joint training, the optimization objective itself drives the model to produce translations that are both fluent and easy to retrieve. This deep collaborative mechanism reduces semantic gaps and information loss between modules, improving translation quality and the relevance of retrieval results at the same time.

Description

Technical Field

[0001] This invention relates to the field of Mongolian language retrieval platform technology, and in particular to an integrated platform for Mongolian machine translation and approximation retrieval based on big data features.

Background Technology

[0002] With the acceleration of digitalization and the growing demand for multilingual information interaction, Mongolian, one of China's important minority languages, faces increasingly prominent and fine-grained demands for information technology applications in areas such as government services, culture and education, healthcare, and news media. For example, in government services, policies and regulations need to be translated accurately into Mongolian, and grassroots communities need to be able to find the relevant interpretations; in culture and education, the vast body of digitized ancient texts urgently requires intelligent retrieval so that the knowledge in them can be put to use; and in cross-regional medical collaboration, accurate translation and rapid retrieval of medical records are crucial to treatment efficiency and safety. However, Mongolian resources are widely dispersed, unevenly standardized, and short of high-quality bilingual corpora. Furthermore, the complex morphology of the language (such as connected writing, the absence of spaces, and a rich array of case particles) poses unique challenges for automated processing.

[0003] In existing technical solutions, Mongolian information processing is typically decomposed into two independent tasks, machine translation and approximation retrieval, implemented as a cascaded pipeline. For machine translation, mainstream technologies focus on improving neural network models, for example by employing attention mechanisms, introducing external memory units, or using transfer learning and back-translation to construct pseudo-parallel corpora that alleviate the scarcity of Mongolian-Chinese bilingual data. The core objective is to improve the fluency and fidelity of the translation. Approximation retrieval generally relies on post-processing of the translated text, such as keyword-based Boolean matching, semantic similarity calculation based on word or sentence vectors, and learning-to-rank models that fuse multiple features to reorder results. While these technologies are continuously optimized at their respective stages, they remain essentially linear "translate first, then retrieve" processes. Even research that attempts to incorporate retrieval signals into the translation model, or to simply expand the query before retrieval, has not overcome the architectural limitation of loose coupling between modules, and has not produced an integrated design oriented towards unified semantic understanding and task collaboration.

[0004] The existing technologies described above often suffer from the following shortcomings. First, the translation and retrieval processes are disconnected and lack deep semantic collaboration. The existing "translate first, retrieve later" sequential model leads the translation module to prioritize fluent translations, whose output may not be the most retrieval-friendly form, while the retrieval module passively accepts translations that may contain semantic biases or information loss. The two have inconsistent goals and lack real-time feedback and joint optimization mechanisms. This disconnect causes loss and distortion of semantic information as it passes between modules, ultimately affecting the relevance and accuracy of retrieval results. Second, adaptation to the characteristics of the Mongolian language is insufficient, and multi-granularity feature fusion is lacking. Existing systems often treat Mongolian the same as languages like Chinese. They do not design deep, collaborative feature extraction and representation mechanisms spanning characters, morphemes, syntax, and semantics that account for its unique attributes, such as the absence of spaces, complex morphological changes, and strong syntactic-semantic dependencies. As a result, the system's understanding of Mongolian queries remains superficial and fails to capture their deep semantic structure and intent, which limits both translation accuracy and the precision of semantic alignment in cross-language retrieval.

Summary of the Invention

[0005] The technical problem to be solved by this invention is that the existing technology has the disadvantages of being fragmented in translation and retrieval processes, lacking deep semantic collaboration, insufficient adaptation to the characteristics of Mongolian language, and lacking multi-granularity feature fusion. To this end, we propose an integrated platform for Mongolian machine translation and approximation retrieval based on big data features.

[0006] To achieve the above objectives, this application adopts the following technical solution: an integrated platform for Mongolian machine translation and approximation retrieval based on big data features, comprising:
a Mongolian multi-granularity feature extraction module, used to perform parallel character-level, morpheme-level, and syntactic-semantic-level analysis on the input Mongolian query text, generating a deep feature vector that integrates multi-level linguistic features;
an end-to-end integrated processing architecture module, which has a built-in semantic hub that connects and synchronously drives a machine translation engine submodule and an approximation retrieval engine submodule, the semantic hub being configured to receive the deep feature vector and, based on a cross-task collaborative optimization rule, enable the translation generation process of the machine translation engine submodule and the retrieval process of the approximation retrieval engine submodule to interact and adjust dynamically in real time, and to output a comprehensive list that fuses translation results and retrieval results; and
a retrieval-translation bidirectional feedback learning module, used to collect user interaction feedback data on the comprehensive list and, based on the feedback data, to jointly optimize and iteratively update the optimization rules of the semantic hub, the model parameters of the machine translation engine submodule, and the feature weights of the Mongolian multi-granularity feature extraction module.

[0007] Furthermore, the Mongolian multi-granularity feature extraction module includes a character-level analysis unit, a morpheme-level parsing unit, and a syntactic-semantic-level understanding unit connected in sequence:
the character-level analysis unit is equipped with a Mongolian character morphology library and a space-free word segmentation model, which are used to perform character segmentation, normalization and concatenation processing on the input text, and to output character-level feature sequences;
the morpheme-level parsing unit is equipped with a Mongolian lexical analysis model and a domain dictionary, which are used to perform word segmentation, part-of-speech tagging and morphological information extraction on the character-level feature sequence, and to output morpheme-level feature vectors;
the syntactic-semantic-level understanding unit is equipped with a Mongolian dependency parser and a semantic role labeling model, which are used to perform syntactic analysis and semantic role labeling on the morpheme-level feature vectors and to output the deep feature vectors.

[0008] Furthermore, the character-level analysis unit also includes an adhesion processing subunit, which is used to segment and restore the identified adhesion character regions. The specific processing flow includes:
S1. Extract the skeleton from the image of the adhered region to generate candidate segmentation points;
S2. Construct a segmentation path hypothesis graph based on the candidate segmentation points, and assign initial weights to each edge in the graph based on geometric and pixel features;
S3. Convert the segmentation path hypothesis graph into a conditional random field model, and solve for the optimal set of segmentation paths by minimizing an energy function. The energy function combines a univariate potential function and a binary potential function: the univariate potential function is based on the initial weights, and the binary potential function constrains the morphological compatibility of adjacent segmentation paths;
S4. Cut along the optimal segmentation paths to obtain character component image blocks, use a pre-trained convolutional neural network to identify the components, and combine the results with the Mongolian character morphology library to restore them into legal character sequences.
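Steps S2 and S3 can be illustrated with a minimal sketch. It assumes, purely for illustration, that the candidate segmentation paths form a left-to-right chain, that each path carries a binary label (1 = cut, 0 = do not cut), and that the binary potential simply penalizes cutting along two adjacent paths. The function name `min_energy_labels`, the cost tables, and the weight `lam` are hypothetical and are not taken from the patent. Under these assumptions the energy can be minimized exactly by dynamic programming:

```python
from typing import List, Sequence, Tuple


def min_energy_labels(
    unary: Sequence[Sequence[float]],
    pairwise: Sequence[Sequence[Sequence[float]]],
    lam: float = 1.0,
) -> Tuple[List[int], float]:
    """Minimize sum_i unary[i][y_i] + lam * sum_i pairwise[i][y_i][y_{i+1}].

    unary[i][l]       -- cost of giving label l (0 or 1) to candidate path i
    pairwise[i][a][b] -- cost of labels (a, b) on adjacent paths i and i+1
    Assumes a chain of candidate paths (a hypothetical simplification).
    """
    n = len(unary)
    cost = [list(unary[0])]  # cost[i][l]: best energy of paths 0..i ending in l
    back = []                # back[i-1][l]: best previous label for label l
    for i in range(1, n):
        row, ptr = [], []
        for b in (0, 1):
            cands = [cost[i - 1][a] + lam * pairwise[i - 1][a][b] for a in (0, 1)]
            a_best = 0 if cands[0] <= cands[1] else 1
            row.append(cands[a_best] + unary[i][b])
            ptr.append(a_best)
        cost.append(row)
        back.append(ptr)
    last = 0 if cost[-1][0] <= cost[-1][1] else 1
    labels = [last]
    for ptr in reversed(back):  # follow back-pointers from the last path
        labels.append(ptr[labels[-1]])
    labels.reverse()
    return labels, cost[-1][last]


# Hypothetical costs: [cost of not cutting, cost of cutting] per candidate path.
unary = [[0.2, 1.0], [1.0, 0.1], [0.9, 0.3], [0.1, 1.2]]
# Penalize cutting along two neighbouring paths (component would be too narrow).
no_double_cut = [[0.0, 0.0], [0.0, 1.0]]
pairwise = [no_double_cut] * (len(unary) - 1)

labels, energy = min_energy_labels(unary, pairwise, lam=1.0)
print(labels, round(energy, 3))  # [0, 1, 0, 0] 1.3
```

Without the binary term the second and third paths would both be cut; the compatibility penalty keeps only the cheaper of the two. A real hypothesis graph need not be a chain, in which case a general conditional random field inference method would be required.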

[0009] Furthermore, the cross-task collaborative optimization rules include a three-stage mechanism: retrieval-guided translation attention enhancement, translation-aware retrieval semantic alignment, and adversarial learning-based joint loss optimization.
In the retrieval-guided translation attention enhancement stage, the semantic hub introduces the real-time keyword set returned by the approximation retrieval engine submodule into the translation decoding process through an added attention head, so as to bias translation generation towards expressions that are highly relevant to retrieval.
In the translation-aware retrieval semantic alignment stage, the semantic hub generates two retrieval query vectors through a two-route alignment mechanism: the first is obtained by direct mapping of the deep feature vector, and the second is obtained by mapping the intermediate state of the translation encoder. The two resulting retrieval similarity scores are fused dynamically based on the translation confidence.
In the adversarial learning-based joint loss optimization stage, a translation-retrieval consistency adversarial loss is constructed through a discriminator network and, together with the machine translation loss and the retrieval loss, forms a joint loss function that drives the translation model to generate translations that are both fluent and easy to retrieve.
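The confidence-based fusion of the two retrieval similarity scores can be sketched as follows. The patent text does not state the fusion formula, so this example assumes a simple convex combination weighted by the translation confidence; `cosine`, `fused_score`, and the toy vectors are hypothetical names and values used only for illustration.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def fused_score(
    q_direct: Sequence[float],
    q_translated: Sequence[float],
    doc: Sequence[float],
    confidence: float,
) -> float:
    """Blend the two route scores (assumed form, not the patent's formula).

    q_direct     -- query vector mapped directly from the deep feature vector
    q_translated -- query vector mapped from the translation encoder state
    confidence   -- translation confidence, clamped to [0, 1]
    """
    c = min(1.0, max(0.0, confidence))
    return (1.0 - c) * cosine(q_direct, doc) + c * cosine(q_translated, doc)


# Toy vectors: the direct route matches the document, the translated route does not.
doc_vec = [1.0, 0.0]
score = fused_score([1.0, 0.0], [0.0, 1.0], doc_vec, confidence=0.25)
print(score)  # 0.75
```

With a low translation confidence the direct route dominates, which matches the intent that an unreliable translation should not drag down retrieval.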

[0010] Furthermore, during training, the semantic hub minimizes the joint loss function by alternately optimizing the machine translation engine submodule, the approximation retrieval engine submodule, and the discriminator network.

[0011] Furthermore, the retrieval-translation bidirectional feedback learning module includes:
a feedback data collection submodule, used to collect explicit and implicit interaction data between the user and the comprehensive list;
a utility mapping analysis submodule, used to construct a feedback utility evaluation model based on the interaction data and to calculate the utility score of each result;
a backpropagation optimization submodule, used to jointly update the parameters of the semantic hub, the machine translation engine submodule, and the Mongolian multi-granularity feature extraction module through gradient backpropagation, guided by the utility score; and
a rule iteration submodule, used to analyze negative feedback patterns and trigger revisions to the weights or sub-rules in the cross-task collaborative optimization rules.
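As an illustration of the utility mapping analysis submodule, the sketch below turns one result's interaction data into a single utility score. The patent does not give the evaluation model, so the linear form, the weights, the dwell-time cap, and the names `Feedback` and `utility_score` are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Feedback:
    """Interaction data for one result (hypothetical field names)."""
    adopted: bool         # explicit signal: the user finally adopted this result
    dwell_seconds: float  # implicit signal: time spent on the result
    corrected: bool       # a manual correction was recorded for this result


def utility_score(
    fb: Feedback,
    w_adopt: float = 0.5,
    w_dwell: float = 0.4,
    w_correct: float = 0.3,
    dwell_cap: float = 120.0,
) -> float:
    """Assumed linear utility: reward adoption and dwell time, penalize corrections."""
    # Clamp dwell time to [0, dwell_cap] and normalize it to [0, 1].
    dwell = min(max(fb.dwell_seconds, 0.0), dwell_cap) / dwell_cap
    return (
        w_adopt * float(fb.adopted)
        + w_dwell * dwell
        - w_correct * float(fb.corrected)
    )


read_carefully = utility_score(Feedback(adopted=True, dwell_seconds=60.0, corrected=False))
needed_fixing = utility_score(Feedback(adopted=False, dwell_seconds=0.0, corrected=True))
print(read_carefully, needed_fixing)
```

A score of this kind could then serve as the guiding signal for the backpropagation optimization submodule; how it is converted into gradients is not specified in the text.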

[0012] Furthermore, it also includes an intelligent question answering and result generation module, which receives the comprehensive list, calls the platform's knowledge graph to perform intent reasoning and answer synthesis, and generates a structured answer output that matches the user's query intent.

[0013] Furthermore, the approximation retrieval engine submodule includes:
a multilingual document vector index library, which encodes documents in different languages into the same semantic vector space using a pre-trained multilingual model;
a semantic similarity calculation model, used to calculate the relevance between query vectors and document vectors; and
a result re-ranking subunit, used to integrate the relevance score, translation confidence, document timeliness, authority, and user preference factors, and to generate the final ranking through a learning-to-rank model.
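The re-ranking step can be sketched with a fixed linear scorer standing in for the learning-to-rank model. The five feature names follow the factors listed above; the weights, the candidate documents, and the function `rerank` are hypothetical, and in a real system the weights would be learned rather than set by hand.

```python
from typing import Dict, List

# Hand-set weights standing in for a learned ranking model (assumption).
WEIGHTS: Dict[str, float] = {
    "relevance": 0.5,
    "translation_confidence": 0.2,
    "timeliness": 0.1,
    "authority": 0.1,
    "user_preference": 0.1,
}


def rerank(candidates: List[dict], weights: Dict[str, float]) -> List[dict]:
    """Sort candidates by a weighted sum of their feature values, best first."""
    def score(candidate: dict) -> float:
        return sum(w * candidate["features"][name] for name, w in weights.items())
    return sorted(candidates, key=score, reverse=True)


candidates = [
    {"id": "A", "features": {"relevance": 0.9, "translation_confidence": 0.5,
                             "timeliness": 0.2, "authority": 0.5,
                             "user_preference": 0.1}},
    {"id": "B", "features": {"relevance": 0.7, "translation_confidence": 0.9,
                             "timeliness": 0.8, "authority": 0.6,
                             "user_preference": 0.5}},
]

ranked_ids = [c["id"] for c in rerank(candidates, WEIGHTS)]
print(ranked_ids)  # ['B', 'A']
```

Document A has the higher raw relevance, but B overtakes it once translation confidence and the other factors are taken into account, which is the effect the re-ranking subunit is meant to capture.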

[0014] Furthermore, in the cross-task collaborative optimization rule, the translation-retrieval consistency adversarial loss value used to drive the translation model to generate easily retrieved translations is obtained by the following formula:

[0015] where the symbols in the formula denote, in order: the number of Mongolian query sentences in the batch; the semantic vector of the query at a given index in the batch; the set of keyword vectors of the highly relevant documents returned by the retrieval engine; the generator of the machine translation engine; the discriminator; the semantic similarity function; and the adjustment coefficients.

[0016] Furthermore, in the adhesion processing subunit, the score for selecting the optimal segmentation path from the candidate segmentation path hypothesis graph is achieved by minimizing the following energy function:

[0017] where the symbols in the formula denote, in order: the input image of the adhered region; the sequence of path labels; the set of candidate split points and the set of potential split path edges; the univariate potential function; the binary potential function; and the balancing parameter.
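The formula itself does not survive in this text. Based only on the symbol descriptions in [0017] and on steps S2 and S3, a conventional conditional random field energy of the following shape would be consistent with them. All notation here is assumed rather than taken from the patent: x for the adhered image region, y for the path labels, V and E for the candidate split points and path edges, N for the pairs of edges that meet at a split point in V, and lambda for the balancing parameter.

```latex
E(\mathbf{y} \mid \mathbf{x})
  = \sum_{e \in \mathcal{E}} \psi_u\!\left(y_e ; \mathbf{x}\right)
  + \lambda \sum_{(e, e') \in \mathcal{N}} \psi_p\!\left(y_e, y_{e'} ; \mathbf{x}\right),
\qquad
\mathbf{y}^{*} = \arg\min_{\mathbf{y}} E(\mathbf{y} \mid \mathbf{x})
```

Here the univariate term would carry the initial edge weights from step S2 and the binary term the morphological compatibility of adjacent segmentation paths from step S3.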

[0018] The technical effects and advantages of this invention are as follows: This invention addresses the core issues of fragmented translation and retrieval processes and lack of deep semantic collaboration in existing technologies by constructing an end-to-end integrated processing architecture with a built-in semantic hub. This platform innovatively transforms machine translation and approximation retrieval from a sequential process into a parallel collaborative system. Its semantic hub, based on cross-task collaborative optimization rules, introduces keywords returned in real-time during translation and decoding to guide attention, and utilizes intermediate translation states to generate enhanced retrieval query vectors, achieving dynamic bidirectional adjustment of translation generation strategies and retrieval target preferences. Furthermore, by introducing translation-retrieval consistency adversarial loss for joint training, the model learns to produce translations that are both fluent and easy to retrieve, driven by optimization objectives. This deep collaborative mechanism effectively eliminates semantic gaps and information loss between modules, achieving a simultaneous improvement in translation quality and retrieval result relevance.

[0019] This invention addresses the shortcomings of existing systems in adapting to the complex linguistic characteristics of Mongolian, such as the absence of spaces and intricate morphology, and their lack of deep feature fusion, by designing a parallel multi-granularity feature extraction module for Mongolian. This module establishes three collaborating analysis units (character, morpheme, and syntactic-semantic). In particular, the character-level unit incorporates an algorithm that integrates image morphology and conditional random field models for adhesion processing, while the syntactic-semantic unit performs in-depth analysis of Mongolian's unique word order and case particles. This achieves accurate parsing and fusion representation from surface characters to a deep semantic framework. This provides downstream tasks with a unified deep feature vector rich in linguistic prior knowledge, fundamentally solving the semantic understanding bottleneck caused by shallow feature representation.

Attached Figure Description

[0020] The disclosure of this invention is illustrated with reference to the accompanying drawings. It should be understood that the drawings are for illustrative purposes only and are not intended to limit the scope of protection of this invention. In the drawings, the same reference numerals are used to refer to the same parts:
Figure 1 is a flowchart of the Mongolian multi-granularity feature extraction process of the present invention;
Figure 2 is a flowchart of the cross-task collaborative optimization process of the present invention;
Figure 3 is a flowchart of the retrieval-translation bidirectional feedback learning process of the present invention.

Detailed Implementation

[0021] It is readily understood that, based on the technical solution of this invention, those skilled in the art can propose various interchangeable structural methods and implementations without altering the essential spirit of the invention. Therefore, the following detailed embodiments and accompanying drawings are merely illustrative examples of the technical solution of this invention and should not be considered as the entirety of the invention or as limitations or restrictions on the technical solution of this invention.

[0022] As shown in Figures 1-3, this invention provides a technical solution: an integrated platform for Mongolian machine translation and approximation retrieval based on big data features, comprising: a Mongolian multi-granularity feature extraction module, which receives the input Mongolian query text and simultaneously launches its character-level analysis, morpheme-level parsing, and syntactic-semantic-level understanding units to extract and fuse multi-level features of the query text, from surface form to deep semantics, generating a deep feature vector carrying Mongolian language characteristic labels.

[0023] In this embodiment, the intelligent consultation system of a university's Mongolian digital library is used as an example. First, the user enters a Mongolian query through the interactive interface, meaning "How do I borrow books from the library?".

[0024] The query text is sent to the Mongolian multi-granularity feature extraction module. This module initiates three analysis units in parallel, performing a progressively deeper collaborative analysis of the query text. Ultimately, it generates a deep feature vector that comprehensively reflects the characteristics of the Mongolian query at multiple levels, including characters, vocabulary, syntax, and semantics. This vector serves as the unified semantic input for subsequent processing across the entire platform. Through parallel, multi-level feature extraction, this module achieves deep analysis of Mongolian, a complex morphological language written without spaces. Compared to traditional single-segmentation methods, the generated deep feature vector not only includes lexical information but also integrates syntactic structure and semantic roles, providing downstream translation and retrieval tasks with a unified representation rich in linguistic knowledge that goes far beyond surface-level word matching.

[0025] The end-to-end integrated processing architecture module is connected to the Mongolian multi-granularity feature extraction module. The architecture module has a built-in semantic hub to receive deep feature vectors and synchronously call the machine translation engine submodule and the approximation retrieval engine submodule. The semantic hub is configured with specific cross-task collaborative optimization rules, which are used to dynamically adjust the translation generation strategy based on the real-time retrieval tendency of the approximation retrieval engine submodule while the machine translation engine submodule is performing Mongolian-Chinese conversion. Based on the multilingual candidate results output by the machine translation engine submodule, it guides the approximation retrieval engine submodule to perform semantic space alignment and precise matching calculations, and outputs a comprehensive result list that integrates translation and retrieval confidence.

[0026] In the above scenario, the deep feature vector is fed into the semantic hub of the end-to-end integrated processing architecture module. The semantic hub immediately distributes this vector to both the machine translation engine submodule and the approximation retrieval engine submodule. Unlike the traditional sequential process of translating first and then retrieving, the semantic hub of this platform enables the two subprocesses to interact in real time based on built-in cross-task collaborative optimization rules. For example, while the translation submodule is translating a Mongolian query into Chinese, the retrieval submodule retrieves preliminarily relevant document titles (such as "Library Borrowing Guide" and "Mongolian Book Borrowing Rules") from the digital library's million-level document metadata index in real time and feeds back their keyword vector sets to the semantic hub. The semantic hub then uses this information to dynamically fine-tune the word selection bias of the translation, making the generated Chinese translation (such as "Library Book Borrowing Methods") closer to the expression habits of highly relevant documents. At the same time, the intermediate semantic representations during the translation process are also used to enhance the semantic representation of the retrieval query, thereby achieving more accurate cross-language matching. Finally, the module outputs a comprehensive result list that includes both high-quality translations and highly relevant documents and their matching degrees.
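The word-choice bias described in this scenario can be approximated with a much simpler mechanism than the one the patent names. The patent places an added attention head inside the decoder; the sketch below instead re-scores complete candidate translations by their overlap with the retrieved keywords, which is a swapped-in simplification. The function `pick_translation`, the candidate scores, and the weight `beta` are hypothetical.

```python
from typing import List, Set, Tuple


def pick_translation(
    candidates: List[Tuple[str, float]],
    keywords: Set[str],
    beta: float = 0.5,
) -> str:
    """Return the candidate with the best keyword-biased score.

    candidates -- (translation text, base model score such as a log-probability)
    keywords   -- lowercased keywords fed back by the retrieval submodule
    beta       -- strength of the retrieval bias (0 disables it)
    """
    def biased(item: Tuple[str, float]) -> float:
        text, base = item
        tokens = set(text.lower().split())
        # Fraction of the candidate's distinct tokens that are retrieval keywords.
        overlap = len(tokens & keywords) / max(len(tokens), 1)
        return base + beta * overlap
    return max(candidates, key=biased)[0]


candidates = [
    ("How to lend library volumes", -1.0),
    ("Library book borrowing methods", -1.1),
]
keywords = {"library", "book", "borrowing", "guide", "rules"}

print(pick_translation(candidates, keywords, beta=0.5))  # Library book borrowing methods
print(pick_translation(candidates, keywords, beta=0.0))  # How to lend library volumes
```

With the bias switched off, the candidate with the higher base score wins; with it on, the candidate whose wording matches the retrieved document titles is preferred, mirroring the "Library Book Borrowing Methods" example above.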

[0027] Specifically, in a traditional sequential process, translation errors lead directly to retrieval failures. This solution instead uses retrieval-guided translation attention enhancement and translation-aware retrieval semantic alignment so that the text is "translated for the purpose of searching," while the intermediate translation states improve retrieval robustness. This significantly improves the relevance of the final results, yielding a 17% improvement in MRR (Mean Reciprocal Rank) on the test set.
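For reference, MRR is conventionally expanded as mean reciprocal rank: for each query, take the reciprocal of the rank of the first relevant result (0 if none is returned), then average over all queries. The sketch below shows the computation; the document identifiers are made up for the example, and the patent does not describe its test set or evaluation protocol.

```python
from typing import List, Sequence, Set


def mean_reciprocal_rank(
    ranked_lists: Sequence[Sequence[str]],
    relevant_sets: Sequence[Set[str]],
) -> float:
    """Average of 1/rank of the first relevant document for each query."""
    if not ranked_lists:
        return 0.0
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_lists)


# Three toy queries: first hit at rank 2, at rank 1, and no hit at all.
rankings: List[List[str]] = [["a", "b", "c"], ["x", "y", "z"], ["p", "q"]]
relevant: List[Set[str]] = [{"b"}, {"x"}, {"missing"}]

mrr = mean_reciprocal_rank(rankings, relevant)
print(mrr)  # 0.5
```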

[0028] The retrieval-translation bidirectional feedback learning module, connected to the end-to-end integrated processing architecture module, is used to collect actual utility feedback data on the comprehensive result list from user interaction logs and result evaluation systems. The actual utility feedback data includes the identifier of the result the user finally adopted, the dwell time on each result, and manual correction records. The bidirectional feedback learning module constructs a mapping between the actual utility feedback data and the collaborative optimization rules of the semantic hub, the translation model parameters of the machine translation engine submodule, and the feature weights of the Mongolian multi-granularity feature extraction module, and performs periodic parameter tuning and rule iteration based on this mapping.

[0029] During the operation of the digital library system, the retrieval-translation bidirectional feedback learning module continuously operates. For example, the system discovered that for the above query, the user ultimately clicked on and adopted the entry in the results list associated with the PDF document "Rules for Borrowing Mongolian Books," and spent a considerable amount of time reading it. Simultaneously, the backend administrator labeled another translated result, "Borrowing Process," with a correction tag stating "more accurate expression." This feedback data was collected by this module. The utility mapping analysis submodule within the module calculates the utility score corresponding to each result, thereby driving the backpropagation optimization submodule. This optimization process adjusts the attention weights used for collaboration in the semantic hub, updates the parameters of the translation model to generate more similar expressions like "borrowing method," and even fine-tunes the analysis weights for Mongolian specific case particles in the feature extraction module, thus enabling the entire platform to perform better when processing similar queries in the future.

[0030] This module dynamically correlates implicit user behaviors (such as dwell time) with internal model parameters (such as collaborative attention weights). This allows the system to automatically identify and strengthen processing patterns that truly deliver high satisfaction, while also addressing weaknesses. Long-term operational data shows that after deploying this module, user satisfaction has consistently increased at a rate of approximately 5% per month, achieving a transformation from a "static model" to "living intelligence."

[0031] The intelligent question answering and result generation module is connected to the end-to-end integrated processing architecture module. It is used to receive a comprehensive result list and call the platform's knowledge graph to perform intent reasoning and answer synthesis. The module is equipped with a natural language generator, which transforms the comprehensive result list or knowledge graph reasoning results into structured answers or recommendation lists in Mongolian, Chinese or multilingual mixed form that meet the user's query intent, and outputs them through the interactive interface.

[0032] Continuing the previous example, the intelligent question answering and result generation module receives a comprehensive result list containing translations and highly relevant documents. The query intent analysis submodule first determines that the user's core intent is "inquiry about borrowing rules." Next, the knowledge graph query and reasoning submodule locates the "borrowing process" entity in the platform's constructed "library business knowledge graph" and extracts its attributes (e.g., a campus card is required, the borrowing period is 30 days) and association rules. Then, the multimodal answer generation submodule merges the structured information from the knowledge graph with relevant document fragments from the result list to generate a direct and accurate answer: "According to the 'Mongolian Book Borrowing Rules,' a valid campus card is required to borrow library books. The borrowing period for ordinary books is 30 days. For specific instructions, please visit the main service desk or refer to the above guidelines." Finally, the answer presentation and interaction submodule presents the answer to the user in clear Chinese (or Mongolian) paragraphs and prompts, "You can continue to ask questions about renewal or overdue issues."

[0033] This module represents a qualitative leap in service, moving from simply "returning a list of relevant documents" to "providing accurate, structured answers," resulting in a significant improvement in users' efficiency in obtaining information. By deeply integrating the deterministic facts of a knowledge graph with the relevant literature from a search engine, it generates answers that are both accurate and context-rich. Users no longer need to manually sift through and piece together information from multiple documents; the rate of obtaining a complete answer in a single query has increased by over 40%. Its support for continuous follow-up questions further simulates the interactive experience of a real-person consultation, upgrading a single search to a continuous intelligent dialogue service, greatly enhancing the platform's usability and user engagement.

[0034] The Mongolian multi-granularity feature extraction module includes a character-level analysis unit, a morpheme-level parsing unit, and a syntactic-semantic-level understanding unit. The character-level analysis unit is equipped with a Mongolian character morphology library and a space-free word segmentation model. It is used to perform character sequence segmentation and normalization on the input Mongolian query text, to identify the basic building blocks of connected characters, deformed characters, and compound morphemes, and to generate character-level feature sequences and preliminary word boundary probability distributions. The character-level analysis unit also includes an adhesion processing subunit, which is used to segment and restore the identified adhesion character regions.

[0035] The morpheme-level parsing unit, connected to the character-level analysis unit, is equipped with a Mongolian lexical analysis model and domain dictionary based on big data. It is used to receive character-level feature sequences and word boundary probability distributions, perform ambiguity elimination and out-of-vocabulary word inference, complete accurate word segmentation and part-of-speech tagging, and extract morphological change information such as stems and affixes to generate morpheme-level feature vectors carrying lexical tags.

[0036] The syntactic and semantic level understanding unit, connected to the morpheme-level parsing unit, is equipped with a Mongolian dependency parser and a semantic role labeling model. It receives morpheme-level feature vectors, analyzes the unique subject-object-verb word order structure and complex case particle dependencies in Mongolian, extracts the core predicates, argument structures and semantic roles in the query text, and integrates contextual information to generate a syntactic tree and a deep semantic dependency graph, outputting a deep feature vector.

[0037] For the above query, the character-level analysis unit works first. Its internal Mongolian character morphology library contains all morphological variant rules for Mongolian Unicode characters. Since Mongolian script is written continuously, the unit first calls the space-free word segmentation model to perform preliminary segmentation and normalization of the character sequence. If the input is a scanned handwritten image and adhered characters remain after OCR recognition, the adhesion processing subunit is activated to perform segmentation and restoration.

[0038] Next, the morpheme-level parsing unit receives the processed character sequence and, using the Mongolian lexical analysis model and specialized dictionaries for fields such as "library" and "borrowing," segments the sequence into the words glossed as "(book)," "(of)," "(library)," "(of)," "(borrowing)," and "(how)." It marks the part of speech of each word (noun, case particle, verb, etc.) and extracts the stem information of the verb "(borrowing)." Finally, the syntactic-semantic-level understanding unit, based on the Mongolian dependency parser, determines that the sentence follows the typical Mongolian subject-object-verb structure, with "(borrowing)" as the core predicate, an implied agent argument (who borrows), and "(the library's books)" as the patient argument. It extracts the deep semantic frame "borrowing (object: library's books)" and encodes it into the final deep feature vector.

[0039] This progressive, parallel analysis process, moving from characters to semantics, is specifically designed to address the unique characteristics of the Mongolian language, resulting in accuracy far exceeding that of general text processing workflows. In particular, the syntactic-semantic-level understanding unit's analysis of Mongolian word order and case particles accurately extracts the semantic frame of "who did what to whom," making it possible for machines to truly "understand" query intent. Tests show that this module achieves a 91.5% accuracy rate in deep semantic parsing of complex Mongolian long sentences, an improvement of approximately 25% compared to traditional phrase-based analysis methods. Figure 1 schematically illustrates the complete processing flow of the Mongolian multi-granularity feature extraction module.

[0040] The cross-task collaborative optimization rules configured in the semantic hub specifically include the following. First collaborative stage: retrieval-guided translation attention enhancement. The attention routing mechanism within the semantic hub receives, in real time, the set of title keyword vectors of the Top-K relevant documents returned by the approximation retrieval engine submodule. In each generation step of the decoder in the machine translation engine submodule, in addition to calculating the regular attention over the source language encoding states, a retrieval guidance attention head is added. This attention head calculates the similarity between the current decoding hidden state and the keyword vector set, and uses the aggregated keyword information as a bias term that is weighted and fused into the regular attention distribution, thereby tilting translation generation towards the vocabulary and phrase patterns that co-occur with highly relevant documents.

[0041] Second collaborative stage: translation-aware retrieval semantic alignment. The semantic alignment matrix within the semantic hub employs a two-way alignment mechanism. The first path directly maps the deep feature vector generated by the Mongolian multi-granularity feature extraction module into the shared semantic space to form the first retrieval query vector. The second path pools the context vector sequence output by the encoder of the machine translation engine submodule and then maps it through the semantic alignment matrix to form the second retrieval query vector. The approximation retrieval engine submodule calculates, in parallel, the similarity of the first and second retrieval query vectors to the documents in the document library, and dynamically weights and fuses the similarity scores of the two paths according to the confidence of the current decoding step of the machine translation engine submodule to form the final retrieval relevance score.

[0042] Third collaborative stage: joint loss optimization based on adversarial learning. The joint loss function defined by the cross-task collaborative optimization rule consists of three parts: the loss of the machine translation task, the loss of the retrieval task, and a translation-retrieval consistency adversarial loss. The translation-retrieval consistency adversarial loss is implemented through a discriminator network, which attempts to distinguish between "translations generated under the guidance of highly relevant documents" and "translations generated purely from the source language," while the translation model attempts to generate translations that confuse the discriminator. During training, the semantic hub alternately optimizes the machine translation engine submodule, the approximation retrieval engine submodule, and the discriminator to minimize the joint loss function, driving the system to produce translations that are fluent and accurate and that also make highly relevant target documents easy to retrieve.

[0043] During platform training and runtime, the aforementioned collaborative rules operate as follows. In the first collaborative stage, when the translation model decodes and generates the word "borrow," the Top-3 document title keyword vector set returned in real time by the retrieval submodule may contain "rules," "guidelines," and "methods." The retrieval guidance attention calculates the similarity between the current decoding state and these words and finds a high similarity to "methods." When generating subsequent words, it therefore increases the probability of generating the word "methods," so that the final translation leans towards "borrowing methods" rather than simply "borrowing."
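As a non-limiting illustration of the first collaborative stage, the following Python sketch fuses a regular source attention distribution with a retrieval-guidance bias. The function names, the scaled dot-product similarity, the softmax aggregation, and the fusion weight `gate` are all assumptions made for illustration; the embodiment does not specify them.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def retrieval_guided_attention(hidden, source_states, keyword_vecs, gate=0.3):
    """Fuse regular source attention with a retrieval-guidance bias.

    hidden:        (d,)   current decoder hidden state
    source_states: (S, d) encoder states of the source sentence
    keyword_vecs:  (K, d) title keyword vectors of the Top-K documents
    gate:          hypothetical fusion weight in [0, 1] (not from the source)
    """
    scale = np.sqrt(hidden.shape[0])
    # Regular attention over the source encoding states.
    regular = softmax(source_states @ hidden / scale)
    # Guidance head: similarity of the decoding state to each keyword.
    kw_weights = softmax(keyword_vecs @ hidden / scale)
    # Aggregate the keyword set into a single summary vector.
    kw_summary = kw_weights @ keyword_vecs
    # Bias term: source positions that match the aggregated keywords.
    bias = softmax(source_states @ kw_summary / scale)
    # Weighted fusion; the result is still a probability distribution.
    fused = (1.0 - gate) * regular + gate * bias
    return fused, kw_weights
```

In the example above, `kw_weights` would carry the largest value at the position of the keyword vector for "methods" if that vector is the closest to the current decoding state.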

[0044] In the second collaborative stage, for the same query, the semantic alignment matrix simultaneously generates two query vectors: one directly from the Mongolian deep features, and the other from the intermediate representation of the translation encoder. During retrieval, the similarity of each of these two vectors to the documents in the document library is calculated in parallel. If the current translation confidence is high (i.e., the model is very certain about the translation "borrowing methods"), the weight of the intermediate representation is increased to exploit the semantic disambiguation information brought by the translation; if the confidence is low, the retrieval relies more on the Mongolian deep features to ensure robustness.
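As a non-limiting illustration of this confidence-dependent fusion, the following Python sketch scores documents against both query vectors and blends the two score lists. Using the clipped translation confidence directly as the blending weight is an assumption; the embodiment only states that the weighting is dynamic.

```python
import numpy as np

def cosine_scores(query, doc_vecs):
    """Cosine similarity between one query vector and each document vector."""
    q = query / np.linalg.norm(query)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    return d @ q

def fused_relevance(q_mongolian, q_translation, doc_vecs, confidence):
    """Blend the two retrieval paths according to translation confidence.

    q_mongolian:   first query vector (from the Mongolian deep features)
    q_translation: second query vector (from the translation encoder)
    confidence:    decoding confidence, clipped to [0, 1] (assumed scale)
    """
    w = float(np.clip(confidence, 0.0, 1.0))
    s1 = cosine_scores(q_mongolian, doc_vecs)
    s2 = cosine_scores(q_translation, doc_vecs)
    # High confidence favours the translation path, low confidence the
    # Mongolian deep-feature path.
    return (1.0 - w) * s1 + w * s2
```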

[0045] In the third collaborative stage, the platform implements adversarial learning during training using a discriminator. The discriminator aims to determine whether a translated text was "generated under the guidance of documents like the 'Borrowing Rules'" or "generated freely, based solely on the original Mongolian text." The translation model (generator) aims to generate translations that the discriminator cannot distinguish. This adversarial process is optimized using the joint loss function.

[0046] In the cross-task collaborative optimization rule, the translation-retrieval consistency adversarial loss, which drives the translation model to generate easily retrievable translations, is computed over each batch of Mongolian queries by the following formula: [formula not reproduced in this text].

[0047] The quantities in the formula are: the number of Mongolian query sentences in the batch; the semantic vector of the i-th query; the set of keyword vectors of the highly relevant documents returned by the retrieval engine; the generator of the machine translation engine; the discriminator, which estimates the probability that a translation was guided by retrieval; a semantic similarity function; and two adjustment coefficients, which are set to [values] in this embodiment.

[0048] This formula forces the generator to output translations that are fluent (the adversarial term) and semantically close to the retrieval target (the direct relevance term).
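As a non-limiting illustration only, the following Python sketch computes one generator-side loss that would be consistent with the quantities listed above: an adversarial term that rewards translations the discriminator judges to be retrieval-guided, and a direct relevance term that rewards similarity to the retrieved keywords. The exact functional form, the use of the keyword centroid, cosine similarity as the similarity function, and the coefficient values `alpha` and `beta` are assumptions and are not taken from the embodiment.

```python
import numpy as np

def consistency_adversarial_loss(disc_probs, trans_vecs, keyword_sets,
                                 alpha=1.0, beta=0.5):
    """Assumed form: mean over the batch of
       -alpha * log D(G(q_i)) - beta * sim(G(q_i), K_i).

    disc_probs:   discriminator probability per translation, in (0, 1]
    trans_vecs:   semantic vector of each generated translation
    keyword_sets: per query, an array (K, d) of document keyword vectors
    alpha, beta:  hypothetical adjustment coefficients
    """
    eps = 1e-12
    total = 0.0
    for p, t, kws in zip(disc_probs, trans_vecs, keyword_sets):
        # Adversarial term: small when the discriminator is fooled.
        adversarial = -np.log(p + eps)
        # Direct relevance term: cosine similarity to the keyword centroid.
        centroid = np.mean(kws, axis=0)
        sim = float(t @ centroid /
                    (np.linalg.norm(t) * np.linalg.norm(centroid)))
        total += alpha * adversarial - beta * sim
    return total / len(disc_probs)
```

Under this assumed form, a translation that both fools the discriminator and lies close to the retrieved keywords receives a lower loss than one that does neither.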

[0049] The three-stage collaborative optimization rule goes beyond simply combining translation and retrieval tasks. It employs dedicated mechanisms (attention guidance, two-way alignment, and adversarial learning) so that the two tasks complement each other during both training and inference. In particular, the introduced adversarial loss function transforms the abstract objective of "ease of retrieval of the translation" into an optimizable training signal, driving the model to learn a "retrieval-friendly" translation strategy. Practical applications show that adopting this rule increases the user acceptance rate of the system's final Top-1 results by 33% without compromising translation quality (BLEU score), demonstrating a good balance in improving overall task performance. Figure 2 shows the three-stage processing flow of the cross-task collaborative optimization rules.

[0050] The adhesion processing subunit in the character-level analysis unit follows the processing flow below. Step S41: generation of candidate segmentation points. For the input binarized image region of adhered Mongolian characters, morphological operations are first used to extract the stroke skeleton, and a series of candidate segmentation points is generated at the intersections, endpoints, and points of significant curvature change of the skeleton.

[0051] Step S42: construction of the segmentation path hypothesis graph. Based on the candidate segmentation points, a segmentation path hypothesis graph is constructed, in which nodes are candidate segmentation points and edges represent potential segmentation paths between two points. Each edge is assigned an initial weight that combines geometric features of the path direction with local pixel density features.

[0052] Step S43: path scoring and selection based on a probabilistic graphical model. The segmentation path hypothesis graph is converted into a conditional random field model. The unary potential function of the model is based on the initial edge weights, and the binary potential function constrains the compatibility of adjacent segmentation paths. Through model inference, the optimal set of segmentation paths that maximizes the overall probability is found.

[0053] Step S44: segmentation verification and character sequence recovery. The adhered region is cut along the optimal segmentation paths to obtain several independent character component image blocks. A pre-trained Mongolian character component recognition convolutional neural network identifies each image block, and the identified component sequence is restored to a legal Mongolian character sequence by applying the character composition rules in the Mongolian character morphology library. If the restoration fails or the confidence is too low, the process returns to step S41, adjusts the morphological parameters or introduces more image features, and iterates.
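As a non-limiting illustration of step S41, the following Python sketch finds two of the three kinds of candidate segmentation points (endpoints and intersections) on a skeleton that has already been thinned to one pixel width. The thinning itself and the detection of curvature change points are omitted, and the neighbour-count thresholds are assumptions made for illustration.

```python
import numpy as np

def candidate_split_points(skeleton):
    """Return (endpoints, junctions) of a one-pixel-wide binary skeleton.

    A skeleton pixel with exactly one 8-connected skeleton neighbour is
    treated as an endpoint; one with three or more is treated as an
    intersection. Intersections typically appear as small pixel clusters,
    which a fuller implementation would merge into single points.
    """
    sk = (np.asarray(skeleton) > 0).astype(int)
    h, w = sk.shape
    padded = np.pad(sk, 1)
    neighbours = np.zeros_like(sk)
    # Sum the eight shifted copies of the image to count neighbours.
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if (dy, dx) != (0, 0):
                neighbours += padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]

    def coords(mask):
        return [tuple(int(v) for v in p) for p in np.argwhere(mask)]

    endpoints = coords((sk == 1) & (neighbours == 1))
    junctions = coords((sk == 1) & (neighbours >= 3))
    return endpoints, junctions
```

The returned coordinates would serve as the nodes of the hypothesis graph built in step S42.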

[0054] When the character-level analysis unit encounters adhesion in the handwritten Mongolian word glossed as "(library)", the adhesion processing subunit is activated. First (S41), the binarized image of the adhered region is thinned to obtain the skeleton, and multiple candidate segmentation points are marked at the inflection points and endpoints of the skeleton. Then (S42), a connected graph is constructed using these points as nodes, where each edge represents a possible segmentation path, and the weight of each edge is calculated from the pixel density and orientation angle of the region it passes through.

[0055] Subsequently (S43), the segmentation problem is cast as optimal path selection by minimizing the following energy function based on a conditional random field model: [formula not reproduced in this text].

[0056] The quantities in the formula are: the input adhered image region; the sequence of path labels; the set of candidate segmentation points and the set of potential segmentation path edges; the unary potential function, which evaluates the probability that a single point is a segmentation point by integrating geometric features with prior knowledge from the Mongolian character morphology library; the binary potential function, which constrains the compatibility of adjacent paths so that the segmented components conform to the shape rules of common Mongolian components; and a balance parameter, which is set to [value] in this embodiment.

[0057] Minimizing the energy function through inference yields the optimal segmentation path. Finally (S44), the region is cut along this path, the trained CNN identifies the resulting components, and the correct character sequence is restored according to the morphology library rules.
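As a non-limiting illustration of step S43, the following Python sketch minimizes an energy of the common conditional random field form, a sum of unary costs plus a weighted sum of binary (pairwise) costs, by exhaustive search over binary labels. The cost values, the binary label set, the exhaustive search, and the balance parameter `lam` are assumptions for illustration; a practical system would use a dedicated inference algorithm.

```python
from itertools import product

def min_energy_labels(unary, pairs, pairwise, lam=1.0):
    """Exhaustively minimise E(Y) = sum_i unary[i][y_i]
                                   + lam * sum_(i,j) pairwise(y_i, y_j).

    unary:    list of (cost_if_label_0, cost_if_label_1) per candidate
    pairs:    list of (i, j) index pairs of adjacent candidates
    pairwise: function returning the incompatibility cost of two labels
    lam:      hypothetical balance parameter
    """
    best_labels, best_energy = None, float("inf")
    for labels in product((0, 1), repeat=len(unary)):
        energy = sum(unary[i][y] for i, y in enumerate(labels))
        energy += lam * sum(pairwise(labels[i], labels[j]) for i, j in pairs)
        if energy < best_energy:
            best_labels, best_energy = labels, energy
    return best_labels, best_energy
```

With a pairwise cost that penalises selecting two adjacent candidates at once, the minimiser can reject a cut that looks attractive locally but would over-segment the region.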

[0058] The core of this adhesion processing subunit lies in formalizing the segmentation problem as a structured prediction problem that considers local features (unary potential) and global structural compatibility (binary potential) together through a conditional random field model. This enables the system to handle fuzzy and irregular adhesions effectively, avoiding over-segmentation or under-segmentation. In actual ancient book digitization projects, this subunit improved the recognition accuracy of adhered characters from approximately 78% with traditional methods to over 94%, playing a decisive role in making automated processing of low-quality historical Mongolian documents feasible.

[0059] The retrieval-translation two-way feedback learning module includes the following submodules. The feedback data collection submodule monitors, in real time, the interaction between users and the intelligent question answering and result generation module. It extracts the exposure count, click sequence, and final adoption mark of each result in the comprehensive result list from the interaction logs, and obtains manually reviewed and corrected text for the retrieval results or translation results from the backend management system.

[0060] The utility mapping analysis submodule, connected to the feedback data collection submodule, constructs a feedback utility evaluation model. The feedback utility evaluation model treats results that users adopted or rated as highly satisfactory as positive samples, and results that were ignored or corrected as negative samples, to calculate the utility score of each result item in the comprehensive result list.

[0061] The backpropagation optimization submodule, connected to the utility mapping analysis submodule, performs parameter tuning guided by the mapping relationships. It uses the utility scores as reinforcement learning reward signals or converts them into soft labels for supervised learning. Through gradient backpropagation, it updates the semantic alignment matrix parameters of the semantic hub, the neural network weights of the machine translation engine submodule, and the feature fusion weights of each unit in the Mongolian multi-granularity feature extraction module. During this optimization, particular emphasis is placed on tuning the attention weights and dynamic fusion weights involved in the cross-task collaborative optimization rules.

[0062] The rule iteration submodule, connected to the utility mapping analysis submodule, is used to analyze scenario patterns with frequent negative feedback. When the accumulated negative feedback of a specific pattern exceeds a preset threshold, it triggers the revision of cross-task collaborative optimization rules within the semantic hub. For example, it may adjust the weight ratio of each sub-loss term in the joint loss function or add exclusive collaborative processing sub-rules for specific Mongolian language phenomena.

[0063] In the daily operation of the digital library platform, the feedback learning module works continuously. The feedback data collection submodule records that, for queries related to "ancient book restoration," users frequently click and adopt translations containing the term glossed as "(traditional method)" together with the related literature, while ignoring translations that do not contain the term. The utility mapping analysis submodule accordingly marks the feature pattern of translations and literature containing "(traditional method)" as a high-utility pattern. During nighttime model fine-tuning, the backpropagation optimization submodule uses these marked high-utility samples and, through gradient backpropagation, strengthens the attention weights with which the semantic hub guides the translation model to generate the related words in the "restoration" context, updating the translation model parameters accordingly. The rule iteration submodule finds through analysis that, in queries such as "poetry translation," an excessively high weight for the adversarial loss term can lead to awkward translations. It therefore automatically triggers a rule revision that dynamically lowers the adversarial loss weight for this type of query in the joint loss function, prioritizing the literary fluency of the translation.
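As a non-limiting illustration of the rule iteration behaviour, the following Python sketch counts negative feedback per scenario pattern and lowers a per-pattern adversarial loss weight once a preset threshold is exceeded. The class name, the threshold, the halving step, and the reset of the counter after a revision are all assumptions made for illustration.

```python
from collections import Counter

class RuleIterator:
    """Hypothetical helper that revises per-pattern adversarial loss weights."""

    def __init__(self, threshold=3, step=0.5, base_weight=1.0):
        self.threshold = threshold      # preset negative-feedback threshold
        self.step = step                # multiplicative reduction per revision
        self.base_weight = base_weight  # default adversarial loss weight
        self.negatives = Counter()
        self.adv_weight = {}

    def record_negative(self, pattern):
        """Record one negative feedback; return True if a revision fired."""
        self.negatives[pattern] += 1
        if self.negatives[pattern] > self.threshold:
            current = self.adv_weight.get(pattern, self.base_weight)
            self.adv_weight[pattern] = current * self.step
            self.negatives[pattern] = 0  # start counting afresh
            return True
        return False

    def weight_for(self, pattern):
        """Adversarial loss weight to use for queries of this pattern."""
        return self.adv_weight.get(pattern, self.base_weight)
```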

[0064] This module not only optimizes model parameters but also dynamically adjusts top-level collaborative rules based on negative feedback patterns, which is difficult to achieve with ordinary online learning. For example, the system automatically lowers the adversarial loss weight for special scenarios such as "poetry translation," demonstrating its adaptability to complex application scenarios. After long-term operation, the platform has developed differentiated processing strategies for queries across different domains and styles, continuously raising the overall performance ceiling while significantly reducing the cost of later manual maintenance and rule intervention. Figure 3 schematically illustrates the closed-loop process of retrieval-translation bidirectional feedback learning.

[0065] The retrieval engine submodule includes a large-scale multilingual document vector index library and a semantic similarity calculation model. The multilingual document vector index library is constructed by preprocessing and vectorizing the Mongolian, Chinese, and other related-language document resources collected by the platform. Preprocessing includes word segmentation, stemming, or character vector conversion, as appropriate for each language. Vectorization uses a pre-trained multilingual BERT model or Sentence-BERT model to encode document titles, bodies, or key paragraphs into dense vectors of fixed dimension.

[0066] The semantic similarity calculation model employs cosine similarity, Manhattan distance, or a deep neural network-based matching network to calculate the relevance score between the retrieval query vector generated by the semantic hub according to the two-way alignment mechanism and the candidate document vectors in the document vector index.

[0067] The retrieval engine submodule also includes a result reordering subunit. This subunit receives the initial relevance score output by the semantic similarity calculation model and combines it with the decoding confidence score provided by the machine translation engine submodule, a document timeliness factor, an authority factor, and a user personalized preference factor. A learning-to-rank model then performs a comprehensive calculation over these inputs to generate the final comprehensive result list.

[0068] The platform's retrieval engine submodule maintains a vector index library covering documents in multiple languages, including Mongolian, Chinese, and English. For example, a Chinese article titled "A Review of Research on the History of the Mongol Yuan Dynasty" and a Mongolian document whose title is glossed as "(History of the Yuan Dynasty)" are encoded into 768-dimensional semantic vectors by the same multilingual BERT model and stored in the same semantic space. When processing a query, the retrieval query vector provided by the semantic hub (which may come from the original Mongolian text or from an intermediate translation representation) is compared with all document vectors in the index by cosine similarity to obtain an initial score. The result reordering subunit then re-ranks these initial results: it raises the ranking of documents retrieved via high-confidence translations, prioritizes articles from authoritative journals published in the last three years, and prioritizes original Mongolian documents for users who are accustomed to reading Mongolian literature. Finally, the learning-to-rank model outputs the comprehensive ranking list.
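As a non-limiting illustration of the result reordering step, the following Python sketch combines the initial relevance score with the other factors using a fixed linear weighting. The linear form and the weight values stand in for a trained ranking model and are assumptions; all field names are hypothetical and each factor is assumed to be normalised to the range 0 to 1.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    doc_id: str
    relevance: float      # initial cosine-similarity score
    mt_confidence: float  # decoding confidence of the translation path
    recency: float        # document timeliness factor
    authority: float      # authority factor of the source
    preference: float     # match with the user's language preference

def rerank(candidates, weights=(0.5, 0.15, 0.1, 0.15, 0.1)):
    """Sort candidates by a weighted sum of their factors (highest first).

    The weights are illustrative placeholders for a learned ranking model.
    """
    def score(c):
        feats = (c.relevance, c.mt_confidence, c.recency,
                 c.authority, c.preference)
        return sum(w * f for w, f in zip(weights, feats))
    return sorted(candidates, key=score, reverse=True)
```

With such a combination, a slightly less similar document can overtake a more similar one when it is recent, authoritative, and in the user's preferred language.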

[0069] This search engine achieves truly accurate cross-language and cross-modal information retrieval by constructing a unified multilingual semantic vector space and combining it with multi-factor intelligent reordering. Its performance far surpasses traditional retrieval systems based on keyword matching or single-language models. Users searching in Mongolian can not only find Mongolian materials but also accurately locate relevant Chinese and English literature, breaking down language barriers. The intelligent reordering mechanism further ensures that the results list is not only "relevant" but also "high-quality," "fresh," and "matches preferences," raising recall and precision to a new level.

[0070] The feedback utility evaluation model constructed by the utility mapping analysis submodule is built as follows. Multidimensional feedback signals are defined, comprising explicit feedback dimensions and implicit feedback dimensions. Explicit feedback dimensions include user actions such as liking, saving, and accepting results, as well as administrators' manual rating and correction actions. Implicit feedback dimensions include the click-through rate, view completeness, and dwell time of result items, and correction behavior in subsequent queries within the same query session.

[0071] Dynamic weights are assigned to feedback signals of different dimensions. Explicit feedback signals have a basic fixed weight, while the weights of implicit feedback signals are dynamically adjusted based on their correlation with the final adoption result.

[0072] A utility score aggregation algorithm collects all multidimensional feedback signals triggered within the lifecycle of a single result item in the comprehensive result list, performs a weighted summation according to the corresponding dynamic weights, and applies a time decay function so that recent feedback affects the utility score more than older feedback, yielding the real-time utility score of the result item.

[0073] Establish a correlation table between utility scores and model parameters, record the query feature patterns, translation feature patterns, and retrieval feature patterns corresponding to the utility score results, and especially record the feature patterns related to the performance of various collaborative mechanisms in the cross-task collaborative optimization rules, as the basis for the backpropagation optimization submodule to perform sample weighting or priority training.

[0074] The utility mapping analysis submodule constructs a fine-grained evaluation model. For a query about wedding customs, the model not only records whether the user ultimately adopts the result (explicit feedback), but also analyzes whether the user stays on the results page for more than 30 seconds, whether the user scrolls through the full text, and whether the user uses a more specific term, such as the one glossed as "(wedding song)," in subsequent queries (implicit feedback). The system learned from historical data that, in this platform scenario, the correlation between dwell time and the final adoption result is as high as 0.8, so this implicit signal is given a high dynamic weight. After all signals are aggregated according to weight and time decay (e.g., the weight of feedback from a week ago is halved), the real-time utility score of the result is obtained. For results with scores above the threshold, the corresponding query patterns (including "customs"), translation patterns (translation includes "tradition"), and retrieval patterns (returning folklore monographs) are recorded in the association table as high-quality positive samples for subsequent model training.
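As a non-limiting illustration of the aggregation described above, the following Python sketch sums weighted feedback signals with an exponential time decay whose half-life is seven days, matching the example in which feedback from a week ago counts half. The signal names, the signal values, and the use of the 0.8 correlation directly as the dwell-time weight are assumptions made for illustration.

```python
def utility_score(signals, weights, now, half_life_days=7.0):
    """Aggregate feedback signals into a real-time utility score.

    signals: iterable of (kind, value, timestamp_in_days)
    weights: mapping from signal kind to its (dynamic) weight
    now:     current time in days, on the same clock as the timestamps
    """
    total = 0.0
    for kind, value, timestamp in signals:
        age = max(0.0, now - timestamp)
        # Exponential decay: the contribution halves every half_life_days.
        decay = 0.5 ** (age / half_life_days)
        total += weights[kind] * value * decay
    return total
```

For instance, an adoption recorded today with weight 1.0 and a dwell-time signal recorded seven days ago with weight 0.8 would contribute 1.0 and 0.4 respectively.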

[0075] This model constructs a refined utility measurement system that closely approximates true user satisfaction by integrating multi-dimensional, weighted feedback signals, achieving significantly better results than traditional methods that simply use click-through rate as the optimization objective. The dynamic weighting mechanism enables the system to adapt to the behavioral differences among different user groups and intelligently identify high-value feedback. The time decay function ensures that the optimization objective closely follows the current data distribution and changes in user preferences. The resulting correlation table between utility scores and model parameters provides high-quality, high-confidence training signals for backpropagation optimization, greatly improving the efficiency and directional accuracy of model iteration.

[0076] The intelligent question answering and result generation module calls the platform's knowledge graph to perform intent reasoning and answer synthesis, and includes the following submodules. The query intent parsing submodule performs content analysis on the comprehensive result list output by the end-to-end integrated processing architecture module, or directly performs deep semantic parsing on the user's original Mongolian query, to identify the entities, relationships, question type, and potential user intent category in the query.

[0077] The knowledge graph query and reasoning submodule is connected to the query intent parsing submodule. Based on the parsed entities and relationships, it performs entity linking and relationship path searching in the cross-language knowledge graph constructed by the platform. For complex problems, it initiates graph reasoning algorithms to perform multi-hop reasoning along the relationship paths in the knowledge graph, or fuses information from multiple related subgraphs.

[0078] The multimodal answer generation submodule, connected to the knowledge graph query and reasoning submodule, uses the structured information, entity attributes, and relational facts returned by the knowledge graph, together with relevant document fragments from the comprehensive result list, as the material for generating answers. Depending on the query intent category, this submodule chooses among different forms of answer, such as concise factual answers, paragraph summaries with citations, comparison lists, or step-by-step instructions.

[0079] The answer presentation and interaction submodule is configured with a natural language generator to organize the generated answer materials into fluent natural language text and present them in Mongolian, Chinese, or a user-specified language. At the same time, this submodule supports follow-up question understanding, which can maintain the dialogue context and trigger a new round of intent parsing and knowledge retrieval process based on the user's subsequent questions about the current answer.

[0080] When a user submits the Mongolian query meaning "Who is Genghis Khan?", the module first identifies, through intent parsing, the core entity glossed as "(Genghis Khan)" and classifies the question type as "person identity definition." The knowledge graph query submodule then locates the entity node in the graph and extracts its attributes (birth and death years: 1162-1227; identity: founder of the Great Mongol Empire) and relationships (father: Yesugei; sons: Jochi, Chagatai, etc.). The comprehensive result list may also provide links to relevant biographical documents. The multimodal answer generation submodule determines that this is a factual question and generates a concise answer: "Genghis Khan (1162-1227), personal name Temujin, was the founder and first Great Khan of the Great Mongol Empire." The answer presentation submodule presents this answer in fluent Mongolian or Chinese.

[0081] If the user then asks the follow-up question meaning "Who was his successor?", the system maintains the dialogue context and resolves "his" to Genghis Khan, so it can follow the "successor" relationship in the graph to Ögedei Khan and generate a coherent follow-up answer.
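As a non-limiting illustration of context maintenance across a follow-up question, the following Python sketch keeps the most recently discussed entity and substitutes it when a pronoun is used. The dictionary-based graph, the class name, and the simple pronoun list are assumptions made for illustration; the facts stored in the graph are those given in the example above.

```python
GRAPH = {
    "Genghis Khan": {
        "lifespan": "1162-1227",
        "identity": "founder of the Great Mongol Empire",
        "father": "Yesugei",
        "successor": "Ögedei Khan",
    },
}

class Dialogue:
    """Hypothetical dialogue state that remembers the entity in focus."""

    PRONOUNS = {"he", "his", "she", "her", "it", "its"}

    def __init__(self, graph):
        self.graph = graph
        self.focus = None  # entity most recently discussed

    def ask(self, entity, relation):
        """Look up one relation, resolving pronouns to the focus entity."""
        if entity is None or entity.lower() in self.PRONOUNS:
            entity = self.focus
        if entity is None or entity not in self.graph:
            return None
        self.focus = entity
        return self.graph[entity].get(relation)
```

A first call with the entity "Genghis Khan" sets the focus, so a second call with the pronoun "his" and the relation "successor" follows the same node in the graph.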

[0082] This module not only provides facts but also answers questions about implicit relationships through knowledge graph reasoning; it not only generates text but also integrates literature citations; and it not only responds to single queries but also supports multi-turn dialogues. This revolutionizes the efficiency and depth with which users acquire complex knowledge, demonstrating enormous application potential in fields such as digital humanities, education, and scientific research, truly reflecting the value of artificial intelligence technology in professional information services.

[0083] The technical scope of this invention is not limited to the content described above. Those skilled in the art can make various modifications and variations to the above embodiments without departing from the technical concept of this invention, and all such modifications and variations should fall within the protection scope of this invention.

Claims

1. An integrated platform for Mongolian machine translation and approximation retrieval based on big data features, characterized in that it comprises: a Mongolian multi-granularity feature extraction module, used to perform parallel character-level, morpheme-level, and syntactic-semantic-level analysis on the input Mongolian query text and to generate a deep feature vector that integrates multi-level linguistic features; an end-to-end integrated processing architecture module with a built-in semantic hub that connects and synchronously drives a machine translation engine submodule and an approximation retrieval engine submodule, the semantic hub being configured to receive the deep feature vector and, based on a cross-task collaborative optimization rule, to cause the translation generation process of the machine translation engine submodule and the retrieval process of the approximation retrieval engine submodule to interact and dynamically adjust in real time, and to output a comprehensive result list that fuses translation results and retrieval results; and a retrieval-translation bidirectional feedback learning module, used to collect user interaction feedback data on the comprehensive result list and, based on the feedback data, to jointly optimize and iteratively update the optimization rules of the semantic hub, the model parameters of the machine translation engine submodule, and the feature weights of the Mongolian multi-granularity feature extraction module.

2. The integrated platform for Mongolian machine translation and approximation retrieval based on big data features as described in claim 1, characterized in that: the Mongolian multi-granularity feature extraction module includes a character-level analysis unit, a morpheme-level parsing unit, and a syntactic-semantic-level understanding unit connected in sequence; the character-level analysis unit is equipped with a Mongolian character morphology library and a space-free word segmentation model, which are used to perform character segmentation, normalization, and concatenation processing on the input text and to output character-level feature sequences; the morpheme-level parsing unit is equipped with a Mongolian lexical analysis model and a domain dictionary, which are used to perform word segmentation, part-of-speech tagging, and morphological information extraction on the character-level feature sequences and to output morpheme-level feature vectors; and the syntactic-semantic-level understanding unit is equipped with a Mongolian dependency parser and a semantic role labeling model, which are used to perform syntactic analysis and semantic role labeling on the morpheme-level feature vectors and to output the deep feature vector.

3. The integrated platform for Mongolian machine translation and approximation retrieval based on big data features according to claim 2, characterized in that: the character-level analysis unit further includes an adhesion processing subunit, which is used to segment and restore identified adhered character regions, the specific processing flow including: S1. extracting the skeleton from the image of the adhered region to generate candidate segmentation points; S2. constructing a segmentation path hypothesis graph from the candidate segmentation points, and assigning an initial weight to each edge in the graph based on geometric and pixel features; S3. converting the segmentation path hypothesis graph into a conditional random field model and solving for the optimal set of segmentation paths by minimizing an energy function, where the energy function combines a unary potential function, which is based on the initial weights, and a pairwise (binary) potential function, which constrains the morphological compatibility of adjacent segmentation paths; S4. cutting along the optimal segmentation paths to obtain character component image blocks, identifying the components with a pre-trained convolutional neural network, and restoring them into legal character sequences using the Mongolian character morphology library.
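Step S3 can be illustrated with a minimal sketch. It assumes, purely for illustration, that the candidate segmentation paths are ordered left to right, that each path receives a label of 1 (cut) or 0 (skip), that the unary cost of cutting is the path's initial weight, and that the pairwise term penalizes two adjacent cuts that would leave a component narrower than a minimum width. The names `positions`, `cut_cost`, `SKIP_COST`, and `MIN_WIDTH`, all numeric values, and the exhaustive search are hypothetical and are not taken from the patent.

```python
from itertools import product

def energy(labels, unary, pairwise, lam):
    """Energy of one labelling: unary terms plus lam-weighted pairwise terms."""
    unary_sum = sum(unary[i][y] for i, y in enumerate(labels))
    pair_sum = sum(pairwise(i, labels[i], labels[i + 1])
                   for i in range(len(labels) - 1))
    return unary_sum + lam * pair_sum

def minimize_energy(unary, pairwise, lam):
    """Exhaustive search over all 0/1 labellings (fine for a few candidates)."""
    n = len(unary)
    return min(product((0, 1), repeat=n),
               key=lambda ys: energy(ys, unary, pairwise, lam))

# Hypothetical data: x-positions of candidate paths and their initial weights.
positions = [10, 14, 30, 33, 52]
cut_cost = [0.2, 0.6, 0.3, 0.1, 0.7]   # lower means a more plausible cut
SKIP_COST = 0.5                        # assumed cost of not cutting
MIN_WIDTH = 8                          # assumed minimum component width

# unary[i][label]: index 0 is the skip cost, index 1 is the cut cost.
unary = [(SKIP_COST, c) for c in cut_cost]

def pairwise(i, y_left, y_right):
    """Penalize cutting two neighbouring paths that are too close together."""
    too_close = positions[i + 1] - positions[i] < MIN_WIDTH
    return 1.0 if (y_left == 1 and y_right == 1 and too_close) else 0.0

best = minimize_energy(unary, pairwise, lam=1.0)
print(best, round(energy(best, unary, pairwise, 1.0), 3))
```

Exhaustive search is used here only to keep the sketch short; for a chain-structured model, dynamic programming gives the same minimum in linear time, and general graphs would need an approximate inference method.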

4. The integrated platform for Mongolian machine translation and approximation retrieval based on big data features as described in claim 1, characterized in that: the cross-task collaborative optimization rule includes a three-stage mechanism: retrieval-guided translation attention enhancement, translation-aware retrieval semantic alignment, and joint loss optimization based on adversarial learning. In the retrieval-guided translation attention enhancement stage, the semantic hub introduces the real-time keyword set returned by the approximation retrieval engine submodule into the translation decoding process through an additional attention head, so as to bias translation generation towards expressions that are highly relevant to retrieval. In the translation-aware retrieval semantic alignment stage, the semantic hub generates two retrieval query vectors through a dual-path alignment mechanism: the first is obtained by directly mapping the deep feature vector, and the second is obtained by mapping the intermediate state of the translation encoder; the two resulting retrieval similarity scores are dynamically fused based on the translation confidence. In the joint loss optimization stage based on adversarial learning, a translation-retrieval consistency adversarial loss is constructed through a discriminator network and, together with the machine translation loss and the retrieval loss, forms a joint loss function that drives the translation model to generate translations that are both fluent and easy to retrieve.
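The confidence-based fusion in the alignment stage can be sketched as follows. The claim does not state the fusion rule, so the linear interpolation, the use of cosine similarity, and the function names `cosine` and `fused_score` are assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def fused_score(doc_vec, direct_query, translation_query, translation_confidence):
    """Blend the two retrieval scores; higher confidence favours the translation path.

    Assumption: a linear interpolation weighted by a confidence clamped to [0, 1].
    """
    c = min(1.0, max(0.0, translation_confidence))
    s_direct = cosine(doc_vec, direct_query)            # path 1: deep feature vector
    s_translation = cosine(doc_vec, translation_query)  # path 2: encoder state
    return (1.0 - c) * s_direct + c * s_translation

# Toy vectors: the document matches the direct query but not the translation query.
doc = [1.0, 0.0]
print(fused_score(doc, [1.0, 0.0], [0.0, 1.0], translation_confidence=0.25))
```

With a low translation confidence the direct path dominates, so a poor translation degrades the fused score only slightly; as confidence rises, the translation-derived query takes over.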

5. The integrated platform for Mongolian machine translation and approximation retrieval based on big data features according to claim 4, characterized in that: During training, the semantic hub minimizes the joint loss function by alternately optimizing the machine translation engine submodule, the approximation retrieval engine submodule, and the discriminator network.

6. The integrated platform for Mongolian machine translation and approximation retrieval based on big data features according to claim 1, characterized in that the retrieval-translation bidirectional feedback learning module includes: a feedback data collection submodule, used to collect explicit and implicit interaction data between the user and the comprehensive list; a utility mapping analysis submodule, used to construct a feedback utility evaluation model based on the interaction data and to calculate the utility score of each result; a backpropagation optimization submodule, used to jointly update the parameters of the semantic hub, the machine translation engine submodule, and the Mongolian multi-granularity feature extraction module through gradient backpropagation, guided by the utility score; and a rule iteration submodule, used to analyze negative feedback patterns and to trigger revisions to the weights or sub-rules in the cross-task collaborative optimization rule.

7. The integrated platform for Mongolian machine translation and approximation retrieval based on big data features according to claim 1, characterized in that: the platform further includes an intelligent question answering and result generation module, which receives the comprehensive list, calls the platform's knowledge graph to perform intent reasoning and answer synthesis, and generates a structured answer output that matches the user's query intent.

8. The integrated platform for Mongolian machine translation and approximation retrieval based on big data features according to claim 1, characterized in that the approximation retrieval engine submodule includes: a multilingual document vector index library, which encodes documents in different languages into the same semantic vector space using a pre-trained multilingual model; a semantic similarity calculation model, used to calculate the relevance between query vectors and document vectors; and a result re-ranking subunit, used to integrate the relevance score, translation confidence, document timeliness, authority, and user preference factors, and to generate the final ranking result through a learning-to-rank model.
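The re-ranking step can be sketched as a scorer over the five listed factors. A fixed linear weighting stands in here for the learning-to-rank model, whose form the claim does not specify; the `Candidate` class, the `WEIGHTS` values, and the sample documents are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    doc_id: str
    relevance: float               # semantic similarity to the query
    translation_confidence: float
    timeliness: float
    authority: float
    user_preference: float

# Hypothetical weights; a trained ranking model would learn these from feedback.
WEIGHTS = {
    "relevance": 0.4,
    "translation_confidence": 0.2,
    "timeliness": 0.1,
    "authority": 0.2,
    "user_preference": 0.1,
}

def score(candidate, weights=WEIGHTS):
    """Weighted sum of the ranking factors for one candidate."""
    return sum(w * getattr(candidate, name) for name, w in weights.items())

def rerank(candidates, weights=WEIGHTS):
    """Return candidates sorted from highest to lowest combined score."""
    return sorted(candidates, key=lambda c: score(c, weights), reverse=True)

doc_a = Candidate("doc_a", 0.9, 0.5, 0.2, 0.3, 0.1)
doc_b = Candidate("doc_b", 0.7, 0.9, 0.8, 0.8, 0.6)
print([c.doc_id for c in rerank([doc_a, doc_b])])
```

In this toy input `doc_a` has the higher raw relevance, but `doc_b` ranks first once translation confidence, timeliness, authority, and preference are taken into account, which is the effect the re-ranking subunit is meant to produce.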

9. The integrated platform for Mongolian machine translation and approximation retrieval based on big data features as described in claim 1 or 4, characterized in that: in the cross-task collaborative optimization rule, the translation-retrieval consistency adversarial loss value used to drive the translation model to generate easily retrieved translations is obtained by the following formula: [formula not reproduced in the source text]; where the symbols, which are likewise not reproduced, denote in order: the number of Mongolian query sentences in the batch; the semantic vector of the i-th query in the batch; the set of keyword vectors of the highly relevant documents returned by the retrieval engine; the generator of the machine translation engine; the discriminator; the semantic similarity function; and the adjustment coefficient.

10. The integrated platform for Mongolian machine translation and approximation retrieval based on big data features according to claim 2 or 3, characterized in that: in the adhesion processing subunit, the optimal segmentation path is selected from the candidate segmentation path hypothesis graph by minimizing the following energy function: [formula not reproduced in the source text]; where the symbols, which are likewise not reproduced, denote in order: the input image of the adhered region; the sequence of path labels; the set of candidate segmentation points and the set of potential segmentation path edges, respectively; the unary potential function; the pairwise (binary) potential function; and the balancing parameter.
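For orientation only, a conventional conditional random field energy that is consistent with the quantities listed in claim 10 can be written as below. This is an assumed form, not the patent's own formula: the symbol names $X$, $Y$, $V$, $E$, $\psi_u$, $\psi_p$, and $\lambda$, and the choice of which set each sum runs over, are illustrative.

```latex
E(Y \mid X) \;=\; \sum_{i \in V} \psi_u\!\left(y_i \mid X\right)
\;+\; \lambda \sum_{(i,j) \in E} \psi_p\!\left(y_i, y_j \mid X\right),
\qquad
Y^{*} \;=\; \arg\min_{Y} \, E(Y \mid X)
```

Here $X$ is the image of the adhered region, $Y = (y_i)$ is the label sequence, $\psi_u$ is the unary potential (derived from the initial weights, per claim 3), $\psi_p$ is the pairwise potential that encodes the morphological compatibility of adjacent segmentation paths, and $\lambda$ balances the two terms.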