A system and method for training translation models using source-extended training examples.
By using source-extended training examples with domain-specific labels, the system enhances neural machine translation models' consistency and accuracy across varied sources, addressing data quality issues and reducing computational overhead.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Patents
- Current Assignee / Owner
- GOOGLE LLC
- Filing Date
- 2022-06-28
- Publication Date
- 2026-06-26
AI Technical Summary
Neural machine translation models are affected by the quality and variability of training data, making it difficult to ensure consistent translation quality across different sources and domains.
A system and method for training translation models using source-extended training examples, where labels such as internet domains, URLs, and IP addresses are used to associate translation styles with data sources, enabling flexible emulation of desired translation qualities by adjusting these labels during inference.
This approach reduces the need for extensive data filtering and allows a single model to generate translations with consistent quality across different domains, improving translation accuracy and reducing computational complexity.
Abstract
Description
Background Art
[0001] The quality of translations generated by neural machine translation models can be affected by both the amount and quality of the data used to train the models. Unfortunately, while large amounts of training data can be collected using various automated methods, it can be difficult to guarantee the quality of such data, and doing so often requires human supervision. For example, a system may be configured to crawl the Internet to identify sets of pages published in multiple languages (e.g., pages from domains en.website.com and es.website.com may have the same content published in English and Spanish, respectively) and extract corresponding sequences of text from which training examples can be generated. However, training examples from some websites or web pages may be of relatively high or low quality depending on various factors such as whether the translation was created or supervised by a human translator, whether the translation is more concise or more verbose, etc. Similarly, training examples from some websites or web pages may use specific jargon, making them more or less desirable for training a given translation model (e.g., web pages targeted at a particular region may use region-specific dialects, and web pages targeted at scientific or legal content may use terms that have different meanings in non-scientific or non-legal contexts, etc.).
Summary of the Invention
[0002] This technology relates to a system and method for training a translation model using source-extended training examples so that the model can learn to associate a particular translation style with the source of each example. For example, in some embodiments of this technology, the translation model may be trained on a first text sequence in a first language, a second text sequence in a second language different from the first language, and labels based on the source of the second text sequence. In some embodiments, the labels may include an internet domain, an internet subdomain, a uniform resource locator ("URL"), a website name, or an IP address related to the source of the second text sequence. Similarly, in some embodiments, the labels may further indicate the source of the first text sequence. Furthermore, in some aspects of this technology, each given training example of multiple training examples may be automatically generated by sampling a first text sequence from a first page of a given internet domain, sampling a second text sequence from a second page of a given internet domain, and generating labels based on the source of the second text sequence and / or the first text sequence (e.g., all or part of the URLs, internet domains, internet subdomains, website names, or IP addresses of the first and / or second pages).
[0003] Therefore, this technique can generate translation models that can be prompted during inference to emulate translations from specific high-quality or otherwise desirable sources by simply including the source labels along with the input text sequence. These high-quality or desirable sources can be identified after training by repeatedly feeding the trained translation model a validation set of examples using different labels and comparing the quality of the generated translations (e.g., using automated quality metrics, human scorers, or a combination thereof). In this way, this technique can reduce or eliminate the amount of filtering required on a given set of training data, thus enabling the training of translation models using large datasets of automatically collected, generated, and / or filtered synthetic training examples. Similarly, this technique can be used to generate translation models that can be flexibly and efficiently "tuned" to emulate different translation qualities and / or styles simply by changing which source labels are used during inference. Thus, this technique can solve the technical problem of how to control the output of translation models trained on multiple sources or domains so that they generate translations based on the characteristics of a particular source or domain of interest. Furthermore, in various exemplary embodiments, this can be achieved by training only a single model (rather than one or more models for each domain of interest), thereby reducing technical complexity and computational cost.
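The post-training label search described in this paragraph can be sketched as follows. This is a minimal illustration only: the function `pick_best_label`, the toy `toy_translate` lookup, and the `exact_match` metric are hypothetical stand-ins introduced here, not part of the disclosure. A real system would call the trained translation model and an automated quality metric or human scorers.

```python
from typing import Callable, Dict, List, Tuple


def pick_best_label(
    translate: Callable[[str, str], str],
    score: Callable[[str, str], float],
    validation_pairs: List[Tuple[str, str]],
    candidate_labels: List[str],
) -> Tuple[str, Dict[str, float]]:
    """Return the label with the highest mean validation score, plus all scores."""
    mean_scores: Dict[str, float] = {}
    for label in candidate_labels:
        total = 0.0
        for source_text, reference in validation_pairs:
            # Feed the same validation input with a different source label each time.
            prediction = translate(source_text, label)
            total += score(prediction, reference)
        mean_scores[label] = total / len(validation_pairs)
    best = max(mean_scores, key=lambda name: mean_scores[name])
    return best, mean_scores


# Hypothetical stand-in for a trained model: output depends on the label.
def toy_translate(text: str, label: str) -> str:
    table = {
        "es.website.com": {"hello": "hola"},
        "es.other.org": {"hello": "buenas"},
    }
    return table.get(label, {}).get(text, text)


# Hypothetical stand-in for a quality metric.
def exact_match(prediction: str, reference: str) -> float:
    return 1.0 if prediction == reference else 0.0


best_label, scores = pick_best_label(
    toy_translate,
    exact_match,
    [("hello", "hola")],
    ["es.website.com", "es.other.org"],
)
```

The label chosen this way would then be supplied alongside every input text sequence at inference time.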
[0004] In one embodiment, the disclosure describes a computer-implemented method for training a translation model, the method comprising: (1) for each given training example of a plurality of training examples, each of which includes a first text sequence in a first language, a second text sequence in a second language different from the first language, and a label based on the source of the second text sequence: using the translation model to generate a predicted text sequence based at least partially on the first text sequence and label of the given training example; and using one or more processors of a processing system to compare the predicted text sequence with the second text sequence to generate a loss value for the given training example; and (2) using the one or more processors to modify one or more parameters of the translation model based at least partially on the loss values generated for each of the plurality of training examples. In some embodiments, the label includes an internet domain. In some embodiments, the label includes an internet subdomain. In some embodiments, the label includes a uniform resource locator. In some embodiments, the label includes a website name. In some embodiments, the label includes an IP address. In some embodiments, the label further indicates the source of the first text sequence. In some embodiments, the source of the first text sequence is in a first subdomain of a given internet domain, and the source of the second text sequence is in a second subdomain of the given internet domain. 
In some embodiments, the method further includes generating each given training example of the plurality of training examples by using one or more processors to sample the first text sequence from a first page of a given internet domain, to sample the second text sequence from a second page of the given internet domain, and to generate the label based on all or part of the uniform resource locator of the second page. In some embodiments, the method further includes generating each given training example of the plurality of training examples by using one or more processors to sample the first text sequence from a first page of a given internet domain, to sample the second text sequence from a second page of the given internet domain, and to generate the label based on all or part of the IP address of the second page.
[0005] In another aspect, the disclosure describes a computer program product that includes computer-readable instructions that, when executed by a processing system, cause the processing system to perform one of the methods described in the preceding paragraphs.
[0006] In another aspect, the disclosure describes a processing system comprising: (1) a memory for storing a translation model; and (2) one or more processors coupled to the memory and configured to train the translation model according to a training method which includes: (a) for each given training example of a plurality of training examples, each training example comprising a first text sequence in a first language, a second text sequence in a second language different from the first language, and a label based on the source of the second text sequence, using the translation model to generate a predicted text sequence based at least partially on the first text sequence and label of the given training example; comparing the predicted text sequence to the second text sequence to generate a loss value for the given training example; and (b) modifying one or more parameters of the translation model based at least partially on the loss values generated for each of the plurality of training examples. In some aspects, the one or more processors are configured to train the translation model with each given training example which comprises a label including an internet domain, according to the training method. In some embodiments, one or more processors are configured to train a translation model on each given training example that includes labels containing internet subdomains, according to a training method. In some embodiments, one or more processors are configured to train a translation model on each given training example that includes labels containing uniform resource locators, according to a training method. In some embodiments, one or more processors are configured to train a translation model on each given training example that includes labels containing website names, according to a training method. 
In some embodiments, one or more processors are configured to train a translation model on each given training example that includes labels containing IP addresses, according to a training method. In some embodiments, one or more processors are configured to train a translation model with each given training example, which includes labels indicating the source of a first text sequence and the source of a second text sequence, according to a training method. In some embodiments, one or more processors are further configured to generate each given training example of a plurality of training examples by sampling a first text sequence from a first page of a given internet domain, sampling a second text sequence from a second page of a given internet domain, and generating labels based on all or part of the uniform resource locator of the second page. In some embodiments, one or more processors are further configured to generate each given training example of a plurality of training examples by sampling a first text sequence from a first page of a given internet domain, sampling a second text sequence from a second page of a given internet domain, and generating labels based on all or part of the IP address of the second page.
[0007] In another aspect, the disclosure describes a processing system comprising: (1) a memory for storing a translation model; and (2) one or more processors coupled to the memory and configured to use the translation model to generate a predicted translation of an input text sequence based on the input text sequence and labels, wherein the translation model is trained to generate a predicted translation according to a training method that includes: (a) for each given training example of a plurality of training examples, each training example comprising a first text sequence in a first language, a second text sequence in a second language different from the first language, and labels based on the source of the second text sequence, using the translation model to generate a predicted text sequence based at least partially on the first text sequence and labels of the given training example; comparing the predicted text sequence to the second text sequence to generate a loss value for the given training example; and (b) modifying one or more parameters of the translation model based at least partially on the loss values generated for each of the plurality of training examples.
Brief Description of the Drawings
[0008] [Figure 1] This is a functional diagram of an exemplary system in accordance with aspects of this disclosure.
[Figure 2] This is a functional diagram of an exemplary system in accordance with aspects of this disclosure.
[Figure 3] This flowchart illustrates how exemplary training examples may be generated based on pages of a website, in accordance with aspects of this disclosure.
[Figure 4] This illustrates an exemplary method for training a translation model in accordance with aspects of this disclosure.
[Figure 5] This illustrates an exemplary method for generating a plurality of training examples in accordance with aspects of this disclosure.
Modes for Carrying Out the Invention
[0009] Next, the technology will be described in relation to the following exemplary systems and methods. Common reference numbers among the figures illustrated and described below are intended to identify the same features.
Exemplary System
[0010] Figure 1 shows a high-level system diagram 100 of an exemplary processing system 102 for performing the method described herein. The processing system 102 may include one or more processors 104 and a memory 106 for storing instructions 108 and data 110. The instructions 108 and data 110 may include a translation model, as further described below. Furthermore, the data 110 may store training examples used when training the translation model (e.g., those used in pre-training, training, or fine-tuning), training signals and/or loss values generated during training, and/or predicted text sequences generated by the translation model.
[0011] The processing system 102 may reside on a single computing device. For example, the processing system 102 may be a server, a personal computer, or a mobile device, and therefore the translation model may be local to that single computing device. Similarly, the processing system 102 may reside on a cloud computing system or other distributed system. In such a case, the translation model may be distributed across two or more different physical computing devices. For example, the processing system may include a first computing device that stores layers 1 to n of a translation model having m layers, and a second computing device that stores layers n to m of the translation model. In such a case, the first computing device may be a computing device (e.g., a personal computer, a mobile phone, a tablet, etc.) that has less memory and / or processing power compared to the memory and / or processing power of the second computing device, and vice versa. Similarly, in some aspects of the present technology (for example, as further described below with respect to the exemplary method 500 in Figure 5), the processing system may include one or more computing devices that store the translation model and one or more separate computing devices configured to collect and / or generate training examples. Furthermore, in some aspects of this technology, the data used by the translation model (e.g., training data, labels used during inference, etc.) may be stored on a different computing device than the translation model.
[0012] Furthermore, in this regard, Figure 2 shows a high-level system diagram 200 in which the exemplary processing system 102 described above is distributed across two computing devices 102a and 102b, each of which may include one or more processors (104a, 104b) and memory (106a, 106b) storing instructions (108a, 108b) and data (110a, 110b). The processing system 102, including computing devices 102a and 102b, is shown communicating via one or more networks 202 with one or more websites and/or remote storage systems, including website 204 and remote storage system 212. In this example, website 204 includes one or more servers 206a to 206n. Each of servers 206a to 206n may have one or more processors (e.g., 208) and associated memory (e.g., 210) that stores instructions and data, including the content of one or more web pages. Similarly, although not shown, the remote storage system 212 may also include one or more processors and memory for storing instructions and data. In some embodiments of the technology, the processing system 102, including computing devices 102a and 102b, may be configured to retrieve data from one or more of the website 204 and/or the remote storage system 212 for use when training a translation model. For example, in some embodiments, the first computing device 102a may be configured to retrieve training examples from the remote storage system 212 for use in pre-training, training, or fine-tuning of the translation model stored in the first computing device 102a and/or the second computing device 102b. Similarly, in some embodiments (as further described below with respect to the exemplary method 500 in Figure 5), the first computing device 102a may be configured to store a translation model, and the second computing device 102b may be configured to collect data from the website 204 and generate training examples based on the retrieved data for use in training the translation model. 
Furthermore, in such cases, the second computing device 102b may be configured to store one or more of the generated training examples on the remote storage system 212 for retrieval by the first computing device 102a.
[0013] The processing systems described herein may be implemented on any type of computing device(s), such as any type of general-purpose computing device, server, or set thereof, and may also include other components typically found in general-purpose computing devices or servers. Similarly, the memory of such a processing system may be of any non-transitory type capable of storing information accessible by the processor(s) of the processing system. For example, memory may include non-transitory media such as hard drives, memory cards, optical discs, solid-state drives, and tape memory. Computing devices suitable for the roles described herein may include different combinations of the foregoing, thereby storing different parts of instructions and data in different types of media.
[0014] In any case, the computing devices described herein may further include any other components commonly used in connection with computing devices, such as a user interface subsystem. A user interface subsystem may include one or more user inputs (e.g., a mouse, keyboard, stylus, touchscreen, and / or microphone), and one or more electronic displays (e.g., a monitor with a screen, or any other electrical device capable of displaying information). Output devices other than electronic displays, such as speakers, lights, and vibration elements, pulse elements, or tactile elements, may also be included in the computing devices described herein.
[0015] Each computing device may contain one or more processors, which may be commercially available central processing units ("CPUs"), graphics processing units ("GPUs"), or tensor processing units ("TPUs"). Alternatively, one or more processors may be dedicated devices, such as ASICs or other hardware-based processors. Each processor may have multiple cores capable of operating in parallel. The processors, memory, and other elements of a single computing device may be housed in a single physical housing or distributed between two or more housings. Similarly, the memory of a computing device may include hard drives or other storage media located in a different housing from that of the processor(s), such as in an external database or networked storage device. Thus, a reference to a processor or computing device is understood to include a reference to a collection of processors, computing devices, or memories, which may or may not operate in parallel, as well as a reference to one or more servers in a load-balancing server farm or cloud-based system.
[0016] The computing devices described herein may store instructions that can be directly executed by a processor(s) (such as machine code) or instructions that can be indirectly executed by a processor(s) (such as scripts). The computing devices may also store data that can be retrieved, stored, or modified by one or more processors according to the instructions. Instructions may be stored as computing device code on a computing device-readable medium. In this regard, the terms “instruction” and “program” may be used interchangeably herein. Instructions may also be stored in object code form for direct processing by a processor(s), or in any other computing device language, including scripts or a collection of independent source code modules that are interpreted on demand or pre-compiled. For example, the programming language may be C#, C++, Java®, or another computer programming language. Similarly, any component of an instruction or program may be implemented in a computer scripting language such as JavaScript®, PHP, ASP, or any other computer scripting language. Furthermore, any one of these components may be implemented using a combination of a computer programming language and a computer scripting language.
Exemplary Method
[0017] Figure 3 is a flowchart 300 illustrating how an exemplary training example may be generated based on pages of a website, in accordance with aspects of this disclosure. In the example of Figure 3, the website in question is assumed to be the exemplary website 204 in Figure 2 above. Furthermore, it is assumed that website 204 contains two web pages 302a and 302b. In this example, web page 302a is from the URL "http://en.website.com/" and contains English text, and web page 302b is from the URL "http://es.website.com/" and contains the corresponding Spanish text. Thus, web pages 302a and 302b are on different subdomains of the same root domain (website.com).
[0018] Figure 3 further illustrates a training example 304 that can be generated from the content of web pages 302a and 302b. In this case, training example 304 includes a first text sequence containing a sentence from web page 302a that says "This page is also available in other languages" in English, a second text sequence containing a corresponding sentence from web page 302b that says "Esta pagina esta disponible" in Spanish, and a label containing part of the URL of web page 302b. As can be seen, it would also be possible to generate a second training example in which the sentence from web page 302b is the "first text sequence", the sentence from web page 302a is the "second text sequence", and the label contains part of the URL of web page 302a.
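The structure of training example 304, and the reversed example mentioned above, can be sketched as a small record type. The class name `TrainingExample`, its field names, and the helper `make_example_pair` are illustrative assumptions, not names used in the disclosure.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TrainingExample:
    first_text: str   # input text sequence in the first language
    second_text: str  # target text sequence in the second language
    label: str        # based on the source of the second text sequence


def make_example_pair(
    text_a: str, source_a: str, text_b: str, source_b: str
) -> Tuple[TrainingExample, TrainingExample]:
    """Build one example per translation direction from a pair of aligned sentences."""
    forward = TrainingExample(text_a, text_b, source_b)
    reverse = TrainingExample(text_b, text_a, source_a)
    return forward, reverse


forward, reverse = make_example_pair(
    "This page is also available in other languages",
    "en.website.com",
    "Esta pagina esta disponible",
    "es.website.com",
)
```

In each direction the label follows the page that supplies the target text, so the forward example carries the label for web page 302b and the reversed example carries the label for web page 302a.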
[0019] The label in the example in Figure 3 uses the full domain name of webpage 302b, but the label may be based on any appropriate information regarding the source of webpage 302b. For example, in some embodiments, the label in training example 304 may include the full URL of webpage 302b (e.g., http://es.website.com/), the domain and/or subdomain of webpage 302b (e.g., "es.website.com", "website.com", "es", "website", or "com"), the name of the website (e.g., "Website"), the IP address of webpage 302b, and/or any other appropriate information relating to the source of webpage 302b.
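As a rough illustration of the label options listed above, the following sketch derives several candidate labels from a page URL using the standard library. The function name `candidate_labels` and the dictionary keys are assumptions made here. The root-domain split is deliberately naive (it assumes a single-part suffix such as ".com"), and IP address lookup is omitted because it would require a network call.

```python
from typing import Dict
from urllib.parse import urlsplit


def candidate_labels(url: str) -> Dict[str, str]:
    """Derive candidate source labels from all or part of a page URL."""
    host = urlsplit(url).hostname or ""
    parts = host.split(".")
    return {
        "url": url,
        "host": host,
        # Naive split: treats the last two host parts as the root domain.
        "root_domain": ".".join(parts[-2:]),
        "subdomain": ".".join(parts[:-2]),
        "site": parts[-2] if len(parts) >= 2 else host,
        "tld": parts[-1],
    }


labels = candidate_labels("http://es.website.com/")
```

Any one of these values, or a combination of them, could serve as the label of a training example, depending on how finely sources should be distinguished.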
[0020] Similarly, although not reflected in the example in Figure 3, in some embodiments of the technology, labels may include information about the source of the first text sequence, in addition to, or instead of, information based on the source of the second text sequence. For example, in some embodiments, the label in training example 304 may include information about the source of webpage 302a, such as the full URL of webpage 302a (e.g., http://en.website.com/), the domain and/or subdomain of webpage 302a (e.g., "en.website.com", "website.com", "en", "website", or "com"), the name of the website (e.g., "Website"), the IP address of webpage 302a, and/or any other appropriate information relating to the source of webpage 302a.
[0021] Furthermore, in some embodiments, the labels of training example 304 may include information not directly related to web pages 302a and 302b. For example, if web pages 302a and 302b are obtained from a curated set of websites or web pages relating to a particular topic (e.g., artificial intelligence, law, sports, etc.), the labels of training example 304 may include information relating to that topic (either alone or in addition to information from other sources).
[0022] Labels can be included in the training examples 304 in any suitable way and format. For example, in some aspects of the present technology, labels can be prepended or appended to the input sequence as vector embeddings, tokenized text, or raw text (thus requiring no extra preprocessing or special vocabulary). In that regard, if the training examples are collected from sources with similar domain names, including the raw text of the domain name in each label can increase the likelihood that the translation model can infer the similarity of the training examples of those domains.
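A minimal sketch of the raw-text option follows, assuming the label is joined to the input with a single space. The function name `add_label`, its parameters, and the separator are illustrative choices, since the disclosure does not fix a particular format.

```python
def add_label(text: str, label: str, position: str = "prepend", sep: str = " ") -> str:
    """Attach a raw-text source label to the input text sequence."""
    if position == "prepend":
        return f"{label}{sep}{text}"
    if position == "append":
        return f"{text}{sep}{label}"
    raise ValueError("position must be 'prepend' or 'append'")


model_input = add_label(
    "This page is also available in other languages", "es.website.com"
)
appended_input = add_label("Hello", "es.website.com", position="append")
```

Because the label stays as ordinary text, the model's existing tokenizer can split it into subwords, which is one way similar domain names could end up sharing tokens.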
[0023] The example of FIG. 3 assumes that the training examples 304 are generated based on text collected from web pages 302a and 302b, but it should be understood that the training examples can be generated from any suitable source available in multiple languages (such as books, user manuals, advertisements, song lyrics, etc.). Thus, as an example, the training examples can be generated from a first text sequence collected from a book, a second corresponding text sequence collected from a translated copy of the book, and a label indicating information based on sources such as the title of the book, the title of the translated copy, the name of the author, the name of the translator, etc.
[0024] FIG. 4 illustrates an exemplary method 400 for training a translation model according to an aspect of the present disclosure.
[0025] In step 402, the processing system (e.g., the processing system 102 of FIG. 1 or FIG. 2) selects a given training example from a plurality of training examples, where the given training example includes a first text sequence in a first language, a second text sequence in a second language different from the first language, and a label based on the source of the second text sequence. The plurality of training examples may be from any suitable source or collection of sources. For example, the plurality of training examples may include training examples from an existing database of training data, examples generated or supervised by a human, and / or synthetically generated examples (e.g., generated according to the exemplary method 500 of FIG. 5). The label may also include any suitable information regarding the source of the second text sequence, including any of the options described above with respect to the training example 304 of FIG. 3.
[0026] Furthermore, although not reflected in the example of FIG. 4, in some aspects of the present technology, the label may also include other information instead of, or in addition to, information based on the source of the second text sequence, as described above with respect to the training example 304 of FIG. 3. In that regard, in some aspects, the label may include information regarding the source of the first text sequence instead of, or in addition to, information based on the source of the second text sequence. Similarly, in some aspects, the label may include information that is not directly related to the source of the first text sequence or the second text sequence (e.g., the field or group of topics to which the training example belongs) instead of, or in addition to, information based on the source of the second text sequence.
[0027] In step 404, the processing system uses the translation model to generate a predicted text sequence based at least partially on the first text sequence and label of a given training example (e.g., the first text sequence and label of training example 304 in Figure 3). The processing system may do this using any suitable type of translation model, architecture, and number of parameters, including transformer architectures, long short-term memory ("LSTM") architectures, recurrent neural network ("RNN") architectures, convolutional neural network ("CNN") architectures, and/or any suitable hybrids thereof. For example, in some aspects of the technology, the translation model may be a deep LSTM network including multiple encoder and decoder layers (e.g., a 6-layer LSTM encoder and an 8-layer LSTM decoder, an 8-layer LSTM encoder and an 8-layer LSTM decoder, etc.). Similarly, in some aspects of the technology, the translation model may be based on a hybrid architecture, such as an architecture using transformers as encoders and RNNs as decoders (e.g., a 12-layer transformer encoder and a 2-layer RNN decoder).
[0028] Furthermore, a translation model may generate a predicted text sequence directly or indirectly based on a first text sequence and labels from a given training example. Therefore, for example, a processing system or translation model may be configured to first process the first text sequence and / or labels to generate modified versions thereof (e.g., tokenized versions of the first text sequence and / or labels, labels based on the first text sequence and / or labels, vectors, etc.). In such a case, the translation model may generate a predicted text sequence based on the modified versions of the first text sequence and / or labels (e.g., tokenized versions, vectors, etc.).
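One way such a modified version might be produced is sketched below, assuming whitespace tokenization and a vocabulary that is built on the fly. The helpers `tokenize` and `encode` are hypothetical, and a production system would more likely use a trained subword tokenizer.

```python
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Split text into lowercase whitespace-delimited tokens."""
    return text.lower().split()


def encode(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map tokens to integer ids, assigning a new id the first time a token is seen."""
    return [vocab.setdefault(token, len(vocab)) for token in tokens]


vocab: Dict[str, int] = {}
label_ids = encode(tokenize("es.website.com"), vocab)
text_ids = encode(tokenize("This page is also available"), vocab)

# The model would consume the tokenized label followed by the tokenized text.
model_input_ids = label_ids + text_ids
```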
[0029] In step 406, the processing system compares the predicted text sequence to the second text sequence from a given training example (e.g., the second text sequence from training example 304 in Figure 3) to generate a loss value. The processing system can perform this comparison and generate the loss value in any suitable way using any suitable loss function(s). For example, in some embodiments of the technology, the processing system may be configured to compare the predicted text sequence to the second text sequence using a “hard distillation” method that evaluates how similar each text string is to the other. Similarly, in some embodiments, the processing system may be configured to compare the predicted text sequence to the second text sequence using a connectionist temporal classification loss ("CTC loss") or a cross-entropy loss.
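Of the loss functions named above, cross-entropy is the simplest to illustrate. The sketch below assumes the model emits one vector of unnormalized scores (logits) per target position and that the second text sequence has already been mapped to token ids. The function names and the averaging over positions are choices made for this example, not details from the disclosure.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert logits to probabilities, subtracting the max for numerical stability."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def sequence_cross_entropy(
    step_logits: Sequence[Sequence[float]], target_ids: Sequence[int]
) -> float:
    """Mean negative log-probability assigned to the reference token at each position."""
    if len(step_logits) != len(target_ids):
        raise ValueError("need exactly one logit vector per target token")
    total = 0.0
    for logits, target in zip(step_logits, target_ids):
        total += -math.log(softmax(logits)[target])
    return total / len(target_ids)


# A model with no preference between two tokens versus one that favors the reference.
uniform_loss = sequence_cross_entropy([[0.0, 0.0], [0.0, 0.0]], [0, 1])
confident_loss = sequence_cross_entropy([[5.0, 0.0], [0.0, 5.0]], [0, 1])
```

The loss falls as the model places more probability on the tokens of the second text sequence, which is the signal used to modify the parameters in step 412.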
[0030] In step 408, the processing system determines whether there are further training examples in the batch. In this regard, the plurality of training examples may be split into multiple batches, or it may be retained as a whole, in which case there will be a single “batch” containing all the training examples of the plurality of training examples. In either case, if the processing system determines that there are further training examples in the batch, as indicated by the “yes” arrow, the processing system proceeds to step 410. In step 410, the processing system selects the next given training example from the batch and then repeats steps 404-408 for the newly selected training example. This process is then repeated for each next given training example in the batch until, in step 408, the processing system determines that there are no further training examples in the batch and therefore proceeds to step 412 (as indicated by the “no” arrow).
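The two batching options described here (several batches, or a single batch holding everything) can be sketched as follows. The helper name `make_batches` and its `batch_size` parameter are assumptions introduced for illustration.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def make_batches(examples: Sequence[T], batch_size: Optional[int] = None) -> List[List[T]]:
    """Split examples into consecutive batches, or keep them as one batch if no size is given."""
    if batch_size is None:
        return [list(examples)]
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        list(examples[start:start + batch_size])
        for start in range(0, len(examples), batch_size)
    ]


split_batches = make_batches([0, 1, 2, 3, 4], batch_size=2)
single_batch = make_batches([0, 1, 2])
```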
[0031] As shown in step 412, after a loss value is generated for each given training example in a batch (in step 406), the processing system modifies one or more parameters of the translation model based at least in part on the generated loss values. The processing system may be configured to modify one or more parameters based on these generated loss values at any appropriate interval and in any suitable way. For example, an optimization routine such as stochastic gradient descent may be applied to the generated loss values to determine the parameter modifications. In some aspects of the technology, each “batch” may contain a single training example such that the processing system performs a backpropagation step to modify one or more parameters of the translation model each time a loss value is generated. Similarly, if each “batch” contains two or more training examples, the processing system may be configured to combine the generated loss values into a total loss value (e.g., by summing or averaging multiple loss values) and modify one or more parameters of the translation model based on the total loss value.
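The two operations in this paragraph, combining per-example losses and applying a gradient-descent update, can be sketched in plain Python. The names `combine_losses` and `sgd_step`, the learning rate, and the example gradients are all illustrative assumptions. In practice the gradients would come from backpropagating the total loss through the translation model.

```python
from typing import List, Sequence


def combine_losses(losses: Sequence[float], how: str = "mean") -> float:
    """Combine per-example loss values into a total loss by summing or averaging."""
    if how == "sum":
        return sum(losses)
    if how == "mean":
        return sum(losses) / len(losses)
    raise ValueError("how must be 'sum' or 'mean'")


def sgd_step(params: Sequence[float], grads: Sequence[float], lr: float = 0.1) -> List[float]:
    """Move each parameter a small step against its gradient."""
    return [param - lr * grad for param, grad in zip(params, grads)]


total_loss = combine_losses([0.5, 1.5, 1.0])
summed_loss = combine_losses([0.5, 1.5, 1.0], how="sum")

# Hypothetical parameters and gradients, standing in for backpropagation output.
new_params = sgd_step([1.0, -2.0], [0.5, -1.0], lr=0.1)
```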
[0032] In step 414, the processing system determines whether there are further batches within the plurality of training examples. If the plurality of training examples has not been split, so that a single “batch” contains all of the training examples, the decision in step 414 is automatically “no,” and method 400 terminates as shown in step 418. However, if the plurality of training examples has been split into two or more batches, the processing system follows the “yes” arrow to step 416 and selects a given training example from the next batch. This initiates another set of passes through steps 404-408 for each training example in the next batch, followed by another modification of one or more parameters of the translation model in step 412. This process continues until there are no further batches, at which point the processing system follows the “no” arrow to step 418.
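The control flow of steps 404-418 can be summarized in a short sketch. The function names and the callback-based structure are assumptions made for illustration; `loss_for_example` stands in for steps 404-406 and `update_parameters` for step 412.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

Example = TypeVar("Example")


def split_into_batches(
    examples: Sequence[Example], batch_size: Optional[int]
) -> List[List[Example]]:
    """Return one batch holding everything when batch_size is None."""
    if batch_size is None:
        return [list(examples)]
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [
        list(examples[i : i + batch_size])
        for i in range(0, len(examples), batch_size)
    ]


def run_method_400(
    examples: Sequence[Example],
    batch_size: Optional[int],
    loss_for_example: Callable[[Example], float],
    update_parameters: Callable[[List[float]], None],
) -> List[float]:
    """One pass over all training examples, batch by batch."""
    all_losses: List[float] = []
    for batch in split_into_batches(examples, batch_size):  # steps 414 / 416
        batch_losses: List[float] = []
        for example in batch:  # steps 408 / 410
            batch_losses.append(loss_for_example(example))  # steps 404-406
        update_parameters(batch_losses)  # step 412
        all_losses.extend(batch_losses)
    return all_losses  # step 418: no further batches
```

Passing `batch_size=None` reproduces the unsplit case, in which the parameters are modified once after every training example has produced a loss value.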
[0033] Method 400 shows the process ending in step 418 once all training examples of the plurality of training examples have been used to adjust the parameters of the translation model. However, it should be understood that method 400 can be repeated any appropriate number of times using the same plurality of training examples, until each predicted text sequence is sufficiently close to the respective second text sequence of each training example. In this regard, in some aspects of the technology, the processing system may be configured to repeat method 400 for the plurality of training examples a predetermined number of times. Furthermore, in some aspects, the processing system may be configured to aggregate all loss values generated during a given pass through method 400 and determine whether to repeat method 400 for the plurality of training examples based on the total loss value. For example, in some aspects of the technology, the processing system may be configured to repeat method 400 for the plurality of training examples if the total loss value of the most recent pass through method 400 exceeds some predetermined threshold. Similarly, in some aspects, the processing system may be configured to use gradient descent and thus repeat method 400 for the plurality of training examples until the total loss value of a given pass through method 400 is equal to or greater than the total loss value of the previous pass.
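The three repetition criteria described above (a fixed number of passes, a loss threshold, and stopping once the total loss no longer decreases) might be combined as in the following sketch. The function `repeat_passes` and its parameters are hypothetical; `run_pass` is assumed to perform one full pass through method 400 and return that pass's total loss value.

```python
from typing import Callable, List, Optional


def repeat_passes(
    run_pass: Callable[[], float],
    max_passes: int,
    loss_threshold: Optional[float] = None,
    stop_when_not_improving: bool = False,
) -> List[float]:
    """Repeat full passes and return the total loss recorded for each pass."""
    if max_passes < 1:
        raise ValueError("max_passes must be at least 1")
    history: List[float] = []
    for _ in range(max_passes):  # predetermined number of passes
        total = run_pass()
        history.append(total)
        # Repeat only while the latest total loss exceeds the threshold.
        if loss_threshold is not None and total <= loss_threshold:
            break
        # Stop once a pass is no better than the previous pass.
        if stop_when_not_improving and len(history) >= 2 and total >= history[-2]:
            break
    return history
```

The pass limit acts as a backstop in every configuration, so training still terminates if the loss oscillates without ever crossing the threshold.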
[0034] As described above, once the translation model has been trained according to method 400, it may be tested with different labels to determine which labels cause the trained translation model to produce the highest quality results for a given validation set. For example, if the trained translation model is intended to translate between English and French, the validation set may be drawn from that language pair (e.g., from a benchmark translation dataset, or from one or more representative websites or books). Similarly, if the trained translation model is intended to translate within a particular topic, the validation set may be drawn from sources on that topic (e.g., websites or books on that topic). Examples from the validation set can then be repeatedly fed to the translation model to generate translations using each label in a set of candidate labels. The set of translations produced for each candidate label can then be evaluated and compared for quality to identify which label caused the translation model to produce the most desirable results. These quality assessments can be performed in any appropriate way, for example, using any known automated quality metric (e.g., BLEU, BLEURT, ROUGE, BERTScore), comparison against a target translation (e.g., using examples from a benchmark training set containing the target translation for each input text sequence), evaluation by a human scorer, or a combination thereof.
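The label search described above can be sketched as a loop over candidate labels. Everything named here is an assumption made for illustration: `translate` stands in for the trained translation model, `token_f1` is a simple token-overlap score used in place of a real metric such as BLEU, and the labels and toy translations are invented.

```python
from collections import Counter
from typing import Callable, Dict, Sequence, Tuple


def token_f1(hypothesis: str, reference: str) -> float:
    """Toy quality score: F1 over whitespace tokens (a stand-in, not BLEU)."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp or not ref:
        return 0.0
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(hyp), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def select_best_label(
    translate: Callable[[str, str], str],
    candidate_labels: Sequence[str],
    validation_set: Sequence[Tuple[str, str]],
    score: Callable[[str, str], float] = token_f1,
) -> Tuple[str, Dict[str, float]]:
    """Return the label with the highest average score, plus all averages."""
    if not candidate_labels or not validation_set:
        raise ValueError("need at least one label and one validation example")
    averages: Dict[str, float] = {}
    for label in candidate_labels:
        scores = [score(translate(src, label), ref) for src, ref in validation_set]
        averages[label] = sum(scores) / len(scores)
    best = max(averages, key=lambda name: averages[name])
    return best, averages


def toy_translate(source: str, label: str) -> str:
    """Hypothetical model whose output style depends on the label."""
    table = {
        "fr.website.com": {"good morning": "bonjour"},
        "fr.other.example": {"good morning": "bon matin"},
    }
    return table[label].get(source, "")


best, averages = select_best_label(
    toy_translate,
    ["fr.website.com", "fr.other.example"],
    [("good morning", "bonjour")],
)
```

In practice the scoring function would be swapped for an established automated metric or a human rating, and the chosen label would then be supplied with every input at inference time.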
[0035] Figure 5 illustrates an exemplary method 500 for generating multiple training examples according to an aspect of the present disclosure. In this regard, in some aspects of the present technology, the exemplary method of Figure 5 may be used to generate the multiple training examples referenced in method 400 of Figure 4.
[0036] In step 502, the processing system (e.g., processing system 102 of Figure 1 or 2, or the processing system that performs method 400 of Figure 4) samples a first text sequence from a first page of a given internet domain (e.g., a first text sequence sampled from web page 302a to generate training example 304 in Figure 3). The processing system may perform this sampling in any suitable manner. For example, in some embodiments of the technology, the processing system may directly sample the first text sequence from the first page. Similarly, in some embodiments, the processing system may download the first page (or a portion thereof) and then sample the first text sequence from the downloaded copy or portion of the first page.
[0037] In step 504, the processing system samples a second text sequence from a second page of a given internet domain (for example, the second text sequence sampled from web page 302b to generate training example 304 in Figure 3). Again, the processing system may perform this sampling in any suitable manner. For example, in some embodiments of the technology, the processing system may directly sample the second text sequence from the second page. Similarly, in some embodiments, the processing system may download the second page (or a portion thereof) and then sample the second text sequence from the downloaded copy or portion of the second page.
[0038] In step 506, the processing system generates a label based on the source of the second text sequence (for example, a label generated based on the URL of web page 302b to generate training example 304 in Figure 3). The processing system may generate the label based on any appropriate information regarding the source of the second text sequence, including any of the options described above with respect to training example 304 in Figure 3. Thus, in some aspects of the technology, the processing system may generate a label based on all or part of the URL of the second page (e.g., "http://es.website.com/", "es.website.com", "website.com", "es", "website", or "com"), the name of the website (e.g., "Website"), the IP address of the second page, and/or any other appropriate information relating to the source of the second page.
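As an illustration of step 506, the URL-derived label options listed above can be extracted with Python's standard `urllib.parse` module. The function name `label_candidates` and the dictionary keys are hypothetical. The sketch also assumes that the registered domain is the last two dot-separated parts of the host name, which does not hold for suffixes such as "co.uk"; a production system would likely consult a public suffix list.

```python
from typing import Dict
from urllib.parse import urlsplit


def label_candidates(url: str) -> Dict[str, str]:
    """Derive candidate labels from all or part of a page URL."""
    host = urlsplit(url).hostname or ""
    parts = host.split(".")
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"cannot derive a domain from {url!r}")
    return {
        "url": url,                          # the full URL
        "host": host,                        # e.g. "es.website.com"
        "domain": ".".join(parts[-2:]),      # naive registered domain
        "subdomain": ".".join(parts[:-2]),   # empty when there is none
        "site": parts[-2],                   # e.g. "website"
        "tld": parts[-1],                    # e.g. "com"
    }


candidates = label_candidates("http://es.website.com/")
```

Any one of these values, or a combination of them, could serve as the label attached to a training example.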
[0039] Furthermore, although not reflected in the example of Figure 5, in some embodiments of the technology, the label may include other information in addition to, or instead of, the information based on the source of the second text sequence, as described above with respect to training example 304 in Figure 3. In this regard, in some embodiments, the label may include information about the source of the first text sequence in addition to, or instead of, the information based on the source of the second text sequence. For example, the label may include all or part of the URL of the first page (e.g., "http://en.website.com/", "en.website.com", "website.com", "en", "website", or "com"), the name of the website (e.g., "Website"), the IP address of the first page, and/or any other appropriate information relating to the source of the first page. Similarly, in some embodiments, the label may include information that is not directly related to the source of the first or second text sequence (e.g., the topic or group to which the training example belongs) in addition to, or instead of, the information based on the source of the second text sequence.
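A training example carrying such a combined label might be represented as in the sketch below. The `TrainingExample` class, the `|` separator, and the choice to prepend the label to the first text sequence as a bracketed tag are all assumptions made for illustration; the passage above does not specify how the label is encoded or presented to the translation model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrainingExample:
    first_text: str   # text sequence in the first language
    second_text: str  # text sequence in the second language
    label: str        # source-based label


def build_label(
    second_source: Optional[str] = None,
    first_source: Optional[str] = None,
    topic: Optional[str] = None,
    sep: str = "|",
) -> str:
    """Join whichever pieces of source or topic information are available."""
    parts = [p for p in (first_source, second_source, topic) if p]
    if not parts:
        raise ValueError("a label needs at least one piece of information")
    return sep.join(parts)


def model_input(example: TrainingExample) -> str:
    """Hypothetical encoding: prepend the label to the first text sequence."""
    return f"<{example.label}> {example.first_text}"


example = TrainingExample(
    first_text="Hello",
    second_text="Hola",
    label=build_label(second_source="es.website.com", first_source="en.website.com"),
)
```

Because the label is an ordinary string, the same structure covers labels built from the second source alone, from both sources, or from a topic with no source information.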
[0040] Unless otherwise specified, the aforementioned alternatives are not mutually exclusive but can be implemented in various combinations to achieve their own advantages. Since these and other variations and combinations of the features described above can be used without departing from the subject matter defined by the claims, the foregoing descriptions of exemplary systems and methods should be considered illustrative rather than limiting of the subject matter defined by the claims. Furthermore, the provision of the examples described herein, and of phrases such as “such as,” “including,” and “comprising,” should not be interpreted as limiting the subject matter of the claims to the specific examples; rather, the examples are intended to illustrate only some of the many possible embodiments. Additionally, the same reference numeral in different drawings may identify identical or similar elements.
Claims
1. A method performed by a computer, the method comprising training a translation model, wherein the training comprises: for each given training example of a plurality of training examples, each including a first text sequence in a first language, a second text sequence in a second language different from the first language, and a label based on the source of the second text sequence: generating, using the translation model, a predicted text sequence based at least in part on the first text sequence and the label of the given training example; and comparing, using one or more processors of a processing system, the predicted text sequence with the second text sequence to generate a loss value for the given training example; and modifying, using the one or more processors, one or more parameters of the translation model based at least in part on the loss values generated for each of the plurality of training examples, wherein the label includes an internet domain.
2. A method performed by a computer, the method comprising training a translation model, wherein the training comprises: for each given training example of a plurality of training examples, each including a first text sequence in a first language, a second text sequence in a second language different from the first language, and a label based on the source of the second text sequence: generating, using the translation model, a predicted text sequence based at least in part on the first text sequence and the label of the given training example; and comparing, using one or more processors of a processing system, the predicted text sequence with the second text sequence to generate a loss value for the given training example; and modifying, using the one or more processors, one or more parameters of the translation model based at least in part on the loss values generated for each of the plurality of training examples, wherein the label includes an internet subdomain.
3. A method performed by a computer, the method comprising training a translation model, wherein the training comprises: for each given training example of a plurality of training examples, each including a first text sequence in a first language, a second text sequence in a second language different from the first language, and a label based on the source of the second text sequence: generating, using the translation model, a predicted text sequence based at least in part on the first text sequence and the label of the given training example; and comparing, using one or more processors of a processing system, the predicted text sequence with the second text sequence to generate a loss value for the given training example; and modifying, using the one or more processors, one or more parameters of the translation model based at least in part on the loss values generated for each of the plurality of training examples, wherein the label includes a uniform resource locator.
4. A method performed by a computer, the method comprising training a translation model, wherein the training comprises: for each given training example of a plurality of training examples, each including a first text sequence in a first language, a second text sequence in a second language different from the first language, and a label based on the source of the second text sequence: generating, using the translation model, a predicted text sequence based at least in part on the first text sequence and the label of the given training example; and comparing, using one or more processors of a processing system, the predicted text sequence with the second text sequence to generate a loss value for the given training example; and modifying, using the one or more processors, one or more parameters of the translation model based at least in part on the loss values generated for each of the plurality of training examples, wherein the label includes a website name.
5. A method performed by a computer, the method comprising training a translation model, wherein the training comprises: for each given training example of a plurality of training examples, each including a first text sequence in a first language, a second text sequence in a second language different from the first language, and a label based on the source of the second text sequence: generating, using the translation model, a predicted text sequence based at least in part on the first text sequence and the label of the given training example; and comparing, using one or more processors of a processing system, the predicted text sequence with the second text sequence to generate a loss value for the given training example; and modifying, using the one or more processors, one or more parameters of the translation model based at least in part on the loss values generated for each of the plurality of training examples, wherein the label includes an IP address.
6. A method performed by a computer, the method comprising training a translation model, wherein the training comprises: for each given training example of a plurality of training examples, each including a first text sequence in a first language, a second text sequence in a second language different from the first language, and a label based on the source of the second text sequence: generating, using the translation model, a predicted text sequence based at least in part on the first text sequence and the label of the given training example; and comparing, using one or more processors of a processing system, the predicted text sequence with the second text sequence to generate a loss value for the given training example; and modifying, using the one or more processors, one or more parameters of the translation model based at least in part on the loss values generated for each of the plurality of training examples, the method further comprising, using the one or more processors, for each given training example of the plurality of training examples: sampling the first text sequence from a first page of a given internet domain; sampling the second text sequence from a second page of the given internet domain; generating the label based on all or part of the uniform resource locator of the second page; and generating the given training example.
7. A method performed by a computer, the method comprising training a translation model, wherein the training comprises: for each given training example of a plurality of training examples, each including a first text sequence in a first language, a second text sequence in a second language different from the first language, and a label based on the source of the second text sequence: generating, using the translation model, a predicted text sequence based at least in part on the first text sequence and the label of the given training example; and comparing, using one or more processors of a processing system, the predicted text sequence with the second text sequence to generate a loss value for the given training example; and modifying, using the one or more processors, one or more parameters of the translation model based at least in part on the loss values generated for each of the plurality of training examples, the method further comprising, using the one or more processors, for each given training example of the plurality of training examples: sampling the first text sequence from a first page of a given internet domain; sampling the second text sequence from a second page of the given internet domain; generating the label based on all or part of the IP address of the second page; and generating the given training example.
8. The method according to any one of claims 1 to 7, wherein the label further indicates the source of the first text sequence.
9. The method according to any one of claims 1 to 7, wherein the source of the first text sequence is located in a first subdomain of a given internet domain, and the source of the second text sequence is located in a second subdomain of the given internet domain.
10. A processing system comprising: a memory storing a translation model; and one or more processors coupled to the memory and configured to train the translation model according to a training method, the training method comprising: for each given training example of a plurality of training examples, each including a first text sequence in a first language, a second text sequence in a second language different from the first language, and a label based on the source of the second text sequence: generating, using the translation model, a predicted text sequence based at least in part on the first text sequence and the label of the given training example; and comparing the predicted text sequence with the second text sequence to generate a loss value for the given training example; and modifying one or more parameters of the translation model based at least in part on the loss values generated for each of the plurality of training examples, wherein the one or more processors are configured to train the translation model according to the training method with each given training example including a label that includes an internet domain.
11. A processing system comprising: a memory storing a translation model; and one or more processors coupled to the memory and configured to train the translation model according to a training method, the training method comprising: for each given training example of a plurality of training examples, each including a first text sequence in a first language, a second text sequence in a second language different from the first language, and a label based on the source of the second text sequence: generating, using the translation model, a predicted text sequence based at least in part on the first text sequence and the label of the given training example; and comparing the predicted text sequence with the second text sequence to generate a loss value for the given training example; and modifying one or more parameters of the translation model based at least in part on the loss values generated for each of the plurality of training examples, wherein the one or more processors are configured to train the translation model according to the training method with each given training example including a label that includes an internet subdomain.
12. A processing system comprising: a memory storing a translation model; and one or more processors coupled to the memory and configured to train the translation model according to a training method, the training method comprising: for each given training example of a plurality of training examples, each including a first text sequence in a first language, a second text sequence in a second language different from the first language, and a label based on the source of the second text sequence: generating, using the translation model, a predicted text sequence based at least in part on the first text sequence and the label of the given training example; and comparing the predicted text sequence with the second text sequence to generate a loss value for the given training example; and modifying one or more parameters of the translation model based at least in part on the loss values generated for each of the plurality of training examples, wherein the one or more processors are configured to train the translation model according to the training method with each given training example including a label that includes a uniform resource locator.
13. A processing system comprising: a memory storing a translation model; and one or more processors coupled to the memory and configured to train the translation model according to a training method, the training method comprising: for each given training example of a plurality of training examples, each including a first text sequence in a first language, a second text sequence in a second language different from the first language, and a label based on the source of the second text sequence: generating, using the translation model, a predicted text sequence based at least in part on the first text sequence and the label of the given training example; and comparing the predicted text sequence with the second text sequence to generate a loss value for the given training example; and modifying one or more parameters of the translation model based at least in part on the loss values generated for each of the plurality of training examples, wherein the one or more processors are configured to train the translation model according to the training method with each given training example including a label that includes a website name.
14. A processing system comprising: a memory storing a translation model; and one or more processors coupled to the memory and configured to train the translation model according to a training method, the training method comprising: for each given training example of a plurality of training examples, each including a first text sequence in a first language, a second text sequence in a second language different from the first language, and a label based on the source of the second text sequence: generating, using the translation model, a predicted text sequence based at least in part on the first text sequence and the label of the given training example; and comparing the predicted text sequence with the second text sequence to generate a loss value for the given training example; and modifying one or more parameters of the translation model based at least in part on the loss values generated for each of the plurality of training examples, wherein the one or more processors are configured to train the translation model according to the training method with each given training example including a label that includes an IP address.
15. A processing system comprising: a memory storing a translation model; and one or more processors coupled to the memory and configured to train the translation model according to a training method, the training method comprising: for each given training example of a plurality of training examples, each including a first text sequence in a first language, a second text sequence in a second language different from the first language, and a label based on the source of the second text sequence: generating, using the translation model, a predicted text sequence based at least in part on the first text sequence and the label of the given training example; and comparing the predicted text sequence with the second text sequence to generate a loss value for the given training example; and modifying one or more parameters of the translation model based at least in part on the loss values generated for each of the plurality of training examples, wherein the one or more processors are further configured to, for each given training example of the plurality of training examples: sample the first text sequence from a first page of a given internet domain; sample the second text sequence from a second page of the given internet domain; generate the label based on all or part of the uniform resource locator of the second page; and generate the given training example.
16. A processing system comprising: a memory storing a translation model; and one or more processors coupled to the memory and configured to train the translation model according to a training method, the training method comprising: for each given training example of a plurality of training examples, each including a first text sequence in a first language, a second text sequence in a second language different from the first language, and a label based on the source of the second text sequence: generating, using the translation model, a predicted text sequence based at least in part on the first text sequence and the label of the given training example; and comparing the predicted text sequence with the second text sequence to generate a loss value for the given training example; and modifying one or more parameters of the translation model based at least in part on the loss values generated for each of the plurality of training examples, wherein the one or more processors are further configured to, for each given training example of the plurality of training examples: sample the first text sequence from a first page of a given internet domain; sample the second text sequence from a second page of the given internet domain; generate the label based on all or part of the IP address of the second page; and generate the given training example.
17. The processing system according to any one of claims 10 to 16, wherein the one or more processors are configured to train the translation model according to the training method with each given training example including a label that indicates both the source of the first text sequence and the source of the second text sequence.
18. A program comprising computer-readable instructions that, when executed by a processing system, cause the processing system to perform the method according to any one of claims 1 to 7.