345 results about "Source text" patented technology

A source text is a text (sometimes oral) from which information or ideas are derived. In translation, a source text is the original text that is to be translated into another language.

Confidence-driven rewriting of source texts for improved translation

A method for rewriting source text includes receiving source text including a source text string in a first natural language. The source text string is translated with a machine translation system to generate a first target text string in a second natural language. A translation confidence for the source text string is computed, based on the first target text string. At least one alternative text string is generated, where possible, in the first natural language by automatically rewriting the source string. Each alternative string is translated to generate a second target text string in the second natural language. A translation confidence is computed for the alternative text string based on the second target string. Based on the computed translation confidences, one of the alternative text strings may be selected as a candidate replacement for the source text string and may be proposed to a user on a graphical user interface.
Owner:XEROX CORP
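
The loop described in this abstract can be sketched as follows. This is a minimal illustration and not the patented implementation: `propose_rewrite` and the three toy callables (`toy_translate`, `toy_confidence`, `toy_rewrite`) are hypothetical stand-ins for a real machine translation system, confidence estimator, and rewriting component.

```python
from typing import Callable, Iterable, Optional


def propose_rewrite(
    source: str,
    translate: Callable[[str], str],
    confidence: Callable[[str, str], float],
    rewrite: Callable[[str], Iterable[str]],
) -> Optional[str]:
    """Return the rewrite whose translation confidence beats the original, else None."""
    best, best_score = None, confidence(source, translate(source))
    for alternative in rewrite(source):
        score = confidence(alternative, translate(alternative))
        if score > best_score:
            best, best_score = alternative, score
    return best


# Toy stand-ins (assumptions for illustration only).
def toy_translate(text: str) -> str:
    return text.upper()


def toy_confidence(source: str, target: str) -> float:
    # Pretend that shorter source strings translate more reliably.
    return 1.0 / (1 + len(source.split()))


def toy_rewrite(text: str) -> list:
    return [text.replace("in order to", "to")]


candidate = propose_rewrite(
    "press in order to start", toy_translate, toy_confidence, toy_rewrite
)
print(candidate)
```

In an interactive setting the returned candidate would be shown to the user as a suggestion rather than applied automatically.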

Hybrid adaptation of named entity recognition

A machine translation method includes receiving a source text string and identifying any named entities. The identified named entities may be processed to exclude common nouns and function words. Features are extracted from the source text string relating to the identified named entities. Based on the extracted features, a protocol is selected for translating the source text string. A first translation protocol includes forming a reduced source string from the source text string in which the named entity is replaced by a placeholder, translating the reduced source string by machine translation to generate a translated reduced target string, while processing the named entity separately to be incorporated into the translated reduced target string. A second translation protocol includes translating the source text string by machine translation, without replacing the named entity with the placeholder. The target text string produced by the selected protocol is output.
Owner:XEROX CORP
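
The two protocols in this abstract might look roughly like the sketch below. The word-for-word `toy_translate`, the `LEXICON`, the `__NE__` placeholder token, and the boolean `use_placeholder` flag (standing in for the feature-based protocol selection) are all assumptions made for illustration.

```python
from typing import Callable

# Hypothetical word-for-word lexicon; unknown tokens pass through unchanged.
LEXICON = {"he": "il", "lives": "vit", "in": "à", "New": "Nouveau"}


def toy_translate(text: str) -> str:
    return " ".join(LEXICON.get(word, word) for word in text.split())


def translate_sentence(
    source: str,
    entity: str,
    use_placeholder: bool,
    translate: Callable[[str], str],
    translate_entity: Callable[[str], str],
    placeholder: str = "__NE__",
) -> str:
    if entity and use_placeholder:
        # First protocol: translate the reduced string, handle the entity separately.
        reduced = source.replace(entity, placeholder, 1)
        reduced_target = translate(reduced)
        return reduced_target.replace(placeholder, translate_entity(entity), 1)
    # Second protocol: translate the full string as-is.
    return translate(source)


keep_as_is = lambda entity: entity  # entity-specific processing, here a no-op
print(translate_sentence("he lives in New York", "New York", True, toy_translate, keep_as_is))
print(translate_sentence("he lives in New York", "New York", False, toy_translate, keep_as_is))
```

With this toy lexicon the second call mistranslates part of the entity, which is the kind of error the placeholder protocol is meant to avoid.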

Machine translation-driven authoring system and method

An authoring method includes generating an authoring interface configured for assisting a user to author a text string in a source language for translation to a target string in a target language. Initial source text entered by the user is received through the authoring interface. Source phrases are selected that each include at least one token of the initial source text as a prefix and at least one other token as a suffix. The source phrase selection is based on a translatability score and optionally on fluency and semantic relatedness scores. A set of candidate phrases is proposed for display on the authoring interface, each of the candidate phrases being the suffix of a respective one of the selected source phrases. The user may select one of the candidate phrases, which is appended to the source text following its corresponding prefix, or may enter alternative text. The process may be repeated until the user is satisfied with the source text, and the statistical machine translation (SMT) model can then be used for its translation.
Owner:XEROX CORP

Lexical and phrasal feature domain adaptation in statistical machine translation

A translation method is adapted to a domain of interest. The method includes receiving a source text string comprising a sequence of source words in a source language and generating a set of candidate translations of the source text string, each candidate translation comprising a sequence of target words in a target language. An optimal translation is identified from the set of candidate translations as a function of at least one domain-adapted feature computed based on bilingual probabilities and monolingual probabilities. Each bilingual probability is for a source text fragment and a target text fragment of the source text string and candidate translation respectively. The bilingual probabilities are estimated on an out-of-domain parallel corpus that includes source and target strings. The monolingual probabilities for text fragments of one of the source text string and candidate translation are estimated on an in-domain monolingual corpus.
Owner:XEROX CORP

Translation quality quantifying apparatus and method

A system for automating the quality evaluation of a translation. The system may include a computer having a processor and memory device operably connected to one another. A source text in a first language may be stored within the memory device. A target text comprising a translation of the source text into a second language may also be stored within the memory device. Additionally, a plurality of executables may be stored on the memory device and be configured to, when executed by the processor, independently identify a test sample comprising one or more blocks, each comprising a matched set having a source portion selected from the source text and a corresponding target portion selected from the target text.
Owner:MULTILING CORP

Audio renderings for expressing non-audio nuances

Methods, systems, computer program products, and methods of doing business by adapting audio renderings of non-audio messages (for example, e-mail messages that are processed by a text-to-speech translator) to reflect various nuances of the non-audio information. Audio cues are provided for this purpose, which are sounds that are “mixed” in with the audio rendering as a separate (background) audio stream. Audio cues may reflect information such as the topical structure of a text file, or changes in paragraphs. Or, audio cues may be used to signal nuances such as changes in the color or font of the source text. Audio cues may also be advantageously used to reflect information about the translation process with which the audio rendering of a text file was created, such as using varying background tones to convey the degree of certainty in the accuracy of translating text to audio using a text-to-speech translation system, or of translating audio to text using a voice recognition system, or of translating between languages, and so forth. Stylesheets, such as those encoded in the Extensible Stylesheet Language (“XSL”), may optionally be used to customize the audio cues. For example, a user-specific stylesheet customization may be performed to override system-wide default audio cues for a particular user, enabling her to hear a different background sound for messages on a particular topic than other users will hear.
Owner:CERENCE OPERATING CO

Means and Method for Adapted Language Translation

This invention relates to a means and a method for translating source text into a target text where the context information is taken into consideration. A source text unit is defined around a translation unit which is to be translated. This source text unit is mapped onto a bilingual sublanguage space where the bilingual sublanguage space comprises a source sublanguage space and mappings to the target language. The translation is adapted to the source text unit, thereby considering contextual information.
Owner:NAT RES COUNCIL OF CANADA

Hybrid machine translation

A system and method for hybrid machine translation approach is based on a statistical transfer approach using statistical and linguistic features. The system and method may be used to translate from one language into another. The system may include at least one database, a rule based translation module, a statistical translation module and a hybrid machine translation engine. The database(s) store source and target text and rule based language models and statistical language models. The rule based translation module translates source text based on the rule based language models. The statistical translation module translates source text based on the statistical language models. A hybrid machine translation engine, having a maximum entropy algorithm, is coupled to the rule based translation module and the statistical translation module and is capable of translating source text into target text based on the rule based and statistical language models.
Owner:EBAY INC

Machine translation using non-contiguous fragments of text

A machine translation method for translating source text from a first language to target text in a second language includes receiving the source text in the first language and accessing a library of bi-fragments, each of the bi-fragments including a text fragment from the first language and a text fragment from the second language, at least some of the bi-fragments comprising non-contiguous bi-fragments in which at least one of the text fragment from the first language and the text fragment from the second language comprises a non-contiguous fragment.
Owner:XEROX CORP

Selected text obfuscation and encryption in a local, network and cloud computing environment

A keyboarded mix of private and public text composing a source-text document is submitted through an encryption adapter intersituated in a data signal link between the keyboard and a computer system. A user selects and obfuscates the private text character portions with surrogate cloak characters in concurrent alternation with direct entry of the public text characters. A resultant protected document's data signal is safeguarded for editing and data storage without revelation of the private text content. The protected document is sufficiently secure for submission into a cloud computing environment and is immune to key-entry tracking malware. Clandestine hacking of residual data remaining in the computer firmware is negated in interpretative value by enabling the user's selective obfuscation of the source-text document's private text character content with surrogate cloak characters prior to entry into the computer system's keyboard input port.
Owner:SAVVYSTUFF PROPERTY TRUST

Adaptive machine translation

A computer-implemented method for providing information to an automatic machine translation system to improve translation accuracy is disclosed. The method includes receiving a collection of source text. An attempted translation that corresponds to the collection of source text is received from the automatic machine translation system. A correction input, which is configured to effectuate a correction of at least one error in the attempted translation, is also received. Finally, information is provided to the automatic machine translation system to reduce the likelihood that the error will be repeated in subsequent translations generated by the automatic machine translation system.
Owner:MICROSOFT TECH LICENSING LLC

Method and system for smart search engine and other applications

The present invention provides a new method for indexing given text objects, using a text parsing module and word indexing databases. According to this method, each word is assigned a first index code according to the word's meaning, a second index code according to the word's syntax category, and a third index code according to the word's syntactical role. The word indices are arranged in a hierarchical order based on the syntactical relations between the words of the text. At the last stage, differentiating symbols, which represent the hierarchical order of the indices, are assigned between adjacent word indices. The indexing process may be implemented as an automatic computerized program or as a wizard application enabling human intervention in the indexing process. The indexing method can be utilized for enabling text search utilities based on matching between the query indices and the source text indices.
Owner:GOVRIN OMRI +1

Computer-assisted natural language translation

A computer implemented method of translating source material in a source natural language into a target natural language includes receiving a first data input which is a first part of a sub-segment of a translation of the source material from the source natural language into the target natural language, identifying a selectable target text sub-segment in the target natural language associated with the received first data input, and outputting the selectable target text sub-segment. The selectable target text sub-segment is extracted from a corpus of previously translated text segment pairs, each text segment pair having a source text segment in the source natural language and a corresponding translated text segment in the target natural language.
Owner:SDL LTD

Method and system for encoding and accessing linguistic frequency data

Linguistic frequency data is encoded by identifying a plurality of sets of character strings in a source text, where each set comprises at least a first and a second character string. Frequency data is obtained for each set and stored at a memory position in a first memory array that is assigned to each first character string. A pointer pointing to a position in the first memory array that has been assigned to the corresponding first character string of the respective set and which has stored the frequency data of the respective set, is stored in a second memory array for each set comprising each character string that is a second character string. The encoded data is accessed by identifying regions in the memory arrays that are each assigned a search string and a pointer pointing to a position in the first memory array.
Owner:XEROX CORP

Machine translation using elastic chunks

A machine translation method includes receiving source text in a first language and retrieving text fragments in a target language from a library of bi-fragments to generate a target hypothesis. Each bi-fragment includes a text fragment from the first language and a corresponding text fragment from the second language. Some of the bi-fragments are modeled as elastic bi-fragments where a gap between words is able to assume a variable size corresponding to a number of other words to occupy the gap. The target hypothesis is evaluated with a translation scoring function which scores the target hypothesis according to a plurality of feature functions, at least one of the feature functions comprising a gap size scoring feature which favors hypotheses with statistically more probable gap sizes over hypotheses with statistically less probable gap sizes.
Owner:XEROX CORP

Document de-registration

A document accessible over a network can be registered. A registered document, and the content contained therein, cannot be transmitted undetected over and off of the network. In one embodiment, the invention includes maintaining a plurality of stored signatures in a signature database, each signature being associated with one of a plurality of registered documents. In one embodiment, the invention further includes maintaining the signature database by de-registering documents by removing the signatures associated with de-registered documents. In one embodiment, the invention further includes maintaining the database by removing redundant and high detection rate signatures. In one embodiment, the invention also includes maintaining the signature database by removing signatures based on the source text used to generate the signature.
Owner:MCAFEE LLC

System for natural language understanding

A general-purpose apparatus for analyzing natural language text that allows for the implementation of a broad range of natural language understanding applications. The apparatus for natural language understanding analyzes a source text and transforms the source text into a semantically-interpretable syntactic representation (SISR), comprising a syntax template and semantic clause annotations. The general-purpose apparatus for natural language understanding is adaptable to various source text natural languages and is adaptable to various natural language understanding applications, such as query answering, translation, summarization, information extraction, disambiguation, and parsing. A natural language query answering apparatus for answering questions about a source text, whereby the query answering apparatus utilizes the general-purpose apparatus for transforming the natural language query into SISR format.
Owner:GHANNAM RIMA +1

Method and system for identifying sentence boundaries

The present invention is directed to systems and methods for isolating sentence boundaries between sentences in text. Sentences of the normalized document feeds or source text are separated by determining boundaries between individual sentences, by a Bayesian algorithm, that has been seeded with rule frequencies, developed from a previous training phase, that employed a text of sentences with marked boundaries between the sentences.
Owner:JILES

Method and apparatus for automated measurement of quality for machine translation

A machine translation quality determination mechanism uses comparisons of subsequent and potentially numerous reverse translations of a translated human language back to the source language. The process of translating from source language to target language to source language may iterate many times to ultimately yield information as to an assertion of low quality translation. Thus, the present invention continuously iterates this “back-and-forth” translation until the resulting source human language text is not reasonably equivalent to the original source human language or until the process iterates a predetermined number of times. If the back-and-forth translation results in a source human language text that is not reasonably equivalent to the original source text, then the translation is identified as low quality.
Owner:IBM CORP
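
A rough sketch of the back-and-forth check described in this abstract is shown below. The `forward` and `backward` callables are hypothetical translators, and using `difflib.SequenceMatcher` with a fixed threshold as the test for "reasonably equivalent" is an assumption; the abstract does not say how equivalence is measured.

```python
from difflib import SequenceMatcher
from typing import Callable


def is_low_quality(
    source: str,
    forward: Callable[[str], str],
    backward: Callable[[str], str],
    threshold: float = 0.8,
    max_rounds: int = 3,
) -> bool:
    """Round-trip the text until it drifts from the original or the round limit is hit."""
    current = source
    for _ in range(max_rounds):
        current = backward(forward(current))
        similarity = SequenceMatcher(None, source, current).ratio()
        if similarity < threshold:
            return True  # the round trip is no longer reasonably equivalent
    return False


# Toy translators (assumptions): one lossless, one that drops half the text.
identity = lambda text: text
lossy = lambda text: text[: len(text) // 2]

print(is_low_quality("the quick brown fox", identity, identity))
print(is_low_quality("the quick brown fox", lossy, identity))
```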

Compression and abbreviation for fixed length messaging

A method, computer program product, and data processing system for compressing and abbreviating text messages at a first text messaging device for transport and subsequent interpretation at a second text messaging device is disclosed. A user-defined message length reduction profile for producing human-readable compressed text is associated with a source text message at a first text messaging device. The source text message is then shortened using abbreviations and transformation rules in the profile. The shortened text message can then be transmitted to a second text messaging device. In addition, the compression provided by the present invention, although intended to be human-readable, can be complemented with decompression software to expand the compressed and abbreviated text to its full length and verifying, using a checksum or other error detecting code, that the expanded version corresponds to the original text.
Owner:IBM CORP
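
The shorten, expand, and verify steps in this abstract could be sketched as below. The `PROFILE` dictionary is a hypothetical user-defined profile containing only word abbreviations, and CRC-32 from the standard `zlib` module is chosen here as one possible error-detecting code; the abstract names neither.

```python
import zlib

# Hypothetical user-defined length reduction profile: full word -> abbreviation.
PROFILE = {"see": "c", "you": "u", "tomorrow": "tmrw"}


def shorten(message: str, profile: dict) -> tuple:
    """Abbreviate the message and compute a checksum of the original text."""
    short = " ".join(profile.get(word, word) for word in message.split(" "))
    return short, zlib.crc32(message.encode("utf-8"))


def expand(short: str, profile: dict, checksum: int) -> str:
    """Expand abbreviations and verify the result against the sender's checksum."""
    reverse = {abbr: full for full, abbr in profile.items()}
    full = " ".join(reverse.get(word, word) for word in short.split(" "))
    if zlib.crc32(full.encode("utf-8")) != checksum:
        raise ValueError("expanded text does not match the original")
    return full


short, crc = shorten("see you tomorrow at noon", PROFILE)
print(short)
print(expand(short, PROFILE, crc))
```

The checksum matters because expansion can be ambiguous: if the original message already contains a token that is also an abbreviation in the profile, the expansion differs from the original and the check fails instead of silently returning the wrong text.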

Statistical machine translation adapted to context

This invention relates to a means and a method for translating source text into a target text where the context information is taken into consideration. A source text unit is defined around a translation unit which is to be translated. This source text unit is mapped onto a bilingual sublanguage space where the bilingual sublanguage space comprises a source sublanguage space and mappings to the target language. The translation is adapted to the source text unit, thereby considering contextual information.
Owner:NAT RES COUNCIL OF CANADA

Statistics-based machine translation method and apparatus, and electronic device

The present invention discloses a statistics-based machine translation method and apparatus and an electronic device, a semantic similarity-degree calculation method and apparatus and an electronic device, and a word quantization method and apparatus and an electronic device. The statistics-based machine translation method comprises: according to a feature that affects a translation probability and that is of each candidate translation and a pre-generated translation probability prediction model generating a translation probability of a sentence to be translated into each candidate translation, wherein the feature that affects the translation probability at least comprises a semantic similarity-degree between the sentence to be translated and the candidate translation; and selecting a preset number of candidate translations whose translation probabilities rank top as a translation of the sentence to be translated. By adoption of the statistics-based machine translation method provided by the present application, the semantic level of the natural language can be reached deeply when the machine translation model is constructed, and the deviation of semantics between the translation and the source text is avoided, so as to achieve the effect of improving translation quality.
Owner:阿里巴巴(中国)网络技术有限公司

In-context exact (ICE) matching

Methods, systems and program products are disclosed for determining a matching level of a text lookup segment with a plurality of source texts in a translation memory in terms of context. In particular, the invention determines any exact matches for the lookup segment in the plurality of source texts, and determines, in the case that at least one exact match is determined, that a respective exact match is an in-context exact (ICE) match for the lookup segment in the case that a context of the lookup segment matches that of the respective exact match. The degree of context matching required can be predetermined, and results prioritized. The invention also includes methods, systems and program products for storing a translation pair of source text and target text in a translation memory including context, and the translation memory so formed. The invention ensures that content is translated the same as previously translated content and reduces translator intervention.
Owner:SDL INK
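
The distinction between an exact match and an in-context exact (ICE) match can be illustrated with a small lookup sketch. The `TMEntry` layout and the choice of "the preceding source segment" as the context are assumptions; the abstract leaves the form of the context and the required degree of matching open.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class TMEntry(NamedTuple):
    source: str
    target: str
    context: str  # assumed here to be the preceding source segment


def lookup(
    segment: str, context: str, memory: Sequence[TMEntry]
) -> Tuple[str, Optional[str]]:
    """Return ("ICE" | "exact" | "none", translation), preferring ICE matches."""
    exact = [entry for entry in memory if entry.source == segment]
    for entry in exact:
        if entry.context == context:
            return "ICE", entry.target
    if exact:
        return "exact", exact[0].target
    return "none", None


memory = [
    TMEntry("Open", "Ouvrir", context="File menu"),
    TMEntry("Open", "Ouvert", context="Store hours"),
]
print(lookup("Open", "Store hours", memory))
print(lookup("Open", "Door sign", memory))
print(lookup("Close", "File menu", memory))
```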

Internet public opinion analysis method

Inactive · CN105740228A · Analysis helps · Monitoring helps · Web data indexing · Semantic analysis · Statistical analysis · Opinion analysis
The invention discloses an internet public opinion analysis method. The internet public opinion analysis method comprises the steps of firstly for selected and acquired events, partitioning a source text of a microblog and removing partition items unrelated to sentiments; secondly making statistics by adopting a statistic analysis tool to obtain an input of a sentiment classification model; and finally for the input, modeling related words, expressions and symbols capable of expressing the sentiments in the microblog content by using a classification algorithm, giving out comprehensive sentiment index assessment, obtaining sentiment categories, and performing public opinion monitoring and sentiment trend analysis. According to the method, the words, the expressions, the symbols and the like in the microblog are subjected to sentiment modeling, and the response situations of hot events in the microblog can be automatically classified and effectively monitored through sentiment index calculation, so that the public opinion risk can be effectively assessed and intemperate events can be prevented and controlled.
Owner:YUNNAN UNIV

Apparatus and method for adding information to a machine translation dictionary

Given a source text, a desired translation of the source text into a target language, and a machine-readable dictionary, a first set of morphemes in the target language is generated from the source text, typically by using the dictionary to perform a machine translation of the source text. The desired translation (the second text) is analyzed into a second set of morphemes in the target language. Differences between the first and second sets of morphemes are found, and morphemes corresponding to the differences are taken from the source text. Existing information including these source-text morphemes is extracted from the dictionary, and new information to be added to the dictionary is automatically generated from the extracted information and the differences. This process generates comparatively short dictionary entries, corresponding only to the differences between the two sets of morphemes, and therefore creates useful dictionary entries while saving dictionary space.
Owner:OKI ELECTRIC IND CO LTD

Method and apparatus to facilitate high-quality translation of texts by multiple translators

A method for facilitating the high quality translation of matter by multiple translators, which includes a processor obtaining translated segments from a first group, where each translated segment is a translation of a source text segment, where each source text segment is a portion of a source text, and where for each of the source text segments, at least one translated segment is obtained, and selecting a second group and notifying the second group of an opportunity, where the opportunity comprises the group accessing the translated segments obtained from the first group and the second group providing data regarding the quality of the translated segments. The method also includes obtaining data regarding the quality of the segments from the second group and determining a designated translated segment for each source text segment and generating a final translation.
Owner:PROZ COM
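
One way to picture the final selection step in this abstract is the sketch below. Picking the candidate with the highest mean rating is an assumption made for illustration; the abstract only says that a designated translated segment is determined from the quality data. The data layout (`candidates`, `ratings`) is also hypothetical.

```python
from statistics import mean


def assemble(candidates: dict, ratings: dict) -> str:
    """Pick one translated segment per source segment and join them in order.

    candidates: {segment_index: [translated segments from the first group]}
    ratings: {(segment_index, translated segment): [scores from the second group]}
    """
    final = []
    for index in sorted(candidates):
        designated = max(
            candidates[index],
            # Unrated candidates get a default score of 0.
            key=lambda text: mean(ratings.get((index, text), [0])),
        )
        final.append(designated)
    return " ".join(final)


candidates = {0: ["Bonjour", "Salut"], 1: ["le monde", "monde"]}
ratings = {
    (0, "Bonjour"): [5, 4],
    (0, "Salut"): [3],
    (1, "le monde"): [4],
    (1, "monde"): [2, 5],
}
print(assemble(candidates, ratings))
```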

Optimized program code generator, a method for compiling a source text and a computer-readable medium for a processor capable of operating with a plurality of instruction sets

An improved optimization technique is described for a program code for a processor capable of operating on the basis of any one of a plurality of instruction sets in a computer system. The optimization is carried out by the steps of reading said program code of a target program to be optimized; estimating the costs of executable instruction sequences respectively to be obtained by translating said program code on the basis of said plurality of instruction sets; and determining an optimum one of said plurality of instruction sets for translating said program code by evaluating the estimated costs under predetermined criteria.
Owner:KK TOSHIBA +1