36 results about "Morphological parsing" patented technology

Morphological parsing, in natural language processing, is the process of determining the morphemes from which a given word is constructed. It must be able to distinguish between orthographic rules and morphological rules. For example, the word 'foxes' can be decomposed into 'fox' (the stem), and 'es' (a suffix indicating plurality).
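The 'foxes' example can be sketched in a few lines of Python. This is a toy illustration only: the lexicon, the function name `parse_plural`, and the two rules are assumptions made for the example and cover just regular English plurals. The first rule is orthographic (stems ending in s, x, z, ch or sh spell the plural as "es"); the second is the general plural rule.

```python
# Toy English plural parser. The lexicon and both rules are illustrative
# assumptions, not a complete morphological parser.
LEXICON = {"fox", "cat", "bus", "dog"}


def parse_plural(word):
    """Return (stem, suffix) for a regular plural, or (word, "") if no rule applies."""
    # Orthographic rule: stems ending in s, x, z, ch or sh spell the plural "es".
    if word.endswith("es"):
        stem = word[:-2]
        if stem in LEXICON and stem.endswith(("s", "x", "z", "ch", "sh")):
            return stem, "es"
    # General rule: otherwise the plural morpheme is spelled "s".
    if word.endswith("s"):
        stem = word[:-1]
        if stem in LEXICON:
            return stem, "s"
    # No rule applies: treat the word as an unanalyzed stem.
    return word, ""
```

Checking the stem against a lexicon is what keeps a singular such as 'bus' from being split into 'bu' + 's'.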

Method for synthesizing a self-learning system for extraction of knowledge from textual documents for use in search

The invention relates to computer science, information search, and intelligent systems, and can be used in developing information-search and other intelligent systems that operate over the Internet. The invention enables automatic creation of knowledge by extracting it from electronic textual documents in different languages, and intelligent processing of textual information and user requests to extract knowledge in any foreign language. The claimed method provides a self-learning mechanism in the form of a stochastically indexed artificial intelligence system that is automatically instructed in the rules of grammatical and semantic analysis. The method includes creating databases of stochastically indexed dictionaries, tables of indices of linguistic texts, and knowledge bases of morphological analysis; performing morphological and syntactical analysis, as well as stochastic indexing, of textual documents on a given theme from the search system in a given language; and creating a knowledge base of syntactical analysis. Stochastically indexed textual documents pertaining to the given theme are subjected to semantic analysis, and knowledge bases of semantic analysis are created. A user's request is compiled and transformed, in stochastically indexed form, into a plurality of new requests equivalent to the original request, and stochastically indexed fragments of textual documents that comprise all word combinations of the transformed request are selected. A stochastically indexed structure is generated from the selected documents, and a brief reply of the system is generated from that structure by logical inference. The relevance of the brief reply is checked by generating an interrogative sentence from the reply and comparing that sentence with the request. When the user's request is identical to the obtained interrogative sentence, the brief reply is judged to correspond to the request and is submitted to the user.
Owner:VLADIMIR VLADIMIROVICH NASYPNY

Morphological analyzer and analysis method

A morphological analyzer divides a received text into known words and unknown words, divides the unknown words into their constituent characters, analyzes known words on a word-by-word basis, and analyzes unknown words on a character-by-character basis to select a hypothesis as to the morphological structure of the received text. Although unknown words are divided into their constituent characters for analytic purposes, they are reassembled into words in the final result, in which any unknown words are preferably tagged as being unknown. This method of analysis can process arbitrary unknown words without requiring extensive computation, and with no loss of accuracy in the processing of known words.
Owner:OKI ELECTRIC IND CO LTD
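A minimal sketch of the known/unknown split described in this abstract, under simplifying assumptions: the function name `analyze`, the dictionary contents, and the greedy longest-match lookup are all hypothetical, and the sketch picks a single segmentation rather than selecting among competing hypotheses as the abstract describes. Text not covered by the dictionary is consumed one character at a time, and adjacent unknown characters are reassembled into one token tagged as unknown.

```python
def analyze(text, dictionary):
    """Segment text into (surface, is_known) tokens.

    Known words are matched greedily (longest match first); everything else
    is handled character by character and merged into unknown tokens.
    """
    max_len = max(map(len, dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest dictionary word starting at position i.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                match = candidate
                break
        if match:
            tokens.append((match, True))
            i += len(match)
        else:
            # Unknown character: extend the previous unknown token if any,
            # so unknown words are reassembled in the final result.
            if tokens and not tokens[-1][1]:
                tokens[-1] = (tokens[-1][0] + text[i], False)
            else:
                tokens.append((text[i], False))
            i += 1
    return tokens
```

For example, `analyze("thezorgcat", {"the", "cat"})` keeps "the" and "cat" as known words and returns "zorg" as a single token flagged unknown.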

Abstract generation method and program product

The present invention relates to an abstract generation method of generating an abstract from document information, such as an electronic patient chart, and a program product that implements the abstract generation method, and has an object to make it possible to display only main parts of sentences concisely and effectively. When document information (electronic patient chart, for instance) is inputted into a system, morphological analysis is performed on the document information and it is judged whether a part of a sentence matches the whole of another sentence. When a matching result is obtained, a partially matching character string is set as a simplified sentence candidate. On the other hand, when a matching result is not obtained, the sentence is set as a simplification candidate as it is. Note that even when the partially matching result is obtained, when the number of characters of the matching character string is less than M or when the number of morphemes thereof is less than N, the partially matching character string is not set as the simplified sentence candidate but the sentence is set as the simplification candidate as it is. Next, each simplification candidate containing a keyword is extracted from among generated simplification candidates and is set as a summary candidate. Then, an abstract is generated by marking each part of the input document corresponding to the summary candidate.
Owner:SANYO ELECTRIC CO LTD

Neural machine translation method and apparatus

The present invention provides a method of generating training data to which explicit word-alignment information is added without impairing sub-word tokens, and a neural machine translation method and apparatus including the method. The method of generating training data includes the steps of: (1) separating basic word boundaries through morphological analysis or named entity recognition of a sentence of a bilingual corpus used for learning; (2) extracting explicit word-alignment information from the sentence of the bilingual corpus used for learning; (3) further dividing the word boundaries separated in step (1) into sub-word tokens; (4) generating new source language training data by using an output from the step (1) and an output from the step (3); and (5) generating new target language training data by using the explicit word-alignment information generated in the step (2) and the target language outputs from the steps (1) and (3).
Owner:ELECTRONICS & TELECOMM RES INST

System and method for differential document analysis and storage

Systems and methods for differential document analysis and storage are provided. Specifically, the system can be configured to perform one or more differential analyses on a set of documents to detect and measure changes in language across entire sets of documents of a similar type, as well as changes in language in the specific objects (e.g., document sections, paragraphs, clauses) of the documents. The system comprises three primary components: document parsing, textual near-duplicate detection, and morphological analysis. The document parsing component breaks documents down into objects and creates indexes for each full document and for the components of the document. These indexes enable documents and objects to be compared for similarity using the near-duplicate detection component, which implements various similarity analysis algorithms. The morphological analysis component is configured to search the documents for particular language or sections and compare documents in which the searched language is present.
Owner:PLANET DATA SOLUTIONS INC

Generic system for linguistic analysis and transformation

A system provides a set of natural language processing functionalities, such as named entity extraction, domain extraction, sense disambiguation, automatic translation between different natural languages, morphological analysis, and tokenization, via a unified process of analysis and transformation using an underlying linguistic database. The invention can accept text input and can be used to translate text, find the correct sense of a word, obtain the main subject of a text, obtain the grammatical attributes of a word, paraphrase a text, and search for specific entities within the input text.
Owner:LINGUASYS

Method for predicting the readings of Japanese ideographs

Systems and methods allowing for effective and reliable reading predictions for Japanese ideographs are provided. In an illustrative implementation, a reading prediction system operating in "learning" and "execution / run-time" modes is provided. In the "learning" mode, the reading prediction system operates on a number of input sources to produce a decision tree that is used in the "execution / run-time" mode to return reading predictions for inputted Japanese sentences containing Japanese ideographs. Among the inputs utilized in the "learning" mode are base Japanese script readings, a training corpus, and quasi-phonological rules. From these inputs, underlying readings and a decision tree are created. When operating in the "execution / run-time" mode, the reading prediction system employs a morphological analyzer to perform a morphological analysis on inputted sentences. Using the morphological analysis, the quasi-phonological rules, the underlying readings, and the decision tree, reading predictions are provided.
Owner:MICROSOFT TECH LICENSING LLC

Morphological analyzer, natural language processor, morphological analysis method and program

The invention can include a token list generating unit 11 for decomposing a natural language text to be processed into tokens that are components of the natural language text and registering them on a token list, and a token string selecting unit 13 for selecting optimum token strings for composing the natural language text to be processed on the basis of the token list generated by the token list generating unit 11. The token list generating unit 11 registers, on the token list, tokens among the tokens obtained by decomposing the natural language text to be processed except tokens decomposable into smaller tokens according to conditions imposed on the morphological analysis.
Owner:IBM CORP

System and method for disambiguating non-diacritized Arabic words in a text

The present invention proposes a solution to the problem of lexical word disambiguation in Arabic texts. This solution is based on domain-specific text knowledge, which facilitates the automatic vowel restoration of Modern Standard Arabic scripts. Texts that are similar in content, restricted to a specific field, or share common knowledge can be grouped in a specific category or domain (for example, sport, art, economics, or science). The present invention discloses a method, system and computer program for lexically disambiguating non-diacritized Arabic words in a text based on a learning approach that exploits Arabic lexical look-up and Arabic morphological analysis to train the system on a corpus of diacritized Arabic text pertaining to a specific domain. Thereby, the contextual relationships of the words related to a specific domain are identified, based on the valid assumption that there is less lexical variability in the use of words and their morphological variants within a domain than in unrestricted text.
Owner:INT BUSINESS MACHINES CORP

Information Processing Apparatus, Information Processing Method, Program, and Recording Medium

Disclosed herein is an information processing apparatus for analyzing text data, including: acquisition means for acquiring the text data; morpheme information registration means for registering morpheme information for use in analyzing the text data morphologically; morphological analysis means for analyzing the text data acquired by the acquisition means; compound word processing rule registration means for registering compound word processing rules for creating a compound word not registered in the morpheme information registration means; and compound word processing means, by use of the compound word processing rules registered in the compound word processing rule registration means, for combining the morphemes included in the morphological analysis information created by the morphological analysis means, into the compound word not registered in the morpheme information registration means and detecting the created compound word.
Owner:SONY CORP

Program recommending apparatus and method

The invention relates to a program recommendation apparatus and method. The apparatus includes: a module configured to extract category information and program abstracts of programs contained in an electronic program guide, extract program-specific terms from the program abstracts by morphological analysis and combine the category information and the program-specific terms to generate category-added terms; a module configured to analyze a history of programs viewed by a user based on the generated category-added terms to generate a preference vector indicating user's preferences for programs; a module analyzing the program abstracts based on the category-added terms to generate broadcast program vectors; a module generating a relevant term model for the category-added terms; a module calculating similarities between the preference vector and each of the broadcast program vectors based on the generated relevant term model; and a module outputting programs having the calculated similarities satisfying a predetermined condition as recommended programs matching with the user's preferences.
Owner:KK TOSHIBA
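The vector comparison in this abstract can be sketched roughly as follows. Everything here is an assumption made for illustration: the `category:term` string format, the function names, raw term counts as vector weights, cosine similarity as the similarity measure, and the 0.5 threshold. The sketch also leaves out the relevant term model that the abstract uses when calculating similarities.

```python
import math
from collections import Counter


def category_terms(category, terms):
    """Combine a category with each program-specific term (format is assumed)."""
    return [f"{category}:{term}" for term in terms]


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def recommend(history, programs, threshold=0.5):
    """Return titles whose vector is similar enough to the viewing history.

    history:  list of (category, terms) for programs the user viewed
    programs: dict mapping title -> (category, terms) for upcoming programs
    """
    preference = Counter()
    for category, terms in history:
        preference.update(category_terms(category, terms))
    recommended = []
    for title, (category, terms) in programs.items():
        vector = Counter(category_terms(category, terms))
        if cosine(preference, vector) >= threshold:
            recommended.append(title)
    return recommended
```

Prefixing each term with its category means the same word in a sports program and a news program becomes two different vector dimensions, which is the point of the category-added terms.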

Alignment system and aligning method for multilingual documents

To realize an alignment system for multilingual documents that efficiently aligns sentences among documents with the same content written in a plurality of languages, an alignment system for multilingual documents according to the present invention comprises morphological analysis means for dividing the documents in n languages (n being a natural number of at least 2) into words, means for selecting two of the n languages of the documents, means for computing an evaluation function for the documents in the two selected languages, and means for aligning the documents in the n languages in accordance with the evaluated results.
Owner:OKI ELECTRIC IND CO LTD

Annotation Assisting Apparatus and Computer Program Therefor

An annotation data generation assisting system includes: an input / output device receiving an input through an interactive process; a morphological analysis system and a dependency parsing system performing morphological analysis and dependency parsing on text data in a text archive; first to fourth candidate generating units detecting a zero anaphor or a referring expression in the dependency relation of a predicate in a sequence of morphemes, identifying a position as an object of annotation and estimating candidates of expressions to be inserted by using language knowledge; a candidate DB storing the estimated candidates; and an interactive annotation device reading annotation candidates from the candidate DB and annotating with a candidate selected through an interactive process via the input / output device.
Owner:NAT INST OF INFORMATION & COMM TECH

System and method for selection of meaningful page elements with imprecise coordinate selection for relevant information identification and browsing

Method for identifying search candidates, including receiving a content that includes unprocessed content, markup elements and element styles; identifying raw content; applying content styles to the markup elements to determine which sequence of parts of content produces compact logically linked and visually bounded parts of the content; performing syntactic analysis to generate parsing trees; performing morphological analysis to determine parts of speech and word morphology in bounded parts; performing stemming on the parts of speech and constructing chains that meet grammar rules; identifying zests and calculating weights of the zests; applying weights to the chains to determine zests; adjusting the weights based on a distance from a point of user interaction with the content; selecting zests with the highest weight and degree of belonging to a region around the point of interaction; adjusting zests near the point of interaction and using it for selection of information to display.
Owner:SLICKJUMP

Translation method and translation system oriented to morphologically-rich language

The invention relates to a translation method and a translation system oriented to a morphologically-rich language. The method comprises the following steps: (1) carrying out morphological analysis on the morphologically-rich language to obtain stem and affix information; (2) during the extraction of translation rules, taking a stem as an atomic translation unit and retaining the corresponding affix distribution information; and (3) during translation, acquiring the stem and affix distribution of a fragment to be translated, wherein a stem sequence, which is a sequence consisting of a plurality of stems, is used for querying a rule table, and the affix distribution information and the candidate affix distribution according to a rule are used for calculating a similarity that characterizes how closely the two distributions match and that guides decoding.
Owner:INST OF COMPUTING TECH CHINESE ACAD OF SCI

Apparatus and method for morphological analysis

To improve the processing speed of a morphological analysis system, an analysis processing part retrieves words constituting a natural language sentence from a word dictionary, retrieves concatenation rules from a grammar dictionary, creates an automaton based on the words and the concatenation rules, and obtains a morphological analysis. The grammar dictionary stores rules integrating concatenation rules representing non-null transitions and concatenation rules representing null transitions in which transition sources of the first concatenation rule are their transition destinations. To create the automaton, an optimal solution searching part of the analysis processing part generates, responsive to an input word entered into the automaton, states necessary for transition according to the concatenation rules corresponding to the input word without generating any state that can be transited by pursuing a null transition succeeding the generated states.
Owner:IBM CORP

Apparatus, method and computer program product for optimum translation based on semantic relation between words

A machine translation apparatus includes an identification information detection unit that detects identification information of a designated object; a receiving unit that receives a source language sentence; a word dividing unit that divides the source language sentence into a plurality of first words by morphological analysis; a deixis detection unit that detects, from the divided words, a deixis indicating the object directly; a correspondence setting unit that sets the identification information of the designated object and the deixis in correspondence with each other; a semantic class determining unit that determines the semantic class indicating a semantic attribute of the designated object based on the identification information of the designated object corresponding to the deixis; and a translation unit that translates the source language sentence in which the deixis is given an attribute having the semantic class of the designated object.
Owner:TOSHIBA DIGITAL SOLUTIONS CORP

Method for creating FMEA sheet and device for automatically creating FMEA sheet

A method for creating an FMEA sheet includes the steps of: retrieving a plurality of documents; dividing words in each of the retrieved documents into a plurality of morpheme words by morphological analysis; calculating a co-occurrence frequency for each of the morpheme words; generating a co-occurrence frequency network from the morpheme words whose co-occurrence frequency is greater than a predetermined level; grouping the documents using the co-occurrence frequency network; extracting, from each of the documents belonging to the same group, a word identical to an FMEA word registered in an FMEA word concept dictionary created in advance, as a word to be used in creating the FMEA sheet; and entering the extracted word into the FMEA sheet.
Owner:OMRON CORP
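The co-occurrence network step in this abstract might look like the sketch below. It assumes that morphological analysis has already turned each document into a list of morpheme words, that co-occurrence means "appear in the same document", and that the network is represented as a dictionary of word pairs; the function name `cooccurrence_network` and the `min_count` parameter are hypothetical. Document grouping and the FMEA dictionary lookup are not shown.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(documents, min_count):
    """Build a co-occurrence network from pre-segmented documents.

    documents: list of morpheme-word lists, one per document
    min_count: only pairs co-occurring in MORE than this many documents are kept
    Returns a dict mapping (word_a, word_b) pairs to their document counts.
    """
    counts = Counter()
    for morphemes in documents:
        # Count each unordered pair once per document.
        for pair in combinations(sorted(set(morphemes)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n > min_count}
```

Each key of the returned dictionary is an edge of the network, and documents that share edges are candidates for the same group.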

Optimized model for rapid identification of transgenic soybeans based on morphological analysis

The invention provides an optimized model for rapid identification of transgenic soybeans based on morphological analysis. According to the invention, firstly, an establishment method of the optimized model for rapid identification of the transgenic soybean based on morphological analysis is provided; for whole soybeans, spectral information under a characteristic wave band of 9403-5438 cm⁻¹ is selected, the spectrum is preprocessed by adopting a second derivative, and a PLS-DA model is established by adopting a partial least squares-discrimination method; and for powdery soybeans, spectral information under a characteristic waveband of 7505-4597 cm⁻¹ is selected, the spectrum is preprocessed by adopting vector normalization and a first derivative, and a PLS-DA model is established by adopting a partial least squares-discrimination method. According to the invention, the transgenic soybeans are identified by combining the near-infrared spectrum with a discriminant analysis method, and the identification accuracy of the discrimination model can be improved by selecting the sample form, the wavelength range and the spectrum pretreatment method, so that the optimal model is selected to be applied to actual production.
Owner:NAT INST FOR NUTRITION & HEALTH CHINESE CENT FOR DISEASE CONTROL & PREVENTION

Selection method and system of recognition unit for Uygur language voice recognition

Active · CN103065632A · Alleviate the problem of too many out-of-set words · Improve speech recognition rate · Speech recognition · Speech sound · Morphological parsing
The invention relates to a selection method and a system of a recognition unit for Uygur language voice recognition. The method includes: corresponding text corpora are collected or prepared for to-be-recognized voice; different terms are picked out from the text corpora; the different terms are input into a morphological analyzer, corresponding term splitting results are obtained if analysis is successful, term splitting based on a tail dropping algorithm is carried out on the terms if the analysis is unsuccessful so as to obtain the splitting results, and a corresponding stem and supplementary elements of each term are obtained according to the splitting results; and the terms in the text corpora are mapped into the stems and the supplementary elements, and the high-frequency stems and supplementary elements are picked out to be used as a dictionary unit. According to the selection method and the system of the recognition unit for the Uygur language voice recognition, the Uygur language terms are split into stems and supplementary elements according to morphological change rules of Uygur language, the stems and the supplementary elements are selected to be used as the recognition unit, and therefore the problem that excessive foreign words are collected in the recognition system is solved, and recognition rates of the system are improved.
Owner:INST OF ACOUSTICS CHINESE ACAD OF SCI +1
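The unit-selection flow in this abstract can be sketched as below, with several loud assumptions. The morphological analyzer is stood in for by a plain lookup table, and the abstract does not define the tail dropping algorithm, so the fallback here simply strips the longest matching suffix from a given suffix list. The names `split_word` and `select_units` and the "+" marker on affix units are hypothetical.

```python
from collections import Counter


def split_word(word, analyzer, suffixes):
    """Split a word into (stem, affixes).

    analyzer: dict standing in for a morphological analyzer; a missing key
              means the analysis failed
    suffixes: known suffixes used by the assumed fallback splitting
    """
    if word in analyzer:
        return analyzer[word]
    # Fallback (assumed): drop the longest known suffix from the word's tail.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)], [suffix]
    return word, []


def select_units(corpus_words, analyzer, suffixes, top_k):
    """Map corpus words to stems and affixes and keep the most frequent units."""
    counts = Counter()
    for word in corpus_words:
        stem, affixes = split_word(word, analyzer, suffixes)
        counts[stem] += 1
        # Mark affix units so they stay distinct from identical stems.
        counts.update("+" + affix for affix in affixes)
    return [unit for unit, _ in counts.most_common(top_k)]
```

Because many word forms share a stem, the resulting unit dictionary is much smaller than a word-level dictionary covering the same corpus, which is how the abstract explains the reduction in out-of-set words.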

Apparatus, method, and recording medium for morphological analysis and registering a new compound word

Disclosed herein is an information processing apparatus for analyzing text data, including: acquisition means for acquiring the text data; morpheme information registration means for registering morpheme information for use in analyzing the text data morphologically; morphological analysis means for analyzing the text data acquired by the acquisition means; compound word processing rule registration means for registering compound word processing rules for creating a compound word not registered in the morpheme information registration means; and compound word processing means, by use of the compound word processing rules registered in the compound word processing rule registration means, for combining the morphemes included in the morphological analysis information created by the morphological analysis means, into the compound word not registered in the morpheme information registration means and detecting the created compound word.
Owner:SONY CORP

Information processing apparatus, information processing method, and program

An information processing apparatus 1 comprises: a dictionary DB 15 storing categories of constituents and storing information representing a semantic interpretation, the dictionary DB 15 containing as the categories the category of object and the category of spatial location; a morphological parser 22 for performing morphological parsing of an inputted sentence; a tree structure generator 23 for, with reference to information stored in the dictionary DB 15, providing categories and lambda expressions of constituents each consisting of a morpheme or a bundle of neighboring morphemes obtained by the morphological parser 22, generating a tree structure in which the categories are hierarchically put together by combining neighboring categories in accordance with a predetermined function application rule, and generating a lambda expression representing the sentence; and a hierarchical structure generator 24 for generating a hierarchical structure in which atomic categories of the tree structure are set as nodes.
Owner:DENSO IT LAB +1

Intention inference system and intention inference method

An intention inference system includes a morphological analyzer to perform morphological analysis on a complex sentence involving multiple intentions; a syntactic analyzer to perform syntactic analysis on the complex sentence morphologically analyzed by the morphological analyzer and to divide it into a first simple sentence and a second simple sentence; an intention inference unit to infer a first intention involved in the first simple sentence and a second intention involved in the second simple sentence; a feature extractor to extract, as a first feature, a morpheme showing the execution order of operations involved in the first simple sentence and to extract, as a second feature, a morpheme showing the execution order of operations involved in the second simple sentence; and an execution order inference unit to infer the execution order of a first operation corresponding to the first intention and a second operation corresponding to the second intention on the basis of the first feature and the second feature extracted by the feature extractor. This enables the system to infer the user's intentions accurately.
Owner:MITSUBISHI ELECTRIC CORP

Annotation assisting apparatus and computer program therefor

An annotation data generation assisting system includes: an input / output device receiving an input through an interactive process; a morphological analysis system 380 and a dependency parsing system performing morphological analysis and dependency parsing on text data in a text archive; first to fourth candidate generating units detecting a zero anaphor or a referring expression in the dependency relation of a predicate in a sequence of morphemes, identifying a position as an object of annotation and estimating candidates of expressions to be inserted by using language knowledge; a candidate DB storing the estimated candidates; and an interactive annotation device reading annotation candidates from the candidate DB and annotating with a candidate selected through an interactive process via the input / output device.
Owner:NAT INST OF INFORMATION & COMM TECH

Information processing device

An information processing device (10) includes: a morphological analysis unit (11a, 11b) performing morphological analysis to divide each of an article body text included in an article and a caption of each of images into morphemes; a phrase acquiring unit (12) dividing the article body text into phrases on a basis of a result of the morphological analysis performed by the morphological analysis unit (11b); and a correspondence determining unit (13). The correspondence determining unit (13) determines correspondence between each of the phrases of the article body text and the images by calculating a correlation between the caption and each of the phrases of the article body text on a basis of the result of the morphological analysis performed by the morphological analysis unit (11a).
Owner:MITSUBISHI ELECTRIC CORP

Information processing apparatus and method, and recording medium

An information processing apparatus includes an obtaining unit configured to obtain program information on programs to be broadcast in a predetermined time period; a keyword extraction unit configured to extract keywords obtained by performing morphological analysis on text data contained in the obtained program information in such a manner as to be associated with corresponding programs; a current-affairs keyword extraction unit configured to extract, as current-affairs keywords, keywords that are associated with corresponding programs to be broadcast on a plurality of different broadcast stations and that are associated with only programs to be broadcast today from among the keywords extracted by the keyword extraction unit, wherein, for each of the extracted current-affairs keywords, the number of appearances of the current-affairs keyword is summed, and an importance degree indicating an importance characteristic of the current-affairs keyword for a user is determined on the basis of the summed number of appearances.
Owner:SONY CORP