39 results about "Treebank" patented technology

In linguistics, a treebank is a parsed text corpus that annotates syntactic or semantic sentence structure. The construction of parsed corpora in the early 1990s revolutionized computational linguistics, which benefitted from large-scale empirical data. The exploitation of treebank data has been important ever since the first large-scale treebank, the Penn Treebank, was published. Although treebanks originated in computational linguistics, their value is becoming more widely appreciated in linguistics research as a whole. For example, annotated treebank data has been crucial in syntactic research for testing linguistic theories of sentence structure against large quantities of naturally occurring examples.
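
To make the idea of a parsed corpus concrete, the sketch below reads one sentence written in the bracketed notation commonly associated with the Penn Treebank and recovers its words. The example sentence, the function names and the nested-tuple representation are illustrative assumptions, not part of any listed patent.

```python
def parse_bracketed(text):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        assert tokens[pos] == "(", "expected '('"
        pos += 1
        label = tokens[pos]          # constituent label or part-of-speech tag
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())   # nested constituent
            else:
                children.append(tokens[pos])   # leaf word
                pos += 1
        pos += 1                     # consume ')'
        return (label, children)

    return read_node()


def leaves(node):
    """Return the words of the sentence in surface order."""
    _, children = node
    words = []
    for child in children:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(leaves(child))
    return words


tree = parse_bracketed("(S (NP (DT the) (NN cat)) (VP (VBD sat)))")
print(tree[0])       # root label of the tree
print(leaves(tree))  # words in sentence order
```

A treebank is then simply a large collection of such trees, which is what makes the counting and conversion techniques in the entries below possible.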

Module for creating a language neutral syntax representation using a language particular syntax tree

A method or module for creating a Language Neutral Syntax (LNS) representation of a sentence from a language particular syntax representation, such as that found in the Penn Treebank, for use by different applications. The method or module includes a node generator configured to create hierarchical and dependent nodes using phrasal and constituent nodes of the language particular syntax. A node dependency generator is configured to create an unordered hierarchical dependency structure for the hierarchical and dependent nodes using a semantic relation between the hierarchical and dependent nodes derived from the language particular syntax.
Owner:MICROSOFT TECH LICENSING LLC

Chinese implicit discourse relation identification method

The invention discloses a method for identifying Chinese implicit discourse relations. The method comprises the following steps: step 1, performing automatic word segmentation on a pair of arguments of a Chinese implicit discourse relation to obtain a word segmentation result; step 2, learning feature representations of the arguments from the word segmentation result; step 3, modelling the implicit discourse relation between the arguments with a maximum-margin-based neural network model built on the learned feature representations; and step 4, using the resulting neural network model to identify Chinese implicit discourse relations. With this method, Chinese implicit discourse relations can be identified more accurately. Experimental verification on a Chinese discourse treebank shows that, compared with existing English implicit discourse relation identification methods, the method achieves higher accuracy in Chinese implicit discourse relation identification.
Owner:INST OF AUTOMATION CHINESE ACAD OF SCI
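
The abstract above does not spell out its maximum-margin objective. As a rough illustration only, the sketch below shows one common form of margin-based loss over relation scores: the score of the gold relation should exceed the best competing score by at least a margin. The relation labels, the scores and the function name `max_margin_loss` are hypothetical.

```python
def max_margin_loss(scores, gold_label, margin=1.0):
    """Hinge-style loss: zero once the gold score beats every other score by `margin`."""
    gold = scores[gold_label]
    best_wrong = max(s for label, s in scores.items() if label != gold_label)
    return max(0.0, margin + best_wrong - gold)


# Hypothetical scores that a network might assign to one argument pair.
scores = {"causal": 2.0, "contrast": 1.5, "expansion": 0.2}
loss = max_margin_loss(scores, gold_label="causal")
print(loss)
```

In training, a loss of this kind would be minimized over the network parameters; the sketch only evaluates it for fixed scores.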

Automatic analysis method for Chinese syntax based on corpus and tree-structure pattern matching

The invention discloses an automatic analysis method for Chinese syntax based on corpora and tree-structure pattern matching. Building on deep analysis and complete segmentation of an annotated Chinese corpus, and using syntactic patterns extracted from the corpus together with the corresponding semantic collocation relationships, the method performs pattern matching and conversion on the sentences to be processed and obtains an optimal syntactic analysis result through semantic disambiguation. The automatic syntax analysis system of the invention comprises a module for extracting, storing and calling syntactic patterns from a syntax treebank, a sentence pattern statistics module, a syntactic pattern matching module, a local conversion module for approximate patterns and a semantic disambiguation module. Experiments show that, compared with traditional syntax analysis, the method pays more attention to combining overall matching with local conversion of syntactic patterns, has large processing granularity and high efficiency, and increases average accuracy and recall rate by about 10 percent.
Owner:NANJING UNIV

Method for processing unknown words in Chinese-language dependency tree banks

The invention belongs to the field of processing for natural languages of computational linguistics, and discloses a method for processing unknown words in Chinese-language dependency tree banks. The method includes steps of A, searching all synonyms of the unknown words by the aid of synonym forests; B, computing character pattern similarity degrees among the unknown words and all the synonyms of the unknown words according to character pattern features of Chinese characters; C, extracting mapped words and information quantities of word classes of the mapped words when the character pattern similarity degrees among the unknown words and the multiple synonyms are high, and improving character pattern similarity degree computation models; D, extracting the words with the maximum character pattern similarity degrees as the optimal mapped words of the unknown words and using the extracted words as explanation for the unknown words in the tree banks. The method has the advantages that unit pairs (word classes, word classes) in dependency syntactic analysis can be recovered to unit pairs (word classes, words) or unit pairs (words, word classes) on the premise that the scales of the tree banks are no longer expanded, accordingly, the information granularity can be refined, the problem of data sparseness can be solved, and the dependency syntactic analysis performance can be improved.
Owner:BEIJING INFORMATION SCI & TECH UNIV

Formalized scheme for constructing Chinese tree bank based on sentence-based grammar

The invention discloses a formalized scheme for constructing a Chinese tree bank based on sentence-based grammar and relates to the field of corpus linguistics and natural language processing. According to the formalized scheme, research results on ''dynamic words'' in the linguistic circle are introduced in the design process with the sentence-based grammar in Chinese traditional teaching grammar being a prototype. By the adoption of the formalized scheme for constructing the Chinese tree bank, the accuracy and efficiency of tree bank construction can be improved beneficially, and meanwhile communication and fusion of the three fields of information processing, grammar study and teaching practice are promoted.
Owner:彭炜明 +4

Method for searching and matching articles by way of tree graph

The invention discloses a method for constructing, searching, matching and representing articles, commodities and service information by way of a tree graph and aims to improve the efficiency of precisely retrieving and managing required articles, commodities and services in the field of electronic commerce, Internet shopping, product management and the like. A client, a background system for information interaction with the client and a method for constructing, searching, matching and representing an article set tree graph between the client and the background system are included. The background system comprises a tree graph matching and searching engine, an article type attribute tree library and an article library. Compared with the prior art, the method disclosed by the invention has the advantages that precise positioning of required articles is realized by the user, ambiguity and nondeterminacy for determining searching conditions by character recognition are avoided, the quantity of irrelevant articles represented to the user is reduced, and meanwhile, the article retrieval efficiency is improved.
Owner:李剑

Event relationship graph generation method and apparatus

The invention provides an event relationship graph generation method and apparatus. The method comprises the steps of: splitting a manuscript into statements according to preset punctuation marks; extracting characters from the split statements and keeping the statements that contain characters as standby statements, wherein the characters include at least one of a personal name, a role and a personal pronoun; performing syntactic analysis on the standby statements using a pre-obtained syntactic analysis treebank to generate the associative relationships among the characters; and generating an event relationship graph from the characters and the associative relationships. With this method, events occurring among the characters can be viewed visually, so that a user can grasp the substance of the events in a relatively short time and reading time is shortened.
Owner:NEUSOFT CORP
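
The pipeline in the entry above (split, filter, relate, build a graph) can be sketched in a few lines. This is a simplified stand-in, not the patented method: where the abstract uses treebank-based syntactic analysis to relate characters, the sketch just links characters that appear in the same statement. The function name, the delimiter set and the sample text are assumptions.

```python
import re
from collections import defaultdict
from itertools import combinations


def build_character_graph(manuscript, known_characters, delimiters="。！？.!?"):
    """Map each pair of co-occurring characters to the statements they share."""
    # Step 1: split the manuscript on the preset punctuation marks.
    pattern = "[" + re.escape(delimiters) + "]"
    statements = [s.strip() for s in re.split(pattern, manuscript) if s.strip()]

    graph = defaultdict(list)
    for statement in statements:
        # Step 2: keep only statements that mention known characters.
        present = sorted(c for c in known_characters if c in statement)
        # Step 3 (simplified): relate characters by co-occurrence,
        # instead of the syntactic analysis described in the abstract.
        for a, b in combinations(present, 2):
            graph[(a, b)].append(statement)
    return dict(graph)


text = "Alice met Bob in Paris. The weather was fine. Bob later called Carol!"
graph = build_character_graph(text, {"Alice", "Bob", "Carol"})
for pair, evidence in graph.items():
    print(pair, evidence)
```

Each key of the returned dictionary is an edge of the graph, and the attached statements serve as the evidence a reader could inspect for that edge.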

Syntax analysis method and device for layering Chinese long sentences based on punctuation treatment

Unlike traditional methods, this hierarchical syntactic analysis method for long Chinese sentences comprises: 1. using the special functions of certain punctuation marks to divide a complex long sentence into a sequence of sub-sentences; 2. extracting grammar rules and the corresponding probability distribution information from a large-scale database to analyse the sentences and eliminate ambiguity. Experiments show that the invention reduces time consumption and improves both the accuracy rate and the recall rate by about 7%.
Owner:INST OF AUTOMATION CHINESE ACAD OF SCI
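
Both steps of the entry above lend themselves to a small sketch. The first function splits a long sentence at a chosen set of punctuation marks; the second estimates grammar rule probabilities from parsed trees by relative frequency. The delimiter set, the toy trees and the function names are assumptions, and relative-frequency estimation is one standard choice rather than the method the abstract confirms.

```python
import re
from collections import Counter


def split_long_sentence(sentence, delimiters="，；：,;:"):
    """Divide a long sentence into sub-sentences at the given punctuation marks."""
    parts = re.split("[" + re.escape(delimiters) + "]", sentence)
    return [p.strip() for p in parts if p.strip()]


def count_rules(node, counts):
    """Count every rule (parent label -> child labels or word) in one tree."""
    label, children = node
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        if not isinstance(child, str):
            count_rules(child, counts)


def rule_probabilities(trees):
    """Estimate P(rule | left-hand side) by relative frequency."""
    counts = Counter()
    for tree in trees:
        count_rules(tree, counts)
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}


# Two toy trees in (label, children) form; leaves are plain strings.
trees = [
    ("S", [("NP", ["she"]), ("VP", ["runs"])]),
    ("S", [("NP", ["he"]), ("VP", [("V", ["sees"]), ("NP", ["her"])])]),
]
probs = rule_probabilities(trees)
print(split_long_sentence("他来了，我走了；天黑了"))
print(probs[("VP", ("V", "NP"))])
```

A probabilistic parser would use such probabilities to rank competing analyses of each sub-sentence, which is how rule statistics help resolve ambiguity.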

Method and system for simplifying implicit rhetorical relation prediction in large scale annotated corpus

The present invention provides a method and system directed to predicting implicit rhetorical relations between two spans of text, e.g., in a large annotated corpus, such as the Penn Discourse Treebank (“PDTB”), Rhetorical Structure Theory corpus, and the Discourse Graph Bank, and particularly directed to determining a rhetorical relation in the absence of an explicit discourse marker. Surface level features may be used to capture pragmatic information encoded in the absent marker. In one manner a simplified feature set based only on raw text and semantic dependencies is used to improve performance for all relations. By using surface level features to predict implicit rhetorical relations for the large annotated corpus the invention approaches a theoretical maximum performance, suggesting that more data will not necessarily improve performance based on these and similarly situated features.
Owner:THOMSON REUTERS ENTERPRISE CENT GMBH

Tendency text automatic classification system and achieving method of the same

The invention provides a tendency text automatic classification system and an achieving method of the same, and relates to the field of natural language processing technology, text data mining, and text automatic classification technology. The system comprises a dependence relationship analysis module, a Chinese word segmentation module, a sentence structure analysis module and a multi-layer emotion classification sentence module library, wherein the dependence relationship analysis module is used for dependence relationship analysis of Chinese sentences, the Chinese word segmentation module is used for word segmentation of the Chinese sentences, the sentence structure analysis module is used for sentence structure analysis of the Chinese sentences after word segmentation, and the multi-layer emotion classification sentence module library is used for management of business related knowledge. The tendency text automatic classification system is characterized in that the multi-layer emotion classification sentence module library is divided into 3 large classes and 120 small classes, the three large classes are attitude grammar, feeling grammar and thought grammar, and the multi-layer emotion classification sentence module library is sorted manually according to Chinese usage rules and the business related knowledge. Sentence structure analysis of all sentence models in the multi-layer emotion classification sentence module library is conducted to build a sentence structure treebank, and dependence relationship analysis of all the sentence models in the multi-layer emotion classification sentence module library is conducted to form a dependence relationship gallery.
Owner:WUYI UNIV

Phrase tree to dependency tree transformation method capable of combining Vietnamese grammatical features

The invention relates to a phrase tree to dependency tree transformation method capable of combining Vietnamese grammatical features, and belongs to the technical field of natural language processing. The phrase tree to dependency tree transformation method comprises the following steps: firstly, constructing a Vietnamese phrase tree library; utilizing a center subnode filter table which combines the Vietnamese grammatical features and a dependency relationship annotator to finish the phrase tree to dependency tree transformation in the Vietnamese phrase tree library to obtain a first-level Vietnamese dependency tree library; according to the corpus of the manually annotated first-level Vietnamese dependency tree library, training to obtain a MSTParser model, utilizing the MSTParser model to carry out the expansion of the first-level Vietnamese dependency tree library to obtain an expanded second-level Vietnamese dependency tree library; and utilizing a dependency relationship corrector to correct the corpus of the expanded second-level Vietnamese dependency tree library to obtain a final three-level Vietnamese dependency tree library. The method avoids a process that the Vietnamese dependency tree library is manually collected and annotated, saves manpower and time for constructing the tree library, and obviously improves accuracy.
Owner:KUNMING UNIV OF SCI & TECH
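
The core of a phrase tree to dependency tree conversion is choosing, for every constituent, which child carries the head word. The sketch below illustrates that idea with a small head table. The table uses English Penn-style labels purely as a placeholder for the Vietnamese "center subnode filter table" mentioned above, and it produces unlabelled arcs, so it omits the dependency relationship annotator and the later expansion and correction stages.

```python
# Hypothetical head table: for each parent label, child labels in priority order.
HEAD_RULES = {
    "S": ["VP", "NP"],
    "VP": ["VBD", "VB", "VP"],
    "NP": ["NN", "NNS", "NP"],
}


def head_child_index(label, child_labels):
    """Pick the head child using the table, falling back to the rightmost child."""
    for candidate in HEAD_RULES.get(label, []):
        if candidate in child_labels:
            return child_labels.index(candidate)
    return len(child_labels) - 1


def convert(node, words, arcs):
    """Return the index of the head word of `node`, recording (head, dependent) arcs."""
    label, children = node
    if len(children) == 1 and isinstance(children[0], str):
        words.append(children[0])      # preterminal: its word is its own head
        return len(words) - 1
    child_heads = [convert(child, words, arcs) for child in children]
    h = head_child_index(label, [child[0] for child in children])
    for i, child_head in enumerate(child_heads):
        if i != h:
            arcs.append((child_heads[h], child_head))  # non-head children depend on the head
    return child_heads[h]


phrase_tree = ("S", [("NP", [("DT", ["the"]), ("NN", ["cat"])]),
                     ("VP", [("VBD", ["sat"])])])
words, arcs = [], []
root = convert(phrase_tree, words, arcs)
print(words[root], [(words[h], words[d]) for h, d in arcs])
```

Changing the priority lists in the table changes the resulting dependency tree, which is why a language-specific table matters for a conversion of this kind.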

Chunk-based Vietnamese phrase tree construction method

The invention relates to a chunk-based Vietnamese phrase tree construction method, and belongs to the technical field of natural language processing. The method comprises the following steps of: firstly carrying out upper-layer chunk labeling and basic-layer chunk labeling on a Vietnamese phrase tree label set; selecting feature sets of an upper-layer chunk and a basic-layer chunk, and constructing a chunk-based Vietnamese phrase tree library construction model; carrying out chunk analysis on word-segmented Vietnamese sentences by utilizing a chunk analysis tool, so as to obtain a chunk construction-based primary Vietnamese phrase tree library; and correcting the chunk construction-based primary Vietnamese phrase tree library by utilizing a phrase tree library corrector, so as to obtain a corrected final Vietnamese phrase tree library. According to the method provided by the invention, the process of manually collecting and labelling the Vietnamese phrase tree libraries is avoided, and the manpower and the time of constructing the tree libraries are saved; and compared with the method for constructing Vietnamese phrase tree libraries by adoption of context-free grammars and maximum entropies, the phrase tree construction method disclosed by the invention has an advantage of remarkably improving the correctness.
Owner:KUNMING UNIV OF SCI & TECH

A judicial case discrimination system and method based on event tree analysis

Status: Inactive. Publication number: CN109949185A. Tags: Quick case; Quick logic visualization; Data processing applications; Text database querying; Data mining; Computer science
The invention provides a judicial case discrimination system based on event tree analysis. The judicial case discrimination system comprises a legal text collection module, an event tree construction module, an automatic criminal name discrimination module and an automatic penalty discrimination module. The legal text collection module is used for converting legal statement text information submitted by a litigation user into noiseless text data; the event tree construction module is used for converting the legal text data into triple information related to the two entities through a natural language processing technology, and forming time sequence legal event tree information through a treebank; the automatic criminal name discrimination module is used for giving an automatic criminal name discrimination result through an event tree similarity matching algorithm; and the automatic penalty discrimination module is used for providing an automatic penalty discrimination result through an event tree similarity matching algorithm on the premise of criminal name discrimination. The method has the advantages that the litigation user can be helped to quickly know the preliminary result of case judgment, and the system and method can help a judge to perform a final case discrimination.
Owner:NANJING UNIV OF POSTS & TELECOMM

Syntax tree library construction system

The invention provides a syntax tree library construction system. The syntax tree library construction system mainly comprises a word segmentation annotation module, a word meaning annotation module, a chunk connection module, a component identification and component relationship annotation module and a syntax tree proofreading module. More people can participate in syntax tree construction work, so that a large-scale, multi-field and high-quality syntax tree library is constructed. The problems that a traditional syntax tree construction method is high in cost, low in efficiency, poor in consistency, small in scale, narrow in field, slow in updating and the like are solved. The problems that labeling operation can only be conducted on a large screen and the like are also solved.
Owner:大连语智星科技有限公司

MST algorithm based Vietnamese dependency tree library construction method

Status: Inactive. Publication number: CN105740234A. Tags: Solve time-consuming problems; Make up for scarcity; Natural language data processing; Special data processing applications; Library training; Theoretical computer science
The invention relates to an MST algorithm based Vietnamese dependency tree library construction method and belongs to the technical field of natural language processing. The method comprises the steps of firstly constructing a Vietnamese dependency tree library training corpus base; secondly performing training by utilizing corpora of the Vietnamese dependency tree library training corpus base to obtain an MST model and then training Vietnamese sentences by utilizing the MST model to obtain a Vietnamese dependency tree library; and correcting the obtained Vietnamese dependency tree library corpus base. The Vietnamese dependency tree library constructed with the method can provide powerful support for upper-layer applications such as syntactic analysis, machine translation, information acquisition and the like of Vietnamese language; the Vietnamese dependency tree library with one hundred thousand Vietnamese sentences can be constructed; the method avoids the processes of manually collecting and marking the Vietnamese dependency tree library, reduces the labor and shortens the time for constructing the tree library; and compared with a method for constructing a Vietnamese dependency tree library by adopting a CRFParser and Chinese-Vietnamese bilingual word-alignment corpora, the method provided by the invention has the advantage that the accuracy is remarkably improved.
Owner:KUNMING UNIV OF SCI & TECH

Method for establishing Vietnamese dependency tree bank based on improved Nivre algorithm

The invention relates to a method for establishing a Vietnamese dependency tree bank based on an improved Nivre algorithm, and belongs to the technical field of natural language processing. The method comprises the steps of firstly, establishing an initial training corpus, an expansion corpus and a test corpus; secondly training two dependency parsing weak learners S1 and S2 based on the improved Nivre algorithm by utilizing the established initial training corpus to serve as two fully redundant views; thirdly, performing dependency parsing on the expansion corpus by utilizing the two trained weak learners S1 and S2 and building a Vietnamese dependency tree bank model; and finally, performing dependency parsing testing on the test corpus and finally establishing the Vietnamese dependency tree bank. According to the method, the powerful support can be provided for upper applications of syntactic analysis, machine translation, information acquisition and the like of a Vietnamese language; the process of manually marking a dependency relation of Vietnamese sentences can be effectively avoided, so that the time of manpower and material resources is saved; and a large amount of unmarked Vietnamese sentence level corpora can be effectively utilized for improving the accuracy of dependency parsing.
Owner:KUNMING UNIV OF SCI & TECH

Chinese automatic syntactic analyzer based on sentence pattern structure

The invention provides an automatic Chinese syntactic analyzer based on a sentence pattern structure, which comprises the following steps: S1, expanding a grammar mode of a regular expression to realize an expanded regular expression grammar based on a multivariate word feature sequence; S2, constructing a syntactic rule library by using the extended regular expression grammar obtained in the step S1; S3, constructing a vocabulary knowledge base and a lexical knowledge base matched with the syntactic rule base constructed in the S2; and S4, based on the vocabulary knowledge base and the lexical knowledge base constructed in the S3, performing Chinese automatic syntactic analysis of a sentence pattern structure by adopting a lexical and syntactic integrated analysis algorithm. The method has the advantages that the Chinese automatic syntactic analysis function based on a sentence pattern structure system is achieved, the construction efficiency of a large-scale sentence standard syntax tree bank is improved, and a way is laid for connection of formalized graph analysis sentences and Chinese information processing downstream applications.
Owner:北京汉雅天诚教育科技有限公司

Processing method of unregistered words in Chinese dependency tree bank

The invention belongs to the field of processing for natural languages of computational linguistics, and discloses a method for processing unknown words in Chinese-language dependency tree banks. The method includes steps of A, searching all synonyms of the unknown words by the aid of synonym forests; B, computing character pattern similarity degrees among the unknown words and all the synonyms of the unknown words according to character pattern features of Chinese characters; C, extracting mapped words and information quantities of word classes of the mapped words when the character pattern similarity degrees among the unknown words and the multiple synonyms are high, and improving character pattern similarity degree computation models; D, extracting the words with the maximum character pattern similarity degrees as the optimal mapped words of the unknown words and using the extracted words as explanation for the unknown words in the tree banks. The method has the advantages that unit pairs (word classes, word classes) in dependency syntactic analysis can be recovered to unit pairs (word classes, words) or unit pairs (words, word classes) on the premise that the scales of the tree banks are no longer expanded, accordingly, the information granularity can be refined, the problem of data sparseness can be solved, and the dependency syntactic analysis performance can be improved.
Owner:BEIJING INFORMATION SCI & TECH UNIV

A Method of Constructing Vietnamese Dependency Treebank Based on Improved Nivre Algorithm

The invention relates to a method for establishing a Vietnamese dependency tree bank based on an improved Nivre algorithm, and belongs to the technical field of natural language processing. The method comprises the steps of firstly, establishing an initial training corpus, an expansion corpus and a test corpus; secondly training two dependency parsing weak learners S1 and S2 based on the improved Nivre algorithm by utilizing the established initial training corpus to serve as two fully redundant views; thirdly, performing dependency parsing on the expansion corpus by utilizing the two trained weak learners S1 and S2 and building a Vietnamese dependency tree bank model; and finally, performing dependency parsing testing on the test corpus and finally establishing the Vietnamese dependency tree bank. According to the method, the powerful support can be provided for upper applications of syntactic analysis, machine translation, information acquisition and the like of a Vietnamese language; the process of manually marking a dependency relation of Vietnamese sentences can be effectively avoided, so that the time of manpower and material resources is saved; and a large amount of unmarked Vietnamese sentence level corpora can be effectively utilized for improving the accuracy of dependency parsing.
Owner:KUNMING UNIV OF SCI & TECH

A Phrase Tree to Dependency Tree Conversion Method Integrating Vietnamese Grammatical Features

The invention relates to a phrase tree to dependency tree transformation method capable of combining Vietnamese grammatical features, and belongs to the technical field of natural language processing. The phrase tree to dependency tree transformation method comprises the following steps: firstly, constructing a Vietnamese phrase tree library; utilizing a center subnode filter table which combines the Vietnamese grammatical features and a dependency relationship annotator to finish the phrase tree to dependency tree transformation in the Vietnamese phrase tree library to obtain a first-level Vietnamese dependency tree library; according to the corpus of the manually annotated first-level Vietnamese dependency tree library, training to obtain a MSTParser model, utilizing the MSTParser model to carry out the expansion of the first-level Vietnamese dependency tree library to obtain an expanded second-level Vietnamese dependency tree library; and utilizing a dependency relationship corrector to correct the corpus of the expanded second-level Vietnamese dependency tree library to obtain a final three-level Vietnamese dependency tree library. The method avoids a process that the Vietnamese dependency tree library is manually collected and annotated, saves manpower and time for constructing the tree library, and obviously improves accuracy.
Owner:KUNMING UNIV OF SCI & TECH

Method and system for automatic treebank transformation based on pattern embedding

The invention relates to an automatic treebank conversion method and system based on pattern embedding, designed to obtain an accurate supervised conversion model. The method determines the pattern of the words w_i and w_j; transforms that pattern into the corresponding pattern embedding vector; transforms the dependency labels of w_i, w_j and their lowest common ancestor node w_a in the source tree into three dependency embedding vectors; concatenates the pattern embedding vector and the three dependency embedding vectors to form the representation vector of the structural information of w_i and w_j in the source tree; concatenates the top-level outputs of the recurrent neural network with this representation vector and uses the results as the input of a multilayer perceptron (MLP); and obtains the target-side dependency arc score of w_i and w_j by biaffine calculation. The invention makes full use of the source-side syntax tree to describe the correspondences between the two annotation specifications, and thereby achieves high-quality treebank conversion.
Owner:SUZHOU UNIV
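
The entry above ends with a biaffine calculation of the arc score between two words. The abstract does not give the exact formula, so the sketch below assumes one widely used form: a bilinear term between the dependent and head vectors plus a linear bias term on the head vector. The vectors, the weight matrix `U` and the function name are hypothetical, and in the described system the two input vectors would be MLP outputs rather than hand-written numbers.

```python
def biaffine_score(dep_vec, head_vec, U, head_bias):
    """Score a candidate arc as dep^T U head + head_bias . head."""
    bilinear = sum(
        dep_vec[a] * U[a][b] * head_vec[b]
        for a in range(len(dep_vec))
        for b in range(len(head_vec))
    )
    linear = sum(w * h for w, h in zip(head_bias, head_vec))
    return bilinear + linear


# Toy two-dimensional vectors standing in for the MLP outputs of w_i and w_j.
dep_vec = [1.0, 2.0]
head_vec = [3.0, 4.0]
U = [[1.0, 0.0],
     [0.0, 1.0]]
head_bias = [0.5, 0.5]
score = biaffine_score(dep_vec, head_vec, U, head_bias)
print(score)
```

Computing this score for every candidate head of a word and taking the highest is the usual way such scores are turned into a dependency tree.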