39 results about "Treebank" patented technology

In linguistics, a treebank is a parsed text corpus annotated with syntactic or semantic sentence structure. The construction of parsed corpora in the early 1990s revolutionized computational linguistics, which benefited from large-scale empirical data. The exploitation of treebank data has been important ever since the first large-scale treebank, the Penn Treebank, was published. Although treebanks originated in computational linguistics, their value is becoming more widely appreciated in linguistics research as a whole. For example, annotated treebank data has been crucial in syntactic research for testing linguistic theories of sentence structure against large quantities of naturally occurring examples.
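To make the idea of a "parsed text corpus" concrete, the following minimal sketch reads one sentence in the bracketed notation popularized by the Penn Treebank and recovers its words. The function names `parse_bracketed` and `leaves` and the example sentence are illustrative assumptions, not part of any patent listed here.

```python
def parse_bracketed(text):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        assert tokens[pos] == "(", "each node must start with '('"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())   # nested constituent
            else:
                children.append(tokens[pos])   # leaf word
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return read_node()


def leaves(node):
    """Collect the words of a parsed tree from left to right."""
    _, children = node
    words = []
    for child in children:
        if isinstance(child, tuple):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words


tree = parse_bracketed("(S (NP (DT the) (NN cat)) (VP (VBD sat)))")
print(tree[0], leaves(tree))
```

A real treebank stores many thousands of such trees, usually with richer labels (function tags, traces) than this toy reader handles.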

Method for processing unknown words in Chinese-language dependency tree banks

The invention belongs to the field of natural language processing within computational linguistics, and discloses a method for processing unknown words in Chinese dependency treebanks. The method includes the steps of: A, looking up all synonyms of an unknown word with the aid of a synonym forest (thesaurus); B, computing the character pattern similarity between the unknown word and each of its synonyms according to the character pattern features of Chinese characters; C, when the character pattern similarity between the unknown word and several synonyms is high, extracting the mapped words and the information quantities of their word classes to improve the character pattern similarity computation model; D, selecting the word with the maximum character pattern similarity as the optimal mapped word of the unknown word and using it to stand in for the unknown word in the treebank. The advantage of the method is that unit pairs (word class, word class) in dependency syntactic analysis can be restored to unit pairs (word class, word) or (word, word class) without further expanding the treebank; the information granularity is thereby refined, data sparseness is alleviated, and dependency parsing performance is improved.
Owner:BEIJING INFORMATION SCI & TECH UNIV
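Steps A, B, and D of the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the character pattern similarity model, so a simple Dice coefficient over shared characters is swapped in as a stand-in, and the thesaurus entries, the `known_words` filter, and all function names are hypothetical.

```python
def char_similarity(a, b):
    """Stand-in similarity: Dice coefficient over the sets of characters.
    (Assumption: the patent's own character pattern model is not specified here.)"""
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def best_mapped_word(unknown, thesaurus, known_words):
    """Step A: look up synonyms; step B: score them; step D: keep the best one."""
    # Assumption: only synonyms that already occur in the treebank are useful.
    candidates = [w for w in thesaurus.get(unknown, []) if w in known_words]
    if not candidates:
        return None
    return max(candidates, key=lambda w: char_similarity(unknown, w))


# Hypothetical toy data: a thesaurus entry and the treebank vocabulary.
thesaurus = {"微机": ["计算机", "电脑", "微型机"]}
known_words = {"计算机", "电脑", "微型机"}

mapped = best_mapped_word("微机", thesaurus, known_words)
print(mapped)
```

Step C, which refines the similarity model using the information quantity of word classes, is omitted because the abstract does not describe it in enough detail to sketch.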

Tendency text automatic classification system and achieving method of the same

The invention provides a tendency text automatic classification system and a method for implementing it, and relates to natural language processing, text data mining, and automatic text classification. The system comprises a dependency relationship analysis module, a Chinese word segmentation module, a sentence structure analysis module, and a multi-layer emotion classification sentence module library. The dependency relationship analysis module performs dependency relationship analysis of Chinese sentences, the Chinese word segmentation module performs word segmentation of the Chinese sentences, the sentence structure analysis module performs sentence structure analysis of the Chinese sentences after word segmentation, and the multi-layer emotion classification sentence module library manages business-related knowledge. The system is characterized in that the multi-layer emotion classification sentence module library is divided into 3 large classes and 120 small classes, the three large classes being attitude grammar, feeling grammar, and thought grammar, and the library is compiled manually according to Chinese usage rules and the business-related knowledge. Sentence structure analysis of all sentence models in the library is conducted to build a sentence structure treebank, and dependency relationship analysis of all sentence models in the library is conducted to form a dependency relationship gallery.
Owner:WUYI UNIV

Phrase tree to dependency tree transformation method capable of combining Vietnamese grammatical features

The invention relates to a phrase tree to dependency tree transformation method capable of combining Vietnamese grammatical features, and belongs to the technical field of natural language processing. The phrase tree to dependency tree transformation method comprises the following steps: firstly, constructing a Vietnamese phrase tree library; utilizing a center subnode filter table which combines the Vietnamese grammatical features and a dependency relationship annotator to finish the phrase tree to dependency tree transformation in the Vietnamese phrase tree library to obtain a first-level Vietnamese dependency tree library; according to the corpus of the manually annotated first-level Vietnamese dependency tree library, training to obtain a MSTParser model, utilizing the MSTParser model to carry out the expansion of the first-level Vietnamese dependency tree library to obtain an expanded second-level Vietnamese dependency tree library; and utilizing a dependency relationship corrector to correct the corpus of the expanded second-level Vietnamese dependency tree library to obtain a final three-level Vietnamese dependency tree library. The method avoids a process that the Vietnamese dependency tree library is manually collected and annotated, saves manpower and time for constructing the tree library, and obviously improves accuracy.
Owner:KUNMING UNIV OF SCI & TECH
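The core of a phrase tree to dependency tree conversion is choosing, for every phrase, which child carries the head word. The sketch below illustrates that step with a small head table, which plays the role the abstract assigns to the "center subnode filter table". The table contents, labels, and example sentence are hypothetical, and dependency relation labels (the job of the abstract's annotator) are left out.

```python
# Hypothetical head table: for each phrase label, child labels in priority order.
HEAD_TABLE = {
    "S": ["VP", "NP"],
    "VP": ["V", "VP"],
    "NP": ["N", "P", "NP"],
}


def to_dependencies(tree):
    """Convert a phrase tree into (words, root_index, arcs).

    A phrase is (label, [children]); a preterminal is (pos_tag, word).
    Each arc is (head_index, dependent_index) over word positions.
    """
    words, arcs = [], []

    def head_of(node):
        label, children = node
        if isinstance(children, str):          # preterminal: record the word
            words.append(children)
            return len(words) - 1
        child_heads = [head_of(c) for c in children]
        pick = 0                               # fall back to the first child
        for wanted in HEAD_TABLE.get(label, []):
            matches = [i for i, c in enumerate(children) if c[0] == wanted]
            if matches:
                pick = matches[0]
                break
        for i, h in enumerate(child_heads):    # non-head children depend on the head
            if i != pick:
                arcs.append((child_heads[pick], h))
        return child_heads[pick]

    root = head_of(tree)
    return words, root, arcs


# "Tôi đọc sách" (I read books), with an assumed toy bracketing.
phrase_tree = ("S", [("NP", [("P", "Tôi")]),
                     ("VP", [("V", "đọc"), ("NP", [("N", "sách")])])])
words, root, arcs = to_dependencies(phrase_tree)
print(words[root], sorted(arcs))
```

In this toy case the verb becomes the root and both nouns attach to it; a language-specific table is what lets the same procedure respect Vietnamese word order and phrase structure.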

Chunk-based Vietnamese phrase tree construction method

The invention relates to a chunk-based Vietnamese phrase tree construction method, and belongs to the technical field of natural language processing. The method comprises the following steps of: firstly carrying out upper-layer chunk labeling and basic-layer chunk labeling on a Vietnamese phrase tree label set; selecting feature sets of an upper-layer chunk and a basic-layer chunk, and constructing a chunk-based Vietnamese phrase tree library construction model; carrying out chunk analysis on word-segmented Vietnamese sentences by utilizing a chunk analysis tool, so as to obtain a chunk construction-based primary Vietnamese phrase tree library; and correcting the chunk construction-based primary Vietnamese phrase tree library by utilizing a phrase tree library corrector, so as to obtain a corrected final Vietnamese phrase tree library. According to the method provided by the invention, the process of manually collecting and labelling the Vietnamese phrase tree libraries is avoided, and the manpower and the time of constructing the tree libraries are saved; and compared with the method for constructing Vietnamese phrase tree libraries by adoption of context-free grammars and maximum entropies, the phrase tree construction method disclosed by the invention has an advantage of remarkably improving the correctness.
Owner:KUNMING UNIV OF SCI & TECH

A judicial case discrimination system and method based on event tree analysis

Inactive | CN109949185A | Quick case | Quick Logic Visualization | Data processing applications | Text database querying | Data mining | Computer science
The invention provides a judicial case discrimination system based on event tree analysis. The judicial case discrimination system comprises a legal text collection module, an event tree construction module, an automatic criminal name discrimination module and an automatic penalty discrimination module. The legal text collection module is used for converting legal statement text information submitted by a litigation user into noiseless text data; the event tree construction module is used for converting the legal text data into triple information related to the two entities through a natural language processing technology, and forming time sequence legal event tree information through a treebank; the automatic criminal name discrimination module is used for giving an automatic criminal name discrimination result through an event tree similarity matching algorithm; and the automatic penalty discrimination module is used for providing an automatic penalty discrimination result through an event tree similarity matching algorithm on the premise of criminal name discrimination. The method has the advantages that the litigation user can be helped to quickly know the preliminary result of case judgment, and the system and method can help a judge to perform a final case discrimination.
Owner:NANJING UNIV OF POSTS & TELECOMM

MST algorithm based Vietnamese dependency tree library construction method

Inactive | CN105740234A | Solve time-consuming problems | Make up for scarcity | Natural language data processing | Special data processing applications | Library training | Theoretical computer science
The invention relates to an MST algorithm based Vietnamese dependency tree library construction method and belongs to the technical field of natural language processing. The method comprises the steps of firstly constructing a Vietnamese dependency tree library training corpus base; secondly performing training by utilizing corpora of the Vietnamese dependency tree library training corpus base to obtain an MST model and then training Vietnamese sentences by utilizing the MST model to obtain a Vietnamese dependency tree library; and correcting the obtained Vietnamese dependency tree library corpus base. The Vietnamese dependency tree library constructed with the method can provide powerful support for upper-layer applications such as syntactic analysis, machine translation, information acquisition and the like of Vietnamese language; the Vietnamese dependency tree library with one hundred thousand Vietnamese sentences can be constructed; the method avoids the processes of manually collecting and marking the Vietnamese dependency tree library, reduces the labor and shortens the time for constructing the tree library; and compared with a method for constructing a Vietnamese dependency tree library by adopting a CRFParser and Chinese-Vietnamese bilingual word-alignment corpora, the method provided by the invention has the advantage that the accuracy is remarkably improved.
Owner:KUNMING UNIV OF SCI & TECH

Method for establishing Vietnamese dependency tree bank based on improved Nivre algorithm

The invention relates to a method for establishing a Vietnamese dependency tree bank based on an improved Nivre algorithm, and belongs to the technical field of natural language processing. The method comprises the steps of firstly, establishing an initial training corpus, an expansion corpus and a test corpus; secondly training two dependency parsing weak learners S1 and S2 based on the improved Nivre algorithm by utilizing the established initial training corpus to serve as two fully redundant views; thirdly, performing dependency parsing on the expansion corpus by utilizing the two trained weak learners S1 and S2 and building a Vietnamese dependency tree bank model; and finally, performing dependency parsing testing on the test corpus and finally establishing the Vietnamese dependency tree bank. According to the method, the powerful support can be provided for upper applications of syntactic analysis, machine translation, information acquisition and the like of a Vietnamese language; the process of manually marking a dependency relation of Vietnamese sentences can be effectively avoided, so that the time of manpower and material resources is saved; and a large amount of unmarked Vietnamese sentence level corpora can be effectively utilized for improving the accuracy of dependency parsing.
Owner:KUNMING UNIV OF SCI & TECH
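The use of two learners on an unlabeled expansion corpus can be illustrated with a small selection step. The abstract does not say how parses from S1 and S2 are chosen for inclusion, so this sketch assumes one simple criterion: keep a sentence only when both parsers return the same head list. The two toy parsers and all names are hypothetical stand-ins, not the improved Nivre algorithm.

```python
def select_agreed(sentences, parse_a, parse_b):
    """Split sentences into those where both parsers agree and the rest.

    Each parser maps a token list to a list of head indices (-1 marks the root).
    Assumption: agreement is used as the confidence signal.
    """
    agreed, remaining = [], []
    for tokens in sentences:
        heads_a, heads_b = parse_a(tokens), parse_b(tokens)
        if heads_a == heads_b:
            agreed.append((tokens, heads_a))
        else:
            remaining.append(tokens)
    return agreed, remaining


# Toy stand-ins for the two weak learners S1 and S2.
def chain_parser(tokens):
    """Attach every word to the word before it."""
    return [i - 1 for i in range(len(tokens))]


def star_parser(tokens):
    """Attach every word to the first word."""
    return [-1] + [0] * (len(tokens) - 1)


expansion = [["a", "b"], ["a", "b", "c"]]
agreed, remaining = select_agreed(expansion, chain_parser, star_parser)
print(agreed, remaining)
```

In a co-training loop, the agreed parses would be added to the training corpus and both learners retrained, repeating until no new sentences are selected.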

Method and system for automatic treebank transformation based on pattern embedding

The invention relates to an automatic treebank conversion method and system based on pattern embedding, designed to obtain an accurate supervised conversion model. The method determines the pattern of the word w_i and the word w_j in the source tree and transforms that pattern into a corresponding pattern embedding vector; the dependency labels of the word w_i, the word w_j, and their lowest common ancestor node w_a in the source tree are each transformed into dependency embedding vectors; the pattern embedding vector and the three dependency embedding vectors are concatenated to form a representation vector of the structural information of w_i and w_j in the source tree; the top-layer outputs of a recurrent neural network are each concatenated with this representation vector and used as the input of a multilayer perceptron (MLP); and the target-side dependency arc score of w_i and w_j is obtained by a biaffine computation. The invention makes full use of the source-side syntax tree to capture the correspondences between the two annotation guidelines, and thereby achieves high-quality treebank conversion.
Owner:SUZHOU UNIV
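The final biaffine step can be sketched in a few lines. This assumes the common form in which a bias term is appended to the dependent vector, so the score is [x; 1]^T U y; the vectors and the weight matrix `U` below are made-up numbers, whereas in the described system they would come from the MLP outputs and from training.

```python
def biaffine_score(dep, head, U):
    """Arc score [dep; 1]^T U head.

    dep and head are lists of floats; U has len(dep) + 1 rows and
    len(head) columns. The appended 1.0 gives the head vector a bias term.
    """
    x = list(dep) + [1.0]
    return sum(x[r] * U[r][c] * head[c]
               for r in range(len(x))
               for c in range(len(head)))


# Hypothetical 2-dimensional representations of w_i (dependent) and w_j (head).
dep_vec = [1.0, 2.0]
head_vec = [0.5, -1.0]
U = [[1.0, 0.0],
     [0.0, 1.0],
     [0.5, 0.5]]   # last row acts as the bias on the head vector

score = biaffine_score(dep_vec, head_vec, U)
print(score)
```

Scoring every (w_i, w_j) pair this way yields a matrix of arc scores from which the target-side tree is decoded.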