39 results about "Dependency grammar" patented technology

Dependency grammar (DG) is a class of modern grammatical theories that are all based on the dependency relation (as opposed to the relation of phrase structure) and that can be traced back primarily to the work of Lucien Tesnière. Dependency is the notion that linguistic units, e.g. words, are connected to each other by directed links. The (finite) verb is taken to be the structural center of clause structure. All other syntactic units (words) are either directly or indirectly connected to the verb in terms of the directed links, which are called dependencies. DGs are distinct from phrase structure grammars, since DGs lack phrasal nodes, although they acknowledge phrases. Structure is determined by the relation between a word (a head) and its dependents. Dependency structures are flatter than phrase structures in part because they lack a finite verb phrase constituent, and they are thus well suited for the analysis of languages with free word order, such as Czech, Slovak, and Warlpiri.
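
As a rough illustration of the head/dependent relation described above, a dependency structure can be stored as one head index per word. This is only a sketch: the sentence, the hand-assigned head indices, and the helper names `dependents_of` and `root_of` are assumptions made for the example, not part of any listed patent.

```python
from collections import defaultdict

# Hypothetical example sentence; head indices are assigned by hand.
# heads[i] is the index of the head of words[i]; -1 marks the root
# (here the finite verb "reads").
words = ["She", "reads", "old", "books"]
heads = [1, -1, 3, 1]


def dependents_of(heads):
    """Map each head index to the list of its direct dependents."""
    deps = defaultdict(list)
    for child, head in enumerate(heads):
        if head >= 0:
            deps[head].append(child)
    return dict(deps)


def root_of(heads):
    """Return the index of the single word that has no head."""
    roots = [i for i, h in enumerate(heads) if h < 0]
    if len(roots) != 1:
        raise ValueError("a dependency tree needs exactly one root")
    return roots[0]


deps = dependents_of(heads)
root = root_of(heads)
```

Note that there are no phrasal nodes: the phrase "old books" is recoverable only as the word "books" together with its dependent "old".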

Apparatus and Method for Analyzing Intention

An apparatus and system for analyzing intention are provided. The apparatus for analyzing an intention applies a context-free grammar to each of one or more sentences in units of one or more phrases to perform phrase spotting on each sentence, thereby extending a recognition range for an out-of-grammar (OOG) expression. Meanwhile, the apparatus for analyzing an intention determines whether sentences that have undergone phrase spotting are grammatically valid by applying a dependency grammar to the sentences to filter out invalid sentences, and generates the intention analysis result of a valid sentence, thereby grammatically and/or semantically verifying a sentence that has undergone speech recognition while extending the speech recognition range.
Owner:SAMSUNG ELECTRONICS CO LTD

Dependency semantic-based Chinese unsupervised open entity relationship extraction method

The invention relates to a dependency semantic-based Chinese unsupervised open entity relationship extraction method. The method comprises the following steps of preprocessing an input text: performing Chinese word segmentation, part-of-speech tagging and dependency grammar analysis on the input text; performing named entity identification on the input text; arbitrarily selecting two entities from identified entities to form candidate entity pairs; searching for a dependency path between two entities in the candidate entity pairs; and analyzing whether a syntactic structure mapped by the dependency path is matched with a normal form of a dependency semantic normal form set or not, if yes, extracting words or phrases from the residual part of the input text according to the matched normal form to serve as relational words, forming a relational triple by the extracted relational words and the candidate entity pairs, and if not, performing normal form matching of a next group of the candidate entity pairs; and outputting the relational triple. Compared with the prior art, the method has the advantages that the calculation complexity is low; the extraction efficiency is high; distance position limitation is overcome; a simple sentence also can be extracted and the like.
Owner:TONGJI UNIV
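
The step "searching for a dependency path between two entities" in the abstract above can be sketched on a head-index tree as follows. This is a minimal illustration, not the patented procedure: the English example sentence, its hand-assigned heads, and the function names are assumptions, and taking the interior of the path as relation-word candidates is a simplification of the normal-form matching the abstract describes.

```python
def path_to_root(heads, i):
    """Indices from word i up to the root, inclusive."""
    path = [i]
    while heads[path[-1]] >= 0:
        path.append(heads[path[-1]])
    return path


def dependency_path(heads, a, b):
    """Word indices on the tree path from a to b via their lowest common ancestor."""
    up_a = path_to_root(heads, a)
    up_b = path_to_root(heads, b)
    ancestors_a = set(up_a)
    # First node on b's upward path that is also on a's upward path.
    lca = next(n for n in up_b if n in ancestors_a)
    left = up_a[: up_a.index(lca) + 1]   # a ... lca
    right = up_b[: up_b.index(lca)]      # b ... child of lca
    return left + right[::-1]


# Hypothetical sentence with hand-assigned heads (-1 marks the root).
words = ["Jobs", "founded", "Apple", "in", "California"]
heads = [1, -1, 1, 4, 1]

path = dependency_path(heads, 0, 2)                    # "Jobs" to "Apple"
relation_candidates = [words[i] for i in path[1:-1]]   # words strictly between the entities
```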

Multi-granularity semantic chunk based entity attribute and attribute value extracting method

The invention relates to a multi-granularity semantic chunk based entity attribute and attribute value extracting method, and belongs to the technical field of Web mining and information extraction. The method comprises the following steps that a corpus set is constructed and free text extraction is performed; a corpus is subjected to word segmentation, part-of-speech tagging and phrase recognition; the corpus is subjected to semantic role labeling; the corpus is subjected to dependency grammar analysis; the corpus is subjected to semantic dependency analysis; candidate entities, attributes and attribute value triads based on three granularities of words, phrases and semantic roles are extracted; the candidate entities, attributes and attribute value triads are corrected and subjected to error classification by means of a trained classifier. Compared with the prior art, the entities, attributes and attribute value triads based on three granularities of words, phrases and semantic roles are automatically extracted from a free text, the entity attribute and attribute value extraction accuracy and efficiency are improved, and the wide application prospect is achieved in the fields of theme detection, information retrieval, automatic abstracting, question and answer systems and the like.
Owner:BEIJING INSTITUTE OF TECHNOLOGY

Entity relationship recognition method and apparatus

The present invention relates to an entity relationship recognition method and apparatus. The method comprises obtaining a statement sequence from a target text in a corpus, and performing named entity recognition and dependency grammar marker on the statement sequence to obtain a marked text sentence; matching and retrieving the marked text sentence on basis of an entity relationship seed to obtain a training example; replacing the entity relationship seed word in the training example with predetermined identification, processing the training example after replacement combined with the named entity recognition and the dependency grammar marker, and generating a candidate rule; fuzzifying the candidate rule to obtain fuzzy rules; determining whether the fuzzy rules comprise a new rule; and retrieving the corpus according to the fuzzy rules to obtain a seed set when the fuzzy rules comprise the new rule, and using the obtained seed set as an entity relationship recognition result. Manual participation can be effectively reduced, dependence on the calibrated corpus is reduced, a new entity relationship can be found timely, and the entity relationship recognition method and apparatus are self-adaptive to entity relationship mining in different fields.
Owner:LETV HLDG BEIJING CO LTD +1

Method for automatically correcting syntax errors in English composition based on multivariate features

The invention relates to a method for automatically correcting syntax errors in an English composition based on multivariate features. The method comprises a syntax error correcting preprocessing module, a syntax error correcting model training module and a syntax error checking and correcting module, wherein the syntax error correcting preprocessing module carries out part-of-speech tagging of words, syntactic parsing of sentences and word frequency statistics of words for input training texts; the syntax error correcting model training module extracts words and part-of-speech context syntactic features thereof, words and part-of-speech structure-dependent syntactic features thereof and words and part-of-speech syntactic features thereof, calculates syntactic feature weight of words and outputs a statistical model of syntax error correcting for a part-of-speech tagging library of input words, a syntax tree structure library of sentences, a word frequency statistics library of words and a part-of-speech and syntax confusion set of words; and the syntax error checking and correcting module utilizes the statistical model of syntax error correcting and a rule model of syntax error correcting to correct syntax errors in a composition to be corrected and outputs the corrected results of the syntax errors in the English composition. The method can automatically correct eleven kinds of common English syntax errors in the English composition.
Owner:GUILIN UNIV OF ELECTRONIC TECH

Neural network and tag library-based statement similarity algorithm

The invention discloses a neural network and tag library-based statement similarity algorithm in the information retrieval field, which is characterized by comprising the following steps: (1) loading a semantic dictionary and a synonym lexicon with a neural network respectively; (2) inputting a complete statement to be analyzed; (3) analyzing the integral syntactic structure of the statement by using a dependency grammar analyzer, then layering the statement, and acquiring an effective component sequence of the statement; (4) determining a corresponding header field of the statement in an exUCL tag library according to the layering and the effective component sequence thereof; and (5) judging whether the statement has similar word pairs, if so, calculating the similarity of the statement, otherwise, re-inputting a new statement to be analyzed, and performing the similarity calculation again. The algorithm combines the advantages of dependency-based statement similarity algorithm and edit distance algorithm so that the calculation precision is greatly improved.
Owner:成都安客云网络科技有限公司

Question sentence classification method suitable for automatic question and answer system

The invention discloses a question sentence classification method suitable for an automatic question and answer system. The method is suitable for the technical field of computers. The method comprises the steps of obtaining to-be-classified question sentences, and performing word segmentation and part-of-speech tagging by utilizing a word segmentation tool; obtaining to-be-classified question sentences subjected to word segmentation operation, and performing preprocessing; obtaining preprocessed to-be-classified question sentences, finding out keywords in the question sentences to form a keyword set, calculating weights of the keywords in the keyword set according to an improved TF-IDF algorithm, and taking first N keywords according to a specific method; according to a dependency grammar analysis method, extracting relationship features of subject-predicate, verb-object and modifier-head dependency grammars of the keywords in the question sentences; and classifying keyword vectors by utilizing a trained Naive Bayesian model to obtain a classification result. The question sentence classification accuracy and efficiency are improved.
Owner:NANJING UNIV OF POSTS & TELECOMM

User intention identification method and user intention identification system

The invention relates to a user intention identification method and system. The method comprises a key entity identification step of performing natural language processing technology analysis for user conversation texts by taking words as units to obtain named entities, which serve as user intention parameter candidates, and a user intention judgment step of performing dependency grammar analysis on the user conversation texts, performing word-by-word fuzzy matching according to a preset user intention key candidate set to obtain an intention keyword, judging a dependency relationship between the intention keyword and the user intention parameter candidates obtained in the key entity identification step, and outputting a user intention identification result only in the presence of the dependency relationship. According to the method and the system, a user intention can be identified more accurately and comprehensively.
Owner:CHINA UNIONPAY
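
The two-stage check in the abstract above (fuzzy-match an intention keyword, then require a dependency relationship to an entity candidate) might look roughly like this. Everything here is an assumption for illustration: the keyword list, the 0.8 cutoff, the use of the standard-library `difflib` for fuzzy matching, and the choice to count only a direct head-dependent link as a "dependency relationship".

```python
import difflib


def find_intent_keyword(words, intent_keywords, cutoff=0.8):
    """Return (index, canonical keyword) of the first word that fuzzily matches."""
    for i, w in enumerate(words):
        match = difflib.get_close_matches(w.lower(), intent_keywords, n=1, cutoff=cutoff)
        if match:
            return i, match[0]
    return None


def directly_related(heads, i, j):
    """True if one of the two words is the head of the other."""
    return heads[i] == j or heads[j] == i


def identify_intent(words, heads, entity_indices, intent_keywords):
    """Return (keyword, linked entities), or None if no entity depends on the keyword."""
    found = find_intent_keyword(words, intent_keywords)
    if found is None:
        return None
    k, keyword = found
    params = [words[e] for e in entity_indices if directly_related(heads, k, e)]
    return (keyword, params) if params else None


# Hypothetical utterance with a typo and hand-assigned heads (-1 marks the root).
words = ["Please", "transfr", "money", "to", "Alice"]
heads = [1, -1, 1, 4, 1]
result = identify_intent(words, heads, entity_indices=[4], intent_keywords=["transfer", "pay"])
```

A result is produced only when the matched keyword and an entity candidate are linked in the tree, mirroring the "only in the presence of the dependency relationship" condition.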

Medical data collection and analysis method and system

The invention relates to a medical data collection and analysis method, comprising the following steps of 1, uploading original data to a data platform; 2, converting the original data to data in an RDF (Resource Description Framework) format by using a semantic annotation algorithm based on a conditional random field in combination with a dependency grammar; 3, associating RDF data of a same patient in the data processed in step 2 through a data mining algorithm, and storing the RDF data into an Hbase database based on a distributed file system; 4, analyzing data in the Hbase database by using a statistical method and a machine learning method to obtain analysis conclusions; 5, organizing and classifying the analysis conclusions to construct a therapeutic scheme knowledge base. According to the medical data collection and analysis method and a system, the whole clinical diagnosis and treatment data of the patient are collected pertinently, a large number of data are analyzed to clinically diagnose in assistance, forecast disease and analyze the patient; the medical data collection and analysis method and the system can clinically help doctors to make an effective, exact and individualized therapeutic scheme.
Owner:XIEHE HOSPITAL ATTACHED TO TONGJI MEDICAL COLLEGE HUAZHONG SCI & TECH UNIV

User comment attribute extraction method based on bi-directional dependency syntactic tree representation

The invention discloses a user comment attribute extraction method based on bi-directional dependency syntactic tree representation. The method comprises the steps that 1) given user comment text is preprocessed, and a dependency syntactic tree is generated; 2) a bi-directional dependency syntactic tree representation network is established, and dependency characteristics among words are extracted; 3) the dependency characteristics are input into a bi-directional LSTM neural network, sequence characteristics among the words are extracted on the basis of the dependency characteristics, and accordingly the dependency characteristics are effectively combined with the sequence characteristics; 4) the combined characteristics are coded by using a linear-chain conditional random field; 5) a Viterbi algorithm is used for conducting decoding to obtain comment attributes of all text. According to the user comment attribute extraction method, the aim is effectively achieved that in user comment attribute extraction tasks, the dependency characteristics of text syntax are extracted and efficiently combined with the sequence characteristics to achieve end-to-end training; the conditional random field is used for coding the combined characteristics, the Viterbi algorithm is used for decoding the combined characteristics, and the good effect can be achieved in the user comment attribute extraction tasks.
Owner:SOUTHWEST JIAOTONG UNIV

Dependency grammar analysis method and device and auxiliary classifier training method

The invention discloses a dependency grammar analysis method and device and an auxiliary classifier training method. The dependency grammar analysis method includes the steps of preliminary analysis: using a universal dependency grammar analyzer to carry out dependency grammar analysis on sentences of a target field to generate analysis results in a predetermined number, characteristic extraction: extracting the high-order characteristics of at least parts of edges from a dependency relationship tree serving as the analysis results, and classification: using an auxiliary classifier to classify the analysis results in the predetermined number based on the high-order characteristics, and selecting final dependency grammar analysis results from the analysis results in the predetermined number according to the classification results.
Owner:FUJITSU LTD
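
The select-among-candidates idea in the abstract above can be sketched as reranking: a general parser proposes several head arrays, higher-order features are read off each tree, and a second model picks one. In this sketch the "auxiliary classifier" is replaced by a toy weight table, and the grandparent-parent-child triple is just one plausible kind of high-order feature; both are assumptions, not details taken from the patent.

```python
def second_order_features(words, heads):
    """Grandparent features: (grandparent, parent, child) word triples."""
    feats = []
    for child, parent in enumerate(heads):
        if parent >= 0 and heads[parent] >= 0:
            feats.append((words[heads[parent]], words[parent], words[child]))
    return feats


def rerank(words, candidates, score):
    """Return the candidate head array whose features have the highest total score."""
    return max(
        candidates,
        key=lambda heads: sum(score(f) for f in second_order_features(words, heads)),
    )


# Hypothetical sentence with two candidate analyses (-1 marks the root).
words = ["I", "saw", "stars", "with", "telescopes"]
attach_to_verb = [1, -1, 1, 4, 1]   # "telescopes" depends on "saw"
attach_to_noun = [1, -1, 1, 4, 2]   # "telescopes" depends on "stars"

# Toy stand-in for a trained classifier: hand-set feature weights.
weights = {
    ("saw", "telescopes", "with"): 1.0,
    ("stars", "telescopes", "with"): -0.5,
}
best = rerank(words, [attach_to_noun, attach_to_verb], lambda f: weights.get(f, 0.0))
```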

Address knowledge processing method and device based on graphs

The invention relates to an address knowledge processing method and an address knowledge processing device based on graph. The method comprises the following steps: (10) syncopating an address text into an address word sequence; (20) performing part-of-speech tagging on each address word in the address word sequence according to a predefined part-of-speech tagging set which reflects features of address words; (30) performing dependency grammar analysis on the tagged address word sequence according to a predefined address word dependency rule, and using physical address words as nodes, so as to obtain a side which reflects the dependency among the physical address words; (40) comparing with the original content of an address knowledge base, and inputting the newly added nodes or side into the address knowledge base. The invention further provides the address knowledge processing device based on the graphs. According to the address knowledge processing method and the address knowledge processing device based on the graphs, address information can be organized according to the inherent logic among addresses, so as to form the address knowledge base; the address query precision can be increased by utilizing the address knowledge base; a reasoning function based on address knowledge can be supported.
Owner:SHENZHEN AUDAQUE DATA TECH

Statement hotspot extraction method and system

The invention relates to a statement hotspot extraction method. The method comprises the following steps of extracting at least one emotion keyword from a document, wherein the document comprises at least one statement; performing dependency grammar analysis on the first statement to generate a dependency syntax tree; trimming the dependency syntax tree based on the first emotion keyword to form a result syntax tree, wherein the first emotion keyword comes from the first statement; forming a text vector set based on the result syntax tree; and clustering the text vector set to form at least one class of interest. The method is beneficial for improving the accuracy of clustering results, does not influence original semantics, and can more accurately express hotspots concerned by customers.
Owner:CHINA UNIONPAY

Construction method and system of incremental-translation-oriented structured language model

Inactive | CN102945231A | Test set perplexity drops | Perplexity down | Special data processing applications | Discriminant | Algorithm
The invention discloses a construction method and a construction system of an incremental-translation-oriented structured language model. The method comprises the following steps: step 1, performing dependency grammar analysis on incrementally generated translation segments to obtain dependency tree segment assembly; step 2, extracting a discriminant feature instance on the dependency tree segment assembly, and calculating a feature score of the discriminant feature instance by a discriminant dependency grammar model; step 3, performing pruning on the dependency tree segment assembly according to the feature score, taking a maximal value of the feature score as the score of the structured language model, reserving the segment having the highest score in the structured language model, and acquiring the optimized dependency tree segment assembly; and step 4, splicing the next translation segment onto the dependency tree segment assembly through a shift-reduce operation, repeating the step 1, the step 2 and the step 3 until finishing the translation, and generating the complete dependency tree. According to the construction method and the construction system of the incremental-translation-oriented structured language model, the grammar information and the long-distance dependency information can be merged into the language model, an effective optimization algorithm is proposed for dynamic calculation of the structured language model in a decoding process, and the translation quality is improved.
Owner:INST OF COMPUTING TECH CHINESE ACAD OF SCI

Word sense disambiguation method fusing sentence local context with document domain information

The invention relates to a word sense disambiguation method fusing a sentence local context with document domain information, and belongs to the technical field of natural language processing. The word sense disambiguation method comprises the steps of: 1, carrying out dependency grammar analysis on a sentence where an ambiguous word is positioned, and obtaining sentence local context related words with a direct dependency relationship with the ambiguous word; 2, carrying out dependency grammar analysis on a domain document set, collecting all dependency tuples which the domain document set contains, and constructing a dependency tuple library; 3, carrying out statistic analysis on the dependency tuple library, and finding a group of domain related words with the closest relationship with the ambiguous word; 4, according to a dependency distribution similarity of the domain related words and word sense relevance between the domain related words and the local context, determining disambiguation weights of the domain related words; 5, merging the sentence local context related words with the domain related words, and constructing a related word set; and 6, according to weighted accumulation relevance of each word sense of the ambiguous word and the related word set, judging a correct word sense. According to the method disclosed by the invention, adaptability of a word sense disambiguation system on a specific domain can be improved, and disambiguation accuracy can be improved.
Owner:山东经伟晟睿数据技术有限公司

Emotion analysis method and device and electronic equipment

The invention discloses an emotion analysis method and device and an electronic device. The method comprises the steps of determining a to-be-analyzed sentence in a to-be-analyzed text; performing subject matching on each to-be-analyzed sentence based on a preset subject information base, wherein the preset subject information base comprises multiple pieces of subject information; when the target subject is matched in the to-be-analyzed sentence, determining the weighting coefficient of each word in the to-be-analyzed sentence to the target subject by utilizing a subject emotion self-attention mechanism, and forming the subject emotion self-attention mechanism by combining dependency grammar modeling; determining emotion words in the to-be-analyzed sentence and polarity of the emotion words; utilizing the emotion words, the polarity of the emotion words and the weighting coefficient to determine the emotion value of the to-be-analyzed sentence to the target subject; and combining the emotion values of all the to-be-analyzed sentences matched with the target subject in the to-be-analyzed text, and determining the emotion value of the to-be-analyzed text for the target subject. According to the invention, the emotional tendency of the target subject in the text can be accurately determined.
Owner:北京百分点科技集团股份有限公司

Induction of grammar rules

A method of grammar rule induction comprises obtaining a monolingual set of phrases from a bilingual corpus of translation pairs. For each of the monolingual phrases in turn, initialising, with inactive edges formed from headwords identified in the phrase, the agenda of a dependency grammar chart parser arranged to form packed edges in the chart. Running the chart parser and adding to the agenda, for each inactive edge removed from the agenda, one or more active edges created as if all possible grammar rules existed. When the agenda is empty, ascertaining the alternations of each edge in the packed edge corresponding to the complete phrase, and finding their respective highest frequencies. For the set of phrases, summing, for each alternation, its respective highest frequencies, and ranking the sums. Then, selecting alternations in rank order to form the required set of grammar rules until the required set has become sufficient such that for each monolingual phrase there exists at least one analysis corresponding to the required set of grammar rules.
Owner:BRITISH TELECOMM PLC

Bayesian word sense disambiguation method based on mass pseudo-data

The invention particularly relates to a new Bayesian word sense disambiguation method based on mass pseudo-data. It addresses the problems that current word sense disambiguation methods disambiguate poorly and that obtaining disambiguation knowledge wastes time and labor. The new Bayesian word sense disambiguation method includes the steps that through a dependency grammar analyzer, training examples containing ambiguous words in a training corpus base are subjected to syntactic analysis, and tuples with the dependence relationship with the ambiguous words are collected; then through a machine translation system, example sentences containing the tuples in a machine translation corpus base are searched. These steps are repeated, the searched example sentences are added into a pseudo-training corpus base, and then through the training corpus base and the pseudo-training corpus base, a Bayesian disambiguation model is trained; word meanings of the ambiguous words are decided through the disambiguation model, and on the basis of a small amount of manually-annotated corpuses, the data sparsity problem of word sense disambiguation can be effectively solved, the accuracy of word sense disambiguation is increased, and the new Bayesian word sense disambiguation method has broad development prospects.
Owner:SHANXI UNIV

Method and equipment for recalling sentence template based on seed sentence

The invention provides a method and equipment for recalling a sentence template based on a seed sentence. The method specifically comprises the following steps of obtaining corpuses, and determining a dependency syntax tree of each sentence of each corpus, wherein the number of corpuses exceeds a certain value, and each corpus relates to the seed sentence; according to the structure similarity of the dependency grammar trees, recalling each sentence in each corpus based on a tree structure of a dependency grammar tree of the seed sentence, and determining recalled sentences; calculating the similarity between the recalled sentences and the seed sentence, and determining the relevancy between each recalled sentence and the seed sentence; according to the relevancy, selecting a recalled sentence as the sentence template. The method at least has one of the following characteristics that the type of the recalled sentences is richer; the recalled sentences basically have no grammar mistake; components of the recalled sentences are richer; the semantics deviation of the recalled sentences is little; the recalled sentences are designed with a template, and an artificial template is not needed.
Owner:广东惠禾科技发展有限公司

Method for mapping Chinese problems on basis of LDA (latent Dirichlet allocation)

The invention discloses a method for mapping Chinese problems on the basis of LDA (latent Dirichlet allocation). The method includes classifying document libraries by the aid of LDA theme models; classifying word characteristics for the problems by the aid of Softmax regression models; assigning high weights to notional words according to difference of categories of the word characteristics, assigning low weights to functional words according to the difference of the categories of the word characteristics and allowing the weights of different word characteristics of the notional words to be different from one another; finding dependency relations of terms in sentences by means of syntactic analysis on the basis of dependency grammar; assigning different weights according to the difference of components of the terms in the sentences; multiplying two portions to obtain a weight of each word in each problem; establishing relationships by the aid of weighted distribution of terms in the problems and distribution of themes and lexical terms in documents according to Bayesian rules. The method has the advantages that the documents are classified on the basis of the LDA theme models, the different weights are distributed on the reference of the word characteristics of the lexical terms in the interrogative sentences and the components in the sentences, accordingly, effects of important lexical terms can be improved during classification, and the Chinese problem mapping accuracy can be improved.
Owner:识因智能科技(北京)有限公司

Method for recognizing social network sock puppet model based on frequency sub-tree

The invention relates to a method for recognizing a social network sock puppet model based on a frequency sub-tree. The method comprises the following steps: 1) acquiring blog text data; 2) pre-processing the data; 3) utilizing dependency grammar analysis software to perform dependency grammar analysis on the blog text, and acquiring a grammar analysis result for each blog post; 4) adopting a Pre-Order-String method for expressing a dependency grammar tree acquired in the step 3); 5) utilizing the method adopted in the step 4) to acquire the analysis result for each text in someone's blog list; and 6) analyzing two accounts, to be subjected to judgment for sock puppet relation, according to the steps 1)-5), thereby acquiring a frequency dependency grammar tree of two sock puppet accounts. According to the method for recognizing the social network sock puppet model based on the frequency sub-tree, provided by the invention, after a large amount of data training, the method can be applied to the management of the social network for the network safety and the network crime trace of the government, and the sock puppet account can be quickly and effectively recognized.
Owner:BEIJING TECHNOLOGY AND BUSINESS UNIVERSITY
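
Step 4) of the abstract above expresses a dependency tree as a pre-order string. One common encoding in frequent-subtree mining writes each node label on the way down and a backtrack marker on the way up; the sketch below uses that encoding as an assumption, since the abstract does not spell out the exact format. The part-of-speech labels, the `-1` marker, and the function name are likewise illustrative.

```python
def preorder_string(labels, heads, backtrack="-1"):
    """Serialize a head-index tree in pre-order, emitting a marker on each return to the parent."""
    children = {}
    root = None
    for i, h in enumerate(heads):
        if h < 0:
            root = i
        else:
            children.setdefault(h, []).append(i)

    out = []

    def visit(node):
        out.append(labels[node])
        for child in children.get(node, []):
            visit(child)
            out.append(backtrack)  # step back up to `node`

    visit(root)
    return " ".join(out)


# Hypothetical tagged sentence "She reads old books" with hand-assigned heads.
labels = ["PRON", "VERB", "ADJ", "NOUN"]
heads = [1, -1, 3, 1]
encoded = preorder_string(labels, heads)
```

Because the string is built from structure and labels rather than specific words, two blog posts with similar syntactic habits yield strings that share substructures, which is what a frequent-subtree comparison would look for.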

E-commerce comment sentiment analysis model based on part-of-speech features and viewpoint features in combination with convolutional neural network

The invention discloses an e-commerce comment sentiment analysis model based on part-of-speech features and viewpoint features in combination with a convolutional neural network. The method comprises the following steps: step 1, formulating rules by utilizing part-of-speech, dependency grammar analysis and semantic dependency analysis to extract viewpoint features; step 2, introducing part-of-speech features and viewpoint features by adopting a vector splicing method on the basis of word vector representation; and step 3, taking a word vector and an extended feature vector as two input channels of a convolutional neural network to perform sentiment analysis. The problem that comments are inconsistent with scores of the comments is solved, and then merchants are helped to improve the service quality and upgrade the product performance.
Owner:HARBIN UNIV OF COMMERCE

A Semantic Role Labeling and Semantic Extraction Method for Unrestricted Path Natural Language

The invention relates to a semantic role tagging and semantic extracting method of an unrestricted path natural language. The method comprises the steps that firstly Chinese path natural language linguistic data under an unrestricted condition is collected and a Chinese path natural language linguistic database is built; secondly, automatic tagging of the path natural language linguistic data is achieved by using the semantic role tagging method based on chunk analysis and dependency grammar analysis; finally, according to a semantic role tagging result, path unit division is sequentially conducted, and navigational semantic information of path units is extracted. Path natural language semantic role tagging is conducted by using the semantic role tagging method based on chunk analysis and dependency grammar analysis, according to the extracted semantic role tagging result, path unit division is conducted, and finally the semantic information of the path units is extracted. By means of the method, path unit division can be sequentially and accurately conducted, the path semantic information can be accurately extracted, and accordingly the method can provide guidance for smooth implementation of asking for directions and navigation with a robot.
Owner:NORTH CHINA ELECTRIC POWER UNIV (BAODING)

Data classification method for semantic analysis industries

The invention discloses a data classification method for semantic analysis industries. The data classification method includes the steps that voice data of telephone communication with clients is obtained; the voice data is subjected to voice recognition, and corresponding text data is obtained; the text data is preprocessed, and is divided into statements and symbols; an industry data classification library is established; the statements are subjected to dependency grammar analysis, and an industry data classification expression tree is established; based on the industry data classification expression tree, combined with the symbols and correction of the industry data classification library, corresponding industry data classification values are computed.
Owner:杭州声讯网络科技有限公司

A User Comment Attribute Extraction Method Based on Bidirectional Dependency Syntax Tree Representation

The invention discloses a user comment attribute extraction method based on bidirectional dependency syntax tree representation. The method comprises the following steps: 1) the given user comment text is preprocessed, and a dependency syntax tree is generated; 2) a bidirectional dependency syntax tree representation network is established, and dependency features among words are extracted; 3) the dependency features are input into a bidirectional LSTM neural network, and sequence features among the words are extracted on the basis of the dependency features, so that the dependency features are effectively combined with the sequence features; 4) the combined features are encoded using a linear-chain conditional random field; 5) the Viterbi algorithm is used for decoding to obtain the comment attributes of the whole text. The method extracts the syntactic dependency features of the text and combines them efficiently with the sequence features, allowing end-to-end training for user comment attribute extraction tasks. The conditional random field encodes the combined features and the Viterbi algorithm decodes them, which achieves good results in user comment attribute extraction tasks.
Owner:SOUTHWEST JIAOTONG UNIV
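
The decoding in step 5 can be sketched as follows. This is a generic Viterbi decoder over additive (log-space) scores, not the patent's implementation; the label set, the emission scores (standing in for the BiLSTM output) and the transition scores (standing in for the CRF parameters) are made-up values for illustration.

```python
from typing import Dict, List, Sequence

def viterbi(emissions: Sequence[Dict[str, float]],
            transitions: Dict[str, Dict[str, float]],
            labels: Sequence[str]) -> List[str]:
    """Return the highest-scoring label sequence under additive scores."""
    if not emissions:
        return []
    # best[t][y]: best score of any label path that ends in label y at position t
    best = [{y: emissions[0][y] for y in labels}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for y in labels:
            # Pick the previous label that maximises the score of reaching y.
            prev = max(labels, key=lambda p: best[-1][p] + transitions[p][y])
            scores[y] = best[-1][prev] + transitions[prev][y] + emissions[t][y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Hypothetical scores for a three-word comment; "ATTR" marks attribute words.
labels = ["O", "ATTR"]
emissions = [{"O": 0.1, "ATTR": 1.0}, {"O": 0.6, "ATTR": 0.5}, {"O": 1.0, "ATTR": 0.0}]
transitions = {"O": {"O": 0.0, "ATTR": 0.0}, "ATTR": {"O": 0.0, "ATTR": 0.5}}
decoded = viterbi(emissions, transitions, labels)  # ["ATTR", "ATTR", "O"]
```

With these numbers, choosing the best label at each position independently would tag the second word `O`. The transition score from `ATTR` to `ATTR` makes the jointly best sequence keep it as `ATTR`, which is the reason for decoding the whole sequence at once.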

Translation rule extraction method and translation method based on dependency grammar tree

The invention provides a translation rule extraction method and a translation method based on a dependency grammar tree. In each translation rule, the source side is a dependency grammar tree fragment consisting of a head word and its modifiers, and the target side is a string, so the rule directly expresses the reordering relationship and can explicitly guide the translation process. The translation rules extracted by the method improve the performance of the translation method based on the dependency grammar tree. On a 1.54-million-scale parallel bilingual corpus, the dependency-tree-to-string translation model outperforms the constituency-tree-to-string model by 1.68 BLEU (Bilingual Evaluation Understudy) points.
Owner:INST OF COMPUTING TECH CHINESE ACAD OF SCI
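
As a rough illustration of the source side of such a rule, the sketch below lists every head word together with its direct modifiers, given a dependency tree encoded as a list of head indices. The encoding (`-1` for the root) and the function name `head_dependent_fragments` are assumptions made for this example. Pairing each fragment with a target-side string, which requires word alignments, is not shown.

```python
from typing import Dict, List, Tuple

def head_dependent_fragments(words: List[str],
                             heads: List[int]) -> List[Tuple[str, List[str]]]:
    """Return (head word, [modifier words]) for every word that has modifiers."""
    children: Dict[int, List[int]] = {}
    for index, head in enumerate(heads):
        if head >= 0:  # -1 marks the root, which has no head
            children.setdefault(head, []).append(index)
    return [(words[head], [words[d] for d in deps])
            for head, deps in sorted(children.items())]

# Hypothetical parse of "she reads old books": "reads" is the root.
words = ["she", "reads", "old", "books"]
heads = [1, -1, 3, 1]
fragments = head_dependent_fragments(words, heads)
# [("reads", ["she", "books"]), ("books", ["old"])]
```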

Construction method and system of incremental-translation-oriented structured language model

Inactive · CN102945231B · Test set perplexity drops · Perplexity down · Special data processing applications · Discriminant · Algorithm
The invention discloses a construction method and a construction system of an incremental-translation-oriented structured language model. The method comprises the following steps: step 1, performing dependency grammar analysis on incrementally generated translation segments to obtain a dependency tree fragment set; step 2, extracting discriminant feature instances from the dependency tree fragment set, and calculating the feature score of each instance with a discriminant dependency grammar model; step 3, pruning the dependency tree fragment set according to the feature scores, taking the maximum feature score as the score of the structured language model, keeping the highest-scoring fragment in the structured language model, and obtaining the optimized dependency tree fragment set; and step 4, splicing the next translation segment onto the dependency tree fragment set through a shift-reduce operation, and repeating steps 1 to 3 until the translation is finished and the complete dependency tree is generated. The method and system merge grammar information and long-distance dependency information into the language model, provide an effective optimization algorithm for dynamically calculating the structured language model during decoding, and improve translation quality.
Owner:INST OF COMPUTING TECH CHINESE ACAD OF SCI
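
The splicing in step 4 can be illustrated with a minimal sketch, assuming it corresponds to standard shift-reduce (arc-standard) transitions. The action names and the function `shift_reduce` are illustrative; the discriminant scoring and the pruning of steps 2 and 3 are left out.

```python
from typing import List, Tuple

def shift_reduce(words: List[str], actions: List[str]) -> List[Tuple[int, int]]:
    """Apply arc-standard actions and return the dependency arcs as (head, dependent) indices."""
    stack: List[int] = []
    buffer = list(range(len(words)))
    arcs: List[Tuple[int, int]] = []
    for action in actions:
        if action == "SHIFT":      # move the next word onto the stack
            stack.append(buffer.pop(0))
        elif action == "LEFT":     # second-from-top becomes a dependent of the top
            dependent = stack.pop(-2)
            arcs.append((stack[-1], dependent))
        elif action == "RIGHT":    # top becomes a dependent of the second-from-top
            dependent = stack.pop()
            arcs.append((stack[-1], dependent))
        else:
            raise ValueError(f"unknown action: {action}")
    return arcs

# Words arrive one at a time, as translation segments would in incremental decoding.
words = ["she", "reads", "books"]
arcs = shift_reduce(words, ["SHIFT", "SHIFT", "LEFT", "SHIFT", "RIGHT"])
# [(1, 0), (1, 2)]: "reads" heads both "she" and "books"
```

Because each action only touches the top of the stack and the next incoming word, the partial tree can be extended whenever a new segment arrives, without reparsing what has already been built.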

A medical data collection and analysis system

The invention relates to a medical data collection and analysis method comprising the following steps: 1. uploading original data to a data platform; 2. converting the original data into RDF-format data using a semantic annotation algorithm that combines conditional random fields and dependency grammar; 3. using a data mining algorithm to associate the RDF data of the same patient in the data processed in the previous step, and storing it in an HBase database built on a distributed file system; 4. analyzing the data in the HBase database with statistical and machine learning methods and drawing analysis conclusions; 5. organizing and classifying the analysis conclusions and constructing a treatment plan knowledge base. The invention collects the complete clinical diagnosis and treatment data of persons in a targeted manner. By analyzing a large amount of data, it can support clinical auxiliary diagnosis and disease early warning and analyze patient behavior, which helps doctors formulate more effective, accurate, and personalized treatment plans.
Owner:XIEHE HOSPITAL ATTACHED TO TONGJI MEDICAL COLLEGE HUAZHONG SCI & TECH UNIV
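
Step 3, associating the data of the same patient, can be sketched in a simplified form. The example assumes the records are already structured dictionaries and represents RDF statements as plain (subject, predicate, object) tuples. The `patient:` and `ex:` prefixes, the field names and both function names are hypothetical, and the semantic annotation of step 2 and the HBase storage are not modelled.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

def to_triples(record: Dict[str, str]) -> List[Triple]:
    """Turn one record into (subject, predicate, object) triples about its patient."""
    subject = f"patient:{record['patient_id']}"
    return [(subject, f"ex:{key}", str(value))
            for key, value in record.items() if key != "patient_id"]

def group_by_patient(records: List[Dict[str, str]]) -> Dict[str, List[Triple]]:
    """Collect all triples that share the same patient subject."""
    grouped: Dict[str, List[Triple]] = defaultdict(list)
    for record in records:
        for triple in to_triples(record):
            grouped[triple[0]].append(triple)
    return dict(grouped)

# Made-up records for illustration only.
records = [
    {"patient_id": "p1", "diagnosis": "asthma"},
    {"patient_id": "p2", "diagnosis": "influenza"},
    {"patient_id": "p1", "drug": "salbutamol"},
]
by_patient = group_by_patient(records)
```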

Extraction Method of Entity Attributes and Attribute Values Based on Multi-granularity Semantic Blocks

The invention relates to a method for extracting entity attributes and attribute values based on multi-granularity semantic chunks, and belongs to the technical field of Web mining and information extraction. The method comprises the following steps: a corpus is constructed and free text is extracted; the corpus is subjected to word segmentation, part-of-speech tagging and phrase recognition; the corpus is subjected to semantic role labeling; the corpus is subjected to dependency grammar analysis; the corpus is subjected to semantic dependency analysis; candidate (entity, attribute, attribute value) triples are extracted at three granularities: words, phrases and semantic roles; the candidate triples are corrected and wrong candidates are filtered out by a trained classifier. Compared with the prior art, the method automatically extracts (entity, attribute, attribute value) triples at the word, phrase and semantic role granularities from free text, improves the accuracy and efficiency of entity attribute and attribute value extraction, and has wide application prospects in fields such as topic detection, information retrieval, automatic summarization and question answering systems.
Owner:BEIJING INSTITUTE OF TECHNOLOGY