85 results about "Phrase extraction" patented technology

System and method for automatically extracting interesting phrases in a large dynamic corpus

A phrase extraction system combines a dictionary method, a statistical/heuristic approach, and a set of pruning steps to extract frequently occurring and interesting phrases from a corpus. The system finds the “top k” phrases in a corpus, where k is an adjustable parameter. For a time-varying corpus, the system uses historical statistics to extract new and increasingly frequent phrases. The system also finds interesting phrases that occur near a set of user-designated phrases, using the designated phrases as anchor phrases. It can therefore find frequently occurring and interesting phrases in a corpus that changes over time, as in an ongoing, long-term document feed or a continuous, regular web crawl.
Owner:IBM CORP
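
As a rough illustration of the "top k" and historical-statistics ideas in the abstract above, the sketch below counts word bigrams in a historical batch and a current batch and ranks current phrases by how much their frequency has grown. The function names, the smoothed ratio, and the sample documents are all assumptions made for this example; the patent's dictionary method and pruning steps are not modeled.

```python
from collections import Counter

def ngrams(tokens, n):
    """Yield consecutive n-word tuples from a token list."""
    return zip(*(tokens[i:] for i in range(n)))

def count_phrases(docs, n=2):
    """Count n-word phrases over whitespace-tokenized, lowercased documents."""
    counts = Counter()
    for doc in docs:
        counts.update(ngrams(doc.lower().split(), n))
    return counts

def top_k_rising(history, current, k=3, smoothing=1.0):
    """Rank phrases in `current` by smoothed frequency ratio against `history`.

    The ratio is an illustrative choice, not the patent's scoring function.
    """
    ratio = {p: (c + smoothing) / (history.get(p, 0) + smoothing)
             for p, c in current.items()}
    # Sort by descending ratio, breaking ties alphabetically for determinism.
    return sorted(ratio, key=lambda p: (-ratio[p], p))[:k]

history = count_phrases(["stock market news", "stock market report"])
current = count_phrases([
    "quantum computing stock market",
    "quantum computing breakthrough",
    "quantum computing news",
])
rising = top_k_rising(history, current, k=3)
```

Here `k` plays the role of the adjustable parameter: a phrase that was already common in the history ("stock market") ranks below a phrase that is new and frequent in the current batch.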

Phrase extraction using subphrase scoring

An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are extracted from the document collection. Documents are then indexed according to their included phrases, using phrase posting lists. The phrase posting lists are stored in a cluster of index servers. The phrase posting lists can be tiered into groups, and sharded into partitions. Phrases in a query are identified based on possible phrasifications. A query schedule is created from the phrases, and then optimized to reduce query processing and communication costs. The execution of the query schedule is managed to further reduce or eliminate query processing operations at various ones of the index servers.
Owner:GOOGLE LLC

Phrase-based document clustering with automatic phrase extraction

Meaningful phrases are distinguished from chance word sequences statistically, by analyzing a large number of documents and using a statistical metric such as a mutual information metric to distinguish meaningful phrases from groups of words that co-occur by chance. In some embodiments, multiple lists of candidate phrases are maintained to optimize the storage requirement of the phrase-identification algorithm. After phrase identification, a combination of words and meaningful phrases can be used to construct clusters of documents.
Owner:MICRO FOCUS LLC
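
The statistical test described in the abstract above can be sketched with pointwise mutual information (PMI) over adjacent word pairs. This is a minimal illustration under stated assumptions: whitespace tokenization, bigrams only, and a `min_count` cutoff are choices made here, and the patent's candidate-list storage scheme and clustering step are not shown.

```python
import math
from collections import Counter

def pmi_bigrams(documents, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (base 2)."""
    unigrams, bigrams = Counter(), Counter()
    for doc in documents:
        tokens = doc.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    total_uni = sum(unigrams.values())
    total_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), n in bigrams.items():
        if n < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_pair = n / total_bi
        p_w1 = unigrams[w1] / total_uni
        p_w2 = unigrams[w2] / total_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores

docs = [
    "machine translation improves with phrase extraction",
    "phrase extraction helps machine translation",
    "the system improves the translation of the phrase",
]
scores = pmi_bigrams(docs)
```

A positive score means the pair occurs together more often than independent word frequencies would predict, which is the sense in which a phrase is distinguished from a chance word sequence.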

Knowledge network-based text indexing system and method

The invention discloses a knowledge network-based text indexing system and method. The text indexing system comprises a single text feature extraction unit, a multi-text word relation extraction unit, a knowledge tree generating unit, a knowledge tree application unit and a knowledge base storage unit. The text indexing method comprises the following steps of: partitioning words in a text input to the text indexing system, and acquiring text feature words in the text; deducing a class word TAG corresponding to the text according to node positions of a knowledge tree corresponding to the text feature words; and judging the validity of the TAG through a judgment type model based on the TAG, then extracting a reliable TAG word set, and repositioning a text feature word set through the reliable TAG word set to form a reliable text feature word set. According to the system and the method, content word extraction, class labeling and phrase extraction are integrated, so that the extraction effects can be mutually promoted; and the semantics of the words are expressed through the nodes of the knowledge network, so that different meanings are reduced.
Owner:HYLANDA INFORMATION TECH

Machine translation system and machine translation method based on syntactic analysis and hierarchical model

Active | CN102214166A | Solve the problem of non-continuous fixed collocation | Improve translation | Special data processing applications | Collocation | Part of speech
The invention discloses a machine translation system and a machine translation method based on a syntactic analysis and hierarchical model. The machine translation system comprises a word alignment module, a phrase extraction module, a part-of-speech and syntax tagging module, a syntax-based non-contiguous phrase extraction module, a non-contiguous-phrase-based translation module, and a grading output module. In the machine translation system and the machine translation method, syntactic analysis is carried out on the basis of a general contiguous-phrase-based machine translation model, so that a syntax-based phrase rule base is extracted from a bilingual sentence alignment text, the problem of non-continuous fixed collocation of the context of the whole sentence is solved, and the invention accords with the syntactic characteristics of a language. Translation is carried out based on a non-contiguous phrase rule base and a phrase alignment table, and a translation result is graded based on an assessment model, so a translation effect is effectively improved.
Owner:SAMSUNG ELECTRONICS CHINA R&D CENT +1

Public opinion vertical search analysis system and method

The invention relates to network information processing technology, and discloses a public opinion vertical search analysis system. The system for text-based network public opinion search analysis comprises a vertical search engine crawler module, a template-based information extraction module, a text orientation analysis module based on phrase extraction, and a text orientation analysis module based on a vocabulary statistical model. In comparison with the prior art, the accuracy of the information emotion orientation algorithm based on the phrase model and the vocabulary statistical model is improved by about 5%. Meanwhile, the execution efficiency of processing is improved by a multi-threaded design, thereby realizing quicker and more accurate public opinion search analysis.
Owner:TIANJIN UNIV

Systems and methods for semantic keyword analysis

In various embodiments, a method for generating from one or more keywords a list of related topics for organic search includes receiving, by a topic tool, an input of one or more keywords for which to generate a list of related topics. The method may further include acquiring, by a crawler, content from a plurality of different web content sources via one or more networks. The method may also include applying, by the topic tool, to the acquired content an ensemble of one or more key phrase extraction algorithms, one or more graph analyses algorithms and one or more natural language processing algorithms to identify a set of semantically relevant topics scored by relevance. The method may also include generating, by the topic tool, from the set of semantically relevant topics, a knowledge graph of related topics for the input of the one or more keywords. The method may further include outputting, by the topic tool based at least partially on the knowledge graph, an enumerated list of topics ranked by at least a relevance score.
Owner:INFORMITE

Document Key Phrase Extraction Method

A computer-implemented method of extracting key phrases from a document is disclosed comprising the steps of accessing a repository comprising linked subjects, the repository comprising first and second data structures representing the relationship between said subjects using different representation criteria; pruning the first data structure by removing links between subjects based on a further relationship between said subjects in the second data structure; matching phrases in said document to subjects in the pruned first data structure; further pruning the pruned first data structure by removing unmatched subjects that are not linked to matched subjects; determining a ranking for each matched subject; and selecting key phrases using the determined subject rankings. A computer program for implementing the steps of this method when executed on a computer is also disclosed.
Owner:HEWLETT-PACKARD ENTERPRISE DEV LP

Systems and methods for semantic keyword analysis for paid search

Active | US20160125462A1 | Efficient and user-friendly | Web data indexing | Relational databases | Phrase extraction | Natural language
According to various embodiments, a method for generating from one or more keywords a list of recommended keywords for use in paid search advertising includes identifying, via a tool, one or more keywords to be used in a paid search advertising campaign at an identified website. The method may further include acquiring, by a crawler, content from a plurality of different web content sources via one or more networks. The method may also include applying, by the tool, to the acquired content an ensemble of one or more key phrase extraction algorithms, one or more graph analyses algorithms and one or more natural language processing algorithms to identify a set of semantically relevant keywords ranked by a relevance score. The method may further include generating, by the tool from the set of semantically relevant keywords, a knowledge graph of recommended keywords to replace or supplement the one or more keywords. The method may further include outputting, by the tool based at least partially on the knowledge graph, an enumerated list of recommended keywords to one of replace or supplement the one or more keywords to be used for the paid search advertising campaign at the identified website.
Owner:INFORMITE

Method for obtaining question and answer pairs from unstructured text based on deep learning

The invention relates to a method for obtaining question and answer pairs from an unstructured text based on deep learning. The method comprises the following steps: performing text normalization processing; carrying out sentence classification and pairing and key phrase extraction based on a deep neural network model; obtaining question and answer pairs in the text; crawling question and answer pairs outside the text; and summarizing and de-duplicating the question and answer pairs. To address the difficulty of obtaining question and answer pairs, the method obtains them automatically and efficiently at scale by making effective use of easily obtained unstructured document resources together with the deep neural network model, with the results available for manual proofreading and supplementation, so that the knowledge base construction cost is reduced and the knowledge base construction speed is increased.
Owner:北京中科汇联科技股份有限公司

Keyphrase extraction beyond language modeling

A system for extracting a key phrase from a document includes a neural key phrase extraction model (“BLING-KPE”) having a first layer to extract a word sequence from the document, a second layer to represent each word in the word sequence by ELMo embedding, position embedding, and visual features, and a third layer to concatenate the ELMo embedding, the position embedding, and the visual features to produce hybrid word embeddings. A convolutional transformer models the hybrid word embeddings to n-gram embeddings, and a feedforward layer converts the n-gram embeddings into a probability distribution over a set of n-grams and calculates a key phrase score of each n-gram. The neural key phrase extraction model is trained on annotated data based on a labeled loss function to compute cross entropy loss of the key phrase score of each n-gram as compared with a label from the annotated dataset.
Owner:MICROSOFT TECH LICENSING LLC

Generalized reordering statistic translation method and device based on non-continuous phrase

The invention provides a generalized reordering statistical translation method and device based on non-continuous phrases. The device consists of a word alignment module, a language model module, a phrase extraction module, a maximum entropy classifier training module, a minimum error training module and a decoder. It provides a generalized reordering model for phrase-based statistical machine translation, introduces non-continuous phrases, and combines continuous and non-continuous phrases by using rules for any continuous string in a specified text to be translated so as to acquire as many continuous target translations as possible. It also combines the reordering model with a reordering sub-model to realize local and global reordering of the phrases, so as to acquire final target translations for sentences in the source language. The model can capture local and global reordering knowledge of the phrases, and gains generalization capability through non-continuous phrases. Experimental results show that the model improves the BLEU score over the maximum-entropy-based reordering model and the hierarchical-phrase-based translation model by about 1.54 percent and 0.66 percent, respectively.
Owner:INST OF AUTOMATION CHINESE ACAD OF SCI

Chinese phrase string-based fine-grained thematic information extraction method

The invention discloses a Chinese phrase string-based fine-grained thematic information extraction method. The method comprises the following steps: first, pre-processing an input original text set by Chinese word segmentation, stop word processing and part-of-speech tagging; during the pre-processing, supplying an expanded vocabulary so as to improve the correctness of Chinese word segmentation; after the pre-processing stage is finished, obtaining a processed structured text set; carrying out part-of-speech-based regular expression matching so as to obtain a preliminary phrase screening result; and collecting string frequency statistics for each word, selecting seed words, and expanding the phrases to finally obtain a phrase extraction result. Experiments show that the method extracts text phrases effectively and concisely, and has a degree of reliability and applicability.
Owner:SOUTH CHINA UNIV OF TECH
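
The part-of-speech-based regular expression matching step in the abstract above can be sketched by encoding each tag as one letter and running an ordinary regex over the resulting string. Everything specific here is an assumption for illustration: the tag set, the one-letter codes, the pattern "zero or more adjectives followed by one or more nouns", and the hand-tagged sample tokens. Word segmentation, tagging, seed word selection and phrase expansion are not shown.

```python
import re

# Hypothetical one-letter codes for POS tags; any other tag maps to "x".
TAG_CODES = {"ADJ": "a", "NOUN": "n", "VERB": "v"}
# Illustrative pattern: optional adjectives followed by at least one noun.
CANDIDATE = re.compile(r"a*n+")

def candidate_phrases(tagged_tokens, min_words=2):
    """Return token runs whose tag sequence matches CANDIDATE."""
    codes = "".join(TAG_CODES.get(tag, "x") for _, tag in tagged_tokens)
    phrases = []
    for m in CANDIDATE.finditer(codes):
        # One code per token, so match offsets are also token offsets.
        if m.end() - m.start() >= min_words:
            words = [w for w, _ in tagged_tokens[m.start():m.end()]]
            phrases.append("".join(words))  # Chinese text: join without spaces
    return phrases

# Hand-tagged sample: "fine-grained / topic / information / (particle) / extract / method"
tagged = [("细粒度", "ADJ"), ("主题", "NOUN"), ("信息", "NOUN"),
          ("的", "PART"), ("抽取", "VERB"), ("方法", "NOUN")]
result = candidate_phrases(tagged)
```

Encoding tags as single characters keeps match offsets aligned with token positions, which is why the matched span can be used directly to slice the token list.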

Method and system for key phrase extraction and generation from text

A system and method combining supervised and unsupervised natural language processing to extract keywords from text in natural language processing, the method includes receiving, through a processor, one or more entities through an input processing unit and converting the one or more entities into a standard document object. Further, parsing the standard document object through a text processing engine into one or more of a sentence and a token and selecting through a candidate identification engine one or more right candidates to be ranked. Further, assigning one or more scores to the one or more right candidates, ranking the one or more right candidates through a graph based ranking engine, creating a connected graph between the ranked one or more right candidates and assigning, through a phrase embedding engine, an edge weight to one or more edges between a right candidate and another right candidate.
Owner:INFOSYS LTD

Topic phrase extraction method

The invention relates to a topic phrase extraction method. The topic phrase extraction method includes preprocessing documents, seeking a document-topic set, a full text lexical chain set and a noun phrase set, seeking a central word set, seeking a candidate topic phrase set, and seeking a topic phrase set. The topic phrase extraction method has the advantages that topic phrases are extracted through a combination of an LDA (latent Dirichlet allocation) model and a lexical chain, a knowledge base WordNet with complete semantic information outside a corpus can be utilized, and a strong lexical chain can be acquired through semantic relevance calculation and strong chain rule filtration, so that the ambiguity of topic words is reduced greatly. The topic phrases are extracted according to a central word extraction method and by N-P rule combination and deduplication steps, and topics are expressed by topic phrases with rich semantic information, so that problems such as low granularity and low recognition degree of the topic words are solved, topic extraction accuracy and recall rate can be guaranteed, topic drifting is reduced, and the needs of practical applications can be well met.
Owner:BEIJING INFORMATION SCI & TECH UNIV

Method for extracting phrases of statistical machine translation

The invention provides a method for extracting phrases of statistical machine translation, which comprises the following steps of: 1) acquiring a plurality of aligned sentence pair combinations from a bilingual language material from two directions, and calculating the priori probability of the plurality of aligned sentence pair combinations; 2) calculating the alignment probability of word pairs according to the sum of the priori probabilities of the word pairs of the plurality of aligned sentence pair combinations, and forming an alignment matrix by using the alignment probability of the word pairs; 3) calculating the frequency of phrase alignment according to the alignment matrix; and 4) calculating the relative frequency and the lexicalization probability of the phrase alignment according to the frequency of the phrase alignment. The method can effectively express all probable aligned phrase combinations, and improves the quality of phrase extraction, thereby being capable of improving the quality of translation which is performed according to the extracted phrases.
Owner:INST OF COMPUTING TECH CHINESE ACAD OF SCI

System and method for automatic key phrase extraction rule generation

A method, system, and non-transitory processor-readable storage medium for automatic key phrase rule generation for automatic key phrase extraction including: receiving a corpus sample including a plurality of documents containing text, receiving a plurality of identified key phrases which relate to a topic of the text of at least one corresponding document; assigning a part-of-speech to each word in the corpus sample; generating a part-of-speech pattern from each identified key phrase; and generating key phrase rules.
Owner:NICE LTD
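
The rule generation steps in the abstract above (tag each word, derive a part-of-speech pattern from each identified key phrase, emit rules) can be sketched as follows. The input is assumed to be already tagged, and the tag names, the `min_support` cutoff, and both function names are hypothetical choices for this example rather than anything stated in the patent.

```python
from collections import Counter

def learn_pos_patterns(tagged_docs, key_phrases, min_support=1):
    """Collect the POS tag sequences of known key phrases as extraction rules.

    tagged_docs: list of documents, each a list of (word, tag) pairs.
    key_phrases: list of word tuples identified as key phrases.
    """
    patterns = Counter()
    for doc in tagged_docs:
        words = [w for w, _ in doc]
        tags = [t for _, t in doc]
        for phrase in key_phrases:
            n = len(phrase)
            for i in range(len(words) - n + 1):
                if tuple(words[i:i + n]) == phrase:
                    patterns[tuple(tags[i:i + n])] += 1
    return {p for p, c in patterns.items() if c >= min_support}

def apply_patterns(tagged_doc, patterns):
    """Extract every word span whose tag sequence matches a learned pattern."""
    tags = [t for _, t in tagged_doc]
    found = []
    for pattern in patterns:
        n = len(pattern)
        for i in range(len(tags) - n + 1):
            if tuple(tags[i:i + n]) == pattern:
                found.append(" ".join(w for w, _ in tagged_doc[i:i + n]))
    return found

train = [[("reset", "VERB"), ("my", "PRON"), ("account", "NOUN"), ("password", "NOUN")]]
rules = learn_pos_patterns(train, [("account", "password")])
new_doc = [("update", "VERB"), ("the", "DET"), ("billing", "NOUN"), ("address", "NOUN")]
extracted = apply_patterns(new_doc, rules)
```

The learned rule generalizes from the words of the sample key phrase to its tag sequence, so it can match phrases in new documents that share no vocabulary with the training sample.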

Method and device for extracting hot word phrases from document set

The invention discloses a method and a device for extracting hot word phrases from a document set. The method comprises performing word segmentation on every clause in the document set through a word segmentation unit; judging, through a judgment unit, the phrase boundary distinctness degree and/or the closeness degree of the relation between words in every phrase which is formed by less than K continuous words in every clause, wherein K is a positive integer and the boundary distinctness degree indicates the degree of freedom with which a phrase combines with the words located on its left and right sides; and extracting, through a hot word phrase extraction unit, at least a part of the phrases formed by less than K continuous words, based on the judgment result of the phrase boundary distinctness degree and/or the closeness degree of the relation between the words in every phrase, to serve as the hot word phrases to be output. Compared with the prior art, the hot word phrases can be accurately extracted from various corpuses.
Owner:TSINGHUA UNIV
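
One common way to quantify how freely a phrase combines with its left and right neighbors is branching entropy, and the sketch below uses it as a stand-in for the boundary distinctness degree in the abstract above. That substitution is an assumption: the abstract does not say which measure the patent uses. The clauses are assumed to be segmented already, `max_len` stands in for the bound on phrase length, and the closeness degree is not modeled.

```python
import math
from collections import Counter, defaultdict

def entropy(counter):
    """Shannon entropy (bits) of a neighbor-count distribution."""
    total = sum(counter.values())
    return sum((c / total) * math.log2(total / c) for c in counter.values())

def boundary_entropy(clauses, max_len=2):
    """Map each phrase of up to max_len words to min(left, right) neighbor entropy."""
    left, right = defaultdict(Counter), defaultdict(Counter)
    for tokens in clauses:
        padded = ["<s>"] + tokens + ["</s>"]  # clause boundaries count as neighbors
        for n in range(1, max_len + 1):
            for i in range(1, len(padded) - n):
                phrase = tuple(padded[i:i + n])
                left[phrase][padded[i - 1]] += 1
                right[phrase][padded[i + n]] += 1
    return {p: min(entropy(left[p]), entropy(right[p])) for p in left}

clauses = [
    ["hot", "word", "list"],
    ["new", "hot", "word"],
    ["hot", "word", "phrases"],
    ["a", "hot", "topic"],
]
scores = boundary_entropy(clauses)
```

A phrase that appears with varied neighbors on both sides ("hot word") scores higher than a fragment that always sits in the same context ("word list"), so a threshold on this score can serve as one filter for candidate hot word phrases.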

Syntactic analysis and hierarchical phrase model based machine translation system and method

A syntactic analysis and hierarchical phrase model based machine translation system and method are provided. The machine translation system includes a word alignment module, a phrase extraction module, a gender syntactic annotation module, a syntactic based noncontiguous phrase abstract module, a noncontiguous phrase based translation module and an evaluation module. The machine translation system and method perform syntactic analysis based on a common contiguous phrase based machine translation model, and extract a syntactic based noncontiguous phrase rule set from a bilingual aligned text, so as to address an issue of noncontiguous fixed custom in context of a whole sentence and to comply with syntactic features of a language. Translation is performed based on the noncontiguous phrase rule set and the phrase aligned table, and the translation results are evaluated based on the evaluation model, thereby improving the translation result.
Owner:SAMSUNG ELECTRONICS CO LTD +1

Method for obtaining extract from several frequently seen plants and uses of the extract

The invention relates to a method for preparing a sequoyitol-containing extract from several kinds of plants and an application of the extract in preventing and treating diabetes. In the method, plants of trifolium, plants of leguminosae and plants of ginkgoaceae are used as raw materials and are extracted with methanol, alcohol or acetone to give extracts. The extracts undergo two-phase extraction with water and chloroform, ethyl acetate, methylene chloride or n-butyl alcohol. The solvent is reclaimed, and the water part is separated and purified by macroporous resin to obtain the extract, which is a white powder. The main ingredient of the extract is sequoyitol with a content of between 10 and 99.9 percent. The extract obtained by the method, and drugs, functional food or health care food made from the extract, can be used for the treatment of diabetes and complications of diabetes.
Owner:XIANGBEI WELMAN PHARMA CO LTD

Recommendation model based on double-layer self-attention comment modeling

The invention discloses a recommendation model based on double-layer self-attention comment modeling. The model comprises a user portrait module, an article portrait module and an interaction module. The user portrait module and the article portrait module have the same structure. First, related words separated by any distance in a sentence are flexibly combined by introducing self-attention in a phrase extraction layer, forming article feature phrases and emotion phrases. Then, article feature phrases are associated with emotion phrases by using self-attention in the phrase association layer to obtain the emotion polarity of a user toward each article feature, which is used to construct a user-article portrait. Finally, the model is verified experimentally on six data sets from Amazon 5-core. According to the method, the self-attention network is introduced into comment modeling for the recommendation system, the emotional polarity of the user toward the "article features" is considered under the deep learning framework, the problems of noise and context loss caused by CNN-based phrase extraction are relieved, the user-article portrait is modeled in a fine-grained way, and the recommendation performance is improved.
Owner:EAST CHINA NORMAL UNIV

Key phrase extraction method and device

The invention provides a key phrase extraction method and device and relates to the field of text processing technology. According to the key phrase extraction method and device, when a key phrase is determined, co-occurrence information of a word pair can be determined, and the key phrase in a text can be determined according to the co-occurrence information of the word pair. The co-occurrence information can represent the relation between the segmented words forming the word pair, and the corresponding phrase mostly has the characteristics of a set phrase or proper noun. By using the co-occurrence information as a basis for determining the key phrase, the accuracy and precision of key phrase extraction can be improved.
Owner:BEIJING QIYI CENTURY SCI & TECH CO LTD

Title high-frequency segmentation-based news hotspot phrase extraction method

The invention provides a title high-frequency segmentation-based news hotspot phrase extraction method. The method comprises the following steps of extracting a news title for each hotspot topic class; performing word segmentation on the news title, performing statistics on a word frequency of each segmented word, and screening out first n segmented words with the highest word frequencies to serve as a high-frequency word set; searching for a high-frequency segmentation boundary of the news title by using the high-frequency word set, and according to the segmentation boundary, performing segmentation on the news title to obtain candidate phrases; obtaining a candidate phrase set; and performing evaluation on each candidate phrase in the candidate phrase set, and performing screening to obtain the candidate phrases with the highest evaluation indexes, thereby serving as optimal phrases. The method has the advantages that one hotspot phrase which describes topic contents accurately in a simplified way can be extracted for each hotspot topic; a solution is provided for quick summarization and effective display of the hotspot topic contents of current news; and the information display efficiency and the information obtaining efficiency of a user are improved.
Owner:贵州耕云科技有限公司

Method and device for extracting phrases in corpus text, storage medium and electronic equipment

The invention relates to a method and device for extracting phrases in a corpus text, a storage medium and electronic equipment, which belong to the technical field of big data. The method comprises the steps of performing word segmentation on the corpus text to obtain a plurality of words forming the corpus text; performing part-of-speech tagging on the words to obtain part-of-speech tags of the words; utilizing the part-of-speech tags to determine a word combination meeting a preset part-of-speech dependency rule in the plurality of words; inputting the word combination into a pre-trained language model to obtain a word forming probability corresponding to the word combination; and determining the word combination corresponding to the word formation probability greater than a predetermined threshold as the extracted first phrase. The preset part-of-speech dependency rule can be obtained from a rule sharing block chain. According to the invention, the phrase extraction reliability in the corpus text is effectively improved.
Owner:CHINA PING AN LIFE INSURANCE CO LTD

Future technology projection supporting apparatus, method, program and method for providing a future technology projection supporting service

A technology projection supporting apparatus includes a describing section extracting unit and steps for extracting a problem describing section and an effect describing section from each technical document, a technical phrase extraction unit and steps to extract a technical phrase, which indicates a matter to be achieved by a technology, from each of the problem describing section and the effect describing section, an impact determination unit and steps for determining a business impact to be made by the matter indicated by the extracted technical phrase, a naming unit and steps for naming the extracted technical phrase; and a technology map generation unit for generating a technology map. The generated technology map has axes indicating time length to be required to implement technology and business impact.
Owner:IBM CORP

Dictionary creation device for monitoring text information, dictionary creation method for monitoring text information, and dictionary creation program for monitoring text information

The purpose of the present invention is to create a dictionary for monitoring text information such that it is possible to achieve higher-precision detection than the prior art. A feature degree calculation unit (3) compares the statistics of a positive example group and a negative example group, and calculates the degree by which a given phrase appears in the positive example group as the feature degree. A usefulness degree calculation unit (21) calculates a usefulness degree by using indices pertaining to the length of the phrase, the frequency at which the phrase appears within the positive example group, and the inclusion relationship between phrases for each phrase extracted by means of a phrase extraction unit (1). A detection condition determination unit (22) uses the usefulness degree calculated by means of the usefulness degree calculation unit (21) and the feature degree calculated by means of the feature degree calculation unit (3) to evaluate the appropriateness of each phrase as a detection condition by means of the product of the usefulness degree and the feature degree, and determines that the phrase is appropriate as a detection condition when the value of the product is greater than a threshold value.
Owner:NEC CORP

Device, program and method for assisting in preparing email

In an electronic mail creation support device, human relationship information on human relationship with a partner of the electronic mail is automatically created. For this, phrase extraction means (12) extracts a phrase in the electronic mail accumulated in a transmission / reception mail accumulation section (11). Phrase use frequency calculation means (13) calculates the use frequency of the extracted phrase and adds it to the use frequency of the phrase accumulated in a phrase use frequency accumulation section (14). Human relationship information creation means (16) weights, for each of the communication partners, the human relationship basic information correlated in advance to a phrase in a phrase correspondence list (15) by the use frequency of the phrase associated with the communication partner and accumulated in the phrase use frequency accumulation section (14) and creates human relationship information according to the weighted result. The human relationship information thus created is accumulated in a human relationship information accumulation section (17) for each of the communication partners.
Owner:PANASONIC CORP

Phrase extraction method and device, electronic equipment and storage medium

Pending | CN110532567A | Efficient Phrase Extraction Processing | Dependency Acquisition | Text database querying | Special data processing applications | Part of speech | Phrase extraction
The invention discloses a phrase extraction method and a device, electronic equipment and a storage medium, and relates to the technical field of big data. The method comprises the steps of: segmenting a corpus text to obtain short sentences; extracting candidate phrases according to dependency relationships and part-of-speech among the words in the short sentences; and if the phrases meet the preset conditions, storing the candidate phrases into a phrase matching corpus. Therefore, the phrase matching mode conforming to the part-of-speech combination can be determined according to the dependency relationship and the part-of-speech among the words in the sentence, and the phrase extraction efficiency and accuracy of the corpus text are improved.
Owner:BEIJING BAIDU NETCOM SCI & TECH CO LTD