Patents
Literature
Patsnap Copilot is an intelligent assistant for R&D personnel, combined with Patent DNA, to facilitate innovative research.
Patsnap Copilot

1227 results about "Keyword extraction" patented technology

Keyword extraction is tasked with the automatic identification of terms that best describe the subject of a document. Key phrases, key terms, key segments or just keywords are the terminology which is used for defining the terms that represent the most relevant information contained in the document. Although the terminology is different, function is the same: characterization of the topic discussed in a document. The task of keyword extraction is an important problem in Text Mining, Information Retrieval and Natural Language Processing.

News keyword abstraction method based on word frequency and multi-component grammar

A method to extract new keywords based on word frequency and multiple grammars is provided, which belongs to the technology field of a natural language processing, and is characterized by extracting the potential models of part of speech of the multiple grammars of the keywords by researching characteristic part of speech of the keywords and adopting computer to assist excavation and taking the models as the basis of the keywords to extract arithmetic. When extracting the new keywords, firstly excavating the multiple phrases in text in accordance with the potential models of part of speech and extract candidate word set of the keywords, and then excavating potential keywords not loading from titles and add the potential keywords to the candidate keyword set. The application brings forward an improved single text word frequency/inverse text frequency value (tf/idf) format, introduces target-oriented characteristics, grades the candidate keywords, obtains the order of the candidate keywords and gives the keywords of news document after optimizing the results. Compared with the traditional keyword extraction method based on single text word frequency/inverse text frequency value (tf/idf), the method has higher recall rate under the condition of the same precision.
Owner:TSINGHUA UNIV

Method and device for extracting keyword based on graph model

The embodiment of the invention provides a method and a device for extracting a keyword based on a graph model. The method comprises the steps of acquiring a to-be-processed text, and segmenting words of the to-be-processed text to obtain candidate keywords corresponding to the to-be-processed text; finding out word vectors corresponding to the candidate keywords from a word vector model, wherein the word vector model includes the word vectors of the candidate keyword; constructing a word similarity matrix of the candidate keywords according to the word vectors; acquiring a language database corresponding to the to-be-processed text, calculating global information of the candidate keywords in the language database to obtain a global weight of the candidate keywords, and taking the global weight as an initial weight of the candidate keywords, wherein the global information represents the importance degree of the candidate keywords in the language database, and the language database at least includes a search log and a network document; and ranking the candidate keywords according to the initial weight and the word similarity matrix of the candidate keyword, and extracting the keyword of the to-be-processed text. By use of the embodiment, the keyword extraction accuracy rate is effectively improved.
Owner:BEIJING QIYI CENTURY SCI & TECH CO LTD

Deep learning-based text keyword extraction method

The invention discloses a deep learning-based text keyword extraction method. The method comprises the following steps of: firstly training a recurrent neural network model, wherein the used training data comprise a large amount of texts and keywords thereof, and the training target is maximizing text-based condition probability of the keywords; converting each text and the keyword thereof into word vectors, inputting the word vectors into the recurrent neural network model and updating network parameters by using a random gradient descent method; and after the model training is finished, converting a section of text, the keyword of which is to be extracted, into a word vector, inputting the word vector into the trained recurrent neural network model so as to generate the keyword of the section of text. According to the method disclosed by the invention, the extraction of text keywords is realized by learning an end-to-end model through data driving; and compared with the traditional statistics and linguistics-based method, the method disclosed by the invention is stronger in adaptability, and can be used for obtaining different models according to different training data so as to extract keywords according to the requirements of specific fields.
Owner:杭州量知数据科技有限公司

Topic feature text keyword extraction method

The invention discloses a topic feature text keyword extraction method. Through the method, text keyword extraction results better than those of a traditional TF-IDF method can be obtained. Accordingto the technical scheme, at a training stage, word segmentation, stop word removal, part-of-speech filtering and other preprocessing are performed on a training text, statistical analysis is performedon inverse document frequency of words, meanwhile a topic model method is utilized to learn and obtain a topic probability matrix of the words, normalization processing is performed, topic distribution entropy of the words is calculated according to the topic probability matrix of the words, global weights of the words are calculated in combination with the inverse document frequency and the topic distribution entropy, and global weight calculation results are output to a test stage; and after a test text is preprocessed, statistical analysis is performed on normalized term frequency of wordsin the test text, the normalized term frequency is combined with the global weight calculation results obtained at the training stage, comprehensive scores of the words are calculated are ordered, and a plurality of words with the highest scores in the score order are used as automatic keyword extraction results of the current test text.
Owner:10TH RES INST OF CETC
Who we serve
  • R&D Engineer
  • R&D Manager
  • IP Professional
Why Eureka
  • Industry Leading Data Capabilities
  • Powerful AI technology
  • Patent DNA Extraction
Social media
Try Eureka
PatSnap group products