797 results about "Lexical frequency" patented technology

Description: Lexical usage frequency is known to influence the application rate of some variable processes. Specifically, variable lenition processes typically affect frequent lexical items more often than infrequent ones. For instance, variable t/d-deletion in English is more likely to apply to a frequent word (just)...

News keyword abstraction method based on word frequency and multi-component grammar

A method for extracting news keywords based on word frequency and multi-component grammar is provided, belonging to the field of natural language processing. By studying the characteristic parts of speech of keywords and using computer-assisted mining, the method extracts the latent part-of-speech patterns of multi-word keywords and uses these patterns as the basis of the keyword extraction algorithm. When extracting keywords, the method first mines multi-word phrases in the text that match the latent part-of-speech patterns to build the candidate keyword set, then mines potential unregistered (out-of-vocabulary) keywords from titles and adds them to the candidate set. The application proposes an improved term frequency/inverse document frequency (tf/idf) formula, introduces target-oriented features, scores and ranks the candidate keywords, and outputs the keywords of the news document after refining the results. Compared with the traditional tf/idf-based keyword extraction method, the method achieves higher recall at the same precision.
Owner:TSINGHUA UNIV
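
For orientation, the traditional tf/idf baseline that the abstract above compares against can be sketched as follows. This is a minimal illustration only: it does not reproduce the patent's improved formula, its part-of-speech patterns, or its target-oriented features, and the smoothed idf variant, the function name `tfidf_keywords`, and the toy corpus are all assumptions made for the example.

```python
import math
from collections import Counter

def tfidf_keywords(doc_tokens, corpus, top_k=3):
    """Rank the words of one tokenized document by plain tf-idf (baseline only)."""
    n_docs = len(corpus)
    # Document frequency: number of corpus documents that contain each word.
    df = Counter()
    for tokens in corpus:
        df.update(set(tokens))
    tf = Counter(doc_tokens)
    scores = {}
    for word, count in tf.items():
        # Smoothed idf (an assumed variant) stays finite for unseen words.
        idf = math.log((1 + n_docs) / (1 + df[word])) + 1.0
        scores[word] = (count / len(doc_tokens)) * idf
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]

# Hypothetical, already-tokenized toy corpus.
corpus = [
    ["market", "stocks", "fall", "today"],
    ["team", "wins", "match", "today"],
    ["stocks", "rally", "as", "market", "recovers"],
]
doc = ["stocks", "rally", "rally", "today"]
print(tfidf_keywords(doc, corpus, top_k=2))
```

In this toy run, "rally" ranks first because it is both frequent in the document and rare in the corpus.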

Method for loading word stock, method for inputting character and input method system

The invention provides a character input method comprising the following steps: loading the system thesauruses; acquiring information about the user's current input environment; matching and acquiring the auxiliary thesauruses that correspond to that environment; loading those auxiliary thesauruses; receiving the user's input; searching the loaded system and auxiliary thesauruses for candidates that match the input; receiving the user's selection; and outputting the selected candidates. The method detects the user's current input environment or input content by various means to determine the user's current needs accurately, then selectively loads thesauruses from a plurality of auxiliary thesauruses, thereby meeting the user's changing needs. The method overcomes the prior-art problem that the frequencies of new words cannot be adjusted, requires no manual configuration by the user, and significantly improves input efficiency.
Owner:BEIJING SOGOU TECHNOLOGY DEVELOPMENT CO LTD
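
The load-then-search flow described in the abstract above might look roughly like the sketch below. The thesaurus contents, the environment labels, the prefix-matching rule, and the choice to keep the higher of two frequencies are all hypothetical; the abstract does not specify any of them.

```python
# Hypothetical word -> frequency tables; real thesauruses would be loaded from disk.
SYSTEM_THESAURUS = {"hello": 50, "help": 40, "helium": 5}
AUXILIARY_THESAURUSES = {
    "chemistry": {"helium": 80, "helicase": 30},
    "chat": {"hehe": 60},
}

def load_thesauruses(environment):
    """Merge the system thesaurus with the auxiliary one matching the environment."""
    merged = dict(SYSTEM_THESAURUS)
    for word, freq in AUXILIARY_THESAURUSES.get(environment, {}).items():
        # Assumption: when a word appears in both, keep the higher frequency.
        merged[word] = max(freq, merged.get(word, 0))
    return merged

def candidates(prefix, thesaurus, limit=3):
    """Return up to `limit` words starting with `prefix`, most frequent first."""
    matches = [w for w in thesaurus if w.startswith(prefix)]
    return sorted(matches, key=lambda w: (-thesaurus[w], w))[:limit]

print(candidates("hel", load_thesauruses("chemistry")))
print(candidates("hel", load_thesauruses("email")))
```

With the "chemistry" environment loaded, "helium" moves ahead of "hello" without the user adjusting any word frequency by hand.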

Text similarity measuring system based on multi-feature fusion

The invention provides a text similarity measuring system based on multi-feature fusion and relates to the field of intelligent information processing. The system measures text similarity by fusing multiple features based on word frequencies, word vectors and Wikipedia labels. The invention aims to solve the semantic loss that conventional text similarity measuring systems suffer when they ignore context, and the low accuracy of similarity results when the compared texts differ greatly in length. The system is implemented by the following steps: preprocessing the training text, including word segmentation and stop word removal; training a word vector model on the processed training corpus; and, for each input text pair, computing the similarity based on word frequencies, the similarity based on word vectors and the similarity based on Wikipedia labels, then taking their weighted sum as the final semantic similarity. The system improves the accuracy of text similarity measurement and thus meets the requirements of intelligent information processing.
Owner:XINJIANG TECHN INST OF PHYSICS & CHEM CHINESE ACAD OF SCI
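
The final fusion step in the abstract above is a weighted sum of three similarity scores. The sketch below computes only the word-frequency similarity (as a cosine over term counts, which is an assumption) and takes the word-vector and Wikipedia-label similarities as given numbers; the weights and function names are illustrative, not values from the patent.

```python
import math
from collections import Counter

def tf_cosine(tokens_a, tokens_b):
    """Cosine similarity between the term-frequency vectors of two token lists."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def fused_similarity(scores, weights):
    """Weighted sum of per-feature similarities, normalized by the total weight."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

sim_tf = tf_cosine(["text", "similarity", "system"], ["text", "similarity", "measure"])
# Hypothetical placeholders for the word-vector and Wikipedia-label similarities.
sim_vec, sim_wiki = 0.8, 0.6
print(fused_similarity([sim_tf, sim_vec, sim_wiki], weights=[1.0, 2.0, 1.0]))
```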

Chinese text emotion recognition method

The invention discloses a Chinese text emotion recognition method comprising the steps of (1) building a commendatory-derogatory-term (positive/negative sentiment) dictionary, a degree-term dictionary and a privative-term (negation) dictionary; (2) segmenting the sentences of the Chinese text to be processed into terms, and obtaining the dependency relationships and frequencies of the terms; (3) selecting subject terms according to term frequency, and marking the sentences that contain subject terms as subject sentences; (4) judging whether each term in a subject sentence exists in the commendatory-derogatory-term dictionary and determining its initial emotion value, identifying the degree terms and privative terms that modify it from the dependency relationships, determining the term's weight from the values of its degree terms in the degree-term dictionary, determining its polarity from the number of privative terms, obtaining the term's emotion value, and summing the emotion values of all terms in the subject sentence to obtain the emotion value of that sentence; and (5) summing the emotion values of all the sentences in the text to obtain the emotion state of the text. The method greatly improves the accuracy of text emotion recognition.
Owner:COMP NETWORK INFORMATION CENT CHINESE ACADEMY OF SCI
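
Step (4) of the abstract above can be illustrated with a small scoring sketch. It assumes that the dependency parse has already attached each term's modifiers, that degree weights multiply, and that an odd number of privative (negation) terms flips the polarity; the dictionaries below are hypothetical English stand-ins for the Chinese dictionaries the method builds.

```python
# Hypothetical miniature dictionaries (the method itself targets Chinese text).
POLARITY = {"good": 1.0, "bad": -1.0}   # commendatory / derogatory terms
DEGREE = {"very": 2.0, "slightly": 0.5}  # degree terms and their weights
NEGATION = {"not", "never"}              # privative terms

def term_emotion(term, modifiers):
    """Emotion value of one term given the modifiers attached to it."""
    base = POLARITY.get(term, 0.0)
    weight = 1.0
    negations = 0
    for modifier in modifiers:
        if modifier in DEGREE:
            weight *= DEGREE[modifier]
        elif modifier in NEGATION:
            negations += 1
    # Assumption: an odd number of negations flips the sign, an even number cancels out.
    sign = -1.0 if negations % 2 else 1.0
    return sign * weight * base

def sentence_emotion(parsed_terms):
    """Sum term emotion values; `parsed_terms` is a list of (term, modifiers) pairs."""
    return sum(term_emotion(term, mods) for term, mods in parsed_terms)

print(sentence_emotion([("good", ["not", "very"]), ("bad", ["slightly"])]))
```

Summing `sentence_emotion` over every sentence of a text would then give the text-level value described in step (5).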

Distributed semantic and sentence meaning characteristic fusion-based character relation extraction method

The invention relates to a character relation extraction method based on the fusion of distributed semantic and sentence meaning characteristics, and belongs to the field of natural language processing. The method comprises the steps of first training on a small amount of labeled corpora and a large amount of unlabeled corpora, using statistical word-frequency features and a Bootstrapping algorithm, to obtain a relational feature dictionary; second, constructing triple instances of each statement through an element distance optimization rule, and constructing a triple feature space by fusing distributed semantic information and semantic information; and finally making a true-false binary decision on each triple and obtaining the character relation type by a confidence degree maximization rule. The method generates the relational feature dictionary automatically; it converts the conventional multi-class relation problem into a true-false binary decision problem on triples, which suits conventional machine learning classification algorithms better; and by using distributed semantic information it improves the accuracy of relation classification.
Owner:BEIJING INSTITUTE OF TECHNOLOGY

Method for filtering Chinese junk mail based on Logistic regression

The invention discloses a method for filtering Chinese junk e-mail based on logistic regression. The method comprises the following steps: first, parsing the e-mails and extracting the e-mail titles, e-mail bodies and attachment-related information; second, segmenting the extracted text into words; third, counting the frequencies of the terms in each e-mail, calculating word weights with the TF-IDF formula, and representing each e-mail as a weighted feature vector; fourth, using the LIBLINEAR toolkit to train on the e-mail samples and obtain a logistic regression model; fifth, using the logistic regression model to classify new e-mails and obtain the probability that each e-mail is junk. The logistic regression model is simple, has few parameters, and achieves high classification accuracy on data sets in which both the number of texts and the number of features are large. Dimension reduction and an improved feature value calculation method improve the accuracy and efficiency of junk e-mail filtering, and the problem of choosing model training parameters in junk e-mail filtering is effectively solved.
Owner:ZHEJIANG UNIV
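
Steps three to five of the abstract above can be sketched end to end in plain Python. The patent trains with the LIBLINEAR toolkit; the stochastic-gradient trainer below is only a stand-in so that the example is self-contained, and the smoothed idf variant, learning rate, epoch count, and toy e-mails are all assumptions.

```python
import math
from collections import Counter

def fit_idf(docs):
    """Smoothed inverse document frequency for every word seen in training."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    n = len(docs)
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}

def vectorize(tokens, idf):
    """Sparse tf-idf vector; words absent from the training vocabulary are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    return {w: (c / len(tokens)) * idf[w] for w, c in tf.items()}

def predict(weights, bias, vec):
    """Probability that the e-mail is junk under the logistic model."""
    z = bias + sum(weights.get(w, 0.0) * x for w, x in vec.items())
    return 1.0 / (1.0 + math.exp(-z))

def train(vectors, labels, epochs=200, lr=0.5):
    """Plain stochastic gradient descent on the log loss (stand-in for LIBLINEAR)."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            error = predict(weights, bias, vec) - y
            for w, x in vec.items():
                weights[w] = weights.get(w, 0.0) - lr * error * x
            bias -= lr * error
    return weights, bias

# Hypothetical, already-segmented training e-mails: 1 = junk, 0 = legitimate.
emails = [["free", "prize", "click"], ["free", "money", "click"],
          ["meeting", "schedule", "monday"], ["project", "schedule", "report"]]
labels = [1, 1, 0, 0]
idf = fit_idf(emails)
weights, bias = train([vectorize(e, idf) for e in emails], labels)
p_junk = predict(weights, bias, vectorize(["free", "click", "now"], idf))
p_ham = predict(weights, bias, vectorize(["schedule", "report"], idf))
print(p_junk > 0.5, p_ham < 0.5)
```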

Topic feature text keyword extraction method

The invention discloses a topic feature text keyword extraction method that obtains better keyword extraction results than the traditional TF-IDF method. According to the technical scheme, at the training stage the training text is preprocessed by word segmentation, stop word removal, part-of-speech filtering and similar steps; the inverse document frequency of each word is computed; a topic model is used to learn a topic probability matrix of the words, which is then normalized; the topic distribution entropy of each word is calculated from the topic probability matrix; the global weight of each word is calculated by combining its inverse document frequency with its topic distribution entropy; and the global weights are passed to the test stage. At the test stage, after the test text is preprocessed, the normalized term frequency of each word in the test text is computed and combined with the global weights obtained at the training stage; the comprehensive scores of the words are calculated and ordered; and the highest-scoring words are output as the automatic keyword extraction results for the current test text.
Owner:10TH RES INST OF CETC
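
The abstract above does not state how inverse document frequency and topic distribution entropy are combined, so the sketch below uses one plausible rule purely for illustration: a word whose probability mass is concentrated in few topics (low entropy) keeps more of its idf. The combination formula, the function names, and the assumption of at least two topics are all hypothetical.

```python
import math

def topic_entropy(topic_probs):
    """Shannon entropy of a word's (re-normalized) distribution over topics."""
    total = sum(topic_probs)
    probs = [p / total for p in topic_probs]
    return -sum(p * math.log(p) for p in probs if p > 0)

def global_weight(idf, topic_probs):
    """Assumed combination: idf scaled by (1 - entropy / maximum entropy)."""
    max_entropy = math.log(len(topic_probs))  # requires at least two topics
    return idf * (1.0 - topic_entropy(topic_probs) / max_entropy)

def keyword_score(normalized_tf, idf, topic_probs):
    """Test-stage score: normalized term frequency times the global weight."""
    return normalized_tf * global_weight(idf, topic_probs)

# A topic-specific word keeps most of its idf; a topic-neutral word keeps none.
print(global_weight(2.0, [0.9, 0.05, 0.05]))
print(global_weight(2.0, [1.0, 1.0, 1.0]))
```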

Text feature quantification method based on comentropy, text feature quantification device based on comentropy, text classification method and text classification device

The invention discloses a text feature quantification method based on comentropy (information entropy), a text feature quantification device based on comentropy, a text classification method and a text classification device. The text feature quantification method comprises the following steps: the weight of each feature word in a document is calculated from the word frequency of the feature word in the document and the entropy of its distribution over the different text classes; the inter-class distribution entropy of the feature words is calculated in different ways depending on how unbalanced the class sizes of the text set are; in addition, the inverse document frequency is introduced where needed, according to how each feature word is distributed in the text set; and the local word frequency factor is suitably reduced, so that the weights of the feature words in a document are reasonably distributed and the generated document feature vectors fully reflect the differences between classes of texts. The disclosed text feature quantification device and text classification device offer a number of options and parameters that can be tuned to achieve the best text classification effect. The method improves text classification accuracy and performs stably across different text sets.
Owner:CENT SOUTH UNIV

Automatic text summarization method based on enhanced semantics

The invention discloses an automatic text summarization method based on enhanced semantics. The method comprises the following steps: preprocessing the text, sorting the words from high to low word frequency, and converting the words to ids; using a single-layer bi-directional LSTM to encode the input sequence and extract text information features; using a single-layer unidirectional LSTM to decode the encoded text semantic vector and obtain the hidden layer state; calculating a context vector to extract, from the input sequence, the information most useful to the current output; after decoding, obtaining a probability distribution over the vocabulary and applying a selection strategy to choose the summary words; and, in the training phase, incorporating the semantic similarity between the generated summary and the source text into the loss, so as to improve that similarity. The invention uses the LSTM deep learning model to represent the text, integrates the semantic relations of the context, strengthens the semantic relation between the summary and the source text, and generates a summary that better fits the main idea of the text; it has wide application prospects.
Owner:SOUTH CHINA UNIV OF TECH
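
The first preprocessing step in the abstract above (sorting words by frequency and converting them to ids) can be sketched as follows. The special tokens `<pad>` and `<unk>`, the alphabetical tie-breaking, and the function names are assumptions; the abstract does not say how unknown words or ties are handled.

```python
from collections import Counter

def build_vocab(tokenized_texts, specials=("<pad>", "<unk>")):
    """Map words to ids, most frequent first, after the reserved special tokens."""
    counts = Counter(token for tokens in tokenized_texts for token in tokens)
    # Sort by descending frequency; break ties alphabetically for determinism.
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: idx for idx, word in enumerate(list(specials) + ordered)}

def encode(tokens, vocab):
    """Convert tokens to ids, falling back to the <unk> id for unseen words."""
    return [vocab.get(token, vocab["<unk>"]) for token in tokens]

vocab = build_vocab([["the", "cat", "the"], ["cat", "sat", "the"]])
print(vocab)
print(encode(["the", "dog", "sat"], vocab))
```

The resulting id sequences are what an encoder such as the bi-directional LSTM would consume through an embedding layer.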

Hospital information search engine and system based on knowledge base

The invention relates to a medical search engine and system based on a knowledge base (repository). The engine works as follows: crawling a Chinese medical health directory to establish an original medical webpage database; extracting related information from the webpages in that database, together with comment information on hospitals, departments and doctors, to establish a medical comment information database; extracting medical comment attribute fields from the extracted information by means of term frequency statistics and questionnaires, so as to extract opinion phrases; analyzing the orientation of the opinion phrases to determine whether each comment is positive or negative; determining the ranking of hospitals, departments and doctors; ordering search results according to the medical repository; and providing the user with highly structured and highly relevant information. To overcome the unstructured form, low relevance and low accuracy of the results of a common search engine, the medical search engine and system establish the medical repository to provide highly structured medical information and to increase both relevance and accuracy when users query medical information; moreover, they effectively increase the precision and recall of search results.
Owner:INST OF AUTOMATION CHINESE ACAD OF SCI

Entity identification method based on Chinese electronic medical records

Inactive; CN108628824A; Advancing medical automated question answering; Medical data mining; Natural language data processing; Medical record; Manual annotation
The invention provides an entity identification method based on Chinese electronic medical records and relates to the technical field of medical entity identification. To address the current lack of a public annotated corpus of Chinese electronic medical records in China, a semi-automatic corpus annotation method is proposed that builds and manages a medical dictionary, reducing the complexity of manual annotation. The method also addresses the problem that existing feature-based entity recognition methods for electronic medical records mostly target ordinary texts or general electronic medical record texts and do not consider the unique characteristics of Chinese electronic medical records. Besides the basic characteristics of general text, the method extracts the chapter information characteristics unique to Chinese electronic medical records; core word characteristics, obtained by counting character frequencies and word frequencies after the collected dictionary is segmented into single characters and words, are added to the extension characteristics, as are word relationships obtained by clustering word vectors. The accuracy of entity identification in Chinese electronic medical records is thereby effectively improved.
Owner:上海熙业信息科技有限公司