804 results about "Lexical frequency" patented technology

Description Lexical usage frequency is known to influence the application rate of some variable processes. Specifically, variable lenition processes typically affect frequent lexical items more often than infrequent lexical items. For instance, variable t/d-deletion in English is more likely to apply to a frequent word (just)...

Training and using pronunciation guessers in speech recognition

Inactive | US7467087B1 | Reduce frequency | Reliable phonetic spelling | Speech recognition | Speech synthesis | Word model | Speech identification

The error rate of a pronunciation guesser that guesses the phonetic spelling of words used in speech recognition is improved by causing its training to weigh letter-to-phoneme mappings used as data in such training as a function of the frequency of the words in which such mappings occur. Preferably the ratio of the weight to word frequency increases as word frequency decreases. Acoustic phoneme models for use in speech recognition with phonetic spellings generated by a pronunciation guesser that makes errors are trained against word models whose phonetic spellings have been generated by a pronunciation guesser that makes similar errors. As a result, the acoustic models represent blends of phoneme sounds that reflect the spelling errors made by the pronunciation guessers. Speech-recognition-enabled systems are made by storing in them both a pronunciation guesser and a corresponding set of such blended acoustic models.

Owner:CERENCE OPERATING CO
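
The frequency-dependent weighting in the abstract above can be illustrated with a minimal sketch. The abstract does not state which weighting function is used; the power law below (`weight = frequency ** alpha` with `0 < alpha < 1`) is only one assumed choice that satisfies the stated property, and the function name and the word counts are hypothetical.

```python
def mapping_weight(word_frequency: float, alpha: float = 0.5) -> float:
    """Training weight for letter-to-phoneme mappings seen in one word.

    Assumed form (not taken from the patent): frequency ** alpha with
    0 < alpha < 1, so the weight grows with frequency while the ratio
    weight / frequency grows as frequency falls.
    """
    if word_frequency <= 0:
        raise ValueError("word_frequency must be positive")
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    return word_frequency ** alpha


# Hypothetical corpus counts, for illustration only.
counts = {"just": 10000, "jest": 100}
weights = {word: mapping_weight(count) for word, count in counts.items()}
ratios = {word: weights[word] / counts[word] for word in counts}
```

With these counts the frequent word still receives the larger weight, but the rare word receives more weight per occurrence, which is the behaviour the abstract calls preferable.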

Methods and Systems for Improved Data Input, Compression, Recognition, Correction, and Translation through Frequency-Based Language Analysis

Inactive | US20100131900A1 | Improving optical character recognition | Efficient processing | Input/output for user-computer interaction | Cathode-ray tube indicators | Common word | Document preparation

System and method for improving data input by using word frequency to predict text input. Other systems and methods include analyzing words already contained in a document (e.g. spell checking and OCR) and using word frequency to create a proxy system that reduces the space required to store data, allowing for more efficient usage of storage and enhancing the embedded content of matrix codes. The system displays the most common words in a language based upon the previously entered or displayed word(s), or the previously entered or displayed character or characters. The words most frequently used with the prior word(s) are displayed in a table so that the user can quickly select one of them for rapid data entry. The input device can be a touch-sensitive display or a non-touch-sensitive device.

Owner:SIEGEL ABBY L
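
A minimal sketch of frequency-based next-word suggestion, as described in the abstract above, might look like the following. It assumes a simple bigram count over a toy corpus; the function names and the corpus are hypothetical and are not taken from the patent.

```python
from collections import Counter, defaultdict


def build_bigram_counts(tokens):
    """Count how often each word follows each preceding word."""
    following = defaultdict(Counter)
    for prev_word, next_word in zip(tokens, tokens[1:]):
        following[prev_word][next_word] += 1
    return following


def suggest_next(following, prev_word, k=3):
    """Return up to k words most frequently seen after prev_word."""
    counts = following.get(prev_word, Counter())
    return [word for word, _ in counts.most_common(k)]


# Toy corpus, for illustration only.
corpus = "the cat sat on the mat and the cat ate the fish".split()
table = build_bigram_counts(corpus)
suggestions = suggest_next(table, "the", k=2)
```

In a real input method the suggestions would populate the selection table mentioned in the abstract; here they are simply returned as a list.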

Text similarity, acceptation similarity calculating method and system and application system

Active | CN101079026A | Improve performance | Special data processing applications | Frequency vector | Degree of similarity

The invention discloses a method and system for calculating text similarity and word-meaning similarity, together with an application system. The method comprises the following steps: initializing on the basis of a vocabulary database and calculating the initial word-meaning similarity among the words in the database; calculating, from the initial word-meaning similarity, the initial semantic similarity among texts; iterating the semantic similarity among texts and the word-meaning similarity among words until convergence; constructing the final word-meaning similarity matrix from the final word similarities; transforming the word frequency vector of each initial text into a new word frequency vector; and calculating text similarity within the text collection. The invention can improve text relevance performance, especially for short texts.

Owner:蒙圣光 +1

News keyword abstraction method based on word frequency and multi-component grammar

Inactive | CN101196904A | Efficient mining | Wide applicability | Special data processing applications | Part of speech | Computer-aided

A method for extracting news keywords based on word frequency and multi-component grammar is provided, which belongs to the technical field of natural language processing. The characteristic parts of speech of keywords are studied, computer-assisted mining is used to extract the potential part-of-speech patterns of multi-component keyword grammars, and these patterns serve as the basis of the keyword extraction algorithm. When extracting news keywords, the method first mines multi-word phrases in the text according to the potential part-of-speech patterns to build a candidate keyword set, then mines the titles for potential keywords not yet included and adds them to the candidate keyword set. The application proposes an improved single-text term frequency / inverse document frequency (tf/idf) formula, introduces target-oriented features, scores the candidate keywords to obtain their order, and outputs the keywords of the news document after optimizing the results. Compared with the traditional keyword extraction method based on tf/idf, the method achieves a higher recall rate at the same precision.

Owner:TSINGHUA UNIV

Method for loading word stock, method for inputting character and input method system

Active | CN101373468A | Meet dynamic needs | Improve input efficiency | Special data processing applications | Specific program execution arrangements | Relevant information | Human–computer interaction

The invention provides a character input method, which comprises the following steps: loading system thesauruses; acquiring relevant information of the current input environment of users; matching and acquiring auxiliary thesauruses corresponding to the current input environment of the users; loading auxiliary thesauruses corresponding to the current input environment of the users; receiving the input information of users; searching the loaded system thesauruses and the auxiliary thesauruses to get candidates according to the received input information; receiving the selective information from users; and outputting the specified candidates. By adopting the character input method, the current input environment of the users or the input content is detected by various means to accurately determine the current requirements of users; subsequently, thesauruses are loaded selectively from a plurality of auxiliary thesauruses, thereby well meeting the dynamic requirements of users. The character input method can overcome the problem that frequencies of new words can not be adjusted in the prior art; manual setting by users is not needed; the input efficiency of users is significantly improved.

Owner:BEIJING SOGOU TECHNOLOGY DEVELOPMENT CO LTD

Text similarity computing method

Inactive | CN103838789A | Advertisements | Semantic analysis | Theoretical computer science | Chinese word

The invention discloses a text similarity computing method comprising two steps: text representation and text similarity computing. The aim of text representation is to convert the text document of a product description into a vector. Natural language processing techniques such as Chinese word segmentation, stop-word removal and word frequency statistics are used to convert all product description texts into vectors. Text similarity is then computed by a method based on the Hamming distance, which has the further advantage of very high computing speed. Because statistical machine learning is used, the method is more stable and effective than rule-based methods.

Owner:DALIAN LINGDONG TECH DEV
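
The Hamming-distance comparison in the abstract above can be sketched as follows. The abstract does not say how word frequency vectors are turned into the bit vectors that a Hamming distance needs, so this sketch assumes simple presence/absence binarization over a shared vocabulary. Whitespace tokenization stands in for Chinese word segmentation, and all names and sample texts are hypothetical.

```python
def tokenize(text, stop_words):
    """Lowercase whitespace tokenization with stop-word removal.

    A stand-in for Chinese word segmentation, which needs a real segmenter.
    """
    return [token for token in text.lower().split() if token not in stop_words]


def binary_vector(tokens, vocabulary):
    """1 if the vocabulary word occurs in the text, else 0 (assumed encoding)."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocabulary]


def hamming_similarity(a, b):
    """1 minus the normalized Hamming distance of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if not a:
        return 1.0
    distance = sum(x != y for x, y in zip(a, b))
    return 1.0 - distance / len(a)


stop_words = {"a", "the", "with"}
doc_a = tokenize("a red cotton shirt with the long sleeves", stop_words)
doc_b = tokenize("a blue cotton shirt with the short sleeves", stop_words)
vocabulary = sorted(set(doc_a) | set(doc_b))
similarity = hamming_similarity(
    binary_vector(doc_a, vocabulary), binary_vector(doc_b, vocabulary)
)
```

The two descriptions share three of the seven vocabulary words, so four positions differ and the similarity is 3/7.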

Network criticism oriented viewpoint subject identifying method and system

Inactive | CN101727487A | Overcome domain limitations | Avoid difficulties | Special data processing applications | Viewpoints | Algorithm

The invention discloses a viewpoint subject identification method and system oriented to network criticism. The method comprises the following steps: a. text input: inputting a criticism source and all criticism texts; b. text preprocessing: performing word segmentation and part-of-speech tagging on the input texts, removing stop words, punctuation and specific empty words, and calculating the word frequency of each word; c. subject word judgment: calculating a word weight value and, if the weight exceeds a set threshold, judging the word to be a viewpoint subject word; d. subject construction: combining scattered viewpoint subject words into an integrated viewpoint subject; e. subject screening: confirming effective viewpoint subjects by filtering. The invention overcomes the domain limitation of existing viewpoint analysis methods and systems, identifies viewpoint subjects from a global perspective without constructing a body library, effectively overcomes the difficulties of single-sentence viewpoint analysis, and automatically identifies viewpoint subjects in phrase form across a wide range of domains and in network criticism data that changes dynamically over time.

Owner:THE PLA INFORMATION ENG UNIV

Multi-document auto-abstracting method facing to inquiry

Inactive | CN101620596A | Solve problems | Meet individual requirements | Special data processing applications | Frequency vector | Document preparation

The invention relates to an inquiry-oriented multi-document automatic abstracting method, which comprises the following steps: preprocessing the inquiry and the documents; performing topic segmentation and semantic paragraph clustering on the preprocessed documents to obtain subtopics; expressing the inquiry and the sentences in each subtopic as word frequency vectors and calculating the correlation between the inquiry and each subtopic; screening the subtopics according to this correlation, ranking them by importance, and selecting the top T subtopics to obtain an ordered sequence of subtopics related to the inquiry; and taking representative sentences from the subtopic sequence in turn, cyclically, and connecting them to generate an abstract. Because the method uses topic segmentation, the abstract stays within a limited length while covering as much of the important information in the document set as possible. The method provides more targeted services, can adjust the content of the abstract according to the topic of the user's inquiry, and supports interaction with users.

Owner:NORTHEASTERN UNIV

Theme word vector and network structure-based theme keyword extraction method

Active | CN108052593A | Efficient discovery | The result is accurate | Character and pattern recognition | Special data processing applications | Network structure | Documentation

The invention discloses a theme keyword extraction method based on theme word vectors and network structure, and particularly relates to the technical field of extracting keywords from texts. The method comprises the following steps: performing theme clustering on a text corpus using an LDA theme model and obtaining, for each theme, the 100 keywords most relevant to it; representing each word in the corpus as a word vector using word2vec, calculating the semantic similarity between every two words, and finding for each keyword the 5 words most semantically similar to it, the keywords and their top-5 similar words forming a new keyword set; and constructing a keyword network and taking the top 20 words in each set as the keywords of the theme. The method can extract keywords that have relatively high word frequencies in documents, and can also effectively discover keywords that have relatively low word frequencies but are strongly associated with the themes.

Owner:SHANDONG UNIV OF SCI & TECH

Text similarity measuring system based on multi-feature fusion

Active | CN104699763A | Overcoming the problem of low precision | Unstructured textual data retrieval | Special data processing applications | Information processing | Multi feature fusion

The invention provides a text similarity measuring system based on multi-feature fusion and relates to the field of intelligent information processing. The system measures text similarity by fusing multiple features based on word frequencies, word vectors and Wikipedia labels. The invention aims to solve two problems of conventional text similarity measuring systems: semantic loss caused by ignoring context, and low accuracy caused by large differences in text length. The system works as follows: preprocessing the training text with word segmentation and stop-word removal; training a word vector model on the processed corpus; measuring, for each input text pair, the similarity based on word frequencies, the similarity based on word vectors and the similarity based on Wikipedia labels; and combining the three by weighted summation to obtain the final text semantic similarity. The system improves the accuracy of text similarity measurement and thereby meets the requirements of intelligent information processing.

Owner:XINJIANG TECHN INST OF PHYSICS & CHEM CHINESE ACAD OF SCI
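
The weighted summation in the abstract above can be sketched under simple assumptions. Only the word-frequency similarity is actually computed here, as a cosine over term counts. The word-vector and Wikipedia-label scores are placeholder numbers, and the weights are hypothetical because the abstract does not give them.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between the term-count vectors of two token lists."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[word] * counts_b[word] for word in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def fuse(scores, weights):
    """Weighted sum of per-feature similarities, with weights normalized to 1."""
    total = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total


scores = {
    "word_frequency": cosine_similarity("a b b c".split(), "b c c d".split()),
    "word_vector": 0.8,      # placeholder; would come from a trained model
    "wikipedia_label": 0.5,  # placeholder; would come from label overlap
}
weights = {"word_frequency": 0.4, "word_vector": 0.4, "wikipedia_label": 0.2}
final_similarity = fuse(scores, weights)
```

Normalizing the weights inside `fuse` keeps the fused score in the same range as its inputs even if the caller's weights do not sum to one.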

Chinese text emotion recognition method

Inactive | CN103678278A | Improve accuracy | Special data processing applications | Lexical frequency | Emotion recognition

The invention discloses a Chinese text emotion recognition method which includes the steps of (1) respectively building a commendatory-derogatory-term dictionary, a degree-term dictionary and a privative-term dictionary, (2) carrying out term-segmentation processing on sentences of a Chinese text to be processed, and obtaining dependence relationships and term frequency of terms, (3) selecting subject terms according to the term frequency, and marking the sentences containing the subject terms as subject sentences, (4) judging whether the terms in the subject sentences exist in the commendatory-derogatory-term dictionary, determining emotion initial values of the terms, determining modifying degree terms and privative terms of the terms according to the dependence relationships of the terms, then determining the weights of the terms according to values of the modifying degree terms in the degree-term dictionary, determining polarities according to the number of the privative terms, obtaining the emotion values of the terms, then summing the emotion values of all the terms of the subject sentences, and obtaining the emotion values of the subject sentences, and (5) summing the emotion values of all the sentences in the text, and obtaining the emotion state of the text. According to the Chinese text emotion recognition method, the emotion recognition accuracy rate of the text is greatly improved.

Owner:COMP NETWORK INFORMATION CENT CHINESE ACADEMY OF SCI
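
Step (4) of the abstract above can be sketched as a small scoring function. The dictionaries below are tiny hypothetical English stand-ins for the three dictionaries the abstract names. The modifiers of each term are passed in directly rather than obtained from a dependency parser, and flipping polarity on an odd number of privative terms is an assumed reading of "determining polarities according to the number of the privative terms".

```python
POLARITY = {"good": 1.0, "bad": -1.0}    # commendatory-derogatory dictionary
DEGREE = {"very": 2.0, "slightly": 0.5}  # degree-term dictionary
NEGATION = {"not", "never"}              # privative-term dictionary


def term_emotion(term, modifiers):
    """Emotion value of one term given the words that modify it."""
    base = POLARITY.get(term, 0.0)
    weight = 1.0
    negations = 0
    for modifier in modifiers:
        if modifier in DEGREE:
            weight *= DEGREE[modifier]
        elif modifier in NEGATION:
            negations += 1
    # Assumption: an odd number of negations flips the polarity.
    sign = -1.0 if negations % 2 else 1.0
    return sign * weight * base


def sentence_emotion(terms_with_modifiers):
    """Sum of the emotion values of all terms in a subject sentence."""
    return sum(term_emotion(term, mods) for term, mods in terms_with_modifiers)


sentence = [("good", ["not", "very"]), ("bad", [])]
value = sentence_emotion(sentence)
```

Summing `sentence_emotion` over all sentences would give the text-level value described in step (5).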

Processing method and device for input information and input method system

Active | CN101398834A | Meet input requirements | Improve accuracy | Special data processing applications | Input/output processes for data processing | User input | Data mining

The invention provides a processing method and system for input information. The method comprises the following steps: collecting input information records from a plurality of users, the records comprising word information and input environment information; processing the collected records; and establishing the relationship between the word information and the input environment, thereby obtaining a plurality of parallel information sets. The method proposes recording the input habit information of many users (such as input words and word frequency) by environment and collecting it on data processing equipment (such as a server). The information is then optimized to provide an input method lexicon sorted by environment property, which meets input requirements better and more exactly. Furthermore, when the lexicon is used for input, it can be dynamically matched with the user's input environment or input content, greatly improving the accuracy of the preferred words offered during input.

Owner:BEIJING SOGOU TECHNOLOGY DEVELOPMENT CO LTD

Filtering method for spam based on supporting vector machine

Inactive | CN101106539A | Solve the problem of unequal cost of misjudgment | Increase the weight value | Office automation | Data switching networks | Support vector machine | Relevant information

The invention discloses a junk mail filtering method based on a support vector machine (SVM). The steps are as follows: 1) analyze the mail and extract information relating to the title, body text and character set; 2) perform word segmentation on the extracted text content; 3) count word frequencies in the mail and use the TF-IDF formula to map the mail text to a vector; 4) use LibSVM to train on mail samples and obtain a support vector machine model; 5) use the support vector machine model to classify new mail and obtain the probability that it is junk mail; 6) use threshold adjustment to keep the rate at which normal mail is misjudged as junk mail low, and finally decide whether the mail is junk. The invention utilizes the advantage of the support vector machine, which has the highest single-model classification accuracy, to improve the correctness of junk mail filtering according to text features and activity features, and also effectively solves the problem of unequal misjudgment costs in junk mail filtering.

Owner:ZHEJIANG UNIV
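
Steps 3 and 6 of the abstract above can be sketched without any external library. The sketch assumes the textbook TF-IDF form (relative term frequency times the log of inverse document frequency), which may differ from the formula the patent uses. The SVM training and classification of steps 4 and 5 are not shown, and the function names, sample mails and threshold value are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Map tokenized documents to sparse {term: tf * idf} dictionaries."""
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def is_junk(junk_probability, threshold=0.9):
    """Flag mail as junk only above a high threshold (assumed value).

    A high threshold trades some missed junk for fewer normal mails
    misjudged as junk.
    """
    return junk_probability >= threshold


# Already-segmented toy mails, for illustration only.
mails = [
    ["win", "money", "now"],
    ["meeting", "agenda", "now"],
    ["win", "prize", "money"],
]
vectors = tfidf_vectors(mails)
```

In a full pipeline the vectors would be passed to a trained classifier, and the probability it returns would be the argument to `is_junk`.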

Text emotion classifying method in stock field

Inactive | CN102023967A | Strong real-time | Low cost | Special data processing applications | Support vector machine | Feature selection

The invention provides a text emotion classifying method in the stock field, belonging to the technical field of stock tendentiousness analysis. The text emotion classifying method is characterized in that feature selection is carried out on an enlarged stock emotion word through public news information comprising stock news and the like and by using an improved evaluation group; feature weighting selection is carried out on emotion word in the stock Chinese text by using absolute word frequency weighting; and finally, tendentiousness analysis is carried out on stock news by using a Bayes, K-NN or SVM (support vector machine) text emotion classifying algorithm. The method provided by the invention has the advantages of simplicity and feasibility, and is convenient for calculation.

Owner:TSINGHUA UNIV

Distributed semantic and sentence meaning characteristic fusion-based character relation extraction method

Inactive | CN106484675A | Improve accuracy | Realize automatic generation | Semantic analysis | Special data processing applications | Semantic information | Extraction methods

The invention relates to a distributed semantic and sentence meaning characteristic fusion-based character relation extraction method, and belongs to the field of natural language processing. The method comprises the steps of firstly performing training in a small amount of marked corpora and a large amount of unmarked corpora by utilizing statistic word frequency features and a Bootstrapping algorithm to obtain a relational feature dictionary; secondly constructing a triple instance of a statement through an element distance optimization rule, and constructing a triple feature space by fusing distributed semantic information and semantic information; and finally performing true-false binary decision on a triple, and obtaining a character relation type by utilizing a confidence degree maximization rule. According to the method, automatic generation of the feature relation dictionary is realized; a conventional relational multi-class problem is converted into a triple true-false binary decision problem, so that a conventional machine learning classification algorithm is better adapted; and by utilizing the distributed semantic information, the accuracy of relational classification is improved.

Owner:BEIJING INSTITUTE OF TECHNOLOGY

Image meaning automatic marking method based on marking significance sequence

Inactive | CN1920820A | Improve accuracy | Improve performance | Image data processing details | Special data processing applications | Pattern recognition | Computer science

The invention relates to an automatic image semantic marking method based on a mark importance sequence, which comprises: (1) classifying the training images to form a series of groups of similar images; (2) building a semantic skeleton for each image group and expressing its images with the semantic skeleton, while calculating the keyword importance sequence of each image and the importance sequence of the image sub-blocks; (3) using a static method to mark the image automatically. The invention takes into account both the importance of the sub-blocks of each image region and the importance sequence of the training text, and so supports semantics-based image search without being affected by a distorted word frequency distribution.

Owner:ZHEJIANG UNIV

Sudden event emergency information mining method based on social media

Inactive | CN106021508A | Real-time access | Easy access | Special data processing applications | Text database clustering/classification | Social media | Information mining

The present invention discloses a sudden event emergency information mining method based on social media. The method comprises the steps of (S1) using an open platform API or a web crawler to acquire social media data, the social media data being a document set; (S2) using a MongoDB cluster to store the document set; (S3) preprocessing the document set; (S4) using an LDA to mark the preprocessed document set, and obtaining a known sample; (S5) forming a word characteristic set by all words in each document of the known sample, word frequency of each word characteristic in the document being the weight of the word characteristic in the document; (S6) constructing a short text real-time classification model; (S7) classifying real-time sudden events by using the short text classification model, and predicting themes of the sudden events; and (S8) performing information mining according to the social media data of the classified sudden events. Classification of social media short texts can be automatically and rapidly achieved, and therefore sudden event emergency information is mined.

Owner:WUHAN UNIV

Method for filtering Chinese junk mail based on Logistic regression

Inactive | CN101227435A | Few adjustment parameters | Improve classification effect | Office automation | Data switching networks | Feature vector | Relevant information

The invention discloses a method for filtering Chinese junk e-mail based on Logistic regression. The method comprises the following steps: first, parsing the e-mails and extracting the title, the main body and attachment-related information; second, performing word segmentation on the extracted text information; third, counting the word frequencies of the entries in each e-mail, calculating word weights with the TF-IDF formula, and representing the e-mail as a weighted feature vector; fourth, using the LIBLINEAR toolkit to train on e-mail samples and obtain a Logistic regression model; fifth, using the Logistic regression model to classify new e-mails and obtain the probability that each is junk. The Logistic regression model has the advantages of a simple model, few parameters, and high classification accuracy on data sets with large numbers of texts and features. The accuracy and efficiency of junk e-mail filtering are improved through dimension reduction and an improved feature value calculation method, and the problem of choosing model training parameters in junk e-mail filtering is effectively solved.

Owner:ZHEJIANG UNIV

Topic feature text keyword extraction method

Inactive | CN108763213A | Reduce the influence of human subjective factors | Reduce workload | Natural language data processing | Special data processing applications | Part of speech | Algorithm

The invention discloses a topic feature text keyword extraction method that can obtain better text keyword extraction results than the traditional TF-IDF method. In the training stage, the training text is preprocessed with word segmentation, stop-word removal, part-of-speech filtering and similar steps, and the inverse document frequency of the words is obtained by statistical analysis. At the same time, a topic model is used to learn a topic probability matrix of the words, which is then normalized. The topic distribution entropy of each word is calculated from the topic probability matrix, the global weight of each word is calculated by combining its inverse document frequency with its topic distribution entropy, and the global weights are passed to the test stage. In the test stage, the test text is preprocessed and the normalized term frequency of its words is obtained by statistical analysis. The normalized term frequency is combined with the global weights from the training stage to calculate a comprehensive score for each word, the scores are ordered, and the words with the highest scores are output as the automatic keyword extraction results for the current test text.

Owner:10TH RES INST OF CETC
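
The scoring scheme in the abstract above can be illustrated with a short sketch. The abstract does not say how inverse document frequency and topic distribution entropy are combined. The form `idf / (1 + entropy)` below is an assumption chosen so that words concentrated in few topics score higher, and the function names, IDF values and topic probabilities are hypothetical.

```python
import math
from collections import Counter


def topic_entropy(topic_probs):
    """Shannon entropy of a word's normalized topic distribution."""
    total = sum(topic_probs)
    if total <= 0:
        raise ValueError("topic probabilities must sum to a positive value")
    normalized = [p / total for p in topic_probs]
    return -sum(p * math.log(p) for p in normalized if p > 0)


def global_weight(idf, entropy):
    """Assumed combination: high IDF and low topic entropy raise the weight."""
    return idf / (1.0 + entropy)


def keyword_scores(tokens, weights):
    """Rank words by normalized term frequency times global weight."""
    counts = Counter(tokens)
    scores = {
        word: (count / len(tokens)) * weights.get(word, 0.0)
        for word, count in counts.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical training-stage output: same IDF, different topic spread.
weights = {
    "radar": global_weight(2.0, topic_entropy([0.9, 0.05, 0.05])),
    "system": global_weight(2.0, topic_entropy([1 / 3, 1 / 3, 1 / 3])),
}
ranked = keyword_scores(["radar", "system", "system", "radar", "signal"], weights)
```

Both known words occur equally often in the test text, so the topic-focused word ranks first purely because of its lower entropy.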

Character input method for all-purpose keyboard and processing device thereof

Inactive | CN101719022A | Avoid Manual Calibration | Increase typing speed | Input/output processes for data processing | Algorithm | Candidate key

The invention discloses a character input method for an all-purpose keyboard, which comprises the following steps: responding to a user's click on a key by generating and recording the key value of the key and the coordinate of the clicking point; computing the keying probability of all candidate keys according to the key value and the coordinate, and determining a keying sequence, wherein the candidate keys comprise the clicked key and a plurality of character or number keys adjacent to it; looking up the word frequency of the word corresponding to each key value in the keying sequence; carrying out a weighted computation over the keying probability of each key value and the word frequency of the corresponding word; and ordering the weighted results of all words to obtain a candidate word sequence. The invention also discloses a character input processing device which comprises a receiving unit, a probability computing unit, a word frequency inquiry unit and a weighted computation unit. By also taking the word frequencies of the words into account, the invention further improves input speed.

Owner:HANVON CORP
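
As a rough illustration of weighting keying probability against word frequency, the sketch below ranks candidate words for a sequence of taps. Everything specific here is an assumption: the abstract does not give the probability model, so a Gaussian falloff with distance from a key center is used, and `KEY_CENTERS`, `sigma` and the mixing weight `alpha` are hypothetical.

```python
import math

# Hypothetical key-center coordinates for a small part of a keyboard.
KEY_CENTERS = {
    "q": (0.0, 0.0), "w": (1.0, 0.0), "e": (2.0, 0.0),
    "a": (0.5, 1.0), "s": (1.5, 1.0), "d": (2.5, 1.0),
}

def key_probabilities(tap, sigma=0.6):
    """Probability of each key given a tap point (assumed Gaussian model)."""
    likes = {
        key: math.exp(-((tap[0] - x) ** 2 + (tap[1] - y) ** 2) / (2 * sigma ** 2))
        for key, (x, y) in KEY_CENTERS.items()
    }
    total = sum(likes.values())
    return {key: value / total for key, value in likes.items()}

def rank_words(taps, word_freq, alpha=0.5):
    """Sort candidate words by a weighted sum of log keying probability
    and log word frequency, best candidate first."""
    per_tap = [key_probabilities(tap) for tap in taps]
    scored = []
    for word, freq in word_freq.items():
        if len(word) != len(taps) or any(ch not in KEY_CENTERS for ch in word):
            continue
        log_key = sum(math.log(probs[ch]) for probs, ch in zip(per_tap, word))
        scored.append((alpha * log_key + (1 - alpha) * math.log(freq), word))
    return [word for _, word in sorted(scored, reverse=True)]
```

Working in log space keeps the product of per-tap probabilities numerically stable for longer words; the word frequencies passed in must be positive.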

Word library generation method, input method and input method system

Active; CN101271459A; Improve intelligence; Improve input efficiency; Special data processing applications; Input/output processes for data processing; User input; Data mining

The present invention discloses an intelligent word-selection input method, comprising: determining the currently effective type of the user and obtaining candidates from a word library according to the coded character string entered by the user. The word library contains word frequency information, type information and corresponding type feature values; the type information and type feature values are obtained from statistics on each word in a corpus of the corresponding type. The method further comprises calculating an output weight for each candidate from its type feature value under the currently effective type, sorting the candidates by output weight and by their universal word frequency in the word library, and outputting them in the sorted order. The invention makes the input method more intelligent and improves the user's input efficiency and experience without requiring additional user operations.

Owner:BEIJING SOGOU TECHNOLOGY DEVELOPMENT CO LTD
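
A small sketch of type-aware candidate ordering follows. The abstract does not state how output weight and universal word frequency are combined when sorting, so this example assumes a two-level sort (output weight first, frequency as tie-breaker); the lexicon layout and all names are hypothetical.

```python
def rank_candidates(candidates, lexicon, user_type):
    """Order candidates by type feature value under the user's current type,
    then by universal word frequency (assumed two-level sort)."""
    def sort_key(word):
        entry = lexicon[word]
        output_weight = entry["type_features"].get(user_type, 0.0)
        return (output_weight, entry["freq"])
    return sorted(candidates, key=sort_key, reverse=True)

# Hypothetical word library: universal frequency plus per-type feature values.
lexicon = {
    "书": {"freq": 0.5, "type_features": {"math": 0.1}},
    "树": {"freq": 0.3, "type_features": {"nature": 0.9}},
    "数": {"freq": 0.2, "type_features": {"math": 0.8}},
}
ranked_for_math = rank_candidates(["书", "树", "数"], lexicon, "math")
```

With no type information for any candidate, the ordering falls back to universal frequency alone.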

Text feature quantification method based on comentropy, text feature quantification device based on comentropy, text classification method and text classification device

Inactive; CN105224695A; Avoid estimation; Accurate feature quantification; Special data processing applications; Text database clustering/classification; Feature vector; Text categorization

The invention discloses a text feature quantification method and device based on comentropy (information entropy), together with a text classification method and device. In the text feature quantification method, the weight of each feature word in a document is calculated from the word frequency of the feature word in the document and the entropy of its distribution over the text classes. The inter-class distribution entropy of a feature word is calculated in different ways depending on how imbalanced the class sizes in the text set are. In addition, the inverse document frequency is introduced where needed, depending on how each feature word is distributed in the text set, and the local word frequency factor is suitably reduced, so that the weights of the feature words in a document are reasonably distributed and the generated document feature vectors fully reflect the differences between classes of text. The disclosed text feature quantification device and text classification device provide several options or parameters that can be tuned to achieve the best classification result. The method improves text classification accuracy and performs stably across different text sets.

Owner:CENT SOUTH UNIV
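
One way to picture an entropy-based feature weight is shown below. It is only a sketch under assumptions: the abstract does not give the weighting formula, so a damped term frequency `log(1 + tf)` is multiplied by `1 - H / log(C)`, where `H` is the word's inter-class distribution entropy and `C` the number of classes.

```python
import math

def class_entropy(class_counts):
    """Entropy of a feature word's occurrence counts across classes."""
    total = sum(class_counts)
    return -sum((c / total) * math.log(c / total) for c in class_counts if c > 0)

def feature_weight(tf, class_counts):
    """Damped term frequency scaled by class concentration (assumed form).
    Requires at least two classes and at least one occurrence."""
    max_entropy = math.log(len(class_counts))
    concentration = 1.0 - class_entropy(class_counts) / max_entropy
    return math.log(1 + tf) * concentration
```

A word that occurs in only one class gets the full damped frequency as its weight, while a word spread evenly over all classes gets a weight near zero.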

Method and device for short message filtration

Active; CN101877837A; Filter stable; Quickly keep up with changes; Messaging/mailboxes/announcements; Special data processing applications; Adaptive learning; Filtration

The embodiment of the invention discloses a short message filtering method in which the classifier consists of vectors comprising a plurality of feature items. The method comprises the following steps: classifying and filtering a received short message to obtain the probabilities that it is a junk message and that it is a normal message; if the absolute value of the difference between the junk message probability and the normal message probability is less than a preset threshold, obtaining a feedback result for the short message; updating the classifier through self-adaptive learning according to the feedback result; and, if the short message contains new words that are not in a hotspot word library, updating the classifier after sorting by word frequency according to preset conditions. The embodiment also discloses a short message filtering device. With this embodiment, changes in short message content can be tracked dynamically, so that the filtering mode is adjusted and the filtering capability is improved.

Owner:HUAWEI TECH CO LTD +1
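
The two triggers in the abstract above (an uncertain classification and unseen words) can be sketched as below. The threshold value, the `min_count` condition and the function names are assumptions for illustration; the classifier itself and its update step are outside this sketch.

```python
from collections import Counter

def classify_sms(p_junk, p_normal, threshold=0.2):
    """Return the label and whether user feedback should be requested.
    Feedback is requested when the two probabilities are too close."""
    label = "junk" if p_junk > p_normal else "normal"
    needs_feedback = abs(p_junk - p_normal) < threshold
    return label, needs_feedback

def new_hot_words(messages, hot_words, min_count=2):
    """Words absent from the hotspot word library, sorted by frequency
    (most frequent first) and kept only if they occur often enough."""
    counts = Counter(
        token for tokens in messages for token in tokens if token not in hot_words
    )
    return [word for word, count in counts.most_common() if count >= min_count]
```

Messages are assumed to be token lists already; the returned words would be candidates for new feature items in the classifier's vectors.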

Text classification feature extraction method and text classification method and device

Active; CN106897428A; Accurate Text Classification; Special data processing applications; Text database clustering/classification; Feature extraction; Text categorization

The invention discloses a text classification feature extraction method. According to the method, a feature word set is acquired from multiple training texts in a training set, the property correlation between each feature word in the feature word set and a certain category and the word frequency of each feature word in the category are determined, and the feature word with the property correlation meeting a first preset condition and the feature word with the word frequency meeting a second preset condition are selected from the feature word set to serve as classification feature words of the corresponding category. The invention furthermore provides a corresponding text classification method, a text classification feature extraction device and a text classification device.

Owner:TENCENT TECH (SHENZHEN) CO LTD

Automatic text summarization method based on enhanced semantics

Inactive; CN108804495A; Prevent deviation; Quality improvement; Natural language data processing; Special data processing applications; Semantic vector; Word list

The invention discloses an automatic text summarization method based on enhanced semantics. The method comprises the following steps: preprocessing a text, ordering the words from high to low word frequency, and converting the words to IDs; encoding the input sequence with a single-layer bidirectional LSTM to extract text information features; decoding the encoded text semantic vector with a single-layer unidirectional LSTM to obtain the hidden-layer state; calculating a context vector to extract, from the input sequence, the information most useful for the current output; after decoding, obtaining a probability distribution over the word list and adopting a strategy to select the summary words; and, in the training phase, incorporating the semantic similarity between the generated summary and the source text into the loss, so as to increase that similarity. The invention uses an LSTM deep learning model to represent the text, integrates the semantic relations of the context, strengthens the semantic relation between the summary and the source text, and generates a summary that better matches the main idea of the text; it has wide application prospects.

Owner:SOUTH CHINA UNIV OF TECH
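
The preprocessing step in the abstract above (ordering words by frequency and converting them to IDs) is commonly done as in the sketch below. The special tokens `<pad>` and `<unk>`, the `max_size` cutoff and the alphabetical tie-break are assumptions, not details from the abstract.

```python
from collections import Counter

def build_vocab(tokenized_texts, max_size=None, specials=("<pad>", "<unk>")):
    """Map words to IDs, most frequent first (ties broken alphabetically)."""
    counts = Counter(token for text in tokenized_texts for token in text)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    if max_size is not None:
        ordered = ordered[:max_size]
    vocab = {token: index for index, token in enumerate(specials)}
    for token, _ in ordered:
        vocab.setdefault(token, len(vocab))
    return vocab

def encode(tokens, vocab):
    """Convert tokens to IDs, mapping unseen words to the <unk> ID."""
    unk = vocab["<unk>"]
    return [vocab.get(token, unk) for token in tokens]
```

Frequency ordering means a size cutoff drops only the rarest words, which keeps the decoder's output distribution over the word list manageable.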

Method and device for matching texts

Inactive; CN102411583A; Avoid problems requiring calculations over all text; Avoid calculation; Digital data information retrieval; Special data processing applications; Matching methods; Data library

The invention discloses a method and a device for matching texts. The method comprises the following steps: acquiring the new texts of the current period from the content information collected in that period and storing them in a database; performing word segmentation on the new texts and extracting keywords; calculating the weight of each extracted keyword in each text in the database according to a prestored frequency list of words; periodically updating the frequency list according to the occurrence frequency of each word in each text in the database; calculating the similarity between each new text and each text in the database, or between any two texts in the database, from the calculated keyword weights; and determining the related texts of each text stored in the database according to the calculated similarity. By establishing and updating the frequency list of words, the method solves the prior-art problem that all texts have to be recalculated for every match, which reduces the matching workload and improves system performance.

Owner:ALIBABA CLOUD COMPUTING LTD
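
To make the role of the stored frequency list concrete, the sketch below keeps a running document-frequency table, derives keyword weights from it and compares texts by cosine similarity. The TF-IDF style weight and the cosine measure are assumptions; the abstract names neither.

```python
import math
from collections import Counter

def update_frequency_list(freq_list, new_docs):
    """Add the new texts to the stored document-frequency list in place."""
    for doc in new_docs:
        freq_list.update(set(doc))
    return freq_list

def keyword_weights(doc, freq_list, n_docs):
    """Weight each keyword using the stored list (assumed TF-IDF form),
    so other texts need not be rescanned."""
    term_freq = Counter(doc)
    return {
        word: count * (math.log((1 + n_docs) / (1 + freq_list.get(word, 0))) + 1.0)
        for word, count in term_freq.items()
    }

def cosine(a, b):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Because the frequency list is updated incrementally, weighting a new text costs time proportional to that text's length rather than to the size of the database.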

Hospital information search engine and system based on knowledge base

Inactive; CN101441636A; Improve accuracy; Improve relevance; Special data processing applications; Information repository; Viewpoints

The invention relates to a medical search engine and system based on a knowledge repository. The engine works as follows: a Chinese medical health directory is crawled to build an original medical webpage database; related information is extracted from the pages in that database, and comment information on hospitals, departments and doctors is extracted to build a medical comment information database; medical comment attribute fields are extracted from the extracted information by means of term frequency statistics and questionnaires in order to extract viewpoint phrases; the orientation of each viewpoint phrase is analyzed to determine whether the comment is positive or negative; the ranking of hospitals, departments and doctors is determined; and search results are ordered according to the medical repository, providing the user with highly structured, highly relevant information. Common search engines return unstructured results with low relevance and accuracy. To overcome these disadvantages, the medical search engine and system build a medical repository that provides highly structured medical information and increases relevance and accuracy for users querying medical information; they can also effectively increase the accuracy and recall of search results.

Owner:INST OF AUTOMATION CHINESE ACAD OF SCI

Entity identification method based on Chinese electronic medical records

Inactive; CN108628824A; Advancing Medical Automated Question Answering; Medical data mining; Natural language data processing; Medical record; Manual annotation

The invention provides an entity identification method based on Chinese electronic medical records and relates to the technical field of medical entity identification. First, to address the current lack of a public annotated corpus of Chinese electronic medical records in China, a semi-automatic corpus annotation method based on constructing and managing a medical dictionary is proposed, which reduces the complexity of manual annotation. Second, the method addresses the problem that most existing feature-based entity recognition methods for electronic medical records target ordinary texts or general electronic medical record texts and do not consider the unique characteristics of Chinese electronic medical records. Besides the basic features of general text, the method also extracts the chapter information features unique to Chinese electronic medical records. After the collected dictionary is segmented into single characters and words, core-word features obtained by counting character frequencies and word frequencies are added to the extended features, and relationships between words, obtained by clustering word vectors, are added as well. The accuracy of entity identification in Chinese electronic medical records is thereby effectively improved.

Owner:上海熙业信息科技有限公司

Method for extracting novel field words

Inactive; CN106095736A; Effective filtering; Guaranteed word rate; Natural language data processing; Special data processing applications; Cosine similarity; Algorithm

The invention discloses a method for extracting novel field words that combines word2vec with Bootstrapping iteration. The method comprises: preprocessing a field corpus; segmenting the preprocessed field texts into n-grams; computing six statistics for each segmented character string, namely word frequency, number of left adjacent words, number of right adjacent words, left entropy, right entropy and mutual information; setting a group of parameters by means of k-means; carrying out a preliminary evaluation and filtering to obtain the first-round results; computing, for each candidate word, the sum of its cosine similarities to a seed set, using word vector spaces obtained by word2vec training and a group of field seed data; and setting a threshold on this sum and carrying out a secondary evaluation to extract the novel words of the field. The method is suited to extracting novel words from large-scale field corpora and is highly portable, and it fundamentally solves the problem that non-field words with verb-object constructions, reduplication and the like are difficult to filter out.

Owner:EAST CHINA NORMAL UNIV
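
The six per-candidate statistics named in the abstract above can be computed roughly as follows. This sketch works on character n-grams of a single string and uses pointwise mutual information against independent characters as the mutual information measure; both choices are assumptions, and the k-means and word2vec stages are not covered.

```python
import math
from collections import Counter, defaultdict

def entropy(counter):
    """Entropy of the neighbor distribution held in a Counter."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counter.values())

def ngram_stats(text, n=2):
    """Frequency, neighbor counts, neighbor entropies and PMI per n-gram."""
    char_counts = Counter(text)
    grams = Counter()
    left, right = defaultdict(Counter), defaultdict(Counter)
    for i in range(len(text) - n + 1):
        gram = text[i:i + n]
        grams[gram] += 1
        if i > 0:
            left[gram][text[i - 1]] += 1
        if i + n < len(text):
            right[gram][text[i + n]] += 1
    total_grams = sum(grams.values())
    stats = {}
    for gram, count in grams.items():
        p_independent = 1.0
        for ch in gram:
            p_independent *= char_counts[ch] / len(text)
        stats[gram] = {
            "freq": count,
            "left_neighbors": len(left[gram]),
            "right_neighbors": len(right[gram]),
            "left_entropy": entropy(left[gram]),
            "right_entropy": entropy(right[gram]),
            "pmi": math.log((count / total_grams) / p_independent),
        }
    return stats
```

High neighbor entropy suggests a string occurs in many contexts (a free-standing unit), and high mutual information suggests its characters belong together; candidates scoring well on both are more plausible words.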

Method for accomplishing scene style word input

Active; CN101149757A; Meet individual needs; Improve experience; Special data processing applications; Personalization; Operational system

This invention relates to a technique that dynamically adjusts the input strategy according to the input scene. The scene type of the current application program is obtained by calling operating system functions; the word library, or the word library adjustment configuration file, appropriate to that scene type is found by matching against a scene mapping table; and the word library, word frequencies or other input characteristics are adjusted dynamically to meet users' individual needs, improve input efficiency and enhance the user experience.

Owner:SHENZHEN SHI JI GUANG SU INFORMATION TECH
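
The scene-mapping idea can be pictured with the sketch below. The process names, scene labels and boost factors are all hypothetical, and the operating-system call that reports the current application is replaced by a plain `process_name` argument.

```python
# Hypothetical scene mapping table: application process name -> scene type.
SCENE_BY_PROCESS = {"code.exe": "programming", "chat.exe": "chat"}

# Hypothetical per-scene adjustment configuration: word -> frequency multiplier.
SCENE_ADJUSTMENTS = {
    "programming": {"function": 3.0, "variable": 2.0},
    "chat": {"haha": 4.0},
}

def scene_frequencies(base_freq, process_name):
    """Return word frequencies adjusted for the scene of the given process.
    Unknown processes fall back to the unadjusted base frequencies."""
    scene = SCENE_BY_PROCESS.get(process_name.lower())
    boosts = SCENE_ADJUSTMENTS.get(scene, {})
    return {word: freq * boosts.get(word, 1.0) for word, freq in base_freq.items()}
```

Keeping the adjustments in a table separate from the base word library means a new scene can be supported by adding configuration rather than rebuilding the library.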
