Patents
Literature
Patsnap Copilot is an intelligent assistant for R&D personnel, combined with Patent DNA, to facilitate innovative research.
Patsnap Copilot

35 results about "Bigram" patented technology

A bigram or digram is a sequence of two adjacent elements from a string of tokens, which are typically letters, syllables, or words. A bigram is an n-gram for n=2. The frequency distribution of every bigram in a string is commonly used for simple statistical analysis of text in many applications, including in computational linguistics, cryptography, speech recognition, and so on. Gappy bigrams or skipping bigrams are word pairs which allow gaps (perhaps avoiding connecting words, or allowing some simulation of dependencies, as in a dependency grammar).

Automatic microblog text abstracting method based on unsupervised key bigram extraction

The invention discloses an automatic microblog text abstracting method based on unsupervised key binary word extraction. The automatic microblog text abstracting method comprises the steps of preprocessing a microblog; standardizing a binary word; extracting a key binary word based on a mixed TF-IDF (term frequency-inverse document frequency), TexRank and an LDA (local data area); sequencing sentences based on the intersection similarity and a mutual information strategy; extracting abstract sentences based on a similarity threshold value; generating abstract by reasonably combining the abstract sentences. According to the automatic microblog text abstracting method, the binary word is used as a minimum vocabulary unit, and the binary word has richer text information than words, so that the sentences based on the key binary word is higher in noise immunity and accuracy than the sentences based on key word extraction; meanwhile, when the abstract sentences are extracted, the similarity threshold value is introduced to control redundancy, so that the abstract is higher in recall rate. The abstract generated by the method is accurate, simple and comprehensive; the efficiency and the quality that a user acquires knowledge are obviously improved, and the time of the user is greatly saved.
Owner:INST OF AUTOMATION CHINESE ACAD OF SCI

A Text Positive and Negative Sentiment Classification Method

InactiveCN107423371BImprove the efficiency of scientific computingMaximize class discriminationSemantic analysisSpecial data processing applicationsLexical itemClassification methods
The invention discloses a text positive and negative emotion classification method. The method comprises the steps that all texts in a text set are preprocessed to form a noiseless positive and negative text set; unigram word segmentation and bigram word segmentation are performed on positive and negative texts; after stop words are removed, a non-repeat multidimensional feature vector space is formed; inverse document frequency calculation is performed on variant word frequency of all-dimensional feature vectors in the multidimensional feature vector space; and finally after training is performed with a formed lexical item-document matrix being a supervised classifier support vector machine and an input factor of logic regression in combination with marked positive and negative emotion category tags, a final text linear classifier prediction model is obtained, that is, emotion classification can be performed on a new unknown text. Through the method, the characteristic that emotional words in a marked corpus have innate classification capability is effectively utilized, a new calculation method is proposed to maximize category discrimination of the emotional words, and therefore the precision of text emotion classification through a computer is improved.
Owner:HUBEI NORMAL UNIV
Who we serve
  • R&D Engineer
  • R&D Manager
  • IP Professional
Why Eureka
  • Industry Leading Data Capabilities
  • Powerful AI technology
  • Patent DNA Extraction
Social media
Try Eureka
PatSnap group products