Patents
Literature
Patsnap Copilot is an intelligent assistant for R&D personnel, combined with Patent DNA, to facilitate innovative research.
Patsnap Copilot

321 results about "Vector space model" patented technology

Vector space model or term vector model is an algebraic model for representing text documents (and any objects, in general) as vectors of identifiers, such as, for example, index terms. It is used in information filtering, information retrieval, indexing and relevancy rankings. Its first use was in the SMART Information Retrieval System.

Network hot event detection method based on text classification and clustering analysis

The invention discloses a network hot event detection method based on text classification and clustering analysis. The method solves the problem that the efficiency and accuracy rate of the existing network hot event detection method based on clustering analysis need to be improved. The method comprises the steps that feature words are respectively selected for various classes of files through feature extraction and feature selection by utilizing a training corpus; each training text and test text are represented as vectors in all of the feature spaces by utilizing a vector space model method, and the weight of each dimension of the vectors is determined by utilizing a TF-IDF (term frequency-inverse document frequency) method, and then each test text is classified; the classified test texts in different classes are respectively subjected to clustering analysis, so the hot cluster of each class is obtained, the feature word representing the hot event is obtained through further analysis, and then the word property and other aspects of each feature word are analyzed; the description of each hot event is generated by utilizing relevant language knowledge and necessary linguistic organization. With the network hot event detection method based on text classification and clustering analysis, the detection efficiency and accuracy rate of hot events can be effectively improved.
Owner:NANJING UNIV OF POSTS & TELECOMM

Multi-model fused short text classification method

The invention discloses a multi-model fused short text classification method. The multi-model fused short text classification method comprises a learning method and a classification method. The learning method comprises the following steps: carrying out word segmentation and filtration on short text training data to obtain a word set; calculating the IDF value of each word; calculating the TFIDF values of all the words and constructing a text vector VSM; and carrying out text learning on the basis of a vector space model, and constructing an ontology tree model, a keyword overlapping model, a naive Bayesian model and a support vector machine model. The classification method comprises the following steps: carrying out word segmentation and filtration on a to-be-classified short text; generating a text vector on the basis of the support vector machine model; respectively classifying by using the ontology tree model, the keyword overlapping model, the naive Bayesian model and the support vector machine model to obtain single model classification results; and fusing the single model classification results to obtain a final classification result. According to the method disclosed in the invention, multiple classification modes are fused and the short text classification correctness is improved.
Owner:XI AN JIAOTONG UNIV

Text feature extraction method based on categorical distribution probability

The invention discloses a text feature extraction method based on categorical distribution probability. The text feature extraction method based on the categorical distribution probability extracts text feature words by means of the manner according to which categorical distribution difference estimation is carried out on words of a text to be categorized. Mean square error values of probability distribution of each word at different categories are worked out by means of category word frequency probability of the words. A certain number of words with high mean square error values are extracted to form a final feature set. The obtained feature set is used as feature words of a text categorizing task to build a vector space model in practical application. A designated categorizer is used for training and obtaining a final category model to categorize the text to be categorized. According to the text feature extraction method based on the categorical distribution probability, category distribution of the words is accurately measured in a probability statistics manner. Category values of the words are estimated in a mean square error manner so as to accurately select features of the text. As far as the text categorizing task is concerned, a text categorizing effect of balanced linguistic data and non-balanced linguistic data is obviously improved.
Owner:EAST CHINA NORMAL UNIV

LDA (latent dirichlet allocation) and VSM (vector space model) based similar Chinese herb literature recommendation method

ActiveCN103823848AFast and efficient similar recommendationRobustSpecial data processing applicationsLexical itemVector space model
The invention discloses an LDA (latent dirichlet allocation) and VSM (vector space model) based similar Chinese herb literature recommendation method. The method includes: adopting an IKAnalyzer to perform word segmentation on topics and summary information of literature on the basis of a terminological dictionary for Chinese herbs, constructing a vector space, performing dimensionality reduction on the vector space, constructing a semantic dictionary, numbering all lexical items in the dictionary in sequence, performing vectorization through each document on the basis of the semantic dictionary, constructing term vectors of each document, utilizing LDA and a Gibbs sampling algorithm to perform training to obtain probability distribution of each document on themes, then computing a value of similarity between every two documents by the aid of KL divergence, computing cosine similarity of the term vectors of each document on the basis of term frequency, performing joint weighting on the two kinds of similarities prior to performing similarity sorting, and then making recommendation. By the method, the literature, similar both in content and theme, in the Chinese herb literature can be recommended to users, and recommendation results are closer to user requirements.
Owner:ZHEJIANG UNIV

Natural language intention understanding method in man-machine interaction

The invention discloses a natural language intention understanding method in man-machine interaction.The method comprises the steps that intention labeling is conducted on text natural language instruction data, and each sentence of text is labeled with an intention; the text is vectorized, on the basis of a traditional text vector space model, information of parts of speech of a text instruction is fused, and a new text representation model, namely, a vector space model of the parts of speech is defined; a stackable denoising auto-encoder is applied to natural language instruction intention understanding, and the high-order characteristic of the instruction is extracted; at last, training and prediction are conducted through a support vector machine, and intention understanding of the natural language instruction is achieved.According to the natural language intention understanding method in man-machine interaction, more semantic information in the natural language instruction can be excavated, the recognition rate of intention understanding is increased, the stackable denoising auto-encoder is adopted, random noise is added during the training process, the actual application scene is more approached, and a model obtained from training has higher generalization capacity.
Owner:SHANGHAI JIAO TONG UNIV
Who we serve
  • R&D Engineer
  • R&D Manager
  • IP Professional
Why Eureka
  • Industry Leading Data Capabilities
  • Powerful AI technology
  • Patent DNA Extraction
Social media
Try Eureka
PatSnap group products