323 results about "Vector space model" patented technology

The vector space model, or term vector model, is an algebraic model for representing text documents (and objects in general) as vectors of identifiers such as index terms. It is used in information filtering, information retrieval, indexing, and relevancy ranking. Its first use was in the SMART Information Retrieval System.
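
As a rough illustration of the model described above, the following Python sketch represents texts as sparse term-frequency vectors and ranks documents against a query by cosine similarity. The whitespace tokenizer, the function names (tf_vector, cosine, rank), and the sample documents are assumptions made for this example; they are not taken from the SMART system or from any patent listed here.

```python
import math
from collections import Counter


def tf_vector(text):
    """Map a text to a sparse term-frequency vector (term -> count)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return document indices ordered from most to least similar to the query."""
    q = tf_vector(query)
    scores = [cosine(q, tf_vector(d)) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


# Hypothetical sample data
docs = [
    "vector space model for text retrieval",
    "three dimensional camera for vehicles",
    "retrieval of text documents",
]
ranking = rank("text retrieval", docs)
```

Raw term counts are used here for brevity; practical systems usually replace them with a weighting scheme such as TF-IDF.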

Part-of-speech tagging using latent analogy

Methods and apparatuses to assign part-of-speech tags to words are described. An input sequence of words is received. A global fabric of a corpus having training sequences of words may be analyzed in a vector space. A global semantic information associated with the input sequence of words may be extracted based on the analyzing. A part-of-speech tag may be assigned to a word of the input sequence based on POS tags from pertinent words in relevant training sequences identified using the global semantic information. The input sequence may be mapped into a vector space. A neighborhood associated with the input sequence may be formed in the vector space wherein the neighborhood represents one or more training sequences that are globally relevant to the input sequence.
Owner:APPLE INC

System and method for quantitatively representing data objects in vector space

A system and method for browsing, retrieving, and recommending information from a collection uses multi-modal features of the documents in the collection, as well as an analysis of users' prior browsing and retrieval behavior. The system and method are premised on various disclosed methods for quantitatively representing documents in a document collection as vectors in multi-dimensional vector spaces, quantitatively determining similarity between documents, and clustering documents according to those similarities. The system and method also rely on methods for quantitatively representing users in a user population, quantitatively determining similarity between users, clustering users according to those similarities, and visually representing clusters of users by analogy to clusters of documents.
Owner:GOOGLE LLC

Word sense disambiguation

Systems and methods for word sense disambiguation, including discerning one or more senses or occurrences, distinguishing between senses or occurrences, and determining a meaning for a sense or occurrence of a subject term. In a collection of documents containing terms and a reference collection containing at least one meaning associated with a term, the method includes forming a vector space representation of terms and documents. In some embodiments, the vector space is a latent semantic index vector space. In some embodiments, occurrences are clustered to discern or distinguish a sense of a term. In preferred embodiments, meaning of a sense or occurrence is assigned based on either correlation with an external reference source, or proximity to a reference source that has been indexed into the space.
Owner:RELATIVITY ODA LLC

Suffix Tree Similarity Measure for Document Clustering

The subject innovation provides for systems and methods to facilitate weighted suffix tree clustering. Conventional suffix tree cluster models can be augmented by incorporating quality measures to facilitate improved performance. Further the quality measure can be employed in determining cluster labels that show improvements in accuracy over conventional means. Additionally “stopnodes” can be defined to facilitate traversing suffix tree models efficiently. Quality measurements can be determined based in part on weighting factors applied to terms in a vector model, said terms being mapped from a suffix tree model.
Owner:CITY UNIVERSITY OF HONG KONG

Three-folded webpage text content recognition and filtering method based on the Chinese punctuation

A three-stage method for recognizing and filtering web page text content based on Chinese punctuation. To address the low filtering rates of existing URL-based and keyword-based filtering methods, the invention combines URL filtering, keyword filtering, and text content filtering based on a vector space representation of the page text. URL filtering with black and white lists, together with the statistical characteristics of Chinese punctuation, is used to remove navigation information, related links, advertising links, copyright information, and other noise from the page and to extract the body text. The text is then represented in a vector space model, and its class is decided by computing the cosine of the angle between the text's feature vector and the feature vector of an undesirable-information template and comparing it with a set threshold. The invention can be widely used for filtering undesirable information on networks and for personalized website information services.
Owner:DALIAN UNIV OF TECH

Real time environment model generation system

A vehicle environment monitoring system is provided that is based on a three-dimensional vector model. The three-dimensional vector model of the vehicle's environment is generated on the basis of the image data captured by at least one three-dimensional camera. Out of the image data, particular data are extracted for generating the three-dimensional vector model in order to reduce the data volume. For data extraction, a data extraction algorithm is applied that is determined in accordance with at least one parameter that relates to the situation of the vehicle. Therefore, targeted data extraction is performed for generating a three-dimensional model that is particularly adapted for an application that is desired in the current vehicle situation. The applications of the vector model include driver assistance, external monitoring and vehicle control, as well as recording in an event data recorder. In one implementation, a sequence of three-dimensional vector models, representing a three-dimensional space-and-time model, is generated.
Owner:HARMAN BECKER AUTOMOTIVE SYST

Method for mining data in construction regulation field based on associative regulation mining technology

The invention discloses a method for mining data in the construction regulation field based on association rule mining technology. The method comprises four steps: 1. a vector space model of the construction regulation text is generated; 2. a vector space model of the construction regulation data is generated; 3. the construction regulation data vector space model is transposed to generate a construction regulation data feature vector space model, that is, a frequent feature set; and 4. the association degree of the construction regulation data is calculated and an association rule is output. The method can mine data in the construction regulation field, provides a higher recall ratio for users querying data, recommends associated query contents, and solves the technical problem that existing association analysis technologies cannot carry out association analysis on outlier data.
Owner:XI'AN UNIVERSITY OF ARCHITECTURE AND TECHNOLOGY

Network hot event detection method based on text classification and clustering analysis

The invention discloses a network hot event detection method based on text classification and clustering analysis. The method solves the problem that the efficiency and accuracy rate of the existing network hot event detection method based on clustering analysis need to be improved. The method comprises the steps that feature words are respectively selected for various classes of files through feature extraction and feature selection by utilizing a training corpus; each training text and test text are represented as vectors in all of the feature spaces by utilizing a vector space model method, and the weight of each dimension of the vectors is determined by utilizing a TF-IDF (term frequency-inverse document frequency) method, and then each test text is classified; the classified test texts in different classes are respectively subjected to clustering analysis, so the hot cluster of each class is obtained, the feature word representing the hot event is obtained through further analysis, and then the word property and other aspects of each feature word are analyzed; the description of each hot event is generated by utilizing relevant language knowledge and necessary linguistic organization. With the network hot event detection method based on text classification and clustering analysis, the detection efficiency and accuracy rate of hot events can be effectively improved.
Owner:NANJING UNIV OF POSTS & TELECOMM
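
The abstract above represents each training and test text as a TF-IDF weighted vector and then classifies the test texts. A minimal Python sketch of that idea follows. The smoothed IDF formula, the nearest-centroid classifier, and all function names and sample texts are assumptions chosen for illustration; the abstract does not say which IDF variant or classifier the invention uses.

```python
import math
from collections import Counter, defaultdict


def idf_table(token_lists):
    """Smoothed inverse document frequency for every term in the training corpus."""
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(tokens, idf):
    """Sparse TF-IDF vector; terms unseen during training are dropped."""
    tf = Counter(tokens)
    return {t: (c / len(tokens)) * idf[t] for t, c in tf.items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def train_centroids(labelled, idf):
    """Average the TF-IDF vectors of each class (assumed classifier: nearest centroid)."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter()
    for tokens, label in labelled:
        counts[label] += 1
        for t, w in tfidf(tokens, idf).items():
            sums[label][t] += w
    return {label: {t: w / counts[label] for t, w in vec.items()}
            for label, vec in sums.items()}


def classify(tokens, centroids, idf):
    """Assign the class whose centroid has the highest cosine similarity."""
    vec = tfidf(tokens, idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical, already tokenized training texts
training = [
    ("stock market falls".split(), "finance"),
    ("bank raises interest rates".split(), "finance"),
    ("team wins the match".split(), "sports"),
    ("player scores in final match".split(), "sports"),
]
idf = idf_table([tokens for tokens, _ in training])
centroids = train_centroids(training, idf)
label = classify("market interest rates".split(), centroids, idf)
```

The per-class clustering and hot event description steps of the abstract are outside the scope of this sketch.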

Multi-model fused short text classification method

The invention discloses a multi-model fused short text classification method. The multi-model fused short text classification method comprises a learning method and a classification method. The learning method comprises the following steps: carrying out word segmentation and filtration on short text training data to obtain a word set; calculating the IDF value of each word; calculating the TFIDF values of all the words and constructing a text vector VSM; and carrying out text learning on the basis of a vector space model, and constructing an ontology tree model, a keyword overlapping model, a naive Bayesian model and a support vector machine model. The classification method comprises the following steps: carrying out word segmentation and filtration on a to-be-classified short text; generating a text vector on the basis of the support vector machine model; respectively classifying by using the ontology tree model, the keyword overlapping model, the naive Bayesian model and the support vector machine model to obtain single model classification results; and fusing the single model classification results to obtain a final classification result. According to the method disclosed in the invention, multiple classification modes are fused and the short text classification correctness is improved.
Owner:XI AN JIAOTONG UNIV

Short text clustering and hotspot theme extraction method based on TF-IDF characteristics

The invention discloses a short text clustering and hotspot theme extraction method based on TF-IDF characteristics. The method includes the following steps of firstly, conducting Chinese word segmentation on short text samples, and screening out high-frequency vocabularies; secondly, automatically conducting TF-IDF characteristic extraction and generation on each short text sample on the basis of the screened-out high-frequency vocabularies, and establishing a whole sample characteristic vector spatial model; thirdly, reducing spatial dimensions of the samples through singular value decomposition (SVD); fourthly, clustering the short text samples through the combination of the cosine law and the k-means method, and finding potential hotspot themes in each cluster through a visual analysis means. By means of the method, the characteristic selection problem, the sample control dimension reduction problem and the clustering problem of short texts can be well solved; meanwhile, visual analysis on the clustering result can be achieved by means of the visual technology; finally, extraction and analysis are conducted on hotspot themes.
Owner:TIANJIN UNIV
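
The fourth step of the abstract above combines cosine similarity with k-means. One common way to do this is to normalize every vector to unit length and assign each point to the centroid with the largest dot product, which is sketched below in Python. The function names, the deterministic initialization from the first k points, and the sample vectors are assumptions for this example, and the SVD dimension reduction step is omitted; the abstract does not specify how the invention implements either step.

```python
import math


def normalize(v):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine_kmeans(vectors, k, max_iter=20):
    """Cluster vectors by cosine similarity. Assumes len(vectors) >= k."""
    points = [normalize(v) for v in vectors]
    # Assumed initialization: the first k points serve as starting centroids.
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # For unit vectors the dot product equals the cosine similarity.
        new_assignment = [
            max(range(k), key=lambda c: dot(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[c] = normalize(mean)
    return assignment


# Hypothetical feature vectors (for example, TF-IDF rows after dimension reduction)
vectors = [[1, 0, 0], [0, 1, 0], [0.9, 0.1, 0], [0.1, 0.9, 0]]
clusters = cosine_kmeans(vectors, k=2)
```

With random initialization the cluster numbering would vary between runs; the fixed initialization here only keeps the example reproducible.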

Document similarity calculating method and similar document whole-network retrieval tracking method

Inactive; CN106095737A; Accurate processing of similarity; Accurate analysis and statistics; Semantic analysis; Text database indexing; Cosine similarity; Two-vector
The invention relates to a document similarity calculating method and a similar document whole-network retrieval tracking method. The document similarity calculating method includes: S01, performing word segmentation on an original document and a target document to acquire their respective word sets; S02, performing preprocessing and feature weighting: using TF-IDF to calculate the weight of each word, extracting core keywords, using Word2vec to mine the degree of correlation between different words in the documents, and performing semantic analysis on each document; S03, adopting a vector space model and a cosine similarity algorithm: using the cosine of the angle between two vectors in the vector space to evaluate the similarity of the documents, wherein the cosine value is between 0 and 1, and the greater the cosine value, the higher the similarity of the documents. The methods are suitable for news information redistribution tracking and transmissibility statistics.
Owner:HANGZHOU FANEWS TECH

Knowledge graph construction method based on deep learning

The invention relates to a knowledge graph construction method based on deep learning. The method comprises the steps that a target text statement is given, and a two-way long short term memory recurrent neural network model and a conditional random field model are used to recognize target entities in the target text statement; a context sensitive two-way long short term memory recurrent neural network model and a feedforward neural network model are used to extract the relation between every two target entities; a vector space model is used to normalize the target entities, and the normalized target entities are mapped to a concept; and a knowledge graph is constructed according to the target entities, the relation between the target entities and the concept. According to the method, the deep learning technology is applied to construction of the knowledge graph, entity recognition models of a two-way recurrent neural network and a conditional random field are adopted to recognize the target entities in the target text statement, feature engineering in the entity recognition process and the relation extraction process is reduced, the burden and trouble brought by manual design and feature adjustment are relieved, and knowledge in a text is mined precisely.
Owner:WUHAN UNIV

Multi-document summarization method based on text segmentation

The invention belongs to the technical field of multi-document summarization and provides a multi-document summarization method based on text segmentation, which comprises the following steps of: using HowNet to obtain a concept, building a concept vector space model, conducting text segmentation by adopting an improved DotPlotting model and a sentence concept vector space, calculating sentence weight by using the built concept vector space model, generating a summary according to the sentence weight, the text segmentation and the similarity situation, and evaluating the generated summary by using the ROUGE-N evaluation method and using F_Score as an evaluation index. According to the result, the multi-document summarization by using a text segmentation technique is effective, relevant documents provided by users can be gathered to form a summary by adopting the multi-document summarization method, the summary is displayed to the users in a proper way, the information acquisition efficiency is greatly improved, the practicability is high and the popularization and application values are greater.
Owner:广西超宏科技有限公司

Quick multi-keyword semantic sorting search method for protecting data privacy in cloud computing

The invention relates to a quick multi-keyword semantic sorting search method for protecting data privacy in cloud computing. A domain weighted scoring concept is introduced in document scoring, and keywords in different domains such as a title, an abstract and the like are endowed with different weights to be distinguished; a retrieval keyword is subjected to semantic expansion, semantic similarity is calculated, a three-factor sorting method is designed by combining the semantic similarity, the domain weighted scoring and a correlation score, and a cloud server can perform accurate sorting on search results and return a sorting result to a search user; and for the defect of low query efficiency of a searchable encryption scheme, a vector block segmentation marking matching algorithm is designed, and a document vector created by a vector space model is subjected to block segmentation to generate a marking vector with a relatively small dimension number. According to the method, the query efficiency can be improved, the index creation time can be shortened, and semantic ciphertext keyword search is realized.
Owner:FUZHOU UNIV

Text influenced molecular indexing system and computer-implemented and/or computer-assisted method for same

An extension of the vector space model for computing chemical similarity using textual and chemical descriptors is described. The method uses a chemical and/or textual description of a molecule/chemical and decomposes a molecule/chemical descriptor matrix by a suitable technique, such as singular value decomposition, to create a low-dimensional representation of the original descriptor space. Similarities between a user probe and the textual and/or chemical descriptors are then computed and ranked.
Owner:AXONTOLOGIC

Method and device for obtaining synonym

The invention relates to a method and a device for obtaining synonyms. The method comprises: obtaining a text set and performing word segmentation on the text set to generate a first term set; filtering null words out of the first term set using a stop word list to generate a second term set; performing edit distance processing on any two words in the second term set to generate a first synonym set; establishing a vector space model on the words in the first term set; according to the model, obtaining the space vectors of each pair of synonyms, calculating the cosine similarity of each pair, and applying a cosine threshold filter to each pair to generate a second synonym set; performing part-of-speech tagging on the words in the second synonym set to generate a third synonym set; and processing the words in the third synonym set with a unary model to obtain the synonyms. The retrieved synonyms are thus more accurate and free of ambiguous words and null words, so that web pages related to the synonyms can be retrieved intelligently.
Owner:ADVANCED NEW TECH CO LTD
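
The synonym abstract above first narrows the word pairs by edit distance before applying a cosine threshold. The Python sketch below illustrates only that first filter, using the standard Levenshtein distance. The threshold max_distance, the function names, and the sample words are assumptions for this example; the abstract does not state which edit distance variant or threshold the invention uses.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def candidate_pairs(words, max_distance=2):
    """Word pairs whose edit distance does not exceed the assumed threshold."""
    unique = sorted(set(words))
    return [(a, b) for a, b in combinations(unique, 2)
            if edit_distance(a, b) <= max_distance]


# Hypothetical term set after stop word filtering
pairs = candidate_pairs(["color", "colour", "vector"])
```

In the pipeline described by the abstract, the surviving pairs would then be compared by the cosine similarity of their space vectors and filtered again.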

Text similarity matching method on basis of vector space model

The invention discloses a text similarity matching method on the basis of a vector space model. The text similarity matching method includes extracting keywords of texts, clustering all the keywords and generating a keyword concept tree; and computing the similarity of the texts according to the created keyword concept tree of the keywords in the texts to be translated, and acquiring texts in a translation depository according to the similarity. The texts in the translation depository are matched with the texts to be translated. According to the technical scheme, the text similarity matching method has the advantages that relations among the texts can be relatively accurately reflected, so that the similarity of the texts can be sufficiently reflected.
Owner:IOL WUHAN INFORMATION TECH CO LTD

Text feature extraction method based on categorical distribution probability

The invention discloses a text feature extraction method based on categorical distribution probability. The text feature extraction method based on the categorical distribution probability extracts text feature words by means of the manner according to which categorical distribution difference estimation is carried out on words of a text to be categorized. Mean square error values of probability distribution of each word at different categories are worked out by means of category word frequency probability of the words. A certain number of words with high mean square error values are extracted to form a final feature set. The obtained feature set is used as feature words of a text categorizing task to build a vector space model in practical application. A designated categorizer is used for training and obtaining a final category model to categorize the text to be categorized. According to the text feature extraction method based on the categorical distribution probability, category distribution of the words is accurately measured in a probability statistics manner. Category values of the words are estimated in a mean square error manner so as to accurately select features of the text. As far as the text categorizing task is concerned, a text categorizing effect of balanced linguistic data and non-balanced linguistic data is obviously improved.
Owner:EAST CHINA NORMAL UNIV
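
The feature extraction abstract above scores each word by how unevenly its probability is distributed across categories and keeps the highest-scoring words. The Python sketch below is one possible reading of that description: it estimates a word's probability in each class as its share of that class's tokens and scores the word by the mean squared deviation of those probabilities from their mean. This interpretation, the function names, and the sample data are assumptions; the abstract does not give the exact formula.

```python
from collections import Counter


def class_word_probs(labelled_docs):
    """Estimate P(word | class) as the word's share of all tokens in the class."""
    counts = {}
    for tokens, label in labelled_docs:
        counts.setdefault(label, Counter()).update(tokens)
    probs = {}
    for label, c in counts.items():
        total = sum(c.values())
        probs[label] = {w: n / total for w, n in c.items()}
    return probs


def spread_scores(probs):
    """Mean squared deviation of each word's class probabilities from their mean."""
    labels = list(probs)
    vocab = set().union(*(probs[label].keys() for label in labels))
    scores = {}
    for w in vocab:
        p = [probs[label].get(w, 0.0) for label in labels]
        mean = sum(p) / len(p)
        scores[w] = sum((x - mean) ** 2 for x in p) / len(p)
    return scores


def select_features(labelled_docs, k):
    """Keep the k words with the highest spread; ties are broken alphabetically."""
    scores = spread_scores(class_word_probs(labelled_docs))
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Hypothetical tokenized training texts
docs = [
    ("the goal the match".split(), "sports"),
    ("the bank the loan".split(), "finance"),
]
features = select_features(docs, 4)
```

A word such as "the", which is equally frequent in every class, receives a score of zero and is ranked last, while class-specific words are kept as features for the vector space model.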

Chinese text parallel data mining method based on hierarchy

The invention relates to a hierarchy-based parallel data mining method for Chinese texts, comprising: step 1: establishing a vector space model of the Chinese texts: performing word segmentation on the entire Chinese text set to obtain the segmented form of each text and a feature term set containing all terms in the text set with duplicates removed, then using the feature term set to compute the term frequency-inverse document frequency (TFIDF) of each text, and establishing the text vector space model according to the TFIDF; step 2: performing dimension reduction on the feature item vectors of the text vector space model; and step 3: clustering the texts using the hierarchy-based DCURE algorithm. The method segments Chinese texts efficiently and with high accuracy, requires no input parameters such as neighborhood radius for the clustering process, can mine irregular clusters and is insensitive to noise, employs distributed computing, is highly efficient in mining massive text collections, and improves the calculation speed of feature weights.
Owner:UESTC COMSYS INFORMATION

LDA (latent dirichlet allocation) and VSM (vector space model) based similar Chinese herb literature recommendation method

Active; CN103823848A; Fast and efficient similar recommendation; Robust; Special data processing applications; Lexical item; Vector space model
The invention discloses an LDA (latent dirichlet allocation) and VSM (vector space model) based similar Chinese herb literature recommendation method. The method includes: adopting an IKAnalyzer to perform word segmentation on topics and summary information of literature on the basis of a terminological dictionary for Chinese herbs, constructing a vector space, performing dimensionality reduction on the vector space, constructing a semantic dictionary, numbering all lexical items in the dictionary in sequence, performing vectorization through each document on the basis of the semantic dictionary, constructing term vectors of each document, utilizing LDA and a Gibbs sampling algorithm to perform training to obtain probability distribution of each document on themes, then computing a value of similarity between every two documents by the aid of KL divergence, computing cosine similarity of the term vectors of each document on the basis of term frequency, performing joint weighting on the two kinds of similarities prior to performing similarity sorting, and then making recommendation. By the method, the literature, similar both in content and theme, in the Chinese herb literature can be recommended to users, and recommendation results are closer to user requirements.
Owner:ZHEJIANG UNIV
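
The recommendation abstract above combines two similarities: one from KL divergence between the documents' LDA topic distributions and one from the cosine of their term vectors. The Python sketch below shows one way such a joint weighting could look. It assumes the topic distributions have already been produced by LDA training; the symmetrized KL divergence, its mapping to a similarity via 1 / (1 + d), the weight alpha, and all names and sample values are assumptions, since the abstract does not give the weighting formula.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def topic_similarity(p, q):
    """Assumed mapping: symmetrized KL divergence d turned into 1 / (1 + d)."""
    d = 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
    return 1.0 / (1.0 + d)


def cosine(u, v):
    """Cosine similarity between two dense term-frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def combined_similarity(topics_a, topics_b, terms_a, terms_b, alpha=0.5):
    """Weighted sum of topic similarity and term-vector cosine (alpha is assumed)."""
    return (alpha * topic_similarity(topics_a, topics_b)
            + (1.0 - alpha) * cosine(terms_a, terms_b))


# Hypothetical topic distributions and term-frequency vectors for three documents
topics = {"a": [0.7, 0.2, 0.1], "b": [0.6, 0.3, 0.1], "c": [0.1, 0.1, 0.8]}
terms = {"a": [2, 1, 0, 0], "b": [1, 1, 0, 0], "c": [0, 0, 3, 1]}

sim_ab = combined_similarity(topics["a"], topics["b"], terms["a"], terms["b"])
sim_ac = combined_similarity(topics["a"], topics["c"], terms["a"], terms["c"])
```

Sorting candidate documents by the combined score in descending order would give the recommendation list described in the abstract.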

LDA fusion model and multilayer clustering-based news topic detection method

The invention belongs to the field of data mining, natural language processing and information retrieval, and provides a news topic detection method. For the defect of a TF-IDF-based vector space algorithm in semantics and the defects of time complexity and accuracy of textual level clustering, feature extraction, representation modeling, similarity calculation and quick and accurate text clustering methods for a large amount of news texts are improved. The LDA fusion model and multilayer clustering-based news topic detection method comprises the following steps of 1: building a similarity model by using a vector space model (VSM); 2: finally obtaining accurate parameter settings; 3: organically fusing two text models; 4: judging whether a topic is a new topic or not; 5: calculating the similarity until all documents are clustered; and 6: adding an ISP&AH clustering algorithm of AHC based on the step 5. The method is mainly applied to the design and manufacturing occasions.
Owner:TIANJIN UNIV

Topic information acquisition method based on network topology

The invention relates to a topic information acquisition method based on network topology. An initial web page set is obtained from a search engine and is expressed as a vector set through purification, word segmentation, and removal of stop words, and a vector space model is used to calculate text similarity. The network structure is first used to perform link analysis on the extracted URLs, the links are filtered through the directory hierarchies of the URLs, and then the weights of the URLs are modified according to the scale-free property of the network to perform the prior absorption selection. At the same time, unrelated topic areas are fed back, and the lengths of the buffer areas of unrelated URLs are set through the distance between the URLs and a seed set. The heat of the acquired topics is calculated to select one topic to obtain a new reply.
Owner:BEIJING JIAOTONG UNIV

Natural language intention understanding method in man-machine interaction

The invention discloses a natural language intention understanding method in man-machine interaction. The method comprises the steps that intention labeling is conducted on text natural language instruction data, and each sentence of text is labeled with an intention; the text is vectorized, on the basis of a traditional text vector space model, information of parts of speech of a text instruction is fused, and a new text representation model, namely, a vector space model of the parts of speech is defined; a stackable denoising auto-encoder is applied to natural language instruction intention understanding, and the high-order characteristic of the instruction is extracted; at last, training and prediction are conducted through a support vector machine, and intention understanding of the natural language instruction is achieved. According to the natural language intention understanding method in man-machine interaction, more semantic information in the natural language instruction can be excavated, the recognition rate of intention understanding is increased, the stackable denoising auto-encoder is adopted, random noise is added during the training process, the actual application scene is more approached, and a model obtained from training has higher generalization capacity.
Owner:SHANGHAI JIAO TONG UNIV

Information recommendation method and system combining image content and keywords

The invention discloses an information recommendation method and system combining image content and keywords. The information recommendation method combining image content and keywords comprises the steps that keyword information of images in an image library and image content information containing color features and textural features are extracted, the keyword information and the image content information are expressed as a vector space model, and a corresponding keyword information matrix and an image content information matrix are obtained; the keyword information matrix and the image content information matrix are processed by utilizing a linear sparse model, and a similarity chart is obtained by calculating the similarity among the images; an image similar to a target image is inquired from the similarity chart according to the target image searched by a user, and an original recommendation list is formed; the original recommendation list is arranged to obtain a final recommendation list, and the final recommendation list is displayed.
Owner:TCL CORPORATION

Job recommending method

The invention discloses a job recommending method, and belongs to the technical field of recommending systems. The job recommending method has the advantages that the Matthew effect is avoided, the problem of cold start is solved, and the populations are well utilized to realize personalized recommending. The job recommending method comprises the following steps of obtaining user data and job data; establishing a user preference vector space model and a job vector space model; according to the user preference vector space model and the job vector space model, calculating multi-domain scoring values based on contents, obtaining first scoring values of jobs, and sequencing, so as to obtain a job set; when one job is submitted and belongs to the job set, calculating the scoring values of the corresponding job based on the similarity of user background information according to the user preference vector space model and the job data, and obtaining second scoring values of the corresponding job; according to the first scoring values and the second scoring values, obtaining the mixed scoring values of the corresponding job, and sequencing, so as to obtain a recommending list.
Owner:COMMUNICATION UNIVERSITY OF CHINA

Implementation method for directionally running internet advertisements

The invention relates to an implementation method for directionally running internet advertisements, comprising the following steps: step S110 of clustering all target websites according to a theme at first, calculating similarity among web pages with a vector space model according to a clustering algorithm, and clustering web pages according to similarity; step S120 of marking a theme for each class of websites, then judging and counting a crowd attribute for web pages under each theme, wherein the crowd attribute comprises gender, age, income, identity, education, interest and family status; step S130 of analyzing to-be-run advertisements, wherein analyzed contents comprise judgment for adjustment type and analysis for crowd attribute of run advertisement; step S140 of matching web page advertisements according to the crowd attribute analyzed by advertisements and web pages, and running advertisements on corresponding web pages. The implementation method for directionally running internet advertisements provided by the invention is capable of increasing accuracy and is characterized by good practical applicability.
Owner:浙江盘兴数智科技股份有限公司