785 results about "Topic model" patented technology

In machine learning and natural language processing, a topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. Topic modeling is a frequently used text-mining tool for discovery of hidden semantic structures in a text body. Intuitively, given that a document is about a particular topic, one would expect particular words to appear in the document more or less frequently: "dog" and "bone" will appear more often in documents about dogs, "cat" and "meow" will appear in documents about cats, and "the" and "is" will appear equally in both. A document typically concerns multiple topics in different proportions; thus, in a document that is 10% about cats and 90% about dogs, there would probably be about 9 times more dog words than cat words. The "topics" produced by topic modeling techniques are clusters of similar words. A topic model captures this intuition in a mathematical framework, which allows examining a set of documents and discovering, based on the statistics of the words in each, what the topics might be and what each document's balance of topics is.
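The proportion intuition above can be checked with a small calculation. The sketch below assumes two hypothetical topics with made-up word probabilities (none of these numbers come from a real corpus) and mixes them for a document that is 90% about dogs and 10% about cats.

```python
# Hypothetical topic-word distributions (illustrative values only).
TOPICS = {
    "dogs": {"dog": 0.4, "bone": 0.3, "the": 0.2, "is": 0.1},
    "cats": {"cat": 0.4, "meow": 0.3, "the": 0.2, "is": 0.1},
}


def expected_word_probs(topic_mix):
    """Mix the topic-word distributions according to a document's topic proportions."""
    probs = {}
    for topic, weight in topic_mix.items():
        for word, p in TOPICS[topic].items():
            probs[word] = probs.get(word, 0.0) + weight * p
    return probs


# A document that is 90% about dogs and 10% about cats.
doc_probs = expected_word_probs({"dogs": 0.9, "cats": 0.1})
dog_mass = doc_probs["dog"] + doc_probs["bone"]   # probability of topic-specific dog words
cat_mass = doc_probs["cat"] + doc_probs["meow"]   # probability of topic-specific cat words
ratio = dog_mass / cat_mass                       # about 9 under these assumed values
```

Under these assumed values the dog-specific words are about nine times as likely as the cat-specific words, while "the" and "is" keep the same probability whatever the mix.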

Method and system for identifying a key influencer in social media utilizing topic modeling and social diffusion analysis

A system and method for identifying a key influencer in a social media environment for enterprise marketing utilizing topic modeling and social diffusion analysis. A user interest profile can be generated by analyzing historical data stored in a database utilizing topic modeling. A social graph can be generated and an influence measuring process based on the social graph data can be performed utilizing a static diffusion model and a dynamic diffusion model to calculate a set of key influencers. The dynamic diffusion model considers time stamp information to assess an impact of each user communication on the growth of a conversation within a time period. The key influencer can be identified in a specific topic area and a number of total users that can be reached via the influencer within a specific time window can be predicted.
Owner:XEROX CORP

Domain entity disambiguation method for fusing word vectors and topic model

The invention relates to a domain entity disambiguation method for fusing word vectors and a topic model, and belongs to the technical field of natural language processing and deep learning. The method comprises the steps of obtaining candidate entity sets of to-be-disambiguated entities; obtaining vector forms of the to-be-disambiguated entities and candidate entities, obtaining categorical referents of the to-be-disambiguated entities in combination with a hyponymy relation domain knowledge base, and performing context similarity and categorical referent similarity calculation; performing word vector training on documents in different topic classifications by utilizing the LDA topic model and a Skip-gram word vector model, obtaining word vector representations of different meanings of a polysemous word, extracting a topic domain keyword of a text by using a K-Means algorithm, and performing domain topic keyword similarity calculation; and finally, fusing three feature similarities, and taking the candidate entity with the highest similarity as a final target entity. The method is superior to a conventional disambiguation method and can well meet the demands of actual applications.
Owner:KUNMING UNIV OF SCI & TECH

Garment fit portrayal system and method

An on-line garment fit portrayal system configured to operate on a specialized server linked over the internet or any network using standard web services to at least one web-enabled player device capable of common image format display is described. The system comprises a server-side garment model storage capability containing one or more garments, with parameters for each garment chosen from the groupings that include a garment piece parts list, piece spatial properties, piece mechanical parameters, piece optical parameters, and assembly information; a server-side modeler mechanism configured for generating a three-dimensional model of a subject's body from individual body data, the model being represented by body data stored in a body model storage capability; a server-side simulator mechanism operatively coupled with the garment model storage capability and the body model storage capability for simulating a three-dimensional form fit of a garment represented in the garment model storage capability onto a body represented in the body model storage capability, the simulator mechanism producing a portrayal subject model; and a server-side rendering mechanism operatively coupled with the simulator mechanism for portraying a perspective view on any web-enabled device's display screen of the portrayal subject model representing a three-dimensional form fit of the garment on the subject's body.
Owner:EMBODEE PR LLC

Cross-modal subject correlation modeling method based on deep learning

The invention belongs to the technical field of cross-media correlation learning, and particularly relates to a cross-modal subject correlation modeling method based on deep learning. The method includes two main algorithms: multi-modal file expression based on deep vocabularies, and correlation subject model modeling fusing cross-modal subject correlation learning. A deep learning technology is utilized for constructing deep semantic vocabularies and deep vision vocabularies to describe a semantic description part and an image part in a multi-modal file. Based on multi-modal file expression, a cross-modal correlation subject model is constructed to model a whole multi-modal file set, so that the generation process of the multi-modal file and the correlation between different modals are described. The accuracy is high, and adaptability is high. The cross-modal subject correlation modeling method has important meaning for efficient cross-media information retrieval in consideration of multi-modal semantic information on the basis of the large-scale multi-modal file (a text and an image), can improve retrieval correlation and promote user experience, and has great application value in the field of cross-media information retrieval.
Owner:FUDAN UNIV

Community-based author and academic paper recommending system and recommending method

The invention relates to a community-based author and academic paper recommending system and a recommending method. A double-layer citation network consisting of an author layer and an academic paper layer is formed by utilizing the citation relation between authors and academic papers and the community information, then a user interest model is established according to a historic behavior record of the user and the academic paper set read by the user, finally the user demand is analyzed according to the obtained double-layer citation network and the user interest model, and the author and academic paper thereof can be recommended to the user. The system is provided with an academic paper capturing module, an academic paper preprocessing module, a double-layer citation network establishing module, a user interest model establishing module and an individualized academic paper recommending module as well as a database. By adopting the recommending system and recommending method, not only can the correlation of the study content among users be used for establishing an author community through a topic model, but also multiple attribute values of the to-be-recommended author and academic paper inside the community can be calculated, and the weakness that the calculation of the existing recommending algorithm is large can be improved; and meanwhile, multiple attribute values of the author and academic paper can be simultaneously calculated, so that the recommendation result is more diversified, and the user requirement can be better met.
Owner:BEIJING UNIV OF POSTS & TELECOMM

Microblog user interest recognizing method based on text mining

The invention discloses a microblog user interest recognizing method based on text mining, and belongs to the field of text mining and natural language processing. The method includes the steps of collecting the newest topical microblog text data of a microblog text set and microblog text data of a designated user, standardizing the collected microblog text data, recognizing the newest microblog words and renewing a new word dictionary for the standardized topical microblog text data through the microblog new word recognition method, conducting Chinese character word separation on the standardized microblog text data of the designated user through the new word dictionary word separation method to achieve text vector expression, clustering the microblog text data, expressed through text vectors, of the designated user, recombining original microblog text data, extracting new text set features through a topic model, presetting topic dictionaries, calculating the weight of each topic dictionary based on the new text set features to obtain the final topic, and enabling the final topic to serve as the microblog user interest recognition, thereby improving accuracy of feature extraction.
Owner:UNIV OF ELECTRONICS SCI & TECH OF CHINA

Question and answering (QA) system realization method based on deep learning and topic model

The invention discloses a question and answering (QA) system realization method based on deep learning and a topic model. The method comprises the steps of: S1, inputting a question sentence to the Twitter LDA topic model to obtain a topic type of the question sentence, extracting a corresponding topic word, and indicating the input question sentence and the topic word as word vectors; S2, inputting word vectors of the input question sentence to a recurrent neural network (RNN) for encoding to obtain an encoded hidden-layer state vector of the question sentence; S3, using a joint attention mechanism and combining local and global hybrid semantic vectors of the question sentence by a decoding recurrent neural network for decoding to generate words; S4, using a large-scale conversation corpus to train a deep-learning topic question and answering model based on an encoding-decoding framework; and S5, using the trained question and answering model to predict an answer to the input question sentence, and generating answers related to a question sentence topic. The method makes up for the lack of exogenous knowledge of question and answering models, and increases richness and diversity of answers.
Owner:SOUTH CHINA UNIV OF TECH

Text classification method

A text classification method comprises following steps: dividing the initial training text collection into a plurality of subsets including the text in the same category based on the category, extracting the corresponding probability topic model from each subset; generating new text to balance the categories of the subsets by the corresponding probability topic model; constructing a classifier based on the balance training text collection corresponding to plural subsets; and processing text classification by the classifier. The invention can improve the classification effect of the text classification method under the condition of data skew.
Owner:UNIV OF SCI & TECH OF CHINA
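The balancing step in the text classification entry above can be sketched as follows. This is a minimal illustration, not the patented method: a per-category unigram model stands in for the probability topic model, and the function names, `doc_len`, and the toy corpus are all hypothetical.

```python
import random
from collections import Counter


def fit_unigram(docs):
    """Fit a word-frequency model to one category (stand-in for a topic model)."""
    counts = Counter(word for doc in docs for word in doc)
    words = list(counts)
    return words, [counts[w] for w in words]


def balance(corpus, doc_len=5, seed=0):
    """Generate synthetic documents until every category matches the largest one."""
    rng = random.Random(seed)
    target = max(len(docs) for docs in corpus.values())
    balanced = {}
    for label, docs in corpus.items():
        words, weights = fit_unigram(docs)
        synthetic = [
            rng.choices(words, weights=weights, k=doc_len)
            for _ in range(target - len(docs))
        ]
        balanced[label] = docs + synthetic
    return balanced


# Toy skewed corpus: three sports documents, one finance document.
corpus = {
    "sports": [["goal", "match"], ["team", "goal"], ["match", "league"]],
    "finance": [["stock", "bank"]],
}
balanced = balance(corpus)
```

A classifier would then be trained on `balanced` rather than on the skewed original collection.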

Information throttle based on compliance with electronic communication rules

Active | US20160330084A1 | Communication speed is reduced | Decrease increase functionality | Special service provision for substation | Service provisioning | Social media | Electronic communication
Throttles electronic devices based on compliance with rules for electronic communications such as emails, texts, or postings on social media sites. Rules may for example prohibit certain topics, language, or behaviors such as online bullying. If the system detects a violation of the electronic communications rules, it may block access or reduce performance on one or more electronic devices as a consequence. In some cases, devices or selected apps or services may continue to function, but at a reduced level. Conversely, the system may provide rewards for conforming with the rules. Throttling of devices may also depend on other factors, such as homework completion, test results, grades, and environmental conditions. Machine learning techniques may be applied to determine when electronic communications may violate the rules. For example, probabilistic topic models may be applied to determine the topics of electronic communications, and to assess whether these topics violate the rules.
Owner:ETURI CORP

Personalized research direction recommending system and method based on themes

Active | CN103425799A | Unobscured | Overcome the defect of increasingly narrow field of view | Special data processing applications | Personalization | Field of view
The invention discloses a personalized research direction recommending system and method based on themes. Paper topics read by users and preference of the users for related paper topics can be obtained through the recommending system according to all the papers read by the users and according to the themes of the papers obtained when training is conducted through a theme model training module, therefore, the recommending system can recommend a new research direction for the users to widen the vision of the users. The innovation key of the personalized research direction recommending system and method based on the themes is to construct a three-layer graph model according to the relationship between the users and the papers and the relationship between the papers and the themes, to calculate preference values of the users for the themes according to the three-layer graph model, to obtain a user-theme preference weight matrix, and to calculate a similar user set between the users and other users based on the weight matrix. The preference degree of the themes which are not touched by the users is predicted according to the similarity value of the similar users in the similar user set and according to the preference values of the similar users for the themes, and the research direction, namely, the research theme, is recommended for the users according to the prediction result.
Owner:BEIJING UNIV OF POSTS & TELECOMM

Large scale unsupervised hierarchical document categorization using ontological guidance

A classification method includes constructing queries from category descriptors representing categories of a taxonomy of hierarchically organized categories. The query constructed for a category c includes a query component based on descriptors of the category c and at least one query component based on descriptors of an ancestor or descendant category of the category c. A documents database is queried using the constructed queries to retrieve pseudo-relevant documents. Language models for the categories of the taxonomy are extracted from the pseudo-relevant documents by inferring a hierarchical topic model representing the taxonomy. An input document is classified by optimizing mixture weights of a weighted combination of categories of the hierarchical topic model respective to the input document.
Owner:XEROX CORP

System And Method For Using Banded Topic Relevance And Time For Article Prioritization

A system and method for using banded topic relevance and time for article prioritization is provided. Articles of digital information and at least one social index are maintained. The social index includes topics that each relate to one or more of the articles. Fine-grained topic models matched to the digital information for each topic are retrieved. The articles are succinctly classified under the topics using the fine-grained topic models. Each of the articles is relevancy scored within the topic under which the article was classified. The articles are arranged into discrete bands by relevance score. The articles are temporally sorted within the discrete bands. The articles are presented within the discrete bands.
Owner:PALO ALTO RES CENT INC
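The banding and temporal sorting described in the article prioritization entry above can be sketched in a few lines. The band boundaries, the tuple layout, and the newest-first ordering are assumptions; the abstract only states that articles are arranged into discrete bands by relevance score and sorted temporally within them.

```python
from datetime import date


def band_and_sort(articles, band_edges):
    """Group articles into relevance bands, newest first within each band.

    articles: list of (title, relevance_score, published_date) tuples.
    band_edges: descending lower bounds of the bands, e.g. [0.8, 0.5, 0.0].
    """
    bands = [[] for _ in band_edges]
    for article in articles:
        for i, lower in enumerate(band_edges):
            if article[1] >= lower:
                bands[i].append(article)
                break  # an article belongs to the first band it qualifies for
    for band in bands:
        band.sort(key=lambda a: a[2], reverse=True)
    return bands


articles = [
    ("A", 0.91, date(2020, 1, 3)),
    ("B", 0.85, date(2020, 1, 9)),
    ("C", 0.60, date(2020, 1, 5)),
    ("D", 0.10, date(2020, 1, 8)),
]
bands = band_and_sort(articles, [0.8, 0.5, 0.0])
```

Within the top band, article B precedes A even though A has the higher score, because ordering inside a band depends only on time.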

Online-increment evolution topic model based automatic software classifying method

An online-increment evolution topic model based automatic software classifying method includes acquiring relevant software texts, grouping and preprocessing by a preset time slice; generating a probability model of an online evolution topic model, computing the number of the optimum topics according to project description texts grouped according to the time slice, and incrementally computing topic word distribution and topic text distribution of the project description texts within the current time slice; acquiring a text d of an unknown classifying topic, computing topic word distribution of n topics subordinative to the text d according to the topic word distribution and the topic text distribution, classifying the text d into corresponding topics, and automatically adding semantic tags to the topics based on the word list and word inquiry method, and finally completing classification of software projects. By the online-increment evolution topic model based automatic software classifying method, new topics appearing in open source communities can be found in time, software projects can be automatically classified, a software developer can search out required open source software projects according to software topics conveniently, and accordingly, software development efficiency is improved, and quality and assurance of the open source communities are improved.
Owner:NAT UNIV OF DEFENSE TECH

Online advertisement classified pushing method and system based on consumer behavior data analysis and classification technology

The present invention relates to an online advertisement classified pushing method and system based on a consumer behavior data analysis and classification technology. Compared with the prior art, the online advertisement classified pushing method and system overcome the defect that potential customers cannot be mined to carry out network online advertisement pushing. The online advertisement classified pushing method comprises the following steps of: carrying out data collection and preprocessing, i.e. collecting behavior data of consumers from online mobile terminals, establishing a data pool, carrying out preprocessing operation on the data in the data pool and providing data support for subsequent data analysis and modeling; aiming at the behavior data of the consumers, carrying out modeling, i.e. establishing a topic model facing the behavior data of the consumers so as to mine relations between the consumers and online advertisement categories as well as a purchase time period; and aiming at the consumers, carrying out effective classification and aiming at different consumer categories, pushing the corresponding types of advertisements online. According to the present invention, by collecting the behavior data of the consumers on various mobile terminals, carrying out analysis and modeling on behaviors of the consumers and mining consumption habits of different consumers, effective classification of the consumers is implemented.
Owner:JINJUAN MEDIA TECH CO LTD

System and method for using real-time keywords for targeting advertising in web search and social media

As the result of a keyword search, real time and social news stream Web search results are retrieved and analyzed to build a topic model of n-grams. The n-grams of the topic model are treated as ad-based keywords to determine advertisements to be displayed in conjunction with the real time Web search results. The real time Web search results and the advertisements are then presented or displayed for user consumption or review.
Owner:III HLDG 1

Personalized user tag modeling and recommendation method based on unified probability model

The invention discloses a personalized user tag modeling and recommendation method based on a unified probability model, comprising the following steps: S1, carrying out statistics on tagging behaviors of users on a social tagging site; S2, carrying out formal definition on questions tagged by the users; S3, establishing a topic model based on user tagging, wherein the topic model is a unified probabilistic model and called a UdT model; S4, establishing a frame of a tag recommendation system based on the UdT model, wherein the frame is recommended through learning the interest of the users and according to semantic information included in the interest; and S5, verifying the frame of the tag recommendation system. Experimental results show that by using the method of the invention, user interest can be effectively explored and the accuracy of tag recommendation can be improved.
Owner:TSINGHUA UNIV

Method for automatically detecting obvious object sequence in video based on learning

The invention discloses an automatic inspection method of a significant object sequence based on studying videos. In the method of the invention, static significant features are firstly calculated, and then dynamic significant features are calculated and self-adaptively combined with the static significant features to form a significant feature restriction; the space continuity of each image of frame is calculated; the time continuity of significant objects in neighboring images is calculated. The similarity between all possible significant objects is calculated by the method; a significant object sequence obtained through the former calculation is utilized to calculate the overall subject model and calculate corresponding energy contribution; the overall optimum solution is solved by dynamic planning so as to obtain the overall optimum significant object sequence; the iteration is continued for solving if a convergence condition is not satisfied, otherwise a rectangle box sequence is outputted as the optimum significant object sequence. The method of the invention can effectively settle the choosing of the static and dynamic significant features, the optimum integration of various restraint conditions and the high effective calculation of target sequence inspections.
Owner:XI AN JIAOTONG UNIV

Topic feature text keyword extraction method

The invention discloses a topic feature text keyword extraction method. Through the method, text keyword extraction results better than those of a traditional TF-IDF method can be obtained. According to the technical scheme, at a training stage, word segmentation, stop word removal, part-of-speech filtering and other preprocessing are performed on a training text, statistical analysis is performed on inverse document frequency of words, meanwhile a topic model method is utilized to learn and obtain a topic probability matrix of the words, normalization processing is performed, topic distribution entropy of the words is calculated according to the topic probability matrix of the words, global weights of the words are calculated in combination with the inverse document frequency and the topic distribution entropy, and global weight calculation results are output to a test stage; and after a test text is preprocessed, statistical analysis is performed on normalized term frequency of words in the test text, the normalized term frequency is combined with the global weight calculation results obtained at the training stage, comprehensive scores of the words are calculated and ordered, and a plurality of words with the highest scores in the score order are used as automatic keyword extraction results of the current test text.
Owner:10TH RES INST OF CETC
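The scoring pipeline in the keyword extraction entry above can be sketched as follows. The abstract does not give the formula that combines inverse document frequency with topic distribution entropy, so the combination below (IDF scaled down as entropy approaches its maximum) is an assumption, as are all function names and the toy numbers.

```python
import math


def topic_entropy(topic_probs):
    """Shannon entropy of a word's normalized distribution over topics."""
    total = sum(topic_probs)
    return -sum((p / total) * math.log(p / total) for p in topic_probs if p > 0)


def global_weight(idf, topic_probs):
    """Assumed combination: IDF is discounted as topic entropy grows.

    Requires at least two topics so that the maximum entropy is non-zero.
    """
    max_entropy = math.log(len(topic_probs))
    return idf * (1.0 - topic_entropy(topic_probs) / max_entropy)


def rank_keywords(term_freqs, global_weights):
    """Order words of a test text by normalized term frequency times global weight."""
    total = sum(term_freqs.values())
    scores = {
        word: (tf / total) * global_weights.get(word, 0.0)
        for word, tf in term_freqs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Training stage (toy values): "model" is concentrated in one topic,
# "system" is spread evenly over all three topics.
weights = {
    "model": global_weight(2.0, [0.97, 0.01, 0.02]),
    "system": global_weight(2.0, [1 / 3, 1 / 3, 1 / 3]),
}
# Test stage: both words occur equally often in the test text.
ranked = rank_keywords({"model": 3, "system": 3}, weights)
```

A word spread evenly across topics carries little topical information, so its weight falls toward zero under this assumed formula even when its IDF is high.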

Text-subject-model-based data processing method for commodity classification

The invention provides a text-subject-model-based data processing method for commodity classification. The method comprises the following steps of: importing Chinese and English vocabulary related to a service into a universal word library of a word segmentation system, and importing white name English words related to the service for brands and common commodity English; further expanding a stop word library of the word segmentation system; segmenting words for a description text part of a commodity, so that each commodity can have a bag of words which is not related to sequence; counting word segmentation results to acquire uncommon vocabulary with high frequency, and thus constructing a preferential word library; and appointing a general classification quantity, setting related parameters, executing quick Gibbs sampling, acquiring potential semantic association, comparing the latent semantic association with the preferential word library, the universal word library and the stop word library respectively, calculating comparison results to obtain the most possible classification of the commodity, and marking the classification by using the bags of words. In consideration of latent semantics, the influence of subjective factors of editorial staff is reduced, so that the commodity classification is accurate.
Owner:BAIDU COM TIMES TECH (BEIJING) CO LTD

Topic model based document keyword extraction method and system

The invention discloses a topic model based document keyword extraction method and system. The document keyword extraction method comprises the following steps of document information preprocessing, document structure graph construction, document topic distribution extraction, word weight extraction and keyword generation. The document keyword extraction system comprises the following modules: a document information preprocessing module, a document structure graph construction module, a document topic distribution extraction module, a word weight extraction module and a keyword generation module. According to the method and system, extracted keywords are more reasonable and related to a topic of a document more closely; and partial deficiencies in the keyword extraction field at present are overcome, a better document summarization effect is achieved, and a user can conveniently and quickly know an abstract of the document.
Owner:SOUTH CHINA UNIV OF TECH

Time window based LDA microblog topic trend detection method and apparatus

The invention discloses a time window based LDA microblog topic trend detection method and apparatus. The method comprises: extracting a topic word from a word set by utilizing an LDA model in each time window, and obtaining global topics; performing similarity computing on the global topics, and performing K-means clustering to obtain hot topics conforming to public opinion analysis; extracting feature words of each hot topic in each time window in sequence in combination with the hot topic through the LDA topic model; and in combination with results of the feature words, computing a popular value of the hot topic in each time window, and drawing a trend graph of the hot topic. The apparatus comprises a first acquisition module, a second acquisition module, an extraction module and a drawing module. According to the detection method and apparatus, the precision of microblog topic detection is improved, so that a trend index is more expressive, and a more accurate basis is provided for analyzing a hot topic trend.
Owner:TIANJIN UNIV

News recommendation method and topic characterization method based on RNN and attention mechanism

The invention relates to an RNN and attention mechanism-based news recommendation method and a topic representation method, which combine a traditional topic model with a neural network word vector and can effectively improve the accuracy of semantic extraction and representation of news content texts. The RNN is used for describing the seriality characteristics of news browsing of the user, so that the timeliness of the personalized news recommendation content can be greatly improved; influence weights of different news on recommendation prediction are distinguished by utilizing an attention mechanism, user interest migration can be captured, and the accuracy and novelty of personalized news recommendation contents are improved; and finally, in combination with an attention mechanism of a DBSCAN density clustering algorithm, heuristic discovery is performed on the new topics and the old topics through density clustering, and influence weights of news are dynamically calculated by utilizing a topic clustering result, so that the novelty of the recommended topics is improved.
Owner:HUAQIAO UNIVERSITY

LDA-based text classification method

The invention provides an LDA-based text classification method. The method comprises the following steps of: extracting and inputting a feature word set into a text classification model so as to calculate the probability of each type in A predetermined types to which a text belongs, and taking the type with the maximum probability value as the type to which the text belongs; previously training an LDA topic model by using a training corpus according to a set topic number K, so as to obtain K topic associated word sets; previously verifying the text classification model by using a type-specific verification corpus, so as to obtain a classification correctness of each type in the A types; when classification is carried out by using the text classification model, directly outputting a result if the classification correctness, obtained by the text classification model, of the type achieves a set threshold value; and otherwise, calculating the weighted values of K topics corresponding to the text by using the LDA topic model, selecting the topic with the maximum weighted value, forming an expanded feature word set by the first Y words in the associated words of the topic, and carrying out classification again by using the text classification model. The method provided by the invention is strong in scene adaptability and high in result usability.
Owner:NINGBO UNIV
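The fallback logic in the LDA-based text classification entry above can be sketched as a small control flow. Everything concrete here is hypothetical: the toy classifier, the per-type accuracy table, the threshold, and the choice to append the top Y topic words to the original features (the abstract does not say whether the expanded set replaces or extends them).

```python
def classify_with_topic_expansion(features, classify, class_accuracy,
                                  doc_topic_weights, topic_words,
                                  threshold=0.8, top_y=3):
    """Classify once; if the predicted type is unreliable, expand features and retry."""
    label = classify(features)
    if class_accuracy[label] >= threshold:
        return label  # validated correctness of this type is high enough
    # Otherwise pick the topic with the largest weight for this text ...
    best_topic = max(range(len(doc_topic_weights)), key=doc_topic_weights.__getitem__)
    # ... and extend the feature set with its first Y associated words.
    extra = [w for w in topic_words[best_topic][:top_y] if w not in features]
    return classify(list(features) + extra)


def toy_classify(features):
    """Hypothetical stand-in for a trained text classification model."""
    return "sports" if {"goal", "match"} & set(features) else "other"


class_accuracy = {"sports": 0.95, "other": 0.60}            # from a validation corpus
topic_words = [["stock", "market", "bank"], ["goal", "match", "team"]]

result = classify_with_topic_expansion(
    ["league", "season"], toy_classify, class_accuracy,
    doc_topic_weights=[0.2, 0.8], topic_words=topic_words,
)
```

Here the first pass yields the low-confidence type "other", so the text is reclassified with words from its dominant topic added.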

Graph model-based automatic abstracting method

Active | CN105243152A | Measuring Semantic Relevance | Achieve complementary effects | Special data processing applications | Cosine similarity | Subject matter
The invention relates to the field of automatic abstracting, and discloses a graph model-based automatic abstracting method. According to the technical scheme, an LDA probability topic model is applied to measurement of semantic correlation between sentences and improvement of the measurement effect of sentence correlation; and an idea of topic correlation and position sensitivity of the sentences is provided, so that abstract generation is relatively reasonable and effective. The method comprises the following steps: firstly, obtaining topic probability distribution of a document and word probability distribution of the topic through training the LDA topic model, determining the topic probability distribution of the sentences and effectively converting a semantic similarity measurement between the sentences into a similarity measurement problem of the topic probability distribution of the sentences; with the sentences as nodes, building edges by referring to the cosine similarity and according to the semantic similarity between the sentences and generating a text graph representing the document; calculating the topic correlation between the sentences according to the topic probability distribution of the sentences and the topic probability distribution of the document; and calculating the position sensitivity and the like of the sentences according to the positions of the sentences in the document.
Owner:TONGJI UNIV
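The text graph construction in the automatic abstracting entry above can be sketched as follows, assuming each sentence has already been assigned a topic probability distribution. The similarity threshold for keeping an edge and the toy distributions are assumptions, not values from the abstract.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_text_graph(sentence_topics, threshold=0.5):
    """Sentences are nodes; an edge joins two sentences whose topic
    distributions are at least `threshold` similar (assumed cut-off)."""
    edges = {}
    n = len(sentence_topics)
    for i in range(n):
        for j in range(i + 1, n):
            sim = cosine(sentence_topics[i], sentence_topics[j])
            if sim >= threshold:
                edges[(i, j)] = sim
    return edges


# Toy topic distributions for three sentences over two topics.
sentence_topics = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
edges = build_text_graph(sentence_topics)
```

With these toy values only sentences 0 and 1 are connected; a ranking step over the resulting graph, combined with topic correlation and position sensitivity, would then select the summary sentences.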

Automatic extraction method for text labels in combination with theme model and semantic analyses

The invention relates to an automatic extraction method for text labels in combination with theme model and semantic analyses, pertaining to the technical field of computer application. The method comprises pre-treatment, LDA modeling, context analyses and label extraction. The pre-treatment comprises following steps: removing low-frequency words, removing stop words and removing label information, wherein stop words are auxiliary words without any information, words showing sentence grammar structures, all function words and punctuations. The LDA modeling process comprises following steps: obtaining two matrixes after processing the LDA model: one is a file-theme matrix of N*K with each element corresponding to a hidden theme distribution of each file and the other is a K*M theme-word matrix with each element corresponding to a word distribution of each theme. Based on a conventional counting method, the method takes correlations of words in files into consideration and fully utilizes one key feature of context information so that label information of files is obtained.
Owner:DATAGRAND TECH INC
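The two LDA output matrices named in the text label extraction entry above can be illustrated with toy values. The sketch multiplies an N*K file-theme matrix by a K*M theme-word matrix to get per-file word probabilities and picks the most probable words as candidate labels. That selection rule is an assumption for illustration; the entry's context analysis step is not modeled.

```python
def doc_word_probs(doc_topic, topic_word):
    """Multiply an N x K file-theme matrix by a K x M theme-word matrix."""
    k_count = len(topic_word)
    m_count = len(topic_word[0])
    return [
        [sum(row[k] * topic_word[k][m] for k in range(k_count)) for m in range(m_count)]
        for row in doc_topic
    ]


def top_labels(word_probs, vocabulary, n=2):
    """Return the n most probable vocabulary words for one file."""
    ranked = sorted(range(len(vocabulary)), key=lambda m: word_probs[m], reverse=True)
    return [vocabulary[m] for m in ranked[:n]]


vocabulary = ["goal", "match", "stock", "bank"]             # M = 4 words
doc_topic = [[0.9, 0.1], [0.2, 0.8]]                        # N = 2 files, K = 2 themes
topic_word = [[0.5, 0.4, 0.05, 0.05], [0.05, 0.05, 0.5, 0.4]]

probs = doc_word_probs(doc_topic, topic_word)
labels = [top_labels(row, vocabulary) for row in probs]
```

Each row of `probs` is again a probability distribution over the vocabulary, since every row of both input matrices sums to one.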

News event evolution analysis method based on time sequence distribution information and topic model

The invention discloses a news event evolution analysis method based on time sequence distribution information and a topic model and relates to the field of text analysis. The method comprises the following steps: firstly, dividing a corpus into a plurality of sub-corpuses according to time by analyzing distribution characteristics, presented on a time sequence, of a news report, and by using a K-Means clustering algorithm; secondly, sequentially performing topic modeling on each sub-corpus by using the topic model, and learning the model through a Gibbs sampling method to obtain topic distribution information of each sub-corpus; finally, connecting topics between which the distance is minimum in series by calculating a Jensen-Shannon distance between each two topics in the adjacent sub-corpuses, wherein the topics are connected in series to obtain a main topic of an event, and auxiliary topics except the main topic in each sub-corpus are concerns and new developments of the event in each stage. According to the method, the mainline of event development in a news prediction and new concerns burst in each stage can be better described.
Owner:TONGJI UNIV
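The topic linking step in the news event evolution entry above relies on the Jensen-Shannon distance between topics of adjacent sub-corpora. A minimal sketch follows, assuming each topic is a probability distribution over a shared vocabulary and that every topic is linked to its nearest topic in the next sub-corpus (one reading of the abstract); the function names and toy distributions are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_distance(p, q):
    """Jensen-Shannon distance: square root of the JS divergence (base 2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    divergence = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(divergence, 0.0))  # guard against tiny negative rounding


def link_topics(prev_topics, next_topics):
    """For each topic in one sub-corpus, find the closest topic in the next one."""
    links = []
    for i, p in enumerate(prev_topics):
        j, dist = min(
            ((j, js_distance(p, q)) for j, q in enumerate(next_topics)),
            key=lambda pair: pair[1],
        )
        links.append((i, j, dist))
    return links


# Toy topic-word distributions for two adjacent time slices.
prev_topics = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
next_topics = [[0.1, 0.2, 0.7], [0.7, 0.2, 0.1]]
links = link_topics(prev_topics, next_topics)
```

Chaining the smallest-distance links across consecutive sub-corpora gives a candidate main topic thread; topics left outside the chain correspond to the stage-specific concerns the abstract mentions.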

Evolution analysis device and method for contents of network topics

The invention provides an evolution analysis device and an evolution analysis method for the contents of network topics. The evolution analysis device for the contents of the network topics comprises a network event data collection device, a network event data pretreatment device, a topic content evolution analysis device and an output device. The evolution analysis method for the contents of the network topics includes the steps of network event data collection, network event pretreatment, similarity calculation, topic multi-center establishment, topic center renewal and output. A plurality of aspects of the contents corresponding to the topic can be found by the invention, and a multi-center structure is used for establishing a corresponding topic model so as to more accurately and comprehensively describe the topics; the dynamic evolution and development process of the contents of the topics can be displayed by the establishment and the renewal of the multi-centers of the topics, namely the whole process of generation, development, climax and ending of the topics. The method of the invention does not depend on the processing sequence of report, and can adapt to the circumstance of cross occurrence of news reports with different emphasises.
Owner:HARBIN ENG UNIV