254 results about "Text cluster" patented technology

Short text clustering method based on deep semantic feature learning

The invention discloses a short text clustering method based on deep semantic feature learning. A traditional dimensionality reduction technique first maps the original features to a low-dimensional representation under a local-information-preservation constraint. The resulting low-dimensional real-valued vectors are binarized, and the binarized vectors serve as supervisory information for training a convolutional neural network by error backpropagation. Word vectors are trained without supervision on a large external corpus; each word in a text is vectorized in word order, and the vectorized words form the initial input from which the convolutional neural network learns the implicit semantic features of the text. Once the deep semantic feature representation is obtained, a traditional K-means algorithm clusters the texts. The method requires no additional natural language processing or other specialized knowledge and is simple to design; it learns deep semantic features that are unbiased, and it achieves good clustering performance more effectively.
Owner:INST OF AUTOMATION CHINESE ACAD OF SCI
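
The final step of the abstract above, K-means over learned feature vectors, can be sketched as follows. This is a minimal illustration and not the patented implementation: the toy `features` list stands in for the deep semantic features, and seeding the centers with the first `k` points is an assumption made to keep the run deterministic.

```python
from math import dist

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means. Initial centers are the first k points (an assumption)."""
    centers = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center in place
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return labels, centers

# Hypothetical 2-D stand-ins for deep semantic feature vectors.
features = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (0.1, 0.2)]
labels, centers = kmeans(features, k=2)
```

On this toy input the three points near the origin end up in one cluster and the two points near (5, 5) in the other.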

Method for constructing public opinion knowledge map based on hot events

The present invention discloses a method for constructing a public opinion knowledge map based on hot events, and belongs to the field of natural language processing. The method comprises: obtaining microblog texts in real time, processing each microblog text, constructing text clusters, calculating the topic category to which each text cluster belongs, identifying hot events in each cluster by category, and collecting statistics on the multi-dimensional attributes of each hot event; identifying the key people and organizations involved in the discussion of the hot events and obtaining their multi-dimensional attributes; and constructing a multi-dimensional attribute system and the relationship types among events, people and organizations, using the relationships among the events, people and organizations as associations, and constructing a public opinion knowledge map. With the disclosed method, hot events, people and organizations can be described along multiple dimensions and analyzed comprehensively; the weights of different topic categories can be set according to actual needs, so that public opinion knowledge maps for different topics can be constructed.
Owner:NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT

Clustering method and system of parallelized self-organizing mapping neural network based on graphic processing unit

The invention relates to a clustering method and system of a parallelized self-organizing mapping neural network based on a graphic processing unit. Compared with the traditional serialized clustering method, the invention can realize large-scale data clustering in a faster manner by parallelization of an algorithm and a parallel processing system of the graphic processing unit. The invention mainly relates to two aspects of contents: (1) firstly, designing the clustering method of the parallelized self-organizing mapping neural network according to the characteristic of high parallelized calculating capability of the graphic processing unit, wherein the method comprises the following steps of obtaining a word-frequency matrix by carrying out parallelized statistics on the word frequency of keywords in a document, calculating feature vectors of a text by parallelization to generate a feature matrix of data sets, and obtaining a cluster structure of massive data objects by the parallelized self-organizing mapping neural network; and (2) secondly, designing a parallelized text clustering system based on a CPU / GPU cooperation framework by utilizing the complementarity of the calculating capability between the graphic processing unit (GPU) and the central processing unit (CPU).
Owner:HARBIN INST OF TECH SHENZHEN GRADUATE SCHOOL
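
Two of the steps named in the abstract above, building a word-frequency matrix and updating a self-organizing map, can be sketched serially on the CPU. This is only an illustration under assumptions: the GPU parallelization that the patent is about is not reproduced, and the 1-D grid of units, the learning rate `lr`, and the neighbourhood `radius` are hypothetical choices.

```python
from collections import Counter

def word_frequency_matrix(docs, keywords):
    """One row per document, one column per keyword, entries are raw counts."""
    rows = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        rows.append([counts[w] for w in keywords])
    return rows

def som_step(weights, x, lr=0.5, radius=1):
    """One SOM update on a 1-D grid: move the best-matching unit (BMU)
    and its grid neighbours within `radius` toward the input x."""
    bmu = min(
        range(len(weights)),
        key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)),
    )
    for i in range(len(weights)):
        if abs(i - bmu) <= radius:
            weights[i] = [w + lr * (v - w) for w, v in zip(weights[i], x)]
    return bmu

docs = ["gpu cluster gpu", "text cluster"]
keywords = ["gpu", "cluster", "text"]
matrix = word_frequency_matrix(docs, keywords)

# Four hypothetical SOM units with 3-dimensional weight vectors.
weights = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [5.0, 5.0, 5.0], [9.0, 9.0, 9.0]]
bmu = som_step(weights, [float(v) for v in matrix[0]])
```

Each document row and each SOM unit is processed independently inside its loop, which is the kind of structure a GPU implementation would distribute across threads.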

Text clustering multi-document automatic abstracting method and system for improving word vector model

The invention discloses a text clustering multi-document automatic abstracting method and a system for improving a word vector model. The method is based on the CBOW model with hierarchical softmax, which belongs to the field of large-scale model training. On this basis, the TensorFlow deep learning framework is introduced into word vector model training, and the time-efficiency problem of large-scale training sets is addressed through streaming computation. For sentence vector representation, TF-IDF is introduced first, then the semantic similarity of the semantic unit to be extracted is calculated, and weighting parameters are set to combine the two into a semantically weighted sentence vector. The beneficial effects are as follows: the advantages and disadvantages of semantics, deep learning and machine learning are considered together, and density clustering and convolutional neural network algorithms are applied. The method has a high degree of automation and can quickly extract the sentences most relevant to the central content to serve as the abstract of the text. Several machine learning algorithms are applied to automatic text abstracting to achieve a better abstract, which may become a main research direction in this field. In addition, the system according to the invention provides a tool for automatic extraction of document abstracts based on the method.
Owner:上海晏鼠计算机技术股份有限公司
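
The weighted sentence vector mentioned in the abstract above can be illustrated as a weighted average of word vectors. This is a sketch under assumptions: the abstract does not give the weighting formula, so the per-word `weights` dictionary (for example, TF-IDF scores) and the toy `word_vectors` are hypothetical inputs.

```python
def weighted_sentence_vector(tokens, word_vectors, weights):
    """Weighted average of the word vectors of the tokens in a sentence.
    Tokens without a vector are skipped; missing weights count as 0."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for t in tokens:
        if t in word_vectors:
            w = weights.get(t, 0.0)
            total = [a + w * b for a, b in zip(total, word_vectors[t])]
            weight_sum += w
    # Return the zero vector when no weighted word was found.
    return [a / weight_sum for a in total] if weight_sum else total

# Hypothetical 2-D word vectors and per-word weights.
word_vectors = {"text": [1.0, 0.0], "cluster": [0.0, 1.0]}
weights = {"text": 1.0, "cluster": 3.0}
sentence = weighted_sentence_vector(["text", "cluster", "unknown"], word_vectors, weights)
```

Here "cluster" carries three times the weight of "text", so the sentence vector lies three quarters of the way toward the "cluster" vector.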

Relationship linking method based on knowledge map

The invention relates to a relationship linking method based on a knowledge map. First, a list of <subject, relation, object> triples containing a given relation is retrieved from the knowledge map using a SPARQL query, and relation texts are matched from unstructured text. A similarity matrix of the relation texts is obtained using the LSWMD algorithm, and the relation texts are then clustered with a density peak clustering algorithm to obtain relation text clusters. For each relation text cluster, the positions of all words in the cluster are extracted and fitted with a beta distribution to obtain the word distribution pattern of the cluster. For candidate relation texts in open-domain unstructured text whose relation has not yet been established, a vector is constructed using the word distribution pattern, a GBDT classifier performs the identification, and the text is linked to the knowledge map. The method effectively addresses insufficient linking between natural language and the knowledge map and helps computers understand natural language better.
Owner:CHONGQING UNIV OF POSTS & TELECOMM
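
The density peak clustering step named in the abstract above relies on two quantities per item: a local density and the distance to the nearest item of higher density. The sketch below computes both from a distance matrix, following the general density peak idea rather than the patent's exact procedure. The `cutoff` value and the 1-D toy points are assumptions; a similarity matrix such as the LSWMD one would first have to be converted into distances.

```python
def density_peaks(dist, cutoff):
    """Return (rho, delta) for a symmetric distance matrix.
    rho[i]   = number of other items closer than `cutoff`.
    delta[i] = distance to the nearest item with higher rho,
               or the largest distance from i if none is denser."""
    n = len(dist)
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff) for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta

# Hypothetical 1-D items; distances are absolute differences.
points = [0, 1, 2, 10, 11]
dist = [[abs(a - b) for b in points] for a in points]
rho, delta = density_peaks(dist, cutoff=1.5)
```

Items with both a high `rho` and a high `delta` are candidate cluster centers; the remaining items are then typically assigned to the cluster of their nearest denser neighbour.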

Iteration text clustering method based on self-adaptation subspace study

The invention discloses an iterative text clustering method based on adaptive subspace learning. The method includes the following steps: (1) initialization: the text corpus is represented as a text vector space, K initial clusters are generated with the affinity propagation clustering method, and the cluster assignments of all texts are represented as an initial category membership indicator matrix; and (2) iteration between subspace projection and clustering: the initial category membership indicator matrix is used as prior knowledge, a subspace projection matrix is solved with maximizing the average neighborhood margin as the objective, the text vector space is projected into the subspace, K clusters are generated in the subspace with the affinity propagation clustering method, and the category membership indicator matrix is updated; a convergence function is calculated from the subspace projection matrix and the category membership indicator matrix, and when the function converges the iteration stops and text clustering is finished. The method places no limit on the size or distribution of the text data, unifies subspace solving and clustering in a single framework, and obtains a globally optimal clustering result through the iterative strategy.
Owner:广东南方报业传媒集团新媒体有限公司

Class center compression transformation-based text clustering method in search engine

The invention discloses a class center compression transformation-based text clustering method in a search engine. The method comprises the following steps: using an improved tf-idf formula, calculating the word weights of each document in a text set, calculating initial class centers, mining a synonym set and a set of co-occurring high-frequency words, calculating word centers, and performing a primary classification according to the similarity between the initial class centers and each document; compressing the center words according to information such as title words, article length, synonyms and co-occurring associated words, so that the same word occurs only in the class centers with which it has high similarity; clustering the documents again using the new cluster centers; calculating the core similarity of each class; splitting the biggest class; combining smaller classes to produce a new class; iterating the compression, clustering and split operations until the number of classes converges; and ensuring that the similarity between each text and the cluster center of its class reaches a given threshold. The clustering accuracy is reported to be markedly higher than that of conventional methods such as KMeans and DBSCAN (Density-Based Spatial Clustering of Applications with Noise).
Owner:珠海市颢腾智胜科技有限公司
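
The word-weighting step in the abstract above builds on tf-idf. The patent's improved formula is not given here, so the sketch below uses the standard form (term frequency times the log of inverse document frequency) purely as an illustration; whitespace tokenization and the toy documents are assumptions.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Standard tf-idf per document: (count / doc length) * log(N / document frequency)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each word appears.
    df = Counter(w for toks in tokenized for w in set(toks))
    weights = []
    for toks in tokenized:
        tf = Counter(toks)
        weights.append({w: (c / len(toks)) * math.log(n / df[w]) for w, c in tf.items()})
    return weights

docs = ["search engine text", "text cluster"]
weights = tf_idf(docs)
```

A word that appears in every document, such as "text" here, gets weight 0, while words specific to one document get positive weights.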

Text clustering method on basis of automatic threshold fish swarm algorithm

The invention discloses a text clustering method based on an automatic threshold fish swarm algorithm. The method computes a similarity matrix of the feature vectors of the texts, derives an initial equivalent partitioning threshold for each text from the corresponding row of the similarity matrix, performs an initial equivalent partitioning of the texts, and determines an initial number of clusters and initial cluster centers. It then applies the artificial fish swarm algorithm, updating the state of each artificial fish according to global optimal information and local optimal information, to search for the globally optimal cluster centers and re-cluster the initial clustering results. Because the initial number of clusters and the initial cluster centers are obtained by the automatic threshold process and the globally optimal cluster centers are found by the artificial fish swarm algorithm, the method overcomes shortcomings of traditional clustering methods, such as sensitivity to initial values and reliance on local data characteristics only, and improves the accuracy and the degree of automation of text clustering.
Owner:UNIV OF ELECTRONICS SCI & TECH OF CHINA
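
The first step of the abstract above, a similarity matrix with one threshold per row, can be sketched as follows. The abstract does not say how a threshold is derived from a row, so using the mean of the off-diagonal entries is an assumption made only for illustration; cosine similarity and non-zero feature vectors are assumed as well, and the fish swarm search itself is not shown.

```python
import math

def cosine_matrix(vectors):
    """Pairwise cosine similarity; assumes every vector has non-zero length."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    return [
        [sum(a * b for a, b in zip(u, v)) / (nu * nv) for v, nv in zip(vectors, norms)]
        for u, nu in zip(vectors, norms)
    ]

def row_thresholds(sim):
    """Hypothetical per-text threshold: mean similarity to all other texts."""
    n = len(sim)
    return [(sum(row) - row[i]) / (n - 1) for i, row in enumerate(sim)]

# Hypothetical 2-D feature vectors for three texts.
vectors = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
sim = cosine_matrix(vectors)
thresholds = row_thresholds(sim)
# Texts whose similarity to text 0 exceeds text 0's own threshold.
neighbours_of_0 = [j for j in range(len(sim)) if j != 0 and sim[0][j] > thresholds[0]]
```

With these inputs only the second text passes the threshold of the first, so the two would fall into the same initial partition.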