
A text aggregation method and system

A text aggregation method and system in the field of text clustering. It addresses short texts' lack of context information, the incomplete match between short-text content and preset topic types, and the resulting unsatisfactory aggregation. The effect is reduced computational complexity while text aggregation efficiency is maintained.

Active Publication Date: 2019-05-28
无码科技(杭州)有限公司
Cites: 6; Cited by: 12

AI Technical Summary

Problems solved by technology

Clustering algorithms can aggregate news and generate the topics it contains. However, they are usually applied only to long texts and tend to perform poorly on short texts. Long texts contain relatively many words, which provide stable and rich features. Short texts contain few words and lack context information, which makes it difficult to form meaningful clusters.
To integrate short texts, the usual approach is a classification algorithm, but classification requires topic types to be preset manually. Because natural language data is multi-dimensional, short-text content often does not exactly match the preset types.



Examples


Embodiment 2

[0065] As shown in Figure 2, Embodiment 2 of the present invention discloses a text aggregation system for aggregating long texts and short texts. The system can be implemented by following the process of the method described above, so repeated details are omitted. The system includes:

[0066] The topic generation module 201 is configured to cluster long texts to obtain the topics corresponding to them; the long texts include titles. Specifically, the topic generation module 201 first applies the TF-IDF algorithm to the long texts to obtain their feature words, then vectorizes the feature words to obtain a feature vector for each long text, and then uses the Single-Pass algorithm to cluster similar long texts according to the similarity of their feature vectors. Before clustering, the cosine similarity algorithm is used to calculate the similarity between feature vectors. When the simi...
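The TF-IDF, cosine-similarity and Single-Pass steps described in [0066] can be illustrated with the minimal Python sketch below. It is not the patent's implementation. It assumes the long texts are already tokenized (Chinese text would first need word segmentation), and the names `tfidf_vectors`, `cosine` and `single_pass`, the `top_k` feature-word cutoff, the running-mean cluster centroid and the default `threshold=0.5` are all hypothetical, since the paragraph is truncated before the similarity threshold is stated.

```python
from __future__ import annotations

import math
from collections import Counter


def tfidf_vectors(docs: list[list[str]], top_k: int = 10) -> list[dict[str, float]]:
    """Sparse TF-IDF vector per tokenized document, keeping the top_k
    highest-weighted words as that document's (hypothetical) feature words."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: count each word once per doc
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        total = len(tokens)
        # Smoothed IDF keeps weights positive even for words found in every doc.
        weights = {
            w: (c / total) * (math.log((1 + n) / (1 + df[w])) + 1.0)
            for w, c in tf.items()
        }
        top = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        vectors.append(dict(top))
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def single_pass(vectors: list[dict[str, float]], threshold: float = 0.5) -> list[list[int]]:
    """Single-Pass clustering: visit each vector once, join the most similar
    existing cluster if similarity >= threshold, otherwise open a new cluster."""
    clusters = []  # each cluster: {"centroid": sparse vector, "members": doc indices}
    for i, vec in enumerate(vectors):
        best, best_sim = None, -1.0
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(i)
            # Assumed centroid rule: running mean of member vectors.
            k = len(best["members"])
            cen = best["centroid"]
            for t in set(cen) | set(vec):
                cen[t] = cen.get(t, 0.0) + (vec.get(t, 0.0) - cen.get(t, 0.0)) / k
        else:
            clusters.append({"centroid": dict(vec), "members": [i]})
    return [c["members"] for c in clusters]


docs = [
    ["apple", "phone", "launch", "event"],
    ["apple", "phone", "launch", "price"],
    ["football", "match", "score"],
]
print(single_pass(tfidf_vectors(docs)))  # [[0, 1], [2]]
```

Single-Pass visits each text exactly once, so its cost grows with the number of texts times the number of clusters, which makes it a common choice for streaming news. The trade-off is that the resulting clusters can depend on the order in which texts arrive.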



Abstract

The invention provides a text aggregation method and a text aggregation system for aggregating long texts and short texts. The method comprises the following steps: clustering the long texts to obtain the topics corresponding to them, the long texts including titles; establishing a classification model and obtaining an abstract and an entity set for each long text; establishing a first mapping set and a second mapping set from the topic, the title, the abstract and the entity set; training the classification model with the first and second mapping sets to obtain a trained classification model; and obtaining the abstract of a long text to be tested, establishing a third mapping set from that abstract and a short text to be tested, and obtaining a text aggregation result from the third mapping set and the trained classification model. The method and system use the entity features of the long and short texts to screen out short texts that contain the same entity as a long text, which reduces computational complexity and ensures text aggregation efficiency.
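The abstract attributes the efficiency gain to screening out only the short texts that share an entity with a long text before classification. The minimal Python sketch below shows that screening step under stated assumptions: entity sets are already extracted upstream, a candidate pair is taken to be (long-text abstract, short text) in the spirit of the third mapping set, and the classes `LongText`, `ShortText` and the function `candidate_pairs` are hypothetical. The trained classification model that would score each pair is not shown.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LongText:
    doc_id: str
    abstract: str
    entities: set[str]  # assumed to be extracted upstream (e.g. by an NER model)


@dataclass
class ShortText:
    doc_id: str
    content: str
    entities: set[str]


def candidate_pairs(long_texts: list[LongText], short_texts: list[ShortText]):
    """Return (long_id, short_id, abstract, short_content) tuples only for
    long/short pairs that share at least one entity (hypothetical helper)."""
    # Inverted index: entity -> ids of long texts that mention it.
    index: dict[str, set[str]] = {}
    for lt in long_texts:
        for ent in lt.entities:
            index.setdefault(ent, set()).add(lt.doc_id)
    by_id = {lt.doc_id: lt for lt in long_texts}

    pairs = []
    for st in short_texts:
        matched: set[str] = set()
        for ent in st.entities:
            matched |= index.get(ent, set())
        # Sorted for deterministic output; unmatched short texts yield no pairs.
        for lid in sorted(matched):
            pairs.append((lid, st.doc_id, by_id[lid].abstract, st.content))
    return pairs


longs = [
    LongText("L1", "Company X releases a new phone", {"Company X"}),
    LongText("L2", "City Y hosts its annual marathon", {"City Y"}),
]
shorts = [
    ShortText("S1", "Can't wait to try the Company X phone", {"Company X"}),
    ShortText("S2", "Nice weather today", set()),
]
for pair in candidate_pairs(longs, shorts):
    print(pair[:2])  # only ('L1', 'S1') survives the entity screen
```

The inverted index from entity to long texts means each short text is compared only with the long texts it shares an entity with, rather than with every long text, so the classifier scores one pair here instead of four.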

Description

Technical field

[0001] The present invention relates to the technical field of text clustering, and more specifically to a text aggregation method and system.

Background

[0002] Information comes from many sources in everyday life, including professional media websites, self-media platforms and social media. Integrating multiple semantically related pieces of information together with short comments has become a clear trend. For example, search results and news feeds are mostly presented as topics rather than as single texts, which integrates multiple news sources, reduces information redundancy and gives users richer information.

[0003] Integrating information with short comments, that is, integrating long texts with short texts, generally requires clustering algorithms. Clustering algorithms can aggregate news and generate the topics it contains. Clustering alg...


Application Information

Patent Type & Authority: Application (China)
IPC (8): G06F16/35; G06F16/34; G06F16/36
Inventors: 夏静 (Xia Jing), 姬成龙 (Ji Chenglong), 吴东野 (Wu Dongye), 冯大辉 (Feng Dahui)
Owner: 无码科技(杭州)有限公司