A text aggregation method and system

A text aggregation method and system in the technical field of text clustering. The invention addresses the lack of context information in short texts, the incomplete match between short text content and preset types, and the resulting difficulty of forming valuable clusters. Its effects are reduced computational complexity and assured text aggregation efficiency.

Active Publication Date: 2021-07-09
无码科技(杭州)有限公司
Cites: 6; Cited by: 0

AI Technical Summary

Problems solved by technology

Clustering algorithms can aggregate news and generate the topics it contains. They are usually applied only to long texts, and their results on short texts are usually poor: long texts have a relatively large vocabulary, which provides stable and rich features, whereas short texts have a small vocabulary and lack context information, making it difficult to form valuable clusters.
For short text integration, the usual approach is a classification algorithm. Classification requires topic types to be preset manually, however, and because natural language data is multi-dimensional, short text content often does not exactly match the preset types.


Examples

Embodiment 2

[0065] As shown in Figure 2, Embodiment 2 of the present invention discloses a text aggregation system for aggregating long texts and short texts. The system can be implemented by referring to the process of the above method; details already described there are not repeated. The system includes:

[0066] The topic generation module 201 is configured to cluster long texts to obtain the topics corresponding to the long texts; the long texts include titles. Specifically, the topic generation module 201 first processes the long text with the TF-IDF algorithm to obtain its feature words, then vectorizes the feature words to obtain the feature vector of the long text, and then uses the Single-Pass algorithm to cluster similar long texts according to the similarity of their feature vectors. Before clustering, the cosine similarity algorithm is used to calculate the similarity between feature vectors. When the simi...
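The TF-IDF, cosine similarity, and Single-Pass steps described for the topic generation module can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the function names, the smoothed IDF formula, the running-mean centroid update, and the `threshold` value of 0.3 are all assumptions, since the source text is truncated before it states how the similarity is compared. Documents are assumed to be already tokenized.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (an assumption; the source does not give a formula).
        vectors.append({
            t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1.0)
            for t, c in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(vectors, threshold=0.3):
    """Single-Pass clustering: visit each vector once, in order.

    A vector joins the most similar existing cluster if the similarity
    reaches `threshold` (hypothetical value); otherwise it starts a new cluster.
    Returns a list of clusters, each a list of document indices.
    """
    clusters = []  # each cluster: {"centroid": dict, "members": [indices]}
    for i, vec in enumerate(vectors):
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(i)
            # Update the centroid as the running mean of its members.
            m = len(best["members"])
            centroid = best["centroid"]
            for t in set(centroid) | set(vec):
                centroid[t] = (centroid.get(t, 0.0) * (m - 1) + vec.get(t, 0.0)) / m
        else:
            clusters.append({"centroid": dict(vec), "members": [i]})
    return [c["members"] for c in clusters]
```

Calling `single_pass(tfidf_vectors(docs))` on a list of token lists returns groups of document indices, and each group would correspond to one topic. Because Single-Pass visits the texts once, the result depends on input order and on the chosen threshold.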

Abstract

The present invention provides a text aggregation method and system for aggregating long texts and short texts. The method includes the steps of: clustering long texts to obtain the topics corresponding to the long texts, where the long texts include titles; establishing a classification model and obtaining the abstract and entity collection of each long text; using the topic, title, abstract, and entity collection to establish a first mapping set and a second mapping set; training the classification model with the first mapping set and the second mapping set to obtain a trained classification model; and obtaining the abstract of a long text to be tested, establishing a third mapping set from that abstract and the short text to be tested, and obtaining the text aggregation result from the third mapping set and the trained classification model. The method and system use the entity features of long texts and short texts to select the short texts that contain the same entities as the long texts, which reduces computational complexity and ensures text aggregation efficiency.
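The entity-based filtering mentioned at the end of the abstract could look something like the sketch below. It assumes that entities have already been extracted for each text and that only (abstract, short text) pairs sharing at least one entity are passed on to the classifier; the function name `build_candidate_pairs` and the tuple layout are hypothetical, and the abstract does not say exactly where in the pipeline the filter is applied.

```python
def build_candidate_pairs(long_items, short_items):
    """Pair each long-text abstract with the short texts that share an entity.

    long_items:  iterable of (abstract, entities) tuples
    short_items: iterable of (text, entities) tuples
    Returns a list of (abstract, short_text) pairs. Pairs with no shared
    entity are skipped, so the classifier scores fewer candidates.
    """
    short_items = list(short_items)  # allow repeated iteration
    pairs = []
    for abstract, long_entities in long_items:
        long_set = set(long_entities)
        for text, short_entities in short_items:
            # Keep the pair only if the two entity collections overlap.
            if not long_set.isdisjoint(short_entities):
                pairs.append((abstract, text))
    return pairs
```

With one long text about Hangzhou and two short comments, only the comment that also mentions Hangzhou is kept as a candidate. Without the filter the number of pairs grows with the product of the long and short text counts, which is presumably where the stated reduction in computational complexity comes from.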

Description

Technical field

[0001] The present invention relates to the technical field of text clustering, and more specifically, to a text aggregation method and system.

Background

[0002] Information in real life comes from many sources, including professional media websites, self-media platforms, and social media. Integrating multiple semantically related pieces of information with short comments has become a trend. For example, in scenarios such as displaying search results and presenting news, content is mostly displayed as topics rather than as single texts, which integrates multiple news sources, reduces information redundancy, and provides users with richer information.

[0003] Integrating information with short comments, that is, integrating long texts with short texts, generally requires clustering algorithms. Clustering algorithms can aggregate and generate topics contained in news. Clustering alg...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06F16/34, G06F16/36
Inventor: 夏静, 姬成龙, 吴东野, 冯大辉
Owner: 无码科技(杭州)有限公司