A Quantifiable Granularity Topic Extraction Method

A topic extraction method, applied in the field of text analysis, that addresses the high computational complexity of existing methods and their failure to meet the requirements of granular topic extraction and analysis, and that yields topics which are easy to describe and understand.

Status: Inactive; Publication Date: 2011-12-14
FUDAN UNIV

AI Technical Summary

Problems solved by technology

Existing methods have high computational complexity.
[0006] Automatic topic extraction with quantifiable granularity is therefore very important, but existing methods fall short both in how granularity is indicated and in how granular topic extraction algorithms are designed, so they cannot meet the requirements of granular topic extraction and analysis.

Examples

Embodiment Construction

[0025] (1) Download the text set to be analyzed from the Internet.

[0026] Using the pre-arranged topic keywords, search the Internet for relevant topic texts, retrieve and parse these text records over HTTP (Hypertext Transfer Protocol), save them locally, and extract their text content to obtain the collection of topic texts.
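
Step (1) could be sketched as follows, assuming the retrieved records are ordinary HTML pages. The names `html_to_text` and `fetch_text` are hypothetical and do not come from the patent, and the keyword search that produces the URLs is left out.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML string, joined by single spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def fetch_text(url, timeout=10):
    """Download one page over HTTP and return its extracted text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return html_to_text(resp.read().decode(charset, errors="replace"))
```

Saving each result of `fetch_text` to a local file would then give the text set used in the later steps.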

[0027] (2) Preprocess the text set.

[0028] Segment each text into words and remove common stop words to obtain the vocabulary T of the text set. Each line of the vocabulary holds one word, and the vocabulary contains no duplicate entries.
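
A minimal sketch of step (2), assuming English text that can be split with a regular expression; Chinese text would need a dedicated word segmenter instead. The stop-word list and the function names are illustrative assumptions, not part of the patent.

```python
import re

# Illustrative stop-word list only; a real list would be much longer.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "to", "in"}


def tokenize(text):
    """Lowercase the text, split it into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def build_vocabulary(texts):
    """Return the vocabulary T: each distinct word once, in order of first appearance."""
    seen = {}
    for text in texts:
        for word in tokenize(text):
            seen.setdefault(word, len(seen))
    return list(seen)
```

For example, `build_vocabulary(["The price of the phone", "Phone battery and price"])` returns `["price", "phone", "battery"]`.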

[0029] (3) Construct the word frequency matrix.

[0030] For each document $d_i$ in the text set, construct a row vector $v_i = \{c_{i1}, c_{i2}, c_{i3}, \ldots, c_{iX}\}$, where $X$ is the number of words in the vocabulary $T$ and $c_{ij}$ is calculated as follows:

[0031] $c_{ij}$ = ...
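
Because the formula for c ij is elided in this excerpt, the sketch below assumes the simplest choice: the raw count of vocabulary word j in document i. This is an assumption and may differ from the weighting the patent actually defines; the function name is hypothetical.

```python
from collections import Counter


def word_frequency_matrix(tokenized_docs, vocabulary):
    """Build one row per document and one column per vocabulary word.

    Assumption: each entry is the raw occurrence count of the word in the document.
    """
    matrix = []
    for tokens in tokenized_docs:
        counts = Counter(tokens)  # missing words count as 0
        matrix.append([counts[word] for word in vocabulary])
    return matrix
```

With documents `[["price", "phone", "price"], ["battery"]]` and vocabulary `["price", "phone", "battery"]`, the result is `[[2, 1, 0], [0, 0, 1]]`.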

Abstract

The invention belongs to the technical field of text analysis, and in particular relates to a topic extraction method with quantifiable granularity. The method converts the word frequency matrix of the text set into a matrix representing word energy through a DCT (discrete cosine transform), and partitions the transformed matrix by energy according to its energy distribution, so that the topic granularity corresponds to the granularity parameter specified by the user. The inverse DCT is then applied to the energy-partitioned matrix to obtain a feature space corresponding to that granularity. In this space, an existing topic extraction method is used to extract granular topics, completing topic extraction with quantifiable granularity. The invention gives users an effective way to extract granular topics more accurately and suits topic extraction applications that require an understanding of granularity.
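
The transform, energy division, and inverse transform described in the abstract might look roughly like the sketch below. It rests on several assumptions that this excerpt does not confirm: an orthonormal 2D DCT-II, an energy division that keeps the largest coefficients until a given fraction of the total energy is retained, and a `fraction` parameter standing in for the user's granularity parameter. All function names are hypothetical.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct_1d(c):
    """Inverse of dct_1d (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for t in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (t + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out


def apply_2d(matrix, transform):
    """Apply a 1D transform to every row, then to every column."""
    rows = [transform(row) for row in matrix]
    cols = [transform(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def energy_truncate(coeffs, fraction):
    """Assumed energy division: keep the largest-energy coefficients until
    `fraction` of the total energy is retained, and zero out the rest."""
    ranked = sorted(((v * v, i, j) for i, row in enumerate(coeffs)
                     for j, v in enumerate(row)), reverse=True)
    total = sum(e for e, _, _ in ranked)
    kept = [[0.0] * len(coeffs[0]) for _ in coeffs]
    acc = 0.0
    for energy, i, j in ranked:
        if acc >= fraction * total:
            break
        kept[i][j] = coeffs[i][j]
        acc += energy
    return kept


def granular_space(freq_matrix, fraction):
    """DCT, energy division, then inverse DCT back to a document-by-word space."""
    coeffs = apply_2d(freq_matrix, dct_1d)
    return apply_2d(energy_truncate(coeffs, fraction), idct_1d)
```

Under these assumptions, `fraction=1.0` reproduces the original matrix, while smaller values keep only the dominant energy components and so give a coarser feature space for the downstream topic extraction method.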

Description

Technical field

[0001] The invention belongs to the technical field of text analysis, and in particular relates to a data analysis method for extracting granular topic feature descriptions from a text set.

Background art

[0002] The Internet has become a main channel and space for information sharing. A large amount of text is generated on the Internet every day, such as news reports, product introductions, and online comments. In addition, many massive information databases, such as patent databases and scientific literature databases, contain rich text information that can be shared quickly over the Internet. Discovering hidden topics in these massive text sources is a requirement of many applications, such as the automatic analysis of product reviews on the Internet. It is the premise of this process to allow computers to automatically discover topics from text informati...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/27, G06F17/30
Inventor: 曾剑平, 吴承荣
Owner: FUDAN UNIV