A Text Incremental Dimensionality Reduction Method Based on Random Forest

A text processing technology based on random forests, applied to unstructured text data retrieval, text database clustering/classification, computer components, and similar fields. It addresses the high dimensionality, large data volume, and low processing efficiency of text data, and achieves high efficiency, high precision, and effectiveness on large data sets.

Active Publication Date: 2021-10-08
TONGJI UNIV
Cites: 4; Cited by: 0

AI Technical Summary

Problems solved by technology

Text data is high-dimensional, voluminous, and semantically rich, yet existing dimensionality reduction methods struggle to achieve both efficiency and accuracy.
Statistics-based dimensionality reduction methods often ignore the semantic content of text data, while semantics-based methods are often inefficient.

Method used

The original text data is divided into multiple subsets to construct text feature map clusters, which are expressed as data tables. Records in the tables are sampled with replacement to train a random forest of classification trees, and the forest's voting results are used to merge the text feature map clusters of new text subsets into the existing ones.

Embodiment

[0047] The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

Abstract

The present invention relates to a text incremental dimensionality reduction method based on random forest, comprising the following steps: 1) dividing the original text data into multiple subsets to construct the original text feature map clusters; 2) expressing the text feature map clusters of each subset in the form of a data table; 3) sampling the records in the data table with replacement to establish a random forest training set, and obtaining a random forest by constructing classification trees; 4) converting the text feature map clusters of a new text subset into the form of a data table, inputting all records in that table into the trained random forest, aggregating the votes of the classification trees to obtain a new category for each record, and merging the two text feature map clusters according to the classification results, thereby achieving incremental dimensionality reduction of the existing text features. Compared with the prior art, the present invention has the advantages of high precision, strong extensibility, no need for feature variable selection, and no overfitting.
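Steps 3) and 4) above can be illustrated with a minimal sketch. It is not the patented implementation: the listing does not say how a text feature map cluster becomes a data table, so the sketch assumes each record is a numeric feature vector labelled with an integer cluster id. The function names (`train_forest`, `vote`, `merge_new_records`), the Gini split criterion, and the toy data are all hypothetical choices made for illustration.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, rng, depth=0, max_depth=4):
    """Grow one classification tree; leaves are labels, inner nodes are tuples.
    Assumes labels are never tuples themselves."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    n_features = len(rows[0])
    k = max(1, int(n_features ** 0.5))  # random feature subset per node
    best = None
    for f in rng.sample(range(n_features), k):
        vals = sorted({r[f] for r in rows})
        for a, b in zip(vals, vals[1:]):
            t = (a + b) / 2  # midpoint threshold between neighbouring values
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no usable split on the sampled features
        return majority
    _, f, t, left, right = best
    return (f, t,
            build_tree([rows[i] for i in left], [labels[i] for i in left],
                       rng, depth + 1, max_depth),
            build_tree([rows[i] for i in right], [labels[i] for i in right],
                       rng, depth + 1, max_depth))

def predict_tree(tree, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def train_forest(rows, labels, n_trees=25, seed=0):
    """Step 3: bootstrap-sample the table records and build one tree per sample."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sampling with replacement
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], rng))
    return forest

def vote(forest, row):
    """Aggregate the votes of all trees; the most common label wins."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]

def merge_new_records(forest, clusters, new_rows):
    """Step 4: assign each new record to an existing cluster by forest vote."""
    merged = {label: list(rows) for label, rows in clusters.items()}
    for row in new_rows:
        merged.setdefault(vote(forest, row), []).append(row)
    return merged

# Toy "data table": two existing clusters of 2-dimensional records (made-up values).
clusters = {0: [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3]],
            1: [[9.8, 10.0], [10.1, 9.9], [10.0, 10.2]]}
rows = [r for label in clusters for r in clusters[label]]
labels = [label for label in clusters for _ in clusters[label]]

forest = train_forest(rows, labels)
merged = merge_new_records(forest, clusters, [[0.15, 0.2], [9.9, 10.1]])
print({label: len(members) for label, members in merged.items()})
```

Because new records are only classified and appended, the existing clusters are not rebuilt when a new text subset arrives, which is the incremental aspect the abstract describes. A production version would more likely use an established library implementation of random forests than hand-written trees.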

Description

Technical field

[0001] The invention relates to the field of data dimensionality reduction in machine learning and natural language processing, in particular to a text incremental dimensionality reduction method based on random forests.

Background art

[0002] With the rapid development of technologies such as the Internet of Things, cloud computing, and big data, data is being acquired more comprehensively and used more effectively, but problems of data diversity, volume, and high dimensionality have followed. To make better use of the data, it therefore needs to be preprocessed. Data dimensionality reduction maps data from a high-dimensional space to a low-dimensional space, removing irrelevant or redundant data while retaining the data that reflects the essence of the original. Using the dimensionally reduced data to perform tasks such as data search, data processing, and data mining can increase the accuracy of data search, reduce the amount of data ca...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06K9/62
CPC: G06F16/35, G06F18/214
Inventors: 向阳, 陈晓军, 贾圣宾, 郭鑫
Owner: TONGJI UNIV