Webpage text classification system based on maximum interval criterion

A text classification technology based on the maximum interval criterion, applied to text database clustering/classification, unstructured text data retrieval, and network data retrieval, with the aim of improving classification performance while offering strong applicability and high accuracy.

Active Publication Date: 2021-11-09
SUZHOU UNIV

AI Technical Summary

Problems solved by technology

When the text corpus is large or highly unbalanced, or contains many rare words, setting the parameters of the above algorithm is a challenge.


Embodiment Construction

[0035] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments, so that those skilled in the art can better understand the present invention and implement it, but the examples given are not intended to limit the present invention.

[0036] As shown in Figure 1, the web page text classification system based on the maximum interval criterion in the preferred embodiment of the present invention includes the following modules:

[0037] The text preprocessing module is used for preprocessing the original text data and extracting the text data;

[0038] The preprocessing comprises:

[0039] Text segmentation: depending on the language of the text, apply an appropriate word segmentation algorithm to segment the text.

[0040] Text cleaning: taking into account the domain and tasks of the text corpus, remove characters, numbers and text that may interfere with text analysis; and, using the standard stop word list, remove s...
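
The two preprocessing steps above can be sketched as follows. This is a minimal illustration, assuming English text, a regular-expression tokenizer standing in for a language-specific word segmentation algorithm, and a small hypothetical stop word list. The names `segment`, `clean`, and `preprocess` are illustrative and do not come from the patent.

```python
import re

# Hypothetical stop word list for illustration only; a real system
# would load a standard list for the language being processed.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for"}

def segment(text):
    """Split English text into lowercase word tokens.

    Stand-in for a language-specific segmenter; digits and punctuation
    are discarded because only runs of letters are kept.
    """
    return re.findall(r"[a-z]+", text.lower())

def clean(tokens, stop_words=STOP_WORDS):
    """Remove stop words from a token list."""
    return [t for t in tokens if t not in stop_words]

def preprocess(text):
    """Segment, then clean, one raw text string."""
    return clean(segment(text))

tokens = preprocess("The price of 3 apples is $4.50!")
print(tokens)
```

For languages without whitespace word boundaries, such as Chinese, `segment` would need to be replaced by a dedicated segmentation tool; the rest of the sketch would stay the same.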


Abstract

The invention discloses a webpage text classification system based on a maximum interval criterion. The system comprises: a text preprocessing module, which preprocesses the original text data and extracts the text data; a text representation module, which calculates the weight of each feature item using the vector space representation of the text and represents the extracted text data; a feature item sorting module, which ranks the feature items by relevance based on the maximum interval criterion; and a text classification module, which constructs a classification model from the training set texts after feature selection and uses that model to classify the test set texts after feature selection. When only a small number of feature items are selected, the system can select feature words with higher discriminative power, which improves the performance of webpage text classification and gives the system strong applicability and high accuracy.
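
As a rough illustration of how the representation, sorting, and classification modules might fit together, the sketch below weights terms with one common TF-IDF variant, ranks the feature items, keeps the top three, and classifies with a nearest-centroid rule. The text available here does not give the formula of the maximum interval criterion, so `rank_features` uses a simple stand-in score (the gap between per-class mean weights) and is not the patented criterion. The classifier choice, the toy data, and all function names are likewise assumptions.

```python
import math
from collections import Counter

def fit_idf(docs):
    """Inverse document frequency per term, learned from tokenized training docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}

def weight(doc, idf):
    """TF-IDF weights for one tokenized document; unseen terms are ignored."""
    tf = Counter(doc)
    return {t: (c / len(doc)) * idf[t] for t, c in tf.items() if t in idf}

def rank_features(vectors, labels):
    """Rank terms by a STAND-IN score: the largest gap between class means.

    Assumption for illustration only, not the maximum interval criterion.
    """
    classes = sorted(set(labels))
    vocab = sorted({t for v in vectors for t in v})

    def class_mean(term, cls):
        vals = [v.get(term, 0.0) for v, y in zip(vectors, labels) if y == cls]
        return sum(vals) / len(vals)

    scores = {}
    for t in vocab:
        means = [class_mean(t, c) for c in classes]
        scores[t] = max(means) - min(means)
    return sorted(vocab, key=lambda t: -scores[t])

def fit_centroids(vectors, labels, features):
    """Mean vector per class, restricted to the selected features."""
    centroids = {}
    for cls in set(labels):
        rows = [v for v, y in zip(vectors, labels) if y == cls]
        centroids[cls] = {f: sum(r.get(f, 0.0) for r in rows) / len(rows)
                          for f in features}
    return centroids

def classify(vector, centroids, features):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((vector.get(f, 0.0) - c[f]) ** 2 for f in features)
    return min(centroids, key=lambda cls: dist(centroids[cls]))

# Toy, already-tokenized training set (hypothetical data).
train_docs = [["football", "match", "goal"], ["goal", "team", "win"],
              ["stock", "market", "price"], ["market", "bank", "price"]]
train_labels = ["sports", "sports", "finance", "finance"]

idf = fit_idf(train_docs)
train_vecs = [weight(d, idf) for d in train_docs]
top_features = rank_features(train_vecs, train_labels)[:3]
centroids = fit_centroids(train_vecs, train_labels, top_features)
prediction = classify(weight(["team", "goal", "goal"], idf), centroids, top_features)
print(top_features, prediction)
```

Keeping feature ranking separate from the classifier mirrors the module split in the abstract: the ranking function can be swapped for the actual criterion without touching the weighting or classification code.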

Description

Technical field

[0001] The invention relates to the technical field of text classification, in particular to a web page text classification system based on the maximum interval criterion.

Background art

[0002] As the main medium through which people express and receive information, text data accounts for the vast majority of Internet resources. It is therefore necessary to mine valuable information efficiently from massive text data. Text classification, as a text processing technology, is widely used in topic detection, sentiment analysis, spam filtering, and web page classification. In web page classification in particular, searching for information across such a large space is a major challenge, and arranging documents into categories reduces the search space for user queries.

[0003] Text classification based on machine learning technology mainly includes steps such as text preprocessing, text representation and weighting, ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/35, G06F16/957, G06F40/279
CPC: G06F16/35, G06F16/957, G06F40/279
Inventor: 张莉, 金玲彬, 苏畅之, 赵雷, 王邦军
Owner: SUZHOU UNIV