Webpage classification method, terminal equipment and storage medium

A webpage classification technology, applied in neural learning methods, website content management, network data retrieval, and related fields. It addresses the problems of existing methods, which cannot be widely applied to webpage data, have a limited scope of application, and generalize poorly. It achieves broad applicability, overcomes the limitations of single-structure approaches, and addresses the problem of sparse webpage features.

Active Publication Date: 2020-12-25
XIAMEN MEIYA PICO INFORMATION

AI Technical Summary

Problems solved by technology

Existing webpage classification methods have the following deficiencies:
(1) Methods that compare webpage content or URLs usually require building a large-scale comparison library. They are inflexible, have a high error rate, and generalize poorly.
(2) Methods that build a classification model from webpage content mostly rely on a single kind of feature, for example only text information or only image information, as the feature representation of a webpage. The information structure of webpage content is diverse, and some webpages contain only text or only pictures. A single feature type therefore cannot fully represent the content of a webpage: it ignores the information carried by other structural data, often misses key information, and makes the features sparser.
Therefore, classification methods based on single-structure data cannot be widely applied to all webpage data and cannot solve the problem of feature sparsity. Their scope of application is very limited, and the model's effectiveness cannot be guaranteed.


Examples

Embodiment 1

[0030] This embodiment of the present invention provides a webpage classification method. As shown in figure 1, the method includes the following steps:

[0031] S1: Collect multiple types of webpages, construct a graph structure for each webpage based on at least two types of its features, label each webpage with its type, and form a training set from all of the type-labeled graph structures.
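Step S1 can be sketched as follows. This is only an illustrative outline, not the patent's implementation: the key names in `FEATURE_TYPES`, the `LabeledGraph` container, and the `build_graph` callback are hypothetical, and skipping pages that expose fewer than two feature types is an assumption about how the "at least two types of features" requirement might be enforced.

```python
from dataclasses import dataclass
from typing import Callable

# Assumed key names for the feature types a page record may carry.
FEATURE_TYPES = ("picture", "text", "structure")


@dataclass
class LabeledGraph:
    """One training example: a graph built from a page plus its type label."""
    graph: object  # whatever the supplied build_graph callback returns
    label: str     # the annotated webpage type


def build_training_set(
    pages: list[dict], build_graph: Callable[[dict], object]
) -> list[LabeledGraph]:
    """Build one labeled graph per page that has at least two feature types."""
    training_set = []
    for page in pages:
        present = [t for t in FEATURE_TYPES if page.get(t)]
        if len(present) < 2:
            continue  # assumption: pages with a single feature type are skipped
        training_set.append(LabeledGraph(build_graph(page), page["label"]))
    return training_set


pages = [
    {"label": "news", "text": ["headline"], "picture": ["photo.png"]},
    {"label": "ads", "text": ["buy now"]},  # only one feature type
]
# Stand-in graph builder: just records which feature types are present.
training_set = build_training_set(
    pages, build_graph=lambda p: sorted(t for t in FEATURE_TYPES if p.get(t))
)
```

In a real pipeline the `build_graph` callback would return the node and edge structure described in the following paragraphs rather than a list of feature-type names.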

[0032] The construction of the graph structure includes the construction of nodes and the construction of edges. Nodes in this embodiment include picture nodes corresponding to the picture type, text nodes corresponding to the text type, and webpage nodes corresponding to the webpage structure type. As shown in figure 2, the nodes beginning with "O" represent different webpage nodes, the nodes beginning with "W" represent different text nodes, and the nodes beginning with "P" represent different picture nodes.
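A minimal sketch of such a graph for a single page is given below. The "O", "W", and "P" prefixes follow the description of figure 2. The function name `build_page_graph` is hypothetical, and the edge rule (linking the webpage node to each of its text and picture nodes) is an assumption made for illustration, since the edge construction is not described in the text available here.

```python
def build_page_graph(page_id, texts, pictures):
    """Return (node_names, adjacency_matrix) for one page.

    Node name prefixes: O = webpage node, W = text node, P = picture node.
    """
    nodes = [f"O{page_id}"]
    nodes += [f"W{i}" for i in range(1, len(texts) + 1)]
    nodes += [f"P{i}" for i in range(1, len(pictures) + 1)]

    n = len(nodes)
    adjacency = [[0] * n for _ in range(n)]
    # Assumed edge rule: an undirected edge between the webpage node
    # (index 0) and every text or picture node found on that page.
    for j in range(1, n):
        adjacency[0][j] = 1
        adjacency[j][0] = 1
    return nodes, adjacency


nodes, adjacency = build_page_graph(1, ["login", "password"], ["logo.png"])
```

The adjacency matrix is kept as a plain nested list so that the sketch has no dependencies; a sparse representation would be more practical for pages with many nodes.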

[0033] 1. Picture node

[0034] In this embodiment, the picture nodes use th...

Embodiment 2

[0076] The present invention also provides a webpage classification terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the steps of the method in Embodiment 1 above are implemented.

[0077] Further, as an implementable solution, the webpage classification terminal device may be a computing device such as a desktop computer, notebook, palmtop computer, or cloud server. The webpage classification terminal device may include, but is not limited to, a processor and a memory. Those skilled in the art will understand that the composition described above is only an example of the webpage classification terminal device and does not limit it; the device may include more or fewer components than the above, ...

Abstract

The invention relates to a webpage classification method, terminal equipment and a storage medium. The method comprises the following steps: S1, collecting multiple types of webpages, constructing a graph structure for each webpage according to at least two types of its features, labeling the type of each webpage, and forming a training set from all of the type-labeled graph structures; S2, constructing a graph convolutional neural network model, training it with the training set, and taking the trained model as the webpage classification model; and S3, for a webpage to be classified, constructing a graph structure according to the same at least two types of features used in step S1, and determining the webpage type corresponding to that graph structure with the webpage classification model. The method fully learns the heterogeneous information in webpages, such as text and pictures, to construct the webpage classification model. Compared with existing webpage classification methods, it effectively overcomes the limitation of methods based on a single data structure and clearly addresses the problem of webpage feature sparseness.
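The inference path of steps S2 and S3 can be illustrated with a generic two-layer graph convolutional network written in NumPy. This is a sketch under stated assumptions, not the patented model: the symmetric adjacency normalization, the ReLU activation, the mean-pooling readout, the layer sizes, and the random (untrained) weights are all choices made here for illustration, and the training loop of step S2 is omitted.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 with self-loops added."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_classify(adj, features, w1, w2):
    """Return class probabilities for one page graph."""
    a_norm = normalize_adjacency(adj)
    hidden = np.maximum(a_norm @ features @ w1, 0.0)  # graph conv layer 1 + ReLU
    node_logits = a_norm @ hidden @ w2                # graph conv layer 2
    graph_logits = node_logits.mean(axis=0)           # mean-pool readout (assumption)
    exp = np.exp(graph_logits - graph_logits.max())   # numerically stable softmax
    return exp / exp.sum()


rng = np.random.default_rng(0)
# One webpage node connected to two text nodes and one picture node.
adj = np.array(
    [[0, 1, 1, 1],
     [1, 0, 0, 0],
     [1, 0, 0, 0],
     [1, 0, 0, 0]],
    dtype=float,
)
features = rng.normal(size=(4, 8))  # placeholder node features
w1 = rng.normal(size=(8, 16))       # untrained weights, for shape checking only
w2 = rng.normal(size=(16, 3))       # three hypothetical webpage types
probs = gcn_classify(adj, features, w1, w2)
predicted_type = int(np.argmax(probs))
```

With trained weights, `predicted_type` would index the webpage type assigned in step S3; here it only demonstrates that a graph of mixed node types maps to a single probability vector over page types.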

Description

Technical field

[0001] The invention relates to the field of webpage classification, in particular to a webpage classification method, terminal equipment and a storage medium.

Background art

[0002] With the rapid popularization of Internet technology, Internet applications are booming. High-quality, personalized content is constantly emerging, and more and more netizens can share rich network resources. At the same time, however, illegal and criminal activities are hidden among them, and a large amount of false information, advertising, Internet fraud and other illegal or non-compliant information is published on the Internet, which seriously endangers the property safety of netizens. Discovering and identifying this kind of harmful information and cleaning up cyberspace requires an efficient and intelligent webpage analysis method.

[0003] The content information structure of the web page is diverse, with pictures, texts, videos a...

Application Information

IPC(8): G06F16/958, G06K9/62, G06N3/04, G06N3/08
CPC: G06F16/958, G06N3/049, G06N3/08, G06N3/045, G06F18/241, G06F18/214
Inventors: 陈志明, 赵建强, 庄灿波, 刘晓芳, 曾鹏
Owner: XIAMEN MEIYA PICO INFORMATION