URL classification method and system

A classification method and system, applied in the field of information classification, that addresses the low accuracy of existing URL classification and achieves higher accuracy.

Status: Inactive. Publication date: 2018-07-27
SHANGHAI KANGFEI INFORMATION TECH CO LTD
Cites: 5. Cited by: 1

AI Technical Summary

Problems solved by technology

[0006] The technical problem to be solved by the present invention is that prior-art methods classify URLs with low accuracy.


Examples

Embodiment 1

[0048] Figure 1 shows a flow chart of a URL classification method provided by Embodiment 1 of the present invention, which is described in detail below with reference to the accompanying drawings:

[0049] In this embodiment, the URL to be classified is first looked up in the URL classification library. When no corresponding category is found there, the web page corresponding to the URL is analyzed to extract feature phrases that express the content of the page. Lexical analysis is then performed on the feature phrases to obtain a classification mark expressing user behavior, and the URL is classified according to the URL and the classification mark, which updates the URL classification library.
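
The flow in paragraph [0049] can be sketched in Python as a lookup with a fallback. This is a minimal illustration, not the patented implementation: the function name `classify_url`, the dictionary used as the classification library, and the two callables passed in (`extract_phrases` and `phrases_to_mark`) are all assumptions introduced here, because the source does not specify how pages are analyzed or how lexical analysis is done.

```python
from typing import Callable, Dict, List


def classify_url(
    url: str,
    library: Dict[str, str],
    extract_phrases: Callable[[str], List[str]],
    phrases_to_mark: Callable[[List[str]], str],
) -> str:
    """Return the classification for url, updating library on a miss.

    library, extract_phrases and phrases_to_mark are hypothetical stand-ins
    for the classification library, web page analysis and lexical analysis.
    """
    # Look the URL up in the classification library first.
    if url in library:
        return library[url]
    # On a miss, analyze the web page to get feature phrases.
    phrases = extract_phrases(url)
    # Lexical analysis turns the phrases into a classification mark.
    mark = phrases_to_mark(phrases)
    # Record the result so that later lookups hit the library.
    library[url] = mark
    return mark


if __name__ == "__main__":
    library = {"example.com/news": "reading"}
    mark = classify_url(
        "shop.example/item",
        library,
        extract_phrases=lambda u: ["buy", "phone"],
        phrases_to_mark=lambda p: "shopping",
    )
    print(mark, library)
```

Passing the two analysis steps in as callables keeps the sketch independent of any particular crawler or tokenizer; the page is only analyzed when the library lookup misses.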

[0050] Step S101, judging whether there is classification information of the URL to be classified in the preset URL classification library.

[0051] URL classification information is stored in the URL classification library. Classification ma...

Embodiment 2

[0073] Figure 2 shows a flow chart of a URL classification method provided by Embodiment 2 of the present invention, which is described in detail below with reference to the accompanying drawings:

[0074] Step S201, extracting the feature string of the URL to be classified.

[0075] The feature string is a representative substring of the URL that can stand for a whole type of URL. For example, for the URL "bbs.phicomm.com/article/title?s=123", the corresponding feature string is "phicomm.com/article". The present invention does not limit the specific method of extracting the feature string. Generally, the feature string includes at least the main part of the domain name and the fields of the upper-level directory.
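
One possible way to extract such a feature string is sketched below in Python. The source leaves the extraction method open, so everything here is an assumption: the function name `feature_string` is hypothetical, the "main part of the domain name" is approximated as the last two host labels (which would be wrong for suffixes such as "co.uk"), and the "upper-level directory" is taken to be the first path segment.

```python
from urllib.parse import urlsplit


def feature_string(url: str) -> str:
    """Return main domain plus first path segment (an assumed heuristic)."""
    # urlsplit only recognizes the host when a scheme or "//" precedes it.
    if "://" not in url:
        url = "//" + url
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Assumption: the main domain is the last two labels of the host name.
    main_domain = ".".join(host.split(".")[-2:])
    # Assumption: the upper-level directory is the first non-empty segment.
    segments = [s for s in parts.path.split("/") if s]
    if segments:
        return main_domain + "/" + segments[0]
    return main_domain


if __name__ == "__main__":
    print(feature_string("bbs.phicomm.com/article/title?s=123"))
```

For the example URL in paragraph [0075], this heuristic yields "phicomm.com/article". The query string and deeper path segments are dropped, so URLs that differ only in those parts share one feature string.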

[0076] Step S202, querying the URL classification library according to the feature string, to determine whether there is classification information of the URL to be classified in the URL classification library.

[0077] Step ...

Embodiment 3

[0084] Figure 3 shows a structural block diagram of a URL classification system provided by Embodiment 3 of the present invention, which is described in detail below with reference to the accompanying drawings:

[0085] The URL classification system includes:

[0086] A judging module 31, configured to judge whether there is classification information of the URL to be classified in the preset URL classification library;

[0087] A feature phrase acquisition module 32, configured to obtain, when there is no classification information of the URL to be classified in the URL classification library, feature phrases that express the content of the web page from the web page corresponding to the URL to be classified;

[0088] A classification mark generation module 33, configured to perform lexical analysis on the feature phrases to generate a classification mark that expresses user behavior;

[0089] A classification module 34, configured to generate corresponding cl...
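
As an illustration of what modules 32 and 33 might do, the Python sketch below pulls candidate feature phrases from a page and maps them to a classification mark. The source does not say where feature phrases come from or how lexical analysis works, so this rests on loud assumptions: phrases are taken from the `<title>` element and the `keywords` meta tag, lexical analysis is reduced to a word lookup in a hand-written lexicon, and the names `feature_phrases`, `classification_mark` and `lexicon` are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser
from typing import Dict, List


class _PhraseParser(HTMLParser):
    """Collects <title> text and keywords meta content (assumed sources)."""

    def __init__(self) -> None:
        super().__init__()
        self.in_title = False
        self.phrases: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            if (attr.get("name") or "").lower() == "keywords":
                content = attr.get("content") or ""
                self.phrases += [k.strip() for k in content.split(",") if k.strip()]

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title and data.strip():
            self.phrases.append(data.strip())


def feature_phrases(html: str) -> List[str]:
    """Rough stand-in for module 32: phrases that express the page content."""
    parser = _PhraseParser()
    parser.feed(html)
    parser.close()
    return parser.phrases


def classification_mark(
    phrases: List[str], lexicon: Dict[str, str], default: str = "unknown"
) -> str:
    """Rough stand-in for module 33: the mark that most lexicon words vote for."""
    votes: Counter = Counter()
    for phrase in phrases:
        for word in phrase.lower().split():
            if word in lexicon:
                votes[lexicon[word]] += 1
    return votes.most_common(1)[0][0] if votes else default


if __name__ == "__main__":
    page = (
        "<html><head><title>Buy phones online</title>"
        '<meta name="keywords" content="phone, discount"></head></html>'
    )
    lexicon = {"buy": "shopping", "discount": "shopping", "news": "reading"}
    print(classification_mark(feature_phrases(page), lexicon))
```

A real system would likely need word segmentation and a much richer lexicon, especially for Chinese-language pages; the whitespace split here only keeps the sketch short.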

Abstract

The invention discloses a URL classification method and system, and relates to the technical field of information classification. The URL classification method comprises: judging whether classification information for a URL to be classified exists in a preset URL classification library; when it does not, obtaining feature phrases that express the content of the web page from the web page corresponding to the URL to be classified; performing lexical analysis on the feature phrases to generate a classification mark expressing user behavior; and generating the corresponding classification information according to the URL to be classified and its classification mark, and recording it in the URL classification library. All URLs can be classified, and high accuracy is achieved.

Description

Technical field

[0001] The invention relates to the technical field of information classification, and in particular to a URL classification method and system.

Background art

[0002] A Uniform Resource Locator (URL) is a concise representation of the location of a resource available on the Internet and of the method for accessing it. It is the address of a standard resource on the Internet, also known as a web page address.

[0003] Currently, analyzing the URLs that users access is one technique for tagging users. At present, this is generally done by analyzing the composition of the URL or by obtaining the category of the URL through clustering. However, the composition of URLs varies endlessly, so URLs cannot be classified with high accuracy from their composition alone. If analyzed from the perspective of clustering, the number of training samples based on the current URL is limited, and there is a large deviation in the tra...

Application Information

IPC(8): G06F17/30
CPC: G06F16/35, G06F16/9566
Inventor: 黄世纬
Owner: SHANGHAI KANGFEI INFORMATION TECH CO LTD