Automatic identification method for network literature directory type web pages
An automatic identification technology for catalog-type web pages, applied in special data processing applications, instruments, electrical digital data processing, etc., to achieve good identification results
Embodiment Construction
[0017] Figure 1 shows a flow chart of the method of the present invention. The method for automatically identifying network literature directory web pages provided by an embodiment of the present invention includes the following steps:
[0018] Step 1: Obtain the data body of the current web page. The data body is the part of the HTML source file between the <body> and </body> tags.
[0019] Step 2: Extract from the data body the character strings of all hyperlink tags that contain a hyperlink address, and store the string of each such tag as one array element in string array one. A hyperlink tag is the HTML tag <a>; a hyperlink tag containing a hyperlink address is a hyperlink tag that carries the "href=" parameter. The method of extracting all the character strings corresponding to the hyperlink tags containing the hyperlink address in the data body is: judging whether the data body contains " "To m...
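Steps 1 and 2 can be illustrated with a minimal Python sketch. The extraction procedure itself is truncated in the text above, so this is not the patented procedure: it substitutes regular expressions for whatever string scanning the original describes. The function names get_data_body and extract_hyperlink_tags are hypothetical, and the returned list plays the role of "string array one". The sketch assumes well-formed tags whose attribute values contain no ">" character.

```python
import re
from typing import List

# Assumption: the data body is delimited by <body ...> and </body>,
# matched case-insensitively and across line breaks.
BODY_RE = re.compile(r"<body\b[^>]*>(.*?)</body\s*>", re.IGNORECASE | re.DOTALL)

# An opening <a ...> tag whose attribute text includes "href=".
# Tags such as <a name="top"> carry no hyperlink address and are skipped.
A_HREF_RE = re.compile(r"<a\b[^>]*\bhref\s*=[^>]*>", re.IGNORECASE)


def get_data_body(html_source: str) -> str:
    """Step 1: return the text between <body> and </body>, or "" if absent."""
    match = BODY_RE.search(html_source)
    return match.group(1) if match else ""


def extract_hyperlink_tags(data_body: str) -> List[str]:
    """Step 2: return the string of every <a> tag that contains href=,
    one list element per tag (the counterpart of "string array one")."""
    return A_HREF_RE.findall(data_body)


if __name__ == "__main__":
    page = (
        "<html><head><title>Contents</title></head><body>"
        '<a name="top"></a>'
        '<a href="ch1.html">Chapter 1</a>'
        '<a href="ch2.html">Chapter 2</a>'
        "</body></html>"
    )
    for tag in extract_hyperlink_tags(get_data_body(page)):
        print(tag)
```

Only the opening tag string is stored for each hyperlink, which matches the description of keeping one string per hyperlink tag; a production implementation would more likely use a real HTML parser than regular expressions.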