
Method for acquiring internet subject information and device thereof

A technology for collecting subject information, applied in special data processing applications, instruments, electrical digital data processing, and similar fields. It addresses problems such as slow speed, low analysis efficiency, and complex DOM trees, and improves accuracy.

Status: Inactive
Publication date: 2010-05-05
SHENZHEN LONG VISION
Cites: 0 · Cited by: 15

AI Technical Summary

Problems solved by technology

[0014] The DOM tree is relatively complex, so analysis efficiency is low and processing is slow. Moreover, DOM trees come in many types that differ greatly from one another, which makes it difficult to obtain the correct theme information.



Examples


Embodiment 2

[0192] The device in the second embodiment also includes:

[0193] The information download module 14 is used to download an Extensible Markup Language (XML) page and extract the list information from it, and to download the Uniform Resource Locators (URLs) in that list information and send them to the source code obtaining module 10 for processing.

[0194] Specifically, to collect news topic information, the information download module 14 downloads the XML page and extracts the news list information from it; to collect log topic information, it extracts the log list information from the downloaded XML page. In either case, it then downloads all the URLs in the extracted list information.

[0195] In a specific embodiment, the information download module 14 executes steps 200 and 201 of the second method embodiment.
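The download step in [0193] to [0195] can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the patent does not name the XML schema, so the sketch assumes an RSS 2.0-style list page (`<rss><channel><item><link>URL</link></item>...`). The function names `extract_list_urls`, `fetch_html`, and `collect_sources` are hypothetical, and returning a `{url: html}` map merely stands in for the hand-off to the source code obtaining module 10. Choosing between news and log topic information would simply mean passing in a different XML page.

```python
import xml.etree.ElementTree as ET
from typing import Callable
from urllib.request import urlopen


def extract_list_urls(xml_text: str) -> list[str]:
    """Pull the link of every <item> out of an XML list page.

    Assumption: the page uses an RSS 2.0-style layout
    (<rss><channel><item><link>URL</link></item>...); the patent
    does not name the schema.
    """
    root = ET.fromstring(xml_text)
    urls = []
    for item in root.iter("item"):
        link = item.findtext("link")  # None if the item has no <link>
        if link and link.strip():
            urls.append(link.strip())
    return urls


def fetch_html(url: str) -> str:
    """Download one page and decode it (charset from headers, else UTF-8)."""
    with urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def collect_sources(
    xml_text: str, fetch: Callable[[str], str] = fetch_html
) -> dict[str, str]:
    """Download every listed URL; the returned {url: html} map stands in
    for the hand-off to the source code obtaining module 10 (hypothetical)."""
    return {url: fetch(url) for url in extract_list_urls(xml_text)}
```

Passing `fetch` as a parameter keeps the list parsing testable without network access; in practice the default `fetch_html` would be used.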

[0196] Thereafter, the source code obtaining module 10 obtains the HT...



Abstract

The invention provides a method and a device for acquiring Internet subject information. The method comprises the steps of: acquiring the HyperText Markup Language (HTML) source code of an Internet webpage; dividing the HTML source code into separate character strings, using the div tag as the delimiter, and collecting these strings into a character string table; and analyzing each character string in the table in turn and, when the number of characters outside HTML tags in a string is larger than the number of characters inside HTML tags, and the number of characters outside HTML tags is also larger than a set base number, taking the content of that string as the subject information. By dividing the HTML source code into multiple character strings at div tags and analyzing those strings to obtain the subject information, the method and device can process webpages built from different webpage templates on the Internet and improve the accuracy of acquiring subject information.
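The splitting and counting rule in the abstract can be sketched in a few lines. This is a minimal sketch, not the patented implementation, and it rests on several assumptions the abstract does not settle: the source is cut immediately before every `<div ...>` opening tag; "characters in the HTML label" means every character of every tag in the string (attributes included); whitespace outside tags is ignored; the base number defaults to a hypothetical 50; and the "content" kept is the string's text with tags stripped. The names `split_on_div`, `count_inside_outside`, and `extract_subject` are hypothetical. The tag matching is regex-based, so text inside `<script>` or `<style>` would wrongly count as outside-tag content.

```python
import re

# Zero-width split point before every <div ...> opening tag (case-insensitive).
# Assumption: "taking a div label as a mark label" means cutting here.
DIV_OPEN = re.compile(r"(?=<div\b)", re.IGNORECASE)
# Naive tag matcher: anything from "<" to the next ">".
TAG = re.compile(r"<[^>]*>")


def split_on_div(html: str) -> list[str]:
    """Build the character string table: one entry per div-delimited piece."""
    return [piece for piece in DIV_OPEN.split(html) if piece.strip()]


def count_inside_outside(segment: str) -> tuple[int, int]:
    """Return (chars inside HTML tags, non-whitespace chars outside tags)."""
    inside = sum(len(m.group()) for m in TAG.finditer(segment))
    text = TAG.sub("", segment)
    # Assumption: whitespace between tags is layout, not content.
    outside = sum(1 for ch in text if not ch.isspace())
    return inside, outside


def extract_subject(html: str, base: int = 50) -> list[str]:
    """Keep the text of every piece whose outside-tag count beats both
    its inside-tag count and the base number (50 is a hypothetical default)."""
    subject = []
    for segment in split_on_div(html):
        inside, outside = count_inside_outside(segment)
        if outside > inside and outside > base:
            subject.append(TAG.sub("", segment).strip())
    return subject
```

For a page whose first div holds two navigation links (`Home`, `News`) and whose second holds a paragraph of 30 words, the navigation piece has 57 characters inside tags and only 8 outside, so only the paragraph text is kept. Counting raw tag characters penalizes link-heavy pieces, which fits the observation in the background that noise mostly appears as anchor text.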

Description

Technical field

[0001] The invention relates to Internet information processing technology, and in particular to a method and device for collecting Internet subject information.

Background

[0002] When browsing webpages on the Web, one finds that they usually contain two parts. One part reflects the theme of the webpage, such as the news article in a news webpage; we call this "theme" information. The other part consists of navigation bars, advertisements, copyright notices, questionnaires, and other content unrelated to the subject; we call this "noise" information. Noise information is usually distributed around the subject information and is sometimes mixed into the subject content, but it has no content correlation with it.

[0003] Noise information usually appears in the form of link navigation text (anchor text). As a result, webpages that link to each other through noise information often have no content relevance. In this way, t...


Application Information

Patent type & authority: Application (China)
IPC (8th edition): G06F17/30
Inventor: 黎柯 (Li Ke)
Owner: SHENZHEN LONG VISION