Method for capturing details of title with script function as link

A method, applied in the field of the Java platform, that solves the problem that the details of a title linked through a script function cannot be captured directly

Status: Inactive; Publication Date: 2018-10-19
ZHUHAI HENGQIN SHENGDA ZHAOYE TECH INVESTMENT CO LTD

AI Technical Summary

Problems solved by technology

[0003] The technical problem solved by the present invention is to provide a method for capturing the details of titles that use a script function as their link, since such details cannot be captured directly.


Examples


Detailed Description of the Embodiments

[0015] As shown in Figure 1, the present invention adopts the following steps:

[0016] Step 1. Use the crawler tool to load the URL of the page where the titles are located, and obtain the content of the title tags to be crawled on that page; for example:

[0017] String INFO_URL = "http://www.***.com";

[0018] Document doc = JSoupUtil.getConn(INFO_URL).get();

[0019] // All title elements inside the element with id "titles"

[0020] Elements elementsMenu = doc.select("#titles");

[0021] Elements elementsClick = elementsMenu.get(0).getElementsByAttribute("onclick");

[0022] Step 2. Traverse all of the matched elements and obtain the JS function call stored in each onclick attribute; for example:

[0023] for (Element e : elementsClick) {

[0024] String mainTitle = e.text();

[0025] // Get the JS function content

[0026] String jsClick = e.attr("onclick");

[0027] }
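
The onclick value collected in Step 2 is normally a call expression, and the name of the called function is what has to be located in the page's script files afterwards. The following is a minimal sketch, using only the Java standard library, of how such a string could be split into a function name and its arguments. The class name `OnclickCall` and the sample call `showDetail('123', 'news')` are hypothetical and do not come from the patent; the regular expression is an illustrative shortcut, not the patent's method, and it assumes a single call with simple literal arguments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: holds the parts of an onclick call such as showDetail('123', 'news');
final class OnclickCall {
    // Optional "javascript:" prefix, a function name, parenthesised arguments, optional ";".
    private static final Pattern CALL = Pattern.compile(
        "^\\s*(?:javascript:)?\\s*([A-Za-z_$][\\w$.]*)\\s*\\((.*)\\)\\s*;?\\s*$");

    final String functionName;
    final List<String> arguments;

    private OnclickCall(String functionName, List<String> arguments) {
        this.functionName = functionName;
        this.arguments = arguments;
    }

    // Returns null when the attribute value is not a single function call.
    static OnclickCall parse(String onclick) {
        Matcher m = CALL.matcher(onclick);
        if (!m.matches()) {
            return null;
        }
        List<String> args = new ArrayList<>();
        String raw = m.group(2).trim();
        if (!raw.isEmpty()) {
            // Naive split: assumes no commas inside quoted arguments.
            for (String part : raw.split(",")) {
                // Strip one leading and one trailing quote, if present.
                args.add(part.trim().replaceAll("^['\"]|['\"]$", ""));
            }
        }
        return new OnclickCall(m.group(1), args);
    }

    public static void main(String[] args) {
        OnclickCall call = parse("showDetail('123', 'news');");
        System.out.println(call.functionName + " " + call.arguments);
    }
}
```

In the loop above, `OnclickCall.parse(jsClick)` would be called once per element; a null result signals an onclick value that this simple pattern does not cover.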

[0028] Step 3: Analyze the source code of the web page where the title is located, obtain the path of the file where the script function i...


Abstract

The invention relates to the technical field of the Java platform, in particular to a method for capturing the details of a title that uses a script function as its link. The method includes the following steps: first, a crawler tool scans all of the title contents to be captured and obtains the link script function in each title; next, the source code of the page where the title is located is inspected and analyzed to obtain the path of the file where the script function is located, and the file at that path is loaded; then the ScriptEngine is used to execute the script function and obtain the path corresponding to the details of the link; finally, the crawler tool loads and parses that path, and the required details are captured. The invention solves the problem that the details of a title with a script function as its link cannot be captured directly.
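
The ScriptEngine step named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the script text `SITE_SCRIPT`, the function `showDetail`, and the `/detail?id=` path are hypothetical stand-ins for the site's real script file, and the function is assumed to return the details path rather than navigate the browser. The `javax.script` API ships with the JDK, but the bundled Nashorn JavaScript engine was removed in JDK 15, so the engine lookup may return null on newer runtimes unless a JavaScript engine is added to the classpath; the sketch returns null in that case.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

// Hypothetical helper: evaluates an onclick call against the site's script.
final class OnclickResolver {
    // Assumed content of the script file; a real crawler would download it.
    static final String SITE_SCRIPT =
        "function showDetail(id) { return '/detail?id=' + id; }";

    // Returns the value produced by the onclick call as a string, or null
    // when no JavaScript engine is installed or the call yields no value.
    static String resolve(String siteScript, String onclick) {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("JavaScript");
        if (engine == null) {
            return null;
        }
        try {
            engine.eval(siteScript);              // define the site's functions
            Object result = engine.eval(onclick); // run the call from the onclick attribute
            return result == null ? null : result.toString();
        } catch (ScriptException ex) {
            throw new IllegalStateException("Could not evaluate: " + onclick, ex);
        }
    }

    public static void main(String[] args) {
        String path = resolve(SITE_SCRIPT, "showDetail(42)");
        System.out.println(path == null ? "No JavaScript engine available" : path);
    }
}
```

The returned path would then be combined with the site's base URL and loaded by the crawler tool, as in the final step of the abstract.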

Description

Technical field

[0001] The invention relates to the technical field of the Java platform, in particular to a method for capturing the details of titles that use a script function as their link.

Background

[0002] When crawling web pages for intelligence information, the jump from a title to its details page is often made not through a URL but through a JS function. The title carries only the parameters the jump requires, and nothing that, like a URL, clearly points to the details page. Current crawler tools cannot handle this case, because they obtain page content by loading a URL. Solving the problem requires a way to resolve the JS function into a URL.

Summary of the invention

[0003] The technical problem solved by the present invention is to provide a method for capturing the details of titles that use a script fun...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 陈林张来卿庞严冬
Owner: ZHUHAI HENGQIN SHENGDA ZHAOYE TECH INVESTMENT CO LTD