Website navigation bar information extraction method and device, electronic equipment and storage medium

The technology is applied in the field of website navigation bar information extraction. It addresses problems such as extraction failures caused by non-standard code and the difficulty of extracting navigation bars, and it improves the accuracy and efficiency of extraction.

Pending Publication Date: 2020-09-04
深圳市小满科技有限公司

AI Technical Summary

Problems solved by technology

[0002] To display corporate culture, products, introductions, contact information and other information on corporate official websites, links to key information are usually displayed at the top or left of the page in the form of a navigation bar. To build an accurate content index for corporate official websites, the navigation bar information must be extracted, but it is difficult to extract the navigation bar due to the freedom of the HTML la...


Examples

Example Embodiment

[0064] Embodiment one

[0065] Figure 1 is a flowchart of a method for extracting website navigation bar information provided in the first embodiment of the present invention.

[0066] In this embodiment, the method for extracting navigation bar information of a website can be applied to electronic devices. For an electronic device that needs to extract navigation bar information of a website, the navigation bar information extraction function provided by the method of the present invention can be integrated directly on the electronic device, or run on the electronic device in the form of a Software Development Kit (SDK).

[0067] As shown in Figure 1, the method for extracting information from the navigation bar of a website specifically includes the following steps. Depending on requirements, the order of the steps in the flowchart can be changed, and some steps can be omitted.

[0068] S11: Download the main page source code and any su...

[0125] Embodiment two

[0126] Figure 2 is a structural diagram of a device for extracting information from a navigation bar of a website provided in the second embodiment of the present invention.

[0127] In some embodiments, the navigation bar information extraction device 20 of the website may include multiple functional modules composed of program code segments. The program code of each segment in the navigation bar information extraction device 20 may be stored in the memory of the electronic device and executed by the at least one processor to extract the navigation bar information of the website (see the description of Figure 1).

[0128] In this embodiment, the navigation bar information extraction device 20 of the website can be divided into multiple functional modules according to the functions it performs. The functional modules may include: an analysis module 201, a rejection module 202, an extraction module 203, a merging module 204, a dedupli...

[0186] Embodiment three

[0187] Figure 3 is a schematic structural diagram of an electronic device provided in the third embodiment of the present invention. In a preferred embodiment of the present invention, the electronic device 3 includes a memory 31, at least one processor 32, at least one communication bus 33, and a transceiver 34.

[0188] Those skilled in the art should understand that the structure of the electronic device illustrated in Figure 3 does not limit the embodiments of the present invention. It may be a bus-type structure or a star structure, and the electronic device 3 may also include more or fewer hardware or software components than shown in the figure, or a different arrangement of components.

[0189] In some embodiments, the electronic device 3 is an electronic device that can automatically perform numerical calculation and/or information processing in accordance with preset or stored instructions. Its hardware includes, but is not limite...


Abstract

The invention relates to the technical field of text extraction and provides a website navigation bar information extraction method and device, electronic equipment, and a storage medium. The method comprises: downloading the source code of the main page and of any subpage under the domain name of the enterprise website to be extracted, obtaining a first HTML code and a second HTML code; parsing the first HTML code into a first node DOM tree and the second HTML code into a second node DOM tree; removing external links from the first node DOM tree and the second node DOM tree to obtain a third node DOM tree and a fourth node DOM tree; extracting navigation bar information using a NAV tag method, an A tag density method, a maximum public area method, and a keyword link block method; then performing de-duplication and filtering, calculating a node score for each node, and outputting the navigation bar information of the enterprise website to be extracted. Because the navigation bar information is extracted through the NAV tag method, the A tag density method, the maximum public area method, and the keyword link block method, the accuracy and efficiency of extracting navigation bar information from the page are improved.
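
As a rough illustration of two ideas in the abstract, removing external links and comparing the main page with a subpage, the following Python sketch may help. It is not the patented procedure. Treating "internal links present on both pages" as a stand-in for the maximum public area method is an assumption, and the names `LinkCollector`, `internal_links`, and `shared_links` are hypothetical. The sketch works on flat link lists rather than DOM trees and leaves out node scoring entirely.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def internal_links(html, base_url):
    """Return {absolute_url: text} for links on the same host as base_url."""
    parser = LinkCollector()
    parser.feed(html)
    host = urlparse(base_url).netloc
    result = {}
    for href, text in parser.links:
        absolute = urljoin(base_url, href)
        if urlparse(absolute).netloc == host:   # drop external links
            result.setdefault(absolute, text)   # de-duplicate by URL
    return result


def shared_links(main_html, sub_html, base_url):
    """Assumed heuristic: keep links that appear on both pages."""
    main = internal_links(main_html, base_url)
    sub = internal_links(sub_html, base_url)
    return [(url, text) for url, text in main.items() if url in sub]
```

The intuition behind the heuristic is that a navigation bar tends to repeat across pages of the same site, while body links differ from page to page.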

Description

Technical field

[0001] The invention relates to the technical field of text extraction, in particular to a method, device, electronic equipment and storage medium for extracting information from a navigation bar of a website.

Background

[0002] To display corporate culture, products, introductions, contact information and other information on corporate official websites, links to key information are usually displayed at the top or left of the page in the form of a navigation bar. To build an accurate content index for corporate official websites, the navigation bar information must be extracted, but it is difficult to extract the navigation bar due to the freedom of the HTML language used to write the web page and non-standard code writing.

[0003] The existing technology uses the NAV tag method, but this method requires the page to use HTML5 and the developer to strictly follow the development manual specification to...
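
For reference, a NAV tag method could look roughly like the sketch below, which collects the links nested inside HTML5 `<nav>` elements using Python's standard `html.parser`. The source does not say how the existing technology is implemented, so the class name `NavTagExtractor` and the function `extract_nav_links` are hypothetical, and the sketch is only one plausible reading.

```python
from html.parser import HTMLParser


class NavTagExtractor(HTMLParser):
    """Collect (href, text) for <a> elements nested inside <nav>."""

    def __init__(self):
        super().__init__()
        self.items = []        # extracted (href, text) pairs
        self._nav_depth = 0    # how many <nav> elements are currently open
        self._href = None      # href of the <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "nav":
            self._nav_depth += 1
        elif tag == "a" and self._nav_depth > 0:
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "nav" and self._nav_depth > 0:
            self._nav_depth -= 1
        elif tag == "a" and self._href is not None:
            self.items.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_nav_links(html):
    """Return the links found inside <nav> elements, in document order."""
    parser = NavTagExtractor()
    parser.feed(html)
    return parser.items
```

On a page that builds its menu from a `<div>` instead of `<nav>`, `extract_nav_links` returns an empty list, which is the kind of limitation paragraph [0003] attributes to this method.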

Application Information

IPC(8): G06F16/958, G06F16/954
CPC: G06F16/986, G06F16/954
Inventor: 祁俊辉
Owner: 深圳市小满科技有限公司