Form acquisition method and device
Form and DOM tree techniques, applied in the Internet field, address problems such as a low form recognition rate and forms that go unrecognized.
Examples
Embodiment 1
[0088] This embodiment of the present invention provides a form acquisition method. Refer to Figure 1, which is a schematic flowchart of the form acquisition method provided by this embodiment. As shown in the figure, the method includes the following steps:
[0089] S101. Acquire a DOM tree of a page accessed by a user.
[0090] S102. Determine boundary information of a form included in the page according to the nodes of the DOM tree.
[0091] S103. Using the boundary information, extract form information from the DOM tree as a candidate conversion form.
[0092] S104. Identify whether the candidate conversion form is a valid conversion form.
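Steps S101 to S104 can be read as a four-stage pipeline. The following is a minimal sketch of that control flow only, assuming each stage is supplied as a function; the names `acquire_valid_forms`, `get_dom_tree`, `find_form_boundaries`, `extract_form`, and `is_valid_conversion_form` are hypothetical and do not appear in the embodiment, which does not prescribe any particular interface.

```python
from typing import Callable, Iterable, List, TypeVar

Dom = TypeVar("Dom")            # whatever represents the DOM tree
Boundary = TypeVar("Boundary")  # whatever represents one form's boundary
Form = TypeVar("Form")          # whatever represents extracted form information


def acquire_valid_forms(
    page_url: str,
    get_dom_tree: Callable[[str], Dom],                         # S101 (hypothetical hook)
    find_form_boundaries: Callable[[Dom], Iterable[Boundary]],  # S102 (hypothetical hook)
    extract_form: Callable[[Dom, Boundary], Form],              # S103 (hypothetical hook)
    is_valid_conversion_form: Callable[[Form], bool],           # S104 (hypothetical hook)
) -> List[Form]:
    """Chain the four steps and return only the valid conversion forms."""
    dom = get_dom_tree(page_url)                       # S101: acquire the DOM tree
    boundaries = find_form_boundaries(dom)             # S102: boundary information
    candidates = [extract_form(dom, b) for b in boundaries]        # S103: candidates
    return [f for f in candidates if is_valid_conversion_form(f)]  # S104: filter
```

Passing the stages in as functions keeps the sketch neutral about how each step is implemented; the later embodiments elaborate S101 and S102.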
Embodiment 2
[0094] Based on the form acquisition method provided in the first embodiment above, the embodiment of the present invention specifically describes the method for acquiring the DOM tree of the page accessed by the user in S101. This step S101 may specifically include:
[0095] For example, in this embodiment the method for obtaining the DOM tree of the page accessed by the user may include, but is not limited to, the following. First, obtain the Uniform Resource Locator (URL) of the page accessed by the user from the user access log. Then, access the page corresponding to that URL to obtain the DOM tree of the page accessed by the user.
[0096] In a specific implementation, statistical tools may be used in advance to collect statistics on user visits to the website and generate user access logs, which may include all user visit records on the website. Each access record m...
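The two steps just described (read URLs from the access log, then visit each URL) might look roughly like the sketch below. The log layout is an assumption made purely for illustration: one record per line with tab-separated fields `timestamp`, `user_id`, `url`; the embodiment does not specify a format. The function names `urls_from_access_log` and `fetch_html` are likewise hypothetical. Only the Python standard library is used.

```python
from urllib.request import urlopen


def urls_from_access_log(lines):
    """Yield each distinct page URL once, in first-seen order.

    Assumed (hypothetical) record layout: timestamp<TAB>user_id<TAB>url
    """
    seen = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip records that do not match the assumed layout
        url = fields[2]
        if url not in seen:
            seen.add(url)
            yield url


def fetch_html(url, timeout=10.0):
    """Access the page behind `url` and return its markup as text."""
    with urlopen(url, timeout=timeout) as resp:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

A plain HTTP fetch returns only the markup sent by the server. If a page builds its form with scripts, obtaining the DOM tree the user actually saw would require rendering the page in a browser engine, which this sketch does not attempt.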
Embodiment 3
[0100] Based on the form acquisition method provided in the first embodiment above, this embodiment of the present invention specifically describes the method of determining the boundary information of the form contained in the page according to the nodes of the DOM tree in S102. The step S102 may specifically include:
[0101] It can be understood that, in the current HyperText Markup Language (HTML) standard, a form is usually marked up with the form tag, and all child nodes under the form tag's node in the DOM tree belong to the form information. However, many pages do not follow the standard: their DOM trees use a div tag as the container of the form, which makes the candidate conversion form difficult to identify. This embodiment provides a method for locating the form in the DOM tree, which is described in detail below.
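To make the two cases concrete (a standard form tag versus a div used as the form container), the following sketch builds a simple tree with Python's standard `html.parser` and returns one root node per suspected form. It is not the locating method of this embodiment. The heuristic is an assumption chosen for illustration: every form node is a boundary, and otherwise the deepest div holding at least `min_controls` input controls is treated as one. The names `Node`, `TreeBuilder`, `count_controls`, and `find_form_roots` are hypothetical.

```python
from html.parser import HTMLParser

CONTROL_TAGS = {"input", "select", "textarea", "button"}
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Node:
    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a simplified element tree (text nodes are ignored)."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document", [])
        self._current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self._current)
        self._current.children.append(node)
        if tag not in VOID_TAGS:  # void elements never have children
            self._current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag; ignore stray end tags.
        node = self._current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self._current = node.parent


def count_controls(node):
    """Number of input controls in the subtree rooted at `node`."""
    own = 1 if node.tag in CONTROL_TAGS else 0
    return own + sum(count_controls(child) for child in node.children)


def find_form_roots(node, min_controls=2):
    """Return the root node of each suspected form (assumed heuristic)."""
    if node.tag == "form":
        return [node]  # standard case: the form tag bounds the form
    found = []
    for child in node.children:
        found.extend(find_form_roots(child, min_controls))
    # Non-standard case: a div with enough controls and no form-like descendant.
    if not found and node.tag == "div" and count_controls(node) >= min_controls:
        found.append(node)
    return found
```

Here the boundary information is represented simply as the root node of each form subtree; everything under that node would then be extracted as the candidate conversion form.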
[0102] In the embodiment of the present invention, after obtaining the DOM tree of the page accessed by the user, the bounda...