Web form identification method

A web form identification method, applied in the field of data parsing, that addresses the low success rate of parsing HTML web pages and improves the parsing success rate.

Active Publication Date: 2017-10-31
BEIJING QIHOO TECH CO LTD

AI Technical Summary

Problems solved by technology

[0015] The technical problem to be solved by this application is to provide a method and device for parsing markup files that effectively solve the low success rate of parsing HTML web pages in the prior art.

Examples

Embodiment Construction

[0073] In order to make the above objectives, features and advantages of the application more obvious and understandable, the application will be further described in detail below in conjunction with the drawings and specific implementations.

[0074] At present, markup languages have become the most important way to present and store data; examples include HTML, HTML5, eXtensible HyperText Markup Language (XHTML), and Extensible Markup Language (XML). One of the most important features of these languages is that they use a set of markup tags to organize or store data. The markup files described below in this application are files that organize data with markup tags.

[0075] Referring to Figure 1, a schematic flow chart of a method for parsing a markup file in this application is shown. The method is as follows:

[0076] Step 101: Obtain tag objects in the markup file to generate a tag set....
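As an illustration of Step 101, the following is a minimal sketch that collects the tag (label) objects of an HTML fragment into a flat tag set. It uses Python's standard `html.parser` module; the class name `TagCollector`, the function `collect_tags`, the dictionary layout of each tag object, and the sample markup are assumptions made for this example and are not taken from the patent text.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: records every start tag and its attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []  # the "tag set" built while parsing

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; store it as a dict.
        self.tags.append({"tag": tag, "attrs": dict(attrs)})


def collect_tags(markup):
    """Parse the markup and return the collected tag objects in document order."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


# Sample markup invented for this sketch.
SAMPLE = (
    '<form id="login">'
    '<input type="text" name="user">'
    '<input type="password" name="pwd">'
    "</form>"
)

tag_set = collect_tags(SAMPLE)
for tag_object in tag_set:
    print(tag_object["tag"], tag_object["attrs"])
```

A flat list is used here only because it is the simplest structure to group and filter later; the patent excerpt does not specify how the tag set is stored.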

Abstract

The invention provides a web form identification method that addresses the low success rate of parsing markup files in the prior art. The method comprises the following steps: obtaining the tag objects in a markup file to generate a tag set; grouping the tag objects according to the common attributes of the tag objects in the tag set; obtaining one or more tag groups from the grouping result; matching the attributes of the tag objects in the one or more tag groups against a preset markup file parsing mapping table; and obtaining the data for parsing the markup file from the matched tag groups. Because the tag objects are grouped according to their common attributes, the originally unordered tag objects in the markup file become associated with one another, which facilitates the subsequent matching and effectively increases the success rate of parsing markup files.
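The steps in the abstract can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions: the tag (label) objects are plain dictionaries, the common attribute used for grouping is a hypothetical `form` key naming the enclosing form, and `MAPPING_TABLE` is an invented stand-in for the preset parsing mapping table. None of these names or structures come from the patent itself.

```python
from collections import defaultdict

# Hypothetical tag set, e.g. the output of an earlier tag-collection step.
TAG_SET = [
    {"tag": "input", "form": "search", "type": "text", "name": "q"},
    {"tag": "input", "form": "login", "type": "text", "name": "user"},
    {"tag": "input", "form": "login", "type": "password", "name": "pwd"},
]

# Hypothetical mapping table: role -> attributes a tag must have to fill it.
MAPPING_TABLE = {
    "username": {"tag": "input", "type": "text"},
    "password": {"tag": "input", "type": "password"},
}


def group_by_common_attribute(tags, key):
    """Group tag objects that share the same value for the given attribute."""
    groups = defaultdict(list)
    for tag_object in tags:
        groups[tag_object.get(key)].append(tag_object)
    return dict(groups)


def match_group(group, mapping):
    """Return {role: tag} if every role in the mapping is matched, else None."""
    matched = {}
    for role, required in mapping.items():
        hit = next(
            (t for t in group if all(t.get(k) == v for k, v in required.items())),
            None,
        )
        if hit is None:
            return None  # the group does not satisfy the mapping table
        matched[role] = hit
    return matched


def extract(tags, key, mapping):
    """Group the tags, keep the matching groups, and pull out the field names."""
    results = {}
    for group_id, group in group_by_common_attribute(tags, key).items():
        matched = match_group(group, mapping)
        if matched is not None:
            results[group_id] = {role: t["name"] for role, t in matched.items()}
    return results


result = extract(TAG_SET, "form", MAPPING_TABLE)
print(result)
```

In this sketch the `search` group is discarded because it has no password field, while the `login` group satisfies both roles. Grouping first means the matcher only ever compares tags that already belong together, which is the association the abstract credits for the improved success rate.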

Description

[0001] This patent application is a divisional application of the Chinese invention patent application filed on March 30, 2012, with application number 201210091311.4 and the title "A method and device for analyzing marked documents".

Technical field

[0002] This application relates to the technical field of data analysis, and in particular to a method and device for analyzing markup files.

Background technique

[0003] At present, Internet technology has deeply affected people's lives; applications such as e-mail, forums, and web games have become an indispensable part of people's daily work and entertainment. However, most of these Internet applications require users to register and log in before they can be used, so users need to memorize a large number of user names and passwords. For account security, users usually need to set a more complicated combination of numbers, letters, and special symbols, which further increases the difficulty of rememberi...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30
CPC: G06F8/38, G06F16/95
Inventor: 杭程, 李超, 万勇, 任寰
Owner: BEIJING QIHOO TECH CO LTD