A crawler method for solving font anti-crawling

A font anti-crawling and crawler technology, applied in the field of network technology. It addresses problems such as data errors and solves font anti-crawling with strong persistence and high versatility.

Inactive Publication Date: 2019-01-18
SICHUAN CHANGHONG ELECTRIC CO LTD
AI Technical Summary

Problems solved by technology

[0009] Most importantly, if the data source website simply generates a single custom font file, font anti-crawling can be solved directly by manually establishing a mapping between true and fake characters and using it in the program to replace the fake characters. However, some websites now generate custom fonts that change with the IP address, and the custom font for each IP is regenerated several times a day. This means that the previously mentioned correspondence between A and B may be updated at any time so that A corresponds to C. A fixed mapping between A and B can therefore reverse-parse fake data for the same IP only for a short period, after which it causes data errors. This is the most difficult problem.
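The failure mode described above can be illustrated with a minimal sketch. The characters and mapping tables here are invented for illustration and follow the A/B/C example in the text; the function name `decode` is hypothetical.

```python
def decode(text, fake_to_true):
    """Replace each fake character with its true character; leave others unchanged."""
    return "".join(fake_to_true.get(ch, ch) for ch in text)

# Mapping built by hand against the font served at one moment:
# the fake character "A" stands for the true character "B".
mapping_v1 = {"A": "B"}

# The site regenerates the font later: "A" now stands for "C".
mapping_v2 = {"A": "C"}

page_after_regeneration = "A"  # the page now means "C"

# The stale mapping still decodes without any error, but the result is wrong.
stale = decode(page_after_regeneration, mapping_v1)
# Only a mapping rebuilt from the current font gives the true value.
correct = decode(page_after_regeneration, mapping_v2)

print(repr(stale), repr(correct))
```

The stale result is silently wrong rather than visibly broken, which is why a fixed hand-built table is only usable for a short period.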

Examples

Embodiment

[0026] As shown in Figure 1, a crawler method for solving font anti-crawling includes the following steps:

[0027] 1. Obtain the custom glyph library (font) file of the data source website:

[0028] (1) The crawler fetches the source code of the web page:

[0029] ① Crawl by simulating a browser (this allows waiting until the page's dynamically loaded data has finished loading);

[0030] ② Analyze the source code to determine the fields and the values to capture;

[0031] ③ Determine which fields use custom fonts.
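Steps ② and ③ above might be sketched as follows, using only Python's standard `html.parser`. The class names (`title`, `price`), the sample HTML, and the `FieldCollector` helper are all hypothetical. The sketch also assumes that the set of CSS classes bound to a custom font is already known, and that the page contains no void tags such as `<br>`, which this simple tag stack does not handle.

```python
from html.parser import HTMLParser

class FieldCollector(HTMLParser):
    """Collect text per element class and flag classes that use a custom font."""

    def __init__(self, custom_font_classes):
        super().__init__()
        self.custom_font_classes = set(custom_font_classes)
        self._stack = []   # class attribute of each currently open element
        self.fields = []   # (class_name, text, uses_custom_font)

    def handle_starttag(self, tag, attrs):
        self._stack.append(dict(attrs).get("class") or "")

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self._stack:
            cls = self._stack[-1]
            self.fields.append((cls, text, cls in self.custom_font_classes))

# Hypothetical page source: the price is rendered with a custom font,
# so its characters are fake code points rather than readable digits.
html = (
    '<div><span class="title">Widget</span>'
    '<span class="price">\ue001\ue002</span></div>'
)

collector = FieldCollector(custom_font_classes={"price"})
collector.feed(html)
fields = collector.fields
```

Fields flagged `True` are the ones whose captured values need reverse parsing before they can be used.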

[0032] (2) Custom font file download:

[0033] ① Determine which element areas use custom fonts by comparing the web page source code with what the page actually displays, and find the name of the custom font file. Whether a content element uses a custom font can be checked by looking for font file references in its CSS, such as font-family;

[0034] ② Monitor font file loading while the web page source data loads, and find the font file download UR...
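One way to carry out the CSS check in step ① is to scan the stylesheet text for `@font-face` rules and read the font name and file references from them. The following is a minimal sketch using regular expressions, not a full CSS parser. The stylesheet, the font name `price-font`, the `example.com` URLs, and the `find_custom_fonts` helper are hypothetical.

```python
import re
from urllib.parse import urljoin

FONT_FACE_RE = re.compile(r"@font-face\s*\{(.*?)\}", re.S | re.I)
FAMILY_RE = re.compile(r"font-family\s*:\s*['\"]?([^;'\"]+)['\"]?", re.I)
URL_RE = re.compile(r"url\(\s*['\"]?([^'\")]+)['\"]?\s*\)", re.I)

def find_custom_fonts(css_text, base_url):
    """Map each @font-face family name to the absolute URLs of its font files."""
    fonts = {}
    for block in FONT_FACE_RE.findall(css_text):
        family = FAMILY_RE.search(block)
        if not family:
            continue
        urls = [urljoin(base_url, u.strip()) for u in URL_RE.findall(block)]
        fonts[family.group(1).strip()] = urls
    return fonts

# Hypothetical stylesheet taken from a page's source.
sample_css = """
@font-face {
  font-family: "price-font";
  src: url("/fonts/abc123.woff") format("woff"),
       url('/fonts/abc123.ttf') format('truetype');
}
.price { font-family: "price-font"; }
"""

fonts = find_custom_fonts(sample_css, "https://example.com/page")
```

Because the font may be regenerated at any time, the extraction would need to be repeated for every fetched page rather than cached.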

Abstract

The invention discloses a crawler method for solving font anti-crawling, comprising the following steps: obtaining the custom glyph library file of a data source website; marking each glyph in the custom glyph library file with a standard unique identification number; establishing a mapping table between true characters and the glyphs' standard unique identification numbers; establishing a mapping table between fake characters and the glyphs' standard unique identification numbers; establishing a mapping table between true and fake characters; and reverse parsing. The invention offers strong persistence and high versatility in solving font anti-crawling: it flexibly acquires and updates font files, flexibly establishes the mapping between true and fake characters, flexibly reverse-parses fake data, and ensures maximum data accuracy.
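The chain of mapping tables in the abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the patented implementation: glyphs are represented as invented lists of outline points, the unique identification number is assumed to be a SHA-1 hash of those points, and the same glyph is assumed to have an identical outline in every font. The names `glyph_id`, `build_fake_to_true`, and `reverse_parse` are hypothetical.

```python
import hashlib

def glyph_id(outline):
    """Derive a stable identifier from a glyph's outline coordinates (assumed scheme)."""
    data = ";".join(f"{x},{y}" for x, y in outline).encode("ascii")
    return hashlib.sha1(data).hexdigest()

def build_fake_to_true(true_glyphs, fake_glyphs):
    """Compose true-char -> id and fake-char -> id tables into fake-char -> true-char."""
    true_to_id = {ch: glyph_id(o) for ch, o in true_glyphs.items()}
    fake_to_id = {ch: glyph_id(o) for ch, o in fake_glyphs.items()}
    id_to_true = {gid: ch for ch, gid in true_to_id.items()}
    return {
        fake: id_to_true[gid]
        for fake, gid in fake_to_id.items()
        if gid in id_to_true
    }

def reverse_parse(text, fake_to_true):
    """Replace fake characters with true ones; leave everything else unchanged."""
    return "".join(fake_to_true.get(ch, ch) for ch in text)

# Invented outlines standing in for the shapes of "1" and "7".
true_glyphs = {
    "1": [(0, 0), (10, 0), (10, 100)],
    "7": [(0, 100), (50, 100), (20, 0)],
}
# The site's current font draws the same shapes under fake code points.
fake_glyphs = {
    "\ue001": [(0, 100), (50, 100), (20, 0)],
    "\ue002": [(0, 0), (10, 0), (10, 100)],
}

fake_to_true = build_fake_to_true(true_glyphs, fake_glyphs)
decoded = reverse_parse("\ue001\ue002.\ue002", fake_to_true)
```

Keying both tables on the glyph identifier rather than on the fake character is what lets the fake-to-true table be rebuilt automatically whenever the site regenerates its font.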

Description

Technical field

[0001] The invention relates to the field of network technology, in particular to a crawler method for solving font anti-crawling.

Background technique

[0002] The core problem in crawler development is breaking through the anti-crawling technology of data sources (mainly websites). Common anti-crawling technologies include IP access restrictions, user login verification, and dynamic loading of front-end data. These technologies have been around for a long time, and many solutions exist. Font anti-crawling, however, was previously a niche anti-crawling technology and has gradually become popular on large data source websites. It causes the data obtained by crawlers to lose its value, and there is no stable and reliable solution yet.

[0003] Commonly used font files include files conforming to ttf, woff and other protocol specifications. Most of them are composed of a series of ASCII...

Application Information

IPC(8): G06F16/953, G06F17/21
CPC: G06F40/109
Inventor: 陈思言, 黄元稳, 漆尧
Owner: SICHUAN CHANGHONG ELECTRIC CO LTD