
A method for auditing the content of bad information buffered by CDN and CACHE

A bad-information content auditing technology, applied in website content management, web data retrieval using information identifiers, network data query, and related fields. It addresses three problems: URL lists that are too large to import by manual entry, the inability to actively extract URL link information from logs, and high false-positive and false-negative rates.

Inactive Publication Date: 2019-03-08
CHENGDU SIWEI CENTURY TECH
5 Cites, 7 Cited by

AI Technical Summary

Problems solved by technology

[0007] The list of domain names to be scanned must be imported through the page in a fixed format, and URL link information cannot be actively extracted from massive log files. Because the number of links is too large, they cannot be imported by manual entry.
[0008] (3) Pornographic image recognition technology has a high false-positive rate
[0009] Pornographic images are identified with a skin-color algorithm: the skin-color ratio is computed from the RGB values of the image, and abnormal poses and sensitive body parts are identified through modeling. The false-positive and false-negative rates are high, so this approach no longer meets current audit needs.
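
As a rough illustration of the skin-color-ratio approach criticized in [0009], the sketch below classifies each pixel with a commonly cited RGB threshold heuristic and reports the fraction of skin-like pixels. The thresholds and the function names `is_skin_rgb` and `skin_ratio` are assumptions made for illustration; the source does not state which rule the prior art used.

```python
def is_skin_rgb(r, g, b):
    """Return True if an RGB pixel looks skin-like.

    The thresholds are a commonly cited heuristic and an assumption here;
    they are not taken from the patent text.
    """
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def skin_ratio(pixels):
    """Fraction of pixels classified as skin; pixels holds (r, g, b) tuples."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    return sum(is_skin_rgb(*p) for p in pixels) / len(pixels)


if __name__ == "__main__":
    skin_tone = (220, 170, 140)   # a typical light skin tone
    sand_tone = (200, 150, 100)   # sand or wood, not skin
    sky_tone = (80, 120, 200)     # blue sky
    for name, pixel in [("skin", skin_tone), ("sand", sand_tone), ("sky", sky_tone)]:
        print(name, is_skin_rgb(*pixel))
    print(skin_ratio([skin_tone, sand_tone, sky_tone, sky_tone]))
```

In this sketch the sand-colored pixel passes the same test as the skin-colored one, which is one way a pure color-ratio rule can produce false positives.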



Examples


Embodiment 1

[0057] A method for auditing the content of bad information cached by CDN and CACHE (see Figure 1). The method steps are as follows:

[0058] A. The access log ZIP package generated by the CDN / cache is obtained through the data transmission interface, and the file parsing module extracts the log files;

[0059] B. The file parsing module parses the log files in the ZIP package, obtains field information such as text and picture URL links, domain name, time, and IP address, and formats the data;

[0060] C. After centrally deduplicating the URLs, the file parsing module generates the list of URLs to be scanned and passes it to the webpage content capture module, which simulates access to each URL and obtains the corresponding picture and text information;

[0061] D. The data processing module uses the intelligent image recognition model to identify specific images in the pictures, and uses keyword matching and weight analysis t...
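
Steps A through C, plus the keyword-matching part of step D, might be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the tab-separated log layout, the keyword table, and every function name are hypothetical, and the page-fetching and image-recognition stages are left out because the source does not describe them in enough detail.

```python
import io
import zipfile
from urllib.parse import urlsplit, urlunsplit

# Assumed (hypothetical) log line layout: time<TAB>client_ip<TAB>domain<TAB>url
FIELDS = ("time", "ip", "domain", "url")

# Hypothetical keyword weights; the source does not list its keywords.
KEYWORD_WEIGHTS = {"casino": 0.5, "jackpot": 0.25}


def extract_log_lines(zip_bytes):
    """Step A: yield non-empty text lines from every file in the ZIP package."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):  # skip directory entries
                continue
            with archive.open(name) as handle:
                for raw in handle:
                    line = raw.decode("utf-8", errors="replace").strip()
                    if line:
                        yield line


def parse_line(line):
    """Step B: split one line into named fields; return None if malformed."""
    parts = line.split("\t")
    if len(parts) != len(FIELDS):
        return None
    return dict(zip(FIELDS, parts))


def normalize_url(url):
    """Lower-case scheme and host and drop the fragment so duplicates match."""
    parts = urlsplit(url)
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.query, "")
    )


def urls_to_scan(records):
    """Step C: deduplicate URLs while keeping first-seen order."""
    seen = set()
    ordered = []
    for record in records:
        key = normalize_url(record["url"])
        if key not in seen:
            seen.add(key)
            ordered.append(key)
    return ordered


def keyword_score(text):
    """Step D (text part): sum the weights of the keywords found in the text."""
    lowered = text.lower()
    return sum(w for kw, w in KEYWORD_WEIGHTS.items() if kw in lowered)


def build_demo_zip():
    """Create an in-memory ZIP holding a small fake access log for the demo."""
    log = (
        "2019-01-01T10:00:00\t10.0.0.1\texample.com\thttp://Example.com/a.jpg\n"
        "2019-01-01T10:00:01\t10.0.0.2\texample.com\thttp://example.com/a.jpg#top\n"
        "malformed line\n"
        "2019-01-01T10:00:02\t10.0.0.3\texample.com\thttp://example.com/b.html\n"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("access.log", log)
    return buffer.getvalue()


if __name__ == "__main__":
    parsed = (parse_line(line) for line in extract_log_lines(build_demo_zip()))
    records = [record for record in parsed if record is not None]
    for url in urls_to_scan(records):
        print(url)
    print(keyword_score("Big CASINO jackpot"))
```

Deduplicating before any page is fetched keeps the capture module from requesting the same cached object repeatedly, which matters when the access logs contain a very large number of links.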


Abstract

The invention discloses a method for auditing the content of bad information cached by CDN and CACHE. The method comprises the following steps: A, acquiring a data packet through a data transmission interface; B, formatting the data with a file parsing module; C, using the file parsing module to generate the URLs to be scanned and pass them to the web page content capture module; D, using the data processing module and an intelligent image recognition model to recognize specific images in the pictures, completing the identification and audit of the CDN / cache content. Compared with the prior art, the invention extends the service coverage of content auditing and supports bad-information auditing of CDN / webcache services; the crawler module can read the URLs to be scanned directly from a file, crawl them, and obtain the web page content. A deep learning algorithm that simulates the neural network of the human brain is used to build a model with high-level expressive power. Through continuous training on big data and frequent iteration, the precision can reach 99.5%.

Description

Technical field

[0001] The present invention relates to a method for auditing the content of bad information cached by CDN and CACHE.

Background

[0002] At present, content security audits are mainly carried out for websites hosted in IDC computer rooms. Web crawlers scan the websites layer by layer and obtain the text and picture content of the web pages in order to identify pornography, violence, reactionary content, gambling, and other types of bad information.

[0003] With the continuous development of operators' businesses, CDN and webcache have emerged as new business forms for improving the user access experience. Their common feature is that they cache specific web page content, but the cached pages have no actual links to one another, so web crawlers cannot reach them by following links and scanning associated pages. The existing technical means have the following problems:

[0004] (1) Insufficient coverage of monitoring objects

[0005]...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/958, G06F16/953, G06F16/955
Inventor: 章林光
Owner: CHENGDU SIWEI CENTURY TECH