Webpage data analyzing and processing method

A webpage data analysis and processing technology, applied in the fields of electrical digital data processing, special data processing applications, and instruments. It addresses problems such as useless information in search results, high cost, and difficulty in obtaining accurate results, with the aims of improving accuracy and efficiency and ensuring an effective compression effect.

Active Publication Date: 2017-06-13
ZHANGZHOU COLLEGE OF SCI & TECH
5 Cites, 14 Cited by

AI Technical Summary

Problems solved by technology

[0003] In the past, information mining was generally carried out through information retrieval or mathematical statistics. For example, search engines such as Baidu and Google, which ordinary individual users rely on, can retrieve content related to a query, but most of what they return is useless information, and it is difficult to obtain the desired accurate results when the volume of data is large. Their in-depth mining and analysis functions, moreover, are usually aimed at large enterprises or institutions and are extremely expensive for most small and medium-sized enterprises and ordinary individual users.




Detailed Description of Embodiments

[0042] The technical solution of the present invention will be specifically described below in conjunction with the accompanying drawings.

[0043] The webpage data analysis and processing method of the present invention is implemented on a webpage data service platform. The platform includes a client, a content server, and a word segmentation cloud server; a webpage crawling system, a content extraction system, a content analysis system, and a database are installed on the content server. The specific implementation steps of the method are as follows:

[0044] S1. Web crawling

[0045] The webpage crawling system acquires the crawling task, adds the URL to be crawled to the crawler queue, and crawls the webpage;
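Step S1 can be pictured with a short Python sketch. This is only an illustration under stated assumptions, not the patented crawling system: the names `crawl`, `fetch_with_urllib`, and `max_pages` are hypothetical, the queue is a simple first-in, first-out queue, and each URL is fetched at most once.

```python
from collections import deque
from typing import Callable, Dict, Iterable
from urllib.request import urlopen


def crawl(seed_urls: Iterable[str],
          fetch: Callable[[str], str],
          max_pages: int = 100) -> Dict[str, str]:
    """Add the task's URLs to a FIFO crawler queue and fetch each one once.

    `fetch` is injected so the queue logic can be exercised without a network.
    """
    queue = deque()
    seen = set()
    for url in seed_urls:
        if url not in seen:          # de-duplicate before enqueueing
            seen.add(url)
            queue.append(url)

    pages: Dict[str, str] = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            pages[url] = fetch(url)
        except Exception:
            # Assumption: a failed fetch is skipped rather than retried.
            continue
    return pages


def fetch_with_urllib(url: str, timeout: float = 10.0) -> str:
    """One possible `fetch` implementation using only the standard library."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

A real crawl would call `crawl(task_urls, fetch_with_urllib)`; retry policy, politeness delays, and link discovery are deliberately left out because the source text does not describe them.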

[0046] S2. Content extraction

[0047] The content extraction system segments the web pages crawled in step S1 according to reading habits, generating multiple blocks that include theme blocks and noise blocks. It then removes the noise blocks and extracts the core text...
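The block-and-filter idea in step S2 can be sketched as follows. The excerpt does not spell out the reading-habit criterion, so this sketch swaps in a common stand-in heuristic, which is an assumption: a block counts as noise if it is short or if most of its text sits inside links. The names `BlockSplitter`, `extract_core_text`, `min_chars`, and `max_link_density` are hypothetical.

```python
from html.parser import HTMLParser

BLOCK_TAGS = {"div", "p", "td", "li", "section", "article", "h1", "h2", "h3"}
SKIP_TAGS = {"script", "style"}


class BlockSplitter(HTMLParser):
    """Split a page into text blocks at block-level tag boundaries."""

    def __init__(self):
        super().__init__()
        self.blocks = []        # list of (text, characters inside <a>)
        self._text = []
        self._link_chars = 0
        self._in_link = 0
        self._skip = 0

    def flush(self):
        text = " ".join("".join(self._text).split())
        if text:
            self.blocks.append((text, self._link_chars))
        self._text = []
        self._link_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag == "a":
            self._in_link += 1
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if self._skip:
            return              # ignore script and style content
        self._text.append(data)
        if self._in_link:
            self._link_chars += len(data.strip())


def extract_core_text(html, min_chars=40, max_link_density=0.5):
    """Keep theme blocks; drop short or link-heavy blocks as noise (assumed heuristic)."""
    splitter = BlockSplitter()
    splitter.feed(html)
    splitter.close()
    splitter.flush()
    theme = [text for text, link_chars in splitter.blocks
             if len(text) >= min_chars and link_chars / len(text) <= max_link_density]
    return "\n".join(theme)
```

The thresholds are placeholders and would need tuning per site; the method described in the source may classify blocks quite differently.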


Abstract

The invention discloses a webpage data analysis and processing method implemented on a webpage data service platform. The platform comprises a client, a content server, and a word segmentation cloud server; a webpage crawling system, a content extraction system, a content analysis system, and a database are installed on the content server. The method includes the following steps: S1, webpage crawling; S2, content extraction; S3, Chinese word segmentation; S4, content analysis; S5, result display, in which the client retrieves the data results from the database and displays them to users. With the reading-habit-based webpage content extraction technology, the subject content of a webpage can be recognized and extracted rapidly. Cloud-based segmentation performs Chinese word segmentation effectively and provides a foundation for big-data analysis. Users do not need to invest in software or hardware resources, so the method can meet the need of small and medium-sized enterprises and ordinary individual users for a low-cost, targeted big-data analysis service.
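The five steps in the abstract can be read as a simple pipeline. The sketch below is a hedged illustration only: `run_pipeline` and `top_terms` are hypothetical names, the cloud word segmentation service is represented by an injected `segment` callable, the content analysis in S4 is replaced by a plain term count (an assumption, since the excerpt does not describe the analysis), and an SQLite table stands in for the platform's database.

```python
import sqlite3
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def run_pipeline(urls: Iterable[str],
                 fetch: Callable[[str], str],
                 extract: Callable[[str], str],
                 segment: Callable[[str], List[str]],
                 db: sqlite3.Connection) -> None:
    """Run S1 to S4 for each URL and store the results in the database."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS term_counts (url TEXT, term TEXT, n INTEGER)"
    )
    for url in urls:
        html = fetch(url)            # S1: webpage crawling
        text = extract(html)         # S2: content extraction
        words = segment(text)        # S3: word segmentation (remote service in the source)
        counts = Counter(words)      # S4: placeholder analysis, term frequency only
        db.executemany(
            "INSERT INTO term_counts VALUES (?, ?, ?)",
            [(url, term, n) for term, n in counts.items()],
        )
    db.commit()


def top_terms(db: sqlite3.Connection, url: str, limit: int = 10) -> List[Tuple[str, int]]:
    """S5: what a client might read back from the database for display."""
    rows = db.execute(
        "SELECT term, n FROM term_counts WHERE url = ? "
        "ORDER BY n DESC, term ASC LIMIT ?",
        (url, limit),
    )
    return rows.fetchall()
```

Passing the stages in as callables keeps the sketch independent of any particular crawler, extractor, or segmentation service. For a quick local trial, `str.split` can serve as `segment`, although whitespace splitting is not Chinese word segmentation.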

Description

Technical field

[0001] The invention relates to the technical field of Internet data mining and analysis, and in particular to a method for analyzing and processing webpage data.

Background

[0002] The Internet is currently flooded with information of every kind, and people live in an era of huge data volumes and massive information. Mining this data in depth for information that is meaningful to enterprise or social development requires suitable discovery methods.

[0003] In the past, information mining was generally carried out through information retrieval or mathematical statistics. For example, search engines such as Baidu and Google, which ordinary individual users rely on, can retrieve content related to a query, but most of what they return is useless information, and it is difficult to obtain the desired accurate results when the volume of data is large. However, their in-depth mining and analysis functions are often oriented ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/374, G06F16/9535
Inventors: 杨爱华, 陈林水
Owner: ZHANGZHOU COLLEGE OF SCI & TECH