Crawler retrieval and big data intelligent recommendation optimization processing method based on open source framework

This big data and crawler technology, applied to big data platforms and resource acquisition, addresses problems such as the verbosity of the Java language, the growing difficulty of data crawling, and poor asynchronous support. It offers high flexibility and scalability, improves network resource acquisition and intelligent recommendation, and saves labor and time costs.

Inactive Publication Date: 2020-07-17
上海浩方信息技术有限公司

AI Technical Summary

Problems solved by technology

[0003] 1) Obtaining data resources requires a continuous, heavy investment of labor and time;
[0004] 2) As the acquired data keeps growing, extracting and analyzing the useful resources consumes excessive labor;
[0008] PHP: Advantages: low cost, highly extensible and portable. Disadvantages: multi-threading and asynchronous support are weak, so concurrent processing is poor.
[0009] Java: Advantages: a mature web crawler ecosystem. Disadvantages: the language itself is verbose, the code volume is large, and refactoring is costly.
[0010] C/C++: Advantages: close to the best runtime efficiency and performance. Disadvantages: high labor cost, low development efficiency, and heavy ongoing maintenance.
[0012] 1) With conventional techniques alone, the resources that can be crawled are limited, and anti-crawler measures further reduce the amount acquired;
[0013] 2) The diversity, variability, and complexity of website structures make data crawling harder;
[0014] Crawled data is often of low value, or contains irrelevant material such as page source markup or advertisement images.



Embodiment Construction

[0036] In order to describe the technical content of the present invention more clearly, further description will be given below in conjunction with specific embodiments.

[0037] The method for crawler retrieval and big data intelligent recommendation optimization processing based on an open source framework of the present invention comprises the following steps:

[0038] (1) Crawl resources through an open source framework to obtain the required target business resources;

[0039] Using the open source framework with distributed deployment, partition the target business resources to be crawled, deploy an application instance for each, and schedule and connect the instances through a background timed-task function;

[0040] (1.1) Configure the collection parameters and links for the relevant website sources;

[0041] (1.2) Run timed tasks to trigger crawler collection;

[0042] (1.3) Establish a multi-threaded resource preemption mechanism and add crawler collection tasks;

...
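Steps (1.1) through (1.3) above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented implementation: `SourceConfig`, `run_collection`, and `run_on_timer` are hypothetical names, the "resource preemption mechanism" is read here as worker threads competing to claim tasks from one shared queue, and the page download is passed in as a `fetch` callable so the sketch runs without network access.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class SourceConfig:
    """Step (1.1): collection parameters for one website source (hypothetical fields)."""
    name: str
    url: str


def run_collection(
    sources: List[SourceConfig],
    fetch: Callable[[str], str],
    worker_count: int = 4,
) -> Dict[str, str]:
    """Step (1.3): add collection tasks to a shared queue and let workers compete for them."""
    tasks: "queue.Queue[SourceConfig]" = queue.Queue()
    for source in sources:
        tasks.put(source)

    results: Dict[str, str] = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                # Whichever thread gets here first claims the task; no task is taken twice.
                source = tasks.get_nowait()
            except queue.Empty:
                return
            try:
                body = fetch(source.url)
            except Exception as exc:  # keep one failing source from stopping the others
                body = f"ERROR: {exc}"
            with lock:
                results[source.name] = body

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


def run_on_timer(
    sources: List[SourceConfig],
    fetch: Callable[[str], str],
    rounds: int,
    interval_seconds: float = 0.0,
) -> List[Dict[str, str]]:
    """Step (1.2): trigger a collection run, wait for the interval, and repeat."""
    pause = threading.Event()
    history: List[Dict[str, str]] = []
    for _ in range(rounds):
        history.append(run_collection(sources, fetch))
        pause.wait(interval_seconds)  # sleeps for the interval; the event is never set
    return history
```

In a real deployment the `fetch` argument would wrap an HTTP client, and the bounded `rounds` loop would likely be replaced by whatever background scheduler the platform already provides.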


Abstract

The invention relates to a crawler retrieval and big data intelligent recommendation optimization processing method based on an open source framework. The method comprises the following steps: crawling resources through the open source framework to obtain the required target service resources; segmenting the obtained target service resources into words using NLP word segmentation technology to achieve segmented-word information matching; and screening and recommending information according to preset keywords, fields, and weight values. The method improves network resource collection and the intelligent recommendation algorithm function for target users. The web crawler is implemented by combining the open source HttpClient technology with Python algorithm packages, which greatly reduces, or in part eliminates, labor and time costs and gives crawler resource management relatively high flexibility and extensibility. Intelligent recommendation algorithm scheduling is executed for the target user, so that information is filtered on demand and the effective information is screened out.
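The segmentation and weighted screening described in the abstract might look roughly like the Python sketch below. It rests on several assumptions: the regex tokenizer in `segment` is a simple stand-in for a real word segmentation library, the scoring rule (field weight times keyword weight, summed over matched tokens) is one plausible reading of "keywords, fields and weight values" rather than the formula used by the invention, and all function and parameter names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

Record = Dict[str, str]


def segment(text: str) -> List[str]:
    """Stand-in segmenter: lowercase alphanumeric tokens (not a real NLP segmenter)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(
    record: Record,
    keyword_weights: Dict[str, float],
    field_weights: Dict[str, float],
) -> float:
    """Sum field_weight * keyword_weight over every matched token in every weighted field."""
    total = 0.0
    for field, field_weight in field_weights.items():
        for token in segment(record.get(field, "")):
            total += field_weight * keyword_weights.get(token, 0.0)
    return total


def recommend(
    records: List[Record],
    keyword_weights: Dict[str, float],
    field_weights: Dict[str, float],
    min_score: float = 1.0,
    top_n: int = 10,
) -> List[Tuple[float, Record]]:
    """Drop records scoring below min_score, then return the best top_n, highest first."""
    scored = [(score(r, keyword_weights, field_weights), r) for r in records]
    kept = [pair for pair in scored if pair[0] >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:top_n]
```

Weighting fields separately lets a keyword in a title count for more than the same keyword in body text, and the `min_score` threshold is what filters out records with no relevant matches.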

Description

Technical field

[0001] The invention relates to the field of big data platforms, in particular to the field of resource acquisition, and specifically to a method for crawler retrieval and big data intelligent recommendation optimization processing based on an open source framework.

Background technique

[0002] With the arrival of the network big data era and the rapid growth of business data at enterprises of all kinds, organizations must keep investing large amounts of manpower, material resources, and time to research and analyze ever-increasing volumes of data. This is reflected in the following aspects:

[0003] 1) Obtaining data resources requires a continuous, heavy investment of labor and time;

[0004] 2) As the acquired data keeps growing, extracting and analyzing the useful resources consumes excessive labor;

[0005] 3) Due to the...


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F16/951, G06F16/9535, G06F16/33, G06F16/335, G06F40/289
CPC: G06F16/951, G06F16/9535, G06F16/3344, G06F16/337
Inventor: 王璐, 朱广文, 张建民, 魏晓泉
Owner: 上海浩方信息技术有限公司