A method for dynamically calculating news gathering service resources

A technology for news collection service resources, applied in the field of data analysis. It addresses the problems of missed important data, high collection cost, and wasted computing, storage, and network resources, with the effect of ensuring collection quality, keeping the collection frequency dynamically reasonable, and reducing collection cost.

Active Publication Date: 2019-02-12
GLOBAL TONE COMM TECH

AI Technical Summary

Problems solved by technology

[0011] Existing data collection systems set the data collection frequency statically, which leads to incomplete collection, omission of important data, or wasted computing, storage, and network resources, and therefore to high collection costs. To solve these problems, the present invention provides a method for dynamically calculating news collection service resources. Based on historical data, the method extracts features from the data, determines the data collection frequency for a specific website through dynamic analysis with a logistic regression model, and then dynamically determines the collection resources (such as computing, storage, and network resources) required to collect data from that website.

Examples

Embodiment

[0034] A method for dynamically calculating news collection service resources proceeds according to the following steps:

[0035] 1) Select the input data;

[0036] 2) Extract features from the input data;

[0037] 3) Normalize each feature value of the input data;

[0038] 4) Use whether to increase the collection frequency as the classification label: record 1 if the frequency is increased and 0 if it is not;

[0039] 5) Combine the feature values of the input data with the corresponding classification labels to form a labeled data set;

[0040] 6) Randomly divide the data set into two subsets, a training data set (80%) and a test data set (20%);

[0041] 7) Choose logistic regression as the classification algorithm;

[0042] 8) Using the training data sets of each website as input, train the logistic regression algorithm re...
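
Steps 3 through 8 above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the patent's implementation: the two features per website (articles published per day, articles missed per day), the sample values, the min-max normalization, and the plain batch gradient descent trainer are all hypothetical choices, since the text does not specify them.

```python
import math
import random

def min_max_normalize(rows):
    """Scale each feature column to [0, 1] (step 3). Constant columns map to 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def split_80_20(samples, seed=0):
    """Randomly split samples into 80% training and 20% test data (step 6)."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic_regression(train, epochs=500, lr=0.5):
    """Fit weights and bias by batch gradient descent on log loss (steps 7-8)."""
    n_features = len(train[0][0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in train:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / len(train) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(train)
    return w, b

def predict(w, b, x):
    """Return 1 (increase collection frequency) or 0 (do not increase)."""
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0

# Hypothetical raw features per website:
# (articles published per day, articles missed per day)
raw = [[5, 0], [8, 1], [12, 0], [20, 1], [30, 2],
       [150, 9], [180, 12], [200, 10], [240, 15], [300, 20]]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # step 4: 1 = increase, 0 = do not

features = min_max_normalize(raw)          # step 3
dataset = list(zip(features, labels))      # step 5
train, test = split_80_20(dataset)         # step 6
w, b = train_logistic_regression(train)    # steps 7-8
accuracy = sum(predict(w, b, x) == y for x, y in test) / len(test)
```

The held-out 20% is used only to compute `accuracy`, so it gives a rough check that the trained model generalizes before it is used to decide collection frequency.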

Abstract

The invention discloses a method for dynamically calculating news collection service resources. Based on previously collected news data and the amount of collection resources invested in collecting it, the method extracts features from the data and uses a logistic regression model to dynamically determine the data collection frequency for a specific website, and from that the collection resources the website requires. The actual collection results and resource input are then fed back to continually amend the parameters of the logistic regression model and to revise and optimize the collection frequency. The method can dynamically adjust and optimize the collection frequency and resource input during collection, effectively overcomes the problems of missed collection and high collection cost, and greatly reduces the collection cost while ensuring collection quality.
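
The feedback loop described in the abstract can be illustrated with a small Python sketch. It is written under assumptions the source does not confirm: the model is updated with one stochastic gradient step per observed outcome, and the rule in `revise_frequency` (multiply the daily collection count by a fixed factor when the model predicts "increase", otherwise leave it unchanged, capped at a maximum) is a hypothetical stand-in for whatever revision rule the patent actually uses. All function names and parameters are illustrative.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, b, x):
    """Probability that the collection frequency should be increased."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def feedback_update(w, b, x, observed_label, lr=0.1):
    """Amend the model with one gradient step on log loss for one observation.

    observed_label is 1 if the actual collection showed that the frequency
    should have been increased, else 0.
    """
    err = predict_proba(w, b, x) - observed_label
    new_w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    new_b = b - lr * err
    return new_w, new_b

def revise_frequency(per_day, w, b, x, factor=1.5, max_per_day=96):
    """Hypothetical rule: raise the frequency when the model predicts 1."""
    if predict_proba(w, b, x) >= 0.5:
        return min(per_day * factor, max_per_day)
    return per_day

# Normalized features for one website (hypothetical values).
x = [0.9, 0.8]
w, b = [0.0, 0.0], 0.0
before = predict_proba(w, b, x)

# Repeated feedback that news was missed pushes the model toward "increase".
for _ in range(50):
    w, b = feedback_update(w, b, x, observed_label=1)
after = predict_proba(w, b, x)

new_frequency = revise_frequency(24, w, b, x)
```

Each call to `feedback_update` uses only the latest observation, so the parameters can be amended continuously as collection results arrive rather than by retraining from scratch.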

Description

Technical field

[0001] The invention belongs to the technical field of data analysis, and in particular relates to a method for dynamically calculating news collection service resources.

Background technique

[0002] News websites update their data frequently every day, and there are a large number of such sites. Enterprises engaged in website data mining and analysis need a large number of server, bandwidth, and IP resources to collect data from news websites, and the use of each type of resource involves substantial cost. If the collection frequency for a news website is too low, news is easily missed; if it is too high, the server and bandwidth costs are high.

[0003] Existing collection systems generally collect website data at a single frequency. Some better collection systems use hierarchical management to classify websites in a simple way and collect data at a fixed frequency for each category. With th...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/33, G06F16/35
Inventor: 詹咏松, 程国艮
Owner: GLOBAL TONE COMM TECH